
OT: Deserialize a serialized BASE64 encoding



Adrenalynn
11-23-2008, 11:34 PM
I had someone throw some seemingly random data at me with info hidden within. After substantial analysis, I suspect that it's serialized base64 encoded text. Anyone have any code to deserialize a serialized base64 string?

Rudolph
11-24-2008, 11:38 AM
If you can look at raw data and decide how it's encoded, you're way over my head already... :) When some wad o' data is serialized in one programming language, can it be unserialized from another? I know that, theoretically, that's the point: to share data between systems and languages. I just don't know how well that ends up working out with people and their IP.

Anyway, here's my wild stab: PHP has a built-in unserialize() (http://us2.php.net/unserialize). Perl has the FreezeThaw (http://search.cpan.org/%7Eilyaz/FreezeThaw-0.43/FreezeThaw.pm) and Storable (http://search.cpan.org/%7Eams/Storable-2.18/Storable.pm) modules. Python has Pickle (http://www.python.org/doc/2.5.2/lib/module-pickle.html) built-in.
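As a rough illustration of the Python route, here is a minimal sketch that assumes the blob was produced by Python's own pickle and then Base64-encoded (the most common order). The helper names `to_b64_pickle` and `from_b64_pickle` are hypothetical. A pickle produced by another language's serializer will not load this way, and `pickle.loads` must never be run on untrusted data because it can execute arbitrary code.

```python
import base64
import pickle


def to_b64_pickle(obj) -> str:
    """Serialize with pickle, then Base64-encode so the bytes survive text-only channels."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def from_b64_pickle(blob: str):
    """Undo the layers in reverse order: Base64-decode first, then unpickle.

    WARNING: pickle.loads can execute arbitrary code; only use it on data you trust.
    """
    return pickle.loads(base64.b64decode(blob))


if __name__ == "__main__":
    record = {"name": "widget", "sizes": [3, 5, 8]}
    blob = to_b64_pickle(record)
    print(blob[:20], "...")
    print(from_b64_pickle(blob))
```

If the layers were applied the other way round (Base64 first, then serialized, as described later in this thread), the decode steps simply swap order.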

jes1510
11-24-2008, 11:45 AM
OK, that's a pretty tough one. The only way I would know how to handle it would be to brute force it with bitwise operators and change endianness as needed. I assume you're trying to read it on a 32-bit machine.
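A small sketch of the "change endianness as needed" step, assuming the interesting fields are unsigned 32-bit integers (an assumption; the thread never says what the fields are). It reads the same four bytes both ways with the standard `struct` module, and also shows the manual shift-and-mask byte swap. The function names are hypothetical.

```python
import struct


def read_u32_both_orders(data: bytes, offset: int = 0):
    """Interpret 4 bytes at `offset` as an unsigned 32-bit int, little- and big-endian."""
    chunk = data[offset:offset + 4]
    (little,) = struct.unpack("<I", chunk)
    (big,) = struct.unpack(">I", chunk)
    return little, big


def swap_u32(value: int) -> int:
    """Byte-swap a 32-bit value with shifts and masks (the bitwise route)."""
    return (((value & 0x000000FF) << 24) |
            ((value & 0x0000FF00) << 8) |
            ((value & 0x00FF0000) >> 8) |
            ((value & 0xFF000000) >> 24))


if __name__ == "__main__":
    little, big = read_u32_both_orders(b"\x01\x02\x03\x04")
    print(hex(little), hex(big))
```

Trying both interpretations on a suspected length or count field and keeping the one that gives a plausible value is a quick way to settle the byte order.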

Adrenalynn
11-24-2008, 05:18 PM
Hi guys,

Thanks! I did end up reversing it in C#. It was kinda a combination of brute force and reversing the serialization. I knew I needed MIME base-64 as output [actually: "strongly suspected given the formatting" is more truthful than "knew"; oftentimes cryptanalysis is as much 'art from the gut' as it is science...], so even though it didn't have a header, I could still do frequency analysis knowing the distribution of base-64. So I attacked a tiny portion of it with a combination of frequency analysis and brute force.
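The post doesn't say which reference distribution was used, so here is one plausible stand-in, sketched under that assumption: a chi-squared comparison of the observed Base64 symbol counts against a flat 1/64 distribution. Base64 of encrypted or compressed bytes sits close to flat (small score), while Base64 of plain text or other structured data is lumpier (larger score). The function names are hypothetical.

```python
import base64
import string
from collections import Counter

# Standard Base64 alphabet (RFC 4648); '=' is padding, not a data symbol.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def symbol_frequencies(text: str) -> Counter:
    """Count only Base64 data symbols, ignoring padding, spaces and line breaks."""
    return Counter(ch for ch in text if ch in B64_ALPHABET)


def chi_squared_vs_uniform(text: str) -> float:
    """Chi-squared distance between observed symbol counts and a flat 1/64 distribution."""
    counts = symbol_frequencies(text)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no Base64 symbols found")
    expected = total / 64
    return sum((counts.get(sym, 0) - expected) ** 2 / expected for sym in B64_ALPHABET)


if __name__ == "__main__":
    flat = base64.b64encode(bytes(range(256)) * 12).decode()
    prose = base64.b64encode(b"the quick brown fox jumps over the lazy dog " * 70).decode()
    print("flat :", chi_squared_vs_uniform(flat))
    print("prose:", chi_squared_vs_uniform(prose))
```

Running the score over small windows of the blob, rather than the whole thing, is one way to spot which portions are worth attacking first.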

The next container was a polyalphabetic cipher with a three-character distribution; that was actually relatively hard. It produced a DES-56 layer with an unknown key. That took my machine farm all night, but fortunately it was the old [broken] DES.

In the end, I got the data I needed from it. It took about five hours of effort and about 28x11 CPU hours... which just goes to prove again that security by obfuscation is worse than no security at all... ;)

FryGuy
11-25-2008, 05:06 PM
Just out of curiosity, what kind of serialization was it? Generally, serialization is taking binary data in memory and formatting it in a standard container so that it can be re-created in the same structure, so there shouldn't generally be any decoding for that part, unless you mean it was encoded in some form of base-64. Is this the sequence the data is encoded in?

binary data -> DES-56 -> polyalphabetic cipher -> encoded in a string of base-64 -> serialized
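The sequence above can be modelled as a stack of reversible stages, where decoding peels them off in reverse order. This is a minimal sketch under loud assumptions: a toy XOR stands in for both cipher layers (it is not DES and not secure), and a JSON wrapper stands in for the unknown serialization format. All names here are hypothetical.

```python
import base64
import json
from typing import Any, Callable, List, Tuple

# Each stage: (label, encode function, decode function).
Stage = Tuple[str, Callable[[Any], Any], Callable[[Any], Any]]


def xor_bytes(data: bytes, key: bytes = b"\x5a\xc3") -> bytes:
    """Toy XOR cipher standing in for the DES / polyalphabetic layers (NOT real crypto)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


STAGES: List[Stage] = [
    ("cipher (toy XOR stand-in)", xor_bytes, xor_bytes),
    ("base64", lambda b: base64.b64encode(b).decode("ascii"), base64.b64decode),
    ("serialize (JSON stand-in)", lambda s: json.dumps({"payload": s}),
     lambda s: json.loads(s)["payload"]),
]


def wrap(data: bytes) -> str:
    """Apply every stage in order, innermost first."""
    for _, encode, _ in STAGES:
        data = encode(data)
    return data


def unwrap(blob: str) -> bytes:
    """Peel the layers in reverse: the last layer applied is the first one removed."""
    for _, _, decode in reversed(STAGES):
        blob = decode(blob)
    return blob


if __name__ == "__main__":
    blob = wrap(b"secret data")
    print(blob)
    print(unwrap(blob))
```

In a real analysis each decode function is unknown at first; the work described in this thread is essentially discovering each stage's inverse one layer at a time.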

And I wonder how you were able to discern what kind of ciphers they were using. Maybe if I were working through the problem it wouldn't seem so difficult, but it's amazing you could do that. I remember breaking the "checksum" of an online Magic: The Gathering game 10 years ago in a VB app, and it was really satisfying, but nowhere near as complex. Congratulations :)

Adrenalynn
11-25-2008, 05:41 PM
Yes, it was data represented in MIME base-64 that was then serialized. It was deliberately obfuscated, and I had to do frequency analysis, keeping in mind the method of encoding Base-64 data, to reverse the serialization.

I figured out that it should be MIME Base-64 just from the formatting of how it was presented, acknowledging that it could also have been another MIME encoding, but the frequency distribution will pretty quickly reveal that.
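A rough sketch of the "from the formatting" check, assuming the two candidate MIME transfer encodings are Base64 and quoted-printable (the post only says "other MIME encodings"). It is a heuristic, not a parser: a short ordinary word like "ABCD" also passes the Base64 test, which is exactly why a follow-up frequency check is needed. The function name is hypothetical.

```python
import re
import string

# RFC 4648 Base64 alphabet plus '=' padding.
_B64 = set(string.ascii_letters + string.digits + "+/=")
# Quoted-printable escapes a byte as '=' followed by two uppercase hex digits.
_QP_ESCAPE = re.compile(r"=[0-9A-F]{2}")


def guess_encoding(payload: str) -> str:
    """Guess the transfer encoding of a text blob from its shape alone."""
    lines = [ln.strip() for ln in payload.splitlines() if ln.strip()]
    body = "".join(lines)
    if not body:
        return "empty"
    unpadded = body.rstrip("=")
    if (set(body) <= _B64 and len(body) % 4 == 0
            and "=" not in unpadded and len(body) - len(unpadded) <= 2):
        return "base64"
    # QP: hex escapes anywhere, or '=' soft line breaks at line ends.
    if _QP_ESCAPE.search(payload) or any(ln.endswith("=") for ln in lines):
        return "quoted-printable"
    return "unknown"


if __name__ == "__main__":
    for sample in ["aGVsbG8gd29ybGQ=", "caf=C3=A9", "hello world!"]:
        print(repr(sample), "->", guess_encoding(sample))
```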

The poly cipher was the hardest to deduce by a long shot. In that case, I was fairly fortunate: there was poly-alpha-ciphered text before the DES block, taunting me. That was a huge mistake on their part, because I could see by eye that there was a distribution there. The key length of a poly cipher is unknown, which can be brutal, but when it encodes clear text you still have a character distribution, which is the weakest link of a poly-alpha cipher. The ciphertext looked something like xxxxy xyxyxy yxyxyxyx yyyxyyyzyzzyyyyyyyy. What I do in that instance is look for the shortest discrete block to attack; in this case I found a few blocks of six characters. The shortest English words are one character long, but two-character words are common, so the six-character blocks led me to believe that the key length was 3: each clear character substituted by a triplet. Once you know that, you can brute force against a dictionary in a matter of seconds, and once you get a few matching dictionary words you have enough of a start against the substitution to quickly brute force the remainder.
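Here is a sketch of the attack as described: infer the group size from the shortest blocks, split each block into triplets, then match against dictionary words. The dictionary step uses word-pattern matching (comparing repetition patterns such as "that" = A B C A), which is a standard substitution-cipher technique swapped in here as an assumption; the post doesn't say how the dictionary search worked. It also assumes a plain one-to-one letter-to-triplet table (no homophones), and the ciphertext and tiny dictionary below are made up for illustration.

```python
from typing import Dict, List, Optional, Sequence


def infer_group_size(cipher_words: Sequence[str], shortest_plain_len: int = 2) -> Optional[int]:
    """Assume the shortest blocks are two-letter words; every block must then split evenly."""
    shortest = min(len(w) for w in cipher_words)
    size, rem = divmod(shortest, shortest_plain_len)
    if size == 0 or rem or any(len(w) % size for w in cipher_words):
        return None
    return size


def to_groups(word: str, size: int) -> List[str]:
    """Split a cipher block into fixed-size groups (one per plaintext letter)."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def pattern(units: Sequence[str]) -> tuple:
    """Canonical repetition pattern, e.g. 'that' -> (0, 1, 2, 0)."""
    seen: Dict[str, int] = {}
    return tuple(seen.setdefault(u, len(seen)) for u in units)


def candidates(cipher_word: str, size: int, dictionary: Sequence[str]) -> List[str]:
    """Dictionary words whose letter-repetition pattern matches the cipher groups."""
    groups = to_groups(cipher_word, size)
    target = pattern(groups)
    return [w for w in dictionary if len(w) == len(groups) and pattern(w) == target]


def extend_mapping(mapping: Dict[str, str], groups: List[str], word: str) -> Optional[Dict[str, str]]:
    """Add group->letter guesses; return None if they contradict earlier guesses."""
    new = dict(mapping)
    for g, ch in zip(groups, word):
        if new.setdefault(g, ch) != ch:
            return None
    # Assumption: one-to-one substitution, so two groups may not share a letter.
    if len(set(new.values())) != len(new):
        return None
    return new


if __name__ == "__main__":
    blocks = ["xyxyyzzzyxyx", "zzyxyx"]          # hypothetical ciphertext
    size = infer_group_size(blocks)
    print("group size:", size)
    print(candidates(blocks[0], size, ["that", "this", "with", "high", "area"]))
```

Each candidate word fixes a few group-to-letter guesses, and `extend_mapping` discards guesses that contradict words already decoded, which is how a handful of matches snowballs into the full table.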

I wasn't certain it was DES-56 per se. But I still have some decent tools I wrote over years of working in cryptanalysis... DES-56 was not immune to pattern analysis because it is relatively "broken" with regard to doubles and triples. After statistical analysis, I put about a 20% likelihood on it being a short DES or a DES derivative. The exhaustive keyspace was further limited by things known about the source of the data. It was statistically good enough to throw the farm at it overnight. Sometimes it's better to be lucky than good. ;) There was also substantial weakness in the last couple of steps because ciphered plain text preceded each encrypted block, which allowed for both frequency analysis and brute-forcing against a dictionary.
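For scale, a back-of-the-envelope check of why narrowing the keyspace mattered. This reads "28x11 CPU hours" as 28 CPUs running for about 11 hours, which is an interpretation the post doesn't spell out. Covering the full 56-bit DES keyspace in that budget would require:

$$
28 \times 11 = 308\ \text{CPU-hours} = 308 \times 3600\ \text{s} = 1{,}108{,}800\ \text{CPU-seconds}
$$

$$
\frac{2^{56}}{1{,}108{,}800} = \frac{72{,}057{,}594{,}037{,}927{,}936}{1{,}108{,}800} \approx 6.5 \times 10^{10}\ \text{keys per CPU-second}
$$

That rate is far beyond typical software DES key-testing speeds on general-purpose CPUs, which is consistent with the point that the keyspace had been cut down before the farm was run.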

The upshot is that it was based largely in obfuscation and secondly in weak encryption (by today's standards), and that will always fall to any committed effort.

It wasn't really intended to be non-reversible, more to be non-trivial (much like your brute-forcing a checksum) to extract.

And yes, it did feel good. I'm still bragging about it to friends who know of that challenge. Good to see my kung fu isn't totally dead and rotted after years of inactivity... ;)