Obfuscation with proguard - obfuscation

I am trying to obfuscate the source code of a product. The code is written in java and has got lot of dependencies of a particular class on other classes. Initially i thought of obfuscating major classes. Is it feasible ? Please help me with any informational inputs.
Thanks,
Manu

The main point you should consider is the efficacy of code obfuscation.
It merely makes it slightly harder to read your code. This does not mean it makes it impossible to read, nor impervious to penetration or hacking. Remember that eventually all code must run on your customers machine, and that code will be binary instruction codes of some sort.
Bottom line - make sure you understand why you want to obfuscate your code, and what you are accomplishing by that.
Once you made sure you're on the right track - see Java obfuscation - ProGuard/yGuard/other? or any other related question.

Related

Can Smart Assembly 7+ strings be deobfuscated?

I am planning on using Smart Assembly 7+ for obfuscating my .NET C# library.
But when I look through some forums I came across that there are even programs to deobfuscate DLLs protected with Smart Assembly, particularly programs like de4dot.
So I tried to deobfuscate my program using de4dot, and I got most of my logic decompiled successfully to my surprise. But thankfully the strings were not decompiled.
They were in the form of Class24.getString_0(5050)
If the strings cannot be decompiled properly by any deobfuscator, then it is enough to protect my core logic. But I am paranoid that maybe I did not use the deobfuscator properly and there are ways to deobfuscate strings even(but I tried running the deobfuscator commands for strings, as stated in the repo wiki).
Basically my question is, can I be certain that strings obfuscated by the SmartAssembly cannot be decompiled by any deobfuscator program in the market.
Also, any good suggestions for obfuscating the .NET libraries are also welcomed.
Thank You All!
In order for your code to run, the computer must understand it. There is no way around that. If the CLR can understand your code, there is no reason that a de-obfuscator cannot understand your code either.
Plus, computers are much stupider than humans. If a computer can understand your code, then a human definitely can.
The typical approaches to protecting your code, are:
Don't give the customers your code. Run it on your own computer and give them access to it. (That's the "Google approach".)
Give the customers a computer that you control 100% with your code pre-installed. (That's the "PlayStation approach".)
Don't do business with criminals. Copying your code is illegal pretty much everywhere. Circumventing protections in your code is illegal in several countries, including some of the biggest markets (e.g. the US). Reverse engineering your code may be legal, but only under very strict circumstances. (E.g. in the EU, reverse engineering is only legal for purposes of interoperability, and only if you refuse to make the information required for interoperability available under reasonable and non-discriminatory terms.)
Offer your customers extra services that your competitors, even if they were stealing your code, don't or cannot offer. For a lot of companies, the mere fact of "having someone they can sue" is already reason enough to buy the original software from the original vendor. Criminals are lazy, that's why they are criminals. They will never understand the problem domain as deeply as you do, simply because they are too lazy to put in the work, so they will never be able to provide enhancements, consulting, support, or bug fixes as well, as fast, and as precise as you can.

Obfuscating C++ Shared Library

I've been asked to help obfuscate a library (written in C++) which will be distributed to clients. I've already discussed why obfuscation is not necessarily a good idea, and seeing as licensing will be integrated into the software many concerns regarding copy protection are moot.
Regardless, I've been asked to research methods anyway. I've looked into header mangling (and the like) as well as HARES, but I fail to find much that I can use for a library (naturally, these things would destroy any form of API rendering the library useless).
What techniques can I apply that would work for libraries? While I would appreciate recommendations for tools (or compiler flags, etc.) that might be helpful I would like to stress that this is not a tool-focused (i.e. closable) question, but rather one focused on applicable techniques.
I wouldn't put much energy to doing it very thoroughly, because the reverse engineer is going to win this round.
https://softwareengineering.stackexchange.com/questions/155131/is-it-important-to-obfuscate-c-application-code
Obfuscating C++ binaries is a bit of a losing battle. It depends on who you are dealing with, but if your reverse engineer is smart enough to use IDA Pro and a couple of plugins, and a good debugger then it shall all be for naught.
Obfuscation Priorities
Where you can give the reverse engineer useless function names.
Honestly, this doesn't help that much, since ultimately your code will have to call some kind of non obfuscated shared library to get anything done. At some point you will use the standard libary, or the STL, or even make a system call.
Add false pathways to confound static analysis
So that the reverse engineer can have fun with a debugger. Anti-analysis techniques are well known to the reverse engineer, and they can almost be circumvented with a debugger like ollydbg.
Write debugger foiling code
That reverse engineers love to play with. Again, this is an expected move, and the response is to just to step around the offending code, or to modify away the traps. Anyone with any formal training in RE will blast past this.
Pack most of my binary into an encryped region which is decryped by a stub just before execution.
Same answer as above. Reverse engineers train for this from day one.
Keep in mind the reverse engineers are looking for targeted morsals of information - very rarely are they trying to recreate the entire application. Security intensive code, code for license validation, code for home base communication, networking code. These are all prime targets - put your energy into making these thorny places to live.
Keep in mind that binaries from the largest corporations on the earth are routinely reverse engineered by people in their early 20's.
Don't leave your debugging symbols in the final binary, as those will definitely help with analysis.
If you are dedicated to doing this right, also focus on wasting the engineers time - time is always against the reverse engineer.
Remember, that any meaningful obfuscation might also cost you the performance gains that justified working in C++ in the first place. There are many zones in the C world (and for that matter the Java world) where meaningful obfuscation just isn't possible. Games for instance, cannot conceal their calls to the OpenGl APIs, nor can they truly prevent engineers from harvesting their shader code.
Also remember that the reverse engineer is watching your code at the assembly level most of the time. He'd rather have your function names, but he can live without it if need be. He can see what your program is doing at the most finite level possible. It is only a matter of time before he finds the critical routines.
For your purposes, find a program to mangle function names, make your boss happy, and call it a day. At least at that point, reverse engineering the software will not be trivial.
Well really you have 2 primary vectors that you have to guard against
Disassembley
Debugging
My favourite method for preventing the first issue is in memory decryption, take parts of your executable code and encrypt it, have it self decrypt in memory while your library is running, you can also checksum parts of the code and compare the checksum against what is loaded in ram ( have the encrypted portions check the decrypter and vice versa )
Another neat trick is to statically link libraries that you use into your executible so they cannot be easily swapped out to try to see what your code is doing.
Now debugging checking interrupt vectors helps, another trick is to check the 'timing' between various portions of code ( for example if more than a couple of milliseconds worth of delay occurs in code that should execute significantly faster than that then it can be assumed that the code is being debugged

Erlang source code guide

I am interested in delving into Erlang's C source code and try to understand what is going on under the hood. Where can I find info on the design and structure of the code?
First of all, you might want to have a look to Joe Armstrong's thesis, introducing Erlang at a high level. It will be useful to get an idea of what was the idea behind the language. Then, you could focus on the Erlang Run Time System (erts). The erlang.erl module could be a good start. Then, I would focus on the applications who constitutes the so-called minimal release, kernel and stdlib. Within the stdlib, have a look on how behaviours are implemented. May I suggest the gen_server.erl module as a start?
A Guide To The Erlang Source
http://www.trapexit.org/A_Guide_To_The_Erlang_Source
The short answer is that there is no good guide. And the code is not very well documented.
I recommend finding someone in your neighbourhood that knows the code reasonably well, and buy them dinner in exchange for a little chat.
If you don't have the possibility to do that, then I recommend starting with the loader.
./erts/emulator/beam/beam_load.c
Some useful information can also be found by pretty printing the beam representation. I don't know whether there is any way to do so supplied by OTP, but the HiPE project has some cheats.
hipe:c(MODULE, [pp_beam]).
Should get you started.
(And I also recommend Joe's book.)
Pretty printer of beam can be done by 'erlc -S', which is equivalent with hipe:c(M, [pp_beam]) mentioned by Daniel.
I also use erts_debug:df(Module). to disassemble the loaded beam code, which are instructions actually been interpreted by the VM.
Sometimes I use a debugger. OTP delivers tools supporting gdb very well. See example usage at http://www.erlang.org/pipermail/erlang-questions/2008-September/037793.html
A little late to the party here. If you just download the source from GitHub the internal documentation is really good. You have to generate some of it using make.
Get the documentation built and most of the relevant source is under /erts (Erlang Run Time System)
Edit: BEAM Wisdoms is also a really good guide but it may or may not be what you're after.

What can you gain from looking at the binary opposed to source in c?

My friend said he thinks i may have made a mistake in my programme and wanted to see if i really did. He asked me to send him the binary opposed to the source. As i am new to this i am paranoid that he is doing someting to it? What can you do with the binary that would mean you wouldnt want the source?
thank
Black-box testing. Having the source may skew your view on how the program may be behaving.
Not much, at least not much by staring at it. But you can run it with a debugger attached, so you can set breakpoints, inspect memory areas, investigate crashes ...
However, the soure code remains the primary tool for debugging. The binary by itself is a bit useless for serious debugging (not for testing, you can greatly test software without having access to its source).
I guess if he wants to recompile your code on his machine he may want to be able to check that the binary he gets is the same as the one you get to eliminate compile options or library differences.
Now when I debug, I frequently want to see the assembly - maybe this is what he meant?
He can run it, test it, report any bugs he finds. Not much else, but that itself may be useful; people are notoriously bad at testing their own code, because they tend to believe that it is robust, and don't want to break it. An independent tester will see breaking it as a challenge. Their performance is based on the number of bugs they find; whereas your performance is based on how difficult you make that for them.
Perhaps he wanted to debug. Also, depending on how the compiler is invoked (for instance with -g for gcc) the binary might contain source code information still.

Why do you obfuscate your code?

Have you ever obfuscated your code before? Are there ever legitimate reasons to do so?
I have obfuscated my JavaScript. It made it smaller, thus reducing download times. In addition, since the code is handed to the client, my company didn't want them to be able to read it.
Yes, to make it harder to reverse engineer.
To ensure a job for life, of course (kidding).
This is pretty hilarious and educational: How to Write Unmaintanable Code.
It's called "Job Security". This is also the reason to use Perl -- no need to do obfuscation as separate task, hence higher productivity, without loss of job security.
Call it "security through obsfuscability" if you will.
I don't believe making reverse engineering harder is a valid reason.
A good reason to obfuscate your code is to reduce the compiled footprint. For instance, J2ME appliactions need to be as small as possible. If you run you app through an obfuscator (and optimiser) then you can reduce the jar from a couple of Mb to a few hundred Kb.
The other point, nestled above, is that most obfuscators are also optimisers which can improve your application's performance.
Isn't this also used as security through obscurity? When your source code is publically available (javascript etc) you might want to at least it somewhat harder to understand what is actually occuring on the client side.
Security is always full of compromises. but i think that security by obscurity is one of the least effective methods.
I believe all TV cable boxes will have the java code obfuscated. This does make things harder to hack, and since the cable boxes will be in your home, they are theoretically hackable.
I'm not sure how much it will matter since the cable card will still control signal encryption and gets its authorization straight from the video source rather than the java code guide or java apps, but they are pretty dedicated to the concept.
By the way, it is not easy to trace exceptions thrown from an obfuscated stack! I actually memorized at one point that aH meant "Null Pointer Exception" for a particular build.
I remember creating a Windows Service for Online Backup application that was built in .NET. I could easily use either Visual Studio or tools like .NET Reflector to see the classes and the source code inside it.
I created a new Visual Studio Test application and added the Windows Service reference to it. Double clicked on the reference and I can see all the classes, namespaces everything (not the source code though). Anybody can figure out the internal working of your modules by looking at the class names. In my case, one such class was FTPHandler that clearly tells where the backups are going.
.NET Reflector goes beyond that by showing the actual code. It even has an option to Export the whole project so you get a VS project with all the classes and source code similar to what the developer had.
I think it makes sense to obfuscate, to make it atleast harder if not impossible for someone to disassemble. Also I think it makes sense for products involving large customer base where you do not want your competitors to know much about your products.
Looking at some of the code I wrote for my disk driver project makes me question what it means to be obfuscated.
((int8_t (*)( int32_t, void * )) hdd->_ctrl)( DISK_CMD_REQUEST, (void *) dr );
Or is that just system programming in C? Or should that line be written differently? Questions...
Yes and no, I haven't delivered apps with a tool that was easy decompilable.
I did run something like obfuscators for old Basic and UCSD Pascal interpreters, but that was for a different reason, optimizing run time.
If I am delivering Java Swing apps to clients, I always obfuscate the class files before distribution.
You can never be too careful - I once pointed a decent Java decompiler (I used the JD Java Decompiler - http://www.djjavadecompiler.com/ ) at my class files and was rewarded with an almost perfect reproduction of the original code. That was rather unnerving, so I started obfuscating my production code ever since. I use Klassmaster myself (http://www.zelix.com/klassmaster/)
I obfuscated code of my Android applications mostly. I used ProGuard tool to obfuscate the code.
When I worked on the C# project, our team used the ArmDot. It's licensing and obfuscation system.
Modern obfuscators are used not only to make hacking process difficult. They are able to protect programs and games from cheating, check licenses/keys and even optimize code.
But I don't think it is necessary to use obfuscator in every project.
It's most commonly done when you need to provide something in source (usually due to the environment it's being built in, such as systems without shared libraries, especially if you as the seller don't have the exact system being build for), but you don't want the person you're giving it to to be able to modify or extend it significantly (or at all).
This used to be far more common than today. It also led to the (defunct?) Obfuscated C Contest.
A legal (though arguably not "legitimate") use might be to release "source" for an app you're linking with GPL code in obfuscated fashion. It's source, it can be modified, it's just very hard. That would be a more extreme version of releasing it without comments, or releasing with all whitespace trimmed, or (and this would be pushing the legal grounds probably) releasing assembler source generated from C (and perhaps hand-tweaked so you can say it's not just intermediate code).

Resources