Why do you obfuscate your code? - obfuscation

Have you ever obfuscated your code before? Are there ever legitimate reasons to do so?

I have obfuscated my JavaScript. It made it smaller, thus reducing download times. In addition, since the code is handed to the client, my company didn't want them to be able to read it.

Yes, to make it harder to reverse engineer.

To ensure a job for life, of course (kidding).
This is pretty hilarious and educational: How to Write Unmaintainable Code.

It's called "Job Security". This is also the reason to use Perl: there is no need to do obfuscation as a separate task, hence higher productivity without loss of job security.
Call it "security through obfuscability" if you will.

I don't believe making reverse engineering harder is a valid reason.
A good reason to obfuscate your code is to reduce the compiled footprint. For instance, J2ME applications need to be as small as possible. If you run your app through an obfuscator (and optimiser), you can reduce the JAR from a couple of MB to a few hundred KB.
The other point, nestled above, is that most obfuscators are also optimisers, which can improve your application's performance.

Isn't this also used as security through obscurity? When your source code is publicly available (JavaScript etc.) you might want to at least make it somewhat harder to understand what is actually occurring on the client side.
Security is always full of compromises, but I think that security by obscurity is one of the least effective methods.

I believe all TV cable boxes will have their Java code obfuscated. This does make things harder to hack, and since the cable boxes will be in your home, they are theoretically hackable.
I'm not sure how much it will matter, since the cable card will still control signal encryption and gets its authorization straight from the video source rather than from the Java guide or Java apps, but they are pretty dedicated to the concept.
By the way, it is not easy to trace exceptions thrown from an obfuscated stack! I actually memorized at one point that aH meant "Null Pointer Exception" for a particular build.

I remember creating a Windows Service for an online backup application that was built in .NET. I could easily use either Visual Studio or tools like .NET Reflector to see the classes and the source code inside it.
I created a new Visual Studio test application and added the Windows Service as a reference. After double-clicking the reference I could see all the classes and namespaces (though not the source code). Anybody can figure out the internal workings of your modules by looking at the class names. In my case, one such class was FTPHandler, which clearly tells you where the backups are going.
.NET Reflector goes beyond that by showing the actual code. It even has an option to export the whole project, so you get a VS project with all the classes and source code, similar to what the developer had.
I think it makes sense to obfuscate, to make it at least harder, if not impossible, for someone to disassemble. I also think it makes sense for products with a large customer base, where you do not want your competitors to know much about your products.

Looking at some of the code I wrote for my disk driver project makes me question what it means to be obfuscated.
((int8_t (*)( int32_t, void * )) hdd->_ctrl)( DISK_CMD_REQUEST, (void *) dr );
Or is that just system programming in C? Or should that line be written differently? Questions...
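One way that line could be written differently is to name the function-pointer type once and keep the cast in a single helper. This is only a sketch: the layout of `struct disk`, the value of `DISK_CMD_REQUEST`, and the names `disk_ctrl_fn`, `disk_ctrl` and `sample_handler` are assumptions made for illustration, since the post shows nothing beyond the one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command code; the real value is not shown in the post. */
enum { DISK_CMD_REQUEST = 1 };

/* Name the handler signature once instead of spelling it out at every call. */
typedef int8_t (*disk_ctrl_fn)(int32_t cmd, void *arg);

/* Assumed layout: the handler is stored as a generic function pointer. */
struct disk {
    void (*_ctrl)(void);
};

/* The only place that still needs the cast. */
int8_t disk_ctrl(const struct disk *hdd, int32_t cmd, void *arg)
{
    assert(hdd != NULL && hdd->_ctrl != NULL);
    return ((disk_ctrl_fn)hdd->_ctrl)(cmd, arg);
}

/* Example handler: counts accepted requests in the int that arg points to. */
int8_t sample_handler(int32_t cmd, void *arg)
{
    if (cmd != DISK_CMD_REQUEST || arg == NULL)
        return -1;
    *(int *)arg += 1;
    return 0;
}
```

The call site then shrinks to `disk_ctrl(hdd, DISK_CMD_REQUEST, dr);`. If the field can be declared as `disk_ctrl_fn` in the first place, the cast disappears entirely.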

Yes and no. I haven't delivered apps built with a tool whose output was easily decompilable.
I did run something like obfuscators for old Basic and UCSD Pascal interpreters, but that was for a different reason: optimizing run time.

If I am delivering Java Swing apps to clients, I always obfuscate the class files before distribution.
You can never be too careful. I once pointed a decent Java decompiler (I used the JD Java Decompiler - http://www.djjavadecompiler.com/ ) at my class files and was rewarded with an almost perfect reproduction of the original code. That was rather unnerving, and I have obfuscated my production code ever since. I use Klassmaster myself (http://www.zelix.com/klassmaster/).

I have mostly obfuscated the code of my Android applications, using the ProGuard tool.
When I worked on a C# project, our team used ArmDot, a licensing and obfuscation system.
Modern obfuscators are used not only to make hacking harder. They can also protect programs and games from cheating, check licenses and keys, and even optimize code.
But I don't think it is necessary to use an obfuscator in every project.

It's most commonly done when you need to provide something in source form (usually due to the environment it's being built in, such as systems without shared libraries, especially if you as the seller don't have the exact system being built for), but you don't want the person you're giving it to to be able to modify or extend it significantly (or at all).
This used to be far more common than today. It also led to the (defunct?) Obfuscated C Contest.
A legal (though arguably not "legitimate") use might be to release "source" for an app you're linking with GPL code in obfuscated fashion. It's source, it can be modified, it's just very hard. That would be a more extreme version of releasing it without comments, or releasing with all whitespace trimmed, or (and this would be pushing the legal grounds probably) releasing assembler source generated from C (and perhaps hand-tweaked so you can say it's not just intermediate code).

Related

Can Smart Assembly 7+ strings be deobfuscated?

I am planning on using Smart Assembly 7+ for obfuscating my .NET C# library.
But while looking through some forums I found that there are programs that can deobfuscate DLLs protected with Smart Assembly, notably de4dot.
So I tried to deobfuscate my program using de4dot and, to my surprise, most of my logic was decompiled successfully. Thankfully, the strings were not.
They were in the form of Class24.getString_0(5050)
If the strings cannot be decompiled properly by any deobfuscator, then that is enough to protect my core logic. But I am worried that I did not use the deobfuscator properly and that there are ways to deobfuscate the strings as well (although I did try running the deobfuscator commands for strings, as stated in the repo wiki).
Basically, my question is: can I be certain that strings obfuscated by SmartAssembly cannot be decompiled by any deobfuscator on the market?
Also, any good suggestions for obfuscating .NET libraries are welcome.
Thank You All!
In order for your code to run, the computer must understand it. There is no way around that. If the CLR can understand your code, there is no reason that a de-obfuscator cannot understand your code either.
Plus, computers are much stupider than humans. If a computer can understand your code, then a human definitely can.
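To make that concrete, here is a toy sketch in C of why an indexed string lookup like the one in the question is not a secret. Everything in it is an assumption for illustration: the single-byte XOR with `KEY`, the table contents, and the names `get_string` and `dump_all_strings` are made up, and a real obfuscator's scheme will be more elaborate. The point it shows still holds: the program has to ship both the scrambled data and the decoder, so a tool only needs to call that decoder for every index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an obfuscator's string table: every byte is stored
   XORed with KEY, which for ASCII letters simply flips the case. */
#define KEY 0x20
static const char *const encoded_table[] = { "FTPhANDLER", "SECRET" };
#define TABLE_SIZE (sizeof encoded_table / sizeof encoded_table[0])

/* The decoder the program itself must contain in order to run
   (the role played by Class24.getString_0 in the question). */
size_t get_string(size_t id, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    size_t i = 0;
    if (id < TABLE_SIZE) {
        size_t len = strlen(encoded_table[id]);
        for (; i < len && i + 1 < out_size; i++)
            out[i] = (char)(encoded_table[id][i] ^ KEY);
    }
    out[i] = '\0';
    return i;
}

/* What a deobfuscator can do: run the shipped decoder over every index. */
size_t dump_all_strings(char out[][32], size_t max_entries)
{
    size_t n = 0;
    for (; n < TABLE_SIZE && n < max_entries; n++)
        get_string(n, out[n], 32);
    return n;
}
```

Calling `dump_all_strings` recovers every plaintext without anyone having to understand how the scrambling works.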
The typical approaches to protecting your code are:
Don't give the customers your code. Run it on your own computer and give them access to it. (That's the "Google approach".)
Give the customers a computer that you control 100% with your code pre-installed. (That's the "PlayStation approach".)
Don't do business with criminals. Copying your code is illegal pretty much everywhere. Circumventing protections in your code is illegal in several countries, including some of the biggest markets (e.g. the US). Reverse engineering your code may be legal, but only under very strict circumstances. (E.g. in the EU, reverse engineering is only legal for purposes of interoperability, and only if you refuse to make the information required for interoperability available under reasonable and non-discriminatory terms.)
Offer your customers extra services that your competitors, even if they were stealing your code, don't or cannot offer. For a lot of companies, the mere fact of "having someone they can sue" is already reason enough to buy the original software from the original vendor. Criminals are lazy; that's why they are criminals. They will never understand the problem domain as deeply as you do, simply because they are too lazy to put in the work, so they will never be able to provide enhancements, consulting, support, or bug fixes as well, as fast, and as precisely as you can.

Static / Dynamic source code analysis

I took a class named "Secure Code", and in our next assignment we are supposed to do static / dynamic analysis of some C files and of a JavaEE Web Project.
I checked out "Source Monitor" and ran it on the C files, but (unless I didn't get how to use it!) it doesn't seem to do what I'm looking for.
Considering the topic, I'd be interested in knowing if there are tools for detecting "insecure" code, i.e. code that is potentially attackable through buffer overflows, SQL-Injections, XSS ... So I'd like it to point out which functions should be "upgraded" (e.g. fgets instead of gets, or a PreparedStatement instead of a normal SQL statement)
Note: I'd prefer open source software, possibly for Windows (I have Ubuntu on a VM but I am not really good with it... I generally spend more time finding out how to configure the tools than running them).
Thank you for your tips!
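For reference, the kind of "upgrade" mentioned in the question (fgets instead of gets) might look like the sketch below. `gets` has no way to know how large the buffer is, which is why it was removed from the language in C11; `fgets` takes the size and stops in time. The helper name `read_line` is an assumption made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf without ever writing more than size bytes.
   Returns 1 on success, 0 on end of input or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 1;
}
```

A line longer than the buffer is truncated rather than overflowing; the rest of it stays in the stream for the next call.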
Frama-C's value analysis is open-source, available pre-compiled for Windows, and was used to find such security bugs as this one in the QuickLZ C library or this one in Polar SSL.
This said, you may find that it is a lot to get used to for just a school assignment, and then again, are you actually expected to find security bugs in a school assignment?
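For the JavaEE web project, use the Java Persistence API. It lets you write queries without raw SQL statements, which in theory rules out SQL injection, provided you bind parameters instead of concatenating user input into the query string. The best open-source implementation is Hibernate. It's easy to use and very flexible.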

Is it a good idea to recreate Win32's headers?

I'm finding myself doing more C/C++ code against Win32 lately, and coming from a C# background I've developed an obsession with "clean code" that is completely consistent, so moving away from the beautiful System.* namespace back to the mishmash of #defines that make up the Win32 API header files is a bit of a culture shock.
After reading through MSDN's alphabetical list of core Win32 functions I realised how simple Win32's API design actually is, and it's unfortunate that it's shrouded with all the cruft from the past 25 years, including many references to 16-bit programming that are completely irrelevant in today's 64-bit world.
I'm due to start a new C/C++ project soon, and I was thinking about how I could recreate Win32's headers on an as-needed basis. I could design it to be beautiful, and yet it would maintain 100% binary (and source) compatibility with existing programs (because the #defines ultimately resolve to the same thing).
I was wondering if anyone had attempted this in the past (Google turned up nothing), or if anyone wanted to dissuade me from it.
Another thing I thought of: with a cleaner C Win32 API, it becomes possible to design a cleaner and easier-to-use C++ Win32 API wrapper on top, as there wouldn't be any namespace pollution from the old C Win32 items.
EDIT:
Just to clarify, I'm not doing this to improve compilation performance or for any kind of optimisation, I'm fully aware the compiler does away with everything that isn't used. My quest here is to have a Win32 header library that's a pleasure to work with (because I won't need to depress Caps-lock every time I use a function).
Don't do this.
It may be possible, but it will take a long time and will probably lead to subtle bugs.
However, and more importantly, it will make your program utterly impossible for anyone other than you to maintain.
There's no point in doing this. Just because there's additional cruft doesn't mean it's compiled into the binary (anything unused will be optimized out). Furthermore, on the EXTREME off-chance that anything DOES change (I dunno, maybe WM_INPUT's number changes) it's just a lot easier to use the system headers. Furthermore, what's more intuitive? I think #include <windows.h> is a lot easier to understand than #include "a-windows-of-my-own.h".
Also, honestly you never should need to even look at the contents of windows.h. Yeah I've read it, yeah it's ugly as sin, but it does what I need it to and I don't need to maintain it.
Probably the ONLY downside of using the real windows.h is that it MAY slow down compilation by a few milliseconds.
No. What's the point? Just include <windows.h>, and define a few macros like WIN32_LEAN_AND_MEAN, VC_EXTRALEAN, NOGDI, NOMINMAX, etc. to prune out the things you don't want/need to speed up your compile times.
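A minimal sketch of that pruning, using two of the macros named above. The `_WIN32` guard and the helper `minmax_macros_absent` are assumptions added only so the snippet also compiles away from Windows and so the effect of NOMINMAX can be checked.

```c
/* These must come before the include; windows.h checks them while it is
   being read. Both are switches named in the answer above. */
#define WIN32_LEAN_AND_MEAN   /* leave out rarely used parts of the API */
#define NOMINMAX              /* do not define the min() and max() macros */

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 when no min/max macros leaked into this translation unit. */
int minmax_macros_absent(void)
{
#if defined(min) || defined(max)
    return 0;
#else
    return 1;
#endif
}
```

The defines can also be passed on the compiler command line, so that no source file forgets them.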
Although the Win32 headers might be considered "messy", you pretty much never have to (or want to) look inside them. All you need to know is documented in the Win32 SDK. The exact contents of the header files are an implementation detail.
There is a ton of stuff in there that would be time-consuming and unnecessarily finicky to replicate, particularly relating to different versions of the Win32 SDK.
I recommend:
#include <windows.h>
In my opinion, this is bad practice. Tidiness and brevity are achieved by keeping to standard practice as much as possible, and by leveraging as much as possible from the platform. You need to assume that Microsoft has the ultimate expertise in its own platform, with some aspects going beyond what you know right now. In simple words, it's their product and they know best.
By rolling your own:
... you branch off from Microsoft's API, so Microsoft could no longer deliver updates to you through their standard channels
... you may introduce bugs due to your own hubris, feeling you've figured something out while you haven't
... you'd be wasting a lot of time for no tangible benefit (as the C headers don't carry any overhead into the compiled binary)
... you'd eventually create a project that's less elegant
The most elegant code is one that carries more LOC of actual program logic and as little as possible LOC for "housekeeping" (i.e. code not directly related to the task at hand). Don't fail to leverage the Platform SDK headers to make your project more elegant.
This has been attempted in the past.
In its include directory, MinGW contains its own version of windows.h. Presumably this exists to make the headers work with gcc. I don't know if it will work with a Microsoft compiler.

Porting Autodesk Animator Pro to be cross platform

A previous, related question from me is here: Reverse Engineering old paint programs
I have set up my base of operations here: http://animatorpro.org
Wiki coming soon.
Okay, so now I have a 300,000 line legacy MSDOS codebase. It's sort of a "be careful what you wish for" situation. I am not an experienced C programmer. I'm not entirely inexperienced either, but for all intents and purposes I'm a noob to the language and in particular the intricacies of its libraries. I am especially ignorant of the vagaries of the differences between C programs written specifically for MSDOS and programs that are cross platform. However I have been studying this code base for over a year now, and this is what I know about Animator Pro:
Compilers and tools used:
Watcom C compiler
tcmake (make program from Turbo C)
386asm, a specialised assembler for the Phar Lap dos extender
and of course, the Phar Lap dos extender itself.
a selection of obscure dos utilities
Much of the compilation seems to be driven by batch files. Though I have obtained copies of all these tools, I have not yet succeeded in compiling it (though I have compiled its older brother, the original Autodesk Animator).
It's got a plugin system that replicates DLLs before DLLs were available, based on REX. The plugin system handles:
Video Drivers (with a plethora of included VESA drivers)
Input drivers (including Wacom tablets, and keyboards)
Drawing Tools
Inks (Like Photoshop's filters, or blending modes)
Scripting Addons (essentially compiled scripts)
File formats
It's got its own script interpreter named POCO, based on the C language. The scripting language has enough power to do virtually all the things the plugin system can do, just slower.
Given this information, this is my development plan. Please criticise this. The source code is available in the link above, so you can easily, if you are so inclined, assess the situation yourself.
1. Compile with its original tools.
2. Switch to using DJGPP, and make the necessary changes to get it to compile with that, plus the original assembler.
3. Include the Allegro.cc "Game" library, and switch over as much functionality to that library as possible, perhaps by simply writing new video and input drivers that use the Allegro API. I'm thinking Allegro rather than SDL because there is a DOS version of Allegro and, fascinatingly, one of its core functions is the ability to play Animator Pro's native format, FLIC.
4. Hopefully after step 3, I will have eliminated most or all of the assembler in the project. I say hopefully, because it's in an obscure dialect that doesn't assemble in any modern free assembler without significant modification. I have tried them all. Whatever is left gets converted to assemble in NASM, or to C code if I can work out what the assembler code actually does.
5. Switch the DOS extender from Phar Lap to HX DOS (http://www.japheth.de/HX.html), which promises to replicate as much of the Win32 API as possible. Then make all the necessary code changes for that to work.
6. Switch to the Win32 version of Allegro.cc, assuming that the Win32 version can run on top of HX DOS. Make any further necessary changes.
7. Modify the plugin system to use some kind of standard cross-platform plugin library. What this would be, I have no idea. Maybe you can offer some suggestions? I talked to the developer who originally wrote the plugin system, and he said some of the things it does aren't possible on modern OSes because of segmentation restrictions. I'm not sure what this means, but I'm guessing it means all the plugins will need to be rewritten almost from scratch.
8. Magically, I got all the above done, and we can try to make it run on Windows, OS X, and Linux, whilst dealing with other cross-platform niggles like long file names, and things I haven't thought of.
Anyone got a problem with any of this? Is Allegro a good choice? If not, why? What would you do about this plugin system? What would you do differently? Is this whole thing foolish, and should I just rewrite it from scratch, using the original as inspiration? (It would apparently take the original developer "about a month" to do that.)
One thing I haven't covered above is the text/font system. I'm not sure what to do about that. Animator Pro has its own custom font format, but it is also able to use PostScript Type 1 fonts and some other formats.
My biggest concern with your plan, in a nutshell: Your approach seems to be to attempt to keep the whole enormous thing working at all times, tweaking the environment ever-further away from DOS. During each tweak to the environment, that means you will have approximately a billion subtle assumptions that might have broken at once, none of which you necessarily understand yet. Untangling them all at once will be incredibly painful.
If I were doing the port, my approach would be to disable as much code as possible to get SOMETHING running in a modern environment, and bring the parts back online, one piece at a time. Write a simple test harness program that loads a display driver and draws some stuff, and compile it for DOS to make sure you understand the interface. Then write some C code that implements the same interface, but with Allegro (or SDL or SFML), and make that program work under Windows or Linux. When the output differs, you have a simple test case to work from.
Your entire job on this port is swapping out implementations of various interfaces and functions with completely new ones. This is a job that unit testing excels at. Don't write any new code without a test of some kind that runs on the old code under DOS! Make your potential problems as small and simple as you possibly can. Port assembly code instead of rewriting it only if you're reasonably confident that it will actually make your job easier (ie, algorithmic stuff that compiles fine with few tweaks under NASM). Don't bite off a bigger piece than you can comfortably fit in your brain at once.
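A rough sketch of the harness idea in C follows. The `display_driver` interface here is made up for illustration (the real Animator Pro driver interface is not shown in this thread), and both "drivers" are in-memory stand-ins. In a real port the legacy output would come from a run under DOS, saved to a file, with the new Allegro or SDL backed driver compared against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FB_W = 8, FB_H = 4 };

/* Hypothetical driver interface: a framebuffer plus one drawing entry point. */
typedef struct display_driver {
    uint8_t fb[FB_H][FB_W];
    void (*put_pixel)(struct display_driver *d, int x, int y, uint8_t color);
} display_driver;

/* Stand-in for the legacy driver: writes straight into the buffer. */
void legacy_put_pixel(display_driver *d, int x, int y, uint8_t color)
{
    d->fb[y][x] = color;
}

/* Stand-in for the replacement: same contract, plus bounds checking. */
void ported_put_pixel(display_driver *d, int x, int y, uint8_t color)
{
    if (x < 0 || x >= FB_W || y < 0 || y >= FB_H)
        return;
    d->fb[y][x] = color;
}

/* Code under test: it only ever talks to the interface. */
void draw_diagonal(display_driver *d, uint8_t color)
{
    assert(d != NULL && d->put_pixel != NULL);
    for (int i = 0; i < FB_H; i++)
        d->put_pixel(d, i, i, color);
}

/* Runs the same drawing code through both drivers; 1 if the frames match. */
int drivers_agree(void)
{
    display_driver legacy = { .put_pixel = legacy_put_pixel };
    display_driver ported = { .put_pixel = ported_put_pixel };
    draw_diagonal(&legacy, 7);
    draw_diagonal(&ported, 7);
    return memcmp(legacy.fb, ported.fb, sizeof legacy.fb) == 0;
}
```

When `drivers_agree` fails, the two small framebuffers are the whole test case, which is far easier to debug than the full application.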
I, for one, look forward to seeing your progress! I think what you're attempting to do is great. Thanks for doing it.
Hmmm - I might approach it by writing an OpenGL video "driver" for it. Today's machines are fast enough, with tons of RAM, that you could run all the pixel-specific algorithms on the main CPU into a back buffer and it would work. As the "generic" VGA driver just mapped the video buffer to a pointer, this would be a place to start. There was a zoom mode in the UI, so you can look at the pixels on a high-res display.
It is often very difficult to take an existing non-trivial code base that wasn't written with portability in mind and then try to make it portable. There will be a lot of problems on the way, some of which you mention. It is probably a better idea to start from scratch and rewrite the code, using the existing code as a reference only. If you start from scratch, you can leverage an existing portable UI solution such as Qt in your new project.

Best way to implement plugin framework - are DLLs the only way (C/C++ project)?

Introduction:
I am currently developing document classifier software in C/C++, and I will be using a naive Bayes model for classification. But I want users to be able to use any algorithm they want (or that I want in the future), so I separated the algorithm part of the architecture into a plugin that is attached to the main app at app start-up. That way any user can write their own algorithm as a plugin and use it with my app.
Problem Statement:
The way I am intending to develop this is to have each of the algorithms that user wants to use to be made into a DLL file and put into a specific directory. And at the start, my app will search for all the DLLs in that directory and load them.
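The contract such a DLL exposes could be sketched roughly as below. All names here (`classifier_plugin`, `PLUGIN_API_VERSION`, `register_plugin`, `find_plugin`, `keyword_plugin`) are hypothetical, and the step that actually loads the library (LoadLibrary/GetProcAddress on Windows, dlopen/dlsym elsewhere) is left out; the example plugin is simply linked in. Note that the version check only catches accidental mismatches. It does nothing against a DLL that is deliberately malicious.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PLUGIN_API_VERSION 1u
#define MAX_PLUGINS 8

/* Hypothetical descriptor every classifier plugin would export. */
typedef struct classifier_plugin {
    unsigned api_version;                   /* must equal PLUGIN_API_VERSION */
    const char *name;
    int (*classify)(const char *document);  /* returns a category id */
} classifier_plugin;

static const classifier_plugin *registry[MAX_PLUGINS];
static size_t registry_count;

/* Host-side check run on a descriptor before any plugin code is called.
   Returns 1 if the plugin was accepted, 0 if it was rejected. */
int register_plugin(const classifier_plugin *p)
{
    if (p == NULL || p->api_version != PLUGIN_API_VERSION ||
        p->name == NULL || p->classify == NULL ||
        registry_count == MAX_PLUGINS)
        return 0;
    registry[registry_count++] = p;
    return 1;
}

/* Looks up an accepted plugin by name; NULL if there is none. */
const classifier_plugin *find_plugin(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < registry_count; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

/* Example plugin: category 1 if the text mentions "invoice", else 0. */
int keyword_classify(const char *document)
{
    return strstr(document, "invoice") != NULL;
}

const classifier_plugin keyword_plugin = {
    PLUGIN_API_VERSION, "keyword", keyword_classify
};
```

Keeping the whole interface in one small struct means the host only has to look up a single exported symbol per plugin.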
My Questions:
(1) What if malicious code is built as a DLL (one that has the same functions mandated by the plugin framework) and put into my plugins directory? In that case, my app will think that it's a plugin, pick it up and call its functions, so the malicious code can easily bring my entire app down (in the worst case it could turn my app into a malicious code launcher!).
(2) Is using DLLs the only way available to implement the plugin design pattern? (Not only out of fear of malicious plugins; it's a generic question out of curiosity :) )
(3) I think a lot of software is written with a plugin model for extensibility. If so, how does it defend against such attacks?
(4) In general what do you think about my decision to use plugin model for extendability (do you think I should look at any other alternatives?)
Thank you
-MicroKernel :)
1. Do not worry about malicious plugins. If somebody managed to sneak a malicious DLL into that folder, they probably also have the power to execute stuff directly.
2. As an alternative to DLLs, you could hook up a scripting language like Python or Lua, and allow scripted plugins. But maybe in this case you need the speed of compiled code?
For embedding Python, see here. The process is not very difficult. You can link statically to the interpreter, so users won't need to install Python on their system. However, any non-builtin modules will need to be shipped with your application.
However, if the language does not matter much to you, embedding Lua is probably easier because it was specifically designed for that task. See this section of its manual.
3. See 1. They don't.
4. Using a plugin model sounds like a fine solution, provided that a lack of extensibility really is a problem at this point. It might be easier to hard-code your current model, and add the plugin interface later, if it turns out that there is actually a demand for it. It is easy to add, but hard to remove once people started using it.
Malicious code is not the only problem with DLLs. Even a well-meaning DLL might contain a bug that could crash your whole application or gradually leak memory.
Loading a module in a high-level language somewhat reduces the risk. If you want to learn about embedding Python for example, the documentation is here.
Another approach would be to launch the plugin in a separate process. It does require a bit more effort on your part to implement, but it's much safer. The separate-process approach is used by Google's Chrome web browser, and they have a document describing the architecture.
The basic idea is to provide a library for plugin writers that includes all the logic for communicating with the main app. That way, the plugin author has an API that they use, just as if they were writing a DLL. Wikipedia has a good list of ways for inter-process communication (IPC).
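As a rough illustration of the separate-process idea, here is a minimal sketch. It assumes a POSIX system and uses fork with a pipe as the IPC channel (on Windows the equivalent would be CreateProcess with pipes), and the names `run_isolated`, `good_plugin` and `crashing_plugin` are made up for the example. It is not how Chrome does it, just the smallest form of the technique: the plugin runs in a child process, so a crash there is reported to the host as an error instead of taking the host down.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*plugin_fn)(const char *document);

/* Runs plugin in a child process. On success stores its return value in
   *result and returns 0; returns -1 if the child crashed or misbehaved. */
int run_isolated(plugin_fn plugin, const char *document, int *result)
{
    assert(plugin != NULL && result != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: run the plugin, report, exit */
        close(fds[0]);
        int r = plugin(document);
        ssize_t w = write(fds[1], &r, sizeof r);
        _exit(w == (ssize_t)sizeof r ? 0 : 1);
    }

    close(fds[1]);                  /* parent: read the reply, then reap */
    int r = 0;
    ssize_t n = read(fds[0], &r, sizeof r);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t)sizeof r || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    *result = r;
    return 0;
}

/* Two stand-in plugins: one well behaved, one that crashes. */
int good_plugin(const char *document) { return (int)strlen(document); }
int crashing_plugin(const char *document) { (void)document; abort(); }
```

A real implementation would keep the child alive and exchange many messages rather than forking per call, and would add a timeout so a hung plugin cannot block the host.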
1) If there is a malicious dll in your plugin folder, you are probably already compromised.
2) No, you can load assembly code dynamically from a file, but this would just be reinventing the wheel, just use a DLL.
3) Firefox extensions don't, not even its JavaScript plugins. Everything else I know of uses native code from dynamic libraries, so safety is impossible to guarantee. Then again, Chrome has NaCl, which does extensive analysis of the binary code and rejects it if it can't be 100% sure that it doesn't violate bounds and whatnot, although I'm sure more and more vulnerabilities will turn up as time passes.
4) Plugins are fine, just restrict them to trusted people. Alternatively, you could use a safe language like Lua, Python, Java, etc., and load a file into that language, but restrict it to a subset of the API that won't harm your program or environment.
(1) Can you use OS security facilities to prevent unauthorized access to the folder where the DLL's are searched or loaded from? That should be your first approach.
Otherwise: run a threat analysis - what's the risk, what are known attack vectors, etc.
(2) Not necessarily. It is the most straightforward way if you want compiled plugins, which is mostly a question of performance, access to OS functions, etc. As mentioned already, consider scripting languages.
(3) Usually by writing "to prevent malicious code execution, restrict access to the plugin folder".
(4) There's quite some additional cost, even more so when using a plugin framework you are not yet familiar with. It increases the cost of:
the core application (plugin functionality)
the plugins (much higher isolation)
installation
debugging + diagnostics (bugs that occur only with a certain combination of plugins)
administration (users must know of, and manage plugins)
That pays only if
installing/updating the main software is much more complex than updating the plugins
individual components need to be updated individually (e.g. a user may combine different versions of plugins)
other people develop plugins for your main application
(There are other benefits of moving code into DLL's, but they don't pertain to plugins as such)
What if a malicious code is made as a DLL
Generally, if you do not trust a DLL, you should not load it, one way or another.
This holds for almost any other language too, even an interpreted one.
Java and some other languages work very hard to limit what the user's code can do, and that works only because they run in a virtual machine.
So no. DLL-based plugins can come from trusted sources only.
Is using DLLs the only way available to implement plugin design pattern?
You may also embed an interpreter in your code; for example, GIMP allows writing plugins in Python.
But be aware that this would be much slower because of the nature of any interpreted language.
We have a product very similar in that it uses modules to extend functionality.
We do two things:
We use BPL files, which are DLLs under the covers. This is a specific technology from Borland/Codegear/Embarcadero within C++ Builder. We take advantage of some RTTI-type features to publish a simple API similar to main(argv[]), so any number of parameters can be pushed onto the stack and popped off by the DLL.
We also embed Perl into our application for things that are more business logic in nature.
Our software is an accounting/ERP suite.
Have a look at existing plugin architectures and see if there is anything that you can reuse. http://git.dronelabs.com/ethos/about/ is one link I came across while googling glib + plugin. GLib itself might make it easier to develop a plugin architecture. GStreamer uses GLib and has a very nice plugin architecture that may give you some ideas.

Resources