Obfuscation and SHAs? - obfuscation

I'm trying to determine whether it's possible to utilize SHAs for obfuscating and still maintain a decent/full amount of functionality. I'm asking because I can't find anything pertaining to SHAs being used for obfuscating real run-time codes. Every answer I find refers to SHAs for just obfuscating variables into hash for comparison, but not for obfuscating run-time functions and the functionality of the codes.
Thank you everyone and thank you for this lovely site.

Obfuscation is a two way street. You want to be able to both obfuscate something and after deobfuscate and use it.
SHA is a hash function. It's one way street. If you took a hash of something, you can't go back to original value.
As result. You can't use SHA for obfuscation.
BTW. "obfuscating variables into hash" is wrong usage of terminology.

Related

C logging framework compile time optimization

For a certain time now, I'm looking to build a logging framework in C (not C++!), but for small microcontrollers or devices with a small footprint of some sort. For this, I've had the idea of hashing the strings that are being logged to a certain value and just saving the hashed value with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file that would be generated from an external process that parses the strings out of the C source files and saves the logged strings along with the hash value.
After doing a little bit of research, this idea is not new, but I do not find an implementation of this idea in C. In other languages, this idea has been worked out, but that is not the goal of my exercise. An example may be this talk where the same concept has been worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
Some of the requirements that I've set myself for this library are the following:
as portable C code as possible
COMPILE TIME optimization/hashing for the string hash conversion, it should be equivalent to just printf("%d\n", hashed_value) for a single log statement. (Assuming no parameters/arguments for this particular logging statement).
arguments can be passed to the logging statement similar to the printf function.
user can define their own output function (being console, file descriptor, sending the data directly over an UART connection,...)
fast to run!! fast to compile is nice to have, but it should not be terribly slow.
very easy to use, no very complicated API to use the library.
But to achieve this in C, what is a good approach? I've tried several things now, but do not seem to have found a good method of achieving this.
An overview of things I've tried so far, along with the drawbacks are:
Full pre-processor string hashing: did get it working, but the compile time is terribly slow. Also, this code does not feel to be very portable over multiple C compilers.
Semi pre-processor string hashing: The idea was to generate a hash for each string and make an external header file with the defines in of each string with their hash value. The problem here is that I cannot figure out a way of converting the string to the correct define preprocessor value.
Letting go of the default logging macro with a string pointer: Instead of working with the most used method of LOG_DEBUG("Some logging statement"), converting it with an external parser to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the problem of hashing the string since the hash will be replaced by the external parser with the correct hash, but is not the cleanest to read since the original statement will be a comment.
Also expanding this idea to take care of arguments proved to be tricky. How to take care of multiple types of variables as efficiently as possible?
I've tried some other methods but all without success. Especially when I want to add arguments to log the value of a variable, for example, it gets very complicated, and I do not get the required result...

Readable text in dissassembled code

is there any widely used procedure for hiding readable strings? After debugging my code i found a lot of plain text. I can use some simple encryption (Caesar cipher etc...) but this solution will totally slow down my code. Any ideas? Thanks for help
No, there is no widely used method for hiding referenced strings.
At some point an accessed string would have to be decrypted and this would reveal the key/method and your decryption becomes just obfuscation. If somebody wants to read all your referenced strings he could easily write some script to just convert them all to be readable.
I can't think of any reason to obfuscate strings like that. They are only visible to someone that analyses your executable. Those people would at the same time also be capable to reverse engineer your deobfuscation an apply it to all strings.
If secrecy of strings is vital to the security of your application, you have to rethink that.
Sidenote: There is no way that deciphering strings in C will slow down your application ...Except your application is full of strings and you do something very inefficient in the deciphering. Have you tested this?

c99 dynamic array

I'm writing a very small, project-specific OpenGLES engine for iphone and I really need to use a good, solid, and proven dynamic array library/macro in c99 dialect. (No C++, Obj-C, stl whatsoever)
It's strongly necessary for render batch and polygon mesh, so it should be able to handle various types of data, and additionally causes minimal overhead when array size changes and new data is inserted.
I've been searching around and found two candidates for my need.
the first one is from ccCArray from Cocos2d.
and another one is utarray written by Troy D. Hanson.
ccCArray IS rock solid, thoroughly proven by community. utarray looks fine but I cannot find anyone actually uses it.
Any more suggestion?
A library ?! A C++ template would be more than suitable for this need. I'd say about AT MOST 15 functions (excluding alternative constructors and const getters), and you're done. Also able to use it for ANY type, ANY size and ANY size type (byte, int etc.) And it's just one file: a .h or, better said, a .hpp
Any reason you're rejecting it ? Seems like you want to make life harder for yourself :)

Parsing: library functions, FSM, explode() or lex/yacc?

When I have to parse text (e.g. config files or other rather simple/descriptive languages), there are several solutions that come to my mind:
using library functions, e.g. strtok(), sscanf()
a finite state machine which processes one char at a time, tokenizing and parsing
using the explode() function I once wrote out of pure boredom
using lex/yacc (read: flex/bison) to generate an appropriate parser
I don't like the "library functions" approach. It feels clumsy and awkward. explode(), while it doesn't take much new code, feels even more blown up. And flex/bison often seems like sheer overkill.
I usually implement a FSM, but at the same time I already feel sorry for the poor guy that may have to maintain my code at a later point.
Hence my question:
What is the best way to parse relatively simple text files?
Does it matter at all?
Is there a commonly agreed-upon approach?
I'm going to break the rules a bit and answer your questions out of order.
Is there a commonly agreed-upon approach?
Absolutely not. IMHO the solution you choose should depend on (to name a few) your text, your timeframe, your experience, even your personality. If the text is simple enough to make flex and bison overkill, maybe C is itself overkill. Is it more important to be fast, or robust? Does it need to be maintained, or can it start quick and dirty? Are you a passionate C user, or can you be enticed away with the right language features? &c., &c.
Does it matter at all?
Again, this is something only you can answer. If you're working closely with a team of people, with particular skills and abilities, and the parser is important and needs to be maintained, it sure does matter! If you're writing something "out of pure boredom," I would suggest that it doesn't matter at all, no. :-)
What is the best way to parse relatively simple text files?
Well, I don't know that you're going to like my answer. Maybe first read some of the other fine answers here.
No, really, go ahead. I'll wait.
Ah, you're back and relaxed. Let's ease into things, shall we?
Never write it in 'C' if you can do it in 'awk';
Never do it in 'awk' if 'sed' can handle it;
Never use 'sed' when 'tr' can do the job;
Never invoke 'tr' when 'cat' is sufficient;
Avoid using 'cat' whenever possible.
-- Taylor's Laws of Programming
If you're writing it in C, but C feels like the wrong tool...it really might be the wrong tool. awk or perl will likely do what you're trying to do without all the aggravation. You may even be able to do it with cut or something similar.
On the other hand, if you're writing it in C, you probably have a good reason to write it in C. Maybe your parser is a tiny part of a much larger system, which, for the sake of argument, is embedded, in a refrigerator, on the moon. Or maybe you loooove C. You may even hate awk and perl, heaven forfend.
If you don't hate awk and perl, you may want to embed them into your C program. This is doable, in principle--I've never done it myself. For awk, try libmawk. For perl, there are proably a few ways (TMTOWTDI). You can run perl separately using popen to start it, or you can actually embed a Perl interpreter into your C program--see man perlembed.
Anyhow, as I've said, "the best way to parse" entirely depends on you and your team, the problem space, and your approach to the issue. What I can offer is my opinion.
I'm going to assume that in your C-only solutions (library functions and FSM (considering your explode to essentially be a library function)) you've already done your best at isolating the relevant code, designing the code and files well, and so forth.
Even so, I'm going to recommend lex and yacc.
Library functions feel "clumsy and awkward." A state machine seems unmaintainable. But you say that lex and yacc feel like overkill.
I think you should approach your complaints differently. What you're really doing is specifying a FSM. However, you're also hiring someone to write and maintain it for you, thereby solving most of the maintainability problem. Overkill? Did I mention they'll work for free?
I suspect, but do not know, that the reason lex and yacc originally felt like overkill was that your config / simple files just felt too, well, simple. If I'm right (a big if), you may be able to do most of your work in the lexer. (It's even conceivable that you can do all of your work in the lexer, but I know nothing about your input.) If your input is not only simple but widespread, you may be able to find a lexer/parser combination freely available for what you need.
In short: if you can do this not in C, try something else. If you want C, use lex and yacc--they have a little overhead, but they're a very good solution.
If you can get it to work, I'd go with an FSM, but with a huge assist from Perl-compatible regular expressions. This library is easy to understand, and you ought to be able to trim back sufficient extraneous spaghetti to give your monster that aerodynamic flair to which all flying monsters aspire. That, and plenty of comments in well-structured spaghetti, ought to make your code-maintaining successor comfortable. (And, as I'm sure you know, that code-maintaining successor is you after six months, when you've moved on to something else and the details of this code have slipped your mind.)
My short answer is to use the right too for the problem. If you have configuration files use existing standards and formats e.g. ini Files and parse them using Boost program_options.
If you enter the world of "own" languages use lex/yacc, since they provide you with the required features, but you have to consider the cost of maintaining the grammar and language implementation.
As a result I would recommend to further narrow you problem scope to find the right tool.

Trying to understand the MD5 algorithm

I am trying to do something in C with the MD5 (and latter trying to do something with the SHA1 algorithm). My main problem is that I never really did anything complex in C, just simple stuff (nothing like pointers to pointers or structs).
I got the md5 algorithm here.
I included the files md5.c and md5.h in my C project (using codeblocks) but the only problem is that I don't really understand how to use it. I have read and re-read the code and I don't understand how I use those functions to turn 'example' into a MD5 hash.
I haven't done C programming in a while (mostly php) so I am a bit lost here.
Basically what I am asking is for some examples of usage. They are provided via the md5main.c file but I don't understand them.
Am I aiming high here? Should I stop all this and start reading the C book again or can anyone give me some pointers and see if I can figure this out.
Thanks.
While I agree with Bill, you should go back to the C book if you want to really understand what you're doing. But, in an effort to help, I've modified and commented some of the code from md5main.c...
const char* testData = "12345"; // this is the data you want to hash
md5_state_t state; // this is a state object used by the MD5 lib to do "stuff"
// just treat it as a black box
md5_byte_t digest[16]; // this is where the MD5 hash will go
// initialize the state structure
md5_init(&state);
// add data to the hasher
md5_append(&state, (const md5_byte_t *)testData, strlen(testData));
// now compute the hash
md5_finish(&state, digest);
// digest will now contain a MD5 hash of the testData input
Hope this helps!
You should stop all this and start reading the C book again.
My experience is that when I am trying to learn a new programming language, it's not practical to try implementing a complex project at the same time. You should do simple exercises in C until you are comfortable with the language, and then tackle something like implementing MD5 or integrating an existing implementation.
By the way, reading code is a skill different from writing code. There are differences between these two skills, but both require that you understand the language well.
I think you picked about the worst thing to look at (by no fault of your own). Encryption and hash type algorithms are going to make the strangest use of the language possible to do the type of math they need to do quickly. They are almost guaranteed to be obfuscated and difficult to understand. Plus, you will need to get bogged down in math in order to really understand them.
If you just want a hashing algorithm, get a well-known implementation and use it as a black box. Don't try and implement it yourself, you will almost certainly introduce some cryptographic weakness into the implementation.
Edit: To be fully responsive if you want great books (or resources) on encryption, look to Bruce Schneier. Applied Cryptography is a classic.

Resources