How does one obfuscate code in C? [closed] - c

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I want to obfuscate code just for fun. I'm looking at code from the International Obfuscated C Code Contest (http://www.ioccc.org/), and I seriously have no idea how to even start reverse engineering some of this code to make any sense of it.
What are some common obfuscation techniques and how do you make sense of obfuscated code?

There are many different techniques for obfuscating code. Here is a small, very incomplete list:
1. Identifier mangling. Either you will find people using names like a, b, c exclusively, or you will find identifiers that have absolutely nothing to do with the actual purpose of the variable or function. Deobfuscation means assigning sensible names.
2. Heavy use of the conditional operator ? :, replacing all occurrences of if() else. In most cases that is a lot harder to read; deobfuscation means reinserting if().
3. Heavy use of the comma operator instead of ;. In combination with 2. and 4., this basically allows the entire program to be one single statement in main().
4. Recursive calls of main(). You can fold any function into main by having an argument that main can use to decide what to do. Combine this with replacing loops by recursion, and you end up with the entire program being the main function.
5. You can go in the exact opposite direction to 3. and 4. and hack everything into pieces by creating an insane number of functions that all do virtually nothing.
6. You can obfuscate the storage of an array by storing the values on the stack. Should you need to walk the data twice, the fork() call is always handy for making a convenient copy of your stack.
As I said, this is a very incomplete list, but in general, obfuscation is the heavy, systematic abuse of any valid programming technique. If the IOCCC allowed C++ entries, I would bet on a lot of template code being entered, with heavy use of thrown exceptions as an if replacement, structure hidden behind polymorphism, etc.
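As a rough illustration of how a few of these techniques combine, here is a minimal sketch that sums the integers 1..n twice: once readably, and once with meaningless identifiers, recursion in place of the loop, ?: in place of if, and the comma operator in place of separate statements. The names sum_plain and o are invented for this example and are not taken from any IOCCC entry; the sketch recurses on a helper rather than on main() to keep it short.

```c
/* Readable version: sum the integers 1..n with an ordinary loop. */
int sum_plain(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Obfuscated version of the same computation (hypothetical example):
 * - the names say nothing about their purpose (O is the counter,
 *   l is the running total),
 * - recursion replaces the loop,
 * - ?: replaces if/else,
 * - the comma operator replaces separate statements. */
int o(int O, int l)
{
    return O > 0 ? (l += O, o(O - 1, l)) : l;
}
```

Calling o(10, 0) gives the same result as sum_plain(10). Deobfuscating goes the other way: rename O and l, turn the ?: back into an if, and turn the tail recursion back into a loop.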

Related

Questions about C as an intermediate language [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm writing a language that compiles to C right now. When I say IL, I mean that C is the language I emit, which another C compiler (e.g. gcc or clang) then turns into assembly.
Regarding the C code I generate:
If I do some simple opt passes (constant propagation, dead code removal, ...) will this reduce the amount of work the C compiler has to do, or make it harder because it's not really human C code?
If I were to compile to say three-address code or SSA or some other form and then feed this into a C program with functions, labels, and variables - would that make it easier or harder for the C compiler to optimize?
These tie together into the following questions:
What is the best way to produce good C code from a language that compiles to C?
Is it worth doing any optimisations at all, or should I leave that to the compiler?
Generally there's not much point in doing peephole-type optimisations, because the C compiler will simply do those for you. What is expensive is a) wasted or unnecessary "gift-wrapping" operations, b) memory accesses, c) branch mispredictions.
For a), make sure you're not passing data around too much, because whilst the C compiler will do constant propagation, there's a limit to how far it can detect that two buffers are in fact aliases of the same underlying data. For b), try to keep functions short and operations on the same data together, and limit heap memory use to improve cache performance. For c), the compiler understands for loops; it doesn't understand goto loops. So it will figure that
for(i=0;i<N;i++)
will usually take the loop body; it won't figure that
if(++i < N) goto do_loop_again;
will usually take the jump.
So really the rule is to make your generated code as human-like as possible. Though if it's too human-like, that raises the question of what your language has to offer that C doesn't: the whole point of a non-C language is that a nicely structured input script may turn into a spaghetti of gotos in the C source.
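To make point c) concrete, here is a hedged sketch of the same summation as a hypothetical code generator might emit it in two ways: once as a structured for loop and once as labels and gotos. The function names are invented for this example. Whether a particular compiler really optimises the two forms differently depends on the compiler and its flags; the sketch only shows that both forms compute the same thing while the structured one states the loop directly.

```c
#include <stddef.h>

/* "Human-like" output: the loop structure is explicit. */
long sum_structured(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Goto-style output: same computation, but the loop has to be
 * rediscovered from the control flow. */
long sum_goto(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    if (n == 0)
        goto done;          /* guard: the body below runs at least once */
do_loop_again:
    total += v[i];
    if (++i < n)
        goto do_loop_again;
done:
    return total;
}
```

A generator that can choose between the two would, following the advice above, prefer the first form whenever the source construct is a plain counted loop.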

If there are many functions with the same parameters, should I use a macro to avoid typing the parameters multiple times? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I have some old C programs to maintain. For some functions (at least 10) with exactly the same parameters, the programmer utilized a macro to avoid typing the same parameters again and again. Here is the macro definition:
#define FUNC_DECL(foo) int foo(int p1, int p2, ....)
Then, if I want to define a function with the same parameters, I need only type:
FUNC_DECL(func1)
Besides avoiding the tedious work of typing the same parameters many times, are there any other advantages to this approach?
This kind of implementation confuses me a little. Does it have other disadvantages?
Is this kind of implementation a good one?
As I noted in comments to the main question, the advantage of using a macro to declare the functions with the same argument list is that it ensures the definitions do have the same argument list.
The primary disadvantage is that it doesn't look like regular C, so people reading the code have to search more code to work out what it means.
On the whole, I don't like that sort of macro-based scheme, but occasionally there are good enough reasons to use it — this might be a borderline example.
There are at least ten functions with the same parameters. Currently, every function only has 3 parameters.
Oh, only 3 parameters? No excuse for using the macro then; I thought it was 10 parameters. Clarity is more important, and I don't think the code will be clearer using the macro. The chances that you'll need to change 10 functions to use 4 parameters instead of 3 are rather limited, and you'd have to change the code to use the extra parameter anyway. The saving in typing is not relevant; the saving in time spent puzzling over the meaning of the macro is. And the first person who has to puzzle over the code will spend longer doing that than you'd save by not typing the function declarations out, even if you hunt and peck when typing.
Away with it — off with its head! Expunge the macro. Make your code happy again.
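For comparison, here is a small sketch of both styles side by side. The three int parameters and the function bodies are assumptions made purely for illustration (the parameter list of the original macro is elided), and handler_fn and handlers are hypothetical names. It also shows one way to keep the "same argument list" guarantee without the macro: a function-pointer typedef makes the compiler reject any function whose signature drifts.

```c
/* Macro style, with an assumed three-parameter signature. */
#define FUNC_DECL(name) int name(int p1, int p2, int p3)

FUNC_DECL(func1) { return p1 + p2 + p3; }
FUNC_DECL(func2) { return p1 * p2 * p3; }

/* Written out in full: longer to type, but it reads as ordinary C
 * and the parameter names are visible at the definition. */
int func3(int p1, int p2, int p3) { return p1 - p2 - p3; }

/* A function-pointer type states the shared signature once. Putting a
 * function with a different signature in this table is a compile-time
 * diagnostic, so consistency is still checked without the macro. */
typedef int (*handler_fn)(int, int, int);

const handler_fn handlers[] = { func1, func2, func3 };
```

After preprocessing, func1 and func2 are indistinguishable from hand-written definitions, which is why the choice is about readability rather than behaviour.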
#define is handled by the preprocessor, which is purely a text-substitution step. So whether you write the full function declaration or use the macro instead, the compiler sees the same code and the execution time is the same. Using #define makes a program shorter and arguably more readable, and it doesn't affect the end result at all; a larger number of #defines means slightly more compilation time and nothing else. But programs are generally run far more often than they are compiled, so using #define doesn't hamper your production environment at all.

Recursive coroutines in C (C99) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
While implementing a communication protocol, we have an encoder that traverses some structs recursively and encodes them into a binary message.
So far so good, but now the buffer has to be split into multiple chunks of fixed size, e.g. the maximum size of the receiving buffer. Since allocating memory for the full message and cutting it up afterwards seems too wasteful (the size of the message is, in theory, unbounded), the idea is now to implement a coroutine by means of setjmp/longjmp.
At the moment, I have a prototype with two jump buffers: one for resuming the encode function, and a second for simulating the return behavior of the function so that it can jump back to its caller.
Well, it seems to work, but the code looks like it came straight from hell. Are there any 'conventions' for implementing interruptible recursive functions, maybe a set of macros or something? I would like to use only standardized functions and no inline asm, in order to stay portable.
Addition:
The prototype is here: https://github.com/open62541/open62541/compare/master...chunking_longjmp
The 'usage' is shown inside of the unit-test.
Currently, coroutine behavior is implemented for a non-recursive function Array_encodeBinary. However, the 'coroutine' behavior should be extended to the general recursive UA_encodeBinary function located here: https://github.com/open62541/open62541/blob/master/src/ua_types_encoding_binary.c#L1029
As pointed out by Olaf, the easiest way would be to use an iterative algorithm. However, if for some reason this is difficult, you can always simulate the recursive algorithm with a stack container and a while loop. This at least makes the function easier to interrupt. A pretty good article on how to implement this can be found here. The article is written for C++, but it should not be difficult to convert it to C.
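The stack-and-loop idea can be sketched as follows. This is not the open62541 encoder; struct node, struct encoder, MAX_DEPTH, and both function names are hypothetical stand-ins. A small binary tree plays the role of the nested structs, and one byte per node plays the role of the encoded output. Because the traversal state lives in an explicit stack rather than on the call stack, the encoder can stop when a chunk is full and continue on the next call, with no setjmp/longjmp involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 32   /* assumed bound on the explicit stack */

/* Hypothetical stand-in for a recursively nested struct. */
struct node {
    unsigned char value;
    const struct node *child[2];
};

/* Traversal state that survives between calls. */
struct encoder {
    const struct node *stack[MAX_DEPTH];
    size_t depth;
};

void encoder_init(struct encoder *e, const struct node *root)
{
    e->depth = 0;
    if (root)
        e->stack[e->depth++] = root;
}

/* Write up to cap bytes of the pre-order encoding into out and return
 * the number written. With cap > 0, a return of 0 means the traversal
 * has finished. */
size_t encoder_next_chunk(struct encoder *e, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (n < cap && e->depth > 0) {
        const struct node *cur = e->stack[--e->depth];
        out[n++] = cur->value;
        assert(e->depth + 2 <= MAX_DEPTH);  /* room for both children */
        /* Push the right child first so the left one is visited first. */
        if (cur->child[1])
            e->stack[e->depth++] = cur->child[1];
        if (cur->child[0])
            e->stack[e->depth++] = cur->child[0];
    }
    return n;
}
```

The caller repeatedly passes a fixed-size buffer to encoder_next_chunk and sends each filled chunk until the function returns 0. A real encoder would also have to record how far it got inside a single value that straddles a chunk boundary; the sketch avoids that by emitting one byte per node.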

Is it a good practice to write multiple concise statements in one line? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am prone to writing code like this:
if (*t) while (*++t);
It reads: if string t does not start with \0, then move to the end.
Note the while loop has no body, so the semicolon terminates it.
I'd like to know whether this is good practice. Why or why not?
C is one of the oldest popular languages in use today. I believe there's a good chance of finding one or more established style guides.
I know that Google has one for their C++ open source projects - http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
Can anyone point me to resources on why one should or should not write code in a certain manner?
Usually it is good practice to write separate lines of code. For large pieces of code in particular, debugging is clearer if each statement is on its own line.
It depends! Who is going to have to read and maintain this code? Coding standards exist for two major reasons:
To make code more readable and maintainable. When there are multiple developers, it makes code more consistent (which is more readable).
To discourage common errors. For example, a standard might require putting literals first in conditionals to discourage the assignment-as-comparison bug.
How do these goals apply to your specific code? Are you prone to making mistakes? If this is Linux kernel code, it's a lot more tolerable to have code like this than if it's a web app maintained by entry level programmers.
It reads: if string t does not start with \0, then move to the end.
Then consider putting a comment on it that says that.
Surprisingly, it is usually more expensive to maintain code over time than to write it in the first place. Maintenance costs are minimized if code is more readable.
There are three audiences for your code. You should think about how valuable their time is while you are formatting:
Fellow coders, including your co-workers and code reviewers. You want these people to think highly of you, so you should write code that is easy for them to understand.
Your future self. Convoluted code may be obvious while you are writing it, but pick it up again in two weeks and you will not remember what it means. The 'concise' statement that you wrote in 10 minutes will someday take you 20 minutes to decipher.
The optimizing compiler, which will produce efficient code whether your line is concise or not. The compiler does not care, so try to save time for the other two. (Cue angry remarks about this item. I am in favor of writing efficient code, but concise styles like the one we are describing here will not affect compiler efficiency.)
Bad practice, because it is not easy to parse. I'd write
while (*t) ++t;
and let the compiler do the tiny bit of optimization.
The textual translation of it reads even shorter than yours:
advance t until it points to a 0
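As a quick check that the two spellings really do the same thing, here is a minimal sketch that wraps each one in a function returning the final pointer. The function names end_terse and end_plain are made up for this example.

```c
/* The one-liner from the question: skip the first character if the
 * string is non-empty, then pre-increment until the terminator. */
const char *end_terse(const char *t)
{
    if (*t) while (*++t);
    return t;
}

/* The plainer form: advance t until it points to a 0. */
const char *end_plain(const char *t)
{
    while (*t)
        ++t;
    return t;
}
```

For any NUL-terminated string, including the empty one, both functions return a pointer to the terminating '\0', so the extra if in the terse version buys nothing.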
Although you can write some pretty clever code in one line in C, it's usually not good practice in terms of readability and ease of maintenance. What's straightforward for you to understand may look completely foreign to someone maintaining your code in future.
You need to strike a balance between conciseness and readability. To this end, it's usually better to separate the code out so each line does one thing.

Are function pointers evil? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I have been told by more senior, experienced and better-educated programmers than myself that the use of function pointers in C should be avoided. I have seen the fact that some code contains function pointers used as a rationale not to re-use that code, even when the only alternative is complete re-implementation. Upon further discussion I haven't been able to determine why this would be. I am happy to use function pointers where appropriate, and I like the interesting and powerful things they allow you to do, but am I throwing caution to the wind by using them?
I see the pros and cons of function pointers as follows:
Pros:
Great opportunity for code modularity
OO-like features in non-OO C (i.e. code and data in the same object)
How else could you reasonably implement a callback?
Cons:
Negative impact to code readability - not always obvious what function is actually called when a function pointer is invoked
Minor performance hit compared to a direct function call
I think Con #1 can usually be reasonably mitigated by well-chosen symbol names and good comments, and Con #2 will in general not be a big deal. Am I missing something? Are there other reasons to avoid function pointers like the plague?
This question looks a little discussion-ey, but I'm looking for good reasons why I shouldn't use function pointers, not opinions.
Function pointers are not evil. The main times you "shouldn't" use them are when either:
The use is gratuitous, i.e. not actually needed for what you're doing, or
In situations where you're writing hardened code and the function pointer might be stored at a location you're concerned may be a likely candidate for buffer overflow attacks.
As for when function pointers are needed, Adam's answer provided some good examples. The common theme in all those examples is that the caller needs to be able to provide part of the code that runs from the called function. Without function pointers, the only way you could do this would be to copy and paste the implementation of the function and change part of it to call a different function, for every individual use case. For qsort and bsearch, which can be implemented portably, this would just be a nuisance and hideously ugly. For thread creation, on the other hand, without function pointers you would have to copy and paste part of the system implementation for the particular OS you're running on, and adapt it to call the function you want called. This is obviously unacceptable; your program would then be completely non-portable.
As such, function pointers are absolutely necessary for some tasks, and for other tasks, they are a major convenience which allows general code to be reused. I see no reason why they should not be used in such cases.
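The qsort case can be sketched in a few lines. qsort itself is the standard library function from <stdlib.h>; the names ascending, descending, and sort_ints are invented for this example. The caller supplies the ordering as a function pointer, and the same sorting code is reused for both orders.

```c
#include <stdlib.h>

/* Comparison callback: negative, zero, or positive, as qsort expects.
 * Written without subtraction to avoid integer overflow. */
int ascending(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Reverse order by swapping the arguments. */
int descending(const void *a, const void *b)
{
    return ascending(b, a);
}

/* The caller chooses the ordering by passing a function pointer;
 * the sorting code itself is never copied or modified. */
void sort_ints(int *v, size_t n, int (*cmp)(const void *, const void *))
{
    qsort(v, n, sizeof v[0], cmp);
}
```

Calling sort_ints(v, n, ascending) or sort_ints(v, n, descending) changes the behaviour without touching the implementation, which is exactly the "caller provides part of the code" pattern described above.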
No, they're not evil. They're absolutely necessary in order to implement various features such as callback functions in C.
Without function pointers, you could not implement:
qsort(3)
bsearch(3)
Window procedures
Threads
Signal handlers
And many more.
