Can Smart Assembly 7+ strings be deobfuscated? - obfuscation

I am planning on using Smart Assembly 7+ for obfuscating my .NET C# library.
But when I look through some forums I came across that there are even programs to deobfuscate DLLs protected with Smart Assembly, particularly programs like de4dot.
So I tried to deobfuscate my program using de4dot, and I got most of my logic decompiled successfully to my surprise. But thankfully the strings were not decompiled.
They were in the form of Class24.getString_0(5050)
If the strings cannot be decompiled properly by any deobfuscator, then it is enough to protect my core logic. But I am paranoid that maybe I did not use the deobfuscator properly and there are ways to deobfuscate strings even(but I tried running the deobfuscator commands for strings, as stated in the repo wiki).
Basically my question is, can I be certain that strings obfuscated by the SmartAssembly cannot be decompiled by any deobfuscator program in the market.
Also, any good suggestions for obfuscating the .NET libraries are also welcomed.
Thank You All!

In order for your code to run, the computer must understand it. There is no way around that. If the CLR can understand your code, there is no reason that a de-obfuscator cannot understand your code either.
Plus, computers are much stupider than humans. If a computer can understand your code, then a human definitely can.
The typical approaches to protecting your code, are:
Don't give the customers your code. Run it on your own computer and give them access to it. (That's the "Google approach".)
Give the customers a computer that you control 100% with your code pre-installed. (That's the "PlayStation approach".)
Don't do business with criminals. Copying your code is illegal pretty much everywhere. Circumventing protections in your code is illegal in several countries, including some of the biggest markets (e.g. the US). Reverse engineering your code may be legal, but only under very strict circumstances. (E.g. in the EU, reverse engineering is only legal for purposes of interoperability, and only if you refuse to make the information required for interoperability available under reasonable and non-discriminatory terms.)
Offer your customers extra services that your competitors, even if they were stealing your code, don't or cannot offer. For a lot of companies, the mere fact of "having someone they can sue" is already reason enough to buy the original software from the original vendor. Criminals are lazy, that's why they are criminals. They will never understand the problem domain as deeply as you do, simply because they are too lazy to put in the work, so they will never be able to provide enhancements, consulting, support, or bug fixes as well, as fast, and as precise as you can.

Related

Obfuscating C++ Shared Library

I've been asked to help obfuscate a library (written in C++) which will be distributed to clients. I've already discussed why obfuscation is not necessarily a good idea, and seeing as licensing will be integrated into the software many concerns regarding copy protection are moot.
Regardless, I've been asked to research methods anyway. I've looked into header mangling (and the like) as well as HARES, but I fail to find much that I can use for a library (naturally, these things would destroy any form of API rendering the library useless).
What techniques can I apply that would work for libraries? While I would appreciate recommendations for tools (or compiler flags, etc.) that might be helpful I would like to stress that this is not a tool-focused (i.e. closable) question, but rather one focused on applicable techniques.
I wouldn't put much energy to doing it very thoroughly, because the reverse engineer is going to win this round.
https://softwareengineering.stackexchange.com/questions/155131/is-it-important-to-obfuscate-c-application-code
Obfuscating C++ binaries is a bit of a losing battle. It depends on who you are dealing with, but if your reverse engineer is smart enough to use IDA Pro and a couple of plugins, and a good debugger then it shall all be for naught.
Obfuscation Priorities
Where you can give the reverse engineer useless function names.
Honestly, this doesn't help that much, since ultimately your code will have to call some kind of non obfuscated shared library to get anything done. At some point you will use the standard libary, or the STL, or even make a system call.
Add false pathways to confound static analysis
So that the reverse engineer can have fun with a debugger. Anti-analysis techniques are well known to the reverse engineer, and they can almost be circumvented with a debugger like ollydbg.
Write debugger foiling code
That reverse engineers love to play with. Again, this is an expected move, and the response is to just to step around the offending code, or to modify away the traps. Anyone with any formal training in RE will blast past this.
Pack most of my binary into an encryped region which is decryped by a stub just before execution.
Same answer as above. Reverse engineers train for this from day one.
Keep in mind the reverse engineers are looking for targeted morsals of information - very rarely are they trying to recreate the entire application. Security intensive code, code for license validation, code for home base communication, networking code. These are all prime targets - put your energy into making these thorny places to live.
Keep in mind that binaries from the largest corporations on the earth are routinely reverse engineered by people in their early 20's.
Don't leave your debugging symbols in the final binary, as those will definitely help with analysis.
If you are dedicated to doing this right, also focus on wasting the engineers time - time is always against the reverse engineer.
Remember, that any meaningful obfuscation might also cost you the performance gains that justified working in C++ in the first place. There are many zones in the C world (and for that matter the Java world) where meaningful obfuscation just isn't possible. Games for instance, cannot conceal their calls to the OpenGl APIs, nor can they truly prevent engineers from harvesting their shader code.
Also remember that the reverse engineer is watching your code at the assembly level most of the time. He'd rather have your function names, but he can live without it if need be. He can see what your program is doing at the most finite level possible. It is only a matter of time before he finds the critical routines.
For your purposes, find a program to mangle function names, make your boss happy, and call it a day. At least at that point, reverse engineering the software will not be trivial.
Well really you have 2 primary vectors that you have to guard against
Disassembley
Debugging
My favourite method for preventing the first issue is in memory decryption, take parts of your executable code and encrypt it, have it self decrypt in memory while your library is running, you can also checksum parts of the code and compare the checksum against what is loaded in ram ( have the encrypted portions check the decrypter and vice versa )
Another neat trick is to statically link libraries that you use into your executible so they cannot be easily swapped out to try to see what your code is doing.
Now debugging checking interrupt vectors helps, another trick is to check the 'timing' between various portions of code ( for example if more than a couple of milliseconds worth of delay occurs in code that should execute significantly faster than that then it can be assumed that the code is being debugged

Moving from VMS to Unix

Once upon a time, a team of guys sat down and wrote an application in C, running on VMS on a VAX. It was a rather important undertaking and runs a reasonably important back-end operation at LargeCo. This whole shebang works so well that twenty-five years later it's still chugging along and doing it's thing.
Time passes and people retire and it so happens that the Last Man Standing has turned over the keys to a new generation who - we might imagine - are less than thrilled to find themselves caretakers of a system old enough to be their younger brother. Yet, as underwhelmed as they are by the idea of dealing with Ultra Legacy Systems, they can't justify the cost of replacing the venerable application.
LMS discovered that I habla unix and put this question to me. And since I habla unix but don't speak the C I shall summarize and put it to you. Long Story Short:
LMS wants to port LegacyApp, written in C. from VMS to unix. Resources? Any books he can read? People he can talk to?
The first question I'd need to ask is why, and I'd be leading the conversation in the direction of "Do you really need to port it off of VMS". There are a number of things worth mentioning about VMS:
-> VMS is still actively developed and maintained by HP. They just release V8.4 for Field Test last week (see http://h71000.www7.hp.com/openvmsft/).
-> VMS is available on new hardware; specifically HP's Integrity servers based on the Itanium processor.
-> VMS is also available on virtual platforms via the Charon Emulation products.
-> Popular estimates are that there are about 300,000 VMS systems still in active use today. LMS may be the last man at LargeCo, but he's far from the last man standing worldwide.
-> Lots of information out there, see openvms.org for example, to see lots of current information on VMS, all from current users.
OK - you still want to port off of VMS. How do you do it? Well, it depends on lots of stuff.
-> As others have said, how standard is the code? Chances are, not very. The more VMS-isms, the more difficult the job. 'nuff said.
-> What is the database? If it's Oracle, probably not too tough to move to Oracle on some other platform. If it's some sort of custom DB based on RMS index files, then you've got more work to do, you'll need to re-create that pseudo DB, or, understand it enough to replace it with some relational DB.
-> Besides C, what else is used to create the application? What's on the front end? DECforms? FMS? Is there a transaction engine, e.g. ACMS? RTR? These things will have a huge impact on the feasibility and effort required to port to UNIX.
-> What other products are involved? Are there any 3rd party libraries being used? Are there 3rd party products in use that are critical to the application or functionality?
-> Is this system clustered? If so why? You'll need to meet those same goals with the UNIX box.
-> There are companies out there that will help you do it, and claim to have tools to make it easier, but my experience is that these companies tend to be selling you more services than products (i.e. you need to hire them to use the tools. It'll be expensive).
The book UNIX for OpenVMS Users will give the VMS novice some help in understanding VMS, but, as the title says, the book is really intended for the opposite purpose.
Everything written on VMS uses lots of VMS specific stuff it was just so convenient.
There are a few companies that sell compatibility libs to make the port easier - they wont be cheap though, VMS tended to be used where reliability mattered more than cost.
The other option is to run openVMS on some modern hardware, possibly in a VM.
I am sure Brian has made his decision by now, but for my sins of working for many years in DEC OpenVMS language support (yes, some people had this dubious honour) the real question I would have asked a customer such as Brian is: is it a real-time application or not? If it is the former, then it would be heavily dependent on many VMS system services which would rule out a 'port' and indicate a re-write. If it were the latter then the frequency of VMS system services should (possibly) be limited and make a port viable.
The acid test for me, would be to SEARCH *.c "SYS$", "LIB$" i.e. to search all of the C source files for "SYS$" and "LIB$" tags which prefix VMS system services. If the count for these are in the 10s then a port is probably likely, between 10 and 100 makes it possibly likely, but over a 100 makes a successful port highly unlikely.
Hope this helps
You have several choices.
Get the OpenVMS source, and continue to maintain Open VMS as if it were a Linux distribution. Some folks don't mind keeping up with Linux distributions and OpenVMS distributions. It can be done.
Try to recompile the VMS C into Linux. This can be trivial if the C used only standard libraries. This can be very, very difficult if the C used a lot of VMS libraries.
Once you have facts at your fingertips, you can reevaluate this course of action. Since you didn't list a bunch of VMS library methods this program uses, it's impossible to tell how entangled it is with the OS.
This may be trivial or impossible. It's difficult to tell without analysis of the source.
Write bridge libraries from VMS to Linux. If your program only does a few VMS things, this isn't very difficult. If your program does extensive VMS things, this is craziness.
The bridge -- in the long run -- is a terrible idea. Managers love it, however.
An alternative is to replace the VMS library calls with proper, portable Linux calls rather than write bridges. This is better in the long run, because it excises the non-portable features of the program.
Rewrite it from scratch in Python. That is usually simpler than trying to port the C code. It will be shorter, cleaner, simpler, and portable.
If you're willing to keep running VMS in a VM, you can look into CHARON-VAX ( http://www.charon-vax.com/ ). As previously mentioned, the ease of porting really depends a lot on how much of the VMS extensions were used; searching the source code for $ characters embedded in strings (usually with a 3-character leading substring, such as lib$gettime or dsc$descriptor or sys$foobar etc) will give you at least a basic idea of what VMS system functions are called and how likely they are to be portable, if the name is reasonably obvious.
If it ain't broke, don't fix it! Why port it or migrate the app if you don't have to? Why not run it on a current install of OpenVMS running on an HP Itanium server; that is assuming you wish to upgrade the hardware, which may not even be necessary if your VAX hardware is still running strong.
To learn C, you might as well drag it from the horse's mouth: "The C Programming Language" by its inventors, Kernighan and Ritchie.
I can recommend "The UNIX programming environment" by (again) Brian Kernighan; a more authoritative source you'll hardly find, and it teaches you both Unix/C idioms and a bit of C programming at the same time.
For more depth and detail on C, I heartily enjoyed a book by Peter van der Linden: "Expert C Programming - Deep C Secrets".
You'll also want to wrestle LMS for a library documentation of VMS-specific C functions with (of course) special emphasis on those actually used in the app. That's where your porting effort will be.
The job could be easy or difficult, depending on how much machine-specific cleverness and bit-twiddling is done, and how many VMS-specific system calls are used. It would be very good if word size was equal (in other words, if your VMS box has a word size of 32 bits, don't run the code on a 64 bit version of Unix!)
Brian, I'm not sure if LMS specified/cared to port C-code or the WHOLE process. As too often people think of languages out of scope of systems.
If there're was a process built on VMS, most likely it used at least scheduling/batch facilities, which are often scripted in DCL (rather simple and clear language, unlike shell or perl scripting).
So the cost of porting the whole process may be higher than originally perceived by your LMS. Add here the reliability aspect, given your crunches with C, which is nothing impossible, of course, with enthusiasm and determination.
If you want simply give the C-code a try, as previously posted, search it for the "$" hits. Or just cc it with all headers present, the basics of compile-link command should be enough.
Alternatively, this looks like a consultant's call, as indeed such jobs were abundant at the "exodus" time. All said VMS remains quite a robust platform (24x7 is a norm!), unless the harware dies, then there're still tons of "exodus" spares. GOOD LUCK!
About a year and a half later, maybe you've already figured out what to do. My organization has recently decided to stick with OpenVMS instead of switching to Linux even though the old guard recently left. We just couldn't argue with what we felt was a very stable and reliable system. We are currently switching from Alpha servers to Integrity servers for end of life reasons. HP has been very helpful with our transition.
For that matter, there may be Linux vendors out there who can help with the transition. Ask your new hardware vendor if they have any recommendations.
Depending on what languages you already know, C is not that hard to learn. I taught myself C in the course of learning C++ after finally prying myself loose from Pascal.
(VAX Pascal, plus Rdb/VMS, plus DCL formed a combination that was hard to beat.)
If the software is typical C, you'll spend more time learning the library functions than learning the language.
It's pretty lightweight stuff, but I went through the online tutorials for C++ that Microsoft makes available in conjunction with the express edition of Visual Studio for C++.
Here's the beginner's tutorial:
http://msdn.microsoft.com/en-us/beginner/cc305129.aspx
It's probably worth making the effort to ask why LMS wants to port the application to Unix. The answer may seem obvious, but properly exploring the reasons has its benefits. I would assume:
OpenVMS is an "ultra legacy platform", and for that reason alone is something that is not worth running an application on anymore;
It's tough to find anyone who is willing to maintain an application that runs on OpenVMS these days;
The hardware on-which OpenVMS runs is threatening to become moribund.
We have a similar challenge, but in our case the application in question not only runs on OpenVMS but is also written in COBOL. I would have to say that your situation is rosy in comparison given that your application is written in a cross-platform language.
In any case, I think if you're about to make a big decision like moving from OpenVMS to Unix it would be prudent to do a little due diligence. In your case, try to assess just how portable the code is--only then will you know what the scale of the effort is (worst case could quite easily be a multiple of best case). In C, code portability is mostly a function of the dependencies--are they "standard" or are they VMS-specific?
Our enquiries revealed that HP would be supporting OpenVMS on Itanium until at least 2022. There isn't necessarily a need to rush to another platform--perhaps you could keep things on OpenVMS whilst embarking on an effort to prepare the application for porting (make it less dependent on OpenVMS specifics).
VMS has a surprisingly healthy community and if it's the lack of Unix that's the issue, then maybe GNV could help bridge the gap?
Well u have a few options. if this code needs to be ported rather quickly, i would write a bridge library to emulate the vms libs. whener you get it back up and running on a *nix, then go through replacing the vms library calls with native/portable calls for *nix.
Also if there is a lot of optimizations in the code ie inline assembly and bit twiddling. then you will have to rewrite thi code, which will take an understanding of the VAX arch. also. be sure to check word size differences and endian differences

Why do you obfuscate your code?

Have you ever obfuscated your code before? Are there ever legitimate reasons to do so?
I have obfuscated my JavaScript. It made it smaller, thus reducing download times. In addition, since the code is handed to the client, my company didn't want them to be able to read it.
Yes, to make it harder to reverse engineer.
To ensure a job for life, of course (kidding).
This is pretty hilarious and educational: How to Write Unmaintanable Code.
It's called "Job Security". This is also the reason to use Perl -- no need to do obfuscation as separate task, hence higher productivity, without loss of job security.
Call it "security through obsfuscability" if you will.
I don't believe making reverse engineering harder is a valid reason.
A good reason to obfuscate your code is to reduce the compiled footprint. For instance, J2ME appliactions need to be as small as possible. If you run you app through an obfuscator (and optimiser) then you can reduce the jar from a couple of Mb to a few hundred Kb.
The other point, nestled above, is that most obfuscators are also optimisers which can improve your application's performance.
Isn't this also used as security through obscurity? When your source code is publically available (javascript etc) you might want to at least it somewhat harder to understand what is actually occuring on the client side.
Security is always full of compromises. but i think that security by obscurity is one of the least effective methods.
I believe all TV cable boxes will have the java code obfuscated. This does make things harder to hack, and since the cable boxes will be in your home, they are theoretically hackable.
I'm not sure how much it will matter since the cable card will still control signal encryption and gets its authorization straight from the video source rather than the java code guide or java apps, but they are pretty dedicated to the concept.
By the way, it is not easy to trace exceptions thrown from an obfuscated stack! I actually memorized at one point that aH meant "Null Pointer Exception" for a particular build.
I remember creating a Windows Service for Online Backup application that was built in .NET. I could easily use either Visual Studio or tools like .NET Reflector to see the classes and the source code inside it.
I created a new Visual Studio Test application and added the Windows Service reference to it. Double clicked on the reference and I can see all the classes, namespaces everything (not the source code though). Anybody can figure out the internal working of your modules by looking at the class names. In my case, one such class was FTPHandler that clearly tells where the backups are going.
.NET Reflector goes beyond that by showing the actual code. It even has an option to Export the whole project so you get a VS project with all the classes and source code similar to what the developer had.
I think it makes sense to obfuscate, to make it atleast harder if not impossible for someone to disassemble. Also I think it makes sense for products involving large customer base where you do not want your competitors to know much about your products.
Looking at some of the code I wrote for my disk driver project makes me question what it means to be obfuscated.
((int8_t (*)( int32_t, void * )) hdd->_ctrl)( DISK_CMD_REQUEST, (void *) dr );
Or is that just system programming in C? Or should that line be written differently? Questions...
Yes and no, I haven't delivered apps with a tool that was easy decompilable.
I did run something like obfuscators for old Basic and UCSD Pascal interpreters, but that was for a different reason, optimizing run time.
If I am delivering Java Swing apps to clients, I always obfuscate the class files before distribution.
You can never be too careful - I once pointed a decent Java decompiler (I used the JD Java Decompiler - http://www.djjavadecompiler.com/ ) at my class files and was rewarded with an almost perfect reproduction of the original code. That was rather unnerving, so I started obfuscating my production code ever since. I use Klassmaster myself (http://www.zelix.com/klassmaster/)
I obfuscated code of my Android applications mostly. I used ProGuard tool to obfuscate the code.
When I worked on the C# project, our team used the ArmDot. It's licensing and obfuscation system.
Modern obfuscators are used not only to make hacking process difficult. They are able to protect programs and games from cheating, check licenses/keys and even optimize code.
But I don't think it is necessary to use obfuscator in every project.
It's most commonly done when you need to provide something in source (usually due to the environment it's being built in, such as systems without shared libraries, especially if you as the seller don't have the exact system being build for), but you don't want the person you're giving it to to be able to modify or extend it significantly (or at all).
This used to be far more common than today. It also led to the (defunct?) Obfuscated C Contest.
A legal (though arguably not "legitimate") use might be to release "source" for an app you're linking with GPL code in obfuscated fashion. It's source, it can be modified, it's just very hard. That would be a more extreme version of releasing it without comments, or releasing with all whitespace trimmed, or (and this would be pushing the legal grounds probably) releasing assembler source generated from C (and perhaps hand-tweaked so you can say it's not just intermediate code).

What type of programs are best written in C [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
Joel and company expound on the virtues of learning C and how the best way to learn a language is to actually write programs using that use it. To that effect, which types of applications are most suitable to the C programming language?
Edit:
I am looking for what C is good at. This likely does not coincide with the best way of learning C.
Code where you need absolute control over memory management. Code where you need to be utterly in control of speed versus memory trade-offs. Very low-level file manipulation (such as access to the raw devices).
Examples include OS kernel, and embedded apps.
In the late 1980s, I was head of the maintenance team on a C system that was more than a million lines of code. It did database access (Oracle), X Windows graphics, interprocess communications, all sorts of good stuff. It ran on VMS and several varieties of Unix. But if I were to recreate that system today, I sure wouldn't use C, I'd use Java. Others would probably use C#.
Low level functions such as OS kernel and drivers. For those, C is unbeatable.
You can use C to write anything. It is a very general purpose language. After doing so for a little while you will understand why there are other "higher level" languages available.
"Learn C", by all means, but don't don't stick with it for too long. Other languages are far more productive.
I think the gist of the people who say you need to learn C is so that you understand the relationship between high level code and the machine internals and what exaclty happens with bits, bytes, program execution, etc.
Learn that, and then move on.
Those 100 lines of python code that were accounting for 80% of your execution time.
Small apps that don't have a UI, especially when you're trying to learn.
Edit: After thinking a little more on this, I'd add the following: if you already know a higher-level language and you're trying to learn more about C, a good route may be to not create a whole new C app, but instead create a C DLL and add functions to it that you can call from the higher language. In this way you can replace simple functions that your high language already has to ensure that you C function does what it should (gives you pre-built testing), lets you code mostly in what you're familiar with, uses the language in a problem you're already familiar with, and teaches you about interop.
Anything where you think of using assembly.
Number crunching (for example, libraries to be used at a higher level from some other language like Python).
Embedded systems programming.
A lot of people are saying OS kernel and device drivers which are, of course, good applications for C. But C is also useful for writing any performance critical applications that need to use every bit of performance the hardware is capable of.
I'm thinking of applications like database management systems (mySQL, Oracle, SQL Server), web servers (apache, IIS), or even we browsers (look at the way chrome was written).
You can do so many optimizations in C that are just impossible in languages that run in virtual machines like Java or .NET. For instance, databases and servers support many simultaneous users and need to scale very well. A database may need to share data structures between multiple users (threads/processes), but do so in a way that efficiently uses CPU caches. In C, you can use an operating system call to determine the size of the cache, and then align a data structure appropriately to the cache line so that the line does not "ping pong" between caches when multiple threads access adjacent, but unrelated data (so called "false sharing). This is one example. There are many others.
A bootloader. Some assembly also required, which is actually very nice..
Where you feel the need for 100% control over your program.
This is often the case in lower layer OS stuff like device drivers,
or real embedded devices based on MCU:s etc etc (all this and other is already mentioned above)
But please note that C is a mature language that has been around for many years
and will be around for many more years,
it has many really good debugging tools and still a huge number off developers that use it.
(It probably has lost a lot to more trendy languages, but it is still huge)
All its strengths and weaknesses are well know, the language will probably not change any more.
So there are not much room for surprises...
This also means that it would probably be a good choice if you have a application with a long expected life cycle.
/Johan
Anything where you need a minimum of "magic" and need the computer to do exactly what you tell it to, no more and no less. Anything where you don't trust the "magic" of garbage collection to handle your memory because it might not be as efficient as what you can hand-code. Anything where you don't trust the "magic" of built-in arrays, strings, etc. not to waste too much space. Anything where you want to be able to reason about exactly what ASM instructions the compiler will emit for a given piece of code.
In other words, not too much in the real world. Most things would benefit more from higher level abstraction than from this kind of control. However, OS code, device drivers, and a few things that have to be near optimal in both space and speed might make sense to write in C. Higher level languages can do pretty well competing with C on speed, but not necessarily on space.
Embedded stuff, where memory-usage and cpu-speed matters.
The interrupt handler part of an OS (and maybe two or three more functions in it).
Even if some of you will now start to bash heavily on me now:
I dont think that any decent app should be written in C - it is way too error prone.
(and yes, I do know what I am talking about, having written an awful lot of code in C myself (OSes, compilers, VMs, network protocols, RT-control stuff etc.).
Using a high level language makes you so much more productive. Speed is typically gained by keeping the 10-90 rule in mind: 90% of CPU time is spent in 10% of your code (which you should optimize first).
Also, using the right algorithm might give more performance than optimizing bits in C. And doing things right in C is so much more trouble.
PS: I do really mean what I wrote in the second sentence above; you can write a complete high performance OS in a high level language like Lisp or Smalltalk, with only a few minor parts in C. Think how the 70's Lisp machines would fly on todays hardware...
Garbage collectors!
Also, simply programs whose primary job is to make operating-system calls. For example, I need to write a short C program called timeout that
Takes a command line as argument, with a number of seconds for that command to run
Forks two child processes, one to run the command and one to sleep for N seconds
When the first of the child processes exits, kills the other, then exits
The effect will be to run a command with a limit on wall-clock time.
I and others on this forum have tried several different solutions using shells and/or perl. All are convoluted and none quite do the right thing. In C the solution will be easy, because all the OS facilities are right where you can get at them.
A few kinds that I can think of:
Systems programming that directly uses Unix/Linux or Win32 system calls
Performance-critical code that doesn't have much string manipulation in it (e.g., number crunching)
Embedded programming or other applications that are severely resource-constrained
Basically, C gives you portable, efficient access to the native capabilities of the machine; everything else is your responsibility. In particular, string manipulation in C is tedious, error-prone, and generally nasty; the most effective way to do extensive string operations with C may be to use it to implement a language that handles strings better...
examples are: embedded apps, kernel code, drivers, raw sockets.
However if you find yourself more productive in C then go ahead and build whatever you wish. Use the language as a tool to get your problem solved.
c compiler
Researches in maths and physics. There are probably two alternatives: C and C++, but such features of the latter as encapsulation and inheritance are not very useful there. One could prefer to use C++ "as a better C" or just stay with C.
Well most people are suggesting system programming related things like OS Kernels , Device Drivers etc. These are difficult and Time consuming. Maybe the most fun thing to with C is console programming. Have you heard of the HAM SDK? It is a complete software development kit for the Nintendo GBA , and making games for it is fun. There is also the CC65 Compiler which supports NES Programming (Althought Not Completely). You can also make good Emulators. Trust Me , C is pretty helpful. I was originally a Python fan, and hated C because it was complex. But after yuoget used to it, you can do anything with C. Now I use CPython to embed Python in my C Programs(if needed) and code mostly in C.
C is also great for portability , There is a C Compiler for like every OS and Almost Every Console And Mobile Device. Ive even seen one that supports some calculators!
Well, if you want to learn C and have some fun at the same time, might I suggest obtaining NXC and a Lego Mindstorms set? NXC is a C compiler for the Lego Mindstorms.
Another advantage of this approach is that you can compare the effort to program the Mindstorms "brick" with C and then try LEJOS and see how it goes with Java.
All great fun.
Implicit in your question is the assumption that a 'high-level' language like Python or Perl (or Java or ...) is fast enough, small enough, ... enough for most applications. This is of course true for most apps and some choice X of language. Given that, your language of choice almost certainly has a foreign function interface (FFI). Since you're looking to learn C, create a module in the FFI built in C.
For example, let's assume that your tool of choice is Python. Reimplement a subset of Numpy in C. Since C is a pretty fast language, and has, in C99, a clear numeric library interface, you'll get the opportunity to experience the power of C in an appropriate setting.
ISAPI filters for Internet Information Server.
Before actually write C code, i would suggest first read good C code.
Choose subject you want to concentrate on, basically any application can be written in C, but i assume GUI application will be not your first choice, and find few open source projects to look into.
Not any open source project is best code to look. I assume that after you will select a subject there is a place for another question, ask for best open source project in the field.
Play with it, understand how it's working modify some functionality...
Best way to learn is learn from somebody good.
Photoshop plugin filters. Lots and lots of interesting graphical manipulation you can do with those and you'll need pure C to do it in.
For instance that's a gateway to fractal generation, fourier transforms, fuzzy algorithms etc etc. Really anything that can manipulate image/color data and give interesting results
Don't treat C as a beefed up assembler. I've done some serious app's in it when it was the natural language (e.g., the target machine was a Solaris box with X11).
Write something with some meat on it. Write a client server chess program, where the AI is on a server and the UI is displaying in X11; once you've done that you will really know C.
I wonder why nobody stated the obvious:
Web applications.
Any place where the underlying libraries are entirely in C is a good candidate for staying in C - openGL, Lua extensions, PHP extensions, old-school windows.h, etc.
I prefer C for anything like parsing, code generation - anything that doesn't need a lot of data structure (read OOP). It's library footprint is very light, because class libraries are non-existent. I realize I'm unusual in this, but just as lots of things are "considered harmful", I try to have/use as little data structure as possible.
Following on from what someone else said. C seems a good language to implement the language in which you write the rest of your software.
And (mutatis mutandis) the virtual machine which runs the rest of your software.
I'd say that with the minuscule runtime environment and it's self-contained nature, you might start by creating some CLI utilities such as grep or tail (or half the commands in Unix). Anything that uses only STDOUT, STDIN and file manipulation is a good candidate.
This isn't exactly answering your question because I wouldn't actually CHOOSE to use C in such an app, but I hope it's answering the question you meant to ask--"what would be a good type of app to use learn C on?"
C isn't actually that bad a language--it's pretty easily to understand your code at an assembly language level which is quite useful, and the language constructs are few, leaving a very sparse language.
To answer your actual question, the very best use of C is the one for which it was created--porting itself (and UNIX) to other CPU architectures. This explains the lack of language features very well; it also explains the existence of Pointers which are really just a construct to make the compiler work less--any code created with pointers could be created without it (just as well optimized), but it becomes much harder to create a compiler to do so.
digital signal processing, all Pure Data extensions are written in C, this can be done in other languages also but has had good success in C

C (or any) compilers deterministic performance

Whilst working on a recent project, I was visited by a customer QA representitive, who asked me a question that I hadn't really considered before:
How do you know that the compiler you are using generates machine code that matches the c code's functionality exactly and that the compiler is fully deterministic?
To this question I had absolutely no reply as I have always taken the compiler for granted. It takes in code and spews out machine code. How can I go about and test that the compiler isn't actually adding functionality that I haven't asked it for? or even more dangerously implementing code in a slightly different manner to that which I expect?
I am aware that this is perhapse not really an issue for everyone, and indeed the answer might just be... "you're over a barrel and deal with it". However, when working in an embedded environment, you trust your compiler implicitly. How can I prove to myself and QA that I am right in doing so?
You can apply that argument at any level: do you trust the third party libraries? do you trust the OS? do you trust the processor?
A good example of why this may be a valid concern of course, is how Ken Thompson put a backdoor into the original 'login' program ... and modified the C compiler so that even if you recompiled login you still got the backdoor. See this posting for more details.
Similar questions have been raised about encryption algorithms -- how do we know there isn't a backdoor in DES for the NSA to snoop through?
At the end of the you have to decide if you trust the infrastructure you are building on enough to not worry about it, otherwise you have to start developing your own silicon chips!
For safety critical embedded application certifying agencies require to satisfy the "proven-in-use" requirement for the compiler. There are typically certain requirements (kind of like "hours of operation") that need to be met and proven by detailed documentation. However, most people either cannot or don't want to meet these requirements because it can be very difficult especially on your first project with a new target/compiler.
One other approach is basically to NOT trust the compiler's output at all. Any compiler and even language-dependent (Appendix G of the C-90 standard, anyone?) deficiencies need to be covered by a strict set of static analysis, unit- and coverage testing in addition to the later functional testing.
A standard like MISRA-C can help to restrict the input to the compiler to a "safe" subset of the C language. Another approach is to restrict the input to a compiler to a subset of a language and test what the output for the entire subset is. If our application is only built of components from the subset it is assumed to be known what the output of the compiler will be. The usually goes by "qualification of the compiler".
The goal of all of this is to be able to answer the QA representative's question with "We don't just rely on determinism of the compiler but this is the way we prove it...".
You know by testing. When you test, you're testing your both code and the compiler.
You will find that the odds that you or the compiler writer have made an error are much smaller than the odds that you would make an error if you wrote the program in question in some assembly language.
There are compiler validation suits available.
The one I remember is "Perennial".
When I worked on a C compiler for a embedded SOC processor we had to validate the compiler against this and two other validation suits (that I forget the name of). Validating the compiler to a certain level of conformance to these test suits was part of the contract.
It all boils down to trust. Does your customer trust any compiler? Use that, or at least compare output code between yours and theirs.
If they don't trust any, is there a reference implementation for the language? Could you convince them to trust it? Then compare yours against the reference or use the reference.
This all assuming you actually verify the actual code you get from the vendor/provider and that you check the compiler has not been tampered with, which should be the first step.
Anyhow this still leaves the question about how would you verify, without having references, a compiler, from scratch. That certainly looks like a ton of work and requires a definition of the language, which not always is available, sometimes the definition is the compiler.
How do you know that the compiler you are using generates machine code that matches the c code's functionality exactly and that the compiler is fully deterministic?
You don't, that's why you test the resultant binary, and why you make sure to ship the same binary you tested with. And why when you make 'minor' software changes, you regression test to make sure none of the old functionality broke.
The only software I've certified is avionics. FAA certification isn't rigorous enough to prove the software works correctly, while at the same time it does force you to jump through a certain amount of hoops. The trick is to structure your 'process' so it improves quality as much as possible, with as little extraneous hoop-jumping as you can get away with. So anything that you know is worthless and won't actually find bugs, you can probably weasel out of. And anything you know you should do because it will find bugs that isn't explicitly asked for by the FAA, your best bet is to twist words until it sounds like you're giving the FAA/your QA people what they asked for.
This actually isn't as dishonest as I've made it sound, in general the FAA cares more about you being conscientious and confident that you're trying to do a good job, than about what exactly you do.
Some intellectual ammunition might be found in Crosstalk, a magazine for defense software engineers. This question is the kind of thing they spend many waking hours on. http://www.stsc.hill.af.mil/crosstalk/2006/08/index.html (If i can find my old notes from an old project, i'll be back here...)
You can never fully trust the compiler, even highly recommended ones. They could release an update that has a bug, and your code compiles the same. This problem is compounded when updating old code with the buggy compiler, doing testing and shipping out the goods only to have the customer ring you 3 months later with a problem.
It all comes back to testing, and if there is one thing I have learnt it is to thouroughly test after any non-trivial change. If the problem seems impossible to find have a look at the compiled assembler and check it's doing what it should be doing.
On several occasions I have found bugs in the compiler. One time there was a bug where 16 bit variables would get incremented but without carry and only if the 16 bit variable was part of an extern struct defined in a header file.
...you trust your compiler implicitly
You'll stop doing that the first time you come across a compiler bug. ;-)
But ultimately this is what testing is for. It doesn't matter to your test regime how the bug got in to your product in the first place, all that matters is that it didn't pass your extensive testing regime.
Well.. you can't simply say that you trust your compiler's output - particularly if you work with embedded code. It is not hard to find discrepancies between the code generated when compiling the very same code with different compilers. This is the case because the C standard itself is too loose. Many details can be implemented differently by different compilers without breaking the standard. How do we deal with this stuff? We avoid compiler dependent constructs whenever possible. We may deal with it by choosing a safer subset of C like Misra-C as previously mentioned by the user cschol. I seldom have to inspect the code generated by the compiler but that has also happened to me at times. But, ultimately, you are relying on your tests in order to make sure that the code behaves as intended.
Is there a better option out there? Some people claim that there is. The other option is to write your code in SPARK/Ada. I have never written code in SPARK but my understanding is that you would still have to link it against routines written in C that would deal with the "bare metal" stuff. The beauty of SPARK/Ada is that you are absolutely guaranteed that the code generated by any compiler is always going to be the same. No ambiguity whatsoever. On top of that, the language allows you to annotate the code with explanations as to how the code is intended to behave. The SPARK toolset will use these annotations to formally prove that the code written does indeed do what the annotations have described. So I have been told that for critical systems, SPARK/Ada is a pretty good bet. I have never tried it myself though.
You don't know for sure that the compiler will do exactly what you expect. The reason is, of course, that a compiler is a peice of software, and is therefore susceptible to bugs.
Compiler writers have the advantage of working from a high quality spec, while the rest of us have to figure out what we're making as we go along. However, compiler specs also have bugs, and complex parts with subtle interactions. So, it's not exactly trivial to figure out what the compiler should be doing.
Still, once you decide what you think the language spec means, you can write a good, fast, automated test for every nuance. This is where compiler writing has a huge advantage over writing other kinds of software: in testing. Every bug becomes an automated test case, and the test suite can very thorough. Compiler vendors have a lot more budget to invest in verifying the correctness of the compiler than you do (you already have a day job, right?).
What does this mean for you? It means that you need to be open to the possibilities of bugs in your compiler, but chances are you won't find any yourself.
I would pick a compiler vendor that is not likely to go out of business any time soon, that has a history of high quality in their compilers, and that has demonstrated their ability to service (patch) their products. Compilers seem to get more correct over time, so I'd choose one that's been around a decade or two.
Focus your attention on getting your code right. If it's clear and simple, then when you do hit a compiler bug, you won't have to think really hard to decide where the problem lies. Write good unit tests, which will ensure that your code does what you expect it to do.
Try unit testing.
If that's not enough, use different compilers and compare the results of your unit tests. Compare strace outputs, run your tests in a VM, keep a log of disk and network I/O, then compare those.
Or propose to write your own compiler and tell them what it's going to cost.
The most you can easily certify is that you are using an untampered compiler from provider X. If they do not trust provider X, it's their problem (if X is reasonably trustworthy). If they do not trust any compiler provider, then they are totally unreasonable.
Answering their question: I make sure I'm using an untampered compiler from X through these means. X is well reputed, plus I have a nice set of tests that show our application behaves as expected.
Everything else is starting to open the can of worms. You have to stop somewhere, as Rob says.
Sometimes you do get behavioural changes when you request aggressive levels of optimisation.
And optimisation and floating point numbers? Forget it!
For most software development (think desktop applications) the answer is probably that you don't know and don't care.
In safety-critical systems (think nuclear power plants and commercial avionics) you do care and regulatory agencies will require you to prove it. In my experience, you can do this one of two ways:
Use a qualified compiler, where "qualified" means that it has been verified according to the standards set out by the regulatory agency.
Perform object code analysis. Essentially, you compile a piece of reference code and then manually analyze the output to demonstrate that the compiler has not inserted any instructions that can't be traced back to your source code.
You get the one Dijkstra wrote.
Select a formally verified compiler, like Compcert C compiler.
Changing the optimization level of the compiler will change the output.
Slight changes to a function may make the compiler inline or no longer inline a function.
Changes to the compiler (gcc versions for example) may change the output
Certain library functions may be instrinic (i.e., emit optimized assembly) while others most are not.
The good news is that for most things it really doesn't matter that much. Where it does, you may want to consider assembly if it really matters (e.g., in an ISR).
If you are concerned about unexpected machine code which doesn't produce visible results, the only way is probably to contact compiler vendor for certification of some sort which will satisfy your customer.
Otherwise you'll know it the same you know about bugs in your code - testing.
Machine code from modern compilers can be vastly different and totally incomprehensible for puny humans.
I think it's possible to reduce this problem to the Halting Problem somehow.
The most obvious problem is that if you use some kind of program to analyze the compiler and its determinism, how do you know that your program gets compiled correctly, and produces the correct result?
If you're using another, "safe" compiler though, I'm not sure. What I'm sure is writing a compiler from scratch would probably be an easier job.
Even a qualified or certified compiler can produce undesirable results. Keep your code simple and test, test, test. That or walk through the machine code by hand while not allowing any human error. PLus the operating system or whatever environment you are running on (preferably no operating system, just your program).
This problem has been solved in mission critical environments since software and compilers began. As many of the others who have responded also know. Each industry has its own rules from certified compilers to programming style (you must always program this way, never use this or that or the other), lots of testing and peer review. Verifying every execution path, etc.
If you are not in one of those industries, then you get what you get. A commercial program on a COTS operating system on COTS hardware. It will fail, that is a guarantee.
If your worried about malicious bugs in the compiler, one recommendation (IIRC, an NSA requirement for some projects) is that the compiler binary predate the writing of the code. At least then you know that no one has added bugs targeted at your program.

Resources