How to profile sections of code? - C

I need to profile a piece of software written in C. The problem is that while gprof or my own begin/end timer calls would give me the time spent in each function, I would have no information about which part within each function is the most time-consuming. Some may call it micro-optimization, but that is exactly what is needed here!
One way of achieving this is to "manually" place begin/end timer calls around for loops (there could be more than one of these). In that case, a smarter approach would be to allow enabling/disabling these calls with macros, along the lines of the sketch below.
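A minimal sketch of what I mean (the macro names and the SECTION_TIMING switch are just illustrative):

    #include <stdio.h>
    #include <time.h>

    /* Compile with -DSECTION_TIMING to enable the timers; without it
       the macros expand to nothing and cost nothing. */
    #ifdef SECTION_TIMING
    #define TIMER_BEGIN(name) \
        struct timespec name##_t0; \
        clock_gettime(CLOCK_MONOTONIC, &name##_t0)
    #define TIMER_END(name) do { \
        struct timespec name##_t1; \
        clock_gettime(CLOCK_MONOTONIC, &name##_t1); \
        fprintf(stderr, "section %s: %.3f ms\n", #name, \
                (name##_t1.tv_sec  - name##_t0.tv_sec)  * 1e3 + \
                (name##_t1.tv_nsec - name##_t0.tv_nsec) / 1e6); \
    } while (0)
    #else
    #define TIMER_BEGIN(name)
    #define TIMER_END(name)
    #endif

    void compute(int n)
    {
        TIMER_BEGIN(main_loop);
        for (int i = 0; i < n; i++) {
            /* ... heavy computation ... */
        }
        TIMER_END(main_loop);
    }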
But I would like to automate this instrumentation.
Can you tell me if a good tool exists to achieve this? It would be ideal if I could invoke the instrumented program repeatedly from a script and then compute the average time spent in each "section" of the code. For now "section" is a loosely defined term, but the tool could have a more specific definition of what a section is.
It would also be helpful to know which tools would be suitable for this.

You might try using Callgrind (one of the Valgrind tools) in conjunction with KCachegrind. See also this question.
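If it's specific sections you care about, Callgrind can also restrict data collection to just those sections via its client-request macros; a rough sketch (the toggles are real Callgrind macros, the function is made up):

    /* Build as usual, then run:
     *   valgrind --tool=callgrind --collect-atstart=no ./prog
     * so that nothing is collected until the first toggle. */
    #include <valgrind/callgrind.h>

    void process(void)
    {
        CALLGRIND_TOGGLE_COLLECT;   /* start collecting here */
        /* ... the section you actually care about ... */
        CALLGRIND_TOGGLE_COLLECT;   /* stop collecting */
    }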

I have not used it myself, but I have heard that the Valgrind instrumentation framework (http://www.valgrind.org/) has tools that enable the very fine-grained profiling necessary for what you are trying to accomplish.

You want the code to run as fast as possible, right?
gprof is a measuring tool. It can help in evaluating alternative implementations, as the original authors wrote.
They did not say it is effective for locating the code needing an alternative implementation, and it isn't, even though nearly everyone thinks it is.
The fallacy is that measuring locates, but if you want to find an elephant in the room, do you need to measure it to know it's there?
No, you open your eyes.
Here's a way to open your eyes to what your program is doing.


Obfuscating C++ Shared Library

I've been asked to help obfuscate a library (written in C++) which will be distributed to clients. I've already discussed why obfuscation is not necessarily a good idea, and seeing as licensing will be integrated into the software, many concerns regarding copy protection are moot.
Regardless, I've been asked to research methods anyway. I've looked into header mangling (and the like) as well as HARES, but I fail to find much that I can use for a library (naturally, these things would destroy any form of API, rendering the library useless).
What techniques can I apply that would work for libraries? While I would appreciate recommendations for tools (or compiler flags, etc.) that might be helpful I would like to stress that this is not a tool-focused (i.e. closable) question, but rather one focused on applicable techniques.
I wouldn't put much energy into doing it very thoroughly, because the reverse engineer is going to win this round.
https://softwareengineering.stackexchange.com/questions/155131/is-it-important-to-obfuscate-c-application-code
Obfuscating C++ binaries is a bit of a losing battle. It depends on who you are dealing with, but if your reverse engineer is smart enough to use IDA Pro and a couple of plugins, and a good debugger then it shall all be for naught.
Obfuscation Priorities
Give the reverse engineer useless function names where you can.
Honestly, this doesn't help that much, since ultimately your code will have to call some kind of non-obfuscated shared library to get anything done. At some point you will use the standard library, or the STL, or even make a system call.
Add false pathways to confound static analysis,
so that the reverse engineer can have fun with a debugger. Anti-analysis techniques are well known to reverse engineers, and they can almost always be circumvented with a debugger like OllyDbg.
Write debugger-foiling code
that reverse engineers love to play with. Again, this is an expected move, and the response is just to step around the offending code, or to modify away the traps. Anyone with any formal training in RE will blast past this.
Pack most of your binary into an encrypted region which is decrypted by a stub just before execution.
Same answer as above. Reverse engineers train for this from day one.
Keep in mind that reverse engineers are looking for targeted morsels of information - very rarely are they trying to recreate the entire application. Security-intensive code, code for license validation, code for home-base communication, networking code. These are all prime targets - put your energy into making these thorny places to live.
Keep in mind that binaries from the largest corporations on earth are routinely reverse engineered by people in their early 20s.
Don't leave your debugging symbols in the final binary, as those will definitely help with analysis.
If you are dedicated to doing this right, also focus on wasting the engineer's time - time is always against the reverse engineer.
Remember that any meaningful obfuscation might also cost you the performance gains that justified working in C++ in the first place. There are many zones in the C++ world (and for that matter the Java world) where meaningful obfuscation just isn't possible. Games, for instance, cannot conceal their calls to the OpenGL APIs, nor can they truly prevent engineers from harvesting their shader code.
Also remember that the reverse engineer is watching your code at the assembly level most of the time. He'd rather have your function names, but he can live without them if need be. He can see what your program is doing at the finest level of detail possible. It is only a matter of time before he finds the critical routines.
For your purposes, find a program to mangle function names, make your boss happy, and call it a day. At least at that point, reverse engineering the software will not be trivial.
Well, really you have two primary vectors that you have to guard against:
Disassembly
Debugging
My favourite method for preventing the first is in-memory decryption: take parts of your executable code and encrypt them, and have them self-decrypt in memory while your library is running. You can also checksum parts of the code and compare the checksum against what is loaded in RAM (have the encrypted portions check the decrypter and vice versa).
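The checksum part might look something like this bare sketch (the region length and the expected value are placeholders you would generate at build time):

    #include <stdint.h>
    #include <stddef.h>

    /* Simple rotate-xor over a region of code; anything stronger works too. */
    static uint32_t region_checksum(const uint8_t *p, size_t n)
    {
        uint32_t sum = 0;
        while (n--)
            sum = ((sum << 1) | (sum >> 31)) ^ *p++;
        return sum;
    }

    extern void sensitive_routine(void);   /* the code being protected   */
    #define SENSITIVE_LEN 256              /* illustrative region length */
    #define EXPECTED_SUM  0x5A17C0DEu      /* baked in at build time     */

    int code_is_intact(void)
    {
        return region_checksum((const uint8_t *)sensitive_routine,
                               SENSITIVE_LEN) == EXPECTED_SUM;
    }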
Another neat trick is to statically link the libraries you use into your executable so they cannot easily be swapped out in an attempt to see what your code is doing.
As for debugging, checking interrupt vectors helps. Another trick is to check the timing between various portions of code: for example, if more than a couple of milliseconds' worth of delay occurs in code that should execute significantly faster than that, then it can be assumed that the code is being debugged.
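The timing check can be as simple as this sketch (the threshold is arbitrary; a human single-stepping is orders of magnitude slower than real execution):

    #include <time.h>

    /* Returns nonzero if code that should take microseconds took
       suspiciously long, e.g. because someone is single-stepping it. */
    int probably_being_debugged(void)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* ... a few instructions that normally run in microseconds ... */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        return ms > 2.0;   /* "couple of milliseconds" threshold */
    }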

Is there a way to disable or disallow gdb or lldb access to my compiled library?

Basically, all I would like to do is make sure no one is able to step through sensitive code.
I read somewhere it was possible, only I can't find where I read it.
Thanks!
No. Fundamentally, anyone who can run the object code can inspect it to any degree.
If you don't want them to be able to run the object code, you have to run it on a machine of your choice, and only interact with the user over a network.
All techniques that claim to disable debuggers simply exploit bugs, which are usually fixed a few months later when the next version of the debugger is released; and even those are completely useless against debugging through a VM.
"... make sure no one is able to step through sensitive code."
The "no one" part is impossible: a sophisticated attacker will be able to do it no matter what you try.
There are many techniques that will stop a less sophisticated attacker; this book shows some of them.
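For example, the classic (and easily bypassed) Linux trick is a ptrace self-attach; a sketch:

    #include <stdlib.h>
    #include <sys/ptrace.h>

    /* A process can only have one tracer, so if PTRACE_TRACEME fails,
       something (gdb, strace, ...) is already attached. Trivially
       defeated by patching out this call. */
    void bail_if_traced(void)
    {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            exit(1);
    }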
Generally, these techniques are not worth your time - they make field support of your software hard, they don't stop a sophisticated attacker (and only one needs to succeed to render your efforts useless), and your software usually isn't that interesting to begin with.
If it is useful enough, people will buy it. If it is not, adding protections will make it even less useful.

Profiling network software / Profiling software with a lot of system call waiting

I'm working on a complex piece of network software and I have trouble determining how to improve the system's performance.
Specifically, one part of the software uses blocking synchronous calls. Since this part of the system also does heavy computations, it's nearly impossible to determine whether the slowness of this component is caused by the computations or by the waiting for other parts of the system.
Are there any light-weight profilers that can capture this information? I can't use a heavy-duty profiler like Valgrind, since that would completely skew the results (although Valgrind would otherwise be perfect, since it captures all the required information).
I tried using OProfile, but I just wasn't able to get any meaningful results out of it (perhaps if there were a concise tutorial somewhere...).
What you need is something that gives you stack samples, on wall-clock time (not just CPU time like gprof), and reports by line (not just by function) the percent of samples containing the line.
Zoom will do it, but I just do random-pausing. Here's why it works.
Here's a blow-by-blow example.
Here's another explanation.
Comment out your "heavy computations" and see if it's still slow. That will tell you if it's waiting on other systems over the network or the computations. The answer may not be either/or and may just be an accumulation of things.
You could also do some old fashioned printf debugging and print the time before and after executing the function to standard output or syslog. That is about as light-weight as profiling gets.
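A cheap way to separate "computing" from "waiting" without any profiler is to compare wall-clock time against CPU time around the suspect section; if the two differ a lot, the thread was blocked rather than computing. A sketch (time_section is a made-up helper):

    #include <stdio.h>
    #include <time.h>

    static double ms(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    void time_section(void (*section)(void))
    {
        struct timespec w0, w1, c0, c1;
        clock_gettime(CLOCK_MONOTONIC, &w0);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
        section();
        clock_gettime(CLOCK_MONOTONIC, &w1);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
        /* wall >> cpu means the section mostly waited on other systems */
        printf("wall %.1f ms, cpu %.1f ms\n", ms(w0, w1), ms(c0, c1));
    }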

What can you gain from looking at the binary as opposed to the source in C?

My friend said he thinks I may have made a mistake in my programme and wanted to see if I really did. He asked me to send him the binary as opposed to the source. As I am new to this, I am paranoid that he is doing something to it. What can you do with the binary that would mean you wouldn't want the source?
Thanks!
Black-box testing. Having the source may skew your view on how the program may be behaving.
Not much, at least not much by staring at it. But you can run it with a debugger attached, so you can set breakpoints, inspect memory areas, investigate crashes ...
However, the source code remains the primary tool for debugging. The binary by itself is a bit useless for serious debugging (not for testing; you can test software perfectly well without having access to its source).
I guess if he wants to recompile your code on his machine, he may want to be able to check that the binary he gets is the same as the one you get, to rule out compiler options or library differences.
Now when I debug, I frequently want to see the assembly - maybe this is what he meant?
He can run it, test it, and report any bugs he finds. Not much else, but that in itself may be useful; people are notoriously bad at testing their own code, because they tend to believe it is robust and don't want to break it. An independent tester will see breaking it as a challenge: their performance is based on the number of bugs they find, whereas yours is based on how difficult you make that for them.
Perhaps he wanted to debug it. Also, depending on how the compiler is invoked (for instance with -g for gcc), the binary might still contain source code information.

How do you profile your code?

I hope not everyone is using Rational Purify.
So what do you do when you want to measure:
time taken by a function
peak memory usage
code coverage
At the moment, we do it manually (using log statements with timestamps and another script to parse the log and output to Excel... phew).
What would you recommend? Pointing to tools or any techniques would be appreciated!
EDIT: Sorry, I didn't specify the environment first. It's plain C on a proprietary mobile platform.
I've done this a lot. If you have an IDE, or an ICE, there is a technique that takes some manual effort, but works without fail.
Warning: modern programmers hate this, and I'm going to get downvoted. They love their tools. But it really works, and you don't always have the nice tools.
I assume in your case the code is something like DSP or video that runs on a timer and has to be fast. Suppose what you run on each timer tick is subroutine A. Write some test code to run subroutine A in a simple loop, say 1000 times, or long enough to make you wait at least several seconds.
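The harness is nothing more than this (subroutine_A stands in for whatever your timer tick runs):

    extern void subroutine_A(void);   /* the routine under test */

    /* Loop long enough that random pauses are overwhelmingly likely
       to land inside the code you care about. */
    int main(void)
    {
        for (int i = 0; i < 1000; i++)
            subroutine_A();
        return 0;
    }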
While it's running, randomly halt it with a pause key and sample the call stack (not just the program counter) and record it. (That's the manual part.) Do this some number of times, like 10. Once is not enough.
Now look for commonalities between the stack samples. Look for any instruction or call instruction that appears on at least 2 samples. There will be many of these, but some of them will be in code that you could optimize.
Do so, and you will get a nice speedup, guaranteed. The 1000 iterations will take less time.
The reason you don't need a lot of samples is you're not looking for small things. Like if you see a particular call instruction on 5 out of 10 samples, it is responsible for roughly 50% of the total execution time. More samples would tell you more precisely what the percentage is, if you really want to know. If you're like me, all you want to know is where it is, so you can fix it, and move on to the next one.
Do this until you can't find anything more to optimize, and you will be at or near your top speed.
You probably want different tools for performance profiling and code coverage.
For profiling I prefer Shark on Mac OS X. It is free from Apple and very good. If your app is vanilla C, you should be able to use it, if you can get hold of a Mac.
For profiling on Windows you can use LTProf. Cheap, but not great:
http://successfulsoftware.net/2007/12/18/optimising-your-application/
(I think Microsoft are really shooting themselves in the foot by not providing a decent profiler with the cheaper versions of Visual Studio.)
For coverage I prefer Coverage Validator on Windows:
http://successfulsoftware.net/2008/03/10/coverage-validator/
It updates the coverage in real time.
For complex applications I am a great fan of Intel's VTune. It is a slightly different mindset to a traditional profiler that instruments the code. It works by sampling the processor to see where the instruction pointer is, 1,000 times a second. It has the huge advantage of not requiring any changes to your binaries, which as often as not would change the timing of what you are trying to measure.
Unfortunately it is no good for .NET or Java, since there isn't a way for VTune to map the instruction pointer to a symbol like there is with traditional code.
It also allows you to measure all sorts of other processor/hardware-centric metrics, like clocks per instruction, cache hits/misses, TLB hits/misses, etc., which let you identify why certain sections of code may be taking longer to run than you would expect just from inspecting the code.
If you're doing an 'on the metal' embedded 'C' system (I'm not quite sure what 'mobile' implied in your posting), then you usually have some kind of timer ISR, in which it's fairly easy to sample the code address at which the interrupt occurred (by digging back in the stack or looking at link registers or whatever). Then it's trivial to build a histogram of addresses at some combination of granularity/range-of-interest.
It's usually then not too hard to concoct some combination of code/script/Excel sheets which merges your histogram counts with addresses from your linker symbol/list file to give you profile information.
If you're very RAM-limited, it can be a bit of a pain to collect enough data for this to be both simple and useful, but you would need to tell us a bit more about your platform.
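In sketch form, the ISR side of the idea might look like this (get_interrupted_pc() is a stand-in for however your platform exposes the stacked return address or link register; the base address and bucket size are illustrative):

    #define HIST_BASE  0x08000000UL   /* start of code; illustrative */
    #define HIST_SHIFT 6              /* 64-byte buckets */
    #define HIST_SIZE  1024

    static unsigned short hist[HIST_SIZE];

    extern unsigned long get_interrupted_pc(void);   /* platform-specific */

    void timer_isr(void)
    {
        unsigned long bucket = (get_interrupted_pc() - HIST_BASE) >> HIST_SHIFT;
        if (bucket < HIST_SIZE)
            hist[bucket]++;   /* dump later and merge with the map file */
    }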
nProf - Free, does that for .NET.
Gets the job done, at least enough to see the 80/20 (the 20% of the code taking 80% of the time).
Windows (.NET and Native Exes): AQTime is a great tool for the money. Standalone or as a Visual Studio plugin.
Java: I'm a fan of JProfiler. Again, can run standalone or as an Eclipse (or various other IDEs) plugin.
I believe both have trial versions.
The Google Perftools are extremely useful in this regard.
I use DevPartner with MSVC 6 and XP.
How are any tools going to work if your platform is a proprietary OS? I think you're doing the best you can right now.
