Test cases for a C interpreter

I would like to test a little ANSI C interpreter.
My tool interprets my C program; it doesn't produce machine code, and I can't access the heap/stack after execution.
I was thinking of validating return values / outputs against GCC or something like this.
I was searching for something that fits my needs, but I hardly found anything free or open source.
Does anybody have an idea/suggestion how to test my interpreter?
Can anybody recommend something like a test suite; or a package of test cases?

I also wrote a C interpreter and mainly relied on the gcc.c-torture/execute test cases. This test suite consists of "code fragments which have historically broken easily". The advantage of these test cases is that they have to be executed to produce a result (and not, e.g., only compiled), and that result is then matched against an expected result. You do not need to compare it with the result of an executed program that was compiled by GCC. You basically just need to include the files in the directory and execute them, which contrasts with some other test suites where you have to parse expected results from configuration files or similar. You can download the tests from the GCC repository.
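For illustration, a minimal test in the spirit of the c-torture/execute suite might look like this (a sketch, not an actual file from the suite); each test calls abort() on a wrong result and exit(0) on success:
#include <stdlib.h>

int add(int a, int b)
{
    return a + b;
}

int main(void)
{
    if (add(2, 3) != 5)
        abort();      /* wrong result: interpreter (or compiler) is broken */
    exit(0);          /* success */
}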
One disadvantage for you might be that the test cases are often extensive and can also include GNU extensions, such as the asm statement for executing assembler code or attributes for controlling how memory content is aligned. They also include some test cases in the old K&R function-style notation, as in the following example (not taken from the suite):
int func(a)
int a;
{
    return a + 1;
}
However, I would still recommend looking through the test cases and seeing what you can execute, as they also test many "normal" corner cases of ANSI C.

Related

How do I test C functions with internal ifdefs for functional equivalency?

I have a library of C functions that I optimized internally using SIMD intrinsics. These functions all look something like this:
void add_array(...) {
#if defined(USE_SIMD)
    // SIMD code here ...
#else
    // Scalar code here ...
#endif
}
Each function is contained in an individual file, so an add.c file here, for example.
Now I would like to ensure that both variants of that function are functionally equivalent. I found that simply generating random (but valid) input for both variants and comparing the results suffices for my application; I think this is called monkey testing. The scalar code (or rather its output values) acts as a golden reference.
(Because of the SIMD intrinsics, formal verification is not an option.)
However, I have not found a scalable and sustainable way to run these tests from a C testing framework. So far, I have manually copied the vector code into an extra function add_array_vector() and then run both one after another from the same main-function test harness, comparing the "golden reference output" from the add_array() function with the values from the add_array_vector() variant. But this approach does not scale, since I have more than 100 of these functions that all use the #if/#else approach internally. Since I have to run all the code in a simulator (or on a bare-metal embedded device), I also can't interact with a file system; I need a single test binary that contains all tests and test data, and it has to report its results via a printf (UART) call.
What I see as my only option is to compile the functions twice: once without USE_SIMD defined and once with it. Then I would need to link these two function variants into the same main (my test harness). However, how do I ensure that both variants have different function names? Is there a way I can "name mangle" the USE_SIMD define into the function name? And how would I link them?
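For example, what I have in mind is pasting a suffix onto the function name with the preprocessor, roughly like this (just a sketch; the macro and parameter names are made up):
/* add.c - compiled twice, once with -DUSE_SIMD and once without */
#if defined(USE_SIMD)
#define VARIANT(name) name##_simd
#else
#define VARIANT(name) name##_scalar
#endif

void VARIANT(add_array)(const float *a, const float *b, float *out, int n)
{
#if defined(USE_SIMD)
    /* SIMD code here ... */
#else
    /* Scalar code here ... */
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
#endif
}
The test harness would then link both objects and call add_array_scalar() and add_array_simd() on the same random input. But I do not know whether this is a sensible way to go about it.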
Maybe I am completely on the wrong track here and there is a far simpler way to solve this. I surely can't be the first person who came across this core issue: ensuring that two variants of the same C (or C++) function are functionally equivalent.
Any help is greatly appreciated. Thanks
EDIT: I can't afford to print the numeric results via printf (or UART), as that is a serious bottleneck in this randomized brute-force approach: it reduces the number of iterations/tests I can run per second by multiple orders of magnitude. Printing the final outcome, or an error if one occurs, is fine; printing every numerical test result value for "external validation" is not sustainable.

Can I edit lines of code using gdb, and is it also possible to save to the actual source file and header file while in the same debug session? (Linux)

I have a program called parser that I compiled with the -g flag. This is my makefile:
parser: header.h parser.c
	gcc -g header.h parser.c -o parser

clean:
	rm -f parser a.out
The code for one function in parser.c is:
int _find(char *html, struct html_tag **obj)
{
    char temp[strlen("<end") + 1];
    memcpy(temp, "<end", strlen("<end") + 1);
    ...
    ...
    .
    return 0;
}
What I would like is the ability, when debugging parser, to change lines of code after hitting a breakpoint while stepping (n) through the above function. If that is not the job of gdb, is there any open-source solution for actually changing the code, and possibly saving it, so that when I step to the next statement the changed statement (for example, a different index of the array) is what executes? Is there an open-source tool for this, or can it be done in gdb, and do I need any special compile options?
I know I can assign values to variables at runtime in gdb, but is that it? Is there anything that is actually capable of changing the source as well?
Most C implementations are compiled. The source code is analyzed and translated to processor instructions. This translation would be difficult to do on a piecewise basis. That is, given some small change in the source code, it would be practically impossible to update the executable file to represent those changes. As part of the translation, the compiler transforms and intertwines statements, assigns processor registers to be used for computing parts of expressions, designates places in memory to hold data, and more.
When source code is changed slightly, this may result in a new compilation happening to use a different register in one place or needing more or less memory in a particular function, which results in data moving back or forth. Merging these changes into the running program would require figuring out all the differences, moving things in memory, rearranging what is in what processor register, and so on. For practical purposes, these changes are impossible.
GDB does not support this.
(Apple's developer tools may have some feature like this. I saw it demonstrated for the Swift programming language but have not used it.)
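What gdb does let you do at a breakpoint is change the program's state and control flow, which is sometimes enough for quick experiments. A sketch of a session (the variable names are taken from the question, the line number is made up):
(gdb) break _find
(gdb) run
(gdb) next
(gdb) print html
(gdb) set var html = html + 1
(gdb) call strlen(html)
(gdb) jump 60
(gdb) continue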

GCOV static library coverage for C source code

I want to perform code coverage on a static library. For this I wrote test cases using Boost. In my library, many functions are defined in header files.
For example, in a header file accuracy.h I have the following functions:
static float absf( float x )
{
    return (x >= 0.0f) ? x : -x;
}

static boolean almost_zero( float n, float tol )
{
    return (boolean)(absf( n ) <= tol);
}
I have written test cases for these functions. But the problem is that GCOV shows these functions as not covered. If I move the function definitions to a C file, then I get proper coverage results.
I have used -fprofile-arcs -ftest-coverage for performing coverage. Does anyone have any idea about this issue?
Note:
Test cases are executed properly. I have confirmed it by debugging.
I am using MinGW gcc version 4.8.1 (GCC).
Functions in header files are difficult for coverage. It's not just a technical difficulty - it's also a presentation difficulty. These functions are copied every time the header is #included. Does full coverage require that all copies are covered? Or that one instance is covered?
From the user's perspective, both answers may be wrong.
Also, there are likely to be functions lurking in header files that the user does not care about. For instance, ctype.h has a few of these.
That's probably why coverage tools tend to ignore them entirely.
I work on a coverage tool, RapiCover, and our approach is to ignore them by default but provide an option to turn on coverage for headers. The option can be used on a file-by-file basis, and you can also specifically name the functions that you want coverage for. We found that this was the best way to support typical customer requirements.
I suggest you try forcing gcov to believe that the functions are defined in C source code rather than in the header. To do this, preprocess your source file (e.g. with GCC's -E option) and then filter out the # markers that indicate files and line numbers. Then run gcov on this preprocessed, filtered file. It should see all the functions as part of the source code. This trick would also work with RapiCover, though it would not be necessary there.
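A rough sketch of that workflow (file names are just examples; with GCC, -E -P drops the # line markers directly, so there is nothing left to filter):
gcc -E -P accuracy_test.c > accuracy_test_pp.c
gcc -fprofile-arcs -ftest-coverage accuracy_test_pp.c -o accuracy_test
./accuracy_test
gcov accuracy_test_pp.c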

Combining source code into a single file for optimization

I was aiming to reduce the size of the executable for my C project and have tried all the compiler/linker options, which have helped to some extent. My code consists of a lot of separate files. My question is whether combining all the source code into a single file would help with the optimization I desire. I read somewhere that a compiler will optimize better if it finds all the code in a single file rather than in separate files. Is that true?
A compiler can indeed optimize better when it finds the code it needs in the same compilable (*.c) file. If your program is longer than 1000 lines or so, you'll probably regret putting all the code in one file, because doing so will make the program hard to maintain; but if it is shorter than 500 lines, you might try the one-file approach and see whether it helps.
The crucial consideration is how often code in one compilable file calls or otherwise uses objects (including functions) defined in another. If there are few transfers of control across this boundary, then erasing the boundary will not help performance appreciably. Therefore, when coding for performance, the key is to put tightly related code in the same file.
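A toy illustration of that boundary (file and function names are made up): compiled as separate translation units and without link-time optimization, the call below has to stay an out-of-line call; merged into one file, the compiler can see the body of scale() and inline it into the loop.
/* util.c */
int scale(int x)
{
    return 2 * x + 1;
}

/* main.c - as a separate translation unit the compiler cannot see the body
   of scale() here; in a single combined file it could inline it */
int scale(int x);

int sum(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += scale(a[i]);
    return s;
}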
I like your question a great deal. It is the right kind of question to ask, in my view; and, though the complete answer is not simple enough to treat fully in a Stack Exchange answer, your pursuit of it will teach you much. Though you may not yet realize it, your question really concerns linking, a subject every advancing programmer eventually has to learn. It touches on symbol tables, inlining, the in-place construction of return values and several other subtle factors.
At any rate, if your program is shorter than 500 lines or so, then you have little to lose by trying the single-file approach. If longer than 1000 lines, then a single file is not recommended.
It depends on the compiler. The Intel C++ Composer XE for example can automatically optimize over multiple files (when building using icc -fast *.c *.cpp or icl /fast *.c *.cpp, for linux/windows respectively).
When you use Microsoft Visual Studio, or a derived product (like Atmel Studio for microcontrollers), every single source file is compiled on its own (i.e. one cl, icl, or gcc command is issued for every .c and .cpp file in the project). This means no cross-file optimization.
For microcontroller projects I sometimes have to put everything into a single file just to make it fit in the controller's limited flash memory. If your compiler/IDE behaves like Visual Studio, you can use a trick: select all the source files and exclude them from the build process (but leave them in the project), then create a file (I always use whole_program.c) and #include every single source (i.e. non-header) file in it. (Note that including .c files is frowned upon by many high-level programmers, but sometimes you have to do it the dirty way, and with microcontrollers that's actually more often than not.)
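For example, whole_program.c might contain nothing but includes of the real source files (the file names here are made up):
/* whole_program.c - the only file that participates in the build */
#include "uart.c"
#include "timer.c"
#include "adc.c"
#include "main_logic.c"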
My experience has been that with gnu/gcc the optimization happens within a single file plus its includes, producing a single object. With clang/llvm it is quite easy, and I recommend the following: DO NOT optimize the clang step; use clang to get from C to bitcode, then use llvm-link to link all of your bitcode modules into one bitcode module, and then you can optimize the whole project, all source files optimized together; llc adds more optimization as it heads for the target. Your best results come from telling clang, via the target-triple command-line option, what your ultimate target is. For the gnu path to do the same thing, either use includes to make one big file compiled to one object, or, if there is a machine-code-level optimizer beyond the few things the linker does, that is where it would have to happen; maybe gnu has an exposed IR file format, optimizer, and IR-to-target tool, but I think I would have seen that by now.
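A sketch of that flow (file names are examples; adjust the target triple for your platform):
# compile each C file to unoptimized LLVM bitcode
clang -O0 -emit-llvm -c a.c -o a.bc
clang -O0 -emit-llvm -c b.c -o b.bc
# link all bitcode modules into one and optimize the whole program at once
llvm-link a.bc b.bc -o whole.bc
opt -O2 whole.bc -o whole.opt.bc
# llc generates target code and adds more target-specific optimization
llc -O2 whole.opt.bc -o whole.s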
At http://github.com/dwelch67 a number of my projects, although very simple programs, have llvm and gnu builds for the same source files; you can see that for the llvm builds I make one binary from unoptimized bitcode and another from optimized bitcode. (llvm's optimizer has problems with small while loops and sometimes generates non-working code; a very quick check to see whether it is you or them is to try the non-optimized llvm binary and the gnu binary: if they all behave the same, it's you; if only the optimized llvm binary doesn't work, it's them.)

Removing unneeded code from gcc and mingw

I noticed that MinGW adds a lot of code before calling main(). I assume it's for parsing command-line parameters, since one of those functions is called __getmainargs(). Lots of strings are also added to the final executable, such as mingwm.dll and some error strings (in case the app crashes) saying "mingw runtime error" or something like that.
My question is: is there a way to remove all this stuff? I don't need all these things. I tried tcc (the Tiny C Compiler) and it did the job, but it is not cross-platform like gcc (solaris/mac).
Any ideas?
Thanks.
Yes, you really do need all those things. They're the startup and teardown code for the C environment that your code runs in.
Other than in non-hosted (freestanding) environments such as low-level embedded systems, you'll find pretty much all C environments have something like that: things like /lib/crt0.o under some UNIX-like operating systems, or crt0.obj under Windows.
They are vital to successful running of your code. You can freely omit library functions that you don't use (printf, abs and so on) but the startup code is needed.
Some of the things that it may perform are initialisation of atexit structures, argument parsing, initialisation of structures for the C runtime library, initialisation of C/C++ pre-main values and so forth.
It's highly OS-specific and, if there are things you don't want to do, you'll probably have to get the source code for it and take them out, in essence providing your own cut-down replacement for the object file.
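As a toy illustration of such a cut-down replacement on GNU/Linux with gcc (this is not a MinGW/Windows recipe, and since the C runtime is never initialised, most of the standard library is off-limits):
/* tiny.c - build with: gcc -nostartfiles -o tiny tiny.c */
#include <unistd.h>

void _start(void)
{
    static const char msg[] = "hello\n";
    write(1, msg, sizeof msg - 1);   /* thin system-call wrapper, no stdio */
    _exit(0);                        /* must not return from _start */
}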
You can safely assume that your toolchain does not include code that is not needed and could safely be left out.
Make sure you compiled without debug information, and run strip on the resulting executable. Anything more intrusive than that requires intimate knowledge of your toolchain, and can result in rather strange behaviour that will be hard to debug - i.e., if you have to ask how it could be done, you shouldn't try to do it.
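Concretely, that amounts to something like this (assuming gcc on a Unix-like system):
gcc -O2 -o app main.c          # no -g, so no debug information is emitted
strip --strip-unneeded app     # discard symbols the binary does not need at runtime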

Resources