Learning C coming from managed OO languages - c

I am fairly comfortable coding in languages like Java and C#, but I need to use C for a project (because of low-level OS API calls) and I am having some difficulty dealing with pointers and memory management (as seen here).
Right now I am basically typing up code and feeding it to the compiler to see if it works. That just doesn't feel right to me. Can anyone point me to good resources for understanding pointers and memory management, coming from managed languages?

K&R - http://en.wikipedia.org/wiki/The_C_Programming_Language_(book)
'nuff said

One of the good resources you have found already: SO.
Of course you are compiling with all warnings on, aren't you?
Learning by doing largely depends on the quality of your compiler and the warnings/errors it feeds you. The best in that respect that I have found in the Linux/POSIX world is clang. It traces the origin of errors nicely and is quite good at telling you about missing header files.

Some tips:
By default, local variables are stored on the stack.
Variables are passed to functions by value.
Stick to the same discipline for allocating and freeing memory, e.g. allocate and free in the same function.
C's equivalent of
Integer i = new Integer(5);
is
int *p;
p=malloc(sizeof(int));
*p=5;
Memory allocation (malloc) can fail, so check the returned pointer for NULL before you use it.
OS functions can fail, and this can be detected from their return values (the sketch below pulls these tips together).
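A minimal sketch combining these tips (standard library only):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);  /* heap allocation, C's analogue of `new` */
    if (p == NULL) {             /* malloc can fail: check before use */
        fprintf(stderr, "out of memory\n");
        return EXIT_FAILURE;
    }
    *p = 5;
    printf("%d\n", *p);
    free(p);                     /* allocated and freed in the same function */
    return 0;
}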

Learn to use gdb to step through your code and print variable values (compile with -g to enable debugging symbols).
Use valgrind to check for memory leaks and other related problems (like heap corruption).
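For example (prog.c stands in for your own source file):

$ gcc -g -Wall -o prog prog.c
$ gdb ./prog                          # then: break main, run, next, print some_var
$ valgrind --leak-check=full ./prog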

The C language doesn't do anything you don't explicitly tell it to do.
There are no destructors automatically called for you, which is both good and bad (since bugs in destructors can be a pain).
A simple way to get somewhat automatic destructor behavior is to use scoping to construct and destruct things. This can get ugly since nested scopes move things further and further to the right.
if ((var = malloc(SIZE)) != NULL) { // try to keep this line
    use_var(var);
    free(var);                      // and this line close, with easy-to-comprehend code between them
} else {
    error_action();
}
return; // try to limit the number of return statements so that you can ensure
        // resources are freed for all code paths
Trying to make your code look like this as much as possible will help, though it's not always possible.
Making a set of macros or inline functions that initialize your objects is a good idea. Also make another set of functions that allocate your objects' memory and pass it to your initializer functions; this allows both local and dynamically allocated objects to be initialized easily. A similar pair of destructor-like functions is also a good idea (see the sketch below).
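A minimal sketch of that pattern (the Point type and the function names are purely illustrative):

#include <stdlib.h>

typedef struct {
    double x, y;
} Point;

void point_init(Point *p, double x, double y) {  /* works for locals too */
    p->x = x;
    p->y = y;
}

Point *point_new(double x, double y) {           /* allocates, then delegates to the initializer */
    Point *p = malloc(sizeof *p);
    if (p != NULL)
        point_init(p, x, y);
    return p;
}

void point_delete(Point *p) {                    /* destructor-like counterpart to point_new */
    /* release any members owned by *p here first */
    free(p);
}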
Using OO techniques is good practice in many instances, and doing so in C just requires a little more typing (but allows for more control). Setters, getters, and other helper functions can help keep objects in consistent states and reduce the number of changes you have to make when you find an error, provided you can keep the interface the same.
You should also look into the perror function and the errno "variable".
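A small sketch of both in action:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f = fopen("no-such-file.txt", "r");
    if (f == NULL) {
        perror("fopen");   /* prints e.g. "fopen: No such file or directory" */
        fprintf(stderr, "errno = %d (%s)\n", errno, strerror(errno));
        return 1;
    }
    fclose(f);
    return 0;
}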
Usually you will want to avoid using anything like exceptions in C. I generally try to avoid them in C++ as well, and only use them for really bad errors -- ones that aren't supposed to happen. One of the main reasons for avoiding them is that there are no destructor calls magically made in C, so non-local GOTOs will often leak (or otherwise screw up) some type of resource. That being said, there are things in C which provide a similar functionality.
The main exception-like mechanism in C is the pair of functions setjmp and longjmp. setjmp is called at one location in the code and is passed an (opaque) variable (a jmp_buf) which can later be passed to longjmp. When longjmp is called it does not return to its caller; instead, execution resumes as if the earlier call to setjmp had just returned, with setjmp returning the value specified in the call to longjmp. A direct call to setjmp returns 0.
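A minimal sketch:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void fail(void) {
    longjmp(env, 1);              /* "throws": control resumes at the setjmp below */
}

int main(void) {
    if (setjmp(env) == 0) {       /* direct call: returns 0 */
        fail();
        puts("never reached");
    } else {                      /* longjmp lands here; setjmp "returned" 1 */
        puts("recovered from error");
    }
    return 0;
}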
Other exception like functionality is more platform specific, but includes signals (which have their own gotchas).
Other things to look into are:
The assert macro, which can be used to force program exit when its parameter (a logical test of some sort) fails. Calls to assert compile away when you #define NDEBUG before you #include <assert.h>, so after testing you can easily remove the assertions. This is really good for testing for NULL pointers before dereferencing them, as well as several other conditions. If a condition fails, assert attempts to print the source file name and line number of the failed test (see the sketch after this list).
The abort function, which causes the program to exit with failure without doing all of the cleanup that calling exit does. On some platforms this may be done with a signal. assert calls abort.
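The sketch promised above:

#include <assert.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);
    assert(p != NULL);   /* on failure, prints file/line and calls abort(); compiled away under NDEBUG */
    *p = 42;
    free(p);
    return 0;
}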

Related

is it possible to use c++ library with postgreSQL?

Currently I am working with the PostgreSQL code. I want to add a feature to Postgres as part of my project, but I want to know: is it possible to use a C++ library with Postgres, so that I am able to call some functions from that library? Thanks.
The documentation says:
Although the PostgreSQL backend is written in C, it is possible to write extensions in C++ if these guidelines are followed:
All functions accessed by the backend must present a C interface to the backend; these C functions can then call C++ functions. For example, extern "C" linkage is required for backend-accessed functions. This is also necessary for any functions that are passed as pointers between the backend and C++ code.
Free memory using the appropriate deallocation method. For example, most backend memory is allocated using palloc(), so use pfree() to free it. Using C++ delete in such cases will fail.
Prevent exceptions from propagating into the C code (use a catch-all block at the top level of all extern C functions). This is necessary even if the C++ code does not explicitly throw any exceptions, because events like out-of-memory can still throw exceptions. Any exceptions must be caught and appropriate errors passed back to the C interface. If possible, compile C++ with -fno-exceptions to eliminate exceptions entirely; in such cases, you must check for failures in your C++ code, e.g. check for NULL returned by new().
If calling backend functions from C++ code, be sure that the C++ call stack contains only plain old data structures (POD). This is necessary because backend errors generate a distant longjmp() that does not properly unroll a C++ call stack with non-POD objects.
In summary, it is best to place C++ code behind a wall of extern C functions that interface to the backend, and avoid exception, memory, and call stack leakage.

Microsoft C Run Time function implementation - now and before

I am trying to write some portable code, and I have been wondering how Microsoft implemented old C runtime routines like gmtime or fopen, which return a pointer, as opposed to today's gmtime_s or fopen_s, which require an object to be passed in and return some errno status code (I guess).
One way would be to create a static (better than global) object inside such a routine and return a pointer to it, but if one caller is using this static buffer and another caller invokes the routine, the first caller's buffer gets overwritten, which is not good.
Furthermore, I doubt that such routines use dynamic memory, because that would lead to memory leaks.
As with other Microsoft stuff, the implementation is not open, so I can't take a peek. Any suggestions?
Well, first, such globals and statics cannot be used anyway because of thread-safety.
The use of dynamic memory, or arrays, or arrays of handles, or other such combos DOES leak resources if the programmer misuses them. On a non-trivial OS such resources are tied to the process and are released upon process termination, so a leak is a serious problem for the app, but not for the OS.
Regarding gmtime, you are correct; it could have operated on a variable with static storage duration (which is the same storage duration as variables declared "globally", by the way... there is no "global" in C). Historically speaking, you should probably assume this is the case, because C doesn't require any support for multithreading. If you're referring to an era with decent support for multithreading, it's probable that gmtime returns something with thread-specific storage duration instead, as the MSDN documentation for gmtime says that gmtime and other similar functions "... all use one common tm structure per thread for the conversion."
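To see why a shared buffer bites, here is a small sketch (assuming an implementation where gmtime returns a pointer to one such buffer, whether per process or per thread):

#include <stdio.h>
#include <time.h>

int main(void) {
    time_t t1 = 0, t2 = 86400;                  /* one day apart */
    struct tm *a = gmtime(&t1);
    struct tm *b = gmtime(&t2);                 /* overwrites the shared buffer */
    /* a and b now alias the same struct tm, so the first result is gone */
    printf("%d %d\n", a->tm_mday, b->tm_mday); /* prints "2 2", not "1 2" */
    return 0;
}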
However, fopen is a function that creates resources, and as a result it's reasonable to expect that every return value will be unique (unless it's an erroneous return value).
Indeed, fopen does constitute dynamic resource management; you are expected to call fclose to close the FILE once you're done with it... If you forget to close a file every now and then, there is no need to panic: the C standard requires that all streams still open at normal program termination be closed, which implies that the runtime keeps track of all of your FILEs behind the scenes.
However, it would obviously be a bad practice to repeatedly leak file descriptors, over and over again, constantly, for a long period of time.
I'm not sure about the Visual Studio specifics, but such library functions are typically implemented around an opaque type, which is why they return a pointer and why you can't know the contents of the FILE struct.
That means there will be either a static memory pool or a call to malloc inside the function. There is no guarantee that the C library functions are re-entrant.
Calling fopen without having a corresponding fclose might indeed create a memory leak: at any rate you have a "resource leak". Therefore, make sure that you always call fclose.
As for the implementation details: you can't have the Visual Studio source code, but you could download

Harmful C Source File Check?

Is there a way to programmatically check if a single C source file is potentially harmful?
I know that no check will yield 100% accuracy, but I am interested in at least doing some basic checks that will raise a red flag if certain expressions / keywords are found. Any ideas of what to look for?
Note: the files I will be inspecting are relatively small (a few hundred lines at most), implementing numerical analysis functions that all operate in memory. No external libraries (except math.h) will be used in the code. Also, no I/O should be used (the functions will be run on in-memory arrays).
Given the above, are there some programmatic checks I could do to at least try to detect harmful code?
Note: since I don't expect any I/O, if the code does I/O -- it is considered harmful.
Yes, there are programmatic ways to detect the conditions that concern you.
It seems to me you ideally want a static analysis tool to verify that the preprocessed version of the code:
Doesn't call any functions except those it defines and non I/O functions in the standard library,
Doesn't do any bad stuff with pointers.
By preprocessing, you get rid of the problem of detecting macros, possibly-bad-macro content, and actual use of macros. Besides, you don't want to wade through all the macro definitions in standard C headers; they'll hurt your soul because of all the historical cruft they contain.
If the code only calls its own functions and trusted functions in the standard library, it isn't calling anything nasty. (Note: It might be calling some function through a pointer, so this check either requires a function-points-to analysis or the agreement that indirect function calls are verboten, which is actually probably reasonable for code doing numerical analysis).
The purpose of checking for bad stuff with pointers is so that it doesn't abuse pointers to manufacture nasty code and pass control to it. This first means, "no casts to pointers from ints" because you don't know where the int has been :-}
For the who-does-it-call check, you need to parse the code and name/type resolve every symbol, and then check call sites to see where they go. If you allow pointers/function pointers, you'll need a full points-to analysis.
One of the standard static-analyzer tool companies (Coverity, Klocwork) likely provides some kind of method for restricting what functions a block of code may call. If that doesn't work, you'll have to fall back on more general analysis machinery like our DMS Software Reengineering Toolkit with its C Front End. DMS provides customizable machinery for building arbitrary static analyzers for a language description provided to it as a front end. DMS can be configured to do exactly test 1), including the preprocessing step; it also has full points-to and function-points-to analyzers that could be used to do the points-to checking.
For 2), "doesn't use pointers maliciously", the standard static-analysis tool companies again provide some pointer checking. However, here they have a much harder problem, because they are statically trying to reason about a Turing machine; their solutions either miss cases or report false positives. Our CheckPointer tool is a dynamic analysis, that is, it watches the code as it runs, and if there is any attempt to misuse a pointer CheckPointer will report the offending location immediately. Oh, yes, CheckPointer outlaws casts from ints to pointers :-} So CheckPointer won't provide a static diagnostic "this code can cheat", but you will get a diagnostic if the code actually attempts to cheat. CheckPointer has rather high overhead (all that checking costs something), so you probably want to run your code with it for a while to gain some faith that nothing bad is going to happen, and then stop using it.
EDIT: Another poster says "There's not a lot you can do about buffer overwrites for statically defined buffers." CheckPointer will do those tests and more.
If you want to make sure it's not calling anything not allowed, then compile the piece of code and examine what it's linking to (say via nm). Since you're hung up on doing this by a "programmatic" method, just use python/perl/bash to compile then scan the name list of the object file.
There's not a lot you can do about buffer overwrites for statically defined buffers, but you could link against an electric-fence type memory allocator to prevent dynamically allocated buffer overruns.
You could also compile and link the C-file in question against a driver which would feed it typical data while running under valgrind which could help detect poorly or maliciously written code.
In the end, however, you're always going to run up against the "does this routine terminate" question, which is famous for being undecidable. A practical way around this would be to compile your program and run it from a driver which would alarm-out after a set period of reasonable time.
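A sketch of such a driver using the POSIX alarm function (routine_under_test is a hypothetical stand-in for the code being vetted):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

extern void routine_under_test(void);   /* hypothetical: the function being checked */

static void on_timeout(int sig) {
    (void)sig;
    _exit(2);                            /* async-signal-safe bail-out */
}

int main(void) {
    signal(SIGALRM, on_timeout);
    alarm(5);                            /* allow 5 seconds of wall-clock time */
    routine_under_test();
    alarm(0);                            /* it finished in time: cancel the alarm */
    puts("terminated within the limit");
    return 0;
}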
EDIT: Example showing use of nm:
Create a C snippet defining function foo which calls fopen:
#include <stdio.h>
void foo(void) {
FILE *fp = fopen("/etc/passwd", "r");
}
Compile with -c, and then look at the resulting object file:
$ gcc -c foo.c
$ nm foo.o
0000000000000000 T foo
U fopen
Here you'll see that there are two symbols in the foo.o object file. One is defined: foo, the name of the subroutine we wrote. The other is undefined: fopen, which will be resolved when the object file is linked together with the other C files and the necessary libraries. Using this method, you can see immediately whether the compiled object references anything outside of its own definitions, which by your rules can be considered "bad".
You could do some obvious checks for "bad" function calls like network IO or assembly blocks. Beyond that, I can't think of anything you can do with just a C file.
Given the nature of C you're just about going to have to compile to even get started. Macros and such make static analysis of C code pretty difficult.

Is ARPACK thread-safe?

Is it safe to use the ARPACK eigensolver from different threads at the same time from a program written in C? Or, if ARPACK itself is not thread-safe, is there an API-compatible thread-safe implementation out there? A quick Google search didn't turn up anything useful, but given the fact that ARPACK is used heavily in large scientific calculations, I'd find it highly surprising to be the first one who needs a thread-safe sparse eigensolver.
I'm not too familiar with Fortran, so I translated the ARPACK source code to C using f2c, and it seems that there are quite a few static variables. Basically, all the local variables in the translated routines seem to be static, implying that the library itself is not thread-safe.
Fortran 77 does not support recursion, and hence a standard conforming compiler can allocate all variables in the data section of the program; in principle, neither a stack nor a heap is needed [1].
It might be that this is what f2c is doing, and if so, it might be the f2c step that makes the program non-thread-safe rather than the program itself. Of course, as others have mentioned, check for COMMON blocks as well. EDIT: Also check for explicit SAVE directives. SAVE means that the value of the variable is retained between subsequent invocations of the procedure, similar to static in C. Now, allocating all procedure-local data in the data section makes all variables implicitly SAVE, and unfortunately there is a lot of old code that assumes this even though it is not guaranteed by the Fortran standard. Such code, obviously, is not thread-safe. As for ARPACK specifically, I can't promise anything, but ARPACK is generally well regarded and widely used, so I'd be surprised if it suffered from these kinds of dusty-deck problems.
Most modern Fortran compilers do use stack allocation. You might have better luck compiling ARPACK with, say, gfortran and the -frecursive option.
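For example, something like (assuming the ARPACK .f sources are in the current directory):

$ gfortran -frecursive -O2 -c *.f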
EDIT:
[1] Not because it's more efficient, but because Fortran was originally designed before stacks and heaps were invented, and for some reason the standards committee wanted to retain the option of implementing Fortran on hardware with neither stack nor heap support all the way up to Fortran 90. Actually, I'd guess that stacks are more efficient on today's heavily cache-dependent hardware than accessing procedure-local data spread all over the data section.
I have converted ARPACK to C using f2c. Whenever you use f2c and you care about thread-safety, you must use the -a switch. It gives local variables automatic storage, i.e. makes them stack-based locals rather than the statics you get by default.
Even so, ARPACK itself is decidedly not threadsafe. It uses a lot of common blocks (i.e. global variables) to preserve state between different calls to its functions. If memory serves, it uses a reverse communication interface which tends to lead developers to using global variables. And of course ARPACK probably was written long before multi-threading was common.
I ended up re-working the converted C code to systematically remove all the global variables. I created a handful of C structs and gradually moved the global variables into these structs. Finally I passed pointers to these structs to each function that needed access to those variables. Although I could just have converted each global into a parameter wherever it was needed it was much cleaner to keep them all together, contained in structs.
Essentially the idea is to convert global variables into local variables.
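A tiny sketch of that refactoring (the names are illustrative):

/* Before: hidden global state, not thread-safe */
static int iter_count;
void step(void) { iter_count++; }

/* After: the state lives in a caller-owned struct */
typedef struct {
    int iter_count;
} SolverState;

void step_r(SolverState *s) { s->iter_count++; }  /* each thread passes its own state */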
ARPACK uses BLAS, right? Then those libraries need to be thread-safe too.
I believe your idea of checking with f2c might not be a bulletproof way of telling whether the Fortran code is thread-safe; I would guess it also depends on the Fortran compiler and libraries.
I don't know what strategy f2c uses in translating Fortran. Since ARPACK is written in FORTRAN 77, the first thing to do is check for the presence of COMMON blocks. These are global variables, and if used, the code is most likely not thread safe. The ARPACK webpage, http://www.caam.rice.edu/software/ARPACK/, says that there is a parallel version -- it seems likely that that version is threadsafe.

Should I use secure versions of POSIX functions on MSVC - C

I am writing some C code which is expected to compile with multiple compilers (at least MSVC and GCC). Since I am a beginner in C, I have all warnings turned on and warnings treated as errors (-Werror in GCC & /WX in MSVC) to prevent me from making silly mistakes.
When I compile some code that uses strcpy on MSVC, I get a warning like:
warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
I am a bit confused. A lot of common functions are deprecated on MSVC. Should I use the secure versions when on Windows? If yes, should I wrap strcpy in something like,
my_strcpy()
{
#ifdef WIN32
    // use strcpy_s
#else
    // use strcpy
#endif
}
Any thoughts?
Whenever you move data between non-constant-size buffers, you have to (gasp! omg!) actually think about whether it fits. Using functions (like the MS-specific strcpy_s or the BSD strlcpy) that purport to be "safe" will protect you from some obvious buffer overflow conditions, but won't protect you from the bugs that result from string truncation. It also won't protect you from integer overflows in computing the necessary sizes of buffers.
Unless you're an expert dealing with C strings, I would recommend forgetting about special functions and commenting every line of your code that will perform variable-length/position writes with a justification for how you know, at this point in the program, that the length/offset you're about to use is within the bounds of the size of the buffer. Do this for lines where you perform arithmetic on sizes/offsets too - document how you know that the arithmetic will not overflow, and add tests for overflow if you find you don't know.
Another approach is to completely wrap all your string handling in a string object that stores the length of the buffer along with the string and automatically reallocates when a string needs to be enlarged, and then only use const char * for read-only access to strings when you need to pass them to system functions or other libraries. This will sacrifice a good bit of the performance you'd expect from C, but it will help you ensure that you don't make mistakes. Just don't take it to the extreme. There's no need to duplicate stuff like strchr, strstr, etc. in your string wrapper. Just provide methods to duplicate string objects, concatenate them, and truncate them, and then with the existing library functions that operate on const char * you can do just about anything you'd want to.
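A minimal sketch of such a wrapper (the names and the growth policy are just illustrative):

#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *buf;   /* always nul-terminated */
    size_t len;   /* strlen(buf) */
    size_t cap;   /* allocated size of buf */
} Str;

int str_init(Str *s) {
    s->cap = 16;
    s->len = 0;
    s->buf = malloc(s->cap);
    if (s->buf == NULL) return -1;
    s->buf[0] = '\0';
    return 0;
}

int str_append(Str *s, const char *src) {
    size_t n = strlen(src);
    if (s->len + n + 1 > s->cap) {        /* grow geometrically when needed */
        size_t newcap = (s->len + n + 1) * 2;
        char *p = realloc(s->buf, newcap);
        if (p == NULL) return -1;         /* caller decides how to fail */
        s->buf = p;
        s->cap = newcap;
    }
    memcpy(s->buf + s->len, src, n + 1);  /* copies the terminating nul too */
    s->len += n;
    return 0;
}

void str_free(Str *s) {
    free(s->buf);
    s->buf = NULL;
    s->len = s->cap = 0;
}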
There are lots and lots of discussions about this topic here on SO. The usual suspects like strncpy, strlcpy and whatever will pop up here again, I'm sure. Just type "strcpy" in the search box and read some of the longer threads to get an overview.
My advice is: Whatever your final choice will be, it is a good idea to follow the DRY principle and continue to do it as in your example of my_strcpy(). Don't throw the raw calls all over your code, use wrappers and centralize them in your own string handling library. This will reduce overall code (boilerplate), and you have one central location to make modifications, if you change your mind later.
Of course this opens up some other cans of worms, especially for a beginner: memory-handling responsibility and interface design. Each is a topic of its own, and 5 people will give you 10 suggestions for how to do it. A central library usually has the nice side effect of enforcing a decision that you then follow throughout your whole codebase, instead of using method a in module A and method b in module B, causing trouble when you try to connect A with B...
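For what it's worth, here is a sketch of how such a centralized wrapper could look (the _WIN32 test and the strncpy-based fallback are just one possible choice; error handling is simplified):

#include <string.h>

char *my_strcpy(char *dst, size_t dst_size, const char *src)  /* dst_size must be >= 1 */
{
#ifdef _WIN32
    strcpy_s(dst, dst_size, src);      /* MSVC's secure variant */
#else
    strncpy(dst, src, dst_size - 1);   /* bounded copy elsewhere */
    dst[dst_size - 1] = '\0';          /* strncpy may not nul-terminate */
#endif
    return dst;
}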
I would tend to use the safer function snprintf†, which is available on both platforms, rather than having different code paths depending on the platform. You will need to use the define to suppress the warnings on MSVC.
† though possibly slightly less safe: on some implementations (notably older MSVC) the result is not nul-terminated on truncation or error, so you must check the return value, but it won't cause a buffer overflow.
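A sketch of the bounded-copy idiom with snprintf:

#include <stdio.h>

int main(void) {
    const char *src = "a fairly long source string";
    char dst[16];
    int n = snprintf(dst, sizeof dst, "%s", src);  /* never writes past dst */
    if (n < 0 || (size_t)n >= sizeof dst) {
        /* error or truncation: n is the length the full result would have had */
        fprintf(stderr, "truncated (needed %d bytes)\n", n);
    }
    return 0;
}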
