Safer Alternatives to the C Standard Library - c

The C standard library is notoriously poor when it comes to I/O safety. Many functions have buffer overflows (gets, scanf), or can clobber memory if not given proper arguments (scanf), and so on. Every once in a while, I come across an enterprising hacker who has written his own library that lacks these flaws.
What are the best of these libraries you have seen? Have you used them in production code, and if so, which held up as more than hobby projects?

I use GLib library, it has many good standard and non standard functions.
See https://developer.gnome.org/glib/stable/
and maybe you fall in love... :)
For example:
https://developer.gnome.org/glib/stable/glib-String-Utility-Functions.html#g-strdup-printf
explains that g_strdup_printf is:
Similar to the standard C sprintf() function but safer, since it calculates the maximum space required and allocates memory to hold the result.

This isn't really answering your question about the safest libraries to use, but most functions that are vulnerable to buffer overflows that you mentioned have safer versions which take the buffer length as an argument to prevent the security holes that are opened up when the standard methods are used.
Unless you have relaxed the level of warnings, you will usually get compiler warnings when you use the deprecated methods, suggesting you use the safer methods instead.

I believe the Apache Portable Runtime (apr) library is safer than the standard C library. I use it, well, as part of an apache module, but also for independent processes.

For Windows there is a 'safe' C/C++ library.

You're always at liberty to implement any library you like and to use it - the hard part is making sure it is available on the platforms you need your software to work on. You can also use wrappers around the standard functions where appropriate.
Whether it is really a good idea is somewhat debatable, but there is TR24731 published by the C standard committee - for a safer set of C functions. There's definitely some good stuff in there. See this question: Do you use the TR 24731 Safe Functions in your C code?, which includes links to the technical report.

Maybe the first question to ask is if your really need plain C? (maybe a language like .net or java is an option - then e.g. buffer overflows are not really a problem anymore)
Another option is maybe to write parts of your project in C++ if other higher level languages are not an option. You can then have a C interface which encapsulates the C++ code if you really need C.
Because if you add all the advanced functions the C++ standard library has build in - your C code would only be marginally faster most times (and contain a lot more bugs than an existing and tested framework).

Related

Stand-alone portable snprintf(), independent of the standard library?

I am writing code for a target platform with NO C-runtime. No stdlib, no stdio. I need a string formatting function like snprintf but that should be able to run without any dependencies, not even the C library.
At most it can depend on memory alloc functions provided by me.
I checked out Trio but it needs stdio.h header. I can't use this.
Edit
Target platform : PowerPC64 home made OS(not by me). However the library shouldn't rely on OS specific stuff.
Edit2
I have tried out some 3rd-party open source libs, such as Trio(http://daniel.haxx.se/projects/trio/), snprintf and miniformat(https://bitbucket.org/jj1/miniformat/src) but all of them rely on headers like string.h, stdio.h, or(even worse) stdlib.h. I don't want to write my own implementation if one already exists, as that would be time-wasting and bug-prone.
Try using the snprintf implementation from uclibc. This is likely to have the fewest dependencies. A bit of digging shows that snprintf is implemented in terms of vsnprintf which is implemented in terms of vfprintf (oddly enough), it uses a fake "stream" to write to string.
This is a pointer to the code: http://git.uclibc.org/uClibc/tree/libc/stdio/_vfprintf.c
Also, a quick google search also turned up this:
http://www.ijs.si/software/snprintf/
http://yallara.cs.rmit.edu.au/~aholkner/psnprintf/psnprintf.html
http://www.jhweiss.de/software/snprintf.html
Hopefully one is suitable for your purposes. This is likely to not be a complete list.
There is a different list here:
http://trac.eggheads.org/browser/trunk/src/compat/README.snprintf?rev=197
You will probably at least need stdarg.h or low level knowledge of the specific compiler/architecture calling convention in order to be able to process the variadic arguments.
I have been using code based on Kustaa Nyholm's implementation It provides printf() (with user supplied character output stub) and sprintf(), but adding snprintf() would be simple enough. I added vprintf() and vsprintf() for example in my implementation.
No dynamic memory application is required, but it does have a dependency on stdarg.h, but as I said, you are unlikely to be able to get away without that for any variadic function - though you could potentially implement your own.
I am guessing you are in a norming enivronment where you need to explicitly document and verify COTS code.
However, I think in the case of stdarg.h this is worthwhile. You could pull in the source for just this and treat it like handwritten code (review, lint, unit-test, etc.). Any self-written replacement will be a lot of work, probably less stable and absolutely not portable.
That said, the actual snprintf implementation should not be too hard, and you could do this yourself, probably. Especially if you might be able to strip a few features away.
Keep in mind that vararg code has no typechecking and is prone to errors. For library snprintf you may find gcc's warnings helpful.

Why there is so many functions in string.h library that are "not recommended for use"?

There is something I try to understand about C origins, why there are functions that are not recommended for use in most of SO questions. Like strtok or strncpy, they are simply not safe to work with. Evrywhere I see recomendations to write my own implementation. Why wouldn't the standard change strncpy for example to BSD strlcpy, but is left instead with these "monsters"?
C is a product of the early 1970s, and it shows. Many of the iffier library functions were written when the C user community was very small and limited to academia, most of whom were experienced programmers.
By the time the first standard was released in 1989, those original library functions were already entrenched in 10 to 15 years' worth of legacy code (not the least of which was the Unix operating system and most of its tools). The committee in charge of standardization was loath to break the existing codebase, so those functions were incorporated into the standard pretty much as-is; all that really changed was adding prototype syntax to the declarations and changing char * to void * where necessary (malloc, memcpy, memset, etc.).
AFAIK, only one library function has actually been removed from the language since standardization - gets. The mayhem caused by that one library call is scarier than the prospect of breaking what is by now almost 40 years' worth of legacy code.
There is a LOT of legacy "C" and "C++" code out there. If they removed all the "unsafe" functions from the "C" runtime libraries, it would be prohibitive for many developers to upgrade their compilers because all the old code wouldn't build any more.
Sometimes they will give "deprecated" compiler messages (MSFT is fond of this) so you will find and change to using the new, safer functions.
New code should use the "safe" functions, of course, but many of us are stuck with old compilers and legacy code to maintain :)
They still exist because of historical ancestral relationship with the "old system" / "codes" that still use them - i.e. to support "Backward Compatibility"
Own implementation is suggested to make the programmer use their own logic at their own risk as no one can know much better about their environment then the programmer himself, as for example, strtok is not thread safe.
It's all just dogma. Use the functions just be aware that they're indifferent to your goals in that they might not work in all circumstances (ie strtok and multi-threading) or they expect conditions to be caught before/after usage (ie strncpy and missing termination characters).

Is glib usable in an unobtrusive way?

I was looking for a good general-purpose library for C on top of the standard C library, and have seen several suggestions to use glib. How 'obtrusive' is it in your code? To explain what I mean by obtrusiveness, the first thing I noticed in the reference manual is the basic types section, thinking to myself, "what, am I going to start using gint, gchar, and gprefixing geverything gin gmy gcode gnow?"
More generally, can you use it only locally without other functions or files in your code having to be aware of its use? Does it force certain assumptions on your code, or constraints on your compilation/linking process? Does it take up a lot of memory in runtime for global data structures? etc.
The most obtrustive thing about glib is that any program or library using it is non-robust against resource exhaustion. It unconditionally calls abort when malloc fails and there's nothing you can do to fix this, as the entire library is designed around the concept that their internal allocation function g_malloc "can't fail"
As for the ugly "g" types, you definitely don't need any casts. The types are 100% equivalent to the standard types, and are basically just cruft from the early (mis)design of glib. Unfortunately the glib developers lack much understanding of C, as evidenced by this FAQ:
Why use g_print, g_malloc, g_strdup and fellow glib functions?
"Regarding g_malloc(), g_free() and siblings, these functions are much safer than their libc equivalents. For example, g_free() just returns if called with NULL.
(Source: https://developer.gnome.org/gtk-faq/stable/x908.html)
FYI, free(NULL) is perfectly valid C, and does the exact same thing: it just returns.
I have used GLib professionally for over 6 years, and have nothing but praise for it. It is very light-weight, with lots of great utilities like lists, hashtables, rand-functions, io-libraries, threads/mutexes/conditionals and even GObject. All done in a portable way. In fact, we have compiled the same GLib-code on Windows, OSX, Linux, Solaris, iOS, Android and Arm-Linux without any hiccups on the GLib side.
In terms of obtrusiveness, I have definitely "bought into the g", and there is no doubt in my mind that this has been extremely beneficial in producing stable, portable code at great speed. Maybe specially when it comes to writing advanced tests.
And if g_malloc don't suit your purpose, simply use malloc instead, which of course goes for all of it.
Of course you can "forget about it elsewhere", unless of course those other places somehow interact with glib code, then there's a connection (and, arguable, you're not really "elsewhere").
You don't have to use the types that are just regular types with a prepended g (gchar, gint and so on); they're guaranteed to be the same as char, int and so on. You never need to cast to/from gint for instance.
I think the intention is that application code should never use gint, it's just included so that the glib code can be more consistent.

What libraries would be useful for implementing a small language interpreter in C?

For my own learning experience, I want to try writing an interpreter for a simple programming language in C – the main thing I think I need is a hash table library, but a general purpose collection of data structures and helper functions would be pretty helpful. What would you guys recommend?
libbasekit - by the author of Io. You can also use libcoroutine.
One library I recommend looking into is libgc, a garbage collector for C.
You use it by replacing calls to malloc, realloc, strdup, etc. with their libgc counterparts (e.g. GC_MALLOC). It works by scanning the stack, global variables, and GC-allocated blocks, looking for numbers that might be pointers. Believe it or not, it actually performs quite well (almost on par with the very good ptmalloc, which is the default (non-garbage collected) malloc implementation in GNU/Linux), and a lot of programs use it (including Mono and GCJ). A disadvantage, though, is it might not play well with other libraries you may want to use, and you may even have to recompile some of them by hand to replace calls to malloc with GC_MALLOC.
Honestly - and I know some people will hate me for it - but I recommend you use C++. You don't have to bust a gut to learn it just to be able to start your project. Just use it like C, but in an hour you can learn how to use std::map<> (an associative container), std::string for easy textual data handling, and std::vector<> for a resizable heap-allocated array. If you want to spend an extra hour or two, learn to put member functions in classes (don't worry about polymorphism, virtual functions etc. to begin with), and you'll get a more organised program.
You need no more than the standard library for a suitably small language with simple constructs. The most complex part of an interpreted language is probably expression evaluation. For both that, procedure-calling, and construct-nesting you will need to understand and implement stack data structures.
The code at the link above is C++, but the algorithm is described clearly and you could re-implement it easily in C. There again there are few valid arguments for not using C++ IMO.
Before diving into what libraries to use I suggest you learn about grammars and compiler design. Especially input parsing is for compilers and interpreters similar, that is tokenizing and parsing. The process of tokenizing converts a stream characters (your input) into a stream of tokens. A parser takes this stream of tokens and matches it with your grammar.
You don't mention what language you're writing an interpreter for. But very likely that language contains recursion. In that case you need to use a so-called bottom-up parser which you cannot write by hand but needs to be generated. If you try write such a parser by hand you will end up with a error-prone mess.
If you're developing for a posix platform then you can use lex and yacc. These tools are a bit old but very powerful for building parsers. Lex can generate code that implements the tokenizing process and yacc can generate a bottom-up parser.
My answer probably raises more questions than it answers. That's because the field of compilers/interpreters is quite complex and cannot simply be explained in a short answer. Just get a good book on compiler design.

A pure bytes version of strstr?

Is there a version of strstr that works over a fixed length of memory that may include null characters?
I could phrase my question like this:
strncpy is to memcpy as strstr is to ?
memmem, unfortunately it's GNU-specific rather than standard C. However, it's open-source so you can copy the code (if the license is amenable to you).
Not in the standard library (which is not that large, so take a look). However writing your own is trivial, either directly byte by byte or using memchr() followed by memcmp() iteratively.
In the standard library, no. However, a quick google search for "safe c string library" turns up several potentially useful results. Without knowing more about the task you are trying to perform, I cannot recommend any particular third-party implementation.
If this is the only "safe" function that you need beyond the standard functions, then it may be best to roll your own rather than expend the effort of integrating a third-party library, provided you are confident that you can do so without introducing additional bugs.

Resources