something like an "extended" C string library? - c

I have used several dynamically typed languages and I have been avoiding C, but enough is enough: it's the right tool for the job sometimes, and I need to get over it.
The things I miss working with C are associative arrays and large string libraries. Is there a library that gives more options than string.h? Any general advice when it comes to making the transition with strings?
Thanks for reading-Patrick

You can take a look at the Better String Library. The description from the site:
The Better String Library is an abstraction of a string data type which is superior to the C library char buffer string type, or C++'s std::string. Among the features achieved are:
Substantial mitigation of buffer overflow/overrun problems and other failures that result from erroneous usage of the common C string library functions
Significantly simplified string manipulation
High performance interoperability with other source/libraries which expect '\0' terminated char buffers
Improved overall performance of common string operations
Functional equivalency with other more modern languages
The library is totally stand alone, portable (known to work with gcc/g++, MSVC++, Intel C++, WATCOM C/C++, Turbo C, Borland C++, IBM's native CC compiler on Windows, Linux and Mac OS X), high performance, easy to use and is not part of some other collection of data structures. Even the file I/O functions are totally abstracted (so that other stream-like mechanisms, like sockets, can be used.) Nevertheless, it is adequate as a complete replacement of the C string library for string manipulation in any C program.
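For a taste of the API, here is a minimal usage sketch. The calls below (bfromcstr, bcatcstr, bdata, blength, bdestroy) are taken from bstrlib's public interface, but double-check them against the header in the version you download:

    #include <stdio.h>
    #include "bstrlib.h"   /* the Better String Library header */

    int main(void)
    {
        /* build a bstring from a C string, then append to it */
        bstring s = bfromcstr("Hello");
        bcatcstr(s, ", world");

        /* bstrings interoperate with '\0'-terminated APIs via bdata()/blength() */
        printf("%s (%d chars)\n", bdata(s), blength(s));

        bdestroy(s);       /* every bstring must be freed explicitly */
        return 0;
    }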

POSIX gives you <string.h>, <strings.h> and <regex.h>.
If you really need more of a string library than this, C is probably not the right tool for that particular job.
As for a hash table, you can't get a type-safe hash table in C without a lot of nasty macros.
If you're OK with just storing void-pointers, or with doing some manual work for each type of map, then you shouldn't be lacking for options. Coding your own hash table is a hoot and a half - just search Stack Overflow for help with the hash function. If you don't want to roll your own, strmap [LGPL] looks decent.

GLib provides many pre-made data structures and string handling functions, but it's a set of functions and types completely separated from the "usual" ones, and it's not a very lightweight dependency.
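To give a rough idea of the GLib side, here is a small sketch using GString and GHashTable (both are part of GLib's documented API; build against glib-2.0 via pkg-config and verify the calls for your GLib version):

    #include <glib.h>
    #include <stdio.h>

    int main(void)
    {
        /* growable string */
        GString *s = g_string_new("Hello");
        g_string_append(s, ", world");
        printf("%s\n", s->str);
        g_string_free(s, TRUE);

        /* hash table keyed by strings */
        GHashTable *h = g_hash_table_new(g_str_hash, g_str_equal);
        g_hash_table_insert(h, (gpointer)"answer", GINT_TO_POINTER(42));
        printf("%d\n", GPOINTER_TO_INT(g_hash_table_lookup(h, "answer")));
        g_hash_table_destroy(h);

        return 0;
    }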
If instead C++ is a viable alternative for your task, it bundles a string class and several generic containers ready-made into the standard library (and much other related stuff can be found in Boost).

What specifically are you looking for in your extended c-string library?
One way to get better at C, is to create your own c-string library. Then make it open source, and let others help refine it.
I don't usually advocate creating your own string libraries, but w.r.t. C, it's a great way to learn C.

Much of the power of C consists of the ability to have direct control over the memory as a sequence of bytes. It is a bit against the philosophy of the language to treat strings as something higher-level than that.
I would recommend rolling your own very basic one. It will be an enlightening experience, especially for learning pointer arithmetic and loops.
For example, learn about "Schlemiel the Painter's algorithm" regarding strcat and design your library to solve this problem.
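To make that concrete: strcat() has to re-scan the destination from the start on every call, so building a string with repeated strcat() is O(n^2). The usual cure is to keep track of the current end yourself. A minimal sketch (bounds checking omitted for brevity):

    #include <stdio.h>
    #include <string.h>

    /* Append src at *end and return the new end pointer.  Keeping the end
     * pointer around avoids re-scanning the whole string on every append,
     * which is what makes repeated strcat() calls quadratic. */
    static char *append(char *end, const char *src)
    {
        size_t len = strlen(src);
        memcpy(end, src, len + 1);   /* copy including the terminating '\0' */
        return end + len;
    }

    int main(void)
    {
        char buf[64] = "";
        char *end = buf;
        end = append(end, "Hello");
        end = append(end, ", ");
        end = append(end, "world");
        printf("%s\n", buf);         /* prints "Hello, world" */
        return 0;
    }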

I've not used it myself, but you should at least review the SEI/CERT library Specifications for Managed Strings, 2nd Edition. The code can be found at CERT.

An associative array associating string keys with struct values in C consists of:
A hash function for strings
An array with a prime number of elements, inside each of which is a linked-list head.
Linked-list elements containing char * pointers to the stored keys and (optionally) a struct * pointer to the corresponding value for each key.
To store a string key in your associative array:
Hash it modulo that prime array size.
In that array bin, add it to the linked-list.
Assign the value pointer to the value you are adding.
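Put together, a minimal sketch of that structure might look like the following (hypothetical names, no error handling, no delete operation; strdup() is POSIX):

    #include <stdlib.h>
    #include <string.h>

    #define NBINS 101                        /* a prime number of bins */

    struct entry {
        char *key;                           /* owned copy of the key */
        void *value;                         /* pointer to the caller's struct */
        struct entry *next;                  /* linked-list chaining per bin */
    };

    static struct entry *bins[NBINS];

    static unsigned long hash(const char *s) /* simple djb2-style string hash */
    {
        unsigned long h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h;
    }

    void map_put(const char *key, void *value)
    {
        unsigned long b = hash(key) % NBINS;
        struct entry *e = malloc(sizeof *e);
        e->key = strdup(key);
        e->value = value;
        e->next = bins[b];                   /* push onto the bin's list */
        bins[b] = e;
    }

    void *map_get(const char *key)
    {
        struct entry *e;
        for (e = bins[hash(key) % NBINS]; e != NULL; e = e->next)
            if (strcmp(e->key, key) == 0)
                return e->value;
        return NULL;                         /* key not present */
    }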

Related

c - sockets: why are IPs sent in integer format?

Question
I am wondering why we connect to sockets using functions like htons()/htonl() to take care of endianness, when we could have sent the IP as a plain char array.
Say we want to connect to 184.54.12.169
There is an explanation for this, but I cannot figure out why we use integers instead of chars, involving ourselves in endianness hell.
I think char out_ip[] = "184.54.12.169" could theoretically have worked.
Please explain the subtleties I don't get here.
The basic networking APIs are low-level functions; they are very thin wrappers around kernel system calls. Removing these low-level functions and forcing everything to use strings would be rather bad for a low-level API like that, especially considering how tedious string handling is in C. As a concrete hurdle, IP strings are not even fixed length, so handling them is a lot more complex than plain 32-bit integers. And moving string handling into the kernel goes against what the kernel is supposed to do; handling arbitrary user strings is really a user-space problem.
So, you might want to create higher-level functions which would accept strings and do the conversion in the library. But adding such higher-level "convenience" functions all over the place in the core libraries would bloat them, because passing IP numbers is certainly not the only place where such convenience would be welcome. These functions would need to be maintained forever and included everywhere once they became part of standard (official, like POSIX, or de facto) libraries.
So, removing the low-level functions is not really an option, and adding more functions for a higher-level API in the same library is not a good option either.
The solution is to use another library that provides a higher-level networking API, which could, for example, handle address strings directly. I'm not sure what's out there for C, but it's almost a given for other languages, which also have "real" strings built in, so using them is not a hassle.
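For the address-string case specifically, POSIX already ships the thin conversion helper: inet_pton() parses the dotted-quad text into the 4-byte network-order value the low-level calls expect. A small sketch:

    #include <arpa/inet.h>
    #include <stdio.h>

    int main(void)
    {
        struct in_addr addr;

        /* convert the human-readable form into the 32-bit
         * network-byte-order value the socket calls actually use */
        if (inet_pton(AF_INET, "184.54.12.169", &addr) == 1)
            printf("0x%08x\n", (unsigned)ntohl(addr.s_addr));

        return 0;
    }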
Because that's how an IP address is transmitted in a packet. The "www.xxx.yyy.zzz" string form is really just a human-readable form of a 4-byte integer that lets us see the hierarchical nature a little more easily. Sending a whole string would also take up a lot more space.
Take the number 127536: as text it requires 7 bytes, not four, and in addition you would need to parse it.
In other words, the integer form is more efficient and you do not have to deal with invalid values.

Why do C written libraries use so many structs?

I've looked at some open source libraries in a few places, and I've realized that these libraries are basically a great stack of structs; I've seen few functions.
Why do libraries written in C use so many structs? What's the basis behind this? To me this looked like an attempt to simulate object orientation, because a quick search told me that each struct is "instantiated" by the program using it to accomplish something. For example, in some desktop environments for Linux that I've seen, each window is a struct in the GUI library.
Anyway, that is the question.
Structs are a great way to organize data. And data is fundamental, as Fred Brooks knew decades ago:
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.
Object-oriented programming doesn't have to be merely simulated in C; it can be realized. For example, did you know that inside your structs you can store function pointers which operate on those same structs, and then you are a little bit closer to C++'s classes?
Also consider extensibility: even a function taking many arguments may be improved by taking a single struct, because then its signature does not need to change when a new argument is added.
Finally, C does not have multiple return values from a single function call. But it can return a struct, which is about the same thing. C is a lot about building your own tools from the raw language, and being able to stash a bunch of related data and/or functions together in one place is a good building block.
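As a small, hypothetical sketch of the "function pointers inside structs" idea:

    #include <stdio.h>

    /* a tiny "class": data plus a one-entry method table */
    struct counter {
        int value;
        void (*increment)(struct counter *self);
    };

    static void counter_increment(struct counter *self)
    {
        self->value++;
    }

    int main(void)
    {
        struct counter c = { 0, counter_increment };
        c.increment(&c);          /* roughly c.increment() in an OO language */
        c.increment(&c);
        printf("%d\n", c.value);  /* prints 2 */
        return 0;
    }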
With or without object orientation, structures are a useful way to group related data under a single symbol. You can copy the structure wherever you like without having to write out all the members each time, and this makes the structure easier to change if you have to.
It also makes it easier to reference certain members using pointer arithmetic, if you're careful (see sockaddr).
Same argument as with arrays.
Simply put, there's no reason not to use structures.
Structures are also useful when retrieving data through a pointer, because a single pointer is enough to reach the complete bundle of data within the structure.
One, it keeps the APIs clean. Instead of passing N separate arguments to a function, you pass a single argument containing N members.
Two, it allows the library to hide implementation details from the programmer. For example, the C FILE type abstracts away some details of stream I/O, details which vary from implementation to implementation. We don't need to know those details, so they're not exposed to us; we just use the FILE type to pass that information around.
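That hiding is usually done with an opaque struct: the header only declares the type, so callers can hold pointers to it but cannot see (or depend on) its members. A hypothetical sketch (widget, widget_open and widget_close are made-up names):

    /* widget.h -- public interface; the struct layout is hidden */
    typedef struct widget widget;            /* opaque: members not visible here */
    widget *widget_open(const char *name);
    void    widget_close(widget *w);

    /* widget.c -- implementation; free to change without breaking callers */
    #include <stdlib.h>
    #include <string.h>

    struct widget {
        char name[32];                       /* private implementation details */
        int  refcount;
    };

    widget *widget_open(const char *name)
    {
        widget *w = calloc(1, sizeof *w);    /* zero-filled, so name stays terminated */
        if (w != NULL) {
            strncpy(w->name, name, sizeof w->name - 1);
            w->refcount = 1;
        }
        return w;
    }

    void widget_close(widget *w)
    {
        free(w);
    }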

Why strcpy() and strcat() are not good in the Embedded Domain

Here I want to know about the disadvantages of strcpy() and strcat().
I want to know where these functions are dangerous in the embedded domain/environment.
Somebody told me we never use the strcpy, strcat and strlen functions in the embedded domain, because they stop at a null character, and sometimes we work on encrypted data where a null character can appear, so we don't get the actual result.
So I want to know all about this, about alternatives to these functions, and how we can use those alternative functions.
The str* functions work with strings. If you are dealing with strings, they're fine to use as long as you use them correctly - it's easy to create a buffer overflow if you use them incorrectly.
If you are dealing with binary data, which it sounds like you are, string handling functions are unsuitable (they're meant for strings, after all, not binary data). Use the mem* functions for dealing with binary data.
In C, a string is a sequence of chars that ends with a nul byte. If you're dealing with binary data, there might very well be a char with the value 0 in that data, which the string handling functions assume to be the end of the string; or the data contains no nul bytes and is not nul-terminated, which will cause the string functions to run past the end of your buffer.
Well, these functions indeed copy null-terminated strings and not only in embedded domain. Depending on your need you may want to use mem* functions instead.
As others have already answered, they work fine for strings. Encrypted data can't be regarded as strings.
There is however the aspect of using any C library function in embedded systems, particularly in high-integrity real-time embedded systems, such as automotive/medical/avionics etc. On such projects, a coding standard will be used, such as MISRA-C.
The vast majority of C libraries are likely not compatible with your coding standard. And even if you have the option (at least in MISRA-C) to make deviations, you would still have to verify the whole library. For example you will have to verify the whole string.h, just because you used strlen(). Common practice in such systems is to write all functions yourself, particularly simple ones like strlen() which you can write yourself in a minute.
But most embedded systems don't have such high requirements for quality and safety, and then the library functions are preferable, particularly memcpy() and similar search/sort/move functions, which will likely be heavily optimized by the compiler.
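For instance, a from-scratch strlen() of the kind such a project might write (a minimal sketch; the name is chosen so it doesn't clash with the standard function):

    #include <stddef.h>

    /* count characters up to, but not including, the terminating '\0' */
    size_t my_strlen(const char *s)
    {
        size_t n = 0U;
        while (s[n] != '\0') {
            n++;
        }
        return n;
    }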
If you are worried about overwriting buffers (which everybody really should be), use strncpy or strncat instead. I see no problem with strlen.
This issue is specific to the system you describe, not to embedded systems per se. Either way, the string functions are simply not suited to the application you describe. I think you should simply have been told that you can't use string functions on the encrypted data in your particular application. That is not an issue with embedded systems, or even the string library. It is entirely about the nature of your encrypted strings - they are no longer C strings once encrypted, so any string library operation would no longer be valid. It becomes just data, and it would be your responsibility to retain any necessary meta-data regarding length etc. You could use Pascal-style strings to do that, for example (with a suitable accompanying library).
Now in general the C string library, and C-strings themselves present a number of issues for all systems, not just embedded. See this article by Joel Spolsky to see why caution should be used when using C strings functions, especially strcat().
The reason is just what you said:
because they stop at a null character, and sometimes we work on encrypted data where a null character can appear, so we don't get the actual result.
As for alternatives, I recommend the strn* series such as strncpy and strnlen; the n here means the maximum possible length of the string.
You may want to find a C-standard library reference and seek for some details about those strn* functions.
As others have said str* functions are for strings, not binary data.
However, I suggest that when you do come to use strings, you should consider functions such as strlcpy() instead of strcpy(), and strlcat() instead of strcat().
They're not standard functions, but you'll be able to find copies of them readily enough (or really just write your own). They take the size of the destination buffer as an extra parameter to their standard cousins and are designed to avoid buffer overflows.
It probably seems like an imposition to have to pass around the size of a pointer's block wherever you use it, but I'm afraid that's what programming in C is about.
At least until we get smarter pointers that is.
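If you can't pull in the BSD versions, the core idea of strlcpy() (truncate to fit, always terminate, return the source length so truncation can be detected) fits in a few lines. A minimal sketch, with a made-up name to avoid clashing with any system-provided version:

    #include <string.h>

    /* Copy src into dst of capacity 'size', always '\0'-terminating when
     * size > 0.  Returns strlen(src); a result >= size means truncation. */
    size_t my_strlcpy(char *dst, const char *src, size_t size)
    {
        size_t srclen = strlen(src);

        if (size > 0) {
            size_t n = (srclen < size - 1) ? srclen : size - 1;
            memcpy(dst, src, n);
            dst[n] = '\0';
        }
        return srclen;
    }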

Alternative to Hash Map for Small Data set in C

I am currently working on a command line interface for a particle simulator. Its parser reads input in the following format:
[command] [argument]* (-[flag] [flag argument])
Currently, the command is sent through a conditional block, compared to various known commands and its corresponding data packet is sent to the matching function. This, however, seems clunky, inefficient and inelegant.
I am thinking about using a hashmap instead, with a string representation of a command as the key and a function pointer as the value. The function referenced would then be sent a data packet containing arguments, flags, etc.
Is a hash map overkill in this situation? Does the extra infrastructure required to implement one outweigh the potential benefits? I am aiming for speed, elegance, function, and, since this is an open-source project, extensibility.
Thanks for the help.
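For reference, the kind of string-to-function-pointer mapping I have in mind looks roughly like this (the command names and handlers below are made up, and a plain linear scan stands in for whatever lookup structure ends up being used):

    #include <stdio.h>
    #include <string.h>

    struct packet { int argc; char **argv; };   /* stand-in for the data packet */

    static void cmd_run(struct packet *p)  { (void)p; puts("running simulation"); }
    static void cmd_stop(struct packet *p) { (void)p; puts("stopping"); }

    static const struct {
        const char *name;
        void (*handler)(struct packet *);
    } commands[] = {
        { "run",  cmd_run  },
        { "stop", cmd_stop },
    };

    static int dispatch(const char *name, struct packet *p)
    {
        size_t i;
        for (i = 0; i < sizeof commands / sizeof commands[0]; i++) {
            if (strcmp(name, commands[i].name) == 0) {
                commands[i].handler(p);
                return 0;
            }
        }
        return -1;                              /* unknown command */
    }

    int main(void)
    {
        struct packet p = { 0, NULL };
        return dispatch("run", &p);
    }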
You might want to consider the Ternary Search Tree. It has good performance and efficient use of storage, and you don't need a hash function or a collision strategy.
The linked Bentley/Sedgewick article is a very thorough yet readable explanation of the accompanying C source.
I've been using a TST for name-lookup in the past 3 versions of my postscript interpreter. The only changes that have been needed have been due to changes in memory management. Here's a version I modified (lightly) to use explicit pointers. I use yet another version in my postscript interpreter, any of the xpost2*.zip versions, in the file core.c, which uses byte-offsets for pointers (have to be added to the user-memory byte-pointer to yield a real pointer).
Speed gained will probably be minimal, but you could hash the command to convert it to a number and then use a switch statement. Faster than a hash map.

c99 dynamic array

I'm writing a very small, project-specific OpenGL ES engine for iPhone and I really need a good, solid, proven dynamic array library/macro in the C99 dialect. (No C++, Obj-C, STL whatsoever.)
It's strongly needed for the render batch and polygon mesh, so it should be able to handle various types of data and additionally cause minimal overhead when the array size changes and new data is inserted.
I've been searching around and found two candidates for my need.
The first one is ccCArray from Cocos2d,
and the other one is utarray, written by Troy D. Hanson.
ccCArray IS rock solid, thoroughly proven by the community. utarray looks fine, but I cannot find anyone who actually uses it.
Any more suggestions?
A library?! A C++ template would be more than suitable for this need. I'd say AT MOST about 15 functions (excluding alternative constructors and const getters), and you're done. You'd also be able to use it for ANY type, ANY size and ANY size type (byte, int, etc.), and it's just one file: a .h or, better said, a .hpp.
Any reason you're rejecting it? Seems like you want to make life harder for yourself :)
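If C++ really is off the table, the core of a growable array in C99 is still fairly small. A minimal, hypothetical sketch (names are made up, error handling kept to the bare minimum): keep a length, a capacity and an element size, and realloc geometrically when full.

    #include <stdlib.h>
    #include <string.h>

    struct dynarray {
        void  *data;
        size_t len, cap, elem_size;
    };

    static void da_init(struct dynarray *a, size_t elem_size)
    {
        a->data = NULL;
        a->len = a->cap = 0;
        a->elem_size = elem_size;
    }

    /* append one element, growing geometrically to keep amortized cost low */
    static int da_push(struct dynarray *a, const void *elem)
    {
        if (a->len == a->cap) {
            size_t ncap = a->cap ? a->cap * 2 : 8;
            void *p = realloc(a->data, ncap * a->elem_size);
            if (p == NULL)
                return -1;
            a->data = p;
            a->cap = ncap;
        }
        memcpy((char *)a->data + a->len * a->elem_size, elem, a->elem_size);
        a->len++;
        return 0;
    }

    static void da_free(struct dynarray *a)
    {
        free(a->data);
        a->data = NULL;
        a->len = a->cap = 0;
    }

    int main(void)
    {
        struct dynarray a;
        int i, last;
        da_init(&a, sizeof(int));
        for (i = 0; i < 100; i++)
            da_push(&a, &i);
        last = ((int *)a.data)[99];   /* read back the last element */
        da_free(&a);
        return last == 99 ? 0 : 1;
    }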
