C library for parsing query strings?

I'm writing a C/CGI web application. Is there a library to parse a query string into something like a GHashTable? I could write my own but it certainly doesn't seem to be worth the effort to reinvent the wheel.

The uriparser library can parse query strings into key-value pairs.

If you're really writing C and the set of keys is known, you'd do much better to store the values in a struct instead of some bloated hash table. Have a static const table of:
key name
type (integer/string is probably sufficient)
offset in struct (using offsetof macro)
and use that to parse the query string and fill in the struct.
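A minimal sketch of that table-driven approach might look like the following. The struct members, key names, and the helpers store_field and parse_query are all made up for illustration, and the parser assumes the query string sits in a writable buffer and has already been percent-decoded (no %XX or '+' handling is done here).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request parameters; member names and sizes are illustrative. */
struct params {
    int  page;
    int  limit;
    char user[32];
};

enum field_type { FIELD_INT, FIELD_STRING };

struct field_desc {
    const char     *key;    /* key name as it appears in the query string */
    enum field_type type;   /* integer or string */
    size_t          offset; /* offsetof() the member inside struct params */
    size_t          size;   /* destination buffer size (strings only) */
};

static const struct field_desc fields[] = {
    { "page",  FIELD_INT,    offsetof(struct params, page),  0 },
    { "limit", FIELD_INT,    offsetof(struct params, limit), 0 },
    { "user",  FIELD_STRING, offsetof(struct params, user),
      sizeof(((struct params *)0)->user) },
};

/* Store one key/value pair into *out if the key is in the table. */
static void store_field(struct params *out, const char *key, const char *value)
{
    size_t i;

    for (i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        const struct field_desc *f = &fields[i];
        char *member = (char *)out + f->offset;

        if (strcmp(key, f->key) != 0)
            continue;
        if (f->type == FIELD_INT) {
            *(int *)member = (int)strtol(value, NULL, 10);
        } else {
            /* Copy with truncation and always terminate. */
            strncpy(member, value, f->size - 1);
            member[f->size - 1] = '\0';
        }
        return;
    }
    /* Unknown keys are silently ignored. */
}

/* Parse "k1=v1&k2=v2" in place (the buffer is modified).
 * Returns the number of key=value pairs seen, known or not. */
static int parse_query(char *query, struct params *out)
{
    int pairs = 0;
    char *pair = query;

    while (pair != NULL && *pair != '\0') {
        char *next = strchr(pair, '&');
        char *eq;

        if (next != NULL)
            *next++ = '\0';
        eq = strchr(pair, '=');
        if (eq != NULL) {
            *eq = '\0';
            store_field(out, pair, eq + 1);
            pairs++;
        }
        pair = next;
    }
    return pairs;
}
```

Adding a parameter then only means adding a struct member and one table row; the parsing loop itself stays untouched.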

You may find the ap_getword family of functions useful. They come from the Apache HTTP Server (declared in httpd.h) and allocate from Apache Portable Runtime (apr) pools.
Most of the string parsing routines belong to the ap_getword* family, which together provide functionality similar to the Perl split() function. Each member of this family is able to extract a word from a string, splitting the text on delimiters such as whitespace or commas. Unlike Perl split(), in which the entire string is split at once and the pieces are returned in a list, the ap_getword* functions operate on one word at a time. The function returns the next word each time it's called and keeps track of where it's been by bumping up a pointer.
You can search Google for "ap_getword(r->pool" "httpd.h" to find relevant source code to learn from.

Related

C logging framework compile time optimization

For some time now, I've been looking to build a logging framework in C (not C++!) for small microcontrollers or other devices with a small footprint. My idea is to hash the strings being logged and save only the hash value with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file generated by an external process that parses the strings out of the C source files and saves each logged string along with its hash value.
After doing a little bit of research, this idea is not new, but I do not find an implementation of this idea in C. In other languages, this idea has been worked out, but that is not the goal of my exercise. An example may be this talk where the same concept has been worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
Some of the requirements that I've set myself for this library are the following:
as portable C code as possible
COMPILE TIME optimization/hashing for the string hash conversion, it should be equivalent to just printf("%d\n", hashed_value) for a single log statement. (Assuming no parameters/arguments for this particular logging statement).
arguments can be passed to the logging statement similar to the printf function.
user can define their own output function (being console, file descriptor, sending the data directly over an UART connection,...)
fast to run!! fast to compile is nice to have, but it should not be terribly slow.
very easy to use, no very complicated API to use the library.
But to achieve this in C, what is a good approach? I've tried several things now, but do not seem to have found a good method of achieving this.
An overview of things I've tried so far, along with the drawbacks are:
Full pre-processor string hashing: I did get it working, but the compile time is terribly slow. Also, this code does not feel very portable across multiple C compilers.
Semi pre-processor string hashing: The idea was to generate a hash for each string and produce an external header file containing a define for each string with its hash value. The problem here is that I cannot figure out a way of converting the string to the correct preprocessor define.
Letting go of the default logging macro with a string pointer: Instead of working with the most used method of LOG_DEBUG("Some logging statement"), converting it with an external parser to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the problem of hashing the string since the hash will be replaced by the external parser with the correct hash, but is not the cleanest to read since the original statement will be a comment.
Also expanding this idea to take care of arguments proved to be tricky. How to take care of multiple types of variables as efficiently as possible?
I've tried some other methods but all without success. Especially when I want to add arguments to log the value of a variable, for example, it gets very complicated, and I do not get the required result...

Can I get a chars_ptr from an Unbounded_String with minimal copying?

I have an Ada.Strings.Unbounded.Unbounded_String, which I'd like to pass to a C function which takes a char * (Interfaces.C.chars_ptr in Ada).
My current implementation involves copying the unbounded string to a fixed string, then copying the fixed string to a chars_ptr.
F_String : String := To_String (UB_String);
C_String : chars_ptr := New_String (F_String);
Notice that this copies twice, when all in the end I would just be calling
Imported_Function (C_String);
Free (C_String);
And discarding F_String.
Can I achieve the desired behaviour (passing the string held in an Unbounded String to an imported C function) using minimal copying? Ideally it would be cool to pass the internal buffer of UB_String straight to the function, but this probably wouldn't be portable. Can I at least achieve the same behaviour without needing the intermediate fixed string?
I saw that GNAT has the internal package Ada.Strings.Unbounded.Aux which allows you to get an access to the internal buffer of the Unbounded String, but this isn't portable and could potentially break between versions of GNAT.
String manipulation in Ada is a pain, and there is not much you can do besides avoiding unbounded strings in the first place (maybe only use them when you need a string field in a record?).
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Interfaces.C; use Interfaces.C;
...
procedure Imported_Function(Item : in access Interfaces.C.Char);
pragma Import(C, Imported_Function, "...");
Item : aliased Interfaces.C.Char_Array := To_C(To_String(UB_String)); -- no need to free
begin
Imported_Function(Item(Item'first)'access);
...
The short and disappointing answer is: Not in a portable way.
But you have some options available:
Drop the portability and hook into the internals of the specific version of the standard library coming with your compiler.
Write your own string type, which lives up to your requirements.
Accept the extra copy until you can see that it is a problem for your application.
Ad. 1:
It seems like you have found out how to hook into the GNAT Ada.Strings.Unbounded package.
Ad. 2:
Consider how "unbounded" your string type really has to be. Do you know an almost certain upper bound on the length? Would it be unreasonable to allocate twice as many VM pages per string as you actually need in the worst known case? What is an acceptable action in case you haven't allocated enough memory? Failing? Allocating more and copying? Creating a linked list and then having to make a copy for "export" purposes? It is not all that difficult to create custom types in Ada, so consider that, if you have some special requirements (as it sounds like in this case).

Standard (or convenient) method to read and write tabular data to a text file in c

This might sound rather awkward, but I want to ask if there is a commonly practiced way of storing tabular data in a text file to be read and written in C.
In Python, for example, you can load a full text file into a list with f.readlines(), then go through all the lines and split each one on a specific character or sequence of characters (a delimiter).
How do you approach this problem in C?
Pretty much the same way you would in any other language. Pick a field separator (e.g., a tab character), open the text file for reading and parse each line.
Of course, in C it will never be as easy as it is in Python, but approaches are similar.
Whoa. I am a bit baffled by the other answers which make me feel like I'm on Mainframes.stackexchange.com instead of stackoverflow.com
Why don't you pick a modern data format like JSON or XML and follow best practices for the data format of your choice?
If you want a good JSON reader/writer for C, I've used Jansson, and it's very easy and fast.
If you want a good XML reader/writer for C, I've used Mini-XML and it's also easy and fast. It also has SAX and DOM support, depending on how you want to read in the XML.
Obviously there are a wealth of other libraries available as well.
Please don't give the next guy to come along and support your program some wacky custom file format to deal with.
I find getline() and strtok() to be quite convenient (getline was a GNU extension, standardized in POSIX.1-2008).
There's a handful of mechanisms, but there's a reason why scripting languages have become so popular over the last twenty years: some of the tasks that seem simple in scripting languages are ponderous in C.
You could use flex and bison to write a parser for your tables. This really only works if the format is very well defined and "static". They're amazing tools that can do more than you might suspect, but it is very heavy machinery for what could be done simply with a split() in a scripting language.
You could read individual fields using getdelim(3). However, this was only standardized with POSIX.1-2008, so it is far from ubiquitous. (Every Linux machine with glibc should have it.)
You could read lines with fgets(3) and discover the split locations using strchr(3).
You could read lines with fgets(3) and use strtok(3) to tokenize strings.
You can use scanf(3) to perform input and scanning in one go; it seems from the questions here that scanf(3) is difficult to use correctly.
You could use a character-at-a-time parsing approach: read a character using getc(3), inspect it, do something with it, and iterate until there are no more characters.
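As a rough sketch of the fgets(3) plus strchr(3) option from the list above: the helper names split_line and dump_table, the tab delimiter, and the 1024-byte line limit are all assumptions made for this example. Unlike strtok(3), which collapses runs of delimiters, splitting with strchr(3) preserves empty fields.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16

/* Split line in place on delim and store pointers to each field.
 * Returns the number of fields; anything past max_fields is dropped. */
static size_t split_line(char *line, char delim,
                         char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    /* Strip the trailing newline that fgets() leaves in the buffer. */
    line[strcspn(line, "\r\n")] = '\0';

    while (n < max_fields) {
        char *sep = strchr(p, delim);

        fields[n++] = p;
        if (sep == NULL)
            break;
        *sep = '\0';
        p = sep + 1;
    }
    return n;
}

/* Read a tab-separated file and print every field.
 * Returns the number of rows read, or -1 if the file cannot be opened.
 * Lines longer than the buffer are read in several pieces. */
static long dump_table(const char *path)
{
    char line[1024];
    char *fields[MAX_FIELDS];
    long rows = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t i;
        size_t n = split_line(line, '\t', fields, MAX_FIELDS);

        for (i = 0; i < n; i++)
            printf("row %ld, column %lu: %s\n",
                   rows, (unsigned long)i, fields[i]);
        rows++;
    }
    fclose(fp);
    return rows;
}
```

Quoted fields or escaped delimiters are not handled; if the data needs those, an established format such as CSV with a real parser is probably the safer choice.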

Alternative to Hash Map for Small Data set in C

I am currently working on a command line interface for a particle simulator. Its parser reads input in the following format:
[command] [argument]* (-[flag] [flag argument])
Currently, the command is sent through a conditional block, compared to various known commands and its corresponding data packet is sent to the matching function. This, however, seems clunky, inefficient and inelegant.
I am thinking about using a hashmap instead, with a string representation of a command as the key and a function pointer as the value. The function referenced would then be sent a data packet containing arguments, flags, etc.
Is a hash map overkill in this situation? Does the extra infrastructure required to implement one outweigh the potential benefits? I am aiming for speed, elegance, function, and, since this is an open-source project, extensibility.
Thanks for the help.
You might want to consider the Ternary Search Tree. It has good performance and makes efficient use of storage, and you don't need a hash function or a collision strategy.
The linked Bentley/Sedgewick article is a very thorough-yet-readable explanation of the accompanying C source.
I've been using a TST for name lookup in the past 3 versions of my PostScript interpreter. The only changes that have been needed were due to changes in memory management. Here's a version I modified (lightly) to use explicit pointers. I use yet another version in my PostScript interpreter (any of the xpost2*.zip versions, in the file core.c), which uses byte offsets for pointers (they have to be added to the user-memory byte pointer to yield a real pointer).
Speed gained will probably be minimal, but you could hash the command to convert it to a number and then use a switch statement. Faster than a hash map.
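For a handful of commands, the string-to-function-pointer mapping from the question can also be sketched as a static table searched linearly. This is a different technique from the hash-and-switch suggestion above, and the packet layout, command names, and handler functions here are purely hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical data packet; a real one would carry arguments and flags. */
struct packet {
    int          argc;
    const char **argv;
};

typedef int (*command_fn)(const struct packet *pkt);

/* Placeholder handlers; each returns a distinct code for illustration. */
static int cmd_load(const struct packet *pkt) { (void)pkt; return 1; }
static int cmd_step(const struct packet *pkt) { (void)pkt; return 2; }
static int cmd_quit(const struct packet *pkt) { (void)pkt; return 3; }

struct command {
    const char *name; /* command as typed by the user */
    command_fn  run;  /* handler to call */
};

/* Adding a command means adding one row here. */
static const struct command commands[] = {
    { "load", cmd_load },
    { "step", cmd_step },
    { "quit", cmd_quit },
};

/* Linear search: fine for a few dozen entries. */
static command_fn find_command(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(name, commands[i].name) == 0)
            return commands[i].run;
    }
    return NULL;
}

/* Returns the handler's result, or -1 for an unknown command. */
static int dispatch(const char *name, const struct packet *pkt)
{
    command_fn fn = find_command(name);

    return fn != NULL ? fn(pkt) : -1;
}
```

If the table ever grows large, keeping it sorted and switching the lookup to bsearch(3) is a small change that leaves the handlers alone.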

Something like an "extended" C string library?

I have used several dynamically typed languages and I have been avoiding C but enough is enough, it's the right tool for the job sometimes and I need to get over it.
The things I miss when working with C are associative arrays and large string libraries. Is there a library that gives more options than string.h? Any general advice when it comes to making the transition with strings?
Thanks for reading-Patrick
You can take a look at the Better String Library. The description from the site:
The Better String Library is an abstraction of a string data type which is superior to the C library char buffer string type, or C++'s std::string. Among the features achieved are:
Substantial mitigation of buffer overflow/overrun problems and other failures that result from erroneous usage of the common C string library functions
Significantly simplified string manipulation
High performance interoperability with other source/libraries which expect '\0' terminated char buffers
Improved overall performance of common string operations
Functional equivalency with other more modern languages
The library is totally stand alone, portable (known to work with gcc/g++, MSVC++, Intel C++, WATCOM C/C++, Turbo C, Borland C++, IBM's native CC compiler on Windows, Linux and Mac OS X), high performance, easy to use and is not part of some other collection of data structures. Even the file I/O functions are totally abstracted (so that other stream-like mechanisms, like sockets, can be used.) Nevertheless, it is adequate as a complete replacement of the C string library for string manipulation in any C program.
POSIX gives you <string.h>, <strings.h> and <regex.h>.
If you really need more of a string library than this, C is probably not the right tool for that particular job.
As for a hash table, you can't get a type-safe hash table in C without a lot of nasty macros.
If you're OK with just storing void pointers, or with doing some manual work for each type of map, then you shouldn't be lacking for options. Coding your own hash table is a hoot and a half; just search Stack Overflow for help with the hash function. If you don't want to roll your own, strmap [LGPL] looks decent.
GLib provides many pre-made data structures and string handling functions, but it's a set of functions and types completely separated from the "usual" ones, and it's not a very lightweight dependency.
If instead C++ is a viable alternative for your task, it bundles a string class and several generic containers ready-made into the standard library (and much other related stuff can be found in Boost).
What specifically are you looking for in your extended c-string library?
One way to get better at C, is to create your own c-string library. Then make it open source, and let others help refine it.
I don't usually advocate creating your own string libraries, but w.r.t. C, it's a great way to learn C.
Much of the power of C consists of the ability to have direct control over the memory as a sequence of bytes. It is a bit against the philosophy of the language to treat strings as something higher-level than that.
I would recommend rolling your own very basic one. It will be an enlightening experience, especially for learning pointer arithmetic and loops.
For example, learn about "Schlemiel the Painter's algorithm" regarding strcat and design your library to solve this problem.
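One way to sidestep the repeated rescanning that strcat does is to have the append routine hand back a pointer to the new end of the string, so the next append starts where the last one stopped. The function name append_str and its bounds-checking convention are assumptions made for this sketch, not part of any existing library.

```c
#include <stddef.h>

/* Append src at dst_end, which must point at the terminating '\0' of a
 * string stored in a buffer that ends at buf_end (one past its last byte).
 * Returns a pointer to the new terminating '\0'. The copy is truncated
 * if the buffer is too small, and the result is always terminated. */
static char *append_str(char *dst_end, const char *buf_end, const char *src)
{
    /* Keep one byte in reserve for the terminator. */
    while (*src != '\0' && dst_end + 1 < buf_end)
        *dst_end++ = *src++;
    *dst_end = '\0';
    return dst_end;
}
```

Starting from an empty buffer (buf[0] set to '\0' and end pointing at buf), a chain of calls such as end = append_str(end, buf + sizeof buf, "foo") builds the string in time proportional to its final length, because no call ever walks over the part that is already written.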
I've not used it myself, but you should at least review the SEI/CERT library Specifications for Managed Strings, 2nd Edition. The code can be found at CERT.
An associative array associating string keys and struct values in C consists of:
A hash function for strings
An array with a prime number of elements, inside each of which is a linked-list head.
Linked-list elements containing char * pointers to the stored keys and (optionally) a struct * pointer to the corresponding value for each key.
To store a string key in your associative array:
Hash it modulo that prime array size.
In that array bin, add it to the linked-list.
Assign the value pointer to the value you are adding.
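The pieces and steps above could be put together roughly as follows. The names (struct map, map_put, map_get, map_free), the bin count of 101, and the djb2-style hash are choices made for this sketch; values are stored as void pointers that the caller owns, while keys are copied.

```c
#include <stdlib.h>
#include <string.h>

#define NBINS 101 /* a prime number of bins */

struct entry {
    char         *key;   /* copy of the key, owned by the map */
    void         *value; /* value pointer, owned by the caller */
    struct entry *next;  /* next entry in the same bin */
};

struct map {
    struct entry *bins[NBINS]; /* linked-list heads; initialise to NULL */
};

/* djb2-style string hash; one common choice among many. */
static unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert a key or update its value.
 * Returns 0 on success, -1 on allocation failure. */
static int map_put(struct map *m, const char *key, void *value)
{
    unsigned long bin = hash_string(key) % NBINS;
    struct entry *e;
    size_t len;

    for (e = m->bins[bin]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return 0;
        }
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    len = strlen(key) + 1;
    e->key = malloc(len);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->key, key, len);
    e->value = value;
    e->next = m->bins[bin]; /* push onto the front of the bin's list */
    m->bins[bin] = e;
    return 0;
}

/* Returns the stored value pointer, or NULL if the key is absent. */
static void *map_get(const struct map *m, const char *key)
{
    const struct entry *e;

    for (e = m->bins[hash_string(key) % NBINS]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0)
            return e->value;
    }
    return NULL;
}

/* Free all entries and keys; the values are left to the caller. */
static void map_free(struct map *m)
{
    size_t i;

    for (i = 0; i < NBINS; i++) {
        struct entry *e = m->bins[i];

        while (e != NULL) {
            struct entry *next = e->next;

            free(e->key);
            free(e);
            e = next;
        }
        m->bins[i] = NULL;
    }
}
```

A fixed bin count keeps the sketch short; a table meant for large or unpredictable key sets would normally grow and rehash once the lists get long.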

Resources