Creating a good interface for functions that work with paths in C

I have functions that take a file path as their input argument. These functions are cross-platform and support both Unicode and regular file paths. What is the best interface for these functions? Right now I have two choices:
make two versions of each function, FunctionW and FunctionA, as in the WinAPI.
make one version that takes a char * argument, with the requirement that the string be UTF-8 encoded.
Which one is better?
Thanks in advance!

This really depends on the rest of your code and how you're going to use these functions. There is no correct answer here: estimate the time it will take you to write, use and maintain each option, and pick the one that's easiest.
You should also consider how much FunctionA and FunctionW differ. If the difference is small, you can likely have both of them call a single inner helper function, so the extra time for writing and maintaining a second function is minimal. If it is large, consider how hard it would be (if at all) to convert strings to UTF-8 for the second option you presented.
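A minimal sketch of the "single inner helper" idea, using a hypothetical function path_depth (it counts the '/'-separated components of a path) so the example stays self-contained. Both entry points end up calling one helper that works on UTF-8, so only the wide wrapper has to do any conversion. It assumes wchar_t holds whole Unicode code points, which is true on Linux and macOS where wchar_t is 32 bits; on Windows, where wchar_t is UTF-16, you would convert with WideCharToMultiByte instead of the hand-written encoder.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical inner helper: all real work happens on a UTF-8 string. */
static int path_depth_impl(const char *utf8_path)
{
    int depth = 0;
    const char *p = utf8_path;
    while (*p != '\0') {
        while (*p == '/') p++;              /* skip separators */
        if (*p == '\0') break;
        depth++;                            /* found a component */
        while (*p != '\0' && *p != '/') p++; /* skip over it */
    }
    return depth;
}

/* Narrow entry point: the caller promises the string is UTF-8. */
int path_depth_a(const char *utf8_path)
{
    return path_depth_impl(utf8_path);
}

/* Encodes one code point as UTF-8 and returns the number of bytes written.
   Assumes cp <= 0x10FFFF. */
static size_t utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Wide entry point: converts to UTF-8, then calls the same helper.
   Assumes each wchar_t is a full code point (not UTF-16 as on Windows).
   Returns -1 if the temporary buffer cannot be allocated. */
int path_depth_w(const wchar_t *wide_path)
{
    size_t n = wcslen(wide_path);
    char *buf = malloc(n * 4 + 1);          /* at most 4 bytes per code point */
    if (buf == NULL) return -1;
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += utf8_encode((unsigned long)wide_path[i], buf + len);
    buf[len] = '\0';
    int result = path_depth_impl(buf);
    free(buf);
    return result;
}
```

With this shape, adding a new path function means writing one UTF-8 helper plus two thin wrappers.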

Related

C logging framework compile time optimization

For some time now, I've been looking to build a logging framework in C (not C++!) for small microcontrollers or other devices with a small footprint. For this, I've had the idea of hashing the logged strings to a value and saving just the hashed value with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file generated by an external process that parses the strings out of the C source files and saves each logged string along with its hash value.
After doing a little research, I found that this idea is not new, but I have not found an implementation of it in C. In other languages the idea has been worked out, but that is not the goal of my exercise. One example is this talk, where the same concept has been worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
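Whatever the build setup, the external extraction tool and the host-side decoder have to agree on one stable string hash. As a minimal sketch, assuming 32-bit FNV-1a (our choice for illustration; the post does not name a hash), the tool could compute this over each log-format literal and store the (hash, string) pair in the database:

```c
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the state, then multiply by the prime. */
uint32_t fnv1a_32(const char *s)
{
    uint32_t hash = 2166136261u;            /* FNV offset basis */
    while (*s != '\0') {
        hash ^= (unsigned char)*s++;
        hash *= 16777619u;                  /* FNV prime */
    }
    return hash;
}
```

FNV-1a is cheap and simple enough to reimplement identically in whatever language the extraction tool is written in, which matters more here than hash quality.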
Some of the requirements that I've set myself for this library are the following:
C code that is as portable as possible
COMPILE TIME optimization/hashing for the string hash conversion: a single log statement should be equivalent to just printf("%d\n", hashed_value) (assuming no parameters/arguments for that particular logging statement).
arguments can be passed to the logging statement, similar to the printf function.
users can define their own output function (console, file descriptor, sending the data directly over a UART connection, ...)
fast to run!! Fast to compile is nice to have, but compilation should not be terribly slow.
very easy to use, with no complicated API.
But what is a good approach to achieve this in C? I've tried several things now, but don't seem to have found a good method.
Here is an overview of what I've tried so far, along with the drawbacks:
Full preprocessor string hashing: I did get it working, but compile time is terribly slow. Also, this code does not feel very portable across C compilers.
Semi-preprocessor string hashing: the idea was to generate a hash for each string and create an external header file with a define for each string and its hash value. The problem here is that I cannot figure out a way of converting the string to the correct preprocessor define.
Letting go of the default logging macro with a string pointer: instead of the most common form, LOG_DEBUG("Some logging statement"), an external parser converts it to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the hashing problem, since the external parser substitutes the correct hash, but it is not the cleanest to read because the original statement becomes a comment.
Extending this idea to handle arguments also proved tricky. How can multiple variable types be handled as efficiently as possible?
I've tried some other methods, all without success. Especially when I want to add arguments, for example to log the value of a variable, it gets very complicated and I do not get the required result...
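A minimal sketch of what the device side of the LOG_RAW approach could look like once arguments are added, assuming every argument is an integer that fits in 32 bits (floats, 64-bit values and strings would each need their own encoding). The macro packs the hash and the arguments into one uint32_t compound literal and hands the raw words to a user-supplied output function; the host-side decoder would look the hash up in the database to learn how many words follow and how to format them. log_raw_words, log_set_output and log_write_fn are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* User-supplied output hook: console, file descriptor, UART, ... */
typedef void (*log_write_fn)(const void *data, size_t len);

static log_write_fn log_write = NULL;

void log_set_output(log_write_fn fn)
{
    log_write = fn;
}

/* words[0] is the string hash; words[1..count-1] are the arguments. */
void log_raw_words(const uint32_t *words, size_t count)
{
    if (log_write != NULL)
        log_write(words, count * sizeof *words);
}

/* LOG_RAW(45) or LOG_RAW(45, x, y): needs C99 compound literals.
   The operand of sizeof is not evaluated, so arguments run only once. */
#define LOG_RAW(...) \
    log_raw_words((const uint32_t[]){__VA_ARGS__}, \
                  sizeof((const uint32_t[]){__VA_ARGS__}) / sizeof(uint32_t))
```

Because the hash is the first element, the macro never has an empty variadic list, which keeps it within strict C99.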

A function returning "ENOTDIR", "EBUSY", etc. as strings?

strerror() function returns a short error description, given error number as argument. For example, if the argument is ENOTDIR, it will return "Not a directory", if the argument is EBUSY, it will return "Device or resource busy", etc.
But, is there a function that returns "ENOTDIR" for argument equals ENOTDIR, "EBUSY" for argument EBUSY, etc.?
I just want to avoid writing a huge switch statement for this purpose.
No, there is no standard or commonly used nonstandard function that provides this functionality.
One approach would be to write a huge switch statement, but this might not be the best approach for you to take. Most values of errno are not specified by any standard, so their values may or may not be consistent across different operating systems or even different versions of the same operating system.
Plus it would be a pain in the rear end.
A more elegant approach, if some runtime overhead is acceptable, would be to write a function that looks up these error codes when they occur, rather than hard-coding the values into a big table. GNU/Linux systems have a list of all possible errno values at:
/usr/include/asm-generic/errno-base.h
/usr/include/asm-generic/errno.h
These files provide a #define for each errno name along with its value and a short description in an adjacent comment. It would be fairly simple to search these files line by line and print the matching error code. Even if you don't do that, these files are the place to start in your quest to write a huge switch statement.
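A minimal sketch of the line-by-line search, assuming the header uses the plain `#define NAME value` layout seen in /usr/include/asm-generic/errno-base.h. errno_name_from_header is a hypothetical name; lines whose value is another macro (for example an alias defined as EAGAIN) simply fail the numeric match and are skipped.

```c
#include <stdio.h>
#include <string.h>

/* Scans an errno header for "#define NAME value" and copies into name the
   first NAME whose value equals code. Returns 1 on success, 0 otherwise. */
int errno_name_from_header(FILE *header, int code, char *name, size_t name_size)
{
    char line[256];
    char macro[64];
    int value;

    while (fgets(line, sizeof line, header) != NULL) {
        /* Needs both conversions: a macro name and a decimal value. */
        if (sscanf(line, " #define %63s %d", macro, &value) == 2
            && value == code) {
            if (strlen(macro) >= name_size)
                return 0;               /* caller's buffer is too small */
            strcpy(name, macro);
            return 1;
        }
    }
    return 0;
}
```

Usage would be along the lines of opening /usr/include/asm-generic/errno-base.h with fopen, calling the function, and falling back to errno.h in the same directory if nothing matches.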
Beware that the kernel might negate these values when they're passed to userspace.
As David points out above, the problem is that there is no standard function that provides the desired functionality. Thinking that this would be a neat problem to write a script for, I wrote a little something to automatically generate the switch function (if it should become necessary) and posted the code here. It seems to work alright on OS X; otherwise, mileage may vary. A script like this could be added to the build process to make sure the values are defined correctly.
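For reference, this is roughly the shape a generated switch function could take; it is a hand-written sketch, not the output of the script mentioned above. Each case is wrapped in #ifdef because not every platform defines every code, and aliases that share a value on some systems (such as EWOULDBLOCK and EAGAIN) would have to be emitted only once to avoid duplicate case labels. errno_name is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>

/* Returns the symbolic name for err, or NULL if it is not in the list. */
const char *errno_name(int err)
{
    switch (err) {
#ifdef EPERM
    case EPERM:   return "EPERM";
#endif
#ifdef ENOENT
    case ENOENT:  return "ENOENT";
#endif
#ifdef EBUSY
    case EBUSY:   return "EBUSY";
#endif
#ifdef ENOTDIR
    case ENOTDIR: return "ENOTDIR";
#endif
#ifdef EINVAL
    case EINVAL:  return "EINVAL";
#endif
    default:      return NULL;
    }
}
```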

The name and number of parameters

So far, I know that it's quite important for the parameters in your code to have suggestive names, so that anyone who has to read the code can understand it easily. But in terms of memory and run time, how important is it that a function doesn't take too many parameters, or that their names aren't too long? Is this something to be aware of, or does it not really matter for the efficiency of the code?
The name of the parameters/arguments matters absolutely zero at runtime. The compiler does not use the names when generating object code. They will not appear in your binary unless you take special effort to get them there. They are only for the human who reads the code. As such, they should be as long and descriptive as necessary, but no longer.
On the other hand, having too many parameters can indeed have a minor effect on the runtime speed of your code, since each time the function is called, every argument has to be passed (in registers or on the stack, depending on the calling convention). But that is really not the most significant issue. A bigger problem is usability: a function becomes very hard to understand and use [correctly] if it takes a bazillion parameters. Design your functions so that they are easy to use correctly and hard to use incorrectly. (It is also worth pointing out that a function that takes a lot of parameters is probably violating the single responsibility principle.)
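One common way to keep a long parameter list usable, shown here as a hypothetical example rather than something from the answer, is to group related parameters into a struct and pass it by pointer. C99 designated initializers then name each argument at the call site, and any field you leave out defaults to zero. rect_opts and rect_area are made-up names.

```c
#include <stdbool.h>

/* Instead of rect_area(int width, int height, int border, bool include_border) */
struct rect_opts {
    int width;
    int height;
    int border;           /* border thickness on every side */
    bool include_border;  /* count the border in the area? */
};

int rect_area(const struct rect_opts *opts)
{
    int w = opts->width;
    int h = opts->height;
    if (opts->include_border) {
        w += 2 * opts->border;
        h += 2 * opts->border;
    }
    return w * h;
}
```

A call then reads as `rect_area(&(struct rect_opts){ .width = 4, .height = 3, .border = 1, .include_border = true })`, which is much harder to get wrong than four bare integers in a row.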

What are the benefits to using BIO_printf() instead of printf()?

I have been reviewing example code for using OpenSSL and in every example I locate, the creator has chosen to use BIO_printf() to write things to stdout instead of printf().
I have taken their code, removed the openssl/bio.h header declaration, and changed all calls to BIO_printf() to regular printf() statements. The programs ran with identical results.
The problem I'm grappling with is why these coders use BIO_printf() when it takes a lot more setup than just using printf(). You have to include another header (which will increase program size) and set up a file pointer for the stream you want to write to; only then can you print your message to stdout. It seems a lot more complicated than using printf().
When I do a search on BIO_printf() it lists possible man pages for BIO_printf (3), but none of the pages actually contain any information!
I decided to do a benchmark test on both methods. I looped printf("Hey\n"); 1,000,000 times. Then I did it for BIO_printf(fp, "Hey\n");. I only timed the BIO_printf() statement and not the setting up of the file pointer (which would have increased the time). The difference came out to printf() being ~4.7x faster than using BIO_printf().
Why are they using it? What is the benefit? It's my understanding that in programming you either want code to be simple or efficient, and in the case of BIO_printf() it's neither.
In general, a BIO might not be writing to stdout.
You can have a BIO that writes to a file, or null, or a socket, or a network drive, or another BIO, etc.
By using the BIO_printf family, the code can easily be changed to have its output sent to a different location or another BIO which might do some further filtering and then pass the output onto wherever else.
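To make the stacking idea concrete without depending on OpenSSL, here is a plain-C analogue. struct sink, sink_printf and the write functions are hypothetical names, not OpenSSL's BIO API. Code written against sink_printf doesn't care whether its bytes end up on stdout, in a memory buffer, or pass through a filter first, which is the kind of flexibility the answer attributes to BIO_printf.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* A tiny stand-in for a BIO: a write callback, its context, and an
   optional downstream sink used by filters. */
struct sink {
    void (*write)(struct sink *self, const char *data, size_t len);
    void *ctx;
    struct sink *next;
};

/* Formats like printf (truncating at 255 bytes), then hands the bytes
   to whatever sink it was given. */
int sink_printf(struct sink *s, const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0) return n;
    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    s->write(s, buf, len);
    return (int)len;
}

/* Terminal sink: writes to a FILE * such as stdout. */
void file_write(struct sink *self, const char *data, size_t len)
{
    fwrite(data, 1, len, (FILE *)self->ctx);
}

/* Terminal sink: appends to a memory buffer, keeping it NUL-terminated. */
struct membuf { char data[256]; size_t len; };

void mem_write(struct sink *self, const char *data, size_t len)
{
    struct membuf *m = self->ctx;
    size_t room = sizeof m->data - 1 - m->len;
    if (len > room) len = room;
    memcpy(m->data + m->len, data, len);
    m->len += len;
    m->data[m->len] = '\0';
}

/* Filter sink: upper-cases the text, then passes it downstream. */
void upper_write(struct sink *self, const char *data, size_t len)
{
    char tmp[256];
    if (len > sizeof tmp) len = sizeof tmp;
    for (size_t i = 0; i < len; i++)
        tmp[i] = (char)toupper((unsigned char)data[i]);
    self->next->write(self->next, tmp, len);
}
```

Stacking is just pointing one sink's next at another: an upper-casing filter in front of a memory sink turns `sink_printf(&upper, "hey %d\n", 42)` into "HEY 42\n" in the buffer, with no change to the calling code.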
As others have pointed out, BIOs can be stacked, unlike FILE. snprintf() and vsnprintf() were added in C99, and OpenSSL/SSLeay is older than that, so the SSLeay developers had to write their own implementation. Unfortunately, a little-used implementation leads to the performance issues described by the OP, or to CVE-2016-0799.

Alternative to Hash Map for Small Data set in C

I am currently working on a command line interface for a particle simulator. Its parser reads input in the following format:
[command] [argument]* (-[flag] [flag argument])
Currently, the command is sent through a conditional block, compared to various known commands and its corresponding data packet is sent to the matching function. This, however, seems clunky, inefficient and inelegant.
I am thinking about using a hashmap instead, with a string representation of a command as the key and a function pointer as the value. The function referenced would then be sent a data packet containing arguments, flags, etc.
Is a hash map overkill in this situation? Does the extra infrastructure required to implement one outweigh the potential benefits? I am aiming for speed, elegance, function, and, since this is an open-source project, extensibility.
Thanks for the help.
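The string-to-function-pointer mapping described in the question doesn't have to be a hash map. For a few dozen commands, a minimal sketch is a sorted array of {name, handler} pairs searched with the standard bsearch; struct packet, the handlers and find_command are hypothetical names, and the handlers just return a number so the lookup can be checked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical data packet and handler type. */
struct packet { int argc; char **argv; };
typedef int (*handler_fn)(const struct packet *pkt);

static int cmd_help(const struct packet *pkt) { (void)pkt; return 1; }
static int cmd_run(const struct packet *pkt)  { (void)pkt; return 2; }
static int cmd_quit(const struct packet *pkt) { (void)pkt; return 3; }

struct command { const char *name; handler_fn fn; };

/* Must stay sorted by name (strcmp order) for bsearch to work. */
static const struct command commands[] = {
    { "help", cmd_help },
    { "quit", cmd_quit },
    { "run",  cmd_run  },
};

/* bsearch passes the key first and an array element second. */
static int compare_command(const void *key, const void *elem)
{
    const struct command *cmd = elem;
    return strcmp((const char *)key, cmd->name);
}

/* Returns the handler for name, or NULL if there is no such command. */
handler_fn find_command(const char *name)
{
    const struct command *cmd =
        bsearch(name, commands, sizeof commands / sizeof commands[0],
                sizeof commands[0], compare_command);
    return cmd != NULL ? cmd->fn : NULL;
}
```

Adding a command is one new row in the table, which keeps the extensibility the question asks for without any hashing or collision handling.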
You might want to consider the Ternary Search Tree. It has good performance and efficient use of storage, and you don't need a hash function or a collision strategy.
The linked Bentley/Sedgewick article is a very thorough yet readable explanation of the accompanying C source.
I've been using a TST for name lookup in the past 3 versions of my PostScript interpreter. The only changes needed have been due to changes in memory management. Here's a version I modified (lightly) to use explicit pointers. I use yet another version in my PostScript interpreter (any of the xpost2*.zip versions, in the file core.c), which uses byte offsets for pointers (these have to be added to the user-memory byte pointer to yield a real pointer).
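The linked source isn't reproduced here, so as a minimal sketch of the technique (written from the general Bentley/Sedgewick description, not taken from their code or from xpost), a TST mapping non-empty string keys to int values needs only an insert, a search and a free. Each node holds one split character and three children: lo and hi for smaller and larger characters at the same position, eq for the next character of the key.

```c
#include <stdlib.h>

struct tst_node {
    char split;
    struct tst_node *lo, *eq, *hi;
    int value;
    int has_value;      /* set on the node holding a key's last character */
};

/* Inserts key -> value and returns the (possibly new) subtree root.
   Empty keys are ignored; an allocation failure silently drops the key. */
struct tst_node *tst_insert(struct tst_node *n, const char *key, int value)
{
    if (*key == '\0') return n;
    if (n == NULL) {
        n = calloc(1, sizeof *n);
        if (n == NULL) return NULL;
        n->split = *key;
    }
    if (*key < n->split)
        n->lo = tst_insert(n->lo, key, value);
    else if (*key > n->split)
        n->hi = tst_insert(n->hi, key, value);
    else if (key[1] != '\0')
        n->eq = tst_insert(n->eq, key + 1, value);
    else {
        n->value = value;
        n->has_value = 1;
    }
    return n;
}

/* Returns 1 and stores the value if key is present, else returns 0. */
int tst_search(const struct tst_node *n, const char *key, int *value)
{
    if (*key == '\0') return 0;
    while (n != NULL) {
        if (*key < n->split) {
            n = n->lo;
        } else if (*key > n->split) {
            n = n->hi;
        } else if (key[1] == '\0') {
            if (!n->has_value) return 0;
            *value = n->value;
            return 1;
        } else {
            key++;
            n = n->eq;
        }
    }
    return 0;
}

void tst_free(struct tst_node *n)
{
    if (n == NULL) return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n);
}
```

For the command-line case, the stored int could be an index into a table of handler function pointers.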
The speed gained will probably be minimal, but you could hash the command to convert it to a number and then use a switch statement. That can be faster than a general-purpose hash map.
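A minimal sketch of the hash-then-switch idea, with one adjustment: case labels must be compile-time constants, so instead of a general hash it uses a deliberately trivial one, packing the first four bytes of the command into an integer via a hypothetical PACK4 macro. Different commands can share a four-byte prefix, so each case confirms the match with strcmp. The command names and enum are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

enum command_id { CMD_UNKNOWN, CMD_RUN, CMD_QUIT, CMD_STEP, CMD_SET, CMD_HELP };

/* Packs four characters into one integer; usable in case labels. */
#define PACK4(a, b, c, d) \
    ((uint32_t)(unsigned char)(a) << 24 | (uint32_t)(unsigned char)(b) << 16 | \
     (uint32_t)(unsigned char)(c) << 8  | (uint32_t)(unsigned char)(d))

/* The "hash": the first four bytes of the command, zero-padded. */
static uint32_t command_key(const char *s)
{
    unsigned char c[4] = { 0, 0, 0, 0 };
    for (int i = 0; i < 4 && s[i] != '\0'; i++)
        c[i] = (unsigned char)s[i];
    return PACK4(c[0], c[1], c[2], c[3]);
}

/* Switch on the key, then confirm the full string. */
enum command_id dispatch(const char *cmd)
{
    const char *expected;
    enum command_id id;
    switch (command_key(cmd)) {
    case PACK4('r', 'u', 'n', 0):   expected = "run";  id = CMD_RUN;  break;
    case PACK4('q', 'u', 'i', 't'): expected = "quit"; id = CMD_QUIT; break;
    case PACK4('s', 't', 'e', 'p'): expected = "step"; id = CMD_STEP; break;
    case PACK4('s', 'e', 't', 0):   expected = "set";  id = CMD_SET;  break;
    case PACK4('h', 'e', 'l', 'p'): expected = "help"; id = CMD_HELP; break;
    default:                        return CMD_UNKNOWN;
    }
    return strcmp(cmd, expected) == 0 ? id : CMD_UNKNOWN;
}
```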
