Looking for code to benchmark C lib string and memory functions

I'm looking for existing code I can use to benchmark C lib memory and string functions like memcpy, memset, strcpy, strcmp, etc. I've done a Google search and there are several hits for people who have done such benchmarking, but everybody seems to write their own code or doesn't mention what they used.
There is cachebench, which has tests for memset and memcpy. I'm wondering if there are any other popular benchmark suites with such tests? I don't want to reinvent the wheel here. Thanks.

Gnu.org provides a program which benchmarks:
strcpy
strcat
strlen
strcmp
strrchr
strstr

Related

Portable way to use the regex(3) functions on a wide char string in C

There are functions like regwcomp(3) etc. on some systems, but this does not seem to be a portable solution at the moment. When there is a wchar_t string, what is the suggested portable solution (not Linux or GNU specific) to use the regex(3) functions (which normally work with char strings only)? In my case it is not really necessary that the pattern or text to match is non-7-bit ASCII, the problem is that the code used wchar_t for other reasons.
If anyone else has this problem, feel free to borrow the functions my_regwcomp and my_regwexec that I had to write recently. You can find them in this source file in the ProofPower system. These functions simulate the regwcomp and regwexec functions of FreeBSD using the POSIX regcomp and regexec functions.
PS: my code is part of a Motif application. If you replace XtMalloc, XtRealloc and XtFree with malloc, realloc and free, it should work in any standard C/C++ development framework. Please add a comment to this answer if you need any help getting my functions working in your environment.
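For readers who only need the basic idea, here is a minimal sketch of that approach (not the ProofPower code itself): convert the wchar_t strings to multibyte strings with wcstombs and hand them to the POSIX regcomp and regexec. The names wcs_to_mbs and wregex_match are made up for illustration, and the sketch assumes every character is representable in the current locale, which is always true for plain ASCII.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to a freshly allocated
 * multibyte string in the current locale. Returns NULL on failure;
 * the caller frees the result. */
static char *wcs_to_mbs(const wchar_t *ws)
{
    size_t len = wcstombs(NULL, ws, 0);  /* POSIX: bytes needed, excluding the NUL */
    if (len == (size_t)-1)
        return NULL;                     /* a character was not representable */
    char *s = malloc(len + 1);
    if (s != NULL)
        wcstombs(s, ws, len + 1);
    return s;
}

/* Hypothetical wrapper: returns 1 on match, 0 on no match, -1 on error. */
int wregex_match(const wchar_t *pattern, const wchar_t *text)
{
    assert(pattern != NULL && text != NULL);

    int result = -1;
    char *p = wcs_to_mbs(pattern);
    char *t = wcs_to_mbs(text);
    regex_t re;

    if (p != NULL && t != NULL &&
        regcomp(&re, p, REG_EXTENDED | REG_NOSUB) == 0) {
        result = (regexec(&re, t, 0, NULL, 0) == 0) ? 1 : 0;
        regfree(&re);
    }
    free(p);
    free(t);
    return result;
}
```

If you extend this to report match positions, keep in mind that regexec returns byte offsets into the converted string, which equal wchar_t indices only when every character converts to a single byte.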

Linux kernel mode string copy

After a few minutes of searching for string copy in kernel mode, I thought I'd post a new question.
What utility will help me do a string copy when the code runs in kernel mode?
I'm sure there must be something, and in my current project I see code that uses memcpy() to do the job of strcpy().
Yes, I agree that it's just a single line of code for strcpy(), but you know that those may not be accepted in code reviews :)
EDIT: I'll put my question in a better way:
Can strcpy() be used in kernel mode (say, Linux kernel 2.6 and later)? If so, does it use libc or some other utility?
PS:
I see strcpy() http://livegrep.com/search/linux?q=strcpy is used in kernel source code.
I'm not sure exactly what you're looking for as an answer... But the Linux kernel provides strcpy() and memcpy().
If you're asking because strcpy() and friends are discouraged in code reviews, you could use strncpy(). Most of the traditional C string functions are available in the kernel, and most are declared in include/linux/string.h.
In the Linux source tree, various utility functions are in the lib sub-directory. There is the string.c file where functions like strcpy(), strcasecmp() and so on are provided.
But if you want robust code that passes code reviews, you should use the safer variants of these functions where possible. The documentation in the source tree lists the functions that are not safe and their proposed replacements:
Use strscpy() instead of strcpy(), strncpy() and strlcpy()
Use kstrtol(), kstrtoll(), kstrtoul(), kstrtoull() respectively instead of simple_strtol(), simple_strtoll(), simple_strtoul() and simple_strtoull()

Which string manipulation functions should I use?

On my Windows/Visual C environment there's a wide range of alternatives for doing the same basic string manipulation tasks.
For example, for doing a string copy I could use:
strcpy, the ANSI C standard library function (CRT)
lstrcpy, the version included in kernel32.dll
StrCpy, from the Shell Lightweight Utility library
StringCchCopy/StringCbCopy, from a "safe string" library
strcpy_s, security enhanced version of CRT
While I understand that all these alternatives exist for historical reasons, can I just choose a consistent set of functions for new code? And which one? Or should I choose the most appropriate function case by case?
First of all, let's review pros and cons of each function set:
ANSI C standard library function (CRT)
Functions like strcpy are the one and only choice if you are developing portable C code. Even in a Windows-only project, it might be a wise thing to have a separation of portable vs. OS-dependent code.
These functions often have assembly-level optimizations and are therefore very fast.
There are some drawbacks:
they have many limitations and therefore often you still have to call functions from other libraries or provide your own versions
there are some archaisms like the infamous strncpy
Kernel32 string functions
Functions like lstrcpy are exported by kernel32 and should be used only when trying to avoid any dependency on the CRT. You might want to do that for two reasons:
avoiding the CRT payload for an ultra lightweight executable (unusual these days but not in the 90s!)
avoiding initialization issues (if you launch a thread with CreateThread instead of _beginthread).
Moreover, the kernel32 function could be more optimized than the CRT version: when your executable runs on Windows 12 optimized for a Core i13, kernel32 could use an assembly-optimized version.
Shell Lightweight Utility Functions
The same considerations as for the kernel32 functions apply here, with the added value of some more complex functions. However, I doubt that they are actively maintained and I would just skip them.
StrSafe Functions
The StringCchCopy/StringCbCopy functions are usually my personal choice: they are very well designed, powerful, and surprisingly fast (I also remember a whitepaper that compared performance of these functions to the CRT equivalents).
Security-Enhanced CRT functions
These functions have the undoubted benefit of being very similar to ANSI C equivalents, so porting legacy code is a piece of cake. I especially like the template-based version (of course, available only when compiling as C++). I really hope that they will be eventually standardized. Unfortunately they have a number of drawbacks:
although a proposed standard, they have been basically rejected by the non-Windows community (probably just because they came from Microsoft)
when they fail, they don't just return an error code but execute an invalid parameter handler
Conclusions
While my personal favorite for Windows development is the StrSafe library, my advice is to use the ANSI C functions whenever possible, as portable code is always a good thing.
In real life, I developed a personalized portable library, with prototypes similar to the Security-Enhanced CRT functions (including the powerful template-based technique), that relies on the StrSafe library on Windows and on the ANSI C functions on other platforms.
My personal preference, for both new and existing projects, is the StringCchCopy/StringCbCopy versions from the safe string library. I find these functions to be overall very consistent and flexible. And they were designed from the ground up with safety / security in mind.
I'd answer this question slightly differently. Do you want to have portable code or not? If you want to be portable you cannot rely on anything else but strcpy, strncpy, or the standard wide character "string" handling functions.
Then if your code just has to run under Windows you can use the "safe string" variants.
If you want to be portable and still want to have some extra safety, then you should check cross-platform libraries such as
glib or
libapr
or other "safe string libraries" such as:
SafeStrLibrary
I would suggest using functions from the standard library, or functions from cross-platform libraries.
I would stick to one set. I would pick whichever one comes from the most useful library, in case you need to use more of it, and I would stay away from the kernel32.dll one as it's Windows-only.
But these are just tips, it's a subjective question.
Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.
If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
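To illustrate the "return an error status" option, here is a minimal sketch. The name copy_checked and its return convention are made up for this example. It relies on the C99 rule that snprintf returns the length the full output would have had, so comparing that value against the buffer size detects truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst only if it fits completely.
 * Returns 0 on success, -1 if src (plus its NUL) needs more than
 * dstsize bytes. On failure dst is left as an empty string, so a
 * truncated value is never used by accident. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);

    int r = snprintf(dst, dstsize, "%s", src);
    if (r < 0 || (size_t)r >= dstsize) {
        if (dstsize > 0)
            dst[0] = '\0';   /* discard the truncated copy */
        return -1;
    }
    return 0;
}
```

The caller then decides what to do on -1: allocate a larger buffer, or propagate the error.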
If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:
strcpy(out, str1);
strcat(out, str2);
strcat(out, str3);
...
You should be doing something like:
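size_t l, n = outsize;   /* n = bytes still free in out */
char *s = out;           /* s = current write position */
l = strlen(str1);
if (l>=n) goto error;    /* need l bytes plus the terminating NUL */
strcpy(s, str1);
s += l;
n -= l;
l = strlen(str2);
if (l>=n) goto error;    /* compare against the remaining space, not outsize */
strcpy(s, str2);
s += l;
n -= l;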
...
Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:
if (!my_strcpy(&s, &n, str1)) goto error;
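A minimal sketch of what such a helper could look like follows. The name my_strcpy and its arguments come from the call above; the body and the "nonzero on success" convention are one possible reading of it, not a reference implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src at *s if it fits into the *n bytes that remain
 * (including room for the terminating NUL). On success, advance *s,
 * shrink *n and return 1. On failure, return 0 and change nothing. */
int my_strcpy(char **s, size_t *n, const char *src)
{
    assert(s != NULL && *s != NULL && n != NULL && src != NULL);

    size_t l = strlen(src);
    if (l >= *n)
        return 0;            /* would overflow the remaining space */
    memcpy(*s, src, l + 1);  /* length already known, copy including NUL */
    *s += l;                 /* next write starts on top of the NUL */
    *n -= l;
    return 1;
}
```

Because *s always points at the current terminator, each call costs time proportional to the piece being appended rather than to the whole string built so far.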
Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.
Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in scripting languages where putting together strings is what you do all the time, but in C it's not useful that often. In many cases, you can get by with never copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where OP was using regexec to match a regular expression, then copying out the result just to print it, something like:
char *tmp = malloc(match.end-match.start+1);
memcpy(tmp, src+match.start, match.end-match.start);
tmp[match.end-match.start] = 0;
printf("%s\n", tmp);
free(tmp);
The same thing can be accomplished with:
printf("%.*s\n", (int)(match.end-match.start), src+match.start);
No allocations, no cleanup, no error cases (the original code crashed if malloc failed).

A pure bytes version of strstr?

Is there a version of strstr that works over a fixed length of memory that may include null characters?
I could phrase my question like this:
strncpy is to memcpy as strstr is to ?
memmem. Unfortunately it's a GNU extension rather than standard C. However, it's open-source so you can copy the code (if the license is amenable to you).
Not in the standard library (which is not that large, so take a look). However writing your own is trivial, either directly byte by byte or using memchr() followed by memcmp() iteratively.
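For what it's worth, here is a minimal sketch of the memchr()/memcmp() variant. The name my_memmem is made up to avoid clashing with a system memmem; the argument order mirrors the GNU function, and the choice to match an empty needle at the start follows strstr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of needle (nlen bytes) inside haystack
 * (hlen bytes). Both may contain NUL bytes. Returns a pointer to the
 * match, or NULL if there is none. */
void *my_memmem(const void *haystack, size_t hlen,
                const void *needle, size_t nlen)
{
    assert(haystack != NULL || hlen == 0);
    assert(needle != NULL || nlen == 0);

    const unsigned char *h = haystack;
    const unsigned char *nd = needle;

    if (nlen == 0)
        return (void *)h;        /* empty needle matches at the start */
    if (hlen < nlen)
        return NULL;

    /* last position at which a full-length match could still start */
    const unsigned char *last = h + (hlen - nlen);

    while (h <= last) {
        /* jump to the next candidate: a byte equal to the needle's first */
        const unsigned char *p = memchr(h, nd[0], (size_t)(last - h) + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, nd, nlen) == 0)
            return (void *)p;
        h = p + 1;               /* no match here, continue after it */
    }
    return NULL;
}
```

This is the simple quadratic worst-case approach; it is usually fine for short needles, but a library memmem may use a smarter algorithm.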
In the standard library, no. However, a quick google search for "safe c string library" turns up several potentially useful results. Without knowing more about the task you are trying to perform, I cannot recommend any particular third-party implementation.
If this is the only "safe" function that you need beyond the standard functions, then it may be best to roll your own rather than expend the effort of integrating a third-party library, provided you are confident that you can do so without introducing additional bugs.

string.c test suite

For practice I decided to rewrite string.c in C.
I cannot find online (via Google or other search engines) a test suite for either the entire standard or specifically for string.h.
More specifically I am looking to see if my strcat(), strncat(), strchr(), strrchr(), strcmp(), strncmp(), strcpy(), strncpy(), strlen(), and strstr() functions conform to ISO/IEC 9899:1990 ("ISO C90").
Can anyone recommend a good way to test these functions?
edit: the title and paragraph above used to say string.h - I meant string.c
P.J. Plauger, in his book The Standard C Library (which covers implementing the whole library) presents test programs for each part of the library. These are not exhaustive, and I don't have an on-line source for them, but the book is definitely of interest to anyone implementing the library.
Compiler and library validation test suites are readily available. IIRC, Rogue Wave was in that business. But they cost money; companies that use them tend to have deep pockets.
The GNU folks have them too, this looks like a good place to start. You'll have to do some digging to isolate the string.h tests. But, don't hesitate to write these tests yourself. It strikes me as a better way to learn what the string function should and should not do than reimplementing the functions.
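As a starting point for writing such tests yourself, here is a minimal sketch of a differential test: run your function and the platform's version over the same inputs and compare the complete destination buffers. my_strncpy stands in for your own implementation and check_strncpy is a made-up name; the sketch assumes the platform's strncpy is itself conforming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical implementation under test (replace with your own). */
static char *my_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    for (; i < n && src[i] != '\0'; i++)
        dst[i] = src[i];
    for (; i < n; i++)       /* the standard requires NUL padding up to n */
        dst[i] = '\0';
    return dst;
}

/* Compare my_strncpy against the library strncpy over a small grid of
 * inputs and counts. Returns the number of mismatching cases. */
int check_strncpy(void)
{
    static const char *const inputs[] = { "", "a", "abc", "abcdefg" };
    static const size_t counts[] = { 0, 1, 3, 4, 8 };
    int failures = 0;

    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++) {
        for (size_t j = 0; j < sizeof counts / sizeof counts[0]; j++) {
            char expect[8], actual[8];
            assert(counts[j] <= sizeof expect);

            /* pre-fill so bytes that must stay untouched are checked too */
            memset(expect, 'x', sizeof expect);
            memset(actual, 'x', sizeof actual);

            strncpy(expect, inputs[i], counts[j]);
            char *ret = my_strncpy(actual, inputs[i], counts[j]);

            if (ret != actual ||
                memcmp(expect, actual, sizeof expect) != 0)
                failures++;
        }
    }
    return failures;
}
```

Comparing whole buffers rather than strings catches the classic strncpy mistakes: missing NUL padding, and writing a terminator when the source does not fit.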

Resources