A pure bytes version of strstr? - c

Is there a version of strstr that works over a fixed length of memory that may include null characters?
I could phrase my question like this:
strncpy is to memcpy as strstr is to ?

memmem, unfortunately it's GNU-specific rather than standard C. However, it's open-source so you can copy the code (if the license is amenable to you).

Not in the standard library (which is not that large, so take a look). However writing your own is trivial, either directly byte by byte or using memchr() followed by memcmp() iteratively.

In the standard library, no. However, a quick google search for "safe c string library" turns up several potentially useful results. Without knowing more about the task you are trying to perform, I cannot recommend any particular third-party implementation.
If this is the only "safe" function that you need beyond the standard functions, then it may be best to roll your own rather than expend the effort of integrating a third-party library, provided you are confident that you can do so without introducing additional bugs.

Related

Is there a particular reason for memmem being a GNU extension?

In C, the memmem function is used to locate a particular sequence of bytes in a memory area. It can be assimilated to strstr, which is dedicated to null-terminated strings.
Is there any particular reason for this function to be available as a GNU extension, and not directly in the standard libraries? The manual states :
This function was broken in Linux libraries up to and including libc 5.0.9; there the needle and haystack arguments were interchanged, and a pointer to the end of the first occurrence of needle was returned.
Both old and new libc's have the bug that if needle is empty, haystack-1 (instead of haystack) is returned. And glibc 2.0 makes it worse, returning a pointer to the last byte of haystack. This is fixed in glibc 2.1.
I can see it went through several fixes, yet I'd like to know why it was not made as directly available (if not more) as strstr on some distributions. Does it still bring up implementation issues?
Edit (motivations): I wouldn't ask this question if the standard had decided it the other way around: including memmem but not strstr. Indeed, strstr could be something like:
memmem(str, strlen(str), "search", 6);
Slightly trickier, but still a pretty logical one-liner considering that it is very usual in C functions to require both the data chunk and its length.
Edit (2): another motivation from comments and answers. Quoting Theolodis:
Not every function is necessary to every single, or at least most of the C developers, so it would actually make the standard libraries unnecessarily huge.
Well, I couldn't agree more, I'm always in when it comes to making the librairies lighter and faster. But then... why both strncpy and memcpy (from keltar's comment)...? I could almost ask: why has poor memmem been "black-sheeped"?
Historically, that is before the first revision of the Standard, C has been made by compiler writers.
In the case of strstr, it is a little bit different because it has been introduced by the C Committee, the C89 Rationale document tells us that:
"The strstr function is an invention of the Committee. It is included as a hook for efficient algorithms, or for built-in substring instruction."
The C Committee does not explain why it has not made a more general function not limited to strings so any reasoning may only be speculation. My only guess is the use case has been considered not important enough to have a generic memmem instead of strstr. Remember that in the goals of C there is this requirement (in the C99 Rationale) "Keep the language small and simple". Also even POSIX didn't consider it for inclusion.
In any case to my knowledge nobody has proposed any Defect Report or proposal to have memmem included.

Stand-alone portable snprintf(), independent of the standard library?

I am writing code for a target platform with NO C-runtime. No stdlib, no stdio. I need a string formatting function like snprintf but that should be able to run without any dependencies, not even the C library.
At most it can depend on memory alloc functions provided by me.
I checked out Trio but it needs stdio.h header. I can't use this.
Edit
Target platform : PowerPC64 home made OS(not by me). However the library shouldn't rely on OS specific stuff.
Edit2
I have tried out some 3rd-party open source libs, such as Trio(http://daniel.haxx.se/projects/trio/), snprintf and miniformat(https://bitbucket.org/jj1/miniformat/src) but all of them rely on headers like string.h, stdio.h, or(even worse) stdlib.h. I don't want to write my own implementation if one already exists, as that would be time-wasting and bug-prone.
Try using the snprintf implementation from uclibc. This is likely to have the fewest dependencies. A bit of digging shows that snprintf is implemented in terms of vsnprintf which is implemented in terms of vfprintf (oddly enough), it uses a fake "stream" to write to string.
This is a pointer to the code: http://git.uclibc.org/uClibc/tree/libc/stdio/_vfprintf.c
Also, a quick google search also turned up this:
http://www.ijs.si/software/snprintf/
http://yallara.cs.rmit.edu.au/~aholkner/psnprintf/psnprintf.html
http://www.jhweiss.de/software/snprintf.html
Hopefully one is suitable for your purposes. This is likely to not be a complete list.
There is a different list here:
http://trac.eggheads.org/browser/trunk/src/compat/README.snprintf?rev=197
You will probably at least need stdarg.h or low level knowledge of the specific compiler/architecture calling convention in order to be able to process the variadic arguments.
I have been using code based on Kustaa Nyholm's implementation It provides printf() (with user supplied character output stub) and sprintf(), but adding snprintf() would be simple enough. I added vprintf() and vsprintf() for example in my implementation.
No dynamic memory application is required, but it does have a dependency on stdarg.h, but as I said, you are unlikely to be able to get away without that for any variadic function - though you could potentially implement your own.
I am guessing you are in a norming enivronment where you need to explicitly document and verify COTS code.
However, I think in the case of stdarg.h this is worthwhile. You could pull in the source for just this and treat it like handwritten code (review, lint, unit-test, etc.). Any self-written replacement will be a lot of work, probably less stable and absolutely not portable.
That said, the actual snprintf implementation should not be too hard, and you could do this yourself, probably. Especially if you might be able to strip a few features away.
Keep in mind that vararg code has no typechecking and is prone to errors. For library snprintf you may find gcc's warnings helpful.

Which string manipulation functions should I use?

On my Windows/Visual C environment there's a wide number of alternatives for doing the same basic string manipulation tasks.
For example, for doing a string copy I could use:
strcpy, the ANSI C standard library function (CRT)
lstrcpy, the version included in kernel32.dll
StrCpy, from the Shell Lightweight Utility library
StringCchCopy/StringCbCopy, from a "safe string" library
strcpy_s, security enhanced version of CRT
While I understand that all these alternatives have an historical reason, can I just choose a consistent set of functions for new code? And which one? Or should I choose the most appropriate function case by case?
First of all, let's review pros and cons of each function set:
ANSI C standard library function (CRT)
Functions like strcpy are the one and only choice if you are developing portable C code. Even in a Windows-only project, it might be a wise thing to have a separation of portable vs. OS-dependent code.
These functions have often assembly level optimization and are therefore very fast.
There are some drawbacks:
they have many limitations and therefore often you still have to call functions from other libraries or provide your own versions
there are some archaisms like the infamous strncpy
Kernel32 string functions
Functions like lstrcpy are exported by kernel32 and should be used only when trying to avoid any dependency to the CRT. You might want to do that for two reasons:
avoiding the CRT payload for an ultra lightweight executable (unusual these days but not in the 90s!)
avoiding initialization issues (if you launch a thread with CreateThread instead of _beginthread).
Moreover, the kernel32 function could be more optimized that the CRT version: when your executable will run on Windows 12 optimized for a Core i13, kernel32 could use an assembly-optimized version.
Shell Lightweight Utility Functions
Here are valid the same considerations made for the kernel32 functions, with the added value of some more complex functions. However I doubt that they are actively maintained and I would just skip them.
StrSafe Function
The StringCchCopy/StringCbCopy functions are usually my personal choice: they are very well designed, powerful, and surprisingly fast (I also remember a whitepaper that compared performance of these functions to the CRT equivalents).
Security-Enhanced CRT functions
These functions have the undoubted benefit of being very similar to ANSI C equivalents, so porting legacy code is a piece of cake. I especially like the template-based version (of course, available only when compiling as C++). I really hope that they will be eventually standardized. Unfortunately they have a number of drawbacks:
although a proposed standard, they have been basically rejected by the non-Windows community (probably just because they came from Microsoft)
when fail, they don't just return an error code but execute an invalid parameter handler
Conclusions
While my personal favorite for Windows development is the StrSafe library, my advice is to use the ANSI C functions whenever is possible, as portable-code is always a good thing.
In the real life, I developed a personalized portable library, with prototypes similar to the Security-Enhanced CRT functions (included the powerful template based technique), that relies on the StrSafe library on Windows and on the ANSI C functions on other platforms.
My personal preference, for both new and existing projects, are the StringCchCopy/StringCbCopy versions from the safe string library. I find these functions to be overall very consistent and flexible. And they were designed from the groupnd up with safety / security in mind.
I'd answer this question slightly different. Do you want to have portable code or not? If you want to be portable you can not rely on anything else but strcpy, strncpy, or the standard wide character "string" handling functions.
Then if your code just has to run under Windows you can use the "safe string" variants.
If you want to be portable and still want to have some extra safety, than you should check cross-platform libraries like e.g
glib or
libapr
or other "safe string libraries" like e.g:
SafeStrLibrary
I would suggest using functions from the standard library, or functions from cross-platform libraries.
I would stick to one, I would pick whichever one is in the most useful library in case you need to use more of it, and I would stay away from the kernel32.dll one as it's windows only.
But these are just tips, it's a subjective question.
Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.
If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:
strcpy(out, str1);
strcat(out, str2);
strcat(out, str3);
...
You should be doing something like:
size_t l, n = outsize;
char *s = out;
l = strlen(str1);
if (l>=outsize) goto error;
strcpy(s, str1);
s += l;
n -= l;
l = strlen(str2);
if (l>=outsize) goto error;
strcpy(s, str2);
s += l;
n -= l;
...
Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:
if (!my_strcpy(&s, &n, str1)) goto error;
Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.
Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in script languages where putting together strings is what you do all the time, but in C it's not useful that often. In many cases, you can get by with never copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where OP was using regexec to match a regular expression, then copying out the result just to print it, something like:
char *tmp = malloc(match.end-match.start+1);
memcpy(tmp, src+match.start, match.end-match.start);
tmp[match.end-match.start] = 0;
printf("%s\n", tmp);
free(tmp);
The same thing can be accomplished with:
printf("%.*s\m", match.end-match.start, src+match.start);
No allocations, no cleanup, no error cases (the original code crashed if malloc failed).

Should I use secure versions of POSIX functions on MSVC - C

I am writing some C code which is expected to compile on multiple compilers (at least on MSVC and GCC). Since I am beginner in C, I have all warnings turned on and warnings are treated as errors (-Werror in GCC & /WX in MSVC) to prevent me from making silly mistakes.
When I compiled some code that uses strcpy on MSVC, I get warning like,
warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
I am bit confused. Lot of common functions are deprecated on MSVC. Should I use this secured version when on Windows? If yes, should I wrap strcpy something like,
my_strcpy()
{
#ifdef WIN32
// use strcpy_s
#ELSE
// use strcpy
}
Any thoughts?
Whenever you move data between non-constant-size buffers, you have to (gasp! omg!) actually think about whether it fits. Using functions (like the MS-specific strcpy_s or the BSD strlcpy) that purport to be "safe" will protect you from some obvious buffer overflow conditions, but won't protect you from the bugs that result from string truncation. It also won't protect you from integer overflows in computing the necessary sizes of buffers.
Unless you're an expert dealing with C strings, I would recommend forgetting about special functions and commenting every line of your code that will perform variable-length/position writes with a justification for how you know, at this point in the program, that the length/offset you're about to use is within the bounds of the size of the buffer. Do this for lines where you perform arithmetic on sizes/offsets too - document how you know that the arithmetic will not overflow, and add tests for overflow if you find you don't know.
Another approach is to completely wrap all your string handling in a string object that stores the length of the buffer along with the string and automatically reallocates when a string needs to be enlarged, and then only use const char * for read-only access to strings when you need to pass them to system functions or other libraries. This will sacrifice a good bit of the performance you'd expect from C, but it will help you ensure that you don't make mistakes. Just don't take it to the extreme. There's no need to duplicate stuff like strchr, strstr, etc. in your string wrapper. Just provide methods to duplicate string objects, concatenate them, and truncate them, and then with the existing library functions that operate on const char * you can do just about anything you'd want to.
There are lots and lots of discussions about this topic here on SO. The usual suspects like strncpy, strlcpy and whatever will pop up here again, I'm sure. Just type "strcpy" in the search box and read some of the longer threads to get an overview.
My advice is: Whatever your final choice will be, it is a good idea to follow the DRY principle and continue to do it as in your example of my_strcpy(). Don't throw the raw calls all over your code, use wrappers and centralize them in your own string handling library. This will reduce overall code (boilerplate), and you have one central location to make modifications, if you change your mind later.
Of course this opens up some other cans of worms, especially for a beginner: Memory handling responsibility and interface design. Both a topic on its own, and 5 people will give you 10 suggestions of how to do it. A central library usually has the nice effect that it enforces a decision, which you will follow throughout your whole codebase, instead of using method a in module A and method b in module B, causing you trouble when you try to connect A with B...
I would tend to use the safer function snprintf † which is available on both platforms rather than having different paths depending on platform. You will need to use the define to prevent the warnings on MSVC.
† though possibly slightly less safer - it will return a string which is not nul-terminated on error, so you must check the return, but it won't cause a buffer overflow.

Safer Alternatives to the C Standard Library

The C standard library is notoriously poor when it comes to I/O safety. Many functions have buffer overflows (gets, scanf), or can clobber memory if not given proper arguments (scanf), and so on. Every once in a while, I come across an enterprising hacker who has written his own library that lacks these flaws.
What are the best of these libraries you have seen? Have you used them in production code, and if so, which held up as more than hobby projects?
I use GLib library, it has many good standard and non standard functions.
See https://developer.gnome.org/glib/stable/
and maybe you fall in love... :)
For example:
https://developer.gnome.org/glib/stable/glib-String-Utility-Functions.html#g-strdup-printf
explains that g_strdup_printf is:
Similar to the standard C sprintf() function but safer, since it calculates the maximum space required and allocates memory to hold the result.
This isn't really answering your question about the safest libraries to use, but most functions that are vulnerable to buffer overflows that you mentioned have safer versions which take the buffer length as an argument to prevent the security holes that are opened up when the standard methods are used.
Unless you have relaxed the level of warnings, you will usually get compiler warnings when you use the deprecated methods, suggesting you use the safer methods instead.
I believe the Apache Portable Runtime (apr) library is safer than the standard C library. I use it, well, as part of an apache module, but also for independent processes.
For Windows there is a 'safe' C/C++ library.
You're always at liberty to implement any library you like and to use it - the hard part is making sure it is available on the platforms you need your software to work on. You can also use wrappers around the standard functions where appropriate.
Whether it is really a good idea is somewhat debatable, but there is TR24731 published by the C standard committee - for a safer set of C functions. There's definitely some good stuff in there. See this question: Do you use the TR 24731 Safe Functions in your C code?, which includes links to the technical report.
Maybe the first question to ask is if your really need plain C? (maybe a language like .net or java is an option - then e.g. buffer overflows are not really a problem anymore)
Another option is maybe to write parts of your project in C++ if other higher level languages are not an option. You can then have a C interface which encapsulates the C++ code if you really need C.
Because if you add all the advanced functions the C++ standard library has build in - your C code would only be marginally faster most times (and contain a lot more bugs than an existing and tested framework).

Resources