Usage of SafeStr in C - c

I am reading about using of safe strings at following location
https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=5111861
It is mentioned as below.
SafeStr strings, when used properly, can eliminate many of these errors and provide backward compatibility to legacy code as well.
My question is what does author mean by "provide backward compatibility to legacy code as well." ? Request to explain with example.
Thanks for your time and help

It means that functions from the standard libc (and others) which expects plain, null terminated char arrays, will work even on those SafeStrs. This is probably achieved by putting a control structure at a negative offset (or some other trick) from the start of the string.
Examples: strcmp() printf() etc can be used directly on the strings returned by SafeStr.
In contrast, there are also other string libraries for C which are very "smart" and dynamic, but these strings can not be sent without conversion to "old school" functions.

From that page:
The library is based on the safestr_t type which is completely
compatible with char *. This allows casting of safestr_t structures to
char *.
That's some backward compatibility with all the existing code that takes char * or const char * pointers.

Related

Is passing empty strings ("") in C bad practice?

I'm trying to fix a bug in very old C code that just popped up recently on Windows after we updated Visual Studio to version 2017. The code still runs on several linux platforms. That tells me we're probably relying on some undefined behavior, and we previously got lucky.
We have a bunch of function calls like this:
get_some_data(parent, "ge", "", type);
Running in debug, I noticed that upon entry to this function, that empty string is immediately filled with garbage, before the function has done anything. The function is declared like this:
static void get_some_data(
KEY Parent,
char *Prefix,
char *Suffix,
ENT EntType)
So is it unwise to pass the strings directly ("ge", "")? I know it's trivial to fix this case by declaring char *suffix="" and passing suffix instead of "", but I'm now questioning whether I need to go through this entire suite of code looking for this type of function call.
So is it unwise to pass the strings directly ("ge", "")?
There is nothing inherently wrong with passing string literals to functions in general.
However, it is unwise to pass pointers (in)to string literals specifically to parameters declared as pointers to non-const char, because C specifies that undefined behavior results from attempting to modify a string literal, and in practice, that UB often manifests as abrupt program termination. If a function declares a parameter as const char * then you can reasonably take that as a promise that it will not attempt to modify the target of that pointer -- which is what you need to ensure -- but if it declares a parameter as just char * then no such promise is made, and the function doesn't even have a way to check at runtime whether the argument is writable.
Possibly you can rely on documentation in place of const-qualification, for you're ok in this regard as long as no attempt is made in practice to modify a string literal, but that still leaves you more open to bugs than you otherwise would be.
I know it's trivial to fix this case by declaring char *suffix="" and passing suffix instead of ""
Such a change may disguise what you're doing from the compiler, so that it does not warn about the function call, but it does not fix anything. The same pointer value is passed to the function either way, and the same semantics and constraints apply. Also, if the compiler warned about the function call then it should also warn about the assignment.
This is not an issue in C++, by the way, or at least not the same issue, because in C++, string literals represent arrays of const char in the first place.
, but I'm now questioning whether I need to go through this entire suite of code looking for this type of function call.
Better might be to modify the signatures of the called functions. Where you intend for it to be ok to pass a string literal, ensure that the parameter has type const char *, like so:
static void get_some_data(
KEY Parent,
const char *Prefix,
const char *Suffix,
ENT EntType)
But do note that is highly likely to cause new warnings about violations of const-correctness. To ensure safety, you need to fix these, too, without casting away constness. This could well cascade broadly, but the exercise will definitely help you identify and fix places where your code was mishandling string literals.
On the other hand, a genuine fix that might be less pervasive would be to pass pointers to modifiable arrays instead of (unmodifiable) string literals. Perhaps that's what you had in mind with your proposed fix, but the correct way to do that is this:
char prefix[] = "ge";
char suffix[] = "";
get_some_data(parent, prefix, suffix, type);
Here, prefix and suffix are separate (modifiable) local arrays, initialized with copies of the string literals' contents.
With all that said, I'm inclined to suspect that if you're getting bona fide runtime errors related to these arguments with VS-compiled executables but not GCC-compiled ones, then the source of those is probably something else. My first guess would be that array bounds are being overrun. My second guess would be that you are compiling C code as C++, and running afoul of one or more of the (other) differences between them.
That's not to say that you shouldn't take a good look at the constness / writability concerns involved here, but it would suck to go through the whole exercise just to find out that you were ok to begin with. You could still end up with better code, but that's a little tricky to sell to the boss when they ask why the bug hasn't been fixed yet.
No, there is absolutely nothing wrong with passing a string literal, empty or not. Quite the opposite — if you try to "fix" your code by doing your trivial change, you will hide the bug and make life harder for whoever is going to fix it in future.
I encountered this same problem and the cause was a similar situation in a recently executed function where, crucially, the contents of the string were being changed.
Prior to the call where this strange behaviour was being observed, there was a call to older code with a parameter of PSTR type. Not realizing that the contents of that parameter were going to be changed, a programmer had supplied an empty string. The code was only ever updating the first character so declaring a char type parameter and supplying the address of that was sufficient to solve the problem that was exhibiting in the later call.

Does some kind of 'strcmpf' implementation exists?

I am looking for a function that checks if a string follows (matches exactly) the pattern of data specified by the additional arguments corresponding to the format string
Like this:
/* int strcmpf (char *str1, char *format, ...); */
char *test = "Hello World !"
if(!strcmpf(test, "%s%*s %c", "Hello ", '!')
return HELLO_HAS_BEEN_SAID;
else
return PROGRAM_ISNT_POLITE;
Implementing this myself I assume will be very difficult, but such function could be very useful for semantic parsing of content. Before I attempt to write such function myself, I need to know if there is already a library or code snippet that provides implementation of a function like this ?
To be more specific, I need pattern-matching behavior. So test must match exactly the pattern specified by the data corresponding to the format parameter.
I need to know if there is already a library or code snippet that provides implementation of a function like this
The standard library has no such functionality. Requests for third-party library recommendations are off-topic here, but to the extent that I understand the functionality you want, I am anyway unaware of an existing third-party implementation.
As I said in comments, I suggest that you design the pattern-matching aspect around bona fide regular expressions instead of around printf() or scanf() formats (which are not entirely the same). There are several regular expression libraries available to support that part.

Why there is no strnchr function?

As the other C library function, like strcpy, strcat, there is a version which limits the size of string (strncpy, etc.), I am wondering why there is no such variant for strchr?
It does exist -- it is called memchr:
http://en.wikipedia.org/wiki/C_string_handling
In C, the term "string" usually means "null terminated array of characters", and the str* functions operate on those kinds of strings. The n in the functions you mention is mostly for the sake of controlling the output.
If you want to operate on an arbitary byte sequence without any implied termination semantics, use the mem* family of functions; in your case memchr should serve your needs.
strnchr and also strnlen are defined in some linux environments, for example https://manpages.debian.org/experimental/linux-manual-4.11/strnchr.9.en.html. It is really necessary. A program may crash on end of memory area if strlen or strcmp do not found any \0-termination. Unfortunately such things often are not standardized or too late and too sophisticated standardized. strnlen_s is existing in C11, but strnchr_s is not available.
You may found some more information about such problems in my internet page: https://www.vishia.org/emcdocu/html/portability_emC.html. I have defined some C-functions strnchr_emC... etc. which delivers the required functionality. To achieve compatibility you can define
#define strnchr strnchr_emC
In a common header but platform-specific. Refer the further content on https://www.vishia.org/emc/. You find the sources in https://github.com/JzHartmut

Type-casting in C for standard function

I am working on a C project (still pretty new to C), and I am trying to remove all of the warnings when it's compiled.
The original coders of this project have made a type called dyn_char (dynamic char arr) and it's an unsigned char * type. Here's a copy of one of the warnings:
warning: argument #1 is incompatible with prototype: prototype:
pointer to char : ".../stdio_iso.h", line 210 argument : pointer to
unsigned char
They also use lots of standard string functions like strlen(); so the way that I have been removing these warnings is like this:
strlen((char *)myDynChar);
I can do this but some of the files have hundreds of these warnings. I could do a Find and Replace to search for strlen( and replace with strlen((char*), but is there a better way?
Is it possible to use a Macro to do something similar? Maybe something like this:
#define strlen(s) strlen((char *)s)
Firstly, would this work? Secondly, if so, is it a bad idea to do this?
Thanks!
This is an annoying problem, but here's my two cents on it.
First, if you can confidently change the type of dyn_char to just be char *, I would do that. Perhaps if you have a robust test program or something you can try it out and see if it still works?
If not, you have two choices as far as I can see: fix what's going into strlen(), or have your compiler ignore those warnings (or ignore them yourself)! I'm not one for ignoring warnings unless I have to, but as far as fixing what goes into strlen...
If your underlying type is unsigned char *, then casting what goes into strlen() is basically telling the compiler to assume that the argument, for the purposes of being passed to strlen(), is a char *. If strlen() is the only place this is causing an issue and you can't safely change the type, then I'd consider a search-and-replace to add in casts to be the preferable option. You could redefine strlen with a #define like you suggested (I just tried it out and it worked for me), but I would strongly recommend not doing this. If anything, I'd search-replace strlen() with USTRLEN() or something (a fake function name), and then use that as your casting macro. Overriding C library functions with your own names transparently is a maintainability nightmare!
Two points on that: first, you're using a different name. Second, you're using all-caps, as is the convention for defining such a macro.
This may or may not work
strlen may be a macro in the system header, in which case you will get a warning about redefining a macro, and you won't get the functionality of the existing macro.
(The Visual Studio stdlib does many interesting things with macros in <string.h>. strcpy is defined this way:
__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(char *, __RETURN_POLICY_DST, __EMPTY_DECLSPEC, strcpy, _Pre_cap_for_(_Source) _Post_z_, char, _Dest, _In_z_ const char *, _Source))
I wouldn't be surprised at all if #defining strcpy broke this)`
Search and replace may be your best option. Don't hide subtle differences like this behind macros - you will just pass your pain on to the next maintainer.
Instead of adding a cast to all the calls, you may want to change all the calls to dyn_strlen, which is a function you create that calls strlen with the appropriate cast.
You can define a function:
char *uctoc(unsigned char*p){ return (char*)(p); }
and do the search replace strstr(x with strstr(uctoc(x). At least you can have some type checking. Later you can convert uctoc to a macro for performance.

Is it good practice to ALWAYS cast variables in C?

I'm writing some C code and use the Windows API. I was wondering if it was in any way good practice to cast the types that are obviously the same, but have a different name? For example, when passing a TCHAR * to strcmp(), which expects a const char *. Should I do, assuming I want to write strict and in every way correct C, strcmp((const char *)my_tchar_string, "foo")?
Don't. But also don't use strcmp() but rather _tcscmp() (or even the safe alternatives).
_tcs* denotes a whole set of C runtime (string) functions that will behave correctly depending on how TCHAR gets translated by the preprocessor.
Concerning safe alternatives, look up functions with a trailing _s and otherwise named as the classic string functions from the C runtime. There is another set of functions that returns HRESULT, but it is not as compatible with the C runtime.
No, casting that away is not safe because TCHAR is not always equal to char. Instead of casting, you should pick a function that works with a TCHAR. See http://msdn.microsoft.com/en-us/library/e0z9k731(v=vs.71).aspx
Casting is generally a bad idea. Casting when you don't need to is terrible practice.
Think what happens if you change the type of the variable you are casting? Suppose that at some future date you change my_tchar_string to be wchar_t* rather than char*. Your code will still compile but will behave incorrectly.
One of your primary goals when writing C code is to minimise the number of casts in your code.
My advice would be to just avoid TCHAR (and associated functions) completely. Their real intent was to allow a single code base to compile natively for either 16-bit or 32-bit versions of Windows -- but the 16-bit versions of Windows are long gone, and with them the real reason to write code like this.
If you want/need to support wide characters, do it. If you're fine with only narrow/multibyte characters, do that. At least IME, trying to sit on the fence and do some of both generally means you end up not doing either one well. It also means roughly doubling the amount of testing necessary without even coming close to doubling the functionality you provide to the user.

Resources