In C, what is a safer function to use than strtrns?

strtrns has the following descriptions: desc-1, desc-2
The strtrns() function transforms string and copies it into
result.
Any character that appears in old is replaced with the character in
the same position in new. The new result is returned. .........
This function is a security risk because it is possible to overflow
the newString buffer. If the currentString buffer is larger than the
newString buffer, then an overflow will occur.
And this is its prototype (or "signature"?):
char * strtrns(const char *string, const char *old, const char *new, char *result);
I've been googling to no avail. I appreciate any tips or advice.

I think you can write your own safe one pretty quickly.
It won't be a direct replacement, as the signature is slightly different, and it will allocate memory that the caller must free, but it can serve mostly the same job.
(I'm also changing the parameter name new, which is a reserved word in C++, and the parameter name string, which is a very common type in C++. These changes make the function compatible with C++ code as well.)
char* alloc_strtrns(const char *srcstr, const char *oldtxt, const char *newtxt)
{
if (strlen(oldtxt) != strlen(newtxt))
{
return NULL; /* Old and New lengths MUST match */
}
char* result = strdup(srcstr); /* Same size as the source, so strtrns cannot overflow it */
if (result == NULL)
{
return NULL; /* Allocation failed */
}
/* Caller is responsible for freeing! */
return strtrns(srcstr, oldtxt, newtxt, result);
}

The claim that this function is unsafe is nonsense. In C, whenever you have an interface that takes a pointer to a buffer and fills it with some amount of data, you must have a contract between the caller and callee regarding the buffer size. For some functions where the caller cannot know in advance how much data the callee will write, the most logical interface design (contract) is to have the caller pass the buffer size to the callee and have the callee return an error or truncate the data if the buffer is too small. But for functions like strcpy or in your case strtrns where the number of output bytes is a trivial function (like the identity function) of the number of input bytes, it makes perfectly good sense for the contract to simply be that the output buffer provided by the caller must be at least as large as the input buffer.
Anyone who is not comfortable with strict adherence to interface contracts should not be writing C. There is really no way around this; adding complex bounds-checking interfaces certainly does not solve the problem but just shifts around the nature of the contracts you have to follow.
By the way, strtrns is not a standard function anyway so if you'd prefer a different contract anyway you might be better off writing your own similar function. This would increase portability too.
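As a rough illustration of "writing your own similar function" with a different contract, here is a minimal sketch in which the caller passes the destination size and the function refuses to write when the buffer is too small. The name strtrns_bounded and its parameters are assumptions made for this example, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of strtrns (not a library function).
 * Copies src into dst, replacing each character found in oldset with the
 * character at the same position in newset. Returns dst, or NULL if an
 * argument is NULL, the two sets differ in length, or dst is too small. */
char *strtrns_bounded(const char *src, const char *oldset,
                      const char *newset, char *dst, size_t dstsize)
{
    if (src == NULL || oldset == NULL || newset == NULL || dst == NULL)
        return NULL;
    if (strlen(oldset) != strlen(newset))
        return NULL;

    size_t len = strlen(src);
    if (len >= dstsize)            /* no room for the text plus '\0' */
        return NULL;

    for (size_t i = 0; i < len; i++) {
        const char *hit = strchr(oldset, src[i]);
        dst[i] = hit ? newset[hit - oldset] : src[i];
    }
    dst[len] = '\0';
    return dst;
}
```

This does not remove the contract; it only moves it: the caller must now pass a dstsize that really matches the buffer.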

You don't really have any options in C. You simply have to ensure that the destination buffer is large enough.

Related

How to safely use c-string in C embedded applications

After realizing that String type (with capital S) on Arduino was a big source of troubles (cf. https://hackingmajenkoblog.wordpress.com/2016/02/04/the-evils-of-arduino-strings/), I am trying to deal with c-string to be more safe and robust for embedded applications.
However, I am facing some issues regarding the safety. To illustrate my problem, let's take a function of md5 hashing that will receive a string message, concatenate a private key and then compute and return the hash as a string.
I came to this function:
#define MD5_PRIVATE_KEY "my_private_key"
void ComputeMd5(const char* msg, char* hashBuffer, uint8_t hashBufferSize)
{
if( (hashBuffer == NULL) || (hashBufferSize == 0) )
{
/* INVALID ARG */
*hashBuffer = '\0';
return;
}
if(hashBufferSize <= HASH_SIZE)
{
/* SIZE ERROR */
*hashBuffer = '\0';
return;
}
uint16_t toHashSize = strlen(msg) + strlen(MD5_PRIVATE_KEY) + 1;
char toHash[toHashSize] = "";
strcat(toHash, MD5_PRIVATE_KEY);
strcat(toHash, msg);
strncpy(hashBuffer, MD5(toHash, HASH_SIZE), hashBufferSize);
}
With this function, calls to strlen(msg) and strcat(toHash, msg) are not safe since we don't know the length of msg to use strnlen() and strncat() instead, and we don't even know if msg is a valid null-terminated string.
My question is, would it be good practice to add the msg length to the prototype, such as void ComputeMd5(const char* msg, uint16_t msgSize, char* hashBuffer, uint8_t hashBufferSize), in order to use the 'n' versions of strlen and strcat? And is it acceptable to rely on the caller to provide a valid null-terminated string, or is it the responsibility of this function to check (if yes, how)?
Maybe there is a complete different design to do it and I don't know it (but I still want to avoid dynamic allocation since it's considered as not safe for embedded applications).
Sorry if this isn't very clear; it is still confused in my head. I'm looking for a discussion about best practices for using c-strings in the safest way.
Thanks.
Arduino is C++, not C.
There is no 100% safe way of dealing with pointers and arrays (including null character terminated char arrays called C strings).
Your attempt to make them "safe" is extremely unsafe.
1.
if( (hashBuffer == NULL) || (hashBufferSize == 0) )
{
/* INVALID ARG */
*hashBuffer = '\0';
return;
}
If hashBuffer == NULL, the assignment *hashBuffer = '\0' dereferences a null pointer, which is undefined behaviour.
You do not check whether msg is NULL.
You do not check whether toHashSize is reasonably sized.
strncpy is not a safe function. If MD5 function returns a string longer or the same size as hashBufferSize the string will not be null character terminated.
You may have a problem even if the parameters are perfectly valid. For example:
uint16_t toHashSize = strlen(msg) + strlen(MD5_PRIVATE_KEY) + 1;
char toHash[toHashSize] = "";
VLAs may be (and to the best of my knowledge usually are) allocated on the stack. How much stack you have, and whether the array will fit in it for a particular msg, you don't know and don't check (ignoring the initialization issue for a second). So in your attempt to avoid a vulnerability you just created a new one.
So yes, if you add length to the parameters you could validate that the msg is in fact a null-terminated string of that length, but that's assuming you can trust the caller to provide the correct length. But if you trust the caller to provide a correct length - you should be able to trust them to provide a null-terminated string to begin with, shouldn't you?
Or, just don't treat it as a string but rather as a random binary buffer that you're hashing, in which case length parameter is required and \0 terminator is not. I don't know which MD5 implementation you're using, but you can hash a random binary buffer, it doesn't have to be a string (your MD5 call seems to be expecting a string).
The way to deal with error handling in C (or in any sane program) is to place the error handling as close to the error source as possible. If you suspect that strings are too large or not null terminated for whatever reason, then you need to write code ensuring that they are valid at the point where you receive the strings. Not in some completely unrelated hash function!
Similarly, if you suspect that some pointer is set to NULL for reasons unknown, you need to deal with that at the point where it might occur, not inside some completely unrelated function.
It is also bad practice to place checks like if( (hashBuffer == NULL) || (hashBufferSize == 0) ) inside functions, because it slows down the normal use-case where the caller is passing on perfectly valid data.
In case you do implement a function with extensive error handling, you should let it return a proper error code with an enum. Not by silently setting data to something like *hashBuffer = '\0'; and then merrily move on with the execution - that's not error handling, that's error hiding.
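A minimal sketch of what "return a proper error code with an enum" could look like, combined with a caller-allocated output buffer and an explicit message length. Every name here (keyed_status_t, build_keyed_input and its parameters) is hypothetical, and the hashing step itself is left out because the MD5 implementation in the question is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; the names are illustrative only. */
typedef enum {
    KEYED_OK = 0,
    KEYED_ERR_NULL_ARG,
    KEYED_ERR_BUFFER_TOO_SMALL
} keyed_status_t;

/* Writes key followed by msg into out (caller-allocated, outSize bytes).
 * msgLen is the number of bytes in msg; msg need not be null-terminated.
 * On success, *outLen receives the number of bytes written. */
keyed_status_t build_keyed_input(const char *key,
                                 const char *msg, size_t msgLen,
                                 char *out, size_t outSize, size_t *outLen)
{
    if (key == NULL || msg == NULL || out == NULL || outLen == NULL)
        return KEYED_ERR_NULL_ARG;

    size_t keyLen = strlen(key);
    /* Compared this way round so keyLen + msgLen cannot overflow. */
    if (keyLen > outSize || msgLen > outSize - keyLen)
        return KEYED_ERR_BUFFER_TOO_SMALL;

    memcpy(out, key, keyLen);
    memcpy(out + keyLen, msg, msgLen);
    *outLen = keyLen + msgLen;
    return KEYED_OK;
}
```

The caller can then branch on the returned status instead of inspecting the output buffer for an empty string.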
char toHash[toHashSize] = ""; Using a VLA is usually not a great idea in embedded systems that might be tight on stack space. If you are sure that this function isn't called often and that your stack has margins, then sure, you can do things like this. But it would make far more sense to do static char toHash[MAX_SIZE]=""; since your function always needs to be able to cope with the worst-case scenario.
Although in this specific case, what makes the most sense is to do caller allocation. There's no obvious reason why you need to create a local temporary buffer, it just chews up memory and performance. Just access hashBuffer directly without any middle man.
strncpy is a dangerous function that should never be used in C programs, because it wasn't designed to be used with C strings. Detailed explanation here: Is strcpy dangerous and what should be used instead?
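The underlying problem is that strncpy(dst, src, n) writes no terminator when src holds n or more characters. One possible replacement, sketched here under the assumption that truncation should be reported to the caller, always terminates the destination. The helper name copy_string is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst and always null-terminates.
 * Returns 0 if the whole string fit, -1 if an argument was unusable or
 * the string had to be truncated. */
int copy_string(char *dst, size_t dstsize, const char *src)
{
    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;

    size_t len = strlen(src);
    size_t n = (len < dstsize) ? len : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return (len < dstsize) ? 0 : -1;
}
```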

Filling a pre-allocated string of fixed size passed as an argument in C

I need to get a fixed length name from a 3rd party developer in a clean way that (hopefully) doesn't require any allocation on their side, and is checked well by the compiler.
I provide a prototype like this:
void getName(char name[9]);
And they would write a function something like this:
void getName(char name[9]) {
strncat(name, "Hello World", 8);
}
Then I call it (on my side) sort of like this:
char buf[9];
*buf = '\0';
getName(buf);
doSomethingWith(buf);
It compiles and seems to work, but I'm not sure it's the best way to handle this.
Any advice?
Edit: To clarify, the name string is used as an identifier in a packed binary save file. It needs to be exactly 8 ASCII 8-bit chars.
I wonder now if I should just receive any string and truncate it on my side. I was hoping the compiler would help instead of this being a runtime check.
In your example, the name is a static string. In this case the function could look like below, where no additional data copy is required:
const char* getName(void)
{
return "Hello World";
}
...
const char* const pName = getName();
Alternatively:
void getName(FUNCPTR func)
{
func("Hello World");
}
where the void func(const char* const pName) is implemented at your side. Then you also don't need to allocate/copy data.
Your first job is to agree the data type of the returned string.
Although it's tempting to use char*, you ought not to, since the type of char is not sufficiently well-defined by the standard (it could be unsigned, signed 2's complement or signed 1's complement). If you're not careful, the behaviour of your program could be undefined if you mix up your types.
So you ought to decide on a type and #define CharType accordingly.
Then as for the function itself, don't rely on the 3rd party to allocate memory unless you call their library to release that memory. Your C runtime might use a different allocation system to theirs. In order to address this common issue, a sort of convention has grown up: if you pass NULL for the output buffer then the 3rd party function should return the length of the buffer required. Then you allocate the memory yourself with the required length, and call the function a second time with the size of the allocated buffer explicitly sent. In that mode, the function returns the length of the allocated string as well as the result populated into the buffer.
Putting this all together, a good prototype would be
SizeType getName(CharType* buffer, SizeType length);
where SizeType is again agreed upon between you and the 3rd party. Broadly speaking, this is how the Windows API works.
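A minimal sketch of that two-call convention, assuming for illustration that CharType is plain char and SizeType is size_t. The stored name "SAVEFILE" is invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical third-party side. Always returns the buffer size required
 * for the name, including the terminator. Nothing is written when buffer
 * is NULL or length is too small. */
size_t getName(char *buffer, size_t length)
{
    static const char name[] = "SAVEFILE";    /* illustrative 8-char name */

    if (buffer != NULL && length >= sizeof name)
        memcpy(buffer, name, sizeof name);
    return sizeof name;
}
```

The caller first calls getName(NULL, 0) to learn the required size, provides a buffer of at least that size, and calls again; a return value larger than the length it passed tells it that nothing was written.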

Using function parameters as both input and output

I found myself using function parameters for both input and output and was wondering if what I'm doing is going to bite me later.
In this example buffer_len is such a parameter. It is used by the foo to determine the size of the buffer and tell the caller, main, how much of the buffer has been used up.
#define MAX_BUFFER_LENGTH 16
char buffer[MAX_BUFFER_LENGTH] = {0};
void main(void)
{
uint32_t buffer_len = MAX_BUFFER_LENGTH;
printf("BEFORE: Max buffer length = %u", buffer_len);
foo(buffer, &buffer_len);
printf("AFTER: Buffer length used = %u", buffer_len);
}
void foo(char *buffer, uint32_t *buffer_len)
{
/* Remember max buffer length */
uint32_t buffer_len_max = *buffer_len;
uint32_t buffer_len_left = buffer_len_max;
/* Add things to the buffer, decreasing the buffer_len_left
in the process */
...
/* Return the length of the buffer used up to the caller */
*buffer_len = buffer_len_max - buffer_len_left;
}
Is this an OK thing to do?
EDIT:
Thank you for your responses, but I'd prefer to keep the return value of foo for the actual function result (which makes sense with larger functions). Would something like this be more pain-free in the long run?
typedef struct
{
char *data_ptr;
uint32_t length_used;
uint32_t length_max;
} buffer_t;
#define ACTUAL_BUFFER_LENGTH 16
char actual_buffer[ACTUAL_BUFFER_LENGTH] = {0};
void main(void)
{
buffer_t my_buffer = { .data_ptr = &actual_buffer[0],
.length_used = 0,
.length_max = ACTUAL_BUFFER_LENGTH };
}
For the original version of the question where the called function doesn't return a value, you got three similar answers, all roughly saying "Yes, but…":
Jonathan Leffler said:
It's "OK" as long as it is documented, but really not the preferred way of operating. Why not have the function return the length used, and leave the buffer length parameter as a regular uint32_t (or perhaps size_t, so you can pass sizeof(buffer) to the function)?
DevSolar said:
It's syntactically OK, but personally I much prefer return codes. If that is not possible, I would want a dedicated output parameter, especially if it's pass-by-reference. (I might not pay attention and continue under the assumption that my buffer_len still holds the original value.)
BeyelerStudios said:
As long as you have not used up your return value, I prefer to receive the result rather than having to define a variable every time I use the function (or maybe I want to input an expression).
The unanimity is remarkable.
The question was then updated to indicate that instead of returning void, the function's return value would be used for another purpose. This completely changes the assessment.
Don't show a void function if your real function is going to return a value. Doing so completely alters the answers. If you need to return more than one value, then an in-out parameter is OK (even necessary — getchar() is a counter-example), though a pure in parameter and a separate pure out parameter might be better. Using a structure is OK too.
Perhaps I should explain the 'counter-example' a bit. The getchar() function returns a value that indicates failure or a char value. This leads to many pitfalls for beginners (because getchar() returns an int, not a char as its name suggests). It would be better, in some ways, if the function was:
bool get_char(char *c);
returning true if it reads a character and false if it fails, and assigning the character value to c. It could be used like:
char c;
while (get_char(&c))
…use character just read…
This is a case where the function needs to return two values.
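One way such a get_char() could be written on top of the standard library is sketched below. The helper get_char_from(), which takes an explicit stream, is an addition made only so the idea can be exercised on something other than stdin.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: like get_char() but reads from any stream. */
bool get_char_from(FILE *stream, char *c)
{
    int ch = getc(stream);      /* int, so EOF stays distinguishable from data */
    if (ch == EOF)
        return false;
    *c = (char)ch;
    return true;
}

/* The interface suggested above: success flag returned, character via c. */
bool get_char(char *c)
{
    return get_char_from(stdin, c);
}
```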
Harking back to the suggested revised code in the question, the code using the structure.
That is not a bad idea at all; it is often sensible to package up a set of values into a structure. It would make a lot of sense if the called function has to compute some value that it will return and yet it also modifies the buffer array and needs to report on the number of entries in it (as well as knowing how much space there is to be used). Here, keeping the 'space available' separate from the 'space used' is definitely preferable; it will be easier to see what's going on than having the 'in-out' parameter which informs the function how much space is available on entry and reports back how much space was used on exit. Even if it reported how much space was still available on exit, it would be harder to use.
Which gets back to the original diagnosis: yes, the in-out parameter is technically legal, and can be made to work, but isn't as easy to use as separate values.
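To make the structure version concrete, here is a small sketch that reuses the buffer_t layout from the question and keeps "space used" separate from "space available". The buffer_append() helper is hypothetical, and it assumes length_used never exceeds length_max.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char *data_ptr;
    uint32_t length_used;
    uint32_t length_max;
} buffer_t;

/* Hypothetical helper: appends len bytes, or leaves the buffer untouched
 * and returns false when they would not fit. */
bool buffer_append(buffer_t *buf, const char *src, uint32_t len)
{
    if (len > buf->length_max - buf->length_used)
        return false;
    memcpy(buf->data_ptr + buf->length_used, src, len);
    buf->length_used += len;
    return true;
}
```

The return value stays free for the function's real result, and the caller can read length_used and length_max at any time without remembering what an in-out parameter held on entry.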
Side note: void main(void) is not the standard way to write main() — see What should main() return in C and C++? for the full story.
There's nothing wrong with using the same buffer for input and output, but it probably limits the function's utility elsewhere. For example, what if you want to use it with two different values? (For some reason, as a user of the function, I need the original preserved.) In the example you provide there's no harm in taking two parameters in the function and then just passing the same pointer in twice. Then you've wrapped up both uses and it probably simplifies the function code.
For more complex data types like arrays, as well as the same problem above, you'll need to make sure your function doesn't need a larger output, or if it shrinks the buffer that you memset( 0.. ) the difference and so on.
Because of those headaches I tend to avoid it as a pattern, but as I say, there is nothing particularly wrong with it.

Inconveniences of pointers to static variables

I often use convenience functions that return pointers to static buffers like this:
char* p(int x) {
static char res[512];
snprintf(res, sizeof(res)-1, "number is %d", x);
return res;
}
and use them all over the place as arguments to other functions:
...
some_func( somearg, p(6) );
....
However, this "convenience" has an annoying drawback besides not being thread-safe (and probably many more reasons):
some_func( somearg, p(6), p(7) );
The above obviously doesn't do what I want, since the last two arguments will point to the same memory space. I would like to be able to get the above to work properly without too many hassles.
So my question is:
Is there some magic way I have missed to accomplish what I want without doing cumbersome allocation & freeing?
***** UPDATE 2010-04-20 *****
Shameless plug: look at my own answer here
I guess it will work but it's also bordering to overkill. Opinions?
Well, one widely used approach is to put the responsibility of preparing the memory buffer for the result on the caller. The caller can choose whatever method it likes best.
In your case write your p as
char* p(char *buffer, size_t max_length, int x) {
snprintf(buffer, max_length, "number is %d", x);
return buffer;
}
and call it as
char buffer1[512], buffer2[512];
some_func( somearg, p(buffer1, sizeof buffer1 - 1, 6), p(buffer2, sizeof buffer2 - 1, 7) );
This approach has at least one obvious drawback though: in the general case the caller does not know in advance how many characters it needs to allocate for the buffer. If a good compile-time constant is available, then it is easy, but in more complicated cases additional effort is required, like providing some kind of "pre-calculation" functionality that returns the required buffer size as a run-time value. (The snprintf function actually works that way: you can call it with a null buffer pointer and zero buffer size, just to make a dry run in order to determine the buffer size.)
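The snprintf measuring pass mentioned above can be sketched as follows. The helper format_number() is hypothetical, and it uses malloc only to show the measured size being put to use; a caller that wants to avoid the heap could size a local buffer the same way.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated "number is <x>" string of
 * exactly the required size, or NULL on failure. The caller must free() it. */
char *format_number(int x)
{
    /* Measuring pass: nothing is written, the return value is the length
     * the output would have, not counting the terminator. */
    int needed = snprintf(NULL, 0, "number is %d", x);
    if (needed < 0)
        return NULL;

    char *out = malloc((size_t)needed + 1);
    if (out == NULL)
        return NULL;

    snprintf(out, (size_t)needed + 1, "number is %d", x);
    return out;
}
```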
Have the caller provide the buffer (and the size of the buffer). It's thread safe, and the buffer can usually go on the stack, so you don't bring on the overhead of heap allocation.
So long as you understand this to be thread-unsafe, and so long as your logic expects the values returned by the "convenience" methods to be valid only for the duration of a few calls to [possibly various] methods, you can extend this convenience in two unrelated and possibly complementary ways.
Add an extra argument to the convenience methods so these can pass -optionally- a container for the return value (also add a size_of_passed_buffer argument if such size cannot be implicitly set). Whenever the caller supplies a buffer, use it, otherwise, use the static buffer.
BTW, if the buffer passed to the convenience methods are local variables, their allocation will be automatically (and adequately) managed, following the lifespan of the subroutines where the convenience methods are called.
Use a circular buffer, allowing for a given number of calls before the buffer elements are reused.
Such a buffer can also be global, i.e. shared among multiple "convenience" methods, which of course also need to a) play nice thread-wise, and b) share the pointer to the next available byte/element in the buffer.
Implementing all this may seem to be
a lot of work (for the convenience logic), and also
a potential for several bugs/issues
However, provided that
the buffer(s) is (are) adequately sized, and that
the logic using these convenience methods understand the "rules of the game",
this pattern supplies a simplistic automated heap management system, which is a nice thing to have in C (which unlike with Java, .NET and other systems does not offer a built in GC-based heap management)
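A minimal sketch of the circular-buffer idea applied to the p() function from the question. The slot count and slot size are assumptions chosen for illustration; a result stays valid only until P_SLOTS further calls have been made.

```c
#include <stdio.h>

#define P_SLOTS 4          /* assumed: max results alive at the same time */
#define P_SLOT_SIZE 64     /* assumed: ample room for "number is " plus any int */

/* Not thread-safe: each call overwrites the oldest slot. */
const char *p(int x)
{
    static char slots[P_SLOTS][P_SLOT_SIZE];
    static unsigned next = 0;

    char *res = slots[next];
    next = (next + 1) % P_SLOTS;
    snprintf(res, P_SLOT_SIZE, "number is %d", x);
    return res;
}
```

With this version, some_func( somearg, p(6), p(7) ) receives two distinct strings, as long as no more than P_SLOTS results are needed at once.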
In short, no. C does not provide any form of automatic heap management, so you're on your own for tracking allocated memory. A standard C-like solution is to have the caller provide a buffer, instead of allocating one internally. Although this just moves the responsibility of tracking memory around, it often ends up in a more convenient place. You could, I suppose, look into Boehm's conservative garbage collector if you want to get a form of garbage collection in C.
If the args to the helpers are always literals (as in the example), you could use a macro:
#define P(NUMLIT) ("number is " #NUMLIT)
...
somefunc(somearg, P(6), P(7));
...
The preprocessor creates a string from the macro argument NUMLIT and appends it to "number is " to create a single string literal, just as
"this" " " "string"
is emitted as a single string literal, "this string".
I found an alternate method. Something like this:
#define INIT(n) \
int xi = 0; \
char *x[(n)]
#define MACRO(s) \
(++xi, xi %= sizeof(x)/sizeof(*x), x[xi] = alloca(strlen(s)+1), strcpy(x[xi], (s)), x[xi])
that I can call like this:
INIT(2);
some_func( somearg, MACRO("testing1"), MACRO("testing2"));
So the buffers are on the stack without needing any freeing. And it's even thread-safe.

Disabling NUL-termination of strings in GCC

Is it possible to globally disable NUL-terminated strings in GCC?
I am using my own string library, and I have absolutely no need for the final NUL characters as it already stores the proper length internally in a struct.
However, if I wanted to append 10 strings, this would mean that 10 bytes are unnecessarily allocated on the stack. With wide strings it is even worse: As for x86, there are 40 bytes wasted; and for x86_64, 80 bytes!
I defined a macro to add those stack-allocated strings to my struct:
#define AppendString(ppDest, pSource) \
AppendSubString(ppDest, (*ppDest)->len + 1, pSource, 0, sizeof(pSource) - 1)
Using sizeof(...) - 1 works quite well but I am wondering whether I could get rid of NUL termination in order to save a few bytes?
This is pretty awful, but you can explicitly specify the length of every character array constant:
char my_constant[6] = "foobar";
assert(sizeof my_constant == 6);
wchar_t wide_constant[6] = L"foobar";
assert(sizeof wide_constant == 6*sizeof(wchar_t));
I understand you're only dealing with strings declared in your program:
....
char str1[10];
char str2[12];
....
and not with text buffers you allocate with malloc() and friends otherwise sizeof is not going to help you.
Anyway, I would think twice about removing the \0 at the end: you would lose compatibility with the C standard library functions.
Unless you are going to rewrite every single string function for your library (sprintf, for example), are you sure you want to do it?
I can't remember the details, but when I do
char my_constant[5]
it is possible that it will reserve 8 bytes anyway, because some machines can't address the middle of a word.
It's nearly always best to leave this sort of thing to the compiler and let it handle the optimisation for you, unless there is a really, really good reason not to.
If you're not using any of the Standard Library functions that deal with strings, you can forget about the NUL terminating byte.
No strlen(), no fgets(), no atoi(), no strtoul(), no fopen(), no printf() with the %s conversion specifier ...
Declare your "not quite C strings" with just the needed space;
struct NotQuiteCString { /* ... */ };
struct NotQuiteCString variable;
variable.data = malloc(5);
variable.data[0] = 'H'; /* ... */ variable.data[4] = 'o'; /* "Hello" */
This is only worthwhile if you are really low on memory; otherwise I don't recommend doing so.
The most proper way to do what you describe seems to be:
Prepare a minimal 'listing' file of the form:
string1_constant_name "str1"
string2_constant_name "str2"
...
Write a utility that processes this file and generates declarations such as
const char string1_constant[4] = "str1";
Of course I'd not recommend doing this by hand, because otherwise you can get into trouble after any string change.
So now you have both non-terminated strings, thanks to the fixed-size auto-generated arrays, and sizeof() for every variable. This solution seems acceptable.
The benefits are easy localization, the possibility of adding some level of checks to lower the risk of this solution, and savings in the R/O data segment.
The drawback is the need to include all such string constants in every module (as an include, to keep sizeof() known). So this only makes sense if your linker merges such symbols (some don't).
Aren't these similar to Pascal-style strings, or Hollerith strings? I think this is only useful if you actually want the string data to preserve NULs, in which case you're really pushing around arbitrary memory, not "strings" per se.
The question uses false assumptions - it assumes that storing the length (e.g. implicitly by passing it as a number to a function) incurs no overhead, but that's not true.
While one might save space by not storing the 0-byte (or wchar), the size must be stored somewhere, and the example hints that it is passed as a constant argument to a function somewhere, which almost certainly takes more space, in code. If the same string is used multiple times, the overhead is per use, not per-string.
Having a wrapper that uses strlen to determine the length of a string and isn't inlined will almost certainly save more space.
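A sketch of the wrapper idea, assuming a length-carrying string type. The question's own struct and AppendSubString are not shown, so lstring_t and both functions below are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string with caller-provided storage. */
typedef struct {
    size_t len;     /* bytes currently used */
    size_t cap;     /* bytes available in data */
    char *data;
} lstring_t;

/* Core routine: takes an explicit length, so no terminator is needed. */
int lstring_append_n(lstring_t *dest, const char *src, size_t n)
{
    if (n > dest->cap - dest->len)
        return -1;
    memcpy(dest->data + dest->len, src, n);
    dest->len += n;
    return 0;
}

/* Wrapper for ordinary C strings: the length is computed once here
 * instead of being passed as a constant at every call site. */
int lstring_append(lstring_t *dest, const char *src)
{
    return lstring_append_n(dest, src, strlen(src));
}
```

Each call site then passes only two arguments; whether that actually outweighs the one terminator byte per literal depends on the target and compiler, so it is worth measuring.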

Resources