Inconveniences of pointers to static variables - c

I often use convenience functions that return pointers to static buffers like this:
char* p(int x) {
static char res[512];
snprintf(res, sizeof(res)-1, "number is %d", x);
return res;
}
and use them all over the place as arguments to other functions:
...
some_func( somearg, p(6) );
....
However, this "convenience" has an annoying drawback besides not being thread-safe (and probably many more reasons):
some_func( somearg, p(6), p(7) );
The above obviously doesn't do what I want, since the last two arguments will point to the same memory. I would like to be able to get the above to work properly without too much hassle.
So my question is:
Is there some magic way I have missed to accomplish what I want without doing cumbersome allocation & freeing?
***** UPDATE 2010-04-20 *****
Shameless plug: look at my own answer here
I guess it will work, but it's also bordering on overkill. Opinions?

Well, one widely used approach is to put the responsibility of preparing the memory buffer for the result on the caller. The caller can choose whatever method it likes best.
In your case write your p as
char* p(char *buffer, size_t max_length, int x) {
snprintf(buffer, max_length, "number is %d", x);
return buffer;
}
and call it as
char buffer1[512], buffer2[512];
some_func( somearg, p(buffer1, sizeof buffer1 - 1, 6), p(buffer2, sizeof buffer2 - 1, 7) );
This approach has at least one obvious drawback though: in the general case the caller does not know in advance how many characters it needs to allocate for the buffer. If a good compile-time constant is available, then it is easy, but in more complicated cases additional effort is required, like providing some kind of "pre-calculation" functionality that returns the required buffer size as a run-time value. (The snprintf function actually works that way: you can call it with a null buffer pointer and zero buffer size to make a dry run that determines the required buffer size.)
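The dry-run sizing described above can be sketched roughly as follows. This is only a minimal sketch, assuming C99 snprintf semantics; the name format_number is hypothetical, and it uses the heap, so the caller has to free the result.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd "number is <x>" string,
   or NULL on failure. The caller must free() the result. */
char *format_number(int x)
{
    /* Dry run: with a null buffer and zero size, C99 snprintf writes
       nothing and returns the length the output would have had. */
    int needed = snprintf(NULL, 0, "number is %d", x);
    if (needed < 0)
        return NULL;

    char *buffer = malloc((size_t)needed + 1); /* +1 for the terminator */
    if (buffer == NULL)
        return NULL;

    /* Real run, now with a buffer of exactly the right size. */
    snprintf(buffer, (size_t)needed + 1, "number is %d", x);
    return buffer;
}
```

The same two-step idea works with a caller-supplied buffer: do the dry run first, then size a local array or reject the request if it is too large.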

Have the caller provide the buffer (and the size of the buffer). It's thread safe, and the buffer can usually go on the stack, so you don't bring on the overhead of heap allocation.

As long as you understand this to be thread-unsafe, and as long as your logic expects the values returned by the "convenience" methods to be valid only for the duration of a few calls to [possibly various] methods, you can extend this convenience in two unrelated and possibly complementary ways.
Add an extra argument to the convenience methods so that callers can optionally pass a container for the return value (also add a size_of_passed_buffer argument if the size cannot be set implicitly). Whenever the caller supplies a buffer, use it; otherwise, use the static buffer.
BTW, if the buffers passed to the convenience methods are local variables, their allocation will be managed automatically (and adequately), following the lifespan of the subroutines where the convenience methods are called.
Use a circular buffer, allowing for a given number of calls before the buffer elements are reused.
Such a buffer can also be global, i.e. shared by multiple "convenience" methods (which of course also need to a) play nice, thread-wise, and b) share the pointer to the next available byte/element in the buffer).
Implementing all this may seem to be
a lot of work (for the convenience logic), and also
a potential source of several bugs/issues.
However, provided that
the buffer(s) is (are) adequately sized, and that
the logic using these convenience methods understands the "rules of the game",
this pattern supplies a simplistic automated heap management system, which is a nice thing to have in C (which, unlike Java, .NET and other systems, does not offer built-in GC-based heap management).
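The circular-buffer idea could look something like this. It is only a sketch under stated assumptions: the name p_ring and the constants RING_SLOTS and SLOT_SIZE are hypothetical, and the slot count and slot size are arbitrary picks that the calling code would have to respect.

```c
#include <stdio.h>

#define RING_SLOTS 4   /* assumed: a result stays valid for 4 calls */
#define SLOT_SIZE  64  /* assumed: large enough for every result */

/* Hypothetical variant of p(): each call writes into the next slot,
   so up to RING_SLOTS results can be alive at the same time.
   Not thread-safe: the slot index is shared static state. */
char *p_ring(int x)
{
    static char slots[RING_SLOTS][SLOT_SIZE];
    static unsigned next = 0;

    char *slot = slots[next];
    next = (next + 1) % RING_SLOTS; /* wrap around to reuse old slots */

    snprintf(slot, SLOT_SIZE, "number is %d", x);
    return slot;
}
```

With this, some_func( somearg, p_ring(6), p_ring(7) ) receives two distinct strings, but a pointer kept across more than RING_SLOTS further calls will silently see new contents.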

In short, no. C does not provide any form of automatic heap management, so you're on your own for tracking allocated memory. A standard C-like solution is to have the caller provide a buffer, instead of allocating one internally. Although this just moves the responsibility of tracking memory around, it often ends up in a more convenient place. You could, I suppose, look into Boehm's conservative garbage collector if you want to get a form of garbage collection in C.

If the args to the helpers are always literals (as in the example), you could use a macro:
#define P(NUMLIT) ("number is " #NUMLIT)
...
somefunc(somearg, P(6), P(7));
...
The preprocessor creates a string from the macro argument NUMLIT and appends it to "number is " to create a single string literal, just as
"this" " " "string"
is emitted as a single string literal, "this string".

I found an alternate method. Something like this:
#define INIT(n) \
int xi = 0; \
char *x[n];
#define MACRO(s) \
(++xi, xi %= sizeof(x)/sizeof(*x), x[xi] = alloca(strlen(s)+1), strcpy(x[xi], (s)), x[xi])
that I can call like this:
INIT(2);
some_func( somearg, MACRO("testing1"), MACRO("testing2"));
So the buffers are on the stack without needing any freeing. And it's even thread-safe.

Related

Returning a pointer to a static buffer

In C on a small embedded system, is there any reason not to do this:
const char * filter_something(const char * original, const int max_length)
{
static char buffer[BUFFER_SIZE];
// checking inputs for safety omitted
// copy input to buffer here with appropriate filtering etc
return buffer;
}
This is essentially a utility function. The source is flash memory, which may be corrupted, so we do a kind of "safe copy" to make sure we have a null-terminated string. I chose to use a static buffer and make it available read-only to the caller.
A colleague is telling me that I am somehow not respecting the scope of the buffer by doing this. To me it makes perfect sense for the use case we have.
I really do not see any reason not to do this. Can anyone give me one?
(LATER EDIT)
Many thanks to all who responded. You have generally confirmed my ideas on this, which I am grateful for. I was looking for major reasons not to do this, I don't think that there are any. To clarify a few points:
Reentrancy/thread safety is not a concern. It is a small (bare metal) embedded system with a single run loop. This code will not be called from ISRs, ever.
In this system we are not short on memory, but we do want very predictable behavior. For this reason I prefer declaring an object like this statically, even though it might be a little "wasteful". We have already had issues with large objects declared carelessly on the stack, which caused intermittent crashes (now fixed, but it took a while to diagnose). So in general, I prefer static allocation, simply for predictability, reliability, and fewer potential issues downstream.
So basically it's a case of taking a certain approach for a specific system design.
Pro
The behavior is well defined; the static buffer exists for the duration of the program and may be used by the program after filter_something returns.
Cons
Returning a static buffer is prone to error because people writing calls to the routines may neglect or be unaware that a static buffer is returned. This can lead to attempts to use multiple instances of the buffer from multiple calls to the function (in the same thread or different threads). Clear documentation is essential.
The static buffer exists for the duration of the program, so it occupies space at times when it may not be needed.
It really depends on how filter_something is used. Take the following as an example
#include <stdio.h>
#include <string.h>
const char* filter(const char* original, const int max_length)
{
static char buffer[1024];
memset(buffer, 0, sizeof(buffer));
memcpy(buffer, original, max_length);
return buffer;
}
int main()
{
const char *strone, *strtwo;
char deepone[16], deeptwo[16];
/* Case 1 */
printf("%s\n", filter("everybody", 10));
/* Case 2 */
printf("%s %s %s\n", filter("nobody", 7), filter("somebody", 9), filter("anybody", 8));
/* Case 2 */
if (strcmp(filter("same",5), filter("different", 10)) == 0)
printf("Strings same\n");
else
printf("Strings different\n");
/* Case 3 - Both of these end up with the same pointer */
strone = filter("same",5);
strtwo = filter("different", 10);
if (strcmp(strone, strtwo) == 0)
printf("Strings same\n");
else
printf("Strings different\n");
/* Case 4 - You need a deep copy if you wish to compare */
strcpy(deepone, filter("same", 5));
strcpy(deeptwo, filter("different", 10));
if (strcmp(deepone, deeptwo) == 0)
printf("Strings same\n");
else
printf("Strings different\n");
}
The output when gcc is used is
everybody
nobody nobody nobody
Strings same
Strings same
Strings different
When filter is used by itself, it behaves quite well.
When it is used multiple times in an expression, the result depends on the order in which the calls are evaluated, which is unspecified. Every use sees the contents written by whichever call executed last.
If the returned pointer is saved, the contents it points to will not stay the same: they change on the next call to filter. This is also a common problem when C++ coders switch to C# or Java.
If a deep copy of the instance is taken, then the contents of the instance when the instance was taken will be preserved.
In C++, this technique is often used when returning objects with the same consequences.
It is true that the identifier buffer only has scope local to the block in which it is declared. However, because it is declared static, its lifetime is that of the full program.
So returning a pointer to a static variable is valid. In fact, several standard functions do this, such as ctime and asctime (strtok similarly relies on static internal state).
The one thing you need to watch for is that such a function is not reentrant. For example, if you do something like this:
printf("filter 1: %s, filter 2: %s\n",
filter_something("abc", 3), filter_something("xyz", 3));
The two function calls can occur in any order, and both return the same pointer, so you'll get the same result printed twice (i.e. the result of whatever call happens to occur last) instead of two different results.
Also, if such a function is called from two different threads, you end up with a race condition with the threads reading/writing the same place.
Just to add to the previous answers, I think the problem, in a more abstract sense, is to make the filtering result broader in scope than it ought to be. You introduce a 'state' which seems useless, at least if the caller's intention is only to get a filtered string. In this case, it should be the caller who should create the array, likely on the stack, and pass it as a parameter to the filtering method. It is the introduction of this state that makes possible all the problems referred to in the preceding responses.
From a program design standpoint, it's frowned upon to return pointers to private data, in case that data was made private for a reason. That being said, it's less bad design to return a pointer to a local static than it is to use spaghetti programming with "globals" (external linkage), particularly when the returned pointer is const qualified.
One general issue with static variables, which may or may not be a problem regardless of whether the system is embedded or hosted, is re-entrancy. If the code needs to be interrupt/thread safe, then you need to implement means to achieve that.
The obvious alternative to it all is caller allocation and you've got to ask yourself why that's not an option:
void filter_something (size_t size, char dest[size], const char original[size]);
(Or if you will, [restrict size] on both pointers for a mini-optimization.)
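For comparison, a caller-allocated version might look roughly like the sketch below. The name filter_into, its parameter order, and the filtering rule (replace non-printable bytes with '?') are all assumptions made for illustration; the original question does not say what its filtering actually does.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical caller-allocated variant. Copies at most max_length bytes
   from original, replaces non-printable bytes with '?', and always
   null-terminates dest. Returns the number of characters stored. */
size_t filter_into(char *dest, size_t dest_size,
                   const char *original, size_t max_length)
{
    size_t i = 0;

    if (dest_size == 0)
        return 0; /* no room even for the terminator */

    /* Stop at the input limit, the end of the input string, or when
       only the terminator still fits in dest. */
    while (i < max_length && i + 1 < dest_size && original[i] != '\0') {
        unsigned char c = (unsigned char)original[i];
        dest[i] = isprint(c) ? (char)c : '?';
        i++;
    }
    dest[i] = '\0';
    return i;
}
```

Because each caller owns its own buffer, two results can be used in the same expression without one overwriting the other.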

Why C doesn't have a function which is used like strcpy() and check buffer size automatically to prevent buffer overflow bug?

I'm really wondering why there's no function in C like strcpy(), memcpy(), etc. that automatically checks the size of the buffer. Something that behaves like this:
#define strcpy2(X, Y) strncpy(X, Y, sizeof(X))
Some people tell me: "Because it's an old language." But C is not a dead language. ISO can revise the standard, and new functions like strncpy have been added.
Others tell me: "It causes performance issues." But I argue: "If a function like that existed, you could still use the old function in situations where performance is important. In all other situations, you could use the new function and expect a security improvement."
Still others tell me: "So, there's a function like strncpy()", or "C is designed for professional developer who consider this problem", but strncpy() does not do the check automatically - developers must determine the size of the buffer, and still large programs like Chrome, which are made by professional developers, have buffer overflow vulnerabilities.
I want to know a technical reason why such a function cannot be made.
*English is not my native language. so I guess there are some mistakes... sorry about this. (Edit (cmaster): Should be fixed now. Hope you like the new wording.)
If X is a pointer, and it usually is, then sizeof X tells you nothing about the size of the array to which X points. The size must be passed as a parameter.
To really understand the reason why C functions cannot do what you want, you need to understand about the difference between arrays and pointers, and what it means that an array decays to a pointer. Just to give you an idea what I'm talking about:
int array[7]; //define an array
int* pointer = array; //define a pointer that points to the same memory, array decays into a pointer to the first int
//Now the following two expressions are precisely equivalent, since array decays to a pointer again:
pointer[3];
array[3];
//However, the sizeof of the two is not the same:
assert(sizeof(array) == 7*sizeof(int)); //this is what you used in your define
assert(sizeof(pointer) == sizeof(int*)); //probably not what you expected
//Now the thing gets nasty: Array declarations in function arguments truly decay into pointers!
void foo(int bar[9]) {
assert(sizeof(bar) == sizeof(int*)); //I bet, you didn't expect this!
}
//This is, because the definition of foo() is truly equivalent to this definition:
void foo(int* bar) {
assert(sizeof(bar) == sizeof(int*));
}
//Transferring this to your #define, this will definitely not do what you want:
void baz(char aBuffer[BUFFER_SIZE], const char* source) {
strcpy2(aBuffer, source); //This will copy only the first four or eight bytes (depending on the size of a pointer on your system), no matter how big you make BUFFER_SIZE!
}
I hope I have enticed you to google for array-pointer decay now...
The truth is, that the C language relies heavily on the fact that no array size is required to correctly access an array element, only the surrounding loops need to know the size. As such, arrays decay to pure pointers in many places, and once they are decayed, there is no bringing back the size of the array. This brings a great deal of flexibility and simplicity to the language (very easy handling of subarrays!), but it also makes a function that behaves like your #define impossible.
The technical reason is: in C the buffer size cannot be checked automatically, because it is not managed by the language. Functions like strcpy operate on pointers, and though pointers point to buffers, there is no way for a strcpy implementation to know how long a buffer is. Your suggestion of using sizeof does not work, since sizeof returns the size of the object it is applied to (here the pointer), not the size of the buffer the pointer points to. (In your example it would always return the same number, most probably 4 or 8.)
The C language makes the programmer responsible for managing buffer sizes, so one can use functions like strncpy and pass the buffer size explicitly. But it will never be possible to implement a safe version of strcpy in C, since it would require fundamental changes in the way the language treats pointers.
All of this applies to C descendants like C++ or Objective-C too.
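To illustrate what "pass the buffer size explicitly" looks like in practice, here is a minimal sketch of a bounded copy. The name copy_bounded and its return convention are hypothetical, not part of any standard library; unlike strncpy, this sketch always null-terminates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy: copies src into dst, truncating if needed,
   and always null-terminates when dst_size > 0.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return 0; /* nothing can be stored, not even the terminator */

    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1); /* includes the terminator */
    return 1;
}
```

At a call site where the destination is a true array still in scope, copy_bounded(buf, sizeof buf, src) works as hoped; once buf has decayed to a pointer, the size has to be carried along by hand.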
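#define _GNU_SOURCE /* asprintf is a GNU/BSD extension, not standard C */
#include <stdio.h>
#include <stdlib.h>
char* x;
if (asprintf(&x, "%s", y) < 0) { /* asprintf returns -1 on failure */
perror("asprintf");
exit(1);
}
// from here, x will contain the content of y; free(x) when done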
Under the assumption that y is null-terminated, this works safely.
(Written on a tablet, so forgive any silly errors, please.)

In C, what is a safer function to use than strtrns?

strtrns has the following descriptions: desc-1, desc-2
The strtrns() function transforms string and copies it into
result.
Any character that appears in old is replaced with the character in
the same position in new. The new result is returned. .........
This function is a security risk because it is possible to overflow
the newString buffer. If the currentString buffer is larger than the
newString buffer, then an overflow will occur.
And this is its prototype (or "signature"?):
char * strtrns(const char *string, const char *old, const char *new, char *result);
I've been googling to no avail. I appreciate any tips or advice.
I think you can write your own safe one pretty quickly.
It won't be a direct replacement, as the signature is slightly different, and it will allocate memory that the caller must free, but it can serve mostly the same job.
(I'm also changing the parameter name new, which is a reserved word in C++, and the parameter string, which is a very common type in C++. These changes make the function compatible with C++ code as well.)
char* alloc_strtrns(const char *srcstr, const char *oldtxt, const char *newtxt)
{
if (strlen(oldtxt) != strlen(newtxt))
{
return NULL; /* Old and New lengths MUST match */
}
char* result = strdup(srcstr);
if (result == NULL)
{
return NULL; /* Allocation failed */
}
/* Caller is responsible for freeing! */
return strtrns(srcstr, oldtxt, newtxt, result);
}
The claim that this function is unsafe is nonsense. In C, whenever you have an interface that takes a pointer to a buffer and fills it with some amount of data, you must have a contract between the caller and callee regarding the buffer size. For some functions where the caller cannot know in advance how much data the callee will write, the most logical interface design (contract) is to have the caller pass the buffer size to the callee and have the callee return an error or truncate the data if the buffer is too small. But for functions like strcpy or in your case strtrns where the number of output bytes is a trivial function (like the identity function) of the number of input bytes, it makes perfectly good sense for the contract to simply be that the output buffer provided by the caller must be at least as large as the input buffer.
Anyone who is not comfortable with strict adherence to interface contracts should not be writing C. There is really no way around this; adding complex bounds-checking interfaces certainly does not solve the problem but just shifts around the nature of the contracts you have to follow.
By the way, strtrns is not a standard function, so if you'd prefer a different contract you might be better off writing your own similar function. This would increase portability too.
You don't really have any options in C. You simply have to ensure that the destination buffer is large enough.
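If a size-checked contract is preferred, a hand-written replacement might look like the sketch below. The name strtrns_bounded and the extra result_size parameter are hypothetical; the translation rule follows the description quoted in the question and is not taken from any particular strtrns implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded replacement for strtrns. Each character of src
   found in oldtxt is replaced by the character at the same position in
   newtxt. Returns result, or NULL if oldtxt and newtxt differ in length
   or the output (including terminator) would not fit in result_size. */
char *strtrns_bounded(const char *src, const char *oldtxt,
                      const char *newtxt, char *result, size_t result_size)
{
    size_t len = strlen(src);

    if (strlen(oldtxt) != strlen(newtxt) || len >= result_size)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        /* src[i] is never '\0' here, so strchr cannot match the terminator */
        const char *hit = strchr(oldtxt, src[i]);
        result[i] = hit ? newtxt[hit - oldtxt] : src[i];
    }
    result[len] = '\0';
    return result;
}
```

This only changes the contract from "the output buffer must be at least as large as the input" to "tell me how large the output buffer is"; the caller still has to pass the right size.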

Dealing with returning C strings

What is considered better practice when writing methods that return strings in C?
passing in a buffer and size:
void example_m_a(type_a a,char * buff,size_t buff_size)
or making and returning a string of proper size:
char * example_m_b(type_a a)
P.S. what do you think about returning the buffer ptr to allow assignment style and
nested function calls i.e.
char * example_m_a(type_a a,char * buff,size_t buff_size)
{
...
return buff;
}
Passing a buffer as an argument solves most of the problems this type of code can run into.
If it returns a pointer to a buffer, then you need to decide how it is allocated and if the caller is responsible for freeing it. The function could return a static pointer that doesn't need to be freed, but then it isn't thread safe.
Passing a buffer and a size is generally less error-prone, especially if your strings are typically of a "reasonable" size. If you dynamically allocate memory and return a pointer, the caller is responsible for freeing the memory (and must remember to use the free function that corresponds to how the function allocated it).
If you examine large C APIs such as Win32, you will find that virtually all functions that return strings use the first form where the caller passes a buffer and a size. Only in limited circumstances might you find the second form where the function allocates the return value (I can't think of any at the moment).
I'd prefer the second option because it allows the function to decide how big a buffer is needed. Often the caller is not in a position to take that decision.
Another alternative to the pass a buffer and size style, using a return code:
size_t example_m_a(type_a a,char * buff,size_t buff_size)
A zero return code indicates that the caller's buffer was suitable and has been filled in.
A return code > 0 indicates that the caller's buffer was too small and reveals the size that is actually needed, allowing the caller to resize his buffer and retry.
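The return-code convention just described could be sketched as follows. The function name describe_number and its formatting are hypothetical stand-ins for example_m_a, and the sketch assumes the snprintf dry run does not fail.

```c
#include <stdio.h>

/* Hypothetical function following the convention described above:
   returns 0 when buff was large enough and has been filled in,
   otherwise returns the buffer size (including terminator) required. */
size_t describe_number(int value, char *buff, size_t buff_size)
{
    /* Dry run to learn the formatted length (assumed not to fail). */
    int len = snprintf(NULL, 0, "number is %d", value);
    size_t needed = (size_t)len + 1;

    if (buff == NULL || buff_size < needed)
        return needed; /* too small: report the size to retry with */

    snprintf(buff, buff_size, "number is %d", value);
    return 0;
}
```

A caller would try a small stack buffer first and, on a non-zero return, allocate exactly that many bytes and call again.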
Passing the buffer address and length is best in most cases. It is less error-prone and one does not have to worry about memory leaks. In fact, in some tight embedded systems it is completely undesirable to use the heap. However, the function must not overrun the buffer, as that can crash the system or, worse, make it vulnerable to attack.
The only time I've seen a function returning an allocated buffer is libxml's API for generating XML text from an xmlDoc.

Disabling NUL-termination of strings in GCC

Is it possible to globally disable NUL-terminated strings in GCC?
I am using my own string library, and I have absolutely no need for the final NUL characters as it already stores the proper length internally in a struct.
However, if I wanted to append 10 strings, this would mean that 10 bytes are unnecessarily allocated on the stack. With wide strings it is even worse: As for x86, there are 40 bytes wasted; and for x86_64, 80 bytes!
I defined a macro to add those stack-allocated strings to my struct:
#define AppendString(ppDest, pSource) \
AppendSubString(ppDest, (*ppDest)->len + 1, pSource, 0, sizeof(pSource) - 1)
Using sizeof(...) - 1 works quite well but I am wondering whether I could get rid of NUL termination in order to save a few bytes?
This is pretty awful, but you can explicitly specify the length of every character array constant:
char my_constant[6] = "foobar";
assert(sizeof my_constant == 6);
wchar_t wide_constant[6] = L"foobar";
assert(sizeof wide_constant == 6*sizeof(wchar_t));
I understand you're only dealing with strings declared in your program:
....
char str1[10];
char str2[12];
....
and not with text buffers you allocate with malloc() and friends; otherwise sizeof is not going to help you.
Anyway, I would think twice about removing the \0 at the end: you would lose compatibility with the C standard library functions.
Unless you are going to rewrite every single string function for your library (sprintf, for example), are you sure you want to do it?
I can't remember the details, but when I do
char my_constant[5]
it is possible that it will reserve 8 bytes anyway, because some machines can't address the middle of a word.
It's nearly always best to leave this sort of thing to the compiler and let it handle the optimisation for you, unless there is a really, really good reason not to.
If you're not using any of the Standard Library functions that deal with strings, you can forget about the NUL terminating byte.
No strlen(), no fgets(), no atoi(), no strtoul(), no fopen(), no printf() with the %s conversion specifier ...
Declare your "not quite C strings" with just the needed space;
struct NotQuiteCString { /* ... */ };
struct NotQuiteCString variable;
variable.data = malloc(5);
variable.data[0] = 'H'; /* ... */ variable.data[4] = 'o'; /* "hello" */
This is only worthwhile if you are really low on memory. Otherwise I don't recommend doing so.
It seems the most proper way to do what you are talking about is:
Prepare a minimal 'listing' file of the form:
string1_constant_name "str1"
string2_constant_name "str2"
...
Construct a utility that processes your file and generates declarations such as
const char string1_constant[4] = "str1";
Of course I would not recommend doing this by hand, because otherwise you can get into trouble after any string change.
So now you have non-terminated strings, thanks to the fixed-size auto-generated arrays, and you also have sizeof() available for every variable. This solution seems acceptable.
The benefits are easy localization, the possibility of adding some level of checks to lower the risk of this solution, and savings in the read-only data segment.
The drawback is the need to include all such string constants in every module (as an include, to keep sizeof() known). So this only makes sense if your linker merges such symbols (some don't).
Aren't these similar to Pascal-style strings, or Hollerith strings? I think this is only useful if you actually want the string data to preserve NULs, in which case you're really pushing around arbitrary memory, not "strings" per se.
The question uses false assumptions - it assumes that storing the length (e.g. implicitly by passing it as a number to a function) incurs no overhead, but that's not true.
While one might save space by not storing the 0-byte (or wchar), the size must be stored somewhere, and the example hints that it is passed as a constant argument to a function somewhere, which almost certainly takes more space, in code. If the same string is used multiple times, the overhead is per use, not per-string.
Having a wrapper that uses strlen to determine the length of a string and isn't inlined will almost certainly save more space.

Resources