How to calculate the length of output that sprintf will generate? - c

Goal: serialize data to JSON.
Issue: I can't know beforehand how many chars long the integer is.
I thought a good way to do this is by using sprintf()
size_t length = sprintf(no_buff, "{data:%d}",12312);
char *buff = malloc(length);
snprintf(buff, length, "{data:%d}",12312);
//buff is passed on ...
Of course I can use a stack variable like char a[256] instead of no_buff.
Question: But is there in C a utility for disposable writes like the Unix /dev/null?
Something like this:
#define FORGET_ABOUT_THIS ...
size_t length = sprintf(FORGET_ABOUT_THIS, "{data:%d}",12312);
P.S. I know that I can also get the length of the integer through log, but this way seems nicer.

Since C is a simple language, there is no such thing as "disposable buffers"; all memory management is on the programmer's shoulders (there are GNU C compiler extensions for these, but they are not standard).
can't know beforehand how many chars long the integer is.
There is a much easier solution for your problem: snprintf knows!
On C99-compatible platforms call snprintf with NULL as first argument:
int bufsz = snprintf(NULL, 0, "{data:%d}",12312);
char* buf = malloc(bufsz + 1);
snprintf(buf, bufsz + 1, "{data:%d}",12312);
...
free(buf);
In older Visual Studio versions (which have non-C99 compatible CRT), use _scprintf instead of snprintf(NULL, ...) call.
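The measure-then-allocate pattern above can be wrapped in a small helper. This is only a sketch: the name format_alloc is made up for illustration, and it assumes a C99-conforming vsnprintf.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed, formatted string, or NULL on error.
   The caller owns the result and must free() it. */
char *format_alloc(const char *fmt, ...)
{
    va_list args, args_copy;
    va_start(args, fmt);
    va_copy(args_copy, args);                  /* a va_list cannot be reused after a v*printf call */

    int len = vsnprintf(NULL, 0, fmt, args);   /* pass 1: measure only, nothing is written */
    va_end(args);
    if (len < 0) {                             /* formatting error */
        va_end(args_copy);
        return NULL;
    }

    char *buf = malloc((size_t)len + 1);       /* +1 for the terminating '\0' */
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, args_copy);   /* pass 2: actually write */
    va_end(args_copy);
    return buf;
}
```

A call such as format_alloc("{data:%d}", 12312) would then hand back an exactly sized heap string that the caller frees when done.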

You can call int len = snprintf(NULL, 0, "{data:%d}", 12312) to test how much space you need.
snprintf will write at most size characters (including the terminating '\0'), where size is the second argument, and return how many characters would have been necessary to print the whole thing, not counting the terminating '\0'. Because you pass in 0, it won't actually write anything (and thus never dereferences the NULL pointer), but it still returns the length needed to fit the whole output, which you can use to allocate your buffer.
At that point you can allocate and print to your buffer, remembering to include one more for the trailing '\0':
char *buf = malloc(len + 1);
snprintf(buf, len + 1, "{data:%d}", 12312);

To just obtain the length you can write:
int length = snprintf(NULL, 0, "{data:%d}", 12312);
Note that the return type is int. It may return -1 in case of some sort of error. Make sure your input data doesn't include long strings that might cause the total length to exceed INT_MAX !

If you check the performance, you will find that running snprintf without an output buffer takes roughly the same time as a full invocation.
So I recommend you to use a smaller buffer just in case, and only call it a second time if the returned size exceeded the buffer size.
This uses C++'s std::string but I guess you can adapt it for your needs.
std::string format(const char* format, ...) {
va_list args;
va_start(args, format);
char smallBuffer[1024];
int size = vsnprintf(smallBuffer, sizeof smallBuffer, format, args);
va_end(args);
if (size < 0)
return std::string(); /* formatting error */
if (static_cast<size_t>(size) < sizeof smallBuffer)
return std::string(smallBuffer);
char buffer[size + 1]; /* maybe malloc if it's too big */
va_start(args, format);
vsnprintf(buffer, sizeof buffer, format, args);
va_end(args);
return std::string(buffer);
}
This code will run 2x faster for strings under 1k compared to the longer ones.
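For plain C, the same idea might look roughly like the sketch below. The function name format_small_first and the 128-byte stack buffer are assumptions for illustration; the result is returned as a malloc'ed string instead of a std::string.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: formats into a small stack buffer first and only
   runs vsnprintf a second time when the output did not fit. */
char *format_small_first(const char *fmt, ...)
{
    char small[128];                           /* assumed "typical" size; tune per application */
    va_list args;

    va_start(args, fmt);
    int len = vsnprintf(small, sizeof small, fmt, args);
    va_end(args);
    if (len < 0)
        return NULL;                           /* formatting error */

    char *out = malloc((size_t)len + 1);       /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;

    if ((size_t)len < sizeof small) {
        memcpy(out, small, (size_t)len + 1);   /* common case: a single formatting pass */
    } else {
        va_start(args, fmt);                   /* restart the argument list for pass 2 */
        vsnprintf(out, (size_t)len + 1, fmt, args);
        va_end(args);
    }
    return out;
}
```

The caller frees the returned string. Only outputs of 128 characters or more pay for the second formatting pass.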

Calling snprintf(nullptr, 0, ...) does return the size, but it has a performance penalty because it calls IO_str_overflow, which is slow.
If you do care about performance, you can pre-allocate a dummy buffer and pass its pointer and size to ::snprintf. It will be several times faster than the nullptr version.
template<typename ...Args>
size_t get_len(const char* format, Args ...args) {
static char dummy[4096]; // you can change the default size
return ::snprintf(dummy, 4096, format, args...) + 1; // +1 for \0
}

printf supports the %n format specifier, which means "store the number of characters written so far into the int pointed to by the corresponding argument", so:
int x;snprintf(NULL, 0, "Message: %s%n", "Error!",&x);
This should work!

This isn't strictly an answer to your question, but you may find it helpful nonetheless. It is not portable, but if you're running this on glibc, you can simply use asprintf() instead, which will do the memory allocation for you.

Related

Declare string in C without giving size

I want to concatenate in a string multiple sentences. At the moment my buffer is with fixed size 100, but I do not know the total count of the sentences to concatenate and this size can be not enough in the future. How can I define a string without defining its size?
char buffer[100];
int offset = sprintf (buffer, "%d plus %d is %d", 5, 3, 5+3);
offset += sprintf (buffer + offset, " and %d minus %d is %d", 6, 3, 6-3);
offset += sprintf (buffer + offset, " even more");
printf ("[%s]",buffer);
This is a fundamental aspect of C. C never does automatic management of dynamically-constructed strings for you — this is always your responsibility.
Here is an outline of four different techniques you might use. You can ask additional questions about any of these that aren't clear.
1. Run through your string-construction process twice. Make one pass to collect the lengths of all the substrings, then call malloc to allocate a buffer of the computed size, then make a second pass to actually construct your string.
2. Allocate a smallish (or empty) initial buffer with malloc, and then, each time you're about to append a new substring to it, check the buffer's size, and if necessary grow it bigger using realloc. (In this case I always use three variables: (1) pointer to buffer, (2) allocated size of buffer, (3) number of characters currently in buffer. The goal is to always keep (2) ≥ (3).)
3. Allocate a dynamically-growing "memstream" and use fprintf or the like to "print" to it. This is an ideal technique, although memstreams are not standard and not supported on all platforms, and dynamically-allocating memstreams are even more exotic and less common. (It's possible to write your own, but it's a lot of work.) You can open a fixed-size memstream using fmemopen (although this is not what you want), and you can open the holy grail, a dynamically-allocating memstream (which is what you want) using open_memstream, if you have it. Both are documented on this man page. (This technique is analogous to stringstream in C++.)
4. The "better to beg forgiveness than ask permission" technique. You can allocate a buffer which you're pretty sure is amply big enough, then blindly stuff all your substrings into it, then at the very end call strlen on it and, if you guessed wrong and the string is longer than the buffer you allocated, print a big scary noisy error message and abort. This is a blunt and risky technique, not one you'd use in a production program. It's possible that when you overflow the buffer, you damage things in a way that causes the program to crash before it has a chance to perform its belated check-and-maybe-exit step at all. If you used this technique at all, it would be considerably "safer" (that is, considerably less likely to prematurely crash before the check) if you allocated the buffer using malloc, than if you declared it as an ordinary, fixed-size array (whether static or local).
Personally, I've used all four of these. Out in the rest of the world, numbers 1 and 2 are both in common use by just about everybody. Comparing them: number 1 is simple and a bit easier but has an uncomfortable amount of code replication (and might therefore be brittle if new strings are added later); number 2 is more robust but obviously requires you to be comfortable with the way realloc works (and this technique can be less than robust in its own way, if you've got any auxiliary pointers into your buffer which need to be ponderously relocated each time realloc is called).
Number 3 is an "exotic" technique: theoretically almost ideal, but definitely more sophisticated and necessitating some extra support, since nothing like open_memstream is standard.
And number 4 is obviously a risky and inherently not reliable technique, which you'd use — if at all — only in throwaway or prototype code, never in production.
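To make the second technique concrete, here is a rough sketch of the three-variable bookkeeping with realloc. The names struct strbuf and strbuf_appendf, the starting capacity of 16, and the doubling growth policy are all assumptions for illustration; nothing here is a standard API.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable string holding the three variables described above. */
struct strbuf {
    char  *data;   /* (1) pointer to buffer */
    size_t cap;    /* (2) allocated size of buffer */
    size_t len;    /* (3) characters currently stored, excluding '\0' */
};

/* Appends formatted text, growing the buffer when needed.
   Returns 1 on success, 0 on failure (the existing contents stay intact). */
int strbuf_appendf(struct strbuf *sb, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(NULL, 0, fmt, args);     /* measure the new piece */
    va_end(args);
    if (n < 0)
        return 0;

    size_t need = sb->len + (size_t)n + 1;     /* +1 for the terminating '\0' */
    if (need > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need)
            new_cap *= 2;                      /* doubling keeps realloc calls rare */
        char *tmp = realloc(sb->data, new_cap);
        if (tmp == NULL)
            return 0;
        sb->data = tmp;
        sb->cap = new_cap;
    }

    va_start(args, fmt);
    vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, args);
    va_end(args);
    sb->len += (size_t)n;
    return 1;
}
```

Start from a zero-initialized struct strbuf, append as often as needed, and free(sb.data) at the end. Because realloc may move the block, any pointer previously taken into sb.data must be treated as invalid after an append.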
You can use the snprintf function with NULL for the first argument and 0 for the second to get the size that a formatted string would be. You can then allocate the space dynamically and call snprintf again to actually build the string.
char *buffer = NULL;
int len, offset = 0;
len = snprintf (NULL, 0, "%d plus %d is %d", 5, 3, 5+3);
buffer = realloc(buffer, offset + len + 1);
offset = sprintf (buffer + offset, "%d plus %d is %d", 5, 3, 5+3);
len = snprintf (NULL, 0, " and %d minus %d is %d", 6, 3, 6-3);
buffer = realloc(buffer, offset + len + 1);
offset += sprintf (buffer + offset, " and %d minus %d is %d", 6, 3, 6-3);
len = snprintf (NULL, 0, " even more");
buffer = realloc(buffer, offset + len + 1);
offset += sprintf (buffer + offset, " even more");
printf ("[%s]",buffer);
Note that this implementation omits checking on realloc and snprintf for brevity. It also repeats the format string and arguments. The following function addresses these shortcomings:
int append_buffer(char **buffer, int *offset, const char *format, ...)
{
    va_list args;
    int len;

    /* First pass: measure how many characters the new piece needs. */
    va_start(args, format);
    len = vsnprintf(NULL, 0, format, args);
    va_end(args);
    if (len < 0) {
        perror("vsnprintf failed");
        return 0;
    }

    char *tmp = realloc(*buffer, *offset + len + 1);
    if (!tmp) {
        perror("realloc failed");
        return 0;
    }
    *buffer = tmp;

    /* Second pass: write the piece at the current end of the string. */
    va_start(args, format);
    len = vsnprintf(*buffer + *offset, len + 1, format, args);
    va_end(args);
    if (len < 0) {
        perror("vsnprintf failed");
        return 0;
    }
    *offset += len;
    return 1;
}
Which you can then call like this:
char *buffer = NULL;
int offset = 0;
int rval;
rval = append_buffer(&buffer, &offset, "%d plus %d is %d", 5, 3, 5+3);
if (!rval) return 1;
rval = append_buffer(&buffer, &offset, " and %d minus %d is %d", 6, 3, 6-3);
if (!rval) return 1;
rval = append_buffer(&buffer, &offset, " even more");
if (!rval) return 1;
printf ("[%s]",buffer);
free(buffer);

How to avoid calling fopen() with a buffer that is not null-terminated in C?

Let's look at this example:
static FILE *open_file(const char *file_path)
{
char buf[80];
size_t n = snprintf(buf, sizeof (buf), "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}
Here, the assert() is off-by-one. From the manpage for snprintf:
"Upon successful return, these functions return the number of characters printed (excluding the null byte used to end output to strings)."
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0. This will cause a problem because fopen() assumes it is null terminated.
What is the best way to prevent this?
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0
That is incorrect: the string would be null-terminated no matter what you pass for file_path. Obviously, the string would be cut off at the sizeof(buf)-1.
Note that snprintf could return a number above 80 as well. This would mean that the string you wanted to print was longer than the buffer you have provided.
What is the best way to prevent this?
You are already doing it: the assert is not necessary for preventing unterminated strings. You can use the return value to decide if any truncation has happened, and pass a larger buffer to compensate:
// Figure out the size
int n = snprintf(NULL, 0, "%s", file_path);
if (n < 0) return NULL;
// Allocate the buffer and print into it
char *tmpBuf = malloc(n+1);
if (tmpBuf == NULL) return NULL;
snprintf(tmpBuf, n+1, "%s", file_path);
// Prepare the file to return
FILE *res = fopen(tmpBuf, "r");
// Free the temporary buffer
free(tmpBuf);
return res;
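If you would rather keep the fixed-size buffer, the return value alone tells you whether truncation happened. A minimal sketch, with copy_path being a made-up name for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: copies path into buf and reports whether it fit.
   Returns 1 if the whole path fit, 0 if it was truncated or snprintf failed.
   When snprintf succeeds and cap > 0, buf is null-terminated even if the
   path was truncated. */
int copy_path(char *buf, size_t cap, const char *path)
{
    int n = snprintf(buf, cap, "%s", path);
    return n >= 0 && (size_t)n < cap;
}
```

With an 8-byte buffer, a 7-character path fits exactly, while an 8-character path is reported as truncated and only its first 7 characters are stored. The caller can then refuse to call fopen() on a cut-off path.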
What is the best way to prevent this?
Simple, don't give it a non-null terminated string. Academic questions aside, you are in control of the code you write. You don't have to protect against yourself sabotaging the project in every conceivable way, you just have to not sabotage yourself.
If everyone checked and double checked everything in code, the performance loss would be incredible. There's a reason why fopen doesn't do it.
There are a couple of issues here.
First, assert() is used to catch issues as a part of designer testing. It is not meant to be used in production code.
Secondly, if the file path is not complete, do you really want to call fopen()?
Normally what is done is to add one to the expected number of characters.
static FILE *open_file(const char *file_path)
{
char buf[80 + 1] = {0};
size_t n = snprintf(buf, sizeof (buf), "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}

How to compute how much of my buffer is used (size) for write()?

char buf[256];
sprintf(buf, "It was %s\r\n", weather);
write(p->fd, buf, sizeof(buf));
The code above is a snippet of a large project.
buf is used to hold a number of different strings of different length. How do I know what to put in the write function? sizeof() just gives 256 I believe, because write just spits out a bunch of extra garbage characters.
The version of the code using len should be:
char buf[256];
int len = snprintf(buf, sizeof buf, "It was %s\r\n", weather);
if ( len < 0 || (size_t)len >= sizeof buf ) {
// error handling, abort...
} else {
write(p->fd, buf, len);
}
Using sprintf is risky as it may cause a buffer overflow if weather is not fully under your control.
As mentioned in its documentation, the sprintf family can return a negative value if there is an error, and for snprintf the returned value can be larger than the buffer size if the output would not have fit in the buffer.
Another option covered by other answers is to omit checking len, and instead use strlen to find the length to send. Some would consider that unnecessarily inefficient. Further, in that case you should really check len anyway in case encoding fails which would result in strlen operating on garbage.
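If several call sites build lines like this, the length check can be kept in one place. A small sketch, where build_line is a hypothetical name and the message format is taken from the question:

```c
#include <stdio.h>

/* Hypothetical helper: formats the line into buf and returns the number of
   bytes to pass to write(), or -1 if formatting failed or the text was truncated. */
int build_line(char *buf, size_t cap, const char *weather)
{
    int len = snprintf(buf, cap, "It was %s\r\n", weather);
    if (len < 0 || (size_t)len >= cap)
        return -1;
    return len;
}
```

The caller would then only call write(p->fd, buf, len) when the returned len is not negative.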
One solution is to zero the buffer with memset and then use either strlen or the return value of sprintf:
char buf[256];
memset(buf,'\0',sizeof(buf));
sprintf(buf, "It was %s\r\n", weather);
write(p->fd, buf, strlen(buf));
OR
char buf[256];
memset(buf,'\0',sizeof(buf));
int len = sprintf(buf, "It was %s\r\n", weather);
write(p->fd, buf, len);
You are correct that sizeof() does not do what you want.
How you determine how much valid data is actually in your buffer depends on how you put the data there. In your particular case, you could either use the return value of sprintf(), or you could use strlen() on the buffer. I would recommend the former, since sprintf() is going to return that value whether you use it or not. Either of those alternatives will exclude the string's trailing null byte, so be sure to add one to the length if you want to write that null, too.
If elsewhere you are filling the buffer by other means, then the appropriate mechanism for determining how many bytes to write may differ.

Estimate size of formatted snprintf() string?

I'm considering writing a function to estimate at least the full length of a formatted string coming from the sprintf(), snprintf() functions.
My approach was to parse the format string to find the various %s, %d, %f, %p args, creating a running sum of strlen()s, itoa()s, and strlen(format_string) to get something guaranteed to be big enough to allocate a proper buffer for snprintf().
I'm aware the following works, but it takes 10X as long, as all the printf() functions are very flexible, but very slow because of it.
char c;
int required_buffer_size = snprintf(&c, 1, "format string", args...);
Has this already been done, via the suggested approach or some other reasonably efficient approach, i.e. 5-50X faster than the sprintf() variants?
Allocate a big enough buffer first and check if it was long enough. If it wasn't reallocate and call a second time.
int len = 200; /* Any number well chosen for the application to cover most cases */
int need;
char *buff = NULL;
do {
need = len+1;
buff = realloc(buff, need); /* I don't care for return value NULL */
len = snprintf(buff, need, "...", ....);
/* Error check for len < 0 */
} while(len >= need);
/* buff = realloc(buff, len+1); shrink memory block */
By choosing your initial value correctly you will have only one call to snprintf() in most cases, and the little bit of over-allocation shouldn't be critical. If you're in such a tight environment that this over-allocation is critical, then you already have other problems with the expensive allocation and formatting.
In any case, you could still call a realloc() afterwards to shrink the allocated buffer to the exact size.
If the first argument to snprintf is NULL and the second is 0, the return value is the number of characters that would have been written.

Reading strings and integers from one binary text in C

I'm using C and I want to read from a binary file.
I know that it contains strings in the following way: the length of a string, the string itself, the length of the next string, the string itself, and so on...
I want to count the number of times the string Str appears in the binary file.
So I want to do something like this:
int N;
while (!feof(file)){
if (fread(&N, sizeof(int), 1, file)==1)
...
Now I need to get the string itself. I know it's length. Should I do a 'for'
loop and get with fgetc char by char? I know I'm not allowed to use fscanf since
it's not a text file, but can I use fgetc? And would I get what I'm expecting for
my string? (To use dynamic allocation for char* for it with the size of the length
and use strcpy to add it to the current string?)
You could allocate some memory with malloc then fread into that buffer:
char *str;
/* ... */
if (fread(&N, sizeof(int), 1, file)==1)
{
/* check that N > 0 */
str = malloc(N+1);
if (fread(str, sizeof(char), N, file) == N)
{
str[N] = '\0'; /* terminate str */
printf("Read %d chars: %s\n", N, str);
}
free(str);
}
You should probably loop on:
while (fread(&N, sizeof(int), 1, file) == 1)
{
// Check N for sanity
char *buffer = malloc(N+1);
// Check malloc succeeded
if (fread(buffer, N, 1, file) != 1)
...process error...
buffer[N] = '\0'; // Null terminate for sanity's sake
...store buffer (the pointer) for later processing so you aren't leaking...
...or free it if you won't need it later...
}
You could use getc() or fgetc() in a loop; that would work. However, the direct fread() is much simpler (and is coded as if it uses getc() in a loop).
You might want to do some sanity checking on N before blindly using it with malloc(). In particular, negative values are likely to lead to much unhappiness.
The file format as written is tied to one class of machine — either big-endian or little-endian, and with the fixed size of int (probably 32-bits). Writing more portable data is slightly fiddlier, but eminently doable — but probably not relevant to you just yet.
Using feof() is seldom the correct way to test for whether to continue with a loop. Indeed, there is not often a need to use feof() in code. When it is used, it is because an I/O operation 'failed' and you need to disambiguate between 'it was not an error — just EOF' and 'there was some sort of error on the device'.
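Putting these points together, counting the matching records might look roughly like the sketch below. The function name count_occurrences and the MAX_RECORD_LEN sanity limit are assumptions for illustration; the file is assumed to hold a native int length before each string, as described in the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RECORD_LEN 4096   /* assumed sanity limit, not part of the file format */

/* Hypothetical helper: counts records equal to str in a file laid out as
   [int length][length bytes], repeated. Returns -1 on a malformed file,
   an allocation failure, or a read error. */
long count_occurrences(FILE *file, const char *str)
{
    long count = 0;
    int n;
    size_t want = strlen(str);

    /* Loop on the read itself instead of testing feof() up front. */
    while (fread(&n, sizeof n, 1, file) == 1) {
        if (n < 0 || n > MAX_RECORD_LEN)
            return -1;                         /* refuse negative or absurd lengths */
        char *buf = malloc((size_t)n + 1);     /* +1 so a zero-length record still allocates */
        if (buf == NULL)
            return -1;
        if (fread(buf, 1, (size_t)n, file) != (size_t)n) {
            free(buf);
            return -1;                         /* record shorter than its stated length */
        }
        if ((size_t)n == want && memcmp(buf, str, want) == 0)
            count++;
        free(buf);
    }
    return ferror(file) ? -1 : count;
}
```

Comparing with memcmp on the known length avoids relying on a terminating '\0' inside the file data.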
