Estimate size of formatted snprintf() string?


I'm considering writing a function to estimate at least the full length of a formatted string coming from the sprintf(), snprintf() functions.
My approach was to parse the format string to find the various %s, %d, %f, %p args, creating a running sum of strlen()s, itoa()s, and strlen(format_string) to get something guaranteed to be big enough to allocate a proper buffer for snprintf().
I'm aware the following works, but it takes 10X as long: all the printf() functions are very flexible, but very slow because of it.
char c;
int required_buffer_size = snprintf(&c, 1, "format string", args...);
Has this already been done, either via the suggested approach or some other reasonably efficient approach (i.e. 5-50X faster than the sprintf() variants)?

Allocate a big enough buffer first and check whether it was long enough. If it wasn't, reallocate and call snprintf a second time.
int len = 200; /* Any number well chosen for the application to cover most cases */
int need;
char *buff = NULL;
do {
    need = len + 1;
    char *tmp = realloc(buff, need);
    if (tmp == NULL) {      /* out of memory: give up, buff == NULL signals it */
        free(buff);
        buff = NULL;
        break;
    }
    buff = tmp;
    len = snprintf(buff, need, "...", ....);
    /* Error check for len < 0 */
} while (len >= need);      /* len == need still means truncation: len + 1 bytes are required */
/* buff = realloc(buff, len+1); shrink memory block */
By choosing your initial value well, you will have only one call to snprintf() in most cases, and the little bit of over-allocation shouldn't be critical. If you're in such a tight environment that this over-allocation is critical, then you already have other problems with the expensive allocation and formatting.
In any case, you could still call a realloc() afterwards to shrink the allocated buffer to the exact size.
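The grow-and-retry loop above can be packaged as a reusable variadic function. This is a minimal sketch assuming C99 `vsnprintf`; the name `format_retry` and the 200-byte starting guess are illustrative choices, not part of the answer. Because the retry uses the exact length reported by the first call, the loop normally runs at most twice.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a heap buffer, starting from a guess and
   retrying with the exact size if the guess was too small.
   Returns NULL on allocation or encoding error; the caller frees the result. */
char *format_retry(const char *fmt, ...)
{
    size_t cap = 200;               /* initial guess covering most cases */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        va_list args;
        va_start(args, fmt);        /* restart the argument list each pass */
        int len = vsnprintf(buf, cap, fmt, args);
        va_end(args);

        if (len < 0) {              /* encoding error */
            free(buf);
            return NULL;
        }
        if ((size_t)len < cap)      /* fitted, including the '\0' */
            return buf;

        cap = (size_t)len + 1;      /* exact size needed for the retry */
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }
}
```

Usage: `char *s = format_retry("%s-%d", "abc", 42);` yields `"abc-42"` after a single formatting call.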

If the first argument to snprintf is NULL and the second is 0, the return value is the number of characters that would have been written, not counting the terminating '\0'.
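A minimal sketch of this measure-then-allocate pattern wrapped in a variadic helper, assuming C99 (`vsnprintf`, `va_copy`). The name `format_measured` is hypothetical. The `va_list` is copied because the measuring call consumes it and the real formatting call needs it again.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: measures the output with vsnprintf(NULL, 0, ...),
   then allocates exactly len + 1 bytes and formats for real.
   Returns NULL on error; the caller frees the result. */
char *format_measured(const char *fmt, ...)
{
    va_list args, copy;
    va_start(args, fmt);
    va_copy(copy, args);                      /* a va_list can be walked only once */

    int len = vsnprintf(NULL, 0, fmt, copy);  /* length without the '\0' */
    va_end(copy);
    if (len < 0) {
        va_end(args);
        return NULL;
    }

    char *buf = malloc((size_t)len + 1);      /* +1 for the terminating '\0' */
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, args);
    va_end(args);
    return buf;
}
```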

Related

How to calculate the length of output that sprintf will generate?

Goal: serialize data to JSON.
Issue: I can't know beforehand how many chars long the integer is.
I thought a good way to do this is by using sprintf():
size_t length = sprintf(no_buff, "{data:%d}",12312);
char *buff = malloc(length + 1); /* +1 for the terminating '\0' */
snprintf(buff, length + 1, "{data:%d}",12312);
//buff is passed on ...
Of course I can use a stack variable like char a[256] instead of no_buff.
Question: But is there in C a utility for disposable writes like the unix /dev/null?
Something like this:
#define FORGET_ABOUT_THIS ...
size_t length = sprintf(FORGET_ABOUT_THIS, "{data:%d}",12312);
P.S. I know that I can also get the length of the integer through log, but this way seems nicer.

Since C is a very simple language, there is no such thing as a "disposable buffer": all memory management is on the programmer's shoulders (there are GNU C compiler extensions for this, but they are not standard).
As for not knowing beforehand how many chars long the integer is, there is a much easier solution to your problem: snprintf knows!
On C99-compatible platforms call snprintf with NULL as first argument:
int bufsz = snprintf(NULL, 0, "{data:%d}",12312); /* snprintf returns int; negative means error */
char* buf = malloc(bufsz + 1);
snprintf(buf, bufsz + 1, "{data:%d}",12312);
...
free(buf);
In older Visual Studio versions (which have a non-C99-compatible CRT), use _scprintf instead of the snprintf(NULL, ...) call.

You can call int len = snprintf(NULL, 0, "{data:%d}", 12312) to test how much space you need.
snprintf writes at most size bytes, including the terminating '\0', where size is the second argument, and returns how many characters would have been necessary to print the whole thing, not counting the terminating '\0'. Because you pass in 0, it won't actually write anything (so the NULL pointer is never dereferenced), but it still returns the length needed to fit the whole output, which you can use to allocate your buffer.
At that point you can allocate and print to your buffer, remembering to include one more for the trailing '\0':
char *buf = malloc(len + 1);
snprintf(buf, len + 1, "{data:%d}", 12312);

To just obtain the length you can write:
int length = snprintf(NULL, 0, "{data:%d}", 12312);
Note that the return type is int. It returns a negative value if an error occurs (for example, an encoding error). Make sure your input data doesn't include long strings that might cause the total length to exceed INT_MAX!

If you check the performance, you will see that running snprintf without an output buffer takes roughly the same time as a full invocation.
So I recommend formatting into a small buffer first, just in case, and calling it a second time only if the returned size exceeds the buffer size.
This uses C++'s std::string but I guess you can adapt it for your needs.
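#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

std::string format(const char* format, ...) {
    va_list args;
    va_start(args, format);
    char smallBuffer[1024];
    int size = vsnprintf(smallBuffer, sizeof smallBuffer, format, args);
    va_end(args);
    if (size < 0)
        return std::string(); /* formatting error */
    if (static_cast<std::size_t>(size) < sizeof smallBuffer)
        return std::string(smallBuffer, size);
    /* too long for the stack buffer: use heap storage (VLAs are not standard C++) */
    std::vector<char> buffer(static_cast<std::size_t>(size) + 1);
    va_start(args, format);
    vsnprintf(buffer.data(), buffer.size(), format, args);
    va_end(args);
    return std::string(buffer.data(), size);
}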
This code will run 2x faster for strings under 1k compared to the longer ones.
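Since the answer suggests adapting the C++ version, here is one possible plain-C adaptation, offered as a sketch: the name `format_small_first` is hypothetical, and it returns a `malloc`'d copy because a stack buffer cannot be returned. Short results cost one formatting call plus a `memcpy`; only results of 1024 bytes or more are formatted twice.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C adaptation: format into a 1 KB stack buffer first and only
   format a second time if the result did not fit.
   Returns a malloc'd string (caller frees) or NULL on error. */
char *format_small_first(const char *fmt, ...)
{
    char small[1024];
    va_list args;

    va_start(args, fmt);
    int size = vsnprintf(small, sizeof small, fmt, args);
    va_end(args);
    if (size < 0)
        return NULL;                            /* encoding error */

    char *out = malloc((size_t)size + 1);
    if (out == NULL)
        return NULL;

    if ((size_t)size < sizeof small) {
        memcpy(out, small, (size_t)size + 1);   /* fast path: copy incl. '\0' */
    } else {
        va_start(args, fmt);                    /* slow path: format again */
        vsnprintf(out, (size_t)size + 1, fmt, args);
        va_end(args);
    }
    return out;
}
```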

Calling snprintf(nullptr, 0, ...) does return the size, but it has a performance penalty, because (in glibc) it calls IO_str_overflow, which is slow.
If you do care about performance, you can pre-allocate a dummy buffer and pass its pointer and size to ::snprintf. It will be several times faster than the nullptr version.
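#include <cstddef>
#include <cstdio>

template<typename ...Args>
std::size_t get_len(const char* format, Args ...args) {
    // scratch buffer; thread_local so concurrent callers don't race on it
    static thread_local char dummy[4096]; // you can change the default size
    int n = std::snprintf(dummy, sizeof dummy, format, args...);
    return n < 0 ? 0 : static_cast<std::size_t>(n) + 1; // +1 for '\0'; 0 signals an error
}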

printf supports the %n conversion, which means "store the number of characters output so far into the int pointed to by the corresponding argument", so:
int x; snprintf(NULL, 0, "Message: %s%n", "Error!", &x);
should work (x is then 15)!
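A runnable sketch of the %n idea, assuming a C library that honors %n (Microsoft's CRT disables it by default, and %n must never appear in a format string an attacker can influence). The function name `message_layout` is hypothetical; it shows that %n records positions even when `snprintf(NULL, 0, ...)` stores nothing, including intermediate positions, not only the total.

```c
#include <stdio.h>

/* Hypothetical helper: returns the total would-be length of the message and
   stores, via %n, the offset at which the message text starts. */
int message_layout(const char *msg, int *text_offset)
{
    return snprintf(NULL, 0, "Message: %n%s", text_offset, msg);
}
```

With `msg = "Error!"`, the offset is 9 (the length of `"Message: "`) and the total is 15.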

This isn't strictly an answer to your question, but you may find it helpful nonetheless. It is not portable, but if you're running this on glibc, you can simply use asprintf() instead, which will do the memory allocation for you.

Scanning string with length restriction

Using the standard C library, is there a way to scan a string (containing no whitespace) from standard input only if it fits in a buffer? In the following example I would like scanCount to be 0 if the input string is larger than 32:
char str[33]; /* %32s stores up to 32 characters plus '\0' */
int scanCount;
scanCount = scanf("%32s", str);
Edit: I also need file pointer rollback when the input string is too large.

You specified a requirement to only read if the whole data fits your buffer. This requirement makes no sense at all, as it doesn't provide any functionality to your program. You can easily achieve the same sort of tasks without it. It is also not how operating systems present files to user applications.
You can simply create a buffer of any size you see fit and then keep the data in the buffer until you can handle it, or you can do magic like actually resizing the buffer to accommodate more incoming data.
You can read any number of characters from a file using the ANSI fread() function:
size_t count;
char buffer[50];
count = fread(buffer, 1, sizeof buffer, stdin);
You can then see how many characters have actually been read by looking at the count variable. You can fill in the final NUL character if it's less than the buffer size, or decide what to do next if the whole buffer has been filled and more data may be available. You could of course read sizeof buffer - 1 bytes instead, to always be able to terminate the string. When the count is smaller than the value you requested, feof() and ferror() can be used to see what happened. You can also look at the actual data and check for LF characters to see how many lines you have read.
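A minimal sketch of the "read sizeof buffer - 1 and terminate" variant described above. The function name `read_chunk` is hypothetical, and it assumes `cap >= 1` and that the input contains no embedded '\0' bytes (otherwise the result is only usable as raw bytes, via the returned count).

```c
#include <stdio.h>

/* Hypothetical helper: read up to cap - 1 bytes with fread() and always
   NUL-terminate. Returns the number of bytes read. A short count means
   end of file or an I/O error; feof()/ferror() tell which. */
size_t read_chunk(FILE *in, char *buf, size_t cap)
{
    size_t count = fread(buf, 1, cap - 1, in);
    buf[count] = '\0';                  /* room reserved by reading cap - 1 */
    if (count < cap - 1 && ferror(in))
        perror("fread");                /* short read caused by an I/O error */
    return count;
}
```

Reading `"hello world"` with a 6-byte buffer yields `"hello"`, then `" worl"`, then `"d"` with `feof()` set.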
When using an enlarging buffer, you will need malloc() or just create a NULL pointer that will later be allocated using realloc():
/* Set initial size and offset. */
size_t offset = 0;
size_t size = 0;
char *buffer = NULL;
When you need to change the size of the buffer, you can use realloc():
/* Change the size. */
size = 100;
buffer = realloc(buffer, size);
(The first time it's equivalent to buffer = malloc(size).)
You can then read data into the buffer:
size_t count = fread(buffer + offset, 1, size - offset, stdin);
offset += count; /* offset now holds the total number of bytes in the buffer */
(The first time it's equivalent to fread(buffer, 1, size, stdin).)
When finished, you should free the buffer:
free(buffer);
At any time, all the data read so far is still somewhere in the buffer, so you can get back to it whenever you need. You simply decouple reading from processing; the examples above are all about reading.
The processing then depends on what you need. You generally need to identify the start and end of the data that you want to extract.
Example start and end, where end means one character after the last one you want, so the arithmetic works out better:
size_t start = 0;
size_t end = 10;
Extract the data (using bits of C99):
char data[end - start + 1];
memcpy(data, buffer + start, end - start);
data[end - start] = '\0';
Now you have a NUL-terminated string containing the data you wanted to extract. Sometimes you just assume start = 0 and then want to consume the data from the buffer to make place for new data:
char data[end + 1];
/* copy out the data */
memcpy(data, buffer, end);
/* move the data between end and offset to the beginning */
memmove(buffer, buffer + end, offset - end);
/* adjust the offset accordingly */
offset -= end;
Now you have your data extracted, but you still have the buffer ready with the rest of the data you haven't processed yet. This effectively achieves what you wanted: by keeping the data in an intermediate buffer, you're effectively peeking into an arbitrary part of the data received on input and taking out the data only if it fits your expectations, doing whatever else if it doesn't. Of course you should carefully test all return values to check for exceptional conditions and the like.
I personally would also turn all indexes in the examples into pointers directly to the memory and adjust the arithmetic accordingly, but not everyone enjoys pointer arithmetic as I do ;). I also tend to prefer the low-level POSIX API over the intermediate layer in the form of the ANSI API. I'm ready to fix bugs or improve explanations; please comment.
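The extract-then-shift step above can be wrapped in a small function. This is a sketch under the answer's own conventions (`offset` is the number of valid bytes, `end` is one past the last byte wanted, `start` is 0); the name `consume` is hypothetical, and the caller must guarantee `end <= offset` and room for `end + 1` bytes in `out`.

```c
#include <string.h>

/* Hypothetical helper: copy the first `end` bytes of `buffer` into `out` as a
   NUL-terminated string, then move the remaining offset - end bytes to the
   front of `buffer`. Returns the new offset. */
size_t consume(char *buffer, size_t offset, size_t end, char *out)
{
    memcpy(out, buffer, end);
    out[end] = '\0';
    memmove(buffer, buffer + end, offset - end);   /* regions may overlap */
    return offset - end;
}
```

For a buffer holding `"line1\nrest"` (offset 10), consuming 6 bytes yields `"line1\n"` and leaves `"rest"` at the front with offset 4.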

Your comment that you need the file pointer reset on scan failure makes this impossible to do with scanf().
scanf() is basically specified as "fscanf( stdin, ... )", and fscanf() is defined to "[push] back at most one input character onto the input stream" (C99, footnote 242). (I assume this is for the same reason that ungetc() is only required to support one byte of push-back: So that it can be conveniently buffered in memory.)
*scanf() is a poor choice to read uncertain inputs, for the reason described above and several other shortcomings when it comes to recovery-from-error. Generally speaking, if there is any chance that the input might not conform to the expected format, read input into an internal memory buffer first and then parse it from there.

Just read and store one character too many, and test for that.
char str[34]; // 33 characters + NUL terminator
int scanCount = scanf("%33s", str);
if (scanCount > 0 && strlen(str) > 32)
{
scanCount = 0;
}
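The same "read one character too many" trick as a reusable function reading from any `FILE *`. A sketch only: the name `scan_word32` is hypothetical, and, as the other answers point out, the over-long word has already been consumed when it is rejected, because scanf cannot un-read it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one whitespace-delimited word of at most 32
   characters into out. Returns 1 on success, 0 if the word was longer
   (its first 33 characters are consumed), EOF at end of input. */
int scan_word32(FILE *in, char out[33])
{
    char tmp[34];                       /* 33 characters + NUL terminator */
    int n = fscanf(in, "%33s", tmp);
    if (n != 1)
        return n;
    size_t len = strlen(tmp);
    if (len > 32)
        return 0;                       /* too long for the caller's buffer */
    memcpy(out, tmp, len + 1);
    return 1;
}
```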

When scanning, a stream such as stdin is only guaranteed to "put back" 1 char. So scanning 32 or 33 chars and then undoing is not possible.
If your input supports ftell() and fseek() (typically available when stdin is redirected from a file), code could:
long pos = ftell(input);
char str[32+1];
int scanCount;
scanCount = fscanf(input, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
fseek(input, pos, SEEK_SET);
scanCount = fscanf(input, some_new_format, ....);
}
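A self-contained sketch of the ftell()/fseek() rollback on a seekable stream. The name `scan_word32_or_rewind` is hypothetical, and it reads up to 33 characters (`%33s` with a `> 32` test) so that a word of exactly 32 characters is still accepted, a slight variation on the `%32s` / `>= 32` check above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: remember the position, try to read a word of at most
   32 characters, and seek back if it was longer so another parse can be
   attempted. Returns 1 on success, 0 if too long (stream restored),
   -1 on error, EOF, or a non-seekable stream. */
int scan_word32_or_rewind(FILE *in, char out[33])
{
    long pos = ftell(in);
    if (pos < 0)
        return -1;                      /* stream is not seekable */

    char tmp[34];                       /* room for 33 chars to detect overflow */
    if (fscanf(in, "%33s", tmp) != 1)
        return -1;
    size_t len = strlen(tmp);
    if (len > 32) {
        if (fseek(in, pos, SEEK_SET) != 0)
            return -1;
        return 0;                       /* rolled back to the saved position */
    }
    memcpy(out, tmp, len + 1);
    return 1;
}
```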
Otherwise, use fgets() to read a maximal line and then use sscanf():
char buf[1024];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
char str[32+1];
int scanCount;
scanCount = sscanf(buf, "%32s", str);
if (scanCount != 1 || strlen(str) >= 32) {
scanCount = sscanf(buf, some_new_format, ....);
}

Reading numbers

Huge thanks to everyone that answered. I have realised that I suck a lot at this; I will take every answer into consideration and hopefully I will manage to compile something that works.

Some remarks:
Allocating 500 MB just in case doesn't seem like a good idea. A better approach is to allocate a small amount of memory first and, if it's not enough, allocate twice as much, and so on (this works if you read the number character by character).
Important: right after every (re)allocation, you have to check whether your malloc call succeeded (i.e. what it returns is not NULL); otherwise you cannot go any further.
What is the first getchar() for?
Instead of using gets(), you could read the characters one by one until you encounter something that is not a digit, at which point you can assume that the number input has finished (that is the simplest way; obviously one can process user input differently).
Adding '\0' to something that was read with gets() is not needed, AFAIK (for something read character by character, it would make sense).
Last but not least, you should also take care of actually freeing the allocated memory (i.e. calling free() after you are done with num). Not doing so results in a memory leak.
(Update) printf("%c",num[0]); will only print the first character of the string num. If you want to print out the whole string, you should call printf("%s",num);
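A sketch combining the remarks above: read digits one at a time, start small, double the buffer when full, check every (re)allocation, and NUL-terminate. The function name `read_digits` and the 16-byte starting size are hypothetical choices; the first non-digit is pushed back with `ungetc()` so the caller can still see it.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated string of the
   leading digits of `in` (caller frees), or NULL on allocation failure. */
char *read_digits(FILE *in)
{
    size_t cap = 16, len = 0;
    char *num = malloc(cap);
    if (num == NULL)
        return NULL;

    int c;
    while ((c = getc(in)) != EOF && isdigit(c)) {
        if (len + 1 == cap) {           /* keep one byte for the '\0' */
            char *tmp = realloc(num, cap * 2);
            if (tmp == NULL) {          /* checked right after reallocating */
                free(num);
                return NULL;
            }
            num = tmp;
            cap *= 2;
        }
        num[len++] = (char)c;
    }
    if (c != EOF)
        ungetc(c, in);                  /* leave the non-digit for the caller */
    num[len] = '\0';
    return num;
}
```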

Well, there are quite a few problems with this code, none of which necessarily have to do with reading big numbers. But you're still learning, so here we go, in the order in which they appear in the code:
(Not really an error, but also not recommended): Casting the result of malloc is unnecessary, as outlined in this answer.
As the other answer states: allocating 500MB is probably way overkill, if you really need this much you can always add more, but you may want to start out with less (5KB, for example).
You should add a new-line at the end of your puts, or the output may end up in places where you don't expect it (i.e. much later).
(This is an error) Don't ever use gets: this page explains why.
You're checking if(num == NULL) after you've already used it (presumably to check if gets failed, but it will return NULL on failure, the num pointer itself won't be changed). You want to move this check up to right after the malloc.
After your NULL-check for num your code happily continues after the if, you'll want to add a return or exit inside the if's body.
There is a syntax error with your very last printf: you forgot the closing ].
When you decide to use fgets to get the user input, you can check whether the last character in the string is a new-line. If it isn't, fgets couldn't fit the entire input into the string, so you will need to call fgets some more. When the last character is a new-line, you might want to remove it with num[len - 1] = '\0'; where len is strlen(num) (this trick isn't necessary for gets, but it is for fgets).
Instead of increasing the size of your buffer by just 1, you should grow it by more than that: a commonly used strategy is to double the current size. malloc, calloc and realloc are fairly expensive calls performance-wise (they may end up making system calls), and since you don't seem too fussed about memory usage, keeping these calls to a minimum can save a lot of time.
An example of these recommendations:
size_t bufferSize = 5000, // start with 5K
       inputLength = 0;
char * buffer = malloc(bufferSize);
if(buffer == NULL){
    perror("No memory!");
    exit(EXIT_FAILURE);
}
// read into the unused tail of the buffer so earlier chunks are kept
while(fgets(buffer + inputLength, (int)(bufferSize - inputLength), stdin) != NULL){
    inputLength += strlen(buffer + inputLength);
    if(buffer[inputLength - 1] != '\n'){ // last character was not a new-line
        bufferSize *= 2; // double the buffer in size
        char * tmp = realloc(buffer, bufferSize);
        if(tmp == NULL){
            perror("No memory!");
            free(buffer);
            exit(EXIT_FAILURE);
        }
        // reallocating didn't fail: continue with grown buffer
        buffer = tmp;
    }else{
        break; // last character was a new-line: we're done reading
    }
}
Beware of bugs in the above code; I have only proved it correct, not tried it.
Finally, instead of re-inventing the wheel, you may want to take a look at the GNU Multiple Precision library which is specifically made for handling big numbers. If anything you can use it for inspiration.

This is how you could go about reading some really big numbers in. I have decided on your behalf that a 127 digit number is really big.
#include <stdio.h>
#include <stdlib.h>
#define BUFSIZE 128
int main(void)
{
    char *num1 = malloc(BUFSIZE * sizeof (char));
    if(num1==NULL){
        puts("Not enough memory");
        return 1;
    }
    char *num2 = malloc(BUFSIZE * sizeof (char));
    if(num2==NULL){
        puts("Not enough memory");
        free(num1); /* don't leak the first buffer */
        return 1;
    }
    puts("Please enter your first number");
    if (fgets(num1, BUFSIZE, stdin) == NULL)
        num1[0] = '\0'; /* EOF or read error: treat as empty input */
    puts("Please enter your second number");
    if (fgets(num2, BUFSIZE, stdin) == NULL)
        num2[0] = '\0';
    printf("Your first number is: %s\n", num1);
    printf("Your second number is: %s\n", num2);
    free(num1);
    free(num2);
    return 0;
}
This should serve as a starting point for you.

memcpy vs strcat

Seems to be a basic question, but I would rather ask it to clear this up than spend many more days on it. I am trying to copy data I receive (recv call) into a buffer, which will then be pushed to a file. I want to use memcpy to continuously append data to the buffer until the buffer is not big enough to hold more data, at which point I use realloc. The code is below.
int vl_packetSize = PAD_SIZE + (int)p_size - 1; // PAD_SIZE is the size of char array sent
//p_size is the size of data to be recv. Same data size is used by send
int p_currentSize = MAX_PROTO_BUFFER_SIZE;
int vl_newPacketSize = p_currentSize;
char *vl_data = (char *)malloc(vl_packetSize);
memset((char *)vl_data,'\0',vl_packetSize);
/* Allocate memory to the buffer */
vlBuffer = (char *)malloc(p_currentSize);
memset((char *)vlBuffer,'\0',p_currentSize);
char *vlBufferCopy = vlBuffer;
if(vlBuffer==NULL)
return ERR_NO_MEM;
/* The sender first sends a padding data of size PAD_SIZE followed by actual data. I want to ignore the pad hence do vl_data+PAD_SIZE on memcpy */
if((p_currentSize - vl_llLen) < (vl_packetSize-PAD_SIZE)){
vl_newPacketSize +=vl_newPacketSize;
char *vlTempBuffer = (char *)realloc(vlBufferCopy,(size_t)vl_newPacketSize);
if(vlTempBuffer == NULL){
if(debug > 1)
fprintf(stdout,"Realloc failed:%s...Control Thread\n\n",fn_strerror_r(errno,err_buff));
free((void *)vlBufferCopy);
free((void *)vl_data);
return ERR_NO_MEM;
}
vlBufferCopy = vlTempBuffer;
vl_bytesIns = vl_llLen;
vl_llLen = 0;
vlBuffer = vlBufferCopy+vl_bytesIns;
fprintf(stdout,"Buffer val after realloc:%s\n\n",vlBufferCopy);
}
memcpy(vlBuffer,vl_data+PAD_SIZE,vl_packetSize-PAD_SIZE);
/*
fprintf(stdout,"Buffer val before increment:%s\n\n",vlBuffer);
fprintf(stdout,"vl_data length:%d\n\n",strlen(vl_data+PAD_SIZE));
fprintf(stdout,"vlBuffer length:%d\n\n",strlen(vlBuffer));
*/
vlBuffer+=(vl_packetSize-PAD_SIZE);
vl_llLen += (vl_packetSize-PAD_SIZE);
vl_ifNotFlush = 1;
//fprintf(stdout,"Buffer val just before realloc:%s\n\n",vlBufferCopy);
}
Problem: Whenever I fputs the data into the file later on, only the first data received/added to the buffer gets into the file.
Also, when I print the value of vlBufferCopy (which points to the first location of the data returned by malloc or realloc), I get the same result.
If I decrease the size by 1, I see the entire data in the file, but it somehow misses the new-line character, and hence the data is not inserted in the proper format in the file.
I know it is because of the trailing '\0', but somehow reducing the size by 1
(vlBuffer+=(vl_packetSize-PAD_SIZE-1);)
misses the new-line character. fputs does not write the trailing null character.
Please let me know what I am missing here, either in the checks or in the logic.
(Note: I tried using strcat:
strcat(vlBuffer,vl_data+PAD_SIZE);
but I wanted to use memcpy, as it is faster and can be used for any kind of buffer, not only character pointers.)
Thanks

strcat and memcpy are very different functions.
I suggest you read the documentation of each.
Mainly, there are two differences:
1. memcpy copies data where you tell it to. strcat finds the end of the string, and copies there.
2. memcpy copies the number of bytes you request. strcat copies until the terminating null.
If you're dealing with packets of arbitrary contents, you have no use for strcat, or other string functions.
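A small sketch of both differences. The helper name `append_bytes` is hypothetical: it appends exactly `n` bytes at a position the caller tracks (the `memcpy` way), so embedded '\0' bytes survive, whereas `strcat` searches for the destination's '\0' and stops copying at the source's first '\0'.

```c
#include <string.h>

/* Hypothetical helper: append n bytes at dst + used; the caller guarantees
   room for used + n bytes. Returns the new used length. */
size_t append_bytes(char *dst, size_t used, const char *src, size_t n)
{
    memcpy(dst + used, src, n);     /* copies exactly n bytes, '\0' included */
    return used + n;
}
```

Appending `"ab\0cd"` (5 bytes) and then `"ef"` gives the 7 bytes `a b \0 c d e f`; `strcat(b, "c\0d")` on `"ab"` only produces `"abc"`.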

You need to write to the file in a binary-safe way: use fwrite instead of fputs. fwrite copies the whole buffer, even if there's a zero byte in the middle of it.
const char *mybuff= "Test1\0Test2";
const size_t mybuff_len = 11;
size_t copied = fwrite(mybuff, 1, mybuff_len, output_file); /* element size 1, so copied is a byte count */
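A round-trip sketch showing that the bytes after the embedded '\0' really reach the file. The helper name `write_binary` is hypothetical; passing an element size of 1 makes `fwrite`'s return value a byte count that can be compared directly with the length.

```c
#include <stdio.h>

/* Hypothetical helper: write len raw bytes; returns how many were written. */
size_t write_binary(FILE *out, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, out);   /* size 1, count len => byte count */
}
```

By contrast, `fputs("Test1\0Test2", f)` writes only the 5 bytes before the first '\0'.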

Populating static string buffer in C via snprintf

I have some buffer and known size
#define BUFFER_SIZE (1024*1024)
char buffer[BUFFER_SIZE];
I must populate this buffer with some complex string.
int populate_string(char *buffer) {
char *tbuffer = buffer;
size_t tsize = BUFFER_SIZE;
int rv;
rv = snprintf(tbuffer, tsize, "foobar %s %d %s %D", ...);
if (rv < 0) {
printf("snprintf() error");
return -1;
} else if (rv >= tsize) {
printf("overflow, increase buffer size");
return -1;
} else {
tsize -= rv;
tbuffer += rv;
}
// repeat snprintf's until string is fully populated
return 0;
}
So, I have three questions:
Is this the best way to dynamically populate a static string?
Is my way of populating the string safe?
How can I reduce the number of lines? These return value checks take a lot of space, especially if there are lots of snprintf calls.

It depends on what you do with these strings afterwards :) An obvious alternative is to use linked lists instead.
Yes, it's safe.
Often there is no need to distinguish a snprintf error from an overflow, so you can use a single if() check, e.g. if (rv < 0 || (size_t)rv >= tsize).
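A sketch of how the single check can live in one helper, so each formatting step in `populate_string` shrinks to one line. The name `append_fmt` and its `pos` bookkeeping are hypothetical; it assumes `*pos < size` on entry. On failure the buffer holds a truncated but still NUL-terminated string.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append formatted text at buf + *pos within size bytes.
   Returns 0 on success, -1 if vsnprintf failed or the text did not fit. */
int append_fmt(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int rv = vsnprintf(buf + *pos, size - *pos, fmt, args);
    va_end(args);

    if (rv < 0 || (size_t)rv >= size - *pos)   /* error or truncation */
        return -1;
    *pos += (size_t)rv;                        /* advance past the new text */
    return 0;
}
```

A caller can then write `if (append_fmt(buffer, BUFFER_SIZE, &pos, "foobar %s", s) < 0) return -1;` for each piece.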

open_memstream and fmemopen might be of interest for you:
http://linux.die.net/man/3/open_memstream
You can populate your buffer by opening a file stream to the buffer, i.e.:
FILE *f=fmemopen(buffer,BUFFER_SIZE,"w");
fwrite(..., f); /* the stream is fwrite's last argument */
Writing to this stream will yield an error (if you disable buffering) when you hit the ceiling of "buffer".
Also, you can use open_memstream to create a stream on a memory which automatically resizes to hold what you write to it.
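A minimal sketch of the `open_memstream` approach, assuming a POSIX.1-2008 system (for example glibc). The function `build_report` and its `"item %d\n"` lines are hypothetical. After `fclose()`, `*out` points to a `malloc`'d, NUL-terminated string that the caller must `free()`.

```c
#define _POSIX_C_SOURCE 200809L   /* open_memstream() is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write `count` lines into a self-growing memory stream.
   Returns the length of the resulting string, or -1 on error. */
long build_report(char **out, int count)
{
    size_t size = 0;
    FILE *f = open_memstream(out, &size);
    if (f == NULL)
        return -1;
    for (int i = 0; i < count; i++)
        fprintf(f, "item %d\n", i);     /* no size bookkeeping needed */
    if (fclose(f) != 0)                 /* fclose finalizes *out and size */
        return -1;
    return (long)size;
}
```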

It should be safe to use snprintf, assuming that the buffer length you are using is accurate. If the buffer length decreases, you would run into problems. Instead of using tsize to store the length of the buffer, have the caller pass in the buffer length as a parameter. That should make your function re-usable for different buffer sizes. You would still have to trust that the value provided by the caller is accurate, but I suppose you can't completely error-proof the function.
If you want to reduce your line count, combine multiple snprintf calls into one. That should reduce the number of error-checking blocks that are required. The drawback is you might run out of buffer midway through a string, but I don't think your current code protects against that either. To do that, you'd have to print to a temporary, internal string, measure that string's length, and then copy it to the output buffer if and only if there is enough room left.
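A sketch of the "append only if it fits" idea. Instead of printing to a temporary string and copying it, this variant measures first with `vsnprintf(NULL, 0, ...)` and writes only when the whole piece fits, which avoids the extra copy; that substitution is a design choice, not what the answer describes. The name `append_whole` is hypothetical, and it assumes `*pos < size`.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append formatted text only if all of it fits.
   Returns 0 on success, -1 otherwise (buffer and *pos unchanged). */
int append_whole(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
    va_list args, copy;
    va_start(args, fmt);
    va_copy(copy, args);
    int need = vsnprintf(NULL, 0, fmt, copy);   /* measure only */
    va_end(copy);

    int rc = -1;
    if (need >= 0 && (size_t)need < size - *pos) {
        vsnprintf(buf + *pos, size - *pos, fmt, args);
        *pos += (size_t)need;
        rc = 0;
    }
    va_end(args);
    return rc;
}
```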

Develop Reference
