C string definition of unknown length - c

I need to store data (Of previously unknown format/size) inside a string for processing later (To be stored in an XML file)
How do I do this?
As you can see, the code below will generate a segfault.
char * type;
char * output;
for (i=0; i< 10; i++){
if(strcmp(GTK_OBJECT_TYPE_NAME(g_hash_table_lookup(widgetbuffer,allocate[i])), "GtkAdjustment") == 0){
type = "spin";
sprintf(output, "%f", gtk_adjustment_get_value(GTK_ADJUSTMENT(g_hash_table_lookup(widgetbuffer,allocate[i]))));
}else if(strcmp(GTK_OBJECT_TYPE_NAME(g_hash_table_lookup(widgetbuffer,allocate[i])), "GtkCheckButton") == 0){
type = "check";
sprintf(output, "%d", gtk_toggle_button_get_active(GTK_TOGGLE_BUTTON(g_hash_table_lookup(widgetbuffer,allocate[i]))));
}else if(strcmp(GTK_OBJECT_TYPE_NAME(g_hash_table_lookup(widgetbuffer,allocate[i])), "GSList") == 0){
type = "radio"; // Loop through grouped buttons and find active one
sprintf(output, "%d", g_slist_position(g_hash_table_lookup(widgetbuffer,allocate[i]),
g_slist_find_custom(g_hash_table_lookup(widgetbuffer,allocate[i]),
NULL, (GCompareFunc) searchRadio)));
}else if(strcmp(GTK_OBJECT_TYPE_NAME(g_hash_table_lookup(widgetbuffer,allocate[i])), "GtkComboBox") == 0){
type = "combo";
sprintf(output, "%d", gtk_combo_box_get_active(GTK_COMBO_BOX(g_hash_table_lookup(widgetbuffer,allocate[i]))));
}else if(strcmp(GTK_OBJECT_TYPE_NAME(g_hash_table_lookup(widgetbuffer,allocate[i])), "GtkEntry") == 0){
type = "entry";
output = (char *) gtk_entry_get_text(GTK_ENTRY(g_hash_table_lookup(widgetbuffer,allocate[i])));
}
[...]

In general, to "store data of an unknown size" in a C string, you have two choices:
Allocate a buffer large enough to hold any expected size (which means you would truncate the data if it exceeds that size), or
Dynamically allocate a buffer (using malloc()) that is large enough to hold the data. Don't forget to free() it, too.
Your code is using an uninitialised pointer output and that is why it is segfaulting. You will need to do one of the above.

If your platform has snprintf or similar (most do), then you want something like this:
int n = snprintf( NULL, 0, "%s is %d", somestring, someinteger );
char * p = malloc( n + 1 );
sprintf( p, "%s is %d", somestring, someinteger );
The first call to snprintf returns how many chars would be needed to store the formatted output, but doesn't actually do any formatting. Then you allocate the space needed and do the real formatting.

You are not allocating output and that is precisely why you are getting segmentation faults. As it is in the code provided, output has not been initialized. Your compiler should warn you of this.
If you know a safe maximum size, you can simply allocate it on the stack:
char output[512];
...if the maximum size was 512 bytes. Otherwise, you can look at malloc to allocate memory off the heap.

In addition to pre-allocate a big enough buffer or to truncate the output to size:
Pre-process the generation (no writes, just count the lengthes) and allocate a suitable buffer for the second (really write) pass
Write to 'unlimited' storage: a file

Related

Declare string in C without giving size

I want to concatenate in a string multiple sentences. At the moment my buffer is with fixed size 100, but I do not know the total count of the sentences to concatenate and this size can be not enough in the future. How can I define a string without defining its size?
char buffer[100];
int offset = sprintf (buffer, "%d plus %d is %d", 5, 3, 5+3);
offset += sprintf (buffer + offset, " and %d minus %d is %d", 6, 3, 6-3);
offset += sprintf (buffer + offset, " even more");
printf ("[%s]",buffer);
This is a fundamental aspect of C. C never does automatic management of dynamically-constructed strings for you — this is always your responsibility.
Here is an outline of four different techniques you might use. You can ask additional questions about any of these that aren't clear.
Run through your string-construction process twice. Make one pass to collect the lengths of all the substrings, then call malloc to allocate a buffer of the computed size, then make a second pass to actually construct your string.
Allocate a smallish (or empty) initial buffer with malloc, and then, each time you're about to append a new substring to it, check the buffer's size, and if necessary grow it bigger using realloc. (In this case I always use three variables: (1) pointer to buffer, (2) allocated size of buffer, (3) number of characters currently in buffer. The goal is to always keep (2) ≥ (3).)
Allocate a dynamically-growing "memstream" and use fprintf or the like to "print" to it. This is an ideal technique, although memstreams are not standard and not supported on all platforms, and dynamically-allocating memstreams are even more exotic and less common. (It's possible to write your own, but it's a lot of work.) You can open a fixed-size memstream using fmemopen (although this is not what you want), and you can open the holy grail, a dynamically-allocating memstream (which is what you want) using open_memstream, if you have it. Both are documented on this man page. (This technique is analogous to stringstream in C++.)
The "better to beg forgiveness than ask permission" technique. You can allocate a buffer which you're pretty sure is amply big enough, then blindly stuff all your substrings into it, then at the very end call strlen on it and, if you guessed wrong and the string is longer than the buffer you allocated, print a big scary noisy error message and abort. This is a blunt and risky technique, not one you'd use in a production program. It's possible that when you overflow the buffer, you damage things in a way that causes the program to crash before it has a chance to perform its belated check-and-maybe-exit step at all. If you used this technique at all, it would be considerably "safer" (that is, considerably less likely to prematurely crash before the check) if you allocated the buffer using malloc, than if you declared it as an ordinary, fixed-size array (whether static or local).
Personally, I've used all four of these. Out in the rest of the world, numbers 1 and 2 are both in common use by just about everybody. Comparing them: number 1 is simple and a bit easier but has an uncomfortable amount of code replication (and might therefore be brittle if new strings are added later); number 2 is more robust but obviously requires you to be comfortable with the way realloc works (and this technique can be less than robust in its own way, if you've got any auxiliary pointers into your buffer which need to be ponderously relocated each time realloc is called).
Number 3 is an "exotic" technique: theoretically almost ideal, but definitely more sophisticated and necessitating some extra support, since nothing like open_memstream is standard.
And number 4 is obviously a risky and inherently not reliable technique, which you'd use — if at all — only in throwaway or prototype code, never in production.
You can use the snprintf function with NULL for the first argument and 0 for the second get the size that a formatted string would be. You can then allocate the space dynamically and call snprintf again to actually build the string.
char *buffer = NULL;
int len, offset = 0;
len = snprintf (NULL, 0, "%d plus %d is %d", 5, 3, 5+3);
buffer = realloc(buffer, offset + len + 1);
offset = sprintf (buffer + offset, "%d plus %d is %d", 5, 3, 5+3);
len = snprintf (NULL, 0, " and %d minus %d is %d", 6, 3, 6-3);
buffer = realloc(buffer, offset + len + 1);
offset += sprintf (buffer + offset, " and %d minus %d is %d", 6, 3, 6-3);
len = snprintf (NULL, 0, " even more");
buffer = realloc(buffer, offset + len + 1);
offset += sprintf (buffer + offset, " even more");
printf ("[%s]",buffer);
Note that this implementation omits checking on realloc and snprintf for brevity. It also repeats the format string and arguments. The following function addresses these shortcomings:
int append_buffer(char **buffer, int *offset, const char *format, ...)
{
va_list args;
int len;
va_start(args, format);
len = vsnprintf(NULL, 0, format, args);
if (len < 0) {
perror("vsnprintf failed");
return 0;
}
va_end(args);
char *tmp = realloc(*buffer, *offset + len + 1);
if (!tmp) {
perror("realloc failed");
return 0;
}
*buffer = tmp;
va_start(args, format);
*offset = vsprintf(*buffer + *offset, format, args);
if (len < 0) {
perror("vsnprintf failed");
return 0;
}
va_end(args);
return 1;
}
Which you can then call like this:
char *buffer = NULL;
int offset = 0;
int rval;
rval = append_buffer(&buffer, &offset, "%d plus %d is %d", 5, 3, 5+3);
if (!rval) return 1;
rval = append_buffer(&buffer, &offset, " and %d minus %d is %d", 6, 3, 6-3);
if (!rval) return 1;
rval = append_buffer(&buffer, &offset, " even more");
if (!rval) return 1;
printf ("[%s]",buffer);
free(buffer);

Array Not Filling Properly

I am trying to deconstruct a document into its respective paragraphs, and input each paragraphs, as a string, into an array. However, each time a new value is added, it overwrites all previous values in the array. The last "paragraph" read (as denoted by newline) is the value of each non-null value of the array.
Here is the code:
char buffer[MAX_SIZE];
char **paragraphs = (char**)malloc(MAX_SIZE * sizeof(char*));
int pp = 0;
int i;
FILE *doc;
doc = fopen(argv[1], "r+");
assert(doc);
while((i = fgets(buffer, sizeof(buffer), doc) != NULL)) {
if(strncmp(buffer, "\n", sizeof(buffer))) {
paragraphs[pp++] = (char*)buffer;
}
}
printf("pp: %d\n", pp);
for(i = 0; i < MAX_SIZE && paragraphs[i] != NULL; i++) {
printf("paragraphs[%d]: %s", i, paragraphs[i]);
}
The output I receive is:
pp: 4
paragraphs[0]: paragraph four
paragraphs[1]: paragraph four
paragraphs[2]: paragraph four
paragraphs[3]: paragraph four
when the program is run as follows: ./prog.out doc.txt, where doc.txt is:
paragraph one
paragraph two
paragraph three
paragraph four
The behavior of the program is otherwise desired. The paragraph count works properly, ignoring the line that contains ONLY the newline character (line 4).
I assume the problem occurs in the while loop, however am unsure how to remedy the problem.
Your solution is pretty sound. Your Paragraph array is supposed to hold each paragraph, and since each paragraph element is just a small 4 bytes pointer you can afford to define a reasonable max number of them. However, since this max number is a constant, it is of little use to allocate the array dynamically.
The only meaningful use of dynamic allocation would be to read the whole text once to count the actual number of paragraphs, allocate the array accordingly and re-read the whole file a second time, but I doubt this is worth the effort.
The downside of using fixed-size paragraph array is that you must stop filling it once you reach the maximal number of elements.
You can then re-allocate a bigger array if you absolutely want to be able to process the whole Bible, but for an educational exercise I think it's reasonable to just stop recording paragraphs (thus producing a code that can store and count paragraphs up to a maximal number).
The real trouble with your code is, you don't store the paragraph contents anywhere. When you read the actual lines, it's always inside the same buffer, so each paragraph will point to the same string, which will contain the last paragraph read.
The solution is to make a unique copy of the buffer and have the current paragraph point to that.
C being already messy enough as it is, I suggest using the strdup() function, which duplicates a string (basically computing string length, allocating sufficient memory, copying the string into it and returning the new block of memory holding the new copy). You just need to remember to free this new copy once you're done using it (in your case at the end of your program).
This is not the most time-efficient solution, since each string will require a strlen and a malloc performed internally by strdump while you could have pre-allocated a big buffer for all paragraphs, but it is certainly simpler and probably more memory-efficient (only the minimal amount of memory will be allocated for each string, though each malloc consumes a few extra bytes for internal allocator housekeeping).
The bloody awkward fgets also stores the trailing \n at the end of the line, so you'll probably want to get rid of that.
Your last display loop would be simpler, more robust and more efficient if you simply used pp as a limit, instead of checking uninitialized paragraphs.
Lastly, you'd better define two different constants for max line size and max number of paragraphs. Using the same value for both makes little sense, unless you're processing perfectly square texts :).
#define MAX_LINE_SIZE 82 // max nr of characters in a line (including trailing \n and \0)
#define MAX_PARAGRAPHS 100 // max number of paragraphs in a file
void main (void)
{
char buffer[MAX_LINE_SIZE];
char * paragraphs[MAX_PARAGRAPHS];
int pp = 0;
int i;
FILE *doc;
doc = fopen(argv[1], "r+");
assert(doc != NULL);
while((fgets(buffer, sizeof(buffer), doc) != NULL)) {
if (pp != MAX_PARAGRAPHS // make sure we don't overflow our paragraphs array
&& strcmp(buffer, "\n")) {
// fgets awkwardly collects the ending \n, so get rid of it
if (buffer[strlen(buffer)-1] == '\n') buffer[strlen(buffer)-1] = '\0';
// current paragraph references a unique copy of the actual text
paragraphs[pp++] = strdup (buffer);
}
}
printf("pp: %d\n", pp);
for(i = 0; i != pp; i++) {
printf("paragraphs[%d]: %s", i, paragraphs[i]);
free(paragraphs[i]); // release memory allocated by strdup
}
}
What is the proper way to allocate the necessary memory? Is the malloc on line 2 not enough?
No, you need to allocate memory for the 2D array of strings you created. The following will not work as coded.
char **paragraphs = (char**)malloc(MAX_SIZE * sizeof(char*));
If you have: (for a simple explanation)
char **array = {0}; //array of C strings, before memory is allocation
Then you can create memory for it like this:
int main(void)
{
int numStrings = 10;// for example, change as necessary
int maxLen = MAX_SIZE; //for example, change as necessary
char **array {0};
array = allocMemory(array, numStrings, maxLen);
//use the array, then free it
freeMemory(array, numStrings);
return 0;
}
char ** allocMemory(char ** a, int numStrings, int maxStrLen)
{
int i;
a = calloc(sizeof(char*)*(numStrings+1), sizeof(char*));
for(i=0;i<numStrings; i++)
{
a[i] = calloc(sizeof(char)*maxStrLen + 1, sizeof(char));
}
return a;
}
void freeMemory(char ** a, int numStrings)
{
int i;
for(i=0;i<numStrings; i++)
if(a[i]) free(a[i]);
free(a);
}
Note: you can determine the number of lines in a file several ways, One way for example, by FILE *fp = fopen(filepath, "r");, then calling ret = fgets(lineBuf, lineLen, fp) in a loop until ret == EOF, keeping count of an index value for each loop. Then fclose(). (which you did not do either) This necessary step is not included in the code example above, but you can add it if that is the approach you want to use.
Once you have memory allocated, Change the following in your code:
paragraphs[pp++] = (char*)buffer;
To:
strcpy(paragraphs[pp++], buffer);//no need to cast buffer, it is already char *
Also, do not forget to call fclose() when you are finished with the open file.

Estimate size of formatted snprintf() string?

I'm considering writing a function to estimate at least the full length of a formatted string coming from the sprintf(), snprintf() functions.
My approach was to parse the format string to find the various %s, %d, %f, %p args, creating a running sum of strlen()s, itoa()s, and strlen(format_string) to get something guaranteed to be big enough to allocate a proper buffer for snprintf().
I'm aware the following works, but it takes 10X as long, as all the printf() functions are very flexible, but very slow because if it.
char c;
int required_buffer_size = snprintf(&c, 1, "format string", args...);
Has this already been done ? - via the suggested approach, or some other reasonably efficient approach - IE: 5-50X faster than sprintf() variants?
Allocate a big enough buffer first and check if it was long enough. If it wasn't reallocate and call a second time.
int len = 200; /* Any number well chosen for the application to cover most cases */
int need;
char *buff = NULL;
do {
need = len+1;
buff = realloc(buff, need); /* I don't care for return value NULL */
len = snprintf(buff, need, "...", ....);
/* Error check for ret < 0 */
} while(len > need);
/* buff = realloc(buff, len+1); shrink memory block */
By choosing your initial value correctly you will have only one call to snprintf() in most cases and the little bit of over-allocation shouldn't be critical. If you're in a so tight environment that this overallocation is critical, then you have already other problems with the expensive allocation and formating.
In any case, you could still call a realloc() afterwards to shrink the allocated buffer to the exact size.
If the first argument to snprintf is NULL, the return value is the number of characters that would have been written.

reading an int and storing it in a char * buffer in C

I have a file with integers. I want to write in a buffer those integers as chars (its ascii number). Because it is part of a bigger project please do not post different but please help me on that. What I especially need is chars to be stored in a buffer of type char *.
These are my declarations.
FILE *in;
long io_len = 1000;
char * buffer;
in=fopen("input.txt","a+");
buffer = malloc(io_len * sizeof(*buffer));
if(buffer == NULL){
perror("malloc");
exit(EXIT_FAILURE);
}
I am figuring out 2 sollutions.
If I write this one:
read_ret = read(in, buffer, io_len);
it reads from file in, io_len bytes and stores them in buffer. But it reads characters. So for example if I write 123 it will write to buffer 1,2,3 not the character with ascii number 123.
So I did this:
while((fscanf(in,"%d", &i))==1){
printf(": %d\n", i);
}
which reads the integers as I want. Now I am a little bit confused on how I will store them in buffer, as characters. I have tried this but it get me a segmentation fault.
while((fscanf(in,"%d", &i))==1){
printf(": %d\n", i);
buffer=(char) i;
printf("Character in Buffer:%s\n",buffer);
buffer++;
}
Have in mind that later in my file I am writing my buffer somewhere else, so whatever I will do I want the pointer to be at the start of my char array(if it makes sense what I am saying)
Your final code should at least give you a warning about assigning an integer to a pointer in the line buffer=(char) i;. It looks like you want to dereference the pointer.
You are also printing a string when it looks like you really only want to print a character at a time.
Your code should probably look like this:
int character_index = 0;
while((fscanf(in,"%d", &i))==1){
printf(": %d\n", i);
buffer[character_index]=(char) i;
printf("Character in Buffer:%c\n",buffer[character_index]);
character_index++;
}

memcpy vs strcat

Seems to be a basic question but I would rather ask this to clear up than spend many more days on this.I am trying to copy data in a buffer which I receive(recv call) which will be then pushed to a file. I want to use memcpy to continuously append/add data to the buffer until the size of buffer is not enough to hold more data where I than use the realloc. The code is as below.
int vl_packetSize = PAD_SIZE + (int)p_size - 1; // PAD_SIZE is the size of char array sent
//p_size is the size of data to be recv. Same data size is used by send
int p_currentSize = MAX_PROTO_BUFFER_SIZE;
int vl_newPacketSize = p_currentSize;
char *vl_data = (char *)malloc(vl_packetSize);
memset((char *)vl_data,'\0',vl_packetSize);
/* Allocate memory to the buffer */
vlBuffer = (char *)malloc(p_currentSize);
memset((char *)vlBuffer,'\0',p_currentSize);
char *vlBufferCopy = vlBuffer;
if(vlBuffer==NULL)
return ERR_NO_MEM;
/* The sender first sends a padding data of size PAD_SIZE followed by actual data. I want to ignore the pad hence do vl_data+PAD_SIZE on memcpy */
if((p_currentSize - vl_llLen) < (vl_packetSize-PAD_SIZE)){
vl_newPacketSize +=vl_newPacketSize;
char *vlTempBuffer = (char *)realloc(vlBufferCopy,(size_t)vl_newPacketSize);
if(vlTempBuffer == NULL){
if(debug > 1)
fprintf(stdout,"Realloc failed:%s...Control Thread\n\n",fn_strerror_r(errno,err_buff));
free((void *)vlBufferCopy);
free((void *)vl_data);
return ERR_NO_MEM;
}
vlBufferCopy = vlTempBuffer;
vl_bytesIns = vl_llLen;
vl_llLen = 0;
vlBuffer = vlBufferCopy+vl_bytesIns;
fprintf(stdout,"Buffer val after realloc:%s\n\n",vlBufferCopy);
}
memcpy(vlBuffer,vl_data+PAD_SIZE,vl_packetSize-PAD_SIZE);
/*
fprintf(stdout,"Buffer val before increment:%s\n\n",vlBuffer);
fprintf(stdout,"vl_data length:%d\n\n",strlen(vl_data+PAD_SIZE));
fprintf(stdout,"vlBuffer length:%d\n\n",strlen(vlBuffer));
*/
vlBuffer+=(vl_packetSize-PAD_SIZE);
vl_llLen += (vl_packetSize-PAD_SIZE);
vl_ifNotFlush = 1;
//fprintf(stdout,"Buffer val just before realloc:%s\n\n",vlBufferCopy);
}
Problem: Whan ever I fputs the data into the file later on. Only the first data recv/added to buffer is gets into the file.
Also when I print the value of vlBufferCopy(which points to first location of data returned by malloc or realloc) I get the same result.
If I decrease the size by 1, I see entire data in the file, but it somehow misses the new line character and hence the data is
not inserted in the proper format in the file.
I know it is because of trailing '\0' but some how reducing the size by 1
(vlBuffer+=(vl_packetSize-PAD_SIZE-1);)
misses the new line character. fputs while putting the data removes the trailing null character
Please let me know what I am missing here to check or in the logic
(Note: I tried using strcat:
strcat(vlBuffer,vl_data+PAD_SIZE);
but I wanted to use memcpy as it is faster and also it can be used for any kind of buffer and not only character pointer
Thanks
strcat and memcpy are very different functions.
I suggest you read the documentation of each.
Mainly, there are two differences:
1. memcpy copies data where you tell it to. strcat finds the end of the string, and copies there.
2. memcpy copies the number of bytes you request. strcat copies until the terminating null.
If you're dealing with packets of arbitrary contents, you have no use for strcat, or other string functions.
You need to write to the file in a binary-safe way. Check how to use fwrite instead of fputs. fwrite will copy all the buffer, even if there's a zero in the middle of it.
const char *mybuff= "Test1\0Test2";
const int mybuff_len = 11;
size_t copied = fwrite(mybuff, mybuff_len, 1, output_file);

Resources