Parse values from a text file in C [duplicate] - c

Closed 12 years ago.
Possible Duplicate:
Parsing text in C
Say I have written to a text file in this format:
key1/value1
key2/value2
akey/withavalue
anotherkey/withanothervalue
I have a linked list like:
struct Node
{
char *key;
char *value;
struct Node *next;
};
to hold the values. How would I read key1 and value1? I was thinking of reading the file line by line into a buffer and calling strtok(buffer, "/"). Would that work? What other ways could work, maybe a bit faster or less error-prone? Please include a code sample if you can!

Since your problem is a very good candidate for optimizing memory fragmentation, here is an implementation that uses some simple arcane magic to allocate all strings and the structure itself in a single piece of memory.
When destroying the node, you need only a single call to free(), to the node itself.
struct Node *list = NULL, **nextp = &list;
char buffer[1024];
while (fgets(buffer, sizeof buffer, file) != NULL) {
    struct Node *node;
    /* one block: the node itself, followed by a copy of the line */
    node = malloc(sizeof(struct Node) + strlen(buffer) + 1);
    if (node == NULL)
        break; /* out of memory: stop reading */
    node->key = strtok(strcpy((char*)(node+1), buffer), "/\r\n");
    node->value = strtok(NULL, "\r\n"); /* NULL if the line has no value part */
    node->next = NULL;
    *nextp = node;
    nextp = &node->next;
}
Explanation:
With 20 comments and one unexplained downvote, I think that the code needs some explanation, especially with regard to the tricks employed:
Building a linked list:
struct Node *list = NULL, **nextp = &list;
...
*nextp = node;
nextp = &node->next;
This is a trick to create a linked list iteratively in forward order without having to special-case the head of the list. It uses a pointer-to-pointer to the next node. First the nextp pointer points to the list head pointer; in the first iteration, the list head is set through this pointer-to-pointer and then nextp is moved to the next pointer of that node. Subsequent iterations fill the next pointer of the last node.
Single allocation:
node = malloc(sizeof(struct Node) + strlen(buffer) + 1);
node->key = ... strcpy((char*)(node+1), buffer) ...
We have to deal with three pointers: the node itself, the key string and the value string. This would usually require three separate allocations (malloc, calloc, strdup...), and consequently three separate releases (free). Instead, in this case, the sizes of the three elements are summed in sizeof(struct Node) + strlen(buffer) + 1 and passed to a single malloc call, which returns a single block of memory. The beginning of this block is assigned to node, the structure itself. The additional memory (strlen(buffer)+1 bytes) comes right after the node, and its address is obtained with pointer arithmetic as node+1. It is used to hold a copy of the entire string read from the file ("key/value\n").
Since malloc is called a single time for each node, a single allocation is made. It means that you don't need to call free(node->key) and free(node->value). In fact, doing so would not work at all: those pointers were not returned by malloc, so passing them to free is undefined behavior. Just a single free(node) will take care of deallocating the structure and both strings in one block.
Line parsing:
node->key = strtok(strcpy((char*)(node+1), buffer), "/\r\n");
node->value = strtok(NULL, "\r\n");
The first call to strtok returns a pointer to the beginning of the copied string itself. It looks for a '/' (and also for end-of-line markers) and breaks the string there with a NUL character. So "key/value\n" is broken into "key" and "value\n" with a NUL character in between, and a pointer to the first is returned and stored in node->key. The second call to strtok works on the remaining "value\n", strips the end-of-line marker and returns a pointer to "value", which is stored in node->value.
I hope this clears up all questions about the above solution... it is too much for a closed question. The complete test code is here.
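For readers who want something to compile, here is a minimal sketch of the same idea split into small functions. The helper names node_from_line, read_list and free_list are made up for this illustration and are not part of the answer above; the NULL checks are an addition as well.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Node {
    char *key;
    char *value;
    struct Node *next;
};

/* Hypothetical helper: build one node from a "key/value" line.
 * The node and a private copy of the line share one malloc block. */
struct Node *node_from_line(const char *line)
{
    struct Node *node = malloc(sizeof *node + strlen(line) + 1);
    if (node == NULL)
        return NULL;

    char *text = (char *)(node + 1);     /* first byte after the struct */
    strcpy(text, line);

    node->key = strtok(text, "/\r\n");   /* NULL for an empty line */
    node->value = strtok(NULL, "\r\n");  /* NULL if there is no value part */
    node->next = NULL;
    return node;
}

/* Hypothetical helper: read every line of `file` into a list, in file order. */
struct Node *read_list(FILE *file)
{
    struct Node *list = NULL, **nextp = &list;
    char buffer[1024];

    while (fgets(buffer, sizeof buffer, file) != NULL) {
        struct Node *node = node_from_line(buffer);
        if (node == NULL)
            break;                       /* out of memory: keep what we have */
        *nextp = node;                   /* fill the head or the last next pointer */
        nextp = &node->next;
    }
    return list;
}

/* One free() per node releases the struct and both strings. */
void free_list(struct Node *list)
{
    while (list != NULL) {
        struct Node *next = list->next;
        free(list);
        list = next;
    }
}
```

A caller would open the file, pass the FILE pointer to read_list, walk the result through the next pointers, and finally hand the head to free_list. Note that key or value can be NULL for malformed lines, so check them before use.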

You could also use fscanf to parse the input lines directly into keys and values:
char key[80], value[80];
/* Scansets with field widths: "%s/%s" would not work, because %s stops only at whitespace and swallows the '/'. */
if (fscanf(pFile, " %79[^/\n]/%79[^\n]", key, value) == 2) { /* one pair was read */ }
However, the drawback of this approach is that you need to allocate big enough buffers for the keys and values in advance (or use a temp buffer, then copy its value into its final destination, allocated with the right size). With strtok you can check the length of each key and value, then allocate a buffer of exactly the right size to store it.
Update: As commenters pointed out, another (potentially more serious) drawback of plain %s conversions is that they don't work for strings containing whitespace; the scansets above stop only at '/' and at the end of the line, so they avoid that problem.
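A common variation is to read each line with fgets and then split it with sscanf, which keeps one malformed line from disturbing the next read. The sketch below assumes 80-byte buffers as in the answer; the function name parse_pair is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: split "key/value" from `line` into two 80-byte buffers.
 * Returns 1 on success, 0 if the line does not have the expected shape. */
int parse_pair(const char *line, char key[80], char value[80])
{
    /* " "         : skip leading whitespace
     * %79[^/\n]   : up to 79 characters that are neither '/' nor newline
     * /           : the separator itself
     * %79[^\r\n]  : the rest of the line, spaces included */
    return sscanf(line, " %79[^/\n]/%79[^\r\n]", key, value) == 2;
}
```

The field width 79 leaves room for the terminating '\0' in each 80-byte buffer; longer keys or values are silently cut off, so this only suits input with known limits.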

If you don't mind having both the key and the value in one memory block (as two strings), you can just read the line into a buffer, find the '/', change it into '\0' and point the value pointer just past that '\0' character. Just remember to call free() only on the key pointer (this frees both the key and the value).
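A minimal sketch of that single-block approach, assuming the line has already been read (for example with fgets). The function name split_pair and its return convention are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `line` into one heap block and split it in place.
 * On success *key and *value point into the same block; free(*key) frees both. */
int split_pair(const char *line, char **key, char **value)
{
    size_t len = strcspn(line, "\r\n");  /* length without the line ending */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, line, len);
    copy[len] = '\0';

    char *slash = strchr(copy, '/');
    if (slash == NULL) {                 /* no separator on this line */
        free(copy);
        return 0;
    }
    *slash = '\0';                       /* terminate the key */
    *key = copy;
    *value = slash + 1;                  /* value starts right after it */
    return 1;
}
```

Because value points into the middle of the block, it must never be passed to free() on its own.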

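Loop over the lines:
1) find '/'
2) divide the key/value pair at the position of '/'
3) fill key and value
In code:
char *p;
if ((p = strchr(str, '/')) != NULL)
{
*p++ = 0; /* cut the line at '/' and step past it */
key = strdup(str); /* strdup is POSIX (ISO C only since C23); check for NULL */
value = strdup(p);
}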

Related

What is the intent of the pointer arithmetic in this code?

What does this line of code do, newnode -> item.string = (char *)newnode + sizeof(log_t);, in the below example?
int nodesize = sizeof(log_t) + strlen(data.string) + 1;
newnode = (log_t *)malloc(nodesize);
if (newnode == NULL) return -1;
// What is this line doing:
newnode -> item.string = (char *)newnode + sizeof(log_t);
strcpy(newnode -> item.string, data.string);
where newnode is the variable of type log_t (log_t has a variable named item of type data_t). data_t has a property char *string.
Is this code setting up the buffer size of string?
I'm assuming the memory for newnode was allocated with additional space for the string.
The result is the same, but I'd personally write that as newnode->item.string = (char*) &newnode[1];
That is, the storage space for string is immediately after the log_t object. This is sometimes done when a single chunk of memory has been allocated in advance, and objects and their members all point to memory in this chunk. It's been done in the past to cut down on the overhead of small memory allocations.
If log_t is 32 bytes, and the string is 16 bytes long (including the nul terminator!), you could malloc 48 bytes, point the string member at byte offset 32 of this memory allocation and copy the string there.
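To see that both spellings give the same address, here is a small sketch. The question does not show the real log_t and data_t, so the definitions below are stand-ins, and make_log is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in definitions; the real ones are not shown in the question. */
typedef struct { char *string; } data_t;
typedef struct { data_t item; } log_t;

/* Allocate a log_t with room for a copy of `text` directly behind it. */
log_t *make_log(const char *text)
{
    log_t *node = malloc(sizeof(log_t) + strlen(text) + 1);
    if (node == NULL)
        return NULL;

    /* &node[1] is the first byte past the struct,
     * the same address as (char *)node + sizeof(log_t). */
    node->item.string = (char *)&node[1];
    strcpy(node->item.string, text);
    return node;
}
```

A single free() on the returned pointer releases the struct and the string together.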
newnode -> item.string = (char *)newnode + sizeof(log_t);
The right side will take the pointer newnode, cast it to a character pointer then add the size of an object to it.
This gives a pointer n bytes beyond newnode, where n is the size of the log_t object.
It then places this pointer value into the string member of the item member of the object pointed to by newnode.
Without seeing the actual structures in use, it's a little hard to tell why this is being done but my best guess would be that it's to provide an efficient self-referential pointer.
In other words, the pointer within newnode will point to an actual part of the newnode itself, or part of a larger memory block that was allocated which contains a newnode object at the start of it. And, since you state that newnode is of the type log_t, it must be the latter case (a type cannot contain a copy of itself - it can contain a pointer to the type itself but not an actual copy).
An example of where this may come in handy is an object allocation where small sizes are satisfied completely by the object itself but larger ones are handled differently, such as with an int-to-string map entry:
typedef struct {
int id;
char *string;
} hdr_t;
typedef struct {
hdr_t hdr;
char smallBuff[31];
} entry_t;
In the case where you want to populate an entry_t variable with a 500-character string, you would allocate the string separately then just set up string to point to it.
However, for a string of thirty characters or less, you could just create it in smallBuff then set string to point to that instead (no separate memory needed). That would be done with:
entry_t *entry = malloc (sizeof (*entry)); // should check for NULL.
entry->hdr.id = 7;
entry->hdr.string = (char*)entry + sizeof (hdr_t);
strcpy (entry->hdr.string, "small string");
The third line in that sample above is very similar to what you have in your code.
Similarly (and probably more apropos to your case), you can allocate more memory than you need and use it:
typedef struct {
int id;
char *string;
} entry_t;
char *str = "small string";
entry_t *entry = malloc (sizeof (*entry) + strlen (str) + 1); // with extra bytes.
entry->id = 7;
entry->string = (char*)entry + sizeof (entry_t);
strcpy (entry->string, str);
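Since C99 the same layout can also be expressed with a flexible array member, which removes the self-referential pointer altogether. This is a different technique from the one in the answer above, shown only for comparison; the names fam_entry_t and make_entry are made up.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int id;
    char string[];   /* flexible array member: its storage follows the struct */
} fam_entry_t;

/* Hypothetical helper: one allocation holds the header and the text. */
fam_entry_t *make_entry(int id, const char *str)
{
    size_t len = strlen(str) + 1;                 /* include the '\0' */
    fam_entry_t *entry = malloc(sizeof *entry + len);
    if (entry == NULL)
        return NULL;
    entry->id = id;
    memcpy(entry->string, str, len);
    return entry;
}
```

The trade-off is that string can no longer be redirected to separately allocated memory, so this fits only when the text always lives next to the header.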
It's doing pointer arithmetic. In C, when you add a value to a pointer, it moves the pointer by that value * the size of the type pointed to by the pointer.
What this is doing is casting newnode to a char*, so that the pointer arithmetic is done as if newnode were a char*, where the size of the pointed-to type is 1. It then adds sizeof(log_t), the size of type log_t. Based on your description of log_t, it looks like it contains a single char*, so its size would be the size of a pointer, either 4 or 8 bytes, depending on the architecture.
So, this will set newnode->item.string to the address sizeof(log_t) bytes after the address that newnode contains.

Allocating array of struct with array inside

I want to read users input combined of strings and numbers, like this:
50:string one
25:string two blablabla
...
I don't know how many of the lines the input will have and I also don't know maximum length of the strings.
So I created
typedef struct line
{
int a;
char *string;
} line;
Then an array of this struct
line *Array = NULL;
Now I have a loop that reads one line and parses it into temporaryString and temporaryA. How can I realloc the Array to copy these into the array?
There are two valid options to do what you want:
1) use the realloc() function; like malloc and calloc it allocates memory, but it resizes an existing block, as the name suggests;
2) use a linked list;
The second is more complex than the first, but is just as valid. In your case, a simple linked list could have the form:
typedef struct line
{
int a;
char *string;
struct line *next;
//struct line *prev;
} line;
Every time you add a node, you have to allocate a struct line with your new data, set its next pointer to NULL, and set the next pointer of the previous last node to the node you just created. That's a simple way to do by hand what realloc does for an array. The prev pointer is needed only if you need to go from the last item to the first; if you don't need this feature, just save the root pointer (the first one) and use only the next pointer.
You could do something like this (pseudo code).
idx = 0;
while (input = read()) {
temporaryString, temporaryA = parse(input);
Array = realloc(Array, (idx + 1)*sizeof(line));
Array[idx].a = temporaryA;
Array[idx].string = malloc(strlen(temporaryString) + 1);
strcpy(Array[idx].string, temporaryString);
idx++;
}
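The pseudo code above could be turned into real C roughly as follows. This is a sketch under assumptions: the input format is "number:text" as in the question, and the helper name append_line and its return convention are invented here.

```c
#include <stdlib.h>
#include <string.h>

typedef struct line {
    int a;
    char *string;
} line;

/* Hypothetical helper: parse "50:string one" and append it to *array.
 * Returns 1 on success, 0 on a malformed line or allocation failure. */
int append_line(line **array, size_t *count, const char *input)
{
    char *end;
    long number = strtol(input, &end, 10);
    if (end == input || *end != ':')     /* no number, or no ':' after it */
        return 0;

    const char *text = end + 1;
    size_t len = strcspn(text, "\r\n");  /* drop the line ending, if any */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, len);
    copy[len] = '\0';

    /* Keep the old pointer until realloc succeeds, so nothing leaks. */
    line *grown = realloc(*array, (*count + 1) * sizeof **array);
    if (grown == NULL) {
        free(copy);
        return 0;
    }
    grown[*count].a = (int)number;
    grown[*count].string = copy;
    *array = grown;
    (*count)++;
    return 1;
}
```

Growing by one element per line is the simplest form; for large inputs, doubling the capacity each time it runs out would reduce the number of realloc calls. When done, free every string member and then the array itself.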

Making generic function to load items into linked list

OK beginning C programmer here. What I'm attempting to do is create a function that populates a linked list from a text file. What I've done so far is to use fgets() and strtok() to iterate through the text file and I'm trying to load the tokenized strings into a function to populate the linked list. First off, when I use strtok, how do I capture the tokenized strings into char arrays or strings? So far, I've tried something like this:
char catID[ID_LEN+1];
char drinkType[1];
char itemName[MAX_NAME_LEN + 1];
while((fgets(line, sizeof(line), menufile)) != NULL) {
token = strtok(line, "|");
strcpy(data,strdup(token));
addCatNode(menu, catID);
printf("%s\n", catID);
i++;
while(token){
if(token)
{
strcpy(drinkType,strdup(token));
addNodeItem(&menu, drinkType);
strcpy(itemName,strdup(token));
addNodeItem(&menu, itemName);
token = strtok(NULL, "|");
}
}
}
but somehow I don't think that's the right approach. And of course when I try and load the data into the addNodeItem() function, whose prototype I've written like this:
void addNodeItem(BCSType* menu, char *nodeitem);
and try and add the item using this notation:
category->nodeitem
the compiler tells me that there is no member named 'nodeitem' in the struct. Of course there isn't, but I'm trying to load the name from the strtok() part, so how do I get the addNodeItem() function to recognize the name that I'm trying to pass into it? Very confused here.
The "category" struct in the linked list looks like this:
typedef struct category
{
char categoryID[ID_LEN + 1];
char categoryName[MAX_NAME_LEN + 1];
char drinkType; /* (H)ot or (C)old. */
char categoryDescription[MAX_DESC_LEN + 1];
CategoryTypePtr nextCategory;
ItemTypePtr headItem;
unsigned numItems;
} CategoryType;
There are several issues here, but for starters, strcpy(drinkType, strdup(token)) doesn't make a lot of sense.
token is a pointer to a portion of your input string, with the '|' separator replaced by a NUL character ('\0').
strdup allocates strlen(token) + 1 bytes of memory and copies the contents of token into it. So far so good. It returns the address of the new memory, which you don't store anywhere, so you can never free() it. This is a leak, and you could eventually run out of memory.
strcpy(drinkType,strdup(token)) copies that new memory into the memory pointed to by drinkType. That's only 1 char, which has no room for anything but an empty string once the terminator is counted. Since there could be anything in the file you are loading, this is a buffer overflow waiting to happen.
And then it seems like the addNodeItem() function is missing something. You are passing a value that represents one of the possible values in the structure, with no way of specifying which one. You would have better luck creating a local copy of CategoryType, assigning all the information from the tokenizer, and then copying the whole thing into a new node.
outline of algorithm:
while lines:
CategoryType newCat
tokenize line
copy tokens into correct members of newCat, using `strncpy` (plus an explicit '\0') to ensure no overruns.
add newCat to linked List.
For the last step, you need to duplicate newCat before adding it, because it will be overwritten when you process the next line. You can either pass a copy to addNodeItem, or have addNodeItem make the copy. (There are other options, but these are probably most straightforward.)
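The tokenizing step of that outline might look like the sketch below. The question shows neither the constant values nor the field order in the file, so the sizes, the "id|type|name|description" layout, the reduced CategoryFields struct and the function names are all assumptions made for illustration.

```c
#include <string.h>

/* Assumed sizes; the question does not give the real values. */
#define ID_LEN 5
#define MAX_NAME_LEN 25
#define MAX_DESC_LEN 250

/* Reduced stand-in for CategoryType, holding only the parsed fields. */
typedef struct {
    char categoryID[ID_LEN + 1];
    char categoryName[MAX_NAME_LEN + 1];
    char drinkType;
    char categoryDescription[MAX_DESC_LEN + 1];
} CategoryFields;

/* Copy at most size-1 characters and always terminate. */
static void copy_field(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Parse "id|type|name|description" (assumed layout); `line` is modified.
 * Returns 1 on success, 0 if a field is missing. */
int parse_category(char *line, CategoryFields *out)
{
    char *id   = strtok(line, "|");
    char *type = strtok(NULL, "|");
    char *name = strtok(NULL, "|");
    char *desc = strtok(NULL, "\r\n");
    if (!id || !type || !name || !desc)
        return 0;

    copy_field(out->categoryID, sizeof out->categoryID, id);
    out->drinkType = type[0];            /* a single char, no string needed */
    copy_field(out->categoryName, sizeof out->categoryName, name);
    copy_field(out->categoryDescription, sizeof out->categoryDescription, desc);
    return 1;
}
```

The filled struct can then be copied by value into a freshly allocated list node, which gives each node its own data and avoids both the leak and the overflow described above.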

linked list segfault after first traversal

I am using the linked list example from junghans and try to make it work
with some server code. In the char array, I could insert a host (from inet_ntoa)
and update its age. So I can send one packet to the daemon, but then it crashes.
I tried setting next_pointer=start_pointer; because from what I read, it will be
a circular list. However, after receiving the second packet, strcpy crashes..
Questions:
How do I point to the beginning, if next_pointer=start_pointer doesn't do the trick?
Do I need to free, before overwrite a member of the char array?
struct x {
char name[20];
int age;
struct x *next_rec;
};
struct x *start_pointer;
struct x *next_pointer; // starting hosts, will be overwritten
char *names[] = {
"127.0.0.1",
"evil666",
"192.168.56.101",
""
};
int ages[] = {0,20,30,0};
// some other code
while (1) {
sleep(1);
us=time(NULL);
printf("%ld, Sleep a second\n", us);
buf[0] = 0x0;
current_host = 0x0;
memset (buf,0,sizeof buf);
if(recvfrom(s, buf, BUFLEN, 0, (struct sockaddr*)&si_other, &slen)==-1)
diep("recvfrom()");
current_host = inet_ntoa(si_other.sin_addr);
if(!current_host)
diep("inet_ntoa()");
/* linked list initialization */
/* Initalise 'start_pointer' by reserving
* memory and pointing to it
*/
start_pointer=(struct x *) malloc (sizeof (struct x));
if(!start_pointer)
diep("start pointer on holiday");
/* Initalise 'next_pointer' to point
* to the same location.
*/
next_pointer=start_pointer;
/* Put some data into the reserved
* memory.
*/
strcpy(next_pointer->name,current_host);
next_pointer->age = ages[count];
/* Loop until all data has been read */
while ( ages[++count] != 0 )
{
/* Reserve more memory and point to it */
next_pointer->next_rec=(struct x *) malloc (sizeof (struct x));
if(!next_pointer)
diep("next pointer on holiday");
strcpy(next_pointer->name, names[count]);
next_pointer->age = ages[count];
}
next_pointer->next_rec=NULL;
next_pointer=start_pointer;
/* insert new record, update age */
while (next_pointer != NULL)
{
printf("%s ", next_pointer->name);
if(strstr(next_pointer->name,current_host)) {
printf("%d \n", next_pointer->age+1);
}
if(!strstr(next_pointer->name,current_host)) {
printf("%d \n", next_pointer->age);
}
next_pointer=next_pointer->next_rec;
}
next_pointer=start_pointer; // XXX
The problem with your code is that you missed this part from the linked C file in the init loop: next_pointer=next_pointer->next_rec. As a result, in the first iteration you allocate a new list node, but then modify the contents of the first node. And then in subsequent iterations you allocate even more nodes, but you still only modify the first node.
Then right after the loop you terminate the list, but since you did not update next_pointer in the meantime, your list now has a single node. (And you leaked some memory there, overwriting addresses with next allocations and NULL, so now you cannot free it.)
As to your more specific questions:
Question 1: next_pointer is only a helper variable to iterate over a list. If you want a circular list, you should set some next_rec pointer to start_pointer. You could do it like this:
for (next_pointer = start_pointer;
next_pointer->next_rec != NULL; /* This is not the last node. */
next_pointer = next_pointer->next_rec /* Move to the next node. */)
;
/* At this moment next_pointer points to the last node of the list. */
next_pointer->next_rec = start_pointer; /* And a cycle is there. */
However, you can actually do this while initialising your loop. When you exit from the init loop, next_pointer is indeed pointing at the last node. So instead of doing next_pointer->next_rec=NULL to terminate the list, do next_pointer->next_rec = start_pointer to make a cycle.
UPDATE That is, if you do want a circular list. Because in fact next_pointer=start_pointer will make next_pointer point at the beginning. So I assumed you want the end of the list point to the beginning (as you mentioned circular list).
Question 2: If you didn't allocate a string, explicitly (with malloc) or implicitly (for example with strdup), you don't need to free it. In particular in that code there are no strings to free:
next_pointer->name is an array within structure. You allocated it along with the structure (list node), and it will be freed with the node.
names is an array of pointers to the constant strings. They were not allocated, but instead will be part of data section of application compiled code, and thus cannot be freed.
Last but not least: look out for strcpy. If you are copying only IPs there, 20 characters will always be enough, but the first time you try to copy something larger, you have a buffer overflow and can overwrite some other data. You should be using strncpy, with n = 19, followed by next_pointer->name[19] = 0 to ensure null termination.
And you don't need to run strstr() twice there. It either returns NULL pointer or not, so you could run
if (strstr( /* .. the arguments .. */ )) {
/* ... */
} else {
/* what if the call did return NULL. */
}
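Putting the points above together, a corrected initialisation could be sketched like this. The function name build_list and its parameters are invented for the example; it takes an explicit count instead of relying on a 0 sentinel in ages.

```c
#include <stdlib.h>
#include <string.h>

struct x {
    char name[20];
    int age;
    struct x *next_rec;
};

/* Hypothetical helper: build a list from parallel arrays of `count` entries.
 * Returns the head, or NULL if count is 0 or an allocation fails. */
struct x *build_list(const char *const *names, const int *ages, size_t count)
{
    struct x *start = NULL, *last = NULL;

    for (size_t i = 0; i < count; i++) {
        struct x *node = malloc(sizeof *node);
        if (node == NULL) {              /* check the pointer just allocated */
            while (start != NULL) {      /* undo the partial list */
                struct x *next = start->next_rec;
                free(start);
                start = next;
            }
            return NULL;
        }
        strncpy(node->name, names[i], sizeof node->name - 1);
        node->name[sizeof node->name - 1] = '\0';
        node->age = ages[i];
        node->next_rec = NULL;

        if (last == NULL)
            start = node;
        else
            last->next_rec = node;
        last = node;                     /* advance: the step the question's loop skipped */
    }
    return start;
}
```

For a circular list, the last node's next_rec would be set to start after the loop instead of staying NULL; any traversal then needs a different stop condition than next_rec == NULL.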

Malloc corrupting already malloc'd memory in C

I'm currently helping a friend debug a program of his, which includes linked lists. His list structure is pretty simple:
typedef struct nodo{
int cantUnos;
char* numBin;
struct nodo* sig;
}Nodo;
We've got the following code snippet:
void insNodo(Nodo** lista, char* auxBin, int auxCantUnos){
printf("*******Insertando\n");
int i;
if (*lista) printf("DecInt*%p->%p\n", *lista, (*lista)->sig);
Nodo* insert = (Nodo*)malloc(sizeof(Nodo*));
if (*lista) printf("Malloc*%p->%p\n", *lista, (*lista)->sig);
insert->cantUnos = auxCantUnos;
insert->numBin = (char*)malloc(strlen(auxBin)*sizeof(char));
for(i=0 ; i<strlen(auxBin) ; i++)
insert->numBin[i] = auxBin[i];
insert->numBin[i] = '\0';
insert->sig = NULL;
Nodo* aux;
/* [etc] */
(The lines with extra indentation were my addition for debug purposes)
This yields me the following:
*******Insertando
DecInt*00341098->00000000
Malloc*00341098->2832B6EE
(*lista)->sig is previously and deliberately set as NULL, which checks out until here, and fixed a potential buffer overflow (he'd forgotten to copy the NULL-terminator in insert->numBin).
I can't think of a single reason why that would happen, nor do I have any idea what else I should provide as further info.
(Compiling on latest stable MinGW under fully-patched Windows 7, friend's using MinGW under Windows XP. On my machine, at least, it only happens when GDB's not attached.)
Any ideas? Suggestions? Possible exorcism techniques? (Current hack is copying the sig pointer to a temp variable and restore it after malloc. It breaks anyways. Turns out the 2nd malloc corrupts it too. Interestingly enough, it resets sig to the exact same value as the first one).
UPDATE: Thanks for the answers. Regarding the Node* thing, it's fixed, but no change. At least prevents potential problems afterwards. String copying isn't the issue, as I already fixed all missing \0s myself. (Note the insertBin[i] = '\0' after the for)
One problem is this line:
Nodo* insert = (Nodo*)malloc(sizeof(Nodo*));
it should be
Nodo* insert = (Nodo*)malloc(sizeof(Nodo));
(Rule of thumb: the type inside sizeof() should have one less '*' than the pointer you assign to.)
You need to allocate space for the Nodo structure, NOT space for a pointer to the Nodo structure (which, incidentally, is 4 bytes on 32-bit systems).
A similar problem exists with not allocating enough room for the string (char array); don't forget the space for the terminating zero '\0'.
On this line:
Nodo* insert = (Nodo*)malloc(sizeof(Nodo*));
You're only allocating enough memory for a pointer to Nodo, not a whole Nodo. You want:
Nodo* insert = (Nodo*)malloc(sizeof(Nodo));
Also, you may have at least one other allocation error:
insert->numBin = (char*)malloc(strlen(auxBin)*sizeof(char));
for(i=0 ; i<strlen(auxBin) ; i++)
insert->numBin[i] = auxBin[i];
It looks like you're duplicating a string. You'll want to allocate enough for the string plus one to get the terminating \0. You can simplify with this standard library call:
insert->numBin = strdup(auxBin);
EDIT: just noticed you're on Windows, so strdup() might not be available (it's a POSIX routine) so you can cover string duplication this way. Note the +1 on the length for the terminator:
insert->numBin = (char *)malloc( strlen(auxBin)+1 );
strcpy( insert->numBin, auxBin );
When you allocate memory for a string (char *), make sure it is of length strlen + 1, to leave room for the \0 at the end.
insert->numBin = (char*)malloc(strlen(auxBin)*sizeof(char));
needs to be
insert->numBin = (char*)malloc(strlen(auxBin) + 1);
Also, there is no need to say * sizeof(char), which is always 1.
One more thing: John is right about how you allocate the structure; it must be the sizeof the struct, not the sizeof the pointer.
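Both fixes combined could look like the following sketch. It covers only the allocation part of insNodo; the function name nuevoNodo is invented, and the list-insertion code elided in the question is not reproduced.

```c
#include <stdlib.h>
#include <string.h>

typedef struct nodo {
    int cantUnos;
    char *numBin;
    struct nodo *sig;
} Nodo;

/* Hypothetical helper showing only the allocation part of insNodo. */
Nodo *nuevoNodo(const char *auxBin, int auxCantUnos)
{
    Nodo *insert = malloc(sizeof *insert);   /* size of the struct, not of a pointer */
    if (insert == NULL)
        return NULL;

    size_t len = strlen(auxBin) + 1;         /* +1 for the terminating '\0' */
    insert->numBin = malloc(len);
    if (insert->numBin == NULL) {
        free(insert);
        return NULL;
    }
    memcpy(insert->numBin, auxBin, len);     /* copies the '\0' as well */
    insert->cantUnos = auxCantUnos;
    insert->sig = NULL;
    return insert;
}
```

Writing sizeof *insert rather than naming the type keeps the size correct even if the variable's type changes later.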
