Parsing CSV Values in C - c

I am trying to write a basic CSV parser in C that generates a dynamic array of char* when given a char* and a separator character, such as a comma:
char **filldoc_parse_csv(char *toparse, char sepchar)
{
char **strings = NULL;
char *buffer = NULL;
int j = 0;
int k = 1;
for(int i=0; i < strlen(toparse); i++)
{
if(toparse[i] != sepchar)
{
buffer = realloc(buffer, sizeof(char)*k);
strcat(buffer, (const char*)toparse[i]);
k++;
}
else
{
strings = realloc(strings, sizeof(buffer)+1);
strings[j] = buffer;
free(buffer);
j++;
}
}
return strings;
}
However, when I call the function as in the manner below:
char **strings = filldoc_parse_csv("hello,how,are,you", ',');
I end up with a segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
__strcat_sse2 () at ../sysdeps/x86_64/multiarch/../strcat.S:166
166 ../sysdeps/x86_64/multiarch/../strcat.S: No such file or directory.
(gdb) backtrace
#0 __strcat_sse2 () at ../sysdeps/x86_64/multiarch/../strcat.S:166
#1 0x000000000040072c in filldoc_parse_csv (toparse=0x400824 "hello,how,are,you", sepchar=44 ',') at filldocparse.c:20
#2 0x0000000000400674 in main () at parsetest.c:6
The problem is centered around allocating enough space for the buffer string. If I have to, I will make buffer a static array, however, I would like to use dynamic memory allocation for this purpose. How can I do it correctly?

Various problems
strcat(buffer, (const char*)toparse[i]); attempts to changes a char to a string.
strings = realloc(strings, sizeof(buffer)+1); reallocates the same amount of space. sizeof(buffer) is the size of the pointer buffer, not the size of memory it points to.
The calling function has no way to know how many entries in strings. Suggest andding a NULL sentinel.
Minor: better to use size_t rather than int. Use more descriptive names. Do not re-call strlen(toparse) repetitively. Use for(int i=0; toparse[i]; i++) . Make toparse a const char *

You have problems with your memory allocations. When you do e.g. sizeof(buffer) you will get the size of the pointer and not what it points to. That means you will in the first run allocate five bytes (on a 32-bit system), and the next time the function is called you will allocate five bytes again.
There are also many other problems, like you freeing the buffer pointer once you assigned the pointer to strings[j]. The problem with this is that the assignment only copies the pointer and not what it points to, so by freeing buffer you also free strings[j].
Both the above problems will lead to your program having undefined behavior, which is the most common cause of runtime-crashes.
You should also avoid assigning the result of realloc to the pointer you're trying to reallocate, because if realloc fails it will return NULL and you loose the original pointer causing a memory leak.

Related

strcpy seg fault in C

Curious about what is going wrong with this strcpy.
int main(void){
char *history[10];
for(int i = 0; i < 10; i++){
history[i] = NULL;
}
char line[80];
fgets(line,80,stdin);
strcpy(history[0],line); //This line segfaults
}
You've created an array of NULL pointers. You then tried to copy characters onto NULL. That's a no-no.
EDIT:
Your program could be optimized to this:
void main() {
char line[80];
fgets(line,80,stdin);
}
Your history array is never used to generate any output. So, while others have pointed out you need to allocate memory, technically, you could simply do this:
history[0] = line;
That will be a valid pointer up until the line goes out of scope, which is when history goes out of scope so it won't matter.
You need to allocate memory for history[0]. As history[0] is assigned NULL referencing it or writing to it will/may cause segfault.
something like
//this will create memory for 100 chars
history[0] = malloc(sizeof(char) * 100);
strcpy(history[0],line);
Or
//shortcut for both - this allocate new memory and returns pointer.
history[0] = strdup(line);

Using Malloc for i endless C -String

I was wondering is it possible to create one endless array which can store endlessly long strings?
So what I exactly mean is, I want to create a function which gets i Strings with n length.I want to input infinite strings in the program which can be infinite characters long!
void endless(int i){
//store user input on char array i times
}
To achieve that I need malloc, which I would normally use like this:
string = malloc(sizeof(char));
But how would that work for lets say 5 or 10 arrays or even a endless stream of arrays? Or is this not possible?
Edit:
I do know memory is not endless, what I mean is if it where infinite how would you try to achieve it? Or maybe just allocate memory until all memory is used?
Edit 2:
So I played around a little and this came out:
void endless (char* array[], int numbersOfArrays){
int j;
//allocate memory
for (j = 0; j < numbersOfArrays; j++){
array[j] = (char *) malloc(1024*1024*1024);
}
//scan strings
for (j = 0; j < numbersOfArrays; j++){
scanf("%s",array[j]);
array[j] = realloc(array[j],strlen(array[j]+1));
}
//print stringd
for (j = 0; j < numbersOfArrays; j++){
printf("%s\n",array[j]);
}
}
However this isn't working maybe I got the realloc part terrible wrong?
The memory is not infinite, thus you cannot.
I mean the physical memory in a computer has its limits.
malloc() will fail and allocate no memory when your program requestes too much memory:
If the function failed to allocate the requested block of memory, a null pointer is returned.
Assuming that memory is infinite, then I would create an SxN 2D array, where S is the number of strings and N the longest length of the strings you got, but obviously there are many ways to do this! ;)
Another way would be to have a simple linked list (I have one in List (C) if you need one), where every node would have a char pointer and that pointer would eventually host a string.
You can define a max length you will assume it will be the max lenght of your strings. Otherwise, you could allocate a huge 1d char array which you hole the new string, use strlen() to find the actual length of the string, and then allocate dynamically an array that would exactly the size that is needed, equal of that length + 1 for the null-string-terminator.
Here is a toy example program that asks the user to enter some strings. Memory is allocated for the strings in the get_string() function, then pointers to the strings are added to an array in the add_string() function, which also allocates memory for array storage. You can add as many strings of arbitrary length as you want, until your computer runs out of memory, at which point you will probably segfault because there are no checks on whether the memory allocations are successful. But that would take an awful lot of typing.
I guess the important point here is that there are two allocation steps: one for the strings and one for the array that stores the pointers to the strings. If you add a string literal to the storage array, you don't need to allocate for it. But if you add a string that is unknown at compile time (like user input), then you have to dynamically allocate memory for it.
Edit:
If anyone tried to run the original code listed below, they might have encountered some bizarre behavior for long strings. Specifically, they could be truncated and terminated with a mystery character. This was a result of the fact that the original code did not handle the input of an empty line properly. I did test it for a very long string, and it seemed to work. I think that I just got "lucky." Also, there was a tiny (1 byte) memory leak. It turned out that I forgot to free the memory pointed to from newstring, which held a single '\0' character upon exit. Thanks, Valgrind!
This all could have been avoided from the start if I had passed a NULL back from the get_string() function instead of an empty string to indicate an empty line of input. Lesson learned? The source code below has been fixed, NULL now indicates an empty line of input, and all is well.
#include <stdio.h>
#include <stdlib.h>
char * get_string(void);
char ** add_string(char *str, char **arr, int num_strings);
int main(void)
{
char *newstring;
char **string_storage;
int i, num = 0;
string_storage = NULL;
puts("Enter some strings (empty line to quit):");
while ((newstring = get_string()) != NULL) {
string_storage = add_string(newstring, string_storage, num);
++num;
}
puts("You entered:");
for (i = 0; i < num; i++)
puts(string_storage[i]);
/* Free allocated memory */
for (i = 0; i < num; i++)
free(string_storage[i]);
free(string_storage);
return 0;
}
char * get_string(void)
{
char ch;
int num = 0;
char *newstring;
newstring = NULL;
while ((ch = getchar()) != '\n') {
++num;
newstring = realloc(newstring, (num + 1) * sizeof(char));
newstring[num - 1] = ch;
}
if (num > 0)
newstring[num] = '\0';
return newstring;
}
char ** add_string(char *str, char **arr, int num_strings)
{
++num_strings;
arr = realloc(arr, num_strings * (sizeof(char *)));
arr[num_strings - 1] = str;
return arr;
}
I was wondering is it possible to create one endless array which can store endlessly long strings?
The memory can't be infinite. So, the answer is NO. Even if you have every large memory, you will need a processor that could address that huge memory space. There is a limit on amount of dynamic memory that can be allocated by malloc and the amount of static memory(allocated at compile time) that can be allocated. malloc function call will return a NULL if there is no suitable memory block requested by you in the heap memory.
Assuming that you have very large memory space available to you relative to space required by your input strings and you will never run out of memory. You can store your input strings using 2 dimensional array.
C does not really have multi-dimensional arrays, but there are several ways to simulate them. You can use a (dynamically allocated) array of pointers to (dynamically allocated) arrays. This is used mostly when the array bounds are not known until runtime. OR
You can also allocate a global two dimensional array of sufficient length and width. The static allocation for storing random size input strings is not a good idea. Most of the memory space will be unused.
Also, C programming language doesn't have string data type. You can simulate a string using a null terminated array of characters. So, to dynamically allocate a character array in C, we should use malloc like shown below:
char *cstr = malloc((MAX_CHARACTERS + 1)*sizeof(char));
Here, MAX_CHARACTERS represents the maximum number of characters that can be stored in your cstr array. The +1 is added to allocate a space for null character if MAX_CHARACTERS are stored in your string.

C Turning a dynamic char array into an array of char arrays

I'm working on a function, that has to take a dynamic char array, separate it at spaces, and put each word in an array of char arrays. Here's the code:
char** parse_cmdline(const char *cmdline)
{
char** arguments = (char**)malloc(sizeof(char));
char* buffer;
int lineCount = 0, strCount = 0, argCount = 0;
int spaceBegin = 0;
while((cmdline[lineCount] != '\n'))
{
if(cmdline[lineCount] == ' ')
{
argCount++;
arguments[argCount] = (char*)malloc(sizeof(char));
strCount = 0;
}
else
{
buffer = realloc(arguments[argCount], strCount + 1);
arguments[argCount] = buffer;
arguments[argCount][strCount] = cmdline[lineCount];
strCount++;
}
lineCount++;
}
arguments[argCount] = '\0';
free(buffer);
return arguments;
}
The problem is that somewhere along the way I get a Segmentation fault and I don't exacly know where.
Also, this current version of the function assumes that the string does not begin with a space, that is for the next version, i can handle that, but i can't find the reason for the seg. fault
This code is surely not what you intended:
char** arguments = (char**)malloc(sizeof(char));
It allocates a block of memory large enough for one char, and sets a variable of type char ** (arguments) to point to it. But even if you wanted only enough space in arguments for a single char *, what you have allocated is not enough (not on any C system you're likely to meet, anyway). It is certainly not long enough for multiple pointers.
Supposing that pointers are indeed wider than single chars on your C system, your program invokes undefined behavior as soon as it dereferences arguments. A segmentation fault is one of the more likely results.
The simplest way forward is probably to scan the input string twice: once to count the number of individual arguments there are, so that you can allocate enough space for the pointers, and again to create the individual argument strings and record pointers to them in your array.
Note, too, that the return value does not carry any accessible information about how much space was allocated, or, therefore, how many argument strings you extracted. The usual approach to this kind of problem is to allocate space for one additional pointer, and to set that last pointer to NULL as a sentinel. This is much akin to, but not the same as, using a null char to mark the end of a C string.
Edited to add:
The allocation you want for arguments is something more like this:
arguments = malloc(sizeof(*arguments) * (argument_count + 1));
That is, allocate space for one more object than there are arguments, with each object the size of the type of thing that arguments is intended to point at. The value of arguments is not accessed by sizeof, so it doesn't matter that it is indeterminate at that point.
Edited to add:
The free() call at the end is also problematic:
free(buffer);
At that point, variable buffer points to the same allocated block as the last element of arguments points to (or is intended to point to). If you free it then all pointers to that memory are invalidated, including the one you are about to return to the caller. You don't need to free buffer at that point any more than you needed to free it after any of the other allocations.
This is probably why you have a segmentation fault:
In char** arguments = (char**)malloc(sizeof(char));, you have used malloc (sizeof (char)), this allocates space for only a single byte (enough space for one char). This is not enough to hold a single char* in arguments.
But even if it was in some system, so arguments[argCount] is only reading allocated memory for argCount = 0. For other values of argCount, the array index is out of bounds - leading to a segmentation fault.
For example, if your input string is something like this - "Hi. How are you doing?", then it has 4 ' ' characters before \n is reached, and the value of argCount will go up till 3.
What you want to do is somthing like this:
char** parse_cmdline( const char *cmdline )
{
Allocate your array of argument pointers with length for 1 pointer and init it with 0.
char** arguments = malloc( sizeof(char*) );
arguments[0] = NULL;
Set a char* pointer to the first char in yor command line and remember the
beginn of the first argument
int argCount = 0, len = 0;
const char *argStart = cmdline;
const char *actPos = argStart;
Continue until end of command line reached.
If you find a blank you have a new argument which consist of th characters between argStart and actPos . Allocate and copy argument from command line.
while( *actPos != '\n' && *actPos != '\0' )
{
if( cmdline[lineCount] == ' ' && actPos > argStart )
{
argCount++; // increment number of arguments
arguments = realloc( arguments, (argCount+1) * sizeof(char*) ); // allocate argCount + 1 (NULL at end of list of arguments)
arguments[argCount] = NULL; // list of arguments ends with NULL
len = actPos - argStart;
arguments[argCount-1] = malloc( len+1 ); // allocate number of characters + '\0'
memcpy( arguments[argCount-1], actPos, len ); // copy characters of argument
arguments[argCount-1] = 0; // set '\0' at end of argument string
argStart = actPos + 1; // next argument starts after blank
}
actPos++;
}
return arguments;
}
some suggestions i would give is, before calling malloc, you might want to first count the number of words you have. then call malloc as char ** charArray = malloc(arguments*sizeof(char*));. This will be the space for the char ** charArray. Then each element in charArray should be malloced by the size of the word you are trying to store in that element. Then you may store that word inside that index.
Ex. *charArray = malloc(sizeof(word)); Then you can store it as **charArray = word;
Be careful with pointer arithmetic however.
The segmentation fault is definitly arising from you trying to access an element in an array in an undefined space. Which arises from you not mallocing space correctly for the array.

ERROR READING STRING

My code does not work. I get run time error at the moment i accept a string. What is the problem with this code?
//this is what i have in main()
char *ele,*s[max];
int *count,temp=0;
count=&temp;
printf("Enter string to insert: ");
scanf("%s",ele);
addleft(s,ele,count);
//following is the function definition
void addleft(char *s[max],char *ele,int *count)
{
int i;
if((*count)==max)
{
printf("Queue full!\n");
return;
}
for(i=*count;i>0;i--)
strcpy(s[i],s[i-1]);
strcpy(s[0],ele);
(*count)++;
printf("String inserted at left!\n");
}
ele is an uninitialised char* and has no memory associated with it and scanf() will be attempting to write to it causing undefined behaviour, a segmentation fault is probable.
You need to either dynamically allocate memory for ele or declare a local array and prevent buffer overrun when using scanf():
char ele[1024];
if (1 == scanf("%1023s", ele))
{
/* Process 'ele'. */
}
Additionally, the function addleft() is using strcpy() on s, which is an array of char* and each of the char* in the array is unitialised. This is undefined behaviour and a probable segmentation fault. To correct, you could use strdup() if it is available otherwise malloc() and strcpy():
/* Instead of:
strcpy(s[0],ele);
use:
*/
s[0] = strdup(ele);
Note that the for loop inside the addleft() function is dangerous as the char* contained within s are not necessarily of the same length. This could easily lead to writing beyond the end of arrays. However, as the elements are addresses of dynamically allocated char* you can just swap the elements instead of copying their content.
sscanf("%s", ele) is putting the input in the memory pointed to by 'ele'. But 'ele' has never been initialized to point to anything. Something like:
char ele[128];
or
char* ele = malloc(...)
should fix it up.
You are causing a buffer overflow because the pointer ele is not pointing to any allocated memory. You are writing into memory that your program needs to run, therefore crashing it. I recommend you implement mallocinto your program like this:
char *ele;
if (!(ele = malloc(50))) //allocate 50 bytes of memory
{
//allocation failed
exit(0);
}
scanf("%s", ele); //string can hold 50 bytes now
free(ele); //free allocated space
You might want to read up on the malloc function here
An easier route would just to make ele an array instead of a pointer:
char ele[50]; //ele is an array of 50 bytes

What does memcpy do exactly in this program?

I am writing a program where the input will be taken from stdin. The first input will be an integer which says the number of strings to be read from stdin.
I just read the string character-by-character into a dynamically allocated memory and displays it once the string ends.
But when the string is larger than allocated size, I am reallocating the memory using realloc. But even if I use memcpy, the program works. Is it undefined behavior to not use memcpy? But the example Using Realloc in C does not use memcpy. So which one is the correct way to do it? And is my program shown below correct?
/* ss.c
* Gets number of input strings to be read from the stdin and displays them.
* Realloc dynamically allocated memory to get strings from stdin depending on
* the string length.
*/
#include <stdio.h>
#include <stdlib.h>
int display_mem_alloc_error();
enum {
CHUNK_SIZE = 31,
};
int display_mem_alloc_error() {
fprintf(stderr, "\nError allocating memory");
exit(1);
}
int main(int argc, char **argv) {
int numStr; //number of input strings
int curSize = CHUNK_SIZE; //currently allocated chunk size
int i = 0; //counter
int len = 0; //length of the current string
int c; //will contain a character
char *str = NULL; //will contain the input string
char *str_cp = NULL; //will point to str
char *str_tmp = NULL; //used for realloc
str = malloc(sizeof(*str) * CHUNK_SIZE);
if (str == NULL) {
display_mem_alloc_error();
}
str_cp = str; //store the reference to the allocated memory
scanf("%d\n", &numStr); //get the number of input strings
while (i != numStr) {
if (i >= 1) { //reset
str = str_cp;
len = 0;
}
c = getchar();
while (c != '\n' && c != '\r') {
*str = (char *) c;
printf("\nlen: %d -> *str: %c", len, *str);
str = str + 1;
len = len + 1;
*str = '\0';
c = getchar();
if (curSize/len == 1) {
curSize = curSize + CHUNK_SIZE;
str_tmp = realloc(str_cp, sizeof(*str_cp) * curSize);
if (str_tmp == NULL) {
display_mem_alloc_error();
}
memcpy(str_tmp, str_cp, curSize); // NB: seems to work without memcpy
printf("\nstr_tmp: %d", str_tmp);
printf("\nstr: %d", str);
printf("\nstr_cp: %d\n", str_cp);
}
}
i = i + 1;
printf("\nEntered string: %s\n", str_cp);
}
return 0;
}
/* -----------------
//input-output
gcc -o ss ss.c
./ss < in.txt
// in.txt
1
abcdefghijklmnopqrstuvwxyzabcdefghij
// output
// [..snip..]
Entered string:
abcdefghijklmnopqrstuvwxyzabcdefghij
-------------------- */
Thanks.
Your program is not quite correct. You need to remove the call to memcpy to avoid an occasional, hard to diagnose bug.
From the realloc man page
The realloc() function changes the size of the memory block pointed to
by ptr to size bytes. The contents will be unchanged in the range
from the start of the region up to the minimum of the old and new
sizes
So, you don't need to call memcpy after realloc. In fact, doing so is wrong because your previous heap cell may have been freed inside the realloc call. If it was freed, it now points to memory with unpredictable content.
C11 standard (PDF), section 7.22.3.4 paragraph 2:
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
So in short, the memcpy is unnecessary and indeed wrong. Wrong for two reasons:
If realloc has freed your previous memory, then you are accessing memory that is not yours.
If realloc has just enlarged your previous memory, you are giving memcpy two pointers that point to the same area. memcpy has a restrict qualifier on both its input pointers which means it is undefined behavior if they point to the same object. (Side note: memmove doesn't have this restriction)
Realloc enlarge the memory size where reserved for your string. If it is possible to enlarge it without moving the datas, those will stay in place. If it cannot, it malloc a lager memory plage, and memcpy itself the data contained in the previous memory plage.
In short, it is normal that you dont have to call memcpy after realloc.
From the man page:
The realloc() function tries to change the size of the allocation pointed
to by ptr to size, and returns ptr. If there is not enough room to
enlarge the memory allocation pointed to by ptr, realloc() creates a new
allocation, copies as much of the old data pointed to by ptr as will fit
to the new allocation, frees the old allocation, and returns a pointer to
the allocated memory. If ptr is NULL, realloc() is identical to a call
to malloc() for size bytes. If size is zero and ptr is not NULL, a new,
minimum sized object is allocated and the original object is freed. When
extending a region allocated with calloc(3), realloc(3) does not guaran-
tee that the additional memory is also zero-filled.

Resources