I'm getting a segmentation fault when running the code below.
It should basically read a .csv file with over 3M lines and do other stuff afterwards (not relevant to the problem), but after 207746 iterations it returns a segmentation fault. If I remove the p = strsep(&line,"|"); and just print the whole line it will print the >3M lines.
int ReadCSV (int argc, char *argv[]){
char *line = NULL, *p;
unsigned long count = 0;
FILE *data;
if (argc < 2) return 1;
if((data = fopen(argv[1], "r")) == NULL){
printf("the CSV file cannot be open");
exit(0);
}
while (getline(&line, &len, data)>0) {
p = strsep(&line,"|");
printf("Line number: %lu \t p: %s\n", count, p);
count++;
}
free(line);
fclose(data);
return 0;
}
I guess it'd have to do with the memory allocation, but can't figure out how to fix it.
A combination of getline and strsep often causes confusion, because both functions change the pointer that you pass them by pointer as the initial argument. If you pass the pointer that has been through strsep to getline again, you run the risk of undefined behavior on the second iteration.
Consider an example: getline allocates 101 bytes to line, and reads a 100-character string into it. Note that len is now set to 101. You call strsep, which finds '|' in the middle of the string, so it points line to what used to be line+50. Now you call getline again. It sees another 100-character line, and concludes that it is OK to copy it into the buffer, because len is still 101. However, since line points to the middle of the buffer now, writing 100 characters becomes undefined behavior.
Make a copy of line before calling strsep:
while (getline(&line, &len, data)>0) {
char *copy = line;
p = strsep(©, "|");
printf("Line number: %lu \t p: %s\n", count, p);
count++;
}
Now line that you pass to getline is preserved between loop iterations.
Look at the expression getline(&line, &len, data) and read the manpage:
If *line is set to NULL and *len is set 0 before the call, then
getline() will allocate a buffer for storing the line. This buffer
should be freed by the user program even if getline() failed.
This should be the case on your first time round the loop (although we can't see where len is declared, let's just assume your real code does this correctly)
Alternatively, before calling getline(), *line can contain a
pointer to a malloc(3)-allocated buffer *len bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *line and *len as necessary.
OK, so if line != NULL it must point to a buffer allocated by malloc of size len. The buffer allocated by your first call to getline (as above) satisfies this.
Note it's not good enough for line to point somewhere into that buffer, it must be the beginning.
Now look at the expression strsep(&line,"|") and read the manpage for that:
... This token is terminated by overwriting the delimiter with a
null byte ('\0'), and *line is updated to point past the token
So, the first argument (line) is changed so that you can call strsep again with the same first argument, and get the next token. This means line is no longer a valid argument to getline, because it isn't the start of a malloc'd buffer (and the length len is also now wrong).
In practice, either
getline will try to read len bytes into the buffer you gave it, but since you advanced line by the length of the first token, it writes off the end of your allocated block. This might just damage the heap rather than dying immediately
getline will try to realloc the buffer you gave it, but since it isn't a valid allocated block, you get heap damage again.
While we're here, you also don't check p is non-NULL, but damaging line is the main problem.
Oh, and if you think the problem is allocation-related, try using valgrind - it generally finds the moment things first go wrong.
Related
I am trying to write a function to convert a text file into a CSV file.
The input file has 3 lines with space-delimited entries. I have to find a way to read a line into a string and transform the three lines from the input file to three columns in a CSV file.
The files look like this :
Jake Ali Maria
24 23 43
Montreal Johannesburg Sydney
And I have to transform it into something like this:
Jake, 24, Montreal
...etc
I figured I could create a char **line variable that would hold three references to three separate char arrays, one for each of the three lines of the input file. I.e., my goal is to have *(line+i) store the i+1'th line of the file.
I wanted to avoid hardcoding char array sizes, such as
char line1 [999];
fgets(line1, 999, file);
so I wrote a while loop to fgets pieces of a line into a small buffer array of predetermined size, and then strcat and realloc memory as necessary to store the line as a string, with *(line+i) as as pointer to the string, where i is 0 for the first line, 1 for the second, etc.
Here is the problematic code:
#include <stdio.h>
#include<stdlib.h>
#include<string.h>
#define CHUNK 10
char** getLines (const char * filename){
FILE *file = fopen(filename, "rt");
char **lines = (char ** ) calloc(3, sizeof(char*));
char buffer[CHUNK];
for(int i = 0; i < 3; i++){
int lineLength = 0;
int bufferLength = 0;
*(lines+i) = NULL;
do{
fgets(buffer, CHUNK, file);
buffLength = strlen(buffer);
lineLength += buffLength;
*(lines+i) = (char*) realloc(*(lines+i), (lineLength +1)*sizeof(char));
strcat(*(lines+i), buffer);
}while(bufferLength ==CHUNK-1);
}
puts(*(lines+0));
puts(*(lines+1));
puts(*(lines+2));
fclose(file);
}
void load_and_convert(const char* filename){
char ** lines = getLines(filename);
}
int main(){
const char* filename = "demo.txt";
load_and_convert(filename);
}
This works as expected only for i=0. However, going through this with GDB, I see that I get a realloc(): invalid pointer error. The buffer loads fine, and it only crashes when I call 'realloc' in the for loop for i=1, when I get to the second line.
I managed to store the strings like I wanted in a small example I did to try to see what was going on, but the inputs were all on the same line. Maybe this has to do with fgets reading from a new line?
I would really appreciate some help with this, I've been stuck all day.
Thanks a lot!
***edit
I tried as suggested to use calloc instead of malloc to initialize the variable **lines, but I still have the same issue.I have added the modifications to the original code I uploaded.
***edit
After deleting the file and recompiling, the above now seems to work. Thank you to everyone for helping me out!
You allocate line (which is a misnomer since it's not a single line), which is a pointer to three char*s. You never initialize the contents of line (that is, you never make any of those three char*s point anywhere). Consequently, when you do realloc(*(line + i), ...), the first argument is uninitialized garbage.
To use realloc to do an initial memory allocation, its first argument must be a null pointer. You should explicitly initialize each element of line to NULL first.
Additionally, *(line+i) = (char *)realloc(*(line+i), ...) is still bad because if realloc fails to allocate memory, it will return a null pointer, clobber *(line + i), and leak the old pointer. You instead should split it into separate steps:
char* p = realloc(line[i], ...);
if (p == null) {
// Handle failure somehow.
exit(1);
}
line[i] = p;
A few more notes:
In C, you should avoid casting the result of malloc/realloc/calloc. It's not necessary since C allows implicit conversion from void* to other pointer types, and the explicit could mask an error where you accidentally omit #include <stdlib.h>.
sizeof(char) is, by definition, 1 byte.
When you're allocating memory, it's safer to get into a habit of using T* p = malloc(n * sizeof *p); instead of T* p = malloc(n * sizeof (T));. That way if the type of p ever changes, you won't silently be allocating the wrong amount of memory if you neglect to update the malloc (or realloc or calloc) call.
Here, you have to zero your array of pointers (for example by using calloc()),
char **line = (char**)malloc(sizeof(char*)*3); //allocate space for three char* pointers
otherwise the reallocs
*(line+i) = (char *)realloc(*(line+i), (inputLength+1)*sizeof(char)); //+1 for the empty character
use an uninitialized pointer, leading to undefined behaviour.
That it works with i=0 is pure coindicence and is a typical pitfall when encountering UB.
Furthermore, when using strcat(), you have to make sure that the first parameter is already a zero-terminated string! This is not the case here, since at the first iteration, realloc(NULL, ...); leaves you with an uninitialized buffer. This can lead to strcpy() writing past the end of your allocated buffer and lead to heap corruption. A possible fix is to use strcpy() instead of strcat() (this should even be more efficient here):
do{
fgets(buffer, CHUNK, file);
buffLength = strlen(buffer);
lines[i] = realloc(lines[i], (lineLength + buffLength + 1));
strcpy(lines[i]+lineLength, buffer);
lineLength += buffLength;
}while(bufferLength ==CHUNK-1);
The check bufferLength == CHUNK-1 will not do what you want if the line (including the newline) is exactly CHUNK-1 bytes long. A better check might be while (buffer[buffLength-1] != '\n').
Btw. line[i] is by far better readable than *(line+i) (which is semantically identical).
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *input_f;
input_f = fopen("Input.txt", "r"); //Opens the file in read mode.
if (input_f != NULL)
{
char line[2048];
while( fgets(line, sizeof line, input_f) != NULL )
{
//do something
}
fclose(input_f); //Close the input file.
}
else
{
perror("File couldn't opened"); //Will print that file couldn't opened and why.
}
return 0;
}
Hi. I know I can read line by line with this code in C, but I don't want to limit line size, say like in this code with 2048.
I thought about using malloc, but I don't know the size of the line before I read it, so IMO it cannot be done.
Is there a way to not to limit line size?
This question is just for my curiosity, thank you.
When you are allocating memory dynamically, you will want to change:
char line[2048];
to
#define MAXL 2048 /* the use of a define will become apparent when you */
size_t maxl = MAXL; /* need to check to determine if a realloc is needed */
char *line = malloc (maxl * sizeof *line);
if (!line) /* always check to insure allocation succeeded */
...error.. memory allocation failed
You read read up to (maxl -1) chars or a newline (if using fgetc, etc..) or read the line and then check whether line [strlen (line) - 1] == '\n' to determine whether you read the entire line (if using fgets). (POSIX requires all lines terminate with a newline) If you read maxl characters (fgetc) or did not read the newline (fgets), then it is a short read and more characters remain. Your choice is to realloc (generally doubling the size) and try again. To realloc:
char *tmp = realloc (line, 2 * maxl)
if (tmp) {
line = tmp;
maxl *= 2;
}
Note: never reallocate using your original pointer (e.g. line = realloc (line, 2 * maxl) because if realloc fails, the memory is freed and the pointer set to NULL and you will lose any data that existed in line. Also note that maxl is typically doubled each time you realloc. However, you are free to choose whatever size increasing scheme you like. (If you are concerned about zeroing all new memory allocated, you can use memset to initialize the newly allocated space to zero/null. Useful in some situations where you want to insure your line is always null-terminated)
That is the basic dynamic allocation/reallocation scheme. Note you are reading until you read the complete line, so you will need to restructure your loop test. And lastly, since you allocated the memory, you are responsible for freeing the memory when you are done with it. A tool you cannot live without is valgrind (or similar memory checker) to confirm you are not leaking memory.
Tip if you are reading and want to insure your string is always null-terminated, then after allocating your block of memory, zero (0) all characters. As mentioned earlier, memset is available, but if you choose calloc instead of malloc it will zero the memory for you. However, on realloc the new space is NOT zero'ed either way, so calling memset is required regardless of what function originally allocated the block.
Tip2 Look at the POSIX getline. getline will handle the allocation/reallocation needed so long as line is initialized to NULL. getline also returns the number of characters actually read dispensing with the need to call strlen after fgets to determine the same.
Let me know if you have additional questions.
Consider 2 thoughts:
An upper bound of allocated memory is reasonable. The nature of the task should have some idea of a maximum line length, be it 80, 1024 or 1 Mbyte.
With a clever OS, actual usage of allocated memory may not occur until needed. See Why is malloc not "using up" the memory on my computer?
So let code allocate 1 big buffer to limit pathological cases and let the underlying memory management (re-)allocate real memory as needed.
#define N (1000000)
char *buf = malloc(N);
...
while (fgets(buf, N, stdin) != NULL)) {
size_t len = strlen(buf);
if (len == N-1) {
perror("Excessive Long Line");
exit(EXIT_FAILURE);
}
}
free(buf);
I want to read multiple lines from a text file by a loop, but I always get Conditional jump or move depends on uninitialised value(s) error in my getline() line.
My code:
char *string;
size_t len = 0;
while (getline(&string, &len, fileStream) != -1) { // error happens this line
// do something
}
free(string);
fclose(fileSream);
I tried but failed to fix it. Any solutions will be appreciated.
You need to do either of below.
set char *string = NULL; and len to 0. [[ Preferred Method ]]
allocate memory to char *string and send the size of allocated memory using len.
Related quotes from man page for referrence
If *lineptr is set to NULL and *n is set 0 before the call, then
getline() will allocate a buffer for storing the line. This buffer
should be freed by the user program even if getline() failed.
Alternatively, before calling getline(), *lineptr can contain a
pointer to a malloc(3)-allocated buffer *n bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
I had the same problem, for me it was easily solved by initializing string and len but also by initializing an int for getline, example
int gline = 0;
while (gline = getline(&string, &len, fileStream)) != -1
In the code below, I hope you can see that I have a char* variable and that I want to read in a string from a file. I then want to pass this string back from the function. I'm rather confused by pointers so I'm not too sure what I'm supposed to do really.
The purpose of this is to then pass the array to another function to be searched for a name.
Unfortunately the program crashes as a result and I've no idea why.
char* ObtainName(FILE *fp)
{
char* temp;
int i = 0;
temp = fgetc(fp);
while(temp != '\n')
{
temp = fgetc(fp);
i++;
}
printf("%s", temp);
return temp;
}
Any help would be vastly appreciated.
fgetc returns an int, not a char*. This int is a character from the stream, or EOF if you reach the end of the file.
You're implicitly casting the int to a char*, i.e., interpreting it as an address (turn your warnings on.) When you call printf it reads that address and continues to read a character at a time looking for the null terminator which ends the string, but that address is almost certainly invalid. This is undefined behavior.
I've taken some liberties with what you wanted to accomplish. Rather that deal with pointers, you can just use a fixed sized array as long as you can set a maximum length. I've also included several checks so that you don't run off the end of the buffer or the end of the file. Also important is to make sure that you have a null termination '\0' at the end of the string.
#define MAX_LEN 100
char* ObtainName(FILE *fp)
{
static char temp[MAX_LEN];
int i = 0;
while(i < MAX_LEN-1)
{
if (feof(fp))
{
break;
}
temp[i] = fgetc(fp);
if (temp[i] == '\n')
{
break;
}
i++;
}
temp[i] = '\0';
printf("%s", temp);
return temp;
}
So, there are several problems here:
You're not setting aside any storage for the string contents;
You're not storing the string contents correctly;
You're attempting to read memory that doesn't belong to you;
The way you're attempting to return the string is going to give you heartburn.
1. You're not setting aside storage for the string contents
The line
char *temp;
declares temp as a pointer to char; its value will be the address of a single character value. Since it's declared at local scope without the static keyword, its initial value will be indeterminate, and that value may not correspond to a valid memory address.
It does not set aside any storage for the string contents read from fp; that would have to be done as a separate step, which I'll get to below.
2. You're not storing the string contents correctly
The line
temp = fgetc(fp);
reads the next character from fp and assigns it to temp. First of all, this means you're only storing the last character read from the stream, not the whole string. Secondly, and more importantly, you're assigning the result of fgetc() (which returns a value of type int) to an object of type char * (which is treated as an address). You're basically saying "I want to treat the letter 'a' as an address into memory." This brings us to...
3. You're attempting to read memory that doesn't belong to you
In the line
printf("%s", temp);
you're attempting to print out the string beginning at the address stored in temp. Since the last thing you wrote to temp was most likely a character whose value is < 127, you're telling printf to start at a very low and most likely not accessible address, hence the crash.
4. The way you're attempting to return the string is guaranteed to give you heartburn
Since you've defined the function to return a char *, you're going to need to do one of the following:
Allocate memory dynamically to store the string contents, and then pass the responsibility of freeing that memory on to the function calling this one;
Declare an array with the static keyword so that the array doesn't "go away" after the function exits; however, this approach has serious drawbacks;
Change the function definition;
Allocate memory dynamically
You could use dynamic memory allocation routines to set aside a region of storage for the string contents, like so:
char *temp = malloc( MAX_STRING_LENGTH * sizeof *temp );
or
char *temp = calloc( MAX_STRING_LENGTH, sizeof *temp );
and then return temp as you've written.
Both malloc and calloc set aside the number of bytes you specify; calloc will initialize all those bytes to 0, which takes a little more time, but can save your bacon, especially when dealing with text.
The problem is that somebody has to deallocate this memory when its no longer needed; since you return the pointer, whoever calls this function now has the responsibility to call free() when it's done with that string, something like:
void Caller( FILE *fp )
{
...
char *name = ObtainName( fo );
...
free( name );
...
}
This spreads the responsibility for memory management around the program, increasing the chances that somebody will forget to release that memory, leading to memory leaks. Ideally, you'd like to have the same function that allocates the memory free it.
Use a static array
You could declare temp as an array of char and use the static keyword:
static char temp[MAX_STRING_SIZE];
This will set aside MAX_STRING_SIZE characters in the array when the program starts up, and it will be preserved between calls to ObtainName. No need to call free when you're done.
The problem with this approach is that by creating a static buffer, the code is not re-entrant; if ObtainName called another function which in turn called ObtainName again, that new call will clobber whatever was in the buffer before.
Why not just declare temp as
char temp[MAX_STRING_SIZE];
without the static keyword? The problem is that when ObtainName exits, the temp array ceases to exist (or rather, the memory it was using is available for someone else to use). That pointer you return is no longer valid, and the contents of the array may be overwritten before you can access it again.
Change the function definition
Ideally, you'd like for ObtainName to not have to worry about the memory it has to write to. The best way to achieve that is for the caller to pass target buffer as a parameter, along with the buffer's size:
int ObtainName( FILE *fp, char *buffer, size_t bufferSize )
{
...
}
This way, ObtainName writes data into the location that the caller specifies (useful if you want to obtain multiple names for different purposes). The function will return an integer value, which can be a simple success or failure, or an error code indicating why the function failed, etc.
Note that if you're reading text, you don't have to read character by character; you can use functions like fgets() or fscanf() to read an entire string at a time.
Use fscanf if you want to read whitespace-delimited strings (i.e., if the input file contains "This is a test", fscanf( fp, "%s", temp); will only read "This"). If you want to read an entire line (delimited by a newline character), use fgets().
Assuming you want to read an individual string at a time, you'd use something like the following (assumes C99):
#define FMT_SIZE 20
...
int ObtainName( FILE *fp, char *buffer, size_t bufsize )
{
int result = 1; // assume success
int scanfResult = 0;
char fmt[FMT_SIZE];
sprintf( fmt, "%%%zus", bufsize - 1 );
scanfResult = fscanf( fp, fmt, buffer );
if ( scanfResult == EOF )
{
// hit end-of-file before reading any text
result = 0;
}
else if ( scanfResult == 0 )
{
// did not read anything from input stream
result = 0;
}
else
{
result = 1;
}
return result;
}
So what's this noise
char fmt[FMT_SIZE];
sprintf( fmt, "%%%zus", bufsize - 1 );
about? There is a very nasty security hole in fscanf() when you use the %s or %[ conversion specifiers without a maximum length specifier. The %s conversion specifier tells fscanf to read characters until it sees a whitespace character; if there are more non-whitespace characters in the stream than the buffer is sized to hold, fscanf will store those extra characters past the end of the buffer, clobbering whatever memory is following it. This is a common malware exploit. So we want to specify a maximum length for the input; for example, %20s says to read no more than 20 characters from the stream and store them to the buffer.
Unfortunately, since the buffer length is passed in as an argument, we can't write something like %20s, and fscanf doesn't give us a way to specify the length as an argument the way fprintf does. So we have to create a separate format string, which we store in fmt. If the input buffer length is 10, then the format string will be %10s. If the input buffer length is 1000, then the format string will be %1000s.
The following code expands on that in your question, and returns the string in allocated storage:
char* ObtainName(FILE *fp)
{
int temp;
int i = 1;
char *string = malloc(i);
if(NULL == string)
{
fprintf(stderr, "malloc() failed\n");
goto CLEANUP;
}
*string = '\0';
temp = fgetc(fp);
while(temp != '\n')
{
char *newMem;
++i;
newMem=realloc(string, i);
if(NULL==newMem)
{
fprintf(stderr, "realloc() failed.\n");
goto CLEANUP;
}
string=newMem;
string[i-1] = temp;
string[i] = '\0';
temp = fgetc(fp);
}
CLEANUP:
printf("%s", string);
return(string);
}
Take care to 'free()' the string returned by this function, or a memory leak will occur.
I am getting "Bus Error" trying to read stdin into a char* variable.
I just want to read whole stuff coming over stdin and put it first into a variable, then continue working on the variable.
My Code is as follows:
char* content;
char* c;
while( scanf( "%c", c)) {
strcat( content, c);
}
fprintf( stdout, "Size: %d", strlen( content));
But somehow I always get "Bus error" returned by calling cat test.txt | myapp, where myapp is the compiled code above.
My question is how do i read stdin until EOF into a variable? As you see in the code, I just want to print the size of input coming over stdin, in this case it should be equal to the size of the file test.txt.
I thought just using scanf would be enough, maybe buffered way to read stdin?
First, you're passing uninitialized pointers, which means scanf and strcat will write memory you don't own. Second, strcat expects two null-terminated strings, while c is just a character. This will again cause it to read memory you don't own. You don't need scanf, because you're not doing any real processing. Finally, reading one character at a time is needlessly slow. Here's the beginning of a solution, using a resizable buffer for the final string, and a fixed buffer for the fgets call
#define BUF_SIZE 1024
char buffer[BUF_SIZE];
size_t contentSize = 1; // includes NULL
/* Preallocate space. We could just allocate one char here,
but that wouldn't be efficient. */
char *content = malloc(sizeof(char) * BUF_SIZE);
if(content == NULL)
{
perror("Failed to allocate content");
exit(1);
}
content[0] = '\0'; // make null-terminated
while(fgets(buffer, BUF_SIZE, stdin))
{
char *old = content;
contentSize += strlen(buffer);
content = realloc(content, contentSize);
if(content == NULL)
{
perror("Failed to reallocate content");
free(old);
exit(2);
}
strcat(content, buffer);
}
if(ferror(stdin))
{
free(content);
perror("Error reading from stdin.");
exit(3);
}
EDIT: As Wolfer alluded to, a NULL in your input will cause the string to be terminated prematurely when using fgets. getline is a better choice if available, since it handles memory allocation and does not have issues with NUL input.
Since you don't care about the actual content, why bother building a string? I'd also use getchar():
int c;
size_t s = 0;
while ((c = getchar()) != EOF)
{
s++;
}
printf("Size: %z\n", s);
This code will correctly handle cases where your file has '\0' characters in it.
Your problem is that you've never allocated c and content, so they're not pointing anywhere defined -- they're likely pointing to some unallocated memory, or something that doesn't exist at all. And then you're putting data into them. You need to allocate them first. (That's what a bus error typically means; you've tried to do a memory access that's not valid.)
(Alternately, since c is always holding just a single character, you can declare it as char c and pass &c to scanf. No need to declare a string of characters when one will do.)
Once you do that, you'll run into the issue of making sure that content is long enough to hold all the input. Either you need to have a guess of how much input you expect and allocate it at least that long (and then error out if you exceed that), or you need a strategy to reallocate it in a larger size if it's not long enough.
Oh, and you'll also run into the problem that strcat expects a string, not a single character. Even if you leave c as a char*, the scanf call doesn't make it a string. A single-character string is (in memory) a character followed by a null character to indicate the end of the string. scanf, when scanning for a single character, isn't going to put in the null character after it. As a result, strcpy isn't going to know where the end of the string is, and will go wandering off through memory looking for the null character.
The problem here is that you are referencing a pointer variable that no memory allocated via malloc, hence the results would be undefined, and not alone that, by using strcat on a undefined pointer that could be pointing to anything, you ended up with a bus error!
This would be the fixed code required....
char* content = malloc (100 * sizeof(char));
char c;
if (content != NULL){
content[0] = '\0'; // Thanks David!
while ((c = getchar()) != EOF)
{
if (strlen(content) < 100){
strcat(content, c);
content[strlen(content)-1] = '\0';
}
}
}
/* When done with the variable */
free(content);
The code highlights the programmer's responsibility to manage the memory - for every malloc there's a free if not, you have a memory leak!
Edit: Thanks to David Gelhar for his point-out at my glitch! I have fixed up the code above to reflect the fixes...of course in a real-life situation, perhaps the fixed value of 100 could be changed to perhaps a #define to make it easy to expand the buffer by doubling over the amount of memory via realloc and trim it to size...
Assuming that you want to get (shorter than MAXL-1 chars) strings and not to process your file char by char, I did as follows:
#include <stdio.h>
#include <string.h>
#define MAXL 256
main(){
char s[MAXL];
s[0]=0;
scanf("%s",s);
while(strlen(s)>0){
printf("Size of %s : %d\n",s,strlen(s));
s[0]=0;
scanf("%s",s);
};
}