I am trying to write a function to convert a text file into a CSV file.
The input file has 3 lines with space-delimited entries. I have to find a way to read a line into a string and transform the three lines from the input file to three columns in a CSV file.
The files look like this :
Jake Ali Maria
24 23 43
Montreal Johannesburg Sydney
And I have to transform it into something like this:
Jake, 24, Montreal
...etc
I figured I could create a char **line variable that would hold three references to three separate char arrays, one for each of the three lines of the input file. I.e., my goal is to have *(line+i) store the i+1'th line of the file.
I wanted to avoid hardcoding char array sizes, such as
char line1 [999];
fgets(line1, 999, file);
so I wrote a while loop to fgets pieces of a line into a small buffer array of predetermined size, and then strcat and realloc memory as necessary to store the line as a string, with *(line+i) as as pointer to the string, where i is 0 for the first line, 1 for the second, etc.
Here is the problematic code:
#include <stdio.h>
#include<stdlib.h>
#include<string.h>
#define CHUNK 10
char** getLines (const char * filename){
FILE *file = fopen(filename, "rt");
char **lines = (char ** ) calloc(3, sizeof(char*));
char buffer[CHUNK];
for(int i = 0; i < 3; i++){
int lineLength = 0;
int bufferLength = 0;
*(lines+i) = NULL;
do{
fgets(buffer, CHUNK, file);
buffLength = strlen(buffer);
lineLength += buffLength;
*(lines+i) = (char*) realloc(*(lines+i), (lineLength +1)*sizeof(char));
strcat(*(lines+i), buffer);
}while(bufferLength ==CHUNK-1);
}
puts(*(lines+0));
puts(*(lines+1));
puts(*(lines+2));
fclose(file);
}
void load_and_convert(const char* filename){
char ** lines = getLines(filename);
}
int main(){
const char* filename = "demo.txt";
load_and_convert(filename);
}
This works as expected only for i=0. However, going through this with GDB, I see that I get a realloc(): invalid pointer error. The buffer loads fine, and it only crashes when I call 'realloc' in the for loop for i=1, when I get to the second line.
I managed to store the strings like I wanted in a small example I did to try to see what was going on, but the inputs were all on the same line. Maybe this has to do with fgets reading from a new line?
I would really appreciate some help with this, I've been stuck all day.
Thanks a lot!
***edit
I tried as suggested to use calloc instead of malloc to initialize the variable **lines, but I still have the same issue.I have added the modifications to the original code I uploaded.
***edit
After deleting the file and recompiling, the above now seems to work. Thank you to everyone for helping me out!
You allocate line (which is a misnomer since it's not a single line), which is a pointer to three char*s. You never initialize the contents of line (that is, you never make any of those three char*s point anywhere). Consequently, when you do realloc(*(line + i), ...), the first argument is uninitialized garbage.
To use realloc to do an initial memory allocation, its first argument must be a null pointer. You should explicitly initialize each element of line to NULL first.
Additionally, *(line+i) = (char *)realloc(*(line+i), ...) is still bad because if realloc fails to allocate memory, it will return a null pointer, clobber *(line + i), and leak the old pointer. You instead should split it into separate steps:
char* p = realloc(line[i], ...);
if (p == null) {
// Handle failure somehow.
exit(1);
}
line[i] = p;
A few more notes:
In C, you should avoid casting the result of malloc/realloc/calloc. It's not necessary since C allows implicit conversion from void* to other pointer types, and the explicit could mask an error where you accidentally omit #include <stdlib.h>.
sizeof(char) is, by definition, 1 byte.
When you're allocating memory, it's safer to get into a habit of using T* p = malloc(n * sizeof *p); instead of T* p = malloc(n * sizeof (T));. That way if the type of p ever changes, you won't silently be allocating the wrong amount of memory if you neglect to update the malloc (or realloc or calloc) call.
Here, you have to zero your array of pointers (for example by using calloc()),
char **line = (char**)malloc(sizeof(char*)*3); //allocate space for three char* pointers
otherwise the reallocs
*(line+i) = (char *)realloc(*(line+i), (inputLength+1)*sizeof(char)); //+1 for the empty character
use an uninitialized pointer, leading to undefined behaviour.
That it works with i=0 is pure coindicence and is a typical pitfall when encountering UB.
Furthermore, when using strcat(), you have to make sure that the first parameter is already a zero-terminated string! This is not the case here, since at the first iteration, realloc(NULL, ...); leaves you with an uninitialized buffer. This can lead to strcpy() writing past the end of your allocated buffer and lead to heap corruption. A possible fix is to use strcpy() instead of strcat() (this should even be more efficient here):
do{
fgets(buffer, CHUNK, file);
buffLength = strlen(buffer);
lines[i] = realloc(lines[i], (lineLength + buffLength + 1));
strcpy(lines[i]+lineLength, buffer);
lineLength += buffLength;
}while(bufferLength ==CHUNK-1);
The check bufferLength == CHUNK-1 will not do what you want if the line (including the newline) is exactly CHUNK-1 bytes long. A better check might be while (buffer[buffLength-1] != '\n').
Btw. line[i] is by far better readable than *(line+i) (which is semantically identical).
Related
I need to create 2 separate functions readLine() and readLines() in C. The first one has to return the pointer to the input string and the second one should return the array of pointers of input strings. readLines() is terminated with a new line character. I am getting some errors, probably something with memory, sometimes it works, sometimes it doesn't. Here is the code:
char* readLine() {
char pom[1000];
gets(pom);
char* s = (char*)malloc(sizeof(char) * strlen(pom));
strcpy(s, pom);
return s;
}
And here is readLines()
char** readLines() {
char** lines = (char**)malloc(sizeof(char*));
int i = 0;
do {
char pom[1000];
gets(pom);
lines[i] = (char*)malloc(sizeof(char) * strlen(pom));
strcpy(lines[i], pom);
i++;
} while (strlen(lines[i - 1]) != 0);
return lines;
}
In the main, I call these functions as
char* p = readLine();
char** lines = readLines();
When allocating memory for a string using malloc, you should allocate enough memory for the entire string, including the terminating null character.
In your case, strcpy will cause a buffer overflow, because the destination buffer isn't large enough.
You should change the line
char* s = (char*)malloc(sizeof(char) * strlen(pom));
to
char* s = (char*)malloc( sizeof(char) * (strlen(pom)+1) );
and change the line
lines[i] = (char*)malloc(sizeof(char) * strlen(pom));
to
lines[i] = (char*)malloc( sizeof(char) * (strlen(pom)+1) );
Also, the line
char** lines = (char**)malloc(sizeof(char*));
is wrong, as it only allocates enough memory for a single pointer. You need one pointer per line. Unfortunately, you don't know in advance how many lines there will be, so you also don't know how many pointers you will need. However, you can resize the buffer as required, by using the function realloc.
Although it is unrelated to your problem, it is worth noting that the function gets has been removed from the ISO C standard and should no longer be used. It is recommended to use fgets instead. See this question for further information: Why is the gets function so dangerous that it should not be used?
Also, in C, it is not necessary to cast the return value of malloc. This is only necessary in C++. See this question for further information: Do I cast the result of malloc?
In your code, you first increment i using the statement i++; and then you reconstruct the previous value of i by subtracting 1:
while (strlen(lines[i - 1]) != 0);
This is unnecessarily cumbersome. It would be better to write
while (strlen(lines[i++]) != 0);
and remove the line i++;. That way, you no longer have to subtract by 1.
I am pretty new to C and memory allocation in general. Basically what I am trying to do is copy the contents of an input file of unknown size and reverse it's contents using recursion. I feel that I am very close, but I keep getting a segmentation fault when I try to put in the contents of what I presume to be the reversed contents of the file (I presume because I think I am doing it right....)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int recursive_back(char **lines, int lineNumber, FILE *input) {
char *input_line = malloc(sizeof(char) * 1000);
lines = realloc(lines, (lineNumber) * 1000 * sizeof(char));
if(fgets(input_line, 201, input) == NULL) {
*(lines + lineNumber) = input_line;
return 1;
}
else {
printf("%d\n", lineNumber);
return (1+recursive_back(lines, ++lineNumber, input));
}
}
void backward (FILE *input, FILE *output, int debugflag ) {
int i;
char **lines; //store lines in here
lines = malloc(1000 * sizeof(char *) ); //1000 lines
if(lines == NULL) { //if malloc failed
fprintf(stderr, "malloc of lines failed\n");
exit(1);
}
int finalLineCount, lineCount;
finalLineCount = recursive_back(lines, 0, input);
printf("test %d\n", finalLineCount);
for(i = finalLineCount; i > 0; i--) {
fputs(*(lines+i), output); //segfault here
}
}
I am using a simple input file to test the code. My input file is 6 lines long that says "This is a test input file". The actual input files are being opened in another function and passed over to the backward function. I have verified that the other functions in my program work since I have been playing around with different options. These two functions are the only functions that I am having trouble with. What am I doing wrong?
Your problem is here:
lines = realloc(lines, (lineNumber) * 1000 * sizeof(char));
exactly as #ooga said. There are at least three separate things wrong with it:
You are reallocating the memory block pointed to by recursive_back()'s local variable lines, and storing the new address (supposing that the reallocation succeeds) back into that local variable. The new location is not necessarily the same as the old, but the only pointer to it is a local variable that goes out of scope at the end of recursive_back(). The caller's corresponding variable is not changed (including when the caller is recursive_back() itself), and therefore can no longer be relied upon to be a valid pointer after recursive_back() returns.
You allocate space using the wrong type. lines has type char **, so the object it points to has type char *, but you are reserving space based on the size of char instead.
You are not reserving enough space, at least on the first call, when lineNumber is zero. On that call, when the space requested is exactly zero bytes, the effect of the realloc() is to free the memory pointed to by lines. On subsequent calls, the space allocated is always one line's worth less than you think you are allocating.
It looks like the realloc() is altogether unnecessary if you can rely on the input to have at most 1000 lines, so you should consider just removing it. If you genuinely do need to be able to reallocate in a way that the caller will see, then the caller needs to pass a pointer to its variable, so that recursive_back() can modify it via that pointer.
I'm fairly new to C; been at it for 3 weeks in a class. I am having a bit of trouble with pointers, and am sure there is probably an easy fix. So basically, this program is supposed to read a word from an input file, store it in an array of pointers with memory allocation, print the word and the normalized form of the word (irrelevant process), and then reallocate the space so that the pointer array will grow as more words are inputted. However, I am having a bit of trouble getting the words to print and the array to reallocate (I currently have it set to a fixed size just to troubleshoot the whole printing aspect). Let me know if there is something wrong with my variable declarations, or if I am just making a stupid mistake please (I am sure it is the probably a combination of the two). Again, I'm very new to C, so I apologize if this is an easy question.
char * word_regular[100];
char * word_norm[100];
int main (int argc, char * argv[])
{
if (argc != 2){
printf("You have not entered a valid number of files.\n");
exit(1);
}
FILE * f_in = fopen(argv[1],"r");
int i = 0;
char word[512];
char norm_word[512];
while(fscanf(f_in, "%s", word) != EOF) {
if (is_valid_entry(word)) {
word_regular[i] = malloc(sizeof(char) * strlen(word) + 1);
strcpy(word_regular[i],word);
printf("%s\n",*word_regular[i]);
word_norm[i] = malloc(sizeof(char) * strlen(norm_word) + 1);
normalize(word, norm_word);
strcpy(word_norm[i],norm_word);
printf("%s\n", *word_norm[i]);
i++;
Some problems that are with your current code (ignoring the dynamic size need as opposed to fixed since you already said you are using that to debug),
printf("%s\n",*word_regular[i]);
%s takes a char * for printing, so it should be
printf("%s\n",word_regular[i]);
For the second printf, since norm_word itself is a char array,
you should simply use
printf("%s\n", &norm_word[i]);
If you want to print string starting from the ith index.
Update:
A quick tip is to pay attention whether you are copying the \0 with strings or not. Because your api calls, such as strlen would go beyond string crashing (or worst silently), unless it is null terminated.
The problem with your printf call is that you pass a char (*word_regular[i], *norm_word[i]) instead of char * (word_regular[i], word_norm[i]) when trying to print a string.
If you want to dynamically grow the array, you need to dynamically allocate it in the first place, so instead of declaring arrays of pointers:
char * word_regular[100];
char * word_norm[100];
You need to declare pointers to pointers:
char ** word_regular;
char ** word_norm;
Allocate an initial buffer for them (in a function, main for example):
word_regular = malloc(sizeof(char *) * INITIAL_AMOUNT);
Then reallocate them as needed.
word_regular = realloc(word_regular, sizeof(char *) * new_amount);
You will need to keep track of the amount of pointers in the arrays, and free them properly of course...
I'm trying to write a stream editor in C and I'm having a hard time dealing with strings. After reading in the lines of a File, I want to store them locally in an array of Strings. However, when I try to store the variable temp into the array of strings StoredEdits I get a segmentation fault (core dumped) error. Furthermore, if I uncomment the char* temp2 variable and save this into my array as a workaround, then the last value read in gets stored for every value in the array.
I assume this has to do with the fact that temp2 is a pointer. I've tried a million things like malloc'ing and free'ing this variable after each iteration, but nothing seems to work.
Any help would be greatly appreciated.
#define MAX_SIZE 100
typedef char String[MAX_SIZE];
int main(int argc, char* argv[]){
char** StoredEdits;
int index, numOfEdits;
FILE *EditFile;
char* temp;
//char* temp2;
StoredEdits = (char**)malloc(MAX_INPUT_SIZE*sizeof(String));
/*Check to see that edit file is passed in.*/
if(argc < 2){
printf("ERROR: Edit File not given\n");
return(EXIT_FAILURE);
}
printf("%s\n",argv[1]);
if( (EditFile = fopen(argv[1],"r")) != NULL ){
printf("file opened\n");
numOfEdits = 0;
while(fgets(temp, MAX_STRING_SIZE, EditFile) != NULL){
printf("%d %s",numOfEdits,temp);
//temp2 = temp;
StoredEdits[numOfEdits++] = temp;
//StoredEdits[numOfEdits++] = temp;
printf("Stored successfully\n");
}
..........
printf("%d\n",numOfEdits);
for(index=0;index<numOfEdits;index++){
printf("%d %s\n",index, StoredEdits[index]);
}
You need to initialize temp to point to valid storage.
temp = malloc(MAX_STRING_SIZE+1);
It looks like you may have intended to do something like this:
String temp;
using your macro. This would be better as a regular char array. And the common name for this is buffer.
char buffer[MAX_STRING_SIZE+1];
Then, you should store in your array, not temp itself, but a new string containing a copy of the contents. There is a POSIX function strdup that should be helpful here. Note, strdup is not part of the C standard, but it is available in most hosted implementations. Historically, it comes from the BSD branch.
StoredEdits[numOfEdits++] = strdup(temp);
Let me backpedal a little and say that if you're allocating new storage for temp inside the loop, then you should skip the strdup because, as Jim Balter says, this will leak memory. If you allocate temp outside of the loop, then it makes little difference whether you allocate it statically (by declaring a char []) or dynamically (with malloc).
By the way, this line will not buy you much:
typedef char String[MAX_SIZE];
For why, see the classic Kernighan (the K in K&R) essay Why Pascal is not my favorite Programming Language.
Also note, that my examples above do not check the pointer returned by malloc. malloc can fail. When malloc fails it will return a NULL pointer. If you try to store data through this pointer, Kaboom!
You're right about your problem being because of pointer semantics. You should use copy the contents of the string from temp.
char *cpy = malloc(1 + strlen(temp));
if (cpy)
strcpy(cpy, temp);
//else handle error
StoredEdits[numOfEdits++] = cpy;
Others answered the reason for the error.
But from the program, i see that you tried to allocate a character double array. then you store each line read from the file into the array.
StoredEdits = (char**)malloc(MAX_INPUT_SIZE*sizeof(String));
if my assumption is right, then you should pass the array into strcpy like the below.
strcpy(StoredEdits[numOfEdits],tmp);
when you have a file where each line varies in size, it is better to go array of pointers points to character array.
I'm trying to pass a string to chdir(). But I always seem to have some trailing stuff makes the chdir() fail.
#define IN_LEN 128
int main(int argc, char** argv) {
int counter;
char command[IN_LEN];
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
size_t path_len; char path[IN_LEN];
...
fgets(command, IN_LEN, stdin)
counter = 0;
tmp = strtok(command, delim);
while(tmp != NULL) {
*(tokens+counter) = tmp;
tmp = strtok(NULL, delim);
counter++;
}
if(strncmp(*tokens, cd_command, strlen(cd_command)) == 0) {
path_len = strlen(*(tokens+1));
strncpy(path, *(tokens+1), path_len-1);
// this is where I try to remove the trailing junk...
// but it doesn't work on a second system
if(chdir(path) < 0) {
error_string = strerror(errno);
fprintf(stderr, "path: %s\n%s\n", path, error_string);
}
// just to check if the chdir worked
char buffer[1000];
printf("%s\n", getcwd(buffer, 1000));
}
return 0;
}
There must be a better way to do this. Can any help out? I'vr tried to use scanf but when the program calls scanf, it just hangs.
Thanks
It looks like you've forgotten to append a null '\0' to path string after calling strncpy(). Without the null terminator chdir() doesn't know where the string ends and it just keeps looking until it finds one. This would make it appear like there are extra characters at the end of your path.
You have (at least) 2 problems in your example.
The first one (which is causing the immediately obvious problems) is the use of strncpy() which doesn't necessarily place a '\0' terminator at the end of the buffer it copies into. In your case there's no need to use strncpy() (which I consider dangerous for exactly the reason you ran into). Your tokens will be '\0' terminated by strtok(), and they are guaranteed to be smaller than the path buffer (since the tokens come from a buffer that's the same size as the path buffer). Just use strcpy(), or if you want the code to be resiliant of someone coming along later and mucking with the buffer sizes use something like the non-standard strlcpy().
As a rule of thumb don't use strncpy().
Another problem with your code is that the tokens allocation isn't right.
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
will allocate an area as large as your input string buffer, but you're storing pointers to strings in that allocation, not chars. You'll have fewer tokens than characters (by definition), but each token pointer is probably 4 times larger than a character (depending on the platform's pointer size). If your string has enough tokens, you'll overrun this buffer.
For example, assume IN_LEN is 14 and the input string is "a b c d e f g". If you use spaces as the delimiter, there will be 7 tokens, which will require a pointer array with 28 bytes. Quite a few more than the 14 allocated by the malloc() call.
A simple change to:
char** tokens = (char**) malloc((sizeof(char*) * IN_LEN) / 2);
should allocate enough space (is there an off-by-one error in there? Maybe a +1 is needed).
A third problem is that you potentially access *tokens and *(tokens+1) even if zero or only one token was added to the array. You'll need to add some checks of the counter variable before dereferencing those pointers.