I'm trying to read a file into an array of strings using getline() and realloc(). I have used very similar code in the past for tokenizing strings and everything worked correctly. I'll consider the sample input file as
1
2
3
4
5
Here's the code:
char** read_input(const char* input_file, char* line){
FILE *fp;
size_t len = 0;
size_t nums = 0;
ssize_t read;
char** res = NULL;
if ((fp = fopen(input_file, "r")) == NULL){
printf("Incorrect file\n", strerror(errno));
exit(EXIT_FAILURE);
}
while ((read = getline(&line, &len, fp)) != -1){
if ((res = realloc(res, sizeof(char*) * ++nums)) == NULL)
exit(EXIT_FAILURE);
char to_strip[sizeof(read) * sizeof(char)];
strcpy(to_strip, line);
if (line[read - 1] == '\n')
to_strip[read - 1] = 0;
else
line[read] = 0;
res[nums - 1] = to_strip;
}
free(line);
if ((res = realloc(res, sizeof(char*) * (nums + 1))) == NULL)
exit(EXIT_FAILURE);
res[nums - 1] = 0;
return res;
}
After the loop, if I print the contents of the array, I get:
5
5
5
5
5
despite the fact that if I call print inside the loop, after each assignment to res, I get the right numbers. This is really stumping me, because I can't see what could be wrong except with realloc, but I thought that realloc preserved array contents. Thanks.
You're invoking undefined behaviour because you store a pointer to to_strip in the reallocated array each time. The array to_strip goes out of scope at the end of each iteration of the loop, and its storage is reused and overwritten on the next iteration, which is why you see the same value at the end. If you printed all the values in the loop rather than just the current value, you'd see first 1, then 2 2, then 3 3 3, then 4 4 4 4 and finally 5 5 5 5 5. If you did enough work after returning from this function before printing the results, you'd see garbage, as the space would be used for other purposes.
You need to make copies of the lines you store in your reallocated array. The simplest is to use strdup():
res[nums - 1] = strdup(to_strip);
Don't forget to release the strings as well as the array of pointers to the strings.
Don't forget to close the file you opened before you return.
It seems odd to pass line into the function. It must either be a null pointer or pointing to space that can be passed to realloc(). Since you then free() the space before returning, the calling function needs to know that you've released the space it passed you — and since you didn't know how big it was, you told getline() it was of size zero, so it had to be released. The interface would be cleaner without that parameter; use a local char *line = 0; at the start of the function.
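Putting those points together, a minimal sketch might look like the following. It assumes a POSIX system (for getline() and strdup()). The name read_lines and the choice to take an already-open FILE * are illustrative assumptions, not the asker's interface.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: read every line of fp into a NULL-terminated array.
 * Each element is a separately allocated copy, so the caller must free
 * every string and then the array itself. If an allocation fails part-way,
 * the lines read so far are returned. */
char **read_lines(FILE *fp)
{
    char *line = NULL;                  /* buffer owned and resized by getline() */
    size_t cap = 0;
    ssize_t len;
    size_t count = 0;
    char **res = malloc(sizeof *res);   /* room for the NULL terminator */

    if (res == NULL)
        return NULL;
    res[0] = NULL;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';       /* strip the newline in place */

        /* One slot for the new string, one for the terminator. */
        char **tmp = realloc(res, (count + 2) * sizeof *res);
        if (tmp == NULL)
            break;
        res = tmp;

        /* Store a private copy; line itself is reused on the next pass. */
        if ((res[count] = strdup(line)) == NULL)
            break;
        res[++count] = NULL;
    }
    free(line);
    return res;
}
```

The caller walks the array until it reaches the NULL entry, frees each string, then frees the array, and closes the file itself.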
This is a problem:
ssize_t read;
char to_strip[sizeof(read) * sizeof(char)];
strcpy(to_strip, line);
sizeof(read) - probably 4 or 8 - is a strange amount to allocate for a buffer that you are copying a string into. I think you meant char to_strip[ read + 1 ];. However, later on you have the line:
res[nums - 1] = to_strip;
which places a pointer to to_strip into res. However, to_strip ceases to exist at the end of each iteration of the while loop, so these will be wild pointers. If your intent is to store all of the text read from the file for later access then you will need to allocate memory for each line.
Jonathan Leffler's suggestion of strdup is probably the simplest solution to this; thanks to him for clearing up my erroneous use of getline.
You could also do away with to_strip entirely, as you could just overwrite the \n directly in line.
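As a rough illustration of stripping the newline in place, here is a small sketch. The helper name strip_newline is made up for this example, and it uses strcspn() so that it works even when the line length is not at hand.

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\n' (if any), in place. */
void strip_newline(char *s)
{
    /* strcspn() returns the index of the first '\n', or the string length
     * if there is none, so this write is always inside the string. */
    s[strcspn(s, "\n")] = '\0';
}
```

The stripped line still has to be copied (for example with strdup()) before its address is stored in the array, because getline() reuses the same buffer for the next line.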
Related
I found this piece of code at Reading a file character by character in C and it compiles and is what I wish to use. My problem is that I cannot get the call to it working properly. The code is as follows:
char *readFile(char *fileName)
{
FILE *file = fopen(fileName, "r");
char *code;
size_t n = 0;
int c;
if (file == NULL)
return NULL; //could not open file
code = malloc(1500);
while ((c = fgetc(file)) != EOF)
{
code[n++] = (char) c;
}
code[n] = '\0';
return code;
}
I am not sure of how to call it. Currently I am using the following code to call it:
.....
char * rly1f[1500];
char * RLY1F; // This is the Input File Name
rly1f[0] = readFile(RLY1F);
if (rly1f[0] == NULL) {
printf ("NULL array); exit;
}
int n = 0;
while (n++ < 1000) {
printf ("%c", rly1f[n]);
}
.....
How do I call the readFile function such that I have an array (rly1f) which is not NULL? The file RLY1F exists and has data in it. I have successfully opened it previously using 'in line code' not a function.
Thanks
The error you're experiencing is that you forgot to pass a valid filename. So either the program crashes, or fopen tries to open a garbage name and returns NULL:
char * RLY1F; // This is not initialized!
RLY1F = "my_file.txt"; // initialize it!
The next problem you'll have will be in your loop to print the characters.
You have defined an array of pointers char * rly1f[1500];
You read 1 file and store it in the first pointer of the array rly1f[0]
But when you display it you display the pointer values as characters which is not what you want. You should just do:
while (n < 1000) {
printf ("%c", rly1f[0][n]);
n++;
}
note: that would not crash but would print trash if the file read is shorter than 1000.
(BLUEPIXY pointed out that n must be incremented after the printf, not in the loop condition; otherwise the first character is skipped.)
So do it more simply since your string is nul-terminated, pass the array to puts:
puts(rly1f[0]);
EDIT: you have a problem when reading your file too. You malloc 1500 bytes, but you read the file fully. If the file is bigger than 1500 bytes, you get buffer overflow.
You have to compute the length of the file before allocating the memory. For instance like this (using stat would be a better alternative maybe):
char *readFile(char *fileName, unsigned int *size) {
...
fseek(file,0,SEEK_END); // set pos to end of file
*size = ftell(file); // get pos, i.e. size
rewind(file); // set pos to 0
code = malloc(*size+1); // allocate the proper size plus one
notice the extra parameter which allows you to return the size as well as the file data.
Note: on Windows systems, text files use \r\n (CRLF) to delimit lines, so the allocated size will be higher than the number of characters read if you use text mode (\r\n is converted to \n, so there are fewer chars in your buffer; you could consider a realloc once you know the exact size to shave off the unused allocated space).
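A fuller version of that idea could be sketched as below. The name read_stream, the choice to pass an open FILE * instead of a file name, and the use of size_t for the size are assumptions made for this example. It uses fread() rather than the original fgetc() loop.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of readFile() that takes an open stream.
 * Returns a NUL-terminated buffer the caller must free(), or NULL on error.
 * *size receives the number of bytes actually read. */
char *read_stream(FILE *file, size_t *size)
{
    long end;
    char *code;

    /* Seek to the end to learn the size, then go back to the start. */
    if (fseek(file, 0, SEEK_END) != 0 || (end = ftell(file)) < 0)
        return NULL;
    rewind(file);

    code = malloc((size_t)end + 1);     /* plus one for the terminator */
    if (code == NULL)
        return NULL;

    /* In text mode on Windows, fread() may return fewer bytes than end. */
    *size = fread(code, 1, (size_t)end, file);
    code[*size] = '\0';
    return code;
}
```

Because the terminator is placed at the count fread() reports, the buffer stays a valid string even when fewer bytes are read than ftell() suggested.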
Every time I try to allocate memory for array of strings, my program fails with an error message "error reading characters of strings".
What am I doing wrong? Is there any problem with my memory allocation?
char** file2array(char *filename)
{
FILE *fp = fopen(filename, "r");
int i;
int max_line_len = 20;
int num_lines = 6;
char** lines = (char **) malloc(sizeof(char*) * (num_lines + 1));
fseek(fp, 0, SEEK_END);
int chars_num = ftell(fp);
rewind(fp);
for (i = 1; i < (num_lines + 1); i++) {
lines[i] = (char *) malloc(max_line_len + 1);
fgets(lines[i], chars_num, fp);
}
fclose(fp);
return lines;
}
You appear to have not presented the code that actually emits the error message, so it's hard to be certain what it's complaining about. You certainly do, however, have a problem with your loop indexing:
for (i = 1; i < (num_lines + 1); i++) {
lines[i] = /* ... */
Variable i runs from 1 through num_lines, never taking the value 0, thus the first pointer in your dynamic array of pointers (at index 0) is never initialized. That's a plausible explanation for why some later code might have a problem reading the characters.
Additionally, you use fgets() wrongly. You tell it that your buffers are chars_num bytes long, but they are actually max_line_len + 1 bytes long. This presents the possibility of a buffer overrun.
Indeed, I'm uncertain what the point of chars_num is supposed to be. You initialize it to the length of the file (provided that that length fits in an int), but I'm not seeing how that's useful.
Furthermore, your function may return unexpected results if the file being read has lines longer than you account for (20 bytes, including line terminator) or a different number of lines than you expect (6).
I don't see any inherent problem with your memory allocation, but it wouldn't be StackOverflow if I didn't give you a severe tongue-lashing for casting the return value of malloc(). Consider yourself flogged.
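For illustration, the two fixes (indexing from 0 and giving fgets() the real buffer size) could be combined along these lines. The function name read_fixed_lines and its parameters are hypothetical, and it takes an open stream so the caller keeps control of fopen() and fclose().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical rewrite: read up to num_lines lines of at most max_line_len
 * characters each. The array has one extra slot and unused slots are NULL,
 * so the caller can stop at the first NULL entry. */
char **read_fixed_lines(FILE *fp, int num_lines, int max_line_len)
{
    char **lines = malloc(((size_t)num_lines + 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;
    for (int i = 0; i <= num_lines; i++)
        lines[i] = NULL;

    for (int i = 0; i < num_lines; i++) {      /* start at index 0 */
        lines[i] = malloc((size_t)max_line_len + 1);
        if (lines[i] == NULL)
            break;
        /* Pass the real buffer size so fgets() cannot overrun it. */
        if (fgets(lines[i], max_line_len + 1, fp) == NULL) {
            free(lines[i]);                    /* end of file or read error */
            lines[i] = NULL;
            break;
        }
    }
    return lines;
}
```

Lines longer than max_line_len are still split across several entries, which is the limitation mentioned above.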
I am trying to deconstruct a document into its respective paragraphs, and input each paragraph, as a string, into an array. However, each time a new value is added, it overwrites all previous values in the array. The last "paragraph" read (as denoted by newline) is the value of each non-null value of the array.
Here is the code:
char buffer[MAX_SIZE];
char **paragraphs = (char**)malloc(MAX_SIZE * sizeof(char*));
int pp = 0;
int i;
FILE *doc;
doc = fopen(argv[1], "r+");
assert(doc);
while((i = fgets(buffer, sizeof(buffer), doc) != NULL)) {
if(strncmp(buffer, "\n", sizeof(buffer))) {
paragraphs[pp++] = (char*)buffer;
}
}
printf("pp: %d\n", pp);
for(i = 0; i < MAX_SIZE && paragraphs[i] != NULL; i++) {
printf("paragraphs[%d]: %s", i, paragraphs[i]);
}
The output I receive is:
pp: 4
paragraphs[0]: paragraph four
paragraphs[1]: paragraph four
paragraphs[2]: paragraph four
paragraphs[3]: paragraph four
when the program is run as follows: ./prog.out doc.txt, where doc.txt is:
paragraph one
paragraph two
paragraph three
paragraph four
The behavior of the program is otherwise desired. The paragraph count works properly, ignoring the line that contains ONLY the newline character (line 4).
I assume the problem occurs in the while loop, however am unsure how to remedy the problem.
Your solution is pretty sound. Your paragraphs array is supposed to hold each paragraph, and since each element is just a small pointer (typically 4 or 8 bytes), you can afford to define a reasonable max number of them. However, since this max number is a constant, there is little point in allocating the array dynamically.
The only meaningful use of dynamic allocation would be to read the whole text once to count the actual number of paragraphs, allocate the array accordingly and re-read the whole file a second time, but I doubt this is worth the effort.
The downside of using fixed-size paragraph array is that you must stop filling it once you reach the maximal number of elements.
You can then re-allocate a bigger array if you absolutely want to be able to process the whole Bible, but for an educational exercise I think it's reasonable to just stop recording paragraphs (thus producing a code that can store and count paragraphs up to a maximal number).
The real trouble with your code is, you don't store the paragraph contents anywhere. When you read the actual lines, it's always inside the same buffer, so each paragraph will point to the same string, which will contain the last paragraph read.
The solution is to make a unique copy of the buffer and have the current paragraph point to that.
C being already messy enough as it is, I suggest using the strdup() function, which duplicates a string (basically computing string length, allocating sufficient memory, copying the string into it and returning the new block of memory holding the new copy). You just need to remember to free this new copy once you're done using it (in your case at the end of your program).
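To make that description concrete, here is a rough stand-in for what strdup() does. The name dup_string is invented for this sketch; real code would normally just call strdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in showing roughly what strdup() does. */
char *dup_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* length plus the terminating '\0' */
    char *copy = malloc(size);       /* allocate exactly enough memory */
    if (copy != NULL)
        memcpy(copy, src, size);     /* copy the text, terminator included */
    return copy;                     /* caller must free() this */
}
```

Because the copy lives in its own block, overwriting the original buffer with the next line no longer changes the paragraphs already stored.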
This is not the most time-efficient solution, since each string will require a strlen and a malloc performed internally by strdup while you could have pre-allocated a big buffer for all paragraphs, but it is certainly simpler and probably more memory-efficient (only the minimal amount of memory will be allocated for each string, though each malloc consumes a few extra bytes for internal allocator housekeeping).
The bloody awkward fgets also stores the trailing \n at the end of the line, so you'll probably want to get rid of that.
Your last display loop would be simpler, more robust and more efficient if you simply used pp as a limit, instead of checking uninitialized paragraphs.
Lastly, you'd better define two different constants for max line size and max number of paragraphs. Using the same value for both makes little sense, unless you're processing perfectly square texts :).
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_SIZE 82 // max nr of characters in a line (including trailing \n and \0)
#define MAX_PARAGRAPHS 100 // max number of paragraphs in a file
int main (int argc, char *argv[])
{
    char buffer[MAX_LINE_SIZE];
    char * paragraphs[MAX_PARAGRAPHS];
    int pp = 0;
    int i;
    FILE *doc;
    assert(argc > 1); // the file name is expected as the first argument
    doc = fopen(argv[1], "r+");
    assert(doc != NULL);
    while((fgets(buffer, sizeof(buffer), doc) != NULL)) {
        if (pp != MAX_PARAGRAPHS // make sure we don't overflow our paragraphs array
            && strcmp(buffer, "\n")) {
            // fgets awkwardly collects the ending \n, so get rid of it
            if (buffer[strlen(buffer)-1] == '\n') buffer[strlen(buffer)-1] = '\0';
            // current paragraph references a unique copy of the actual text
            paragraphs[pp++] = strdup (buffer);
        }
    }
    fclose(doc);
    printf("pp: %d\n", pp);
    for(i = 0; i != pp; i++) {
        // the \n was stripped above, so print one explicitly
        printf("paragraphs[%d]: %s\n", i, paragraphs[i]);
        free(paragraphs[i]); // release memory allocated by strdup
    }
    return 0;
}
What is the proper way to allocate the necessary memory? Is the malloc on line 2 not enough?
No. That malloc only allocates the array of pointers; you also need to allocate memory for each string those pointers will refer to. The following on its own will not work as coded.
char **paragraphs = (char**)malloc(MAX_SIZE * sizeof(char*));
If you have: (for a simple explanation)
char **array = {0}; //array of C strings, before memory is allocated
Then you can create memory for it like this:
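#include <stdlib.h>
#define MAX_SIZE 100 // assumed value for illustration; use the MAX_SIZE from your own code
char ** allocMemory(char ** a, int numStrings, int maxStrLen);
void freeMemory(char ** a, int numStrings);
int main(void)
{
    int numStrings = 10;// for example, change as necessary
    int maxLen = MAX_SIZE; //for example, change as necessary
    char **array = {0};
    array = allocMemory(array, numStrings, maxLen);
    if(array == NULL) return 1;
    //use the array, then free it
    freeMemory(array, numStrings);
    return 0;
}
char ** allocMemory(char ** a, int numStrings, int maxStrLen)
{
    int i;
    a = calloc(numStrings+1, sizeof(char*)); // element count first, then element size
    if(a == NULL) return NULL;
    for(i=0;i<numStrings; i++)
    {
        a[i] = calloc(maxStrLen + 1, sizeof(char)); // +1 for the terminating '\0'
    }
    return a;
}
void freeMemory(char ** a, int numStrings)
{
    int i;
    for(i=0;i<numStrings; i++)
        free(a[i]); // free(NULL) is a harmless no-op
    free(a);
}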
Note: you can determine the number of lines in a file in several ways. One way, for example, is to open it with FILE *fp = fopen(filepath, "r");, then call ret = fgets(lineBuf, lineLen, fp) in a loop until ret == NULL (fgets() returns NULL, not EOF, at end of file), incrementing a counter on each pass, and finally call fclose() (which you did not do either). This necessary step is not included in the code example above, but you can add it if that is the approach you want to use.
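The counting step described in that note might be sketched like this. The helper name count_lines and the 256-byte chunk size are assumptions. It takes an already-open stream and rewinds it instead of closing it, so the same stream can be read again.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in fp, then rewind it. */
size_t count_lines(FILE *fp)
{
    char buf[256];
    size_t count = 0;
    int at_line_start = 1;   /* true when the next chunk begins a new line */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (at_line_start)
            count++;
        /* A chunk without '\n' means the line continues in the next chunk. */
        at_line_start = (strchr(buf, '\n') != NULL);
    }
    rewind(fp);
    return count;
}
```

Tracking whether a chunk ended with a newline keeps a line longer than the buffer from being counted more than once.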
Once you have memory allocated, change the following in your code:
paragraphs[pp++] = (char*)buffer;
To:
strcpy(paragraphs[pp++], buffer);//no need to cast buffer, it is already char *
Also, do not forget to call fclose() when you are finished with the open file.
My goal is to read a file, and save each element in this file into a new array.
rewind(fp); ii = 0; while (!feof(fp)) {
ii ++;
fscanf(fp, "%s\n", filename_i);
fp_i = fopen(filename_i, "r");
if (fp_i == NULL) {
fprintf(stderr, "can't open input file %s \n", filename_i);
exit(1);
}
filename_ii[ii] = filename_i;
printf("%s, %d\n", filename_ii[ii],ii);
fclose(fp_i);
}
printf("a %s %d\n",filename_ii[9],DataSize[2]);
printf("a %s %d\n",filename_ii[1],DataSize[2]);
Inside the while loop, each element is printed correctly, but I don't know why the last two printf() calls return the same result, i.e., it seems like both filename_ii[1] and filename_ii[9] point to the last element in the file. Does anyone have ideas about what's wrong in my code? Thank you~
You need to use strcpy to copy a string. Change:
filename_ii[ii] = filename_i; // this just assigns a pointer -
// it doesn't actually copy a string
to:
strcpy(filename_ii[ii], filename_i); // copy the *contents* of `filename_i`
// to `filename_ii[ii]`
This assumes of course that the filename_ii array has been correctly initialised and is not just an array of dangling char * pointers (not possible to tell from the code as currently posted in the question).
Note that if filename_ii is just an array of uninitialised char * pointers then you can use strdup to handle the memory allocation and copying all in one convenient function call. In which case you would change the line above to:
filename_ii[ii] = strdup(filename_i); // allocate memory to `filename_ii[ii]` and
// copy the *contents* of `filename_i`
// to `filename_ii[ii]`
Stop using feof()/fscanf() like that, it's super-brittle and needlessly hard to get right.
Instead:
char line[1024]; /* or whatever makes you feel comfortable */
while(fgets(line, sizeof line, fp) != NULL)
{
size_t len = strlen(line);
if(len == 1) /* Ignore blank lines. */
continue;
if(line[len - 1] == '\n')
line[--len] = '\0'; /* Remove linefeed. */
if(access(line, R_OK) == 0)
strcpy(filename_ii[ii++], line);
}
This:
Uses fgets() to read in a whole line.
Uses access() to check if the file can be opened. Note that this kind of checking is always prone to race conditions.
Uses strcpy() to copy the filename, assuming filename_ii[] is a properly set up array.
I'm trying to open a file with fopen, but I don't want a static location so I am getting the string in from the user when he/she runs the program.
However if a user does not enter one a default file is specified.
Can I just put the malloc var in to the fopen path parameter?
char *file_path_mem = malloc(sizeof(char));
if (file_path_mem != NULL) //Null if out of memory
{
printf("Enter path to file, if in current directory then specify name\n");
printf("File(default: marks.txt): ");
while ((c = (char)getchar()) != '\n')
{
file_path_mem[i++] = c;
file_path_mem = realloc(file_path_mem, i+1 * sizeof(char));
}
file_path_mem[i] = '\0';
if (i == 0 && c == '\n')
{
file_path_mem = realloc(file_path_mem, 10 * sizeof(char);
file_path_mem = "marks.txt";
}
}
else
{
printf("Error: Your system is out of memory, please correct this");
return 0;
}
if (i==0)
{
FILE *marks_file = fopen("marks.txt", "r");
}
else
{
FILE *marks_file = fopen(file_path_mem, "r");
}
free(file_path_mem);
As you might have guessed I am a C novice, so if I have done something horribly wrong, then sorry.
This is not doing what you think it is:
file_path_mem = realloc(file_path_mem, 10 * sizeof(char);
file_path_mem = "marks.txt";
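The second line does not copy anything: it makes file_path_mem point at the string literal, which leaks the block you just reallocated, and the later free(file_path_mem) is then called on memory that did not come from malloc, which is undefined behaviour. What you want to do is change that second line to copy the default name into the allocated buffer: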
strcpy(file_path_mem, "marks.txt");
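As a sketch of the whole step (resize, then copy), something like the following could be used. The helper name set_default_path is hypothetical, and it sizes the block from the default string instead of hard-coding 10.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make *buf hold a copy of def, resizing it as needed.
 * Returns 0 on success, -1 if realloc() fails (*buf is then left untouched). */
int set_default_path(char **buf, const char *def)
{
    char *tmp = realloc(*buf, strlen(def) + 1);   /* +1 for the '\0' */
    if (tmp == NULL)
        return -1;
    strcpy(tmp, def);    /* copy the characters; do not point at the literal */
    *buf = tmp;
    return 0;
}
```

After a successful call, *buf is still heap memory, so the free() at the end of the original code remains valid.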
You CAN pass the char* returned from malloc straight into fopen. Just make sure it contains valid data, and make sure you strcpy the new string in, as R Samuel points out.
Btw, your use of realloc will give pretty awful performance. realloc works by seeing if there is enough space to grow the block in place (this is not guaranteed; realloc must only return a block of the new size containing the old data). If there isn't enough space, it allocates a new block of the new size and copies the old data to the new block. Obviously this is suboptimal.
You are better off allocating a block of, say, 16 chars and then, if you need more than 16, realloc'ing a further 16 chars. There will be a bit of memory wastage but you are not, potentially, copying anything like as much memory around.
Edit: Further to that, consider a string that contains 7 characters. Using your realloc-every-byte scheme you will, in the worst case, generate the following operations.
alloc 1 byte
alloc 2 bytes
copy 1 byte
free 1 byte
alloc 3 bytes
copy 2 bytes
free 2 bytes
alloc 4 bytes
copy 3 bytes
free 3 bytes
alloc 5 bytes
copy 4 bytes
free 4 bytes
alloc 6 bytes
copy 5 bytes
free 5 bytes
alloc 7 bytes
copy 6 bytes
free 6 bytes
alloc 8 bytes
copy 7 bytes
This is 8 allocations, 7 frees and a copy of 28 bytes. Not to mention the 8 bytes you assign to the array (7 characters + 1 null terminator).
Allocations are SLOW. Frees are SLOW. Copying is much slower than not copying.
For this example using my allocate 16 bytes at a time system you alloc once, assign 8 bytes. No copies and the only free is when you've finished with it. You DO waste 8 bytes though. But ... well ... 8 bytes is nothing in the grand scheme of things...
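The arithmetic above can be checked with a small worst-case model. Both function names are made up for this sketch, and it assumes every growth moves the block, which a real realloc() often avoids.

```c
#include <stddef.h>

/* Worst case for growing one byte at a time: every realloc() moves the block,
 * so growing to hold chars characters plus a terminator copies
 * 1 + 2 + ... + chars bytes in total. */
size_t bytes_copied_growing_by_one(size_t chars)
{
    size_t copied = 0;
    for (size_t old_size = 1; old_size <= chars; old_size++)
        copied += old_size;          /* old block copied into the new one */
    return copied;
}

/* Growing in fixed chunks (chunk must be > 0): a move can only happen
 * when the current capacity is too small for chars + 1 bytes. */
size_t bytes_copied_growing_by_chunk(size_t chars, size_t chunk)
{
    size_t copied = 0;
    for (size_t cap = chunk; cap < chars + 1; cap += chunk)
        copied += cap;               /* full old block copied on each growth */
    return copied;
}
```

For 7 characters the first function returns 28, matching the count above, while the second returns 0 with 16-byte chunks.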
realloc is relatively more expensive than the rest of your loop, so you might as well just start with a 128 byte buffer, and realloc another 128 bytes if you fill that up.
I suggest some of the following changes:
define your default location at the top of the file
const char* defaultLocation = "myfile.txt";
char* locationToUse;
use constants instead of hard coding numbers in your code (like the 10 you have in there)
int DEFAULT_INPUT_BUFFER_SIZE = 128;
char* userInputBuffer = malloc(sizeof(char) * DEFAULT_INPUT_BUFFER_SIZE);
int bufferFillIndex = 0;
only reallocate infrequently, and not at all if possible (size 128 buffer)
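while ((c = getchar()) != '\n' && c != EOF) /* c should be an int so EOF can be detected */
{
    userInputBuffer[bufferFillIndex++] = (char)c;
    if (bufferFillIndex % DEFAULT_INPUT_BUFFER_SIZE == 0)
    {
        /* keep realloc's result: the block may have moved */
        char *tmp = realloc(userInputBuffer, (bufferFillIndex + DEFAULT_INPUT_BUFFER_SIZE) * sizeof(char));
        if (tmp == NULL) { free(userInputBuffer); userInputBuffer = NULL; break; }
        userInputBuffer = tmp;
    }
}
if (userInputBuffer != NULL)
    userInputBuffer[bufferFillIndex] = '\0'; /* always in bounds: the buffer grows before it fills */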
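This should not be an issue here, because sizeof(char) is 1, but i+1 * sizeof(char) is parsed as i + (1 * sizeof(char)). For reallocation you should state the new size with parentheses, like this: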
file_path_mem = realloc(file_path_mem, (i+1) * sizeof(char));
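To see why the parentheses matter for element types larger than char, here is a tiny sketch. The two function names are hypothetical and simply spell out the two expressions.

```c
#include <stddef.h>

/* What the unparenthesised expression computes: i + (1 * elem_size). */
size_t size_without_parens(size_t i, size_t elem_size)
{
    return i + 1 * elem_size;
}

/* What was intended: (i + 1) elements of elem_size bytes each. */
size_t size_with_parens(size_t i, size_t elem_size)
{
    return (i + 1) * elem_size;
}
```

With an element size of 1 the two agree, but for i = 3 and 4-byte elements the first gives 7 bytes where 16 are needed.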
You could simplify this greatly with getline(), assuming it is available on your system:
printf("Enter path to file, if in current directory then specify name\n");
printf("File(default: marks.txt): ");
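size_t bufSize = 100; /* getline() takes a size_t *, not an int * */
char* fileNameBuf = malloc(bufSize);
ssize_t fileNameLength = getline(&fileNameBuf, &bufSize, stdin);
if(fileNameLength < 0)
{
    /* getline() returns -1 on end of file as well as on error */
    fprintf(stderr, "Error: could not read the file name.\n");
    exit(1);
}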
/* strip line end and trailing whitespace. */
while(fileNameLength && fileNameBuf[fileNameLength - 1] <= ' ')
--fileNameLength;
fileNameBuf[fileNameLength] = 0;
if(!fileNameLength)
{
free(fileNameBuf); /* Nothing entered; use default. */
fileNameBuf = "marks.txt";
}
FILE *marks_file = fopen(fileNameBuf, "r");
if(fileNameLength)
free(fileNameBuf);
I don't know if your C supports in-block declarations, so I'll let you fix that if needed.
EDIT: Here's a getline tutorial.
As a learning experience I would second the suggestions of others to expand your buffer by more than one byte. It won't make any difference if it's not in a loop but calling realloc() on each byte would be unacceptable in other contexts.
One trick that works fairly well is to double the size of the buffer every time it fills up.
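A minimal sketch of that doubling strategy follows. The name read_line_doubling and the initial capacity of 16 are assumptions. It reads from any FILE *, so stdin can be passed for the file-name prompt.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from in, without the trailing newline.
 * Returns a malloc()ed string the caller must free(), or NULL on allocation
 * failure or if end of file is reached before any character is read. */
char *read_line_doubling(FILE *in)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;                               /* int, so EOF can be detected */

    if (buf == NULL)
        return NULL;

    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {            /* keep one byte free for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

An empty string returned for a bare newline is the cue to fall back to the default name. With doubling, a name of n characters triggers only about log2(n/16) reallocations.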
On the other hand, if you really just want to open a file and don't care about writing the world's greatest name loop, then standard procedure is to simplify the code even at the price of a limit on file name length. There is no rule that says you have to accept silly pathnames.
FILE *read_and_open(void)
{
char *s, space[1000];
printf("Enter path to file, if in current directory then specify name\n");
printf("File(default: marks.txt): ");
if (fgets(space, sizeof space, stdin) == NULL)
return fopen("marks.txt", "r"); /* no input at all: fall back to the default */
if((s = strchr(space, '\n')) != NULL)
*s = 0;
if (strlen(space) == 0)
return fopen("marks.txt", "r");
return fopen(space, "r");
}