How to read multiple .txt files into a single buffer?

I am trying to read multiple text files into a single char* array in C. I can get as far as allocating the char* to the correct size (i.e. the sizes of all the text files summed up).
I tried to read each file, one by one, into their own buffer, then concatenate that buffer onto the end of the one that contains them all. This is all done in a for-loop. But when I print it out to make sure it worked, only the last file that was read is printed out.
I have also tried fread, but that seems to overwrite the buffer that it writes to, rather than append to the end of it.
Here is my code, most of it is from another SO thread:
for(int i = 2; i < argc; i++) {
    char *buffer = NULL;
    size_t size = 0;
    /* Get the buffer size */
    fseek(file, 0, SEEK_END); /* Go to end of file */
    size = ftell(file); /* How many bytes did we pass ? */
    /* Set position of stream to the beginning */
    rewind(file);
    /* Allocate the buffer (no need to initialize it with calloc) */
    buffer = malloc((size + 1) * sizeof(*buffer)); /* size + 1 byte for the \0 */
    /* Read the file into the buffer */
    fread(buffer, size, 1, file); /* Read 1 chunk of size bytes from fp into buffer */
    /* NULL-terminate the buffer */
    buffer[size] = '\0';
    allFiles = strcat(allFiles, buffer);
    free(buffer);
    fclose(file);
}
Please help me out, I am stumped by what seems like a simple thing to do in C. Thanks.

It sounds like you're doing everything correctly, but you need to advance the destination pointer before you pass it to fread for the next file; otherwise every file overwrites the beginning of the buffer.
Assuming buf is large enough for all the files plus 1 for the nul byte, and files is an array of NUM_FILES char * filenames, you'll need to do something like this:
char *p = buf;
for (int i = 0; i < NUM_FILES; i++) {
    FILE *f = fopen(files[i], "rb");
    if (f == NULL)
        continue;                            /* or report the error */
    fseek(f, 0, SEEK_END);
    long bytes = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (bytes > 0)
        p += fread(p, 1, (size_t)bytes, f);  /* advance by the bytes actually read */
    fclose(f);
}
*p = '\0';
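Putting the pieces together, here is a minimal sketch of the two-pass approach: first sum the file sizes so one allocation suffices, then read each file at an advancing offset. The names read_files and stream_size, and returning the total length through a pointer, are illustrative assumptions rather than part of the original answer. Files are opened in binary mode so ftell() and fread() agree on byte counts.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: size of an open stream in bytes, or -1 on error. */
static long stream_size(FILE *f)
{
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    long size = ftell(f);
    if (fseek(f, 0, SEEK_SET) != 0) return -1;
    return size;
}

/* Reads the files named in paths[0..count-1] into one malloc'd,
   NUL-terminated buffer. The byte count (without the terminator) is
   stored in *out_len. Returns NULL on any error; the caller frees. */
char *read_files(char *const paths[], int count, size_t *out_len)
{
    size_t total = 0;

    /* Pass 1: add up the sizes so a single allocation is enough. */
    for (int i = 0; i < count; i++) {
        FILE *f = fopen(paths[i], "rb");
        if (f == NULL) return NULL;
        long size = stream_size(f);
        fclose(f);
        if (size < 0) return NULL;
        total += (size_t)size;
    }

    char *buf = malloc(total + 1);               /* +1 for the '\0' */
    if (buf == NULL) return NULL;

    /* Pass 2: read each file directly after the previous one. */
    char *p = buf;
    for (int i = 0; i < count; i++) {
        FILE *f = fopen(paths[i], "rb");
        if (f == NULL) { free(buf); return NULL; }
        size_t room = total - (size_t)(p - buf); /* never read past the end */
        p += fread(p, 1, room, f);               /* advance by bytes actually read */
        fclose(f);
    }
    *p = '\0';
    *out_len = (size_t)(p - buf);
    return buf;
}
```

In the question's loop (files starting at argv[2]) this would be called as read_files(argv + 2, argc - 2, &len). Because each fread() writes at p and p then moves forward, no strcat() is needed.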

Related

Dynamically allocating memory to an array and reading a large text file

I've had a look at some other similar questions and examples but I'm stumped. My goal is to open a very large text file (novel sized), allocate memory to an array, and then store the text into that array so I'm able to do further processing in the future.
This is my current code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINELEN 74
int main(void) {
    FILE *file;
    char filename[] = "large.txt";
    int count = 0, i = 0, len;
    /* Open the file */
    file = fopen(filename, "r");
    if (file == NULL) {
        printf("Cannot open file");
        return -1;
    }
    /* Get size of file for memory allocation */
    fseek(file, 0, SEEK_END);
    long size = ftell(file);
    fseek(file, 0, SEEK_SET);
    /* Allocate memory to the array */
    char *text_array = (char*)malloc(size*sizeof(char));
    /* Store the information into the array */
    while(fgets(&text_array[count], LINELEN, file) != NULL) {
        count++;
    }
    len = sizeof(text_array) / sizeof(text_array[0]);
    while(i<len) {
        /* printf("%s", text_array); */
        i++;
    }
    printf("%s", text_array);
    /* return array */
    return EXIT_SUCCESS;
}
I was expecting to have a large body of text printed from text_array at the bottom. Instead I get a garbled mess of random characters much smaller than the body of text I was hoping for. What am I doing wrong? I suspect it has something to do with my memory allocation but don't know what.
Any help is much appreciated.

There's no need to call fgets() in a loop. You know how big the file is, just read the entire thing into text_array with one call:
fread(text_array, 1, size, file);
However, if you want to treat text_array as a string, you need to add a null terminator. So you should add 1 when calling malloc().
Another problem is len = sizeof(text_array) / sizeof(text_array[0]). text_array is a pointer, not an array, so you can't use sizeof to get the amount of space it uses. But you don't need to do that, since you already have the space in the size variable.
There's no need to print text_array in a loop.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *file;
    char filename[] = "large.txt";
    /* Open the file */
    file = fopen(filename, "r");
    if (file == NULL) {
        fprintf(stderr, "Cannot open file\n");
        return EXIT_FAILURE;
    }
    /* Get size of file for memory allocation */
    fseek(file, 0, SEEK_END);
    long size = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (size < 0) {
        fclose(file);
        return EXIT_FAILURE;
    }
    /* Allocate memory for the text plus the null terminator */
    char *text_array = malloc((size_t)size + 1);
    if (text_array == NULL) {
        fclose(file);
        return EXIT_FAILURE;
    }
    /* Read the whole file in one call; in text mode ("r") fewer bytes than
       size may come back (e.g. on Windows), so terminate at what was read */
    size_t nread = fread(text_array, 1, (size_t)size, file);
    text_array[nread] = '\0';
    fclose(file);
    printf("%s", text_array);
    free(text_array);
    return EXIT_SUCCESS;
}

This part
while(fgets(&text_array[count], LINELEN, file) != NULL) {
count++;
}
is problematic.
If the loop is unrolled, it's "kind of like":
fgets(&text_array[0], LINELEN, file)
fgets(&text_array[1], LINELEN, file)
fgets(&text_array[2], LINELEN, file)
So you only advance the fgets destination buffer by a single char between fgets calls. If fgets reads more than a single character, the second call overwrites data from the first, the third overwrites data from the second, and so on.
You need to advance the buffer by as many characters as fgets actually read, or use another way of reading, e.g. fread.
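A minimal sketch of the fgets() fix described above, assuming the input is text without embedded '\0' bytes (strlen() is used to find how much each call stored). The function name read_lines and its cap parameter are illustrative, not from the original post; the clamp to INT_MAX is there because fgets() takes an int size.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads lines from `in` with fgets() into buf (capacity cap), advancing
   the destination by the number of characters each call stored, so later
   lines are appended instead of overwriting earlier ones. Returns the
   total length stored; buf is NUL-terminated whenever cap >= 1. */
size_t read_lines(FILE *in, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0) return 0;
    buf[0] = '\0';
    /* fgets needs room for at least one character plus the terminator. */
    while (cap - used > 1) {
        size_t room = cap - used;
        int n = room > INT_MAX ? INT_MAX : (int)room;
        if (fgets(buf + used, n, in) == NULL)
            break;                      /* end of file or read error */
        used += strlen(buf + used);     /* advance past what fgets stored */
    }
    return used;
}
```

If the buffer fills up, the last line is simply cut short; for a whole file whose size is known, a single fread() as in the previous answer is simpler.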

Copy a file with buffers of different sizes for read and write

I have been doing some practice problems for job interviews and I came across a function that I can't wrap my mind around. The idea is to create a function that takes the names of two files, the allowed buffer size for reading from file1, and the allowed buffer size for writing to file2. If the buffer sizes are the same, I know how to go through the question, but I am having problems figuring out how to move data between the buffers when the sizes are different. Part of the constraints is that we always have to fill the write buffer before writing it to the file. If the size of file1 is not a multiple of the write buffer size, we pad the last buffer transfer with zeros.
// input: name of two files made for copy, and their limited buffer sizes
// output: number of bytes copied
int fileCopy(char* file1,char* file2, int bufferSize1, int bufferSize2){
    int bytesTransfered=0;
    int bytesMoved=0;
    char* buffer1, *buffer2;
    FILE *fp1, *fp2;
    fp1 = fopen(file1, "r");
    if (fp1 == NULL) {
        printf ("Not able to open this file");
        return -1;
    }
    fp2 = fopen(file2, "w");
    if (fp2 == NULL) {
        printf ("Not able to open this file");
        fclose(fp1);
        return -1;
    }
    buffer1 = (char*) malloc (sizeof(char)*bufferSize1);
    if (buffer1 == NULL) {
        printf ("Memory error");
        return -1;
    }
    buffer2 = (char*) malloc (sizeof(char)*bufferSize2);
    if (buffer2 == NULL) {
        printf ("Memory error");
        return -1;
    }
    bytesMoved=fread(buffer1, sizeof(buffer1),1,fp1);
    //TODO: Fill buffer2 with maximum amount, either when buffer1 <= buffer2 or buffer1 > buffer2
    //How do I iterate through file1 and ensure to always fill buffer 2 before writing?
    bytesTransfered+=fwrite(buffer2, sizeof(buffer2),1,fp2);
    fclose(fp1);
    fclose(fp2);
    return bytesTransfered;
}
How should I write the while loop for the buffer transfers before the fwrites?

I am having problems figuring how to move data between the buffers when the sizes are of different
Lay out a plan. For "some practice problems for job interviews", a good plan and the ability to justify it are important. Coding, although important, is secondary.
given valid: 2 FILE *, 2 buffers and their sizes
while write active && read active
    while write buffer not full && reading active
        if read buffer empty
            read
            update read active
        append min(read buffer length, write buffer available space) of read to write buffer
    if write buffer not empty
        pad write buffer
        write
        update write active
return file status
Now code it. A more robust solution would use a struct to group the FILE*, buffer, size, offset, length, active variables.
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Return true on problem
static bool rw(FILE *in_s, void *in_buf, size_t in_sz, FILE *out_s,
               void *out_buf, size_t out_sz) {
  size_t in_offset = 0;
  size_t in_length = 0;
  bool in_active = true;
  size_t out_length = 0;
  bool out_active = true;
  while (in_active && out_active) {
    // While room for more data
    while (out_length < out_sz && in_active) {
      if (in_length == 0) {
        in_offset = 0;
        // Read in_sz items of 1 byte so the return value is a byte count
        in_length = fread(in_buf, 1, in_sz, in_s);
        in_active = in_length > 0;
      }
      // Append a portion of `in` to `out` (C has no standard min())
      size_t room = out_sz - out_length;
      size_t chunk = in_length < room ? in_length : room;
      memcpy((char*) out_buf + out_length, (char*) in_buf + in_offset, chunk);
      out_length += chunk;
      in_length -= chunk;
      in_offset += chunk;
    }
    if (out_length > 0) {
      // Padding only occurs, maybe, on last write
      memset((char*) out_buf + out_length, 0, out_sz - out_length);
      // Write out_sz items of 1 byte so the comparison with out_sz is valid
      out_active = fwrite(out_buf, 1, out_sz, out_s) == out_sz;
      out_length = 0;
    }
  }
  return ferror(in_s) || ferror(out_s);
}
Other notes;
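Casting malloc() results is not needed. #Gerhardh
// buffer1 = (char*) malloc (sizeof(char)*bufferSize1);
buffer1 = malloc (sizeof *buffer1 * bufferSize1);
Use stderr for error messages. #Jonathan Leffler
Open the files in binary mode ("rb" and "wb").
size_t is more robust for array/buffer sizes than int.
Consider sizeof buffer1 vs. sizeof (buffer1), as parentheses are not needed when sizeof is applied to an object. Note also that buffer1 is a pointer, so sizeof buffer1 is the size of the pointer, not bufferSize1; the fread() and fwrite() calls should use bufferSize1 and bufferSize2.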
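Applying these notes to the question's fileCopy() could look like the following sketch. It returns the number of bytes written to file2, counting the zero padding; whether the padding should count toward the problem's "number of bytes copied" is an assumption. The helper copy_blocks is hypothetical and is a compact variant of the rw() loop above, reading with fread(buf, 1, size, fp) so the return value is a byte count.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fills the write buffer completely before each fwrite(), zero-padding
   the final block. Returns bytes written, or -1 on error. */
static long copy_blocks(FILE *in, FILE *out, char *in_buf, size_t in_sz,
                        char *out_buf, size_t out_sz)
{
    long written = 0;
    size_t out_len = 0, got;

    while ((got = fread(in_buf, 1, in_sz, in)) > 0) {
        for (size_t off = 0; off < got; ) {
            size_t chunk = got - off;
            if (chunk > out_sz - out_len)
                chunk = out_sz - out_len;
            memcpy(out_buf + out_len, in_buf + off, chunk);
            out_len += chunk;
            off += chunk;
            if (out_len == out_sz) {              /* write buffer is full */
                if (fwrite(out_buf, 1, out_sz, out) != out_sz) return -1;
                written += (long)out_sz;
                out_len = 0;
            }
        }
    }
    if (ferror(in)) return -1;
    if (out_len > 0) {                            /* pad the last block */
        memset(out_buf + out_len, 0, out_sz - out_len);
        if (fwrite(out_buf, 1, out_sz, out) != out_sz) return -1;
        written += (long)out_sz;
    }
    return written;
}

/* fileCopy with the notes applied: size_t sizes, binary mode, no malloc
   casts, messages on stderr, and resources released on every path. */
long file_copy(const char *file1, const char *file2,
               size_t buffer_size1, size_t buffer_size2)
{
    if (buffer_size1 == 0 || buffer_size2 == 0) return -1;

    FILE *fp1 = fopen(file1, "rb");
    if (fp1 == NULL) {
        fprintf(stderr, "Not able to open %s\n", file1);
        return -1;
    }
    FILE *fp2 = fopen(file2, "wb");
    if (fp2 == NULL) {
        fprintf(stderr, "Not able to open %s\n", file2);
        fclose(fp1);
        return -1;
    }

    char *buffer1 = malloc(buffer_size1);
    char *buffer2 = malloc(buffer_size2);
    long result = -1;
    if (buffer1 == NULL || buffer2 == NULL)
        fprintf(stderr, "Memory error\n");
    else
        result = copy_blocks(fp1, fp2, buffer1, buffer_size1,
                             buffer2, buffer_size2);

    free(buffer1);                       /* free(NULL) is a no-op */
    free(buffer2);
    fclose(fp1);
    if (fclose(fp2) != 0) result = -1;   /* fclose flushes, so it can fail */
    return result;
}
```

For example, copying an 11-byte file with a 3-byte read buffer and a 4-byte write buffer produces three 4-byte writes, the last one ending in a single zero byte.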

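size_t offset = 0;                 /* how much of buffer1 has been sent */
while (bytesMoved > 0) {
    size_t i;
    for (i = 0; i < bytesMoved && i < bufferSize2; i++)
        buffer2[i] = buffer1[offset + i];
    bytesTransfered += fwrite(buffer2, 1, i, fp2);   /* returns bytes written */
    offset += i;
    bytesMoved -= i;
}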
If bufferSize1 is smaller than the filesize you need an outer loop.

As the comments to your question have indicated, this solution is not the best way to transfer data from 1 file to another file. However, your case has certain restrictions, which this solution accounts for.
(1) Since you are using a buffer, you do not need to read and write 1 char at a time; instead, make as few calls to those functions as possible.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
From the man page for fread: with size set to 1, nmemb can be bufferSize1, and the return value is then a byte count.
(2) You will need to check the return value of fread() (i.e. bytesMoved) and compare it with both bufferSize1 and bufferSize2. If (a) bytesMoved equals bufferSize1, or (b) bufferSize2 is less than bufferSize1 or than bytesMoved, then you know there is still data that needs to be read (or written). So you should begin the next transfer of data, and when it is completed, return to the step you left off on.
Note: The file position of each stream carries over between fread() and fwrite() calls, so when the data is larger than the buffer sizes, each call continues where the previous one left off.
Sketch (bytesMoved, bytesTransfered, iterations, i and j declared as size_t):
/* in while() loop continue reading from file 1 until nothing is left to read */
while ((bytesMoved = fread(buffer1, 1, bufferSize1, fp1)) > 0)
{
    /* send buffer1 to file 2 in pieces of at most bufferSize2 bytes;
       iterations counts the pieces already sent, in case buffer2 is
       super tiny and cannot store all of buffer1 */
    for (iterations = 0; iterations * bufferSize2 < bytesMoved; iterations++)
    {
        /* copy the next piece of buffer1 into buffer2 */
        for (i = iterations * bufferSize2, j = 0; i < bytesMoved && j < bufferSize2; i++, j++)
            buffer2[j] = buffer1[i];
        /* write exactly the j bytes copied; no '\0' is needed for binary data */
        bytesTransfered += fwrite(buffer2, 1, j, fp2);
    }
    /* buffer1 is simply overwritten by the next fread() */
}

not getting all data in file using fopen

I'm using the fopen with fread for this:
FILE *fp;
if (fopen_s(&fp, filePath, "rb"))
{
    printf("Failed to open file\n");
    //exit(1);
}
fseek(fp, 0, SEEK_END);
int size = ftell(fp);
rewind(fp);
char buffer = (char)malloc(sizeof(char)*size);
if (!buffer)
{
    printf("Failed to malloc\n");
    //exit(1);
}
int charsTransferred = fread(buffer, 1, size, fp);
printf("charsTransferred = %d, size = %d\n", charsTransferred, strlen(buffer));
fclose(fp);
I'm not getting the file data in the new file. Here is a comparison between the original file (right) and the one that was sent over the network (left):
Any issues with my fopen calls?
EDIT: I can't do away with the null terminators, because this is a PDF. If I get rid of them the file will be corrupted.

Be reassured: the way you're doing the read ensures that you're reading all the data.
you're using "rb" so even in windows you're covered against CR+LF conversions
you're computing the size all right using ftell when at the end of the file
you rewind the file
you allocate properly.
BUT you're not storing the right variable type:
char buffer = (char)malloc(sizeof(char)*size);
should be
char *buffer = malloc(size);
(That is very wrong and you should correct it, but since you successfully print some data, it's not the main issue. Next time, enable and read the compiler warnings. And don't cast the return value of malloc; it's error-prone, especially in your case.)
Now, what confuses you is the display using printf and strlen.
Since the file is binary, it contains a \0 somewhere, and printf prints only the start of the file. If you want to print the contents, you have to loop and print each character (using charsTransferred as the limit).
The same goes for strlen, which stops at the first \0 character.
The value in charsTransferred is correct.
To display the data, you could use fwrite to stdout (redirect the output, or all the junk chars may garble your terminal):
fwrite(buffer, 1, size, stdout);
Or loop and print only if the char is printable (I'd compare ascii codes for instance)
int charsTransferred = fread(buffer, 1, size, fp);
int i;
for (i = 0; i < charsTransferred; i++)
{
    unsigned char b = (unsigned char)buffer[i];   /* unsigned, so b < 127 is meaningful */
    putchar((b >= ' ' && b < 127) ? b : '-');     /* '-' is a char, not the string "-" */
    if ((i + 1) % 80 == 0) putchar('\n');         // optional linefeed every 80 chars
}
fflush(stdout);
that code prints dashes for characters outside the standard printable ASCII-range, and the real character otherwise.
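Since the goal is to send a PDF, the data should be handled by length, never as a string. A minimal sketch, assuming a seekable stream opened in binary mode; the names read_binary and write_binary are illustrative, not from the question. The length comes from fread()'s return value rather than strlen(), so embedded '\0' bytes survive the round trip.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole of `in` into a malloc'd buffer and stores the number
   of bytes actually read in *out_len. Embedded '\0' bytes are kept
   because nothing here relies on a terminator. Returns NULL on failure;
   the caller frees the buffer. */
unsigned char *read_binary(FILE *in, size_t *out_len)
{
    if (fseek(in, 0, SEEK_END) != 0) return NULL;
    long size = ftell(in);
    if (size < 0 || fseek(in, 0, SEEK_SET) != 0) return NULL;

    /* malloc(0) may return NULL, so always ask for at least 1 byte */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) return NULL;

    *out_len = fread(buf, 1, (size_t)size, in);
    return buf;
}

/* Writes exactly len bytes to `out`; returns 0 on success, -1 on error. */
int write_binary(FILE *out, const unsigned char *buf, size_t len)
{
    return fwrite(buf, 1, len, out) == len ? 0 : -1;
}
```

Whatever sends the data over the network should likewise be given the byte count from fread(), not strlen(buffer).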

How to fwrite to pointer of pointer?

I have a function which should store the content of a file in a pointer to pointer, content. When I check the result of the fwrite function, writn is 0: nothing gets written. What am I doing wrong here? Did I allocate memory correctly (if I want to copy the whole file)?
bool load(FILE* file, BYTE** content, size_t* length)
{
    int len = 0, writn = 0;
    fseek(file, 0, SEEK_END);
    *length = len = ftell(file);
    rewind(file);
    *content = (char) malloc((len + 1) * sizeof(char)); //(len + 1) * sizeof(char)
    writn = fwrite(*content, len + 1, 1, file);
    return true;
}

You probably opened the file in read mode ("r"), but fwrite() writes to the file; it does not read from it. If that is the case, fwrite() will of course fail.
Perhaps you simply need
// Use long int for `length' to avoid a problem with `ftell()'
// read the documentation
bool load(FILE* file, BYTE **content, long int *length)
{
    fseek(file, 0, SEEK_END);
    *length = ftell(file);
    rewind(file);
    if (*length == -1)
        return false;
    *content = malloc(*length + 1);
    if (*content == NULL)
        return false;
    // fread returns size_t; the cast avoids a signed/unsigned comparison
    if (fread(*content, 1, *length, file) != (size_t)*length) {
        free(*content);
        *content = NULL;
        return false;
    }
    (*content)[*length] = '\0';
    return true;
}
You also try to transfer one byte more than the file contains: you get the file length and then ask for len + 1 bytes.

What I see this function do is:
determine the size of the file;
allocate a chunk of memory that size;
write that chunk to the file.
This assumes that file is opened for both reading and writing. The fseek seeks to the end of the file; following the rewind, the chunk is written (a write operation). If the file was opened only for writing with "w", it was truncated when it was opened, so your size will be zero. If the file is only open for reading, then your fwrite will fail. In addition, you write uninitialized data to the file (the allocated memory has not been initialized).
Is this what it is supposed to do?

How do you save the characters of a file into a string or array?

For example, if I had a file name random.txt, which reads:
This is a string.
Abc
Zxy
How would you save the characters in random.txt to a string or array that includes all of the characters in the text file?
So far I have (using redirection for file)
#include <stdio.h>
#include <string.h>
int main () {
    int c;
    do {
        c = fgetc(stdin);
        putchar(c);
    } while (c != EOF);
    return 0;
}

First part: About file handles
The stdin variable holds a FILE handle to which the user input is redirected (the FILE data type is defined in stdio.h). You can create handles to files using the function FILE *fopen(const char *path, const char *mode).
Your example applied to a regular file would be something like this (no error checking is done):
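#include <stdio.h>

int main() {
    int c;
    FILE *myfile = fopen("path/to/file", "r"); //Open file for reading
    // Test fgetc's return value rather than feof(): feof() only becomes
    // true after a read has already failed, so a feof() loop runs once too often
    while ((c = fgetc(myfile)) != EOF) {
        //do stuff with 'c'
        //...
    }
    fclose(myfile); //close the file
    return 0;
}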
More information about fopen here: http://linux.die.net/man/3/fopen
Second part: About C strings
C strings (char arrays terminated with the null character '\0') can be defined in several ways. One of them is by statically defining them:
char mystring[256]; //This defines an array of 256 bytes (255 characters plus end null)
It is very important to take care of the limits of the buffer. In our example, writing beyond 256 bytes in the buffer is undefined behavior and may crash the program. If we assume our file will not have lines longer than 255 characters (including line terminators like \r and \n), we can use the fgets function (http://linux.die.net/man/3/fgets):
char *fgets(char *s, int size, FILE *stream);
Simple (newbie) example:
#include <stdio.h>

int main() {
    char mystring[256];
    FILE *myfile = fopen("path/to/file", "r"); //Open file for reading
    // fgets returns NULL at end of file (or on error), which ends the loop
    while (fgets(mystring, sizeof(mystring), myfile) != NULL) {
        printf("%s", mystring);
    }
    fclose(myfile); //close the file
    return 0;
}
Notice that fgets is used for reading lines. If you want to read characters 1 by 1, you should keep using fgetc and pushing them manually into a buffer.
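For the character-by-character route, a minimal sketch of pushing fgetc() results into a buffer that grows as needed. The name read_chars, the 64-byte starting capacity and the doubling growth are illustrative choices, not from the original answer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads every character from `in` with fgetc() into a heap buffer that
   doubles when full. Returns a NUL-terminated string (the caller frees it)
   and stores the character count in *out_len, or returns NULL on failure. */
char *read_chars(FILE *in, size_t *out_len)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    int c;                              /* int, so EOF can be told apart */
    while ((c = fgetc(in)) != EOF) {
        if (len + 1 == cap) {           /* keep one byte for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

With the question's redirection setup, this would be called as read_chars(stdin, &n). Doubling keeps the number of realloc() calls logarithmic in the file size.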
Finally, if you want to read a whole text file into a C string (no error checking):
#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE *myfile = fopen("path/to/file", "r"); //Open file for reading
    //Get the file size
    fseek(myfile, 0, SEEK_END);
    long filesize = ftell(myfile);
    fseek(myfile, 0, SEEK_SET);
    //Allocate buffer dynamically (not statically as in previous examples)
    //We are reserving 'filesize' bytes plus the end null required in all C strings.
    char *mystring = malloc(filesize + 1); //'malloc' is defined in stdlib.h
    //Read 1-byte items so the return value is the number of bytes actually
    //read (in text mode this can be less than filesize, e.g. on Windows)
    size_t nread = fread(mystring, 1, filesize, myfile);
    mystring[nread] = 0; //write the end null
    fclose(myfile); //close file
    printf("%s", mystring); //dump contents to screen
    free(mystring); //deallocate buffer. 'mystring' is no longer usable.
    return 0;
}

The following will work on either stdin or an actual file (i.e. you can replace "stream" with stdin, or fopen() it with a filename), dynamically allocating as it goes. After it runs, "ptr" will be a pointer to an array holding the contents of the file.
/* read chars from stream in blocks of 4096 bytes,
   dynamically allocating until eof */
size_t bytes_read = 0;
char *ptr = NULL;
while (1) {
    size_t chunk_read;
    /* increase size of allocation by 4096 bytes; use a temporary so the
       old block is not leaked if realloc fails */
    char *tmp = realloc(ptr, bytes_read + 4096);
    if (tmp == NULL) { free(ptr); ptr = NULL; break; }
    ptr = tmp;
    /* read up to 4096 bytes to the newest portion of allocation */
    chunk_read = fread(ptr + bytes_read, 1, 4096, stream);
    bytes_read += chunk_read;
    /* if fread() got less than the full amount of characters, break */
    if (chunk_read < 4096) break;
}
if (ptr != NULL) {
    /* resize pointer downward to actual number of bytes read,
       plus an explicit null termination */
    char *tmp = realloc(ptr, bytes_read + 1);
    if (tmp != NULL) ptr = tmp;  /* if shrinking fails, the larger block is still valid */
    ptr[bytes_read] = '\0';
}
