Reading a text file into an array in C

What would be the most efficient way to read a text file into a dynamic one-dimensional array? Calling realloc after every character read seems silly, and reallocating after every line read doesn't seem much better. I would like to read the entire file into the array. How would you do it?

I don't quite understand what you want. Do you want to process the file incrementally, reading one line, discarding it, and then processing the next? Or do you want to read the entire file into a buffer? If you want the latter, I think this is appropriate (in real code, check fopen and malloc for a NULL return, to detect a missing file or insufficient memory):
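#include <stdio.h>
#include <stdlib.h>

FILE *f = fopen("text.txt", "rb");       /* NULL if the file can't be opened */
fseek(f, 0, SEEK_END);
long pos = ftell(f);                     /* file size in bytes (-1 on error) */
fseek(f, 0, SEEK_SET);
char *bytes = malloc(pos + 1);           /* +1 for a terminating '\0'; check for NULL */
size_t got = fread(bytes, 1, pos, f);    /* number of bytes actually read */
bytes[got] = '\0';
fclose(f);
hexdump(bytes); // do some stuff with it (hexdump stands for your own code)
free(bytes); // free allocated memory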

If mmap(2) is available on your system, you can open the file and map it into memory. That way you have no memory to allocate, and you don't even have to read the file yourself; the system does it for you. You can use the fseek() trick litb gave to get the size.
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
EDIT: Because mmap() works with a file descriptor rather than a FILE *, use lseek() to obtain the size of the file:
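#include <fcntl.h>     /* open */
#include <sys/mman.h>  /* mmap */
#include <unistd.h>    /* lseek */
int fd = open("filename", O_RDONLY);
off_t nbytes = lseek(fd, 0, SEEK_END);
void *content = mmap(NULL, nbytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); /* MAP_FAILED on error */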
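A fuller sketch of the mmap approach, assuming a POSIX system. The function name map_file is hypothetical, and it uses fstat() instead of lseek() to get the size (both work with a file descriptor). It maps the file read-only; note that the mapping is not NUL-terminated, so treat it as len bytes rather than as a C string.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, O_RDONLY */
#include <stddef.h>    /* size_t */
#include <sys/mman.h>  /* mmap, munmap */
#include <sys/stat.h>  /* fstat, struct stat */
#include <unistd.h>    /* close */

/* Map an entire file read-only. On success returns the mapping and stores
   its length in *len; returns NULL on failure. The caller releases the
   mapping with munmap(ptr, *len). Empty files are rejected because
   mmap() with a length of 0 fails. */
const char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

Release the mapping with munmap((void *)p, len) when you are done with it.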

If you want to use ISO C, use this function.
It's litb's answer, wrapped with some error handling...
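The function referred to above is not reproduced here. As a rough stand-in, this is a sketch of the same idea in ISO C: litb's fseek()/ftell() size trick with the error checks filled in. The name read_file and its length out-parameter are assumptions, not part of the original answer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a freshly malloc'd, NUL-terminated buffer.
   Returns NULL on any error; on success stores the number of bytes read
   in *len (if len is not NULL). The caller must free() the result.
   Uses the fseek/ftell size trick, so it only works on seekable files. */
char *read_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    char *buf = malloc((size_t)size + 1);   /* +1 for the '\0' */
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    size_t got = fread(buf, 1, (size_t)size, f);
    if (ferror(f)) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);

    buf[got] = '\0';
    if (len != NULL)
        *len = got;
    return buf;
}
```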

Related

fwrite keeps writing undesired text into my file

The output text file is only supposed to contain the contents of word1, but other things that are inside of my write function keep getting in there and I'm not sure why.
My main function:
int main(){
unsigned long size4;
char* word1 = "Hellllooooooo";
char* file_namee = "test.txt";
file_write(size4, word1, file_namee);
exit(0);
}
Here is my file write function:
int file_write(unsigned long size, char *output, char *file_name2){
FILE *file;
file = fopen(file_name2, "wb");
if(file == NULL){
printf("Cannot open file");
}
fwrite(output, 1, size, file);
fseek(file, 0, SEEK_END);
size = ftell(file);
rewind(file);
return size;
}
Here is what it outputs and writes into test.txt:
Hellllooooooo test.txt rb Cannot open file wb Cannot
Everything after "Hellllooooooo" is unexpected and I'm not sure why it's giving me that.
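You're passing an uninitialized variable (size4) as the size, so fwrite writes however many bytes that garbage value happens to specify, reading past the end of word1 into whatever follows it in memory (in your case, apparently the other string literals in your program). That size parameter is not necessary if you're using C strings. Skip it and use: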
fwrite(output, strlen(output), 1, file);
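fwrite needs to be told the length of the buffer: a buffer can come in many forms, often raw bytes with no terminator, so the length cannot be inferred. You must supply it, but since you're using C strings, you can compute it with strlen() (declared in <string.h>). Also, return early when fopen fails instead of passing a NULL stream to fwrite.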
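A corrected version of the question's function along the lines of this answer, as a sketch: it drops the size parameter (so its signature differs from the original file_write), computes the length with strlen(), stops if fopen fails, and closes the file so everything is flushed. Returning -1 on failure is an arbitrary choice for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Write the C string `output` to `file_name` and return the number of
   bytes written, or -1 if the file could not be opened or written. The
   length comes from strlen(), so no separate size argument is needed. */
long file_write(const char *output, const char *file_name)
{
    FILE *file = fopen(file_name, "wb");
    if (file == NULL) {
        perror("Cannot open file");
        return -1;          /* don't go on to call fwrite with a NULL stream */
    }

    size_t len = strlen(output);
    size_t written = fwrite(output, 1, len, file);

    /* fclose flushes the buffer; a failure here means data may be lost */
    if (fclose(file) != 0 || written != len)
        return -1;
    return (long)written;
}
```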

Convert ngx_chain_t into buffer memory

I'm looking for a way to convert an ngx_chain_t object (which is already filled by nginx and ready to be sent to the client or passed to another filter) into a memory buffer, just like when we read a whole file into memory like this:
#include <stdio.h>
#define MAX 999999
char source[MAX + 1];
FILE *fp = fopen("thisfile", "r");    /* check for NULL in real code */
size_t newLen = fread(source, sizeof(char), MAX, fp);
source[newLen] = '\0';                /* not ++newLen: terminate right after the last byte read */
Now source is a buffer that holds the whole content of thisfile in memory.
Is there any way to convert an ngx_chain_t buffer into something like source in this case?
Try fmemopen(3), and read the man page first!
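To show what fmemopen() does: it gives you a FILE * that reads from a memory buffer, so ordinary stdio code such as fread() works on it unchanged. fmemopen is POSIX (2008), not ISO C. The helper name read_from_memory is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Read up to cap - 1 bytes from an in-memory buffer through a FILE *,
   exactly as if it were a file, and NUL-terminate the result.
   Returns the number of bytes read (0 if the stream could not be opened). */
size_t read_from_memory(const char *data, size_t data_len, char *dest, size_t cap)
{
    if (cap == 0)
        return 0;
    dest[0] = '\0';
    /* fmemopen's first parameter is void *; "r" mode never writes to it */
    FILE *fp = fmemopen((void *)data, data_len, "r");
    if (fp == NULL)
        return 0;
    size_t n = fread(dest, 1, cap - 1, fp);
    fclose(fp);
    dest[n] = '\0';
    return n;
}
```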
Maybe I did not understand the question.
But as I understood it, it was about replacing fopen with something else that could read the ngx_chain_t object structure, like:
...
fp = fmemopen(object, MAX, "r");   /* object: pointer to the bytes; the size should be their actual length */
newLen = fread(source, sizeof(char), MAX, fp);
...
Sorry if this still is a misunderstanding.
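fmemopen() needs a single contiguous buffer, while an ngx_chain_t is a linked list of buffers. One way to get something like source is to copy each link's bytes into one malloc'd block. The sketch below uses stand-in structs (struct chunk and struct chain are hypothetical, not nginx's types) that mimic the pos/last byte range of an in-memory nginx buffer; real chains can also contain file-backed buffers, which this does not handle.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a linked chain of buffers. Each node's bytes
   are [pos, last), which is how nginx describes in-memory buffers; these
   are NOT nginx's actual type definitions. */
struct chunk {
    const char *pos;    /* first byte */
    const char *last;   /* one past the last byte */
};
struct chain {
    struct chunk *buf;
    struct chain *next;
};

/* Copy every chunk in the chain into one contiguous, NUL-terminated
   malloc'd buffer. Returns NULL on allocation failure; stores the byte
   count (excluding the '\0') in *len. The caller must free() the result. */
char *flatten_chain(const struct chain *cl, size_t *len)
{
    size_t total = 0;
    for (const struct chain *c = cl; c != NULL; c = c->next)
        total += (size_t)(c->buf->last - c->buf->pos);

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    size_t off = 0;
    for (const struct chain *c = cl; c != NULL; c = c->next) {
        size_t n = (size_t)(c->buf->last - c->buf->pos);
        memcpy(out + off, c->buf->pos, n);
        off += n;
    }
    out[total] = '\0';
    *len = total;
    return out;
}
```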

How can I return file contents from a function?

I am writing a C library. I want to return the contents of a file from this function to the caller.
How can I convert the file contents to char[]?
fopen is crashing since I am using perl.h in my C code.
Is there any other way to convert file into a char array apart from opening & reading the file?
Here is my code:
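#include <stdio.h>
#include <string.h>
FILE* fp = fopen("console.txt", "r");   /* check for NULL in real code */
char message[1024];
strcpy(message, "\n");
char buf[80];
/* loop on fgets' return value: looping on !feof(fp) can append the last line twice */
while (fgets(buf, sizeof(buf), fp) != NULL)
{
    if (strlen(message) + strlen(buf) >= sizeof(message))
        break;   /* message is full; avoid overflowing it */
    strcat(message, buf);
}
fclose(fp);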
Get the file size (see How can I get a file's size in C?)
Allocate memory to store contents of the file (see malloc).
Read contents of the file into allocated memory (see read).
Return a pointer and a length of data in bytes to the user.
Don't forget to check for errors in between those steps.
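The four steps above, sketched with POSIX calls (an assumption; this is not portable ISO C): fstat() for the size and read() for the contents. read() may return fewer bytes than requested, so it is called in a loop. The name load_file is hypothetical; because the length is returned separately, the data may contain '\0' bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Get the size, allocate, read, and return a pointer plus a length.
   Returns NULL on error; the caller must free() the result. */
char *load_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {               /* step 1: size */
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;

    char *buf = malloc(size + 1);           /* step 2: allocate (+1 for '\0') */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t off = 0;                         /* step 3: read */
    while (off < size) {
        ssize_t n = read(fd, buf + off, size - off);
        if (n < 0) {                        /* read error */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)                         /* file shrank: stop early */
            break;
        off += (size_t)n;
    }
    close(fd);

    buf[off] = '\0';
    *len = off;                             /* step 4: pointer + length */
    return buf;
}
```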
You need to do a few things:
Determine the size of the file (check out fstat/stat).
malloc enough memory to hold the file.
Use fread to read the contents of the file into your array (don't forget to open the file in binary mode).
Return your pointer (and don't forget that the caller has to free it afterwards).
Try this code. The function returns a string with the file's contents; just print it.
#include <stdio.h>
#include <stdlib.h>
char *readFile(const char *filename)
{
    char *buffer = NULL;
    long length;
    FILE *f = fopen(filename, "r");
    if (f)
    {
        fseek(f, 0, SEEK_END);
        length = ftell(f);
        fseek(f, 0, SEEK_SET);
        if (length >= 0)
            buffer = malloc(length + 1);          /* +1 for the terminating '\0' */
        if (buffer)
        {
            size_t got = fread(buffer, 1, length, f);
            buffer[got] = '\0';                   /* printf("%s") needs a terminated string */
        }
        fclose(f);
    }
    return buffer;
}
int main(void)
{
    char *content = readFile("D://test1.xml");
    if (content)
    {
        printf("%s", content);
        free(content);
    }
    return 0;
}
A bit more information about the contents and nature of the file is needed to give a good answer.
But generally:
If it is a text file, see the stdio library: you can get the length, create an array and fill it. Otherwise (for binary files), it would be a good idea to parse the file and return a data structure, which is a lot more useful.

Reading all content from a text file - C

I am trying to read all content from a text file. Here is the code which I wrote.
#include <stdio.h>
#include <stdlib.h>
#define PAGE_SIZE 1024
static char *readcontent(const char *filename)
{
char *fcontent = NULL, c;
int index = 0, pagenum = 1;
FILE *fp;
fp = fopen(filename, "r");
if(fp) {
while((c = getc(fp)) != EOF) {
if(!fcontent || index == PAGE_SIZE) {
fcontent = (char*) realloc(fcontent, PAGE_SIZE * pagenum + 1);
++pagenum;
}
fcontent[index++] = c;
}
fcontent[index] = '\0';
fclose(fp);
}
return fcontent;
}
static void freecontent(char *content)
{
if(content) {
free(content);
content = NULL;
}
}
This is the usage
int main(int argc, char **argv)
{
char *content;
content = readcontent("filename.txt");
printf("File content : %s\n", content);
fflush(stdout);
freecontent(content);
return 0;
}
Since I am new to C, I wonder whether this code looks perfect? Do you see any problems/improvements?
Compiler used : GCC. But this code is expected to be cross platform.
Any help would be appreciated.
Edit
Here is the updated code with fread and ftell.
static char *readcontent(const char *filename)
{
    char *fcontent = NULL;
    long fsize = 0;
    FILE *fp;
    fp = fopen(filename, "r");
    if(fp) {
        fseek(fp, 0, SEEK_END);
        fsize = ftell(fp);
        rewind(fp);
        fcontent = (char*) malloc(sizeof(char) * (fsize + 1));   /* +1 for the '\0' */
        if(fcontent) {
            size_t nread = fread(fcontent, 1, fsize, fp);
            fcontent[nread] = '\0';   /* main() prints this with %s */
        }
        fclose(fp);
    }
    return fcontent;
}
I am wondering what will be the relative complexity of this function?
You should look into the functions fsize (see the update below about fsize) and fread. This could be a huge performance improvement.
Use fsize to get the size of the file you are reading, and use that size to do a single memory allocation. (See the update below about fsize; the idea of getting the file size and allocating once stays the same.)
Use fread to read the file in blocks. This is much faster than reading it one character at a time.
Something like this:
long size = fsize(fp);   /* fsize is not a standard C function; see the update below */
fcontent = malloc(size);
fread(fcontent, 1, size, fp);
Update
fsize is not part of standard C, so it isn't portable, but you can use this method to get the size of the file:
fseek(fp, 0, SEEK_END);
size = ftell(fp);
fseek(fp, 0, SEEK_SET);
People often realloc to twice the existing size, which gives amortized constant time per appended character instead of linear. The buffer is then never more than twice as large as needed, which is usually okay, and you have the option of reallocating it down to the exact size when you're done.
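A sketch of the doubling strategy, which also works for streams that can't seek (such as stdin or a pipe), where the fseek()/ftell() trick fails. It grows the buffer through a temporary pointer (tmp = realloc(...)) so a failed realloc doesn't leak the old block, and shrinks the buffer to fit at the end. The name slurp_stream and the 4096-byte starting size are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire stream into a NUL-terminated buffer, doubling the
   capacity whenever it fills up so the total copying cost stays linear
   in the input size. Never seeks, so pipes and stdin work too.
   Returns NULL on error; stores the byte count in *len. Caller frees. */
char *slurp_stream(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (used == cap - 1) {                 /* keep 1 byte for '\0' */
            char *tmp = realloc(buf, cap * 2); /* don't overwrite buf yet */
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + used, 1, cap - 1 - used, fp);
        used += n;
        if (n == 0) {
            if (ferror(fp)) {
                free(buf);
                return NULL;
            }
            break;                             /* end of file */
        }
    }

    /* optional: shrink to the exact size */
    char *tmp = realloc(buf, used + 1);
    if (tmp != NULL)
        buf = tmp;
    buf[used] = '\0';
    *len = used;
    return buf;
}
```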
But even better is to stat(2) for the file size and allocate once (with some extra room if the file size is volatile).
Also, why don't you use fgets(3) instead of reading character by character, or, even better, mmap(2) the entire thing (or the relevant chunk if it's too large for memory)?
It is probably slower and certainly more complex than:
int c;   /* int, not char, so that EOF can be told apart from a valid byte */
while((c = getc(fp)) != EOF) {
    putchar(c);
}
which does the same thing as your code.
This is from a quick reading, so I might have missed a few issues.
First, a = realloc(a, ...); is wrong. If realloc() fails, it returns NULL, but doesn't free the original memory. Since you reassign to a, the original memory is lost (i.e., it is a memory leak). The right way to do this is to do: tmp = realloc(a, ...); if (tmp) a = tmp; etc.
Second, about determining the file size using fseek(fp, 0, SEEK_END);, note that this may or may not work. If the file is not random-access (such as stdin), you won't be able to go back to the beginning to read it. Also, fseek() followed by ftell() may not give a meaningful result for binary files. And for text files, it may not give you the right number of characters that can be read. There is some useful information on this topic on comp.lang.c FAQ question 19.2.
Also, in your original code, index keeps increasing, so index == PAGE_SIZE is true only once and the buffer never grows beyond 2*PAGE_SIZE + 1 bytes; if your file is longer than 2*PAGE_SIZE characters, you will write past the end of the buffer.
Your freecontent() function:
static void freecontent(char *content)
{
if(content) {
free(content);
content = NULL;
}
}
is useless. It only sets a copy of content to NULL. It is just like if you wrote a function setzero like this:
void setzero(int i) { i = 0; }
A much better idea is to keep track of memory yourself and not free anything more or less than needed.
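For illustration, a version of freecontent() that does what the original intended: it takes the address of the caller's pointer (char **), so it can set that pointer to NULL. This is a sketch; its signature differs from the question's.

```c
#include <stdlib.h>

/* Free *content and set the caller's pointer to NULL. Passing the
   address of the pointer is what lets the function change the caller's
   variable; free(NULL) is already a no-op, so *content needs no check. */
static void freecontent(char **content)
{
    if (content != NULL) {
        free(*content);
        *content = NULL;
    }
}
```

Call it as freecontent(&content); a second call on the same variable is then harmless.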
You shouldn't cast the return value of malloc() or realloc() in C, since a void * is implicitly converted to any other object pointer type in C.
Hope that helps.
One problem I can see is that the variable index never decreases, so in the condition
if(!fcontent || index == PAGE_SIZE) the index test succeeds only once. I think the check should be
index % PAGE_SIZE == 0 instead of index == PAGE_SIZE.
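Putting the fixes from these answers together, a sketch of the original character-by-character reader: it grows whenever index reaches a multiple of PAGE_SIZE and goes through a temporary pointer for realloc. It also fixes two problems no answer above mentions: c is declared int, because comparing a char with EOF is unreliable, and an empty file now yields "" instead of the original writing '\0' through a NULL fcontent.

```c
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 1024

/* Read a file character by character, growing the buffer one page at a
   time. Returns a NUL-terminated buffer, or NULL on error. Caller frees. */
static char *readcontent(const char *filename)
{
    size_t index = 0;
    int c;                                /* int so EOF is distinguishable */
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return NULL;

    /* start with one page plus room for '\0', so an empty file works */
    char *fcontent = malloc(PAGE_SIZE + 1);
    if (fcontent == NULL) {
        fclose(fp);
        return NULL;
    }
    while ((c = getc(fp)) != EOF) {
        if (index > 0 && index % PAGE_SIZE == 0) {   /* current page is full */
            char *tmp = realloc(fcontent, index + PAGE_SIZE + 1);
            if (tmp == NULL) {                       /* old block still valid */
                free(fcontent);
                fclose(fp);
                return NULL;
            }
            fcontent = tmp;
        }
        fcontent[index++] = (char)c;
    }
    fcontent[index] = '\0';
    fclose(fp);
    return fcontent;
}
```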
On POSIX systems (e.g. Linux) you could get the same effect with the mmap system call, which maps the whole file into memory. It has an option to map the file copy-on-write (MAP_PRIVATE), so changing the buffer would not overwrite your file.
This would usually be much more efficient, since you leave as much as you can to the system. No need to do realloc or similar.
In particular, if you are only reading and several processes do that at the same time there would be only one copy in memory for the whole system.
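The copy-on-write behaviour can be checked with a small POSIX sketch: map a file with MAP_PRIVATE, change a byte in memory, then read that byte back from the file with pread(). Writes to a private mapping go to the process's own copy of the page, so the file is expected to be unchanged. The function name private_write_leaves_file is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `path` privately, overwrite its first byte in memory, and return
   1 if the file on disk still has its original first byte, 0 if it
   changed, or -1 on error. With MAP_PRIVATE, 1 is expected. */
int private_write_leaves_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    /* PROT_WRITE is allowed on a read-only fd because the mapping is private */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    char original = p[0];
    p[0] = (char)(original + 1);          /* modifies only our private copy */
    munmap(p, (size_t)st.st_size);

    char on_disk;
    int ok = pread(fd, &on_disk, 1, 0) == 1;
    close(fd);
    if (!ok)
        return -1;
    return on_disk == original;
}
```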

C - Saving a Small File As a String

Hey guys...In C, I wish to read a file in my current working directory (nothing fancy), into a string. Later, I'd like to print that string out so I can do some work on it. I'm more used to Java, so I'm getting my chops on C and would love an explanation on how to do this! Thanks fellas...
Here's a C program that will read a file and print it as a string. The filename is passed as an argument to the program. Error checking would be a good thing to add.
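#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    FILE *f;
    char *buffer;
    long filesize;
    size_t nread;
    if (argc < 2 || (f = fopen(argv[1], "r")) == NULL)
        return 1;
    // quick & dirty filesize calculation
    fseek(f, 0, SEEK_END);
    filesize = ftell(f);
    fseek(f, 0, SEEK_SET);
    // read file into a buffer, leaving room for the terminating '\0'
    buffer = (filesize >= 0) ? malloc(filesize + 1) : NULL;
    if (buffer == NULL) {
        fclose(f);
        return 1;
    }
    nread = fread(buffer, 1, filesize, f);
    buffer[nread] = '\0';   // printf("%s") needs a terminated string
    printf("%s", buffer);
    // cleanup
    free(buffer);
    fclose(f);
    return 0;
}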
You will use:
FILE *f = fopen(filename, "r");
To open the file. If that returns non-null, you can use:
char buf[MAXIMUM_LINE_SIZE]; /* pick something for MAXIMUM_LINE_SIZE... */
char *p;
while ((p=fgets(buf, sizeof(buf), f)))
{
/* Do something with the line pointed to by p */
}
To do something more sophisticated (not bounded by an arbitrary size, or spanning multiple lines) you'll want to learn about dynamic memory allocation: the functions malloc(), realloc(), free()...
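For lines of unbounded length, POSIX (but not ISO C) provides getline(), which allocates and grows the line buffer for you with realloc. A minimal sketch; count_lines is a hypothetical example function, and its loop body is where you would process each line.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t */

/* Count the lines in a stream using getline(), which needs no fixed
   MAXIMUM_LINE_SIZE. Also reports the longest line (newline included). */
size_t count_lines(FILE *f, size_t *longest)
{
    char *line = NULL;   /* getline allocates and reallocates this */
    size_t cap = 0;
    ssize_t n;
    size_t count = 0;
    *longest = 0;

    while ((n = getline(&line, &cap, f)) != -1) {
        /* do something with `line` (n bytes) here */
        count++;
        if ((size_t)n > *longest)
            *longest = (size_t)n;
    }
    free(line);          /* one free for the whole loop */
    return count;
}
```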
Some links to help you:
manpages for file I/O: fopen, fgets, fclose
for memory allocation: malloc
Also, just to throw it out there: If you are interested in writing C++ instead of C, that also has its own file I/O and string stuff that you may find helpful, and you won't have to do all the memory allocations yourself. But even then, it's probably good to understand the C way also.
You might start with fopen and fread.
