I want to load a txt file into an array, like file() does in PHP. I want to be able to access different lines like array[N] (which should contain the entire line N from the file); then I would remove each array element after using it, so the array shrinks until it reaches 0 and the program finishes. I know how to read the file, but I have no idea how to fill a string array to be used like I said. I am using gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) to compile.
How can I achieve this?
Proposed algorithm:
Use fseek, ftell, fseek to seek to end, determine file length, and seek back to beginning.
malloc a buffer big enough for the whole file plus null-termination.
Use fread to read the whole file into the buffer, then write a 0 byte at the end.
Loop through the buffer byte-by-byte and count newlines.
Use malloc to allocate that number + 1 char * pointers.
Loop through the buffer again, assigning the first pointer to point to the beginning of the buffer, and successive pointers to point to the byte after a newline. Replace the newline bytes themselves with 0 (null) bytes in the process.
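A minimal sketch of those steps (error checking trimmed for brevity, and the file name "input.txt" is just an example):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("input.txt", "rb");
    fseek(f, 0, SEEK_END);
    long size = ftell(f);                  /* determine file length */
    fseek(f, 0, SEEK_SET);                 /* seek back to the beginning */

    char *buf = malloc(size + 1);          /* whole file plus null-termination */
    fread(buf, 1, size, f);
    buf[size] = '\0';
    fclose(f);

    size_t num_lines = 0;                  /* count newlines */
    for (long i = 0; i < size; i++)
        if (buf[i] == '\n')
            num_lines++;

    char **lines = malloc((num_lines + 1) * sizeof *lines);
    size_t n = 0;
    lines[n++] = buf;                      /* first line starts at the buffer */
    for (long i = 0; i < size; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';                 /* replace the newline with a 0 byte */
            lines[n++] = &buf[i + 1];      /* next line starts just after it */
        }
    }

    for (size_t k = 0; k < n; k++)         /* lines[k] is line k of the file */
        puts(lines[k]);

    free(lines);
    free(buf);
    return 0;
}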
One optimization: if you don't need random access to the lines (indexing them by line number), do away with the pointer array and just replace all the newlines with 0 bytes. Then s+=strlen(s)+1; advances to the next line. You'll need to add some check to make sure you don't advance past the end (or beginning if you're doing this in reverse) of the buffer.
Either way, this method is very efficient (no memory fragmentation) but has a couple drawbacks:
You can't individually free lines; you can only free the whole buffer once you finish.
You have to overwrite the newlines. Some people prefer to have them kept in the in-memory structure.
If the file ended with a newline, the last "line" in your pointer array will be zero-length. IMO this is the sane interpretation of text files, but some people prefer considering the empty string after the last newline a non-line and considering the last proper line "incomplete" if it doesn't end with a newline.
I suggest you read your file into an array of pointers to strings which would allow you to index and delete the lines as you have specified. There are efficiency tradeoffs to consider with this approach as to whether you count the number of lines ahead of time or allocate/extend the array as you read each line. I would opt for the former.
Read the file, counting the number of line terminators you see (either \n or \r\n)
Allocate an array of char * of that size
Re-read the file, line by line, using malloc() to allocate a buffer for each line, pointed to by the next array index (see the sketch after this list)
For your operations:
Indexing is just array[N]
Deleting is just freeing the buffer indexed by array[N] and setting the array[N] entry to NULL
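A rough sketch of both passes, assuming no line exceeds LINE_MAX_GUESS bytes (the constant and the helper name read_lines are just illustrative):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_GUESS 4096        /* assumed upper bound on line length */

char **read_lines(const char *path, int *out_count)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    char buf[LINE_MAX_GUESS];
    int count = 0;
    while (fgets(buf, sizeof buf, fp))      /* first pass: count the lines */
        count++;
    rewind(fp);

    char **lines = malloc(count * sizeof *lines);
    for (int n = 0; n < count && fgets(buf, sizeof buf, fp); n++) {
        lines[n] = malloc(strlen(buf) + 1); /* buffer for this line */
        strcpy(lines[n], buf);              /* pointed to by array index n */
    }

    fclose(fp);
    *out_count = count;
    return lines;
}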
UPDATE:
The more memory efficient approach suggested by #r.. and #marc-van-kempen is a good optimization over malloc()ing each line one at a time: slurp the file into a single buffer and replace all the line terminators with '\0'.
Assuming you've done that, you have the big buffer as char *filebuf, and the number of lines is int num_lines, then you can allocate your indexing array something like this:
char **lines = malloc((num_lines + 1) * sizeof(char *)); // Allocates array of pointers to strings
lines[num_lines] = NULL;  // Terminate the array as another way to stop you running off the end
char *p = filebuf;        // I'm assuming the first char of the file is the start of the first line
int n;
for (n = 0; n < num_lines; n++) {
    lines[n] = p;
    while (*p != '\0')    // Seek to the end of this line
        p++;
    if (n < num_lines - 1) {
        p++;              // Step past this line's terminator
        while (*p == '\0') // Seek to the start of the next line (if there is one)
            p++;
    }
}
With a single-buffer approach, "deleting" a line is merely a case of setting lines[n] to NULL; there is no free().
There are two slightly different ways to achieve this: one is more memory friendly, the other more CPU friendly.
I. Memory friendly
Open the file and get its size (use fstat() and friends) ==> size
allocate a buffer of that size ==> char buf[size];
scan through the buffer counting the '\n' (or '\r\n' == DOS or '\r' == old Mac) ==> N
Allocate an array: char *lines[N]
scan through the buffer again and point lines[0] to &buf[0], scan for the first '\n' or '\r' and set it to '\0' (delimiting the string), set lines[1] to the first character after that that is not '\n' or '\r', etc.
II. CPU friendly
Create a linked list structure (if you don't know how to do this or don't want to, have a look at 'glib' (not glibc!), a utility companion of GTK).
Open the file and start reading the lines using fgets(), malloc'ing each line as you go along.
Keep a linked list of lines ==> list and count the total number of lines
Allocate an array: char *lines[N];
Go through the linked list and assign the pointer to each element to its corresponding array element
Free the linked list (not its elements!)
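A rough sketch of approach II using a hand-rolled list instead of glib (the node type, the 4096-byte fgets buffer, and the function name are all assumptions):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node { char *line; struct node *next; };

char **read_lines_via_list(const char *path, size_t *out_n)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    struct node *head = NULL, *tail = NULL;
    char buf[4096];
    size_t n = 0;

    while (fgets(buf, sizeof buf, fp)) {       /* read lines, malloc'ing each one */
        struct node *nd = malloc(sizeof *nd);
        nd->line = malloc(strlen(buf) + 1);
        strcpy(nd->line, buf);
        nd->next = NULL;
        if (tail) tail->next = nd; else head = nd;
        tail = nd;
        n++;                                   /* count the total number of lines */
    }
    fclose(fp);

    char **lines = malloc(n * sizeof *lines);  /* the final array */
    size_t i = 0;
    for (struct node *nd = head; nd != NULL; ) {
        lines[i++] = nd->line;                 /* keep the line itself */
        struct node *next = nd->next;
        free(nd);                              /* free the list node, not its line */
        nd = next;
    }

    *out_n = n;
    return lines;
}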
Related
What is the best way to add a '\r' to my new-line sequence? I am writing a string to a file and doing various parsing on it, but according to my assignment spec a new line will be considered '\r\n'.
Right now I only have a newline at the end. I was thinking of a for loop and/or using memmove, but I'm not sure exactly how to make it work.
for (int x = 0; x < strlen(string); x++)
{
if (string[x] == '\n')
{
..............
}
}
The algorithm is something along the lines of:
Check whether the last two characters in string are already "\r\n"; if they are, return string.
Check whether the last character in string is either '\r' or '\n'; if so, set a flag.
Allocate strlen(string) + 2 bytes to hold the new string, if the flag is set, otherwise allocate strlen(string) + 3 bytes.
Calculate bytes to copy as strlen(string) - 1 if flag is set, otherwise strlen(string).
Copy that number of bytes from string to the allocated storage.
Append "\r\n\0" to the end of the bytes copied above.
Return the string just created in allocated storage.
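A rough sketch of that algorithm (the function name ensure_crlf is just illustrative; the caller gets back either the original string or a newly malloc'd one it must free):
#include <stdlib.h>
#include <string.h>

char *ensure_crlf(char *string)
{
    size_t len = strlen(string);

    /* Last two characters are already "\r\n": nothing to do. */
    if (len >= 2 && string[len - 2] == '\r' && string[len - 1] == '\n')
        return string;

    /* Does it end with a lone '\r' or '\n'? */
    int flag = (len >= 1 && (string[len - 1] == '\r' || string[len - 1] == '\n'));

    /* Bytes to copy, and a buffer sized for them plus "\r\n" and '\0'. */
    size_t copy = flag ? len - 1 : len;
    char *out = malloc(copy + 3);
    if (out == NULL)
        return NULL;

    memcpy(out, string, copy);
    memcpy(out + copy, "\r\n", 3);   /* the third byte copied is the '\0' */

    return out;                      /* string created in allocated storage */
}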
EDIT: Your mileage may vary if you can make assumptions about the state of string. If it resides in a large enough buffer to add characters to the end of it, no new allocation would be required. If your char type is not one byte, you would need to adjust accordingly.
My program takes in files with arbitrarily long lines. Since I don't know how many characters will be on a line, I would like to print the whole line to stdout without malloc-ing an array to store it. Is this possible?
I am aware that it's possible to print these lines one chunk at a time-- however, the function doing the printing would be called very often, and I wish to avoid the overhead of malloc-ing arrays that hold the output, in every single call.
First of all, you can't print something that doesn't exist, which means you have to store it somewhere, either on the stack or the heap. If you use FILE*, libc will do the buffering for you automatically.
Now if you use FILE*, you can use getc to get one character at a time, check whether the character is a newline, and push it to stdout.
If you're using a file descriptor, you can read one character at a time and do exactly the same thing.
Neither approach requires you to explicitly allocate memory on the heap.
Now if you use mmap, you can apply a strtok-family function and then print the string to stdout.
takes in files with arbitrarily long lines ... print the whole line to stdout, without malloc-ing an array to store it. Is this possible?
In general, for arbitrary long lines: no.
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. C11dr §7.21.2 2
The length of a line is not limited to SIZE_MAX, the longest array possible in C. The length of a line can exceed the memory capacity of the computer. There is just no way to read arbitrarily long lines in their entirety. Simple code could use the following; I doubt it will be satisfactory, yet it does print the entire contents of a file with scant memory.
// Reads one character at a time.
int ch;
while((ch = fgetc(fp)) != EOF) {
putchar(ch);
}
Instead, code should set a sane upper bound on line length, and create an array or allocate a buffer for the line. As useful as a flexible long line is, it is also susceptible to abuse: malicious input can consume unrestrained resources.
#define LINE_LENGTH_MAX 100000
char *line = malloc(LINE_LENGTH_MAX + 1);
if (line) {
    while (fgets(line, LINE_LENGTH_MAX + 1, fp)) {
        if (strlen(line) >= LINE_LENGTH_MAX) {
            Handle_Possible_Attack();
        }
        foo(line); // Use line
    }
    free(line);
}
I am reading a line from a file and I do not know how long it is going to be. I know there are ways to do this with pointers, but I am specifically asking for just a plain char string. For example, if I initialize the string like this:
char string[300]; //(or bigger)
Will having large string values like this be a problem?
Any hard coded number is potentially too small to read the contents of a file. It's best to compute the size at run time, allocate memory for the contents, and then read the contents.
See Read file contents with unknown size.
char string[300]; //(or bigger)
I am not sure which of the two issues you are concerned with, so I will try to address both below:
If the string in the file is larger than 300 bytes and you try to "stick" that string into that buffer without accounting for the maximum length of your array, you will get undefined behaviour from writing past the end of the array.
If you are just asking whether 300 bytes is too much to allocate - then no, it is not a big deal unless you are on some very restricted device. For example, in Visual Studio the default stack size (where that array would be stored) is 1 MB, if I am not wrong. The benefit of doing so is that you don't need to concern yourself with freeing it, etc.
PS. So if you are sure the buffer size you specify is enough - this can be a fine approach, as you free yourself from the memory-management issues that come with pointers and dynamic memory.
Will having large string values like this be a problem?
Absolutely.
If your application must read the entire line from a file before processing it, then you have two options.
1) Allocate buffer large enough to hold the line of maximum allowed length. For example, the SMTP protocol does not allow lines longer than 998 characters. In that case you can allocate a static buffer of length 1001 (998 + \r + \n + \0). Once you have read a line from a file (or from a client, in the example context) which is longer than the maximum length (that is, you have read 1000 characters and the last one is not \n), you can treat it as a fatal (protocol) error and report it.
2) If there are no limitations on the length of the input line, the only thing you can do to ensure your program's robustness is allocate buffers dynamically as the input is read. This may involve storing multiple malloc-ed buffers in a linked list, or calling realloc when buffer exhaustion is detected (this is how the getline function works, although it is not specified in the C standard, only in POSIX.1-2008).
In either case, never use gets to read the line. Call fgets instead.
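For illustration, a minimal sketch of the dynamic approach using POSIX getline() (the feature-test macro and the file name are assumptions):
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(void)
{
    FILE *fp = fopen("input.txt", "r");
    if (fp == NULL)
        return 1;

    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;

    while ((len = getline(&line, &cap, fp)) != -1)
        printf("read %ld bytes: %s", (long)len, line);

    free(line);
    fclose(fp);
    return 0;
}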
It all depends on how you read the line. For example:
char string[300];
FILE* fp = fopen(filename, "r");
//Error checking omitted
fgets(string, 300, fp);
Taken from tutorialspoint.com
The C library function char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.
That means this will read at most 299 characters from the file. This will cause only a logical error (because you might not get all the data you need); it won't cause any undefined behavior.
But, if you do:
char string[300];
int i = 0;
FILE* fp = fopen(filename, "r");
do {
    string[i] = fgetc(fp);
    i++;
} while (string[i - 1] != '\n');
This can cause a segmentation fault, because it will try to write past the end of the array for lines longer than 300 characters.
I am trying to read data from a text file and store it inside a structure that has one char pointer and an int variable.
While fetching data from the file, I know that there will be one string to fetch and one integer value.
I also know the position from where I have to start fetching.
What I don't know is the size of the string.
So, how can I allocate memory for that string?
Sample code is here :
struct filevalue
{
char *string;
int integer;
} value;
fseek(ptr, 18, SEEK_SET);         // seeking from start of file to the position where I get the string
fscanf(ptr, "%s", value.string);  // ptr is the file pointer
fseek(ptr, 21, SEEK_CUR);         // now seeking from the current position
fscanf(ptr, "%d", &value.integer);
Thanks in advance for your help.
Either
malloc the maximum possible length
read that much into the malloc'd block
figure out where the real end of the string is
write a \0 into your malloc'd block there so it behaves correctly as a nul-terminated string (and/or save the length too in case you need it)
optionally realloc your block to the correct size
Or
malloc a reasonable guesstimate N for the length
read that much
if you can't find the end of the string in that buffer:
grow the buffer with realloc to 2N (for example) and read the next N bytes into the end
goto 3
write a \0 etc. as above
You said in a comment that the max. string length is bounded, so the first approach is probably fine. You haven't said how you figure out where the string ends, but I'm assuming there is some delimiter, or it's right-filled with spaces, or something.
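A small sketch of that first approach, assuming the maximum length is known and that a space acts as the delimiter (the STRING_MAX constant and the helper name read_bounded_string are assumptions):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STRING_MAX 64                             /* assumed maximum possible length */

char *read_bounded_string(FILE *fp)
{
    char *buf = malloc(STRING_MAX + 1);           /* max possible length + '\0' */
    if (buf == NULL)
        return NULL;

    size_t got = fread(buf, 1, STRING_MAX, fp);   /* read that much */
    buf[got] = '\0';

    char *end = strchr(buf, ' ');                 /* figure out where the real end is */
    if (end != NULL)
        *end = '\0';                              /* nul-terminate the string there */

    char *shrunk = realloc(buf, strlen(buf) + 1); /* optionally shrink to fit */
    return shrunk ? shrunk : buf;
}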
Did you mean to use SEEK_CUR in your second fseek()? If so, then you know the length of the string. Use a fixed-size buffer.
If you know the position of the first structure, and the position of the second structure, you also know the total length of the first structure (position of second - position of first). You also know the size of the integer part of the structure, and therefore you can easily calculate the length of the string.
off_t pos1; /* Position of first structure */
off_t pos2; /* Position of second structure */
size_t struct_len = pos2 - pos1;
size_t string_len = struct_len - sizeof(int);
I assume you open the file in binary mode, since you use fseek.
You could read from the file using fgetc(). Since you don't know the size, just allocate a buffer with some initial size, like 100, then read char by char, placing them into the buffer. Monitor whether the buffer is large enough to hold the characters and, if not, realloc() the buffer to a larger size.
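A rough sketch of that idea (the initial size of 100 and the function name read_string_growing are just illustrative); it stops at a newline, but any other terminating condition works the same way:
#include <stdio.h>
#include <stdlib.h>

char *read_string_growing(FILE *fp)
{
    size_t cap = 100, len = 0;      /* initial size like 100 */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {                       /* buffer not large enough? */
            char *tmp = realloc(buf, cap *= 2);     /* grow it */
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
        }
        buf[len++] = (char)c;                       /* place the char into the buffer */
    }
    buf[len] = '\0';
    return buf;
}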
I'm using fgets to read lines from a .txt file. I'm passing an array as the first argument. Different lines fill different amounts of space in the array, but I want to know the exact length of the line that is read and make a decision based on that. Is it possible?
FILE * old;
old = fopen("m2p1.txt","r");
char third[100];
fgets(third,sizeof(third),old);
Now if I ask for sizeof(third), it's obviously 100 because I declared it so myself (I can't declare the 'third' array without specifying the size), but I need to get the exact size of the line read from the file (as it may not fill the entire array).
Is it possible? What should I do?
If fgets succeeds, it'll read a string into your buffer. Use strlen() to find its length.
char third[100];
if(fgets(third,sizeof(third),old) != NULL) {
size_t len = strlen(third);
..
}