Handle memory while reading long lines from a file in C - c

First of all, I know this question is very close to this topic, but the question was so poorly worded that I am not even sure it is a duplicate plus no code were shown so I thought it deserved to be asked properly.
I am trying to read a file line by line and I need to store a line in particular in a variable. I have managed to do so quite easily using fgets, nevertheless the size of the lines to be read and the number of lines in the file remain unknown.
I need a way to properly allocate memory to the variable whatever the size of the line might be, using C and not C++.
So far my code looks like that :
allowedMemory = malloc(sizeof(char[1501])); // Checks if enough memory
if (NULL == allowedMemory)
{
fprintf(stderr, "Not enough memory. \n");
exit(1);
}
else
char* res;
res = allowedMemory;
while(fgets(res, 1500, file)) // Iterate until end of file
{
if (res == theLineIWant) // Using strcmp instead of ==
return res;
}
The problem of this code is that it is not adaptable at all. I am looking for a way to allocate just enough memory to res so that I don't miss any data in line.
I was thinking about something like that :
while ( lineContainingKChar != LineContainingK+1Char) // meaning that the line has not been fully read
// And using strcmp instead of ==
realloc(lineContainingKChar, K + 100) // Adding memory
But I would need to iterate through two FILE object in order to fill these variables which would not be very efficient.
Any hints about how to implement this solution or advise about how to do it in a easier way would be appreciated.
EDIT : Seems like using getline() is the best way to do so because this function allocates the memory needed by itself and free it when needed. Nevertheless I don't think that it is 100% portable since I still can't use it though I have included <stdio.h>. To be verified though, since my issues are often situated between keyboard and computer. Until then I am still open to a solution which would not use POSIX-compliant C.

getline() appears to do exactly what you want:
DESCRIPTION
The getdelim() function shall read from stream until it encounters a
character matching the delimiter character.
...
The getline() function shall be equivalent to the getdelim()
function with the delimiter character equal to the <newline>
character.
...
EXAMPLES
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE *fp;
char *line = NULL;
size_t len = 0;
ssize_t read;
fp = fopen("/etc/motd", "r");
if (fp == NULL)
exit(1);
while ((read = getline(&line, &len, fp)) != -1) {
printf("Retrieved line of length %zu :\n", read);
printf("%s", line);
}
if (ferror(fp)) {
/* handle error */
}
free(line);
fclose(fp);
return 0;
}
And per the Linux man page:
DESCRIPTION
getline() reads an entire line from stream, storing the address of
the buffer containing the text into *lineptr. The buffer is null-
terminated and includes the newline character, if one was found.
If *lineptr is set to NULL and *n is set 0 before the call, then
getline() will allocate a buffer for storing the line. This buffer
should be freed by the user program even if getline() failed.
Alternatively, before calling getline(), *lineptr can contain a
pointer to a malloc(3)-allocated buffer *n bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
In either case, on a successful call, *lineptr and *n will be
updated to reflect the buffer address and allocated size respectively.

Related

How to read a line in a file, without know how long is (C language)

i have a file("career.txt") where each line are composed in this way:
serial number(long int) name(string) surname(string) exam_id(string) exam_result(string);
each line could have from 1 to 25 couple composed by the exam_id(ex: INF070) and the exam_result(ex:30L).
Ex line:
333145 Name Surname INF120 24 INF070 28 INF090 R INF100 30L INF090 24
33279 Name Surname GIU123 28 GIU280 27 GIU085 21 GIU300 R
I don't know how many couple there are in one line(there could be 1, 5 or 25)(so i can't use a for cycle) and i have to read it.
How can i read the entire line?
I tried use getline and fgets, and then divide the string with strtok, but I don't know why it doesn't work.
If you don't know the maximum number of characters in a line of the text file, one way to read the entire line is to dynamically allocate memory for the buffer as you read the file.
You can use the getline function to do this which is a POSIX function.
Here is an example of how you can use getline to read a line from a file:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *buffer = NULL;
size_t bufsize = 0;
ssize_t characters;
FILE *file = fopen("file.txt", "r");
if (file == NULL) {
printf("Could not open file\n");
return 1;
}
while ((characters = getline(&buffer, &bufsize, file)) != -1) {
printf("Line: %s", buffer);
}
free(buffer);
fclose(file);
return 0;
}
The getline function works by allocating memory for the buffer automatically, so you don't have to specify a fixed buffer size. It takes three arguments: a pointer to a pointer to a character (in this case, &buffer), a pointer to a size_t variable (in this case, &bufsize), and a pointer to a FILE object representing the file you want to read from. The function reads a line from the file and stores it in the buffer, dynamically allocating more memory if necessary. The getline function returns the number of characters read, which includes the newline character at the end of the line, or -1 if it reaches the end of the file. And in the example above, each line is read and printed. And it is important to release the memory after finished processing the buffer, in this case by free(buffer).

Difference between specifications of fread and fgets?

What is the difference between fread and fgets when reading in from a file?
I use the same fwrite statement, however when I use fgets to read in a .txt file it works as intended, but when I use fread() it does not.
I've switched from fgets/fputs to fread/fwrite when reading from and to a file. I've used fopen(rb/wb) to read in binary rather than standard characters. I understand that fread will get /0 Null bytes as well rather than just single lines.
//while (fgets(buff,1023,fpinput) != NULL) //read in from file
while (fread(buff, 1, 1023, fpinput) != 0) // read from file
I expect to read in from a file to a buffer, put the buffer in shared memory, and then have another process read from shared memory and write to a new file.
When I use fgets() it works as intended with .txt files, but when using fread it adds a single line from 300~ characters into the buffer with a new line. Can't for the life of me figure out why.
fgets will stop when encountering a newline. fread does not. So fgets is typically only useful for text files, while fread can be used for both text and binary files.
From the C11 standard:
7.21.7.2 The fgets function
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
7.21.8.1 The fread function
The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.
This snippet maybe will make things clearer for you. It just copies a file in chunks.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char ** argv)
{
if(argc != 3) {
printf("Usage: ./a.out src dst\n");
printf("Copies file src to dst\n");
exit(EXIT_SUCCESS);
}
const size_t chunk_size = 1024;
FILE *in, *out;
if(! (in = fopen(argv[1], "rb"))) exit(EXIT_FAILURE);
if(! (out = fopen(argv[2], "wb"))) exit(EXIT_FAILURE);
char * buffer;
if(! (buffer = malloc(chunk_size))) exit(EXIT_FAILURE);
size_t bytes_read;
do {
// fread returns the number of successfully read elements
bytes_read = fread(buffer, 1, chunk_size, in);
/* Insert any modifications you may */
/* want to do here */
// write bytes_read bytes from buffer to output file
if(fwrite(buffer, 1, bytes_read, out) != bytes_read) exit(EXIT_FAILURE);
// When we read less than chunk_size we are either done or an error has
// occured. This error is not handled in this program.
} while(bytes_read == chunk_size);
free(buffer);
fclose(out);
fclose(in);
}
You mentioned in a comment below that you wanted to use this for byteswapping. Well, you can just use the following snippet. Just insert it where indicated in code above.
for(int i=0; i < bytes_read - bytes_read%2; i+=2) {
char tmp = buffer[i];
buffer[i] = buffer[i+1];
buffer[i+1] = tmp;
}

How to avoid calling fopen() with a buffer that is not null-terminated in C?

Let's look at this example:
static FILE *open_file(const char *file_path)
{
char buf[80];
size_t n = snprintf(buf, sizeof (buf), "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}
Here, the assert() is off-by-one. From the manpage for snprintf:
"Upon successful return, these functions return the number of characters printed (excluding the null byte used to end output to strings)."
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0. This will cause a problem because fopen() assumes it is null terminated.
What is the best way to prevent this?
So, if it returns 80, then the string will fill the buffer, and won't be terminated by \0
That is incorrect: the string would be null-terminated no matter what you pass for file_path. Obviously, the string would be cut off at the sizeof(buf)-1.
Note that snprintf could return a number above 80 as well. This would mean that the string you wanted to print was longer than the buffer you have provided.
What is the best way to prevent this?
You are already doing it: the assert is not necessary for preventing unterminated strings. You can use the return value to decide if any truncation has happened, and pass a larger buffer to compensate:
// Figure out the size
size_t n = snprintf(NULL, 0, "%s", file_path);
// Allocate the buffer and print into it
char *tmpBuf = malloc(n+1);
snprintf(tmpBuf, n+1, "%s", file_path);
// Prepare the file to return
FILE *res = fopen(tmpBuf, "r");
// Free the temporary buffer
free(tmpBuf);
return res;
What is the best way to prevent this?
Simple, don't give it a non-null terminated string. Academic questions aside, you are in control of the code you write. You don't have to protect against yourself sabotaging the project in every conceivable way, you just have to not sabotage yourself.
If everyone checked and double checked everything in code, the performance loss would be incredible. There's a reason why fopen doesn't do it.
There are a couple of issues here.
First assert() is used to catch issues as a part of designer testing. It is not meant to be used in production code.
Secondly if the file path is not complete then do you really want to call fopen()?
Normally what is done is to add one to the expected number of characters.
static FILE *open_file(const char *file_path)
{
char buf[80 + 1] = {0};
size_t n = snprintf(buf, 80, "%s", file_path);
assert(n < sizeof (buf));
return fopen(buf, "r");
}

Invalid Argument Reported By getdelim

I'm trying to use the getdelim function to read an entire text file's contents into a string.
Here is the code I am using:
ssize_t bytesRead = getdelim(&buffer, 0, '\0', fp);
This is failing however, with strerror(errno) saying "Error: Invalid Argument"
I've looked at all the documentation I could and just can't get it working, I've tried getline which does work but I'd like to get this function working preferably.
buffer is NULL initialised as well so it doesn't seem to be that
fp is also not reporting any errors and the file opens perfectly
EDIT: My implementation is based on an answer from this stackoverflow question Easiest way to get file's contents in C
Kervate, please enable compiler warnings (-Wall for gcc), and heed them. They are helpful; why not accept all the help you can get?
As pointed out by WhozCraig and n.m. in comments to your original question, the getdelim() man page shows the correct usage.
If you wanted to read records delimited by the NUL character, you could use
FILE *input; /* Or, say, stdin */
char *buffer = NULL;
size_t size = 0;
ssize_t length;
while (1) {
length = getdelim(&buffer, &size, '\0', input);
if (length == (ssize_t)-1)
break;
/* buffer has length chars, including the trailing '\0' */
}
free(buffer);
buffer = NULL;
size = 0;
if (ferror(input) || !feof(input)) {
/* Error reading input, or some other reason
* that caused an early break out of the loop. */
}
If you want to read the contents of a file into a single character array, then getdelim() is the wrong function.
Instead, use realloc() to dynamically allocate and grow the buffer, appending to it using fread(). To get you started -- this is not complete! -- consider the following code:
FILE *input; /* Handle to the file to read, assumed already open */
char *buffer = NULL;
size_t size = 0;
size_t used = 0;
size_t more;
while (1) {
/* Grow buffer when less than 500 bytes of space. */
if (used + 500 >= size) {
size_t new_size = used + 30000; /* Allocate 30000 bytes more. */
char *new_buffer;
new_buffer = realloc(buffer, new_size);
if (!new_buffer) {
free(buffer); /* Old buffer still exists; release it. */
buffer = NULL;
size = 0;
used = 0;
fprintf(stderr, "Not enough memory to read file.\n");
exit(EXIT_FAILURE);
}
buffer = new_buffer;
size = new_size;
}
/* Try reading more data, as much as fits in buffer. */
more = fread(buffer + used, 1, size - used, input);
if (more == 0)
break; /* Could be end of file, could be error */
used += more;
}
Note that the buffer in this latter snippet is not a string. There is no terminating NUL character, so it's just an array of chars. In fact, if the file contains binary data, the array may contain lots of NULs (\0, zero bytes). Assuming there was no error and all of the file was read (you need to check for that, see the former example), buffer contains used chars read from the file, with enough space allocated for size. If used > 0, then size > used. If used == 0, then size may or may not be zero.
If you want to turn buffer into a string, you need to decide what to do with the possibly embedded \0 bytes -- I recommend either convert to e.g. spaces or tabs, or move the data to skip them altogether --, and add the string-terminating \0 at end to make it a valid string.

How to copy one line from long string C

I'm looking to copy the FIRST line from a LONG string P into a buffer
I have no idea how to make it.
while (*pros_id != '/n'){
*pros_id_line=*pros_id;
pros_id++;
pros_id_line++;
}
And tried
fgets(pros_id_line, sizeof(pros_id_line), pros_id);
Both are not working. Can I get some help please?
Note, as Adriano Repetti pointed out in a comment and an answer, that the newline character is '\n' and not '/n'.
Your initial code can be fixed up to work, provided that the destination buffer is big enough:
while (*pros_id != '\n' && *pros_id != '\0')
*pros_id_line++ = *pros_id++;
*pros_id_line = '\0';
This code does not include the newline in the copied buffer; it is easy enough to add it if you need it.
One advantage of this code is that it makes a single pass through the data up to the newline (or end of string). An alternative makes two passes through the data, one to find the newline and another to copy to the newline:
if ((end = strchr(pros_id, '\n')) != 0)
{
memmove(pros_id_line, pros_id, end - pros_id);
pros_id_line[end - pros_id] = '\0';
}
This ensures that the string is null-terminated; again, it omits the newline, and assumes there is enough space in the pros_id_line buffer for the data. You have to decide what is the correct behaviour when there is no newline in the buffer. It might be sufficient to copy the buffer without the newline into the target area, or you might prefer to report a problem.
You can use strncpy() instead of memmove() but it has a more complex loop condition than memmove() — it has to check for a null byte as well as the count, whereas memmove() only has to check the count. You can use memcpy() instead of memmove() if you're sure there's no overlap between source and target, but memmove() always works and memcpy() sometimes doesn't (though only when the source and target areas overlap), and I prefer reliability over possible misbehaviour.
Note that setting a buffer to zero before copying a string to it is a waste of energy. The parts that you're about to overwrite with data didn't need to be zeroed. The parts that you aren't going to overwrite with data didn't need to be zeroed either. You should know exactly which byte needs to be zeroed, so why waste the time on zeroing anything except the one byte that needs to be zeroed?
(One exception to this is if you are dealing with sensitive data and are concerned that some function that your code will call may deliberately read beyond the end of the string and come across parts of a password or other sensitive data. Then it may be appropriate to wipe the memory before writing new data to it. On the whole, though, most people aren't writing such code.)
New line is \n not /n anyway I'd use strchar for this:
char* endOfFirstLine = strchr(inputString, '\n');
if (endOfFirstLine != NULL)
{
strncpy(yourBuffer, inputString,
endOfFirstLine - inputString);
}
else // Input is one single line
{
strcpy(yourBuffer, inputString);
}
With inputString as your char* multiline string and inputBuffer (assuming it's big enough to contain all data from inputString and it has been zeroed) as your required output (first line of inputString).
If you're going to be doing a lot of reading from long text buffers, you could try using a memory stream, if you system supports them: https://www.gnu.org/software/libc/manual/html_node/String-Streams.html
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
static char buffer[] = "foo\nbar";
int
main()
{
char arr[100];
FILE *stream;
stream = fmemopen(buffer, strlen(buffer), "r");
fgets(arr, sizeof arr, stream);
printf("First line: %s\n", arr);
fgets(arr, sizeof arr, stream);
printf("Second line: %s\n", arr);
fclose (stream);
return 0;
}
POSIX 2008 (e.g. most Linux systems) has getline(3) which heap-allocates a buffer for a line.
So you could code
FILE* fil = fopen("something.txt","r");
if (!fil) { perror("fopen"); exit(EXIT_FAILURE); };
char *linebuf=NULL;
size_t linesiz=0;
if (getline(&linebuf, &linesiz, fil) {
do_something_with(linebuf);
}
else { perror("getline"; exit(EXIT_FAILURE); }
If you want to read an editable line from stdin in a terminal consider GNU readline.
If you are restricted to pure C99 code you have to do the heap allocation yourself (malloc or calloc or perhaps -with care- realloc)
If you just want to copy the first line of some existing buffer char*bigbuf; which is non-NULL, valid, and zero-byte terminated:
char*line = NULL;
char *eol = strchr(bigbuf, '\n');
if (!eol) { // bigbuf is a single line so duplicate it
line = strdup(bigbuf);
if (!line) { perror("strdup"); exit(EXIT_FAILURE); }
} else {
size_t linesize = eol-bugbuf;
line = malloc(linesize+1);
if (!line) { perror("malloc"); exit(EXIT_FAILURE);
memcpy (line, bigbuf, linesize);
line[linesize] = '\0';
}

Resources