Problem
I am currently writing a small (and bad) grep-like program for Windows. In it I want to read files line by line and print out the ones which contain a key. For this to work I need a function which reads each line of a file. Since I am not on Linux I cannot use the getline function and have to implement it myself.
I have found an SO answer where such a function is implemented. I tried it out and it works fine for 'normal' text files. But the program crashes if I try to read a file with a line length of 13 000 characters.
MCVE
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

char * getline(FILE *f)
{
    size_t size = 0;
    size_t len = 0;
    size_t last = 0;
    char *buf = NULL;

    do {
        size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
        buf = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */
        /* Actually do the read. Note that fgets puts a terminal '\0' on the
           end of the string, so we make sure we overwrite this */
        if (buf == NULL) return NULL;
        fgets(buf + last, size, f);
        len = strlen(buf);
        last = len - 1;
    } while (!feof(f) && buf[last] != '\n');
    return buf;
}

int main(int argc, char *argv[])
{
    FILE *file = fopen(argv[1], "r");
    if (file == NULL)
        return 1;

    while (!feof(file))
    {
        char *line = getline(file);
        if (line != NULL)
        {
            printf("%s", line);
            free(line);
        }
    }
    return 0;
}
This is the file I am using. It contains three short lines, which are read just fine, and one long line taken from one of my Qt projects. When reading this long line, the getline function reallocates twice, up to a size of 1024, and crashes on the third reallocation. I've put printf calls around the realloc to make sure it crashes there, and it definitely does.
Question
Could anyone explain to me why my program is crashing like this? I have just spent hours on it and don't know what to do anymore.
In this fragment
size += BUFSIZ;
buf = realloc(buf, size);
if (buf == NULL) return NULL;
fgets(buf + last, size, f);
you add BUFSIZ to size and allocate that much, but then you tell fgets to read up to that same – increased! – size. In effect, each iteration may read more characters than the space that remains in the buffer. The first time around, size = BUFSIZ and you read at most BUFSIZ characters, which is fine. If the line is longer than that (the last character read is not \n), you grow the allocation by BUFSIZ (size += BUFSIZ), but you also pass the new total size to fgets again – even though the write now starts at offset last, well past the beginning of the buffer.
The allocated memory grows by BUFSIZ per iteration, but so does the amount fgets is allowed to read: after one loop it is BUFSIZ, after two loops 2*BUFSIZ, and so on, while the free space at the end of the buffer stays at roughly BUFSIZ. Eventually fgets writes past the allocation, something important gets overwritten, and the program is terminated.
If you only ever ask fgets for chunks of exactly BUFSIZ bytes, this should work.
Note that your code expects the last line to end with an \n, which may not always be true. You can catch this with an additional test:
if (!fgets(buf + last, size, f))
break;
so your code won't be trying to read past the end of the input file.
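Putting both fixes together, a minimal corrected sketch might look like this (same headers as the MCVE above; the function name and error handling are just illustrative, and embedded '\0' bytes in the input are still not handled):

char *getline_fixed(FILE *f)
{
    size_t size = 0;
    size_t last = 0;
    char *buf = NULL;

    do {
        size += BUFSIZ;
        char *tmp = realloc(buf, size);   /* keep the old block if realloc fails */
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        /* read at most BUFSIZ bytes into the newly grown tail of the buffer */
        if (fgets(buf + last, BUFSIZ, f) == NULL) {
            if (last == 0) {              /* nothing read at all: EOF or error */
                free(buf);
                return NULL;
            }
            break;                        /* last line had no trailing '\n' */
        }
        last += strlen(buf + last);
    } while (buf[last - 1] != '\n');

    return buf;
}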
Related
This is my first time asking a question here. I'm currently learning C and Linux at the same time. I'm working on a simple C program that uses system calls only to read and write files. My problem now is: how can I read the file and compare whether the strings/words are the same or not? An example looks like this:
foo.txt contains:
hi
bye
bye
hi
hi
And bar.txt is empty.
After I do:
./myuniq foo.txt bar.txt
The result in bar.txt will be like:
hi
bye
hi
The result should be just like when we use uniq in Linux.
Here is my code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#define LINE_MAX 256
int main(int argc, char * argv[]){
int wfd,rfd;
size_t n;
char temp[LINE_MAX];
char buf[LINE_MAX];
char buf2[LINE_MAX];
char *ptr=buf;
if(argc!=3){
printf("Invalid useage: ./excutableFileName readFromThisFile writeToThisFile\n");
return -1;
}
rfd=open(argv[1], O_RDONLY);
if(rfd==-1){
printf("Unable to read the file\n");
return -1;
}
wfd=open(argv[2], O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
if(wfd==-1){
printf("Unable to write to the file\n");
return -1;
}
while(n = read(rfd,buf,LINE_MAX)){
write(wfd,buf,n);
}
close(rfd);
close(wfd);
return 0;
}
The code above does the reading and writing with no issue. But I can't really figure out how to read the characters one by one as a C-style string, or what condition the while loop should use.
I do know that I may need a pointer to travel inside buf to find the next newline '\n', with something like:
while(condi){
if(*ptr == '\n'){
strcpy(temp, buf);
strcpy(buf, buf2);
strcpy(buf2, temp);
}
else
write(wfd,buf,n);
*ptr++;
}
But I might be wrong, since I can't get it to work. Any feedback would help. Thank you.
And again, this program may only use system calls. I do know there is an easier way using FILE and fgets or something else to get this done, but that's not allowed here.
You only need one buffer that stores whatever the previous line contained.
The way this works for the current line is that before you add a character you test whether what you're adding is the same as what's already in there. If it's different, then the current line is marked as unique. When you reach the end of the line, you then know whether to output the buffer or not.
Implementing the above idea using standard input for simplicity (but it doesn't really matter how you read your characters):
int len = 0;
int dup = 0;

for (int c; (c = fgetc(stdin)) != EOF; )
{
    // Check for duplicate and store
    if (dup && buf[len] != c)
        dup = 0;
    buf[len++] = c;

    // Handle end of line
    if (c == '\n')
    {
        buf[len] = '\0';              // terminate so the buffer is a valid string
        if (!dup) printf("%s", buf);  // print the line only if it is NOT a repeat
        len = 0;
        dup = 1;
    }
}
See here that we use the dup flag to represent whether a line is duplicated. For the first line, clearly it is not, and all subsequent lines start off with the assumption they are duplicates. Then the only possibility is to remain a duplicate or be detected as unique when one character is different.
The comparison before store is actually avoiding tests against uninitialized buffer values too, by way of short-circuit evaluation. That's all managed by the dup flag -- you only test if you know the buffer is already good up to this point:
if (dup && buf[len] != c)
dup = 0;
That's basically all you need. Now, you should definitely add some sanity to prevent buffer overflow. Or you may wish to use a dynamic buffer that grows as necessary.
An entire program that operates on the standard I/O streams and handles arbitrary-length lines might look like this:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    size_t capacity = 15, len = 0;
    char *buf = malloc(capacity);

    for (int c, dup = 0; (c = fgetc(stdin)) != EOF || len > 0; )
    {
        // Grow buffer
        if (len == capacity)
        {
            capacity = (capacity * 2) + 1;
            char *newbuf = realloc(buf, capacity);
            if (!newbuf) break;
            buf = newbuf;
            dup = 0;
        }

        // NUL-terminate end of line, update duplicate-flag and store
        if (c == '\n' || c == EOF)
            c = '\0';
        if (dup && buf[len] != c)
            dup = 0;
        buf[len++] = c;

        // Output line if not a duplicate, and reset
        if (!c)
        {
            if (!dup)
                printf("%s\n", buf);
            len = 0;
            dup = 1;
        }
    }
    free(buf);
}
Demo here: https://godbolt.org/z/GzGz3nxMK
If you must use the read and write system calls, you will have to build an abstraction around them, as they have no notion of lines, words, or characters. Semantically, they deal purely with bytes.
Reading arbitrarily-sized chunks of the file would require us to sift through looking for line breaks. This would mean tokenizing the data in our buffer, as you have somewhat shown. A problem occurs when our buffer ends with a partial line. We would need to make adjustments so our next read call concatenates the rest of the line.
To keep things simple, instead, we might consider reading the file one byte at a time.
A decent (if naive) way to begin is by essentially reimplementing the rough functionality of fgets. Here we read a single byte at a time into our buffer, at the current offset. We end when we find a newline character, or when we would no longer have enough room in the buffer for the null-terminating character.
Unlike fgets, here we return the length of our string.
size_t read_a_line(char *buf, size_t bufsize, int fd)
{
    size_t offset = 0;

    while (offset < (bufsize - 1) && read(fd, buf + offset, 1) > 0)
        if (buf[offset++] == '\n')
            break;

    buf[offset] = '\0';
    return offset;
}
To mimic uniq, we can create two buffers, as you have, but initialize their contents to empty strings. We take two additional pointers to manipulate later.
char buf[LINE_MAX] = { 0 };
char buf2[LINE_MAX] = { 0 };
char *flip = buf;
char *flop = buf2;
After opening our files for reading and writing, our loop begins. We continue this loop as long as we read a nonzero-length string.
If our current string does not match our previously read string, we write it to our output file. Afterwards, we swap our pointers. On the next iteration, from the perspective of our pointers, the secondary buffer now contains the previous line, and the primary buffer is overwritten with the current line.
Again, note that our initial previously read line is the empty string.
size_t length;

while ((length = read_a_line(flip, LINE_MAX, rfd))) {
    if (0 != strcmp(flip, flop))
        write(wfd, flip, length);
    swap_two_pointers(&flip, &flop);
}
Our pointer swapping function.
void swap_two_pointers(char **a, char **b) {
    char *t = *a;
    *a = *b;
    *b = t;
}
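For completeness, one possible way to assemble the pieces above into a full program (the usage message, the O_TRUNC flag, and the minimal error handling are assumptions made here, not part of the original):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define LINE_MAX 256

/* read_a_line() and swap_two_pointers() as defined above */

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s infile outfile\n", argv[0]);
        return -1;
    }

    int rfd = open(argv[1], O_RDONLY);
    int wfd = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
    if (rfd == -1 || wfd == -1)
        return -1;

    char buf[LINE_MAX] = { 0 };
    char buf2[LINE_MAX] = { 0 };
    char *flip = buf;
    char *flop = buf2;

    size_t length;
    while ((length = read_a_line(flip, LINE_MAX, rfd))) {
        if (0 != strcmp(flip, flop))
            write(wfd, flip, length);
        swap_two_pointers(&flip, &flop);
    }

    close(rfd);
    close(wfd);
    return 0;
}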
Some notes:
The contents of our file-to-be-read should never contain a line that exceeds LINE_MAX (including the newline character). We do not handle this situation, which is admittedly a sidestep, as this is the very problem we wanted to avoid with the chunking method.
read_a_line should not be passed NULL as its first argument or 0 as its second. It is an exercise for the reader to figure out why.
read_a_line does not really handle read failing in the middle of a line.
The fgets() function has two problems. The first is that, if the size of the line is longer than that of the passed buffer, the line is truncated. The second is that, if the line read from the file has embedded '\0' characters, then there is no way to know the actual length of the line. I would like to get a replacement for fgets() that dynamically allocates the space for the line read and also provides the size of the line read. I have written the code for dynamically allocating the space. I am unable to figure out how to get the size of the line read. I am a beginner. Thank you so much.
#include <stdio.h>
#include <stdlib.h>
#include <error.h>
#include <errno.h>
char *myfgets(FILE *fptr, int *size);
char *myfgets(FILE *fptr, int *size) {
char *buffer;
char *ret;
buffer = (char *)malloc((*size) * sizeof(char));
if (buffer == NULL)
error(1, 0, "No memory available\n");
ret = fgets(buffer, *size, fptr);
if (ret == NULL)
error(1, 0, "Error in reading the file\n");
return ret;
}
int main(int argc, char *argv[]) {
char *file;
FILE *fptr;
int size;
char *result;
if (argc != 3)
error(1, 0, "Too many or few arguments <File_name>, <Number of bytes to read>\n");
file = argv[1];
size = atoi(argv[2]);
fptr = fopen(file, "r");
if (fptr == NULL)
error(1, 0, "Error in opening the file\n");
result = myfgets(fptr, &size);
printf("The line read is :%s", result);
free(result);
return 0;
}
Use getline(3) to read a complete line of unknown length. It allocates memory as needed to hold it all.
The function can deal with 0 bytes in the line being read too. From the linked man page (emphasis added):
On success, getline() and getdelim() return the number of characters read, including the delimiter character, but not including the terminating null byte ('\0'). This value can be used to handle embedded null bytes in the line read.
So you just have to save its return value instead of using strlen().
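A minimal sketch of that idea (the feature-test macro is what glibc wants; reading from standard input is just for illustration):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *line = NULL;
    size_t cap = 0;

    ssize_t len = getline(&line, &cap, stdin);  /* length including the '\n' */
    if (len >= 0) {
        /* strlen() stops at the first embedded '\0'; the return value does not */
        printf("getline read %zd bytes, strlen reports %zu\n", len, strlen(line));
        fwrite(line, 1, (size_t)len, stdout);   /* writes embedded '\0' bytes too */
    }

    free(line);
    return 0;
}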
You have correctly identified 2 issues in fgets(), but your proposed alternative does not address either of them as you still call fgets().
You should write a loop, calling getc() repeatedly until you get EOF or '\n' and you would store the bytes read into an allocated array, reallocating as needed.
Here is a simplistic version:
// Read a full line from `fptr`
// - return `NULL` at end of file or upon read error like `fgets()`.
// - otherwise return a pointer to an allocated array containing the
// characters read, up to and including the newline and a null terminator.
// - store the number of bytes read into *plength.
// - the buffer is null terminated, and it may contain embedded null bytes
// if such bytes were read from the file
char *myfgets(FILE *fptr, size_t *plength) {
    size_t length = 0;
    char *buffer = NULL, *newp;
    int c;

    for (;;) {
        if ((c = getc(fptr)) == EOF) {
            if (!feof(fptr)) {
                /* read error: discard data read so far and return NULL */
                free(buffer);
                buffer = NULL;
                length = 0;
            }
            break;
        }
        if ((newp = realloc(buffer, length + 2)) == NULL) {
            free(buffer);
            error(1, 0, "Out of memory for realloc\n");
            return NULL;
        }
        buffer = newp;
        buffer[length] = c;
        length++;
        if (c == '\n')
            break;
    }
    if (length != 0) {
        buffer[length] = '\0';
    }
    *plength = length;
    return buffer;
}
Various approaches for a "fixed" fgets():
1) Use getline(), which is not part of the standard C library, as suggested by @Shawn. It is commonly available on *nix and its source code is easy enough to find. It unfortunately obliges a new type: ssize_t.
2) Roll your own getc() loop as @chqrlie did. Corner cases can be tricky.
3) Repeatedly call fgets() as needed. Pre-fill the buffer with '\n' and look for the first occurrence of '\n'; its position and the next character help determine the length. (There are only a few cases to consider.)
4) Repeatedly call scanf("%99[^\n]%n", buf100, &n) and getc() for the '\n' as needed. Look at the return value and n to determine the length.
5) Likely others.
A good functional test of the design is how well it reports these cases:
Happy path: a line was read, memory allocated, no problems.
End-of-file: Nothing read due to end of file.
Out-of-memory.
Input error occurred.
Other considerations:
Do you really want to save a '\n'?
Performance.
As for me: "dynamically allocates the space" with no limit gives a nefarious user the ability to overwhelm memory resources by entering a pathologically long line. Rather than give a user that ability, I recommend limiting input to a sane bound. Excessively long input is an attack that should be detected, not enabled.
So I would start with
char *myfgets(FILE *fptr, size_t limit, size_t *size) {
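One hedged way that signature might be fleshed out, as a sketch only: the choice here to treat an over-limit line as an error (discarding the data and returning NULL) is an assumption, not the only reasonable policy.

/* needs <stdio.h> and <stdlib.h> */
char *myfgets(FILE *fptr, size_t limit, size_t *size) {
    size_t length = 0;
    char *buffer = NULL;
    int c;

    while ((c = getc(fptr)) != EOF) {
        if (length >= limit) {          /* pathologically long line: refuse it */
            free(buffer);
            return NULL;
        }
        char *tmp = realloc(buffer, length + 2);  /* room for c and a '\0' */
        if (tmp == NULL) {
            free(buffer);
            return NULL;
        }
        buffer = tmp;
        buffer[length++] = (char)c;
        if (c == '\n')
            break;
    }

    if (buffer)
        buffer[length] = '\0';
    if (size)
        *size = length;                 /* true length, even with embedded '\0' */
    return buffer;
}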
I have this C assignment and I am struggling a bit at this specific point. I have some background in C, but pointers and dynamic memory management still elude me very much.
The assignment asks us to write a program which would simulate the behaviour of the "uniq" command / filter in UNIX.
But the problem I am having is with the C library functions getline or getdelim (we need to use those functions according to the implementation specifications).
According to the specification, the user input might contain arbitrary amount of lines and each line might be of arbitrary length (unknown at compile-time).
The problem is, the following line for the while-loop
while (cap = getdelim(stream.linesArray, size, '\n', stdin))
compiles and "works" somehow when I leave it like that. What I mean by this is that, when I execute the program, I enter arbitrary amount of lines of arbitrary length per each line and the program does not crash - but it keeps looping unless I stop the program execution (whether the lines are correctly stored in " char **linesArray; " are a different story I am not sure about.
I would like to be able to do is something like
while ((cap = getdelim(stream.linesArray, size, '\n', stdin)) && (cap != -1))
so that when getdelim does not read any characters on some line (besides EOF or \n) - i.e. the first time the user enters an empty line - the program stops taking more lines from stdin.
(and then print the lines that were stored in stream.linesArray by getdelim).
The problem is, when I make the change mentioned above and execute the program, it gives me a "Segmentation fault", and frankly I don't know why or how to fix it (I have tried so many times to no avail).
For reference:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdelim.html
https://en.cppreference.com/w/c/experimental/dynamic/getline
http://man7.org/linux/man-pages/man3/getline.3.html
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DEFAULT_SIZE 20
typedef unsigned long long int ull_int;
typedef struct uniqStream
{
char **linesArray;
ull_int lineIndex;
} uniq;
int main()
{
uniq stream = { malloc(DEFAULT_SIZE * sizeof(char)), 0 };
ull_int cap, i = 0;
size_t *size = 0;
while ((cap = getdelim(stream.linesArray, size, '\n', stdin))) //&& (cap != -1))
{
stream.lineIndex = i;
//if (cap == -1) { break; }
//print("%s", stream.linesArray[i]);
++i;
if (i == sizeof(stream.linesArray))
{
stream.linesArray = realloc(stream.linesArray, (2 * sizeof(stream.linesArray)));
}
}
ull_int j;
for (j = 0; j < i; ++j)
{
printf("%s\n", stream.linesArray[j]);
}
free(stream.linesArray);
return 0;
}
Ok, so the intent is clear - use getdelim to store the lines inside an array. getline itself uses dynamic allocation. The manual is quite clear about it:
getline() reads an entire line from stream, storing the address of the
buffer containing the text into *lineptr. The buffer is
null-terminated and includes the newline character, if one was found.
The getline() "stores the address of the buffer into *lineptr". So lineptr has to be a valid pointer to a char * variable (read that twice).
*lineptr and *n will be updated
to reflect the buffer address and allocated size respectively.
Also n needs to be a valid(!) pointer to a size_t variable, so the function can update it.
Also note that the lineptr buffer:
This buffer should be freed by the user program even if getline() failed.
So what do we do? We need an array of pointers to strings. Because I don't like becoming a three-star programmer, I use structs. I modified your code a bit and added some checks. You'll have to excuse me, I don't like typedefs, so I don't use them. I renamed uniq to struct lines_s:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
struct line_s {
char *line;
size_t len;
};
struct lines_s {
struct line_s *lines;
size_t cnt;
};
int main() {
struct lines_s lines = { NULL, 0 };
// loop breaks on error or feof(stdin)
while (1) {
char *line = NULL;
size_t size = 0;
// we pass a pointer to a `char*` variable
// and a pointer to `size_t` variable
// `getdelim` will update the variables inside it
// the initial values are NULL and 0
ssize_t ret = getdelim(&line, &size, '\n', stdin);
if (ret < 0) {
// check for EOF
if (feof(stdin)) {
// EOF found - break
break;
}
fprintf(stderr, "getdelim error %zd!\n", ret);
abort();
}
// new line was read - add it to out container "lines"
// always handle realloc separately
void *ptr = realloc(lines.lines, sizeof(*lines.lines) * (lines.cnt + 1));
if (ptr == NULL) {
// note that lines.lines is still a valid pointer here
fprintf(stderr, "Out of memory\n");
abort();
}
lines.lines = ptr;
lines.lines[lines.cnt].line = line;
lines.lines[lines.cnt].len = size;
lines.cnt += 1;
// break if the line is "stop"
if (strcmp("stop\n", lines.lines[lines.cnt - 1].line) == 0) {
break;
}
}
// iterate over lines
for (size_t i = 0; i < lines.cnt; ++i) {
// note that the line has a newline in it
// so no additional newline is needed in this printf
printf("line %zu is %s", i, lines.lines[i].line);
}
// getdelim returns dynamically allocated strings
// we need to free them
for (size_t i = 0; i < lines.cnt; ++i) {
free(lines.lines[i].line);
}
free(lines.lines);
}
For such input:
line1 line1
line2 line2
stop
will output:
line 0 is line1 line1
line 1 is line2 line2
line 2 is stop
Tested on onlinegdb.
Notes:
if (i == sizeof(stream.linesArray)) - sizeof does not magically store the size of an array. sizeof(stream.linesArray) is just sizeof(char**), i.e. the size of a pointer. It's usually 4 or 8 bytes, depending on whether the architecture is 32-bit or 64-bit. You have to track the capacity yourself (see the sketch after these notes).
uniq stream = { malloc(DEFAULT_SIZE * sizeof(char)), ... } - stream.linesArray is a char** variable. So if you want an array of pointers to char, you should allocate memory for pointers: malloc(DEFAULT_SIZE * sizeof(char*)).
typedef unsigned long long int ull_int; - size_t is the type to represent an array size or sizeof(variable). ssize_t is sometimes used in POSIX APIs to return a size or an error status. Use those types; there is no need for unsigned long long.
ull_int cap; cap = getdelim(...) - cap is unsigned, so it can never hold the negative value -1; declare it as ssize_t (the actual return type of getdelim) so a check like cap != -1 behaves as intended.
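Putting those notes together, here is a minimal sketch of the capacity bookkeeping the original code was aiming for (DEFAULT_SIZE and the doubling policy are just illustrative assumptions):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_SIZE 20

int main(void) {
    size_t capacity = DEFAULT_SIZE;   /* tracked by hand: sizeof() cannot tell us this */
    size_t count = 0;
    char **linesArray = malloc(capacity * sizeof *linesArray);  /* pointers, not chars */
    if (linesArray == NULL)
        return 1;

    char *line = NULL;
    size_t size = 0;
    while (getdelim(&line, &size, '\n', stdin) != -1) {
        if (count == capacity) {      /* grow the pointer array when it is full */
            char **tmp = realloc(linesArray, 2 * capacity * sizeof *linesArray);
            if (tmp == NULL)
                break;                /* linesArray is still valid here */
            linesArray = tmp;
            capacity *= 2;
        }
        linesArray[count++] = line;   /* keep the buffer getdelim allocated */
        line = NULL;                  /* force getdelim to allocate a fresh one */
        size = 0;
    }
    free(line);                       /* buffer from the final, failed call */

    for (size_t i = 0; i < count; ++i) {
        printf("%s", linesArray[i]);  /* lines still contain their '\n' */
        free(linesArray[i]);
    }
    free(linesArray);
    return 0;
}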
I need to be able to send the output of the GET command and store it in a variable inside my program. Currently I'm doing it like this:
GET google.com | ./myprogram
And receiving it in my program with the following code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(int argc, char *argv[]){
char *a = (char *) malloc (10000000);
scanf("%[^\n]", a);
printf("%s\n",a);
return 0;
}
The problem I have is that the scanf function stops at a new line, and I need to be able to store the whole paragraph output from GET.
Any help will be appreciated. Thanks.
One possibility: Does GET include the size information in the headers? Could you use that to determine how much space to allocate, and how much data to read? That's fiddly though, and requires reading the data in dribs and drabs.
More simply, consider using POSIX (and Linux) getdelim() (a close relative of getline()) and specify the delimiter as the null byte. That's unlikely to appear in the GET output, so the whole content will be a single 'line', and getdelim() will allocate an appropriate amount of space automatically. It also tells you how long the data was.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buffer = 0;
    size_t buflen = 0;
    ssize_t length = getdelim(&buffer, &buflen, '\0', stdin);
    if (length > 0)
        printf("%*.*s\n", (int)length, (int)length, buffer);
    free(buffer);
    return 0;
}
scanf documentation says
These functions return the number of input items successfully
matched and assigned, which can be fewer than provided for, or even
zero in the event of an early matching failure. The value EOF is
returned if the end of input is reached before either the first
successful conversion or a matching failure occurs. EOF is also
returned if a read error occurs, in which case the error indicator for
the stream (see ferror(3)) is set, and errno is set to indicate the
error.
https://www.freebsd.org/cgi/man.cgi?query=scanf&sektion=3
Have you considered writing a loop that calls scanf, monitors its return value, and breaks out if EOF is returned?
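For what it's worth, a rough sketch of that loop idea (the chunk size, the overall buffer size, and keeping everything in one static array are assumptions made only for illustration):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char chunk[4096];
    static char text[1000000];      /* illustrative fixed upper bound */
    size_t used = 0;

    for (;;) {
        int rc = scanf("%4095[^\n]", chunk);   /* reads up to a '\n'; may be a partial line */
        if (rc == EOF)
            break;
        if (rc == 1) {
            size_t n = strlen(chunk);
            if (used + n < sizeof text) {
                memcpy(text + used, chunk, n);
                used += n;
            }
        }
        int c = getchar();          /* consume the newline scanf left behind */
        if (c == EOF)
            break;
        if (used + 1 < sizeof text)
            text[used++] = (char)c; /* keep the newline in the stored text */
    }
    text[used] = '\0';

    printf("%s\n", text);
    return 0;
}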
Consider the following readall() function implemented in standard C:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
char *readall(FILE *source, size_t *length)
{
char *data = NULL;
size_t size = 0;
size_t used = 0;
size_t n;
/* If we have a place to store the length,
we initialize it to zero. */
if (length)
*length = 0;
/* Do not attempt to read the source, if it is
already in end-of-file or error state. */
if (feof(source) || ferror(source))
return NULL;
while (1) {
/* Ensure there are unused chars in data. */
if (used >= size) {
const size_t new_size = (used | 65535) + 65537 - 32;
char *new_data;
new_data = realloc(data, new_size);
if (!new_data) {
/* Although reallocation failed, data is still there. */
free(data);
/* We just fail. */
return NULL;
}
data = new_data;
size = new_size;
}
/* Read more of the source. */
n = fread(data + used, 1, size - used, source);
if (!n)
break;
used += n;
}
/* Read error or other wonkiness? */
if (!feof(source) || ferror(source)) {
free(data);
return NULL;
}
/* Optimize the allocation. For ease of use, we append
at least one nul byte ('\0') at end. */
{
const size_t new_size = (used | 7) + 9;
char *new_data;
new_data = realloc(data, new_size);
if (!new_data) {
if (used >= size) {
/* There is no room for the nul. We fail. */
free(data);
return NULL;
}
/* There is enough room for at least one nul,
so no reason to fail. */
} else {
data = new_data;
size = new_size;
}
}
/* Ensure the buffer is padded with nuls. */
memset(data + used, 0, size - used);
/* Save length, if requested. */
if (length)
*length = used;
return data;
}
It reads everything from the specified file handle (which can be a standard stream like stdin or a pipe opened via popen()) into a dynamically allocated buffer, appends a nul byte (\0), and returns a pointer to the buffer. If the second parameter is not NULL, the actual number of characters read (not including the appended nul byte) is stored in the size_t it points to.
You can use it to read binary data output by programs, say dot -Tpng diagram.dot or image converters, or even wget -O - output (getting data from specific URLs, text or binary).
You can use this for example thus:
int main(void)
{
char *src;
size_t len;
src = readall(stdin, &len);
if (!src) {
fprintf(stderr, "Error reading standard input.\n");
return EXIT_FAILURE;
}
fprintf(stderr, "Read %zu chars.\n", len);
/* As an example, print it to standard output. */
if (len > 0)
fwrite(src, len, 1, stdout);
free(src);
return EXIT_SUCCESS;
}
The readall() function has two quirks: it allocates memory in roughly 131072-byte chunks (but could vary if fread() were to return short reads), and pads the buffer with 7 to 15 nul bytes. (There are reasons why I like doing it this way, but it is all speculative and specific to the C libraries I tend to use, so it is not important.)
Although the ones used above work fine, you can change the new_size calculations if you prefer. Just make sure that they both are at least used + 1.
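Since pipes opened via popen() were mentioned, here is a hedged sketch of that use (the command is only an example, and popen()/pclose() are POSIX, not standard C, so a feature-test macro may be needed on strict compilers):

#include <stdio.h>
#include <stdlib.h>

/* readall() as defined above */

int main(void)
{
    FILE *pipe = popen("ls -l", "r");
    if (!pipe) {
        fprintf(stderr, "popen failed.\n");
        return EXIT_FAILURE;
    }

    size_t len;
    char *out = readall(pipe, &len);
    int status = pclose(pipe);

    if (!out) {
        fprintf(stderr, "Error reading command output.\n");
        return EXIT_FAILURE;
    }

    fprintf(stderr, "Read %zu chars, command exit status %d.\n", len, status);
    fwrite(out, 1, len, stdout);
    free(out);
    return EXIT_SUCCESS;
}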
I'm trying to create a function that reads a single line from a text file using fgets() and stores it in a dynamically allocated char* using malloc(), but I am unsure how to use realloc(), since I do not know the length of the line and do not want to just guess a magic number for the maximum size it could possibly be.
#include "stdio.h"
#include "stdlib.h"
#define INIT_SIZE 50
void get_line (char* filename)
char* text;
FILE* file = fopen(filename,"r");
text = malloc(sizeof(char) * INIT_SIZE);
fgets(text, INIT_SIZE, file);
//How do I realloc memory here if the text array is full but fgets
//has not reach an EOF or \n yet.
printf(The text was %s\n", text);
free(text);
int main(int argc, char *argv[]) {
get_line(argv[1]);
}
I am planning on doing other things with the line of text but for sake of keeping this simple, I have just printed it and then freed the memory.
Also: the program is run with the filename as the first command line argument.
The getline function is what you are looking for.
Use it like this:
char *line = NULL;
size_t n = 0;
getline(&line, &n, stdin);
If you really want to implement this function yourself, you can write something like this:
#include <stdlib.h>
#include <stdio.h>
char *get_line()
{
int c;
/* what is the buffer current size? */
size_t size = 5;
/* How much is the buffer filled? */
size_t read_size = 0;
/* first allocation, its result should be tested... */
char *line = malloc(size);
if (!line)
{
perror("malloc");
return line;
}
line[0] = '\0';
c = fgetc(stdin);
while (c != EOF && c!= '\n')
{
line[read_size] = c;
++read_size;
if (read_size == size)
{
size += 5;
char *test = realloc(line, size);
if (!test)
{
perror("realloc");
return line;
}
line = test;
}
c = fgetc(stdin);
}
line[read_size] = '\0';
return line;
}
One possible solution is to use two buffers: one temporary buffer that you use when calling fgets, and one that you reallocate and append the temporary buffer to.
Perhaps something like this:
char temp[INIT_SIZE]; // Temporary string for fgets call
char *text = NULL; // The actual and full string
size_t length = 0; // Current length of the full string, needed for reallocation
while (fgets(temp, sizeof temp, file) != NULL)
{
// Reallocate
char *t = realloc(text, length + strlen(temp) + 1); // +1 for terminator
if (t == NULL)
{
// TODO: Handle error
break;
}
if (text == NULL)
{
// First allocation, make sure string is properly terminated for concatenation
t[0] = '\0';
}
text = t;
// Append the newly read string
strcat(text, temp);
// Get current length of the string
length = strlen(text);
// If the last character just read is a newline, we have the whole line
if (length > 0 && text[length - 1] == '\n')
{
break;
}
}
[Disclaimer: The code above is untested and may contain bugs]
With the declaration void get_line (char* filename), you can never make use of the line you read outside of the get_line function, because you do not return a pointer to the line and do not pass the address of any pointer that could make the allocation and the characters read visible back in the calling function.
A good model (showing return type and useful parameters) for any function that reads an unknown number of characters into a single buffer is always POSIX getline. You can implement your own using either fgetc or fgets and a fixed buffer. Efficiency favors fgets only to the extent that it minimizes the number of realloc calls needed. (Both functions share the same low-level input buffer size, e.g. see the IO_BUFSIZ constant in the library sources -- which, if I recall, is now LIO_BUFSIZE after a recent name change -- but it basically boils down to an 8192-byte I/O buffer on Linux and 512 bytes on Windows.)
So long as you dynamically allocate the original buffer (using malloc, calloc or realloc), you can read continually with a fixed buffer using fgets, appending the characters read into the fixed buffer to your allocated line, and checking whether the final character is '\n' (or EOF was reached) to determine when you are done. Simply read a fixed buffer's worth of chars with fgets each iteration and realloc your line as you go, appending the new characters to the end.
When reallocating, always realloc using a temporary pointer. That way, if you run out of memory and realloc returns NULL (or fails for any other reason), you won't overwrite the pointer to your currently allocated block with NULL creating a memory leak.
A flexible implementation sizes the fixed buffer as a VLA, using either the defined SZINIT (if the user passes 0) or the size provided by the user, allocates initial storage for line (passed as a pointer to pointer to char), and then reallocates as required, returning the number of characters read on success or -1 on failure (the same as POSIX getline does). It could be done like this:
/** fgetline, a getline replacement with fgets, using fixed buffer.
* fgetline reads from 'fp' up to including a newline (or EOF)
* allocating for 'line' as required, initially allocating 'n' bytes.
* on success, the number of characters in 'line' is returned, -1
* otherwise
*/
ssize_t fgetline (char **line, size_t *n, FILE *fp)
{
if (!line || !n || !fp) return -1;
#ifdef SZINIT
size_t szinit = SZINIT > 0 ? SZINIT : 120;
#else
size_t szinit = 120;
#endif
size_t idx = 0, /* index for *line */
maxc = *n ? *n : szinit, /* fixed buffer size */
eol = 0, /* end-of-line flag */
nc = 0; /* number of characters read */
char buf[maxc]; /* VLA to use a fixed buffer (or allocate ) */
clearerr (fp); /* prepare fp for reading */
while (fgets (buf, maxc, fp)) { /* continually read maxc-sized chunks */
nc = strlen (buf); /* number of characters read */
if (idx && *buf == '\n') /* if index & '\n' 1st char */
break;
if (nc && (buf[nc - 1] == '\n')) { /* test '\n' in buf */
buf[--nc] = 0; /* trim and set eol flag */
eol = 1;
}
/* always realloc with a temporary pointer */
void *tmp = realloc (*line, idx + nc + 1);
if (!tmp) /* on failure previous data remains in *line */
return idx ? (ssize_t)idx : -1;
*line = tmp; /* assign realloced block to *line */
memcpy (*line + idx, buf, nc + 1); /* append buf to line */
idx += nc; /* update index */
if (eol) /* if '\n' (eol flag set) done */
break;
}
/* if eol alone, or stream error, return -1, else length of buf */
return (feof (fp) && !nc) || ferror (fp) ? -1 : (ssize_t)idx;
}
(note: since nc already holds the current number of characters in buf, memcpy can be used to append the contents of buf to *line without scanning for the terminating nul-character again) Look it over and let me know if you have further questions.
Essentially you can use it as a drop-in replacement for POSIX getline (though it will not be quite as efficient -- but it isn't bad either).
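A brief usage sketch, calling fgetline the same way one would call POSIX getline (the file handling here is only an assumption for demonstration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* fgetline() as defined above */

int main (int argc, char **argv)
{
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    if (!fp) {
        perror ("fopen");
        return 1;
    }

    char *line = NULL;      /* fgetline allocates and grows this */
    size_t n = 0;           /* 0 lets fgetline use its default chunk size */
    ssize_t len;

    while ((len = fgetline (&line, &n, fp)) != -1)
        printf ("%3zd chars: %s\n", len, line);   /* '\n' was trimmed by fgetline */

    free (line);
    if (fp != stdin)
        fclose (fp);
    return 0;
}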