Why does this code not output the expected output? - c

This can be a good question for finding bugs.
No? Okay for beginners at least.
#define SIZE 4
int main(void){
int chars_read = 1;
char buffer[SIZE + 1] = {0};
setvbuf(stdin, (char *)NULL, _IOFBF, sizeof(buffer)-1);
while(chars_read){
chars_read = fread(buffer, sizeof('1'), SIZE, stdin);
printf("%d, %s\n", chars_read, buffer);
}
return 0;
}
Using the above code, I am trying to read from a file using redirection ./a.out < data. Contents of input file:
1line
2line
3line
4line
But I am not getting the expected output, rather some graphical characters are mixed in.
What is wrong?
Hint: (Courtesy Alok)
sizeof('1') == sizeof(int)
sizeof("1") == sizeof(char)*2
So, use 1 instead :-)
Take a look at this post for buffered IO example using fread.

The type of '1' is int in C, not char, so each fread call asks for SIZE*sizeof(int) bytes. If sizeof(int) is greater than 1 (on most modern computers it is), fread writes past the end of the storage for buffer. This is one of the places where C and C++ are different: in C, character literals are of type int; in C++, they are of type char.
So, you need chars_read = fread(buffer, 1, SIZE, stdin); because sizeof(char) is 1 by definition.
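A quick way to check the sizes discussed above is to ask the compiler. This is only a minimal sketch; the two function names are made up for the illustration and are not part of the question's code.

```c
#include <stddef.h>

/* Hypothetical helpers, named only for this illustration. */

/* In C, a character constant such as '1' has type int. */
size_t char_constant_size(void)
{
    return sizeof('1');
}

/* The string literal "1" is a char array holding '1' and the terminating 0. */
size_t string_literal_size(void)
{
    return sizeof("1");
}
```

Compiled as C, char_constant_size() returns sizeof(int) and string_literal_size() returns 2. Compiled as C++, the first would return 1 instead.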
In fact, I would write your loop as:
while ((chars_read = fread(buffer, 1, sizeof buffer - 1, stdin)) > 0) {
buffer[chars_read] = 0; /* In case chars_read != sizeof buffer - 1.
You may want to do other things in this case,
such as check for errors using ferror. */
printf("%d, %s\n", chars_read, buffer);
}
To answer your other question, '\0' is the int 0, so {'\0'} and {0} are equivalent.
For setvbuf, my documentation says:
The size argument may be given as zero to obtain deferred optimal-size buffer allocation as usual.
Why are you commenting with \\ instead of // or /* */? :-)
Edit: Based upon your edit of the question, sizeof("1") is wrong, sizeof(char) is correct.
sizeof("1") is 2, because "1" is a char array containing two elements: '1' and 0.
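The corrected read can be wrapped in a small helper so that the element size of 1 and the terminating null are handled in one place. This is a sketch under assumptions: read_chunk is a hypothetical name, and it treats a read error the same way as end of file.

```c
#include <stdio.h>

/* Hypothetical helper: read up to bufsize - 1 bytes from fp into buf
   and null-terminate the result. Returns the number of bytes read,
   which is 0 at end of file or on error (check ferror(fp) to tell
   the two apart). */
size_t read_chunk(FILE *fp, char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return 0;               /* no room even for the terminator */

    /* Element size is 1, so the count is a byte count. */
    size_t n = fread(buf, 1, bufsize - 1, fp);
    buf[n] = '\0';              /* n <= bufsize - 1, so this stays in bounds */
    return n;
}
```

With char buffer[SIZE + 1], the loop becomes while ((n = read_chunk(stdin, buffer, sizeof buffer)) > 0), where n is a size_t, and buffer can be printed with %s after every call.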

Here's a byte-by-byte way to fread the lines from a file using redirection ./a.out < data.
Produces the expected output at least ... :-)
/*
Why does this code not output the expected output ?,
http://stackoverflow.com/questions/2378264/why-does-this-code-not-output-the-expected-output
compile with:
gcc -Wall -O3 fread-test.c
create data:
echo $'1line\n2line\n3line\n4line' > data
./a.out < data
*/
#include <stdio.h>
#define SIZE 5
int main(void)
{
int i=0, countNL=0;
char singlechar = 0;
char linebuf[SIZE + 1] = {0};
setvbuf(stdin, (char *)NULL, _IOFBF, sizeof(linebuf)-1);
while(fread(&singlechar, 1, 1, stdin)) // fread stdin byte-by-byte
{
if ( (singlechar == '\n') )
{
countNL++;
linebuf[i] = '\0';
printf("%d: %s\n", countNL, linebuf);
i = 0;
} else if (i < SIZE) { // ignore characters that would overflow linebuf
linebuf[i] = singlechar;
i++;
}
}
if ( i > 0 ) // if the last line was not terminated by '\n' ...
{
countNL++;
linebuf[i] = '\0';
printf("%d: %s\n", countNL, linebuf);
}
return 0;
}

char buffer[SIZE + 1] = {0};
This declaration is fine as written: buffer is an array of SIZE + 1 bytes, and = {0} sets the first element to 0 and zero-initializes the rest. It does not make buffer point into the program's constant data segment, and no strcpy() is needed to initialize it. The corruption comes from the fread call, which can write more than SIZE bytes into the array.

Related

Reading arbitrary length strings in C

I've attempted to write a C program to read a string and display it back to the user. I've tested it with a lot of input and it seems to work properly. The thing is that I'm not sure whether or not the c != EOF condition is necessary inside the while expression, and since by definition, the size of a char is 1 byte, maybe I can remove the sizeof(char) expressions inside the malloc and realloc statements, but I'm not sure about this.
Here's the program, also, I manually added a null terminating character to the string:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *str = malloc(sizeof(char));
if (!str)
return 1;
char c;
char *reallocStr;
size_t len = 0;
size_t buf = 1;
printf("Enter some text: ");
while ((c = getchar()) != '\n' && c != EOF) {
if (len == buf) {
buf *= 2;
reallocStr = realloc(str, buf * sizeof(char));
if (!reallocStr)
return 1;
str = reallocStr;
}
str[len++] = c;
}
str[len] = '\0';
printf("You entered: %s\n", str);
free(str);
return 0;
}
As mentioned in the comments, you have a buffer overflow in your code, so you would need to fix that at the very least. To answer your specific questions, sizeof(char) is guaranteed to be 1 (dictated by the C99 spec), so you don't need to multiply by sizeof(char). It's good practice to check for EOF: if your input comes from a source that has no trailing newline, the loop still terminates (for example, if someone ran printf %s hello | yourprogram from a bash prompt, you wouldn't die).
Problems include
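Buffer overflow
Pointed out by @HardcoreHenry. When the loop ends with len == buf, str[len] = '\0'; writes one byte past the allocation.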
Incorrect type
getchar() returns an int with the values [0..UCHAR_MAX] plus the negative value EOF. These 257 different values lose distinctiveness when saved as a char. Possible outcomes: infinite loop or premature loop end. Instead:
// char c;
int c;
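To see why the narrower type loses information, here is a small sketch. The helper names are hypothetical, and the explicit signed char / unsigned char types stand in for the two possible behaviours of plain char.

```c
#include <stdio.h>

/* Hypothetical helpers, named only for this illustration. */

/* Store EOF in an unsigned char, then compare against EOF as the loop
   condition would. Returns 1 if EOF is still recognised, 0 if not. */
int eof_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)EOF; /* wraps to a non-negative value */
    return c == EOF;                      /* c promotes to a non-negative int */
}

/* Store an input byte (0..UCHAR_MAX) in a signed char and report
   whether it now compares equal to EOF. */
int byte_mistaken_for_eof(int byte)
{
    signed char c = (signed char)byte;
    return c == EOF;
}
```

eof_survives_unsigned_char() returns 0: where plain char is unsigned, the loop never sees EOF and does not end. On common platforms where EOF is -1 and out-of-range conversions wrap, byte_mistaken_for_eof(0xFF) returns 1, so a legitimate input byte ends the loop early.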
Advanced: Arbitrary length
For very long lines, buf *= 2; overflows when buf is SIZE_MAX/2 + 1. As an alternative to growing in steps of 1, 2, 4, 8, 16, ..., consider 1, 3, 7, 15, .... That way the code can handle strings up to SIZE_MAX.
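The 1, 3, 7, 15, ... sequence can be sketched as a tiny helper. The name next_capacity and the choice to saturate at SIZE_MAX once doubling is no longer possible are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: next buffer size in the sequence 1, 3, 7, 15, ...
   Sticks at SIZE_MAX instead of wrapping around. */
size_t next_capacity(size_t cap)
{
    if (cap > (SIZE_MAX - 1) / 2)
        return SIZE_MAX;        /* cap * 2 + 1 would overflow */
    return cap * 2 + 1;
}
```

Because SIZE_MAX is itself of the form 2^n - 1, the sequence lands exactly on SIZE_MAX rather than overflowing one step before it.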
Advanced: Reading '\0'
Although uncommon, it is possible to read in a null character. Then printf("You entered: %s\n", str); will only print up to that null character and not to the end of the input.
To print everything, take advantage of the fact that the code knows the length.
printf("You entered: ");
fwrite(str, len, 1, stdout);
printf("\n");
To be clear, text input here is not the reading of strings but the reading of lines. That input is saved and converted to a string by appending a null character. Reading a '\0' complicates things, but it is something robust code handles.
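Putting the points above together, a line reader might look like the following. This is a sketch, not the asker's program: read_line, its out_len parameter, and the starting capacity of 16 are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line (without the newline) from fp into
   a malloc'ed, null-terminated buffer. Stores the length in *out_len if
   out_len is not NULL. Returns NULL on allocation failure; the caller
   frees the result. */
char *read_line(FILE *fp, size_t *out_len)
{
    size_t cap = 16, len = 0;
    char *str = malloc(cap);
    if (!str)
        return NULL;

    int c; /* int, so EOF stays distinct from every byte value */
    while ((c = getc(fp)) != '\n' && c != EOF) {
        if (len + 1 == cap) {   /* always keep one byte for the terminator */
            size_t new_cap = cap * 2;
            char *grown = realloc(str, new_cap);
            if (!grown) {
                free(str);
                return NULL;
            }
            str = grown;
            cap = new_cap;
        }
        str[len++] = (char)c;
    }
    str[len] = '\0';            /* len <= cap - 1, so this is in bounds */
    if (out_len)
        *out_len = len;
    return str;
}
```

Since the length is returned separately, the caller can print the whole line with fwrite(str, 1, len, stdout) even if it contains a null character.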

How to read char/string one by one from a file and compare in C

This is my first time asking a question here. I'm currently learning C and Linux at the same time. I'm working on a simple C program that uses only system calls to read and write files. My problem now is how to read the file and check whether adjacent strings/words are the same or not. An example:
foo.txt contains:
hi
bye
bye
hi
hi
And bar.txt is empty.
After I do:
./myuniq foo.txt bar.txt
The result in bar.txt will be like:
hi
bye
hi
The result will just be like when we use uniq in Linux.
Here is my code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#define LINE_MAX 256
int main(int argc, char * argv[]){
int wfd,rfd;
size_t n;
char temp[LINE_MAX];
char buf[LINE_MAX];
char buf2[LINE_MAX];
char *ptr=buf;
if(argc!=3){
printf("Invalid usage: ./executableFileName readFromThisFile writeToThisFile\n");
return -1;
}
rfd=open(argv[1], O_RDONLY);
if(rfd==-1){
printf("Unable to read the file\n");
return -1;
}
wfd=open(argv[2], O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
if(wfd==-1){
printf("Unable to write to the file\n");
return -1;
}
while(n = read(rfd,buf,LINE_MAX)){
write(wfd,buf,n);
}
close(rfd);
close(wfd);
return 0;
}
The code above does the reading and writing with no issue. But I can't figure out how to read the characters one by one, C-string style, or what the condition of the while loop should be.
I do know that I may need a pointer to travel inside of buf to find the next line '\n' and something like:
while(condi){
if(*ptr == '\n'){
strcpy(temp, buf);
strcpy(buf, buf2);
strcpy(buf2, temp);
}
else
write(wfd,buf,n);
*ptr++;
}
But I might be wrong since I can't get it to work. Any feedback might help. Thank you.
And again, the program may only use system calls. I know there is an easier way using FILE and fgets or something similar, but that is not allowed here.
You only need one buffer that stores whatever the previous line contained.
The way this works for the current line is that before you add a character you test whether what you're adding is the same as what's already in there. If it's different, then the current line is marked as unique. When you reach the end of the line, you then know whether to output the buffer or not.
Implementing the above idea using standard input for simplicity (but it doesn't really matter how you read your characters):
int len = 0;
int dup = 0;
for (int c; (c = fgetc(stdin)) != EOF; )
{
// Check for duplicate and store
if (dup && buf[len] != c)
dup = 0;
buf[len++] = c;
// Handle end of line
if (c == '\n')
{
if (!dup) printf("%.*s", len, buf); // buf is not NUL-terminated here, so print exactly len bytes
len = 0;
dup = 1;
}
}
See here that we use the dup flag to represent whether a line is duplicated. For the first line, clearly it is not, and all subsequent lines start off with the assumption they are duplicates. Then the only possibility is to remain a duplicate or be detected as unique when one character is different.
The comparison before store is actually avoiding tests against uninitialized buffer values too, by way of short-circuit evaluation. That's all managed by the dup flag -- you only test if you know the buffer is already good up to this point:
if (dup && buf[len] != c)
dup = 0;
That's basically all you need. Now, you should definitely add some sanity to prevent buffer overflow. Or you may wish to use a dynamic buffer that grows as necessary.
An entire program that operates on standard I/O streams, plus handles arbitrary-length lines might look like this:
#include <stdio.h>
#include <stdlib.h>
int main()
{
size_t capacity = 15, len = 0;
char *buf = malloc(capacity);
for (int c, dup = 0; (c = fgetc(stdin)) != EOF || len > 0; )
{
// Grow buffer
if (len == capacity)
{
capacity = (capacity * 2) + 1;
char *newbuf = realloc(buf, capacity);
if (!newbuf) break;
buf = newbuf;
dup = 0;
}
// NUL-terminate end of line, update duplicate-flag and store
if (c == '\n' || c == EOF)
c = '\0';
if (dup && buf[len] != c)
dup = 0;
buf[len++] = c;
// Output line if not a duplicate, and reset
if (!c)
{
if (!dup)
printf("%s\n", buf);
len = 0;
dup = 1;
}
}
free(buf);
}
Demo here: https://godbolt.org/z/GzGz3nxMK
If you must use the read and write system calls, you will have to build an abstraction around them, as they have no notion of lines, words, or characters. Semantically, they deal purely with bytes.
Reading arbitrarily-sized chunks of the file would require us to sift through looking for line breaks. This would mean tokenizing the data in our buffer, as you have somewhat shown. A problem occurs when our buffer ends with a partial line. We would need to make adjustments so our next read call concatenates the rest of the line.
To keep things simple, instead, we might consider reading the file one byte at a time.
A decent (if naive) way to begin is by essentially reimplementing the rough functionality of fgets. Here we read a single byte at a time into our buffer, at the current offset. We end when we find a newline character, or when we would no longer have enough room in the buffer for the null-terminating character.
Unlike fgets, here we return the length of our string.
size_t read_a_line(char *buf, size_t bufsize, int fd)
{
size_t offset = 0;
while (offset < (bufsize - 1) && read(fd, buf + offset, 1) > 0)
if (buf[offset++] == '\n')
break;
buf[offset] = '\0';
return offset;
}
To mimic uniq, we can create two buffers, as you have, but initialize their contents to empty strings. We take two additional pointers to manipulate later.
char buf[LINE_MAX] = { 0 };
char buf2[LINE_MAX] = { 0 };
char *flip = buf;
char *flop = buf2;
After opening our files for reading and writing, our loop begins. We continue this loop as long as we read a nonzero-length string.
If our current string does not match our previously read string, we write it to our output file. Afterwards, we swap our pointers. On the next iteration, from the perspective of our pointers, the secondary buffer now contains the previous line, and the primary buffer is overwritten with the current line.
Again, note that our initial previously read line is the empty string.
size_t length;
while ((length = read_a_line(flip, LINE_MAX, rfd))) {
if (0 != strcmp(flip, flop))
write(wfd, flip, length);
swap_two_pointers(&flip, &flop);
}
Our pointer swapping function.
void swap_two_pointers(char **a, char **b) {
char *t = *a;
*a = *b;
*b = t;
}
Some notes:
The contents of our file-to-be-read should never contain a line that would exceed LINE_MAX (including the newline character). We do not handle this situation, which is admittedly a sidestep, as this is the problem we wanted to avoid with the chunking method.
read_a_line should not be passed NULL or 0 as its first and second arguments, respectively. An exercise for the reader to figure out why that is.
read_a_line does not really handle read failing in the middle of a line.
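For comparison, the chunked approach described earlier, where a partial line at the end of one read has to be joined with the start of the next, can be sketched by keeping the unread bytes in a small state struct. Everything named here (struct line_reader, next_line, CHUNK) is hypothetical, and a failed read is simply treated like end of file.

```c
#include <stddef.h>
#include <unistd.h>

#define CHUNK 8 /* deliberately tiny, so lines often straddle two reads */

/* Hypothetical reader state: leftover bytes survive between calls. */
struct line_reader {
    int fd;
    char buf[CHUNK];
    size_t start; /* index of the first unread byte in buf */
    size_t end;   /* one past the last valid byte in buf */
};

/* Copy the next line (including its newline, if any) into out, using at
   most outsize - 1 bytes, and null-terminate it. Returns the length of
   the string, or 0 at end of file. */
size_t next_line(struct line_reader *r, char *out, size_t outsize)
{
    size_t len = 0;

    if (outsize == 0)
        return 0;
    while (len < outsize - 1) {
        if (r->start == r->end) {       /* chunk used up: read another */
            ssize_t n = read(r->fd, r->buf, sizeof r->buf);
            if (n <= 0)
                break;                  /* end of file, or error */
            r->start = 0;
            r->end = (size_t)n;
        }
        char c = r->buf[r->start++];
        out[len++] = c;
        if (c == '\n')
            break;
    }
    out[len] = '\0';
    return len;
}
```

A reader is set up with struct line_reader r = { .fd = rfd }; and then used in place of read_a_line. It issues one read call per CHUNK bytes rather than one per byte, at the cost of carrying state between calls.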

C - Copying text from a file results in unknown characters being copied over as well

When running the following C file, copying the character returned by fgetc() into the buffer behind my tmp pointer results in unknown characters being copied over for some reason. The characters received from fgetc() are the expected characters. However, when assigning them through my tmp pointer, unknown characters appear.
I've tried looking for the reason why online, but haven't found any luck. From what I have read it could be something to do with UTF-8 and ASCII issues. However, I'm not sure about the fix. I'm a relatively new C programmer and still new to memory management.
Output:
TMP: Hello, DATA!�
TEXT: Hello, DATA!�
game.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <allegro5/allegro5.h>
#include <allegro5/allegro_font.h>
const int WIN_WIDTH = 1366;
const int WIN_HEIGHT = 768;
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
int main(int argc, char **argv) {
al_init();
al_install_keyboard();
ALLEGRO_TIMER* timer = al_create_timer (1.0 / 30.0);
ALLEGRO_EVENT_QUEUE *queue = al_create_event_queue();
ALLEGRO_DISPLAY* display = al_create_display(WIN_WIDTH, WIN_HEIGHT);
ALLEGRO_FONT* font = al_create_builtin_font();
al_register_event_source(queue, al_get_keyboard_event_source());
al_register_event_source(queue, al_get_display_event_source(display));
al_register_event_source(queue, al_get_timer_event_source(timer));
int redraw = 1;
ALLEGRO_EVENT event;
al_start_timer(timer);
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
while (1) {
al_wait_for_event(queue, &event);
if (event.type == ALLEGRO_EVENT_TIMER)
redraw = 1;
else if ((event.type == ALLEGRO_EVENT_KEY_DOWN) || (event.type == ALLEGRO_EVENT_DISPLAY_CLOSE))
break;
if (redraw && al_is_event_queue_empty(queue)) {
al_clear_to_color(al_map_rgb(0, 0, 0));
al_draw_text(font, al_map_rgb(255, 255, 255), 0, 0, 0, text);
al_flip_display();
redraw = false;
}
}
free(text);
al_destroy_font(font);
al_destroy_display(display);
al_destroy_timer(timer);
al_destroy_event_queue(queue);
return 0;
}
game.DATA file:
Hello, DATA!
What I use to run the program:
gcc game.c -o game $(pkg-config allegro-5 allegro_font-5 --libs --cflags)
--EDIT--
I tried taking the file reading code and running it in a new c file, for some reason it works there, but not when in the game.c file with allegro code.
test.c:
#include <stdlib.h>
#include <stdio.h>
char *readFile(const char *fileName) {
FILE *file;
file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
}
size_t tmpSize = 1;
char *tmp = (char *)malloc(tmpSize);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
}
for (int c = fgetc(file); c != EOF; c = fgetc(file)) {
if (c != NULL) {
if (tmpSize > 1)
tmp = (char *)realloc(tmp, tmpSize);
tmp[tmpSize - 1] = (char *)c;
tmpSize++;
}
}
tmp[tmpSize] = 0;
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
void main() {
char *text = readFile("game.DATA");
printf("TEXT: %s\n", text);
free(text);
return 0;
}
Produces the correct output always:
TMP: Hello, DATA!
TEXT: Hello, DATA!
When you write a loop that updates various things each time through, like you do with tmpSize in your loop here, it's important to have a handle on what the theoretical computer science types call your "loop invariants". That is, what is it that's true each time through the loop? It's important not only to maintain your loop invariants properly, but also to pick your loop invariants so that they're easy to maintain, and easy for a later reader to understand and to verify.
Since tmpSize starts out as 1, I'm guessing your loop invariant is trying to be, "tmpSize is always one more than the size of the string I've read so far". A reason for picking that slightly-strange loop invariant is, of course, that you'll need that extra byte for the terminating \0. The other clue is that you're setting tmp[tmpSize-1] = c;.
But here's the first problem. When we exit the loop, and if tmpSize is still one more than the size of the string you've read so far, let's see what happens. Suppose we read three characters. So tmpSize should be 4. So we'll set tmp[4] = 0;. But wait! Remember, arrays in C are 0-based. So the three characters we read are in tmp[0], tmp[1], and tmp[2], and we want the terminating \0 character to go into tmp[3], not tmp[4]. Something is wrong.
But actually, it's worse than that. I wasn't at all sure I understood the loop invariant, so I cheated, and inserted a few debugging printouts. Right before the realloc call, I added
printf("realloc %zu\n", tmpSize);
and at the end, right before the tmp[tmpSize] = 0; line, I added
printf("final %zu\n", tmpSize);
The last few lines it printed (while reading a game.DATA file containing "Hello, DATA!" just like yours) were:
...
realloc 10
realloc 11
realloc 12
final 13
But this is off by two! If the last reallocation gave the array a size of 12, the valid indices are from 0 to 11. But somehow we end up writing the \0 into cell 13.
It took me a while to figure it out, but the second problem is that you do the reallocation at the top of the loop, before you've incremented tmpSize.
To me, the loop invariant of "one more than the size of the string read so far" is just too hard to think about. I very much prefer to use a loop invariant where the "size" variable keeps track of the number of characters I have read, not +1 or -1 off of that. Let's see how that loop might look. (I've also cleaned up a few other things.)
size_t tmpSize = 0;
char *tmp = malloc(tmpSize+1);
if (tmp == NULL) {
printf("malloc() failed.\n");
exit(1);
}
for (int c = getc(file); c != EOF; c = getc(file)) {
printf("realloc %zu\n", tmpSize+1+1);
tmp = realloc(tmp, tmpSize+1+1); /* +1 for c, +1 for \0 */
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
tmp[tmpSize] = c;
tmpSize++;
}
printf("final %zu\n", tmpSize);
tmp[tmpSize] = '\0';
There's still something fishy here -- I said I didn't like "fudge factors" like +1, and here I've got two -- but at least now the debugging printouts go
...
realloc 11
realloc 12
realloc 13
final 12
so it looks like I'm not overrunning the allocated memory any more.
To make this even better, I want to take a slightly different approach. You're not supposed to worry about efficiency at first, but I can tell you that a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient. So let's make a few more changes:
size_t nchAllocated = 0;
size_t nchRead = 0;
char *tmp = NULL;
for (int c = getc(file); c != EOF; c = getc(file)) {
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
Now there are two separate variables: nchAllocated keeps track of exactly how many characters I've allocated, and nchRead keeps track of exactly how many characters I've read. And although I've doubled the number of "counter" variables, in doing so I've simplified a lot of other things, so I think it's a net improvement.
First of all, notice that there are no +1 fudge factors any more, at all.
Second, this loop doesn't call realloc every time -- instead it allocates characters 10 at a time. And because there are separate variables for the number of characters allocated versus read, it can keep track of the fact that it may have allocated more characters than it has read so far. For this code, the debugging printouts are:
realloc 10
realloc 20
final 12
Another little improvement is that we don't have to "preallocate" the array -- there's no initial malloc call. One of our loop invariants is that nchAllocated is the number of characters allocated, and we start this out as 0, and if there are no characters allocated, then it's okay that tmp starts out as NULL. This relies on the fact that when you call realloc for the first time, with tmp equal to NULL, realloc is fine with that, and essentially acts like malloc.
But there's one question you might be asking: If I got rid of all my fudge factors, where do we arrange to allocate one extra byte to hold the terminating \0 character? It's there, but it's subtle: it's lurking in the test
if(nchAllocated <= nchRead)
The very first time through the loop, nchAllocated will be 0, and nchRead will be 0, but this test will be true, so we'll allocate our first chunk of 10 characters, and we're off and running. (If we didn't care about the \0 character, the test nchAllocated < nchRead would have sufficed.)
...But, actually, I've made a mistake! There's a subtle bug here!
What if the file being read is empty? tmp will start out NULL, and we'll never make any trips through the loop, so tmp will remain NULL, so when we assign tmp[nchRead] = 0 it'll blow up.
And actually, it's worse than that. If you trace through the logic very carefully, you'll find that any time the file size is an exact multiple of 10, not enough space gets allocated for the \0, after all.
And this indicates a significant drawback of the "allocate characters 10 at a time" scheme. The code is now harder to test, because the control flow is different for files whose size is a multiple of 10. If you never happen to test that case, you won't realize that this program has a bug in it.
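One way to convince yourself of the multiple-of-10 problem without creating test files is to simulate just the bookkeeping of the for-loop version. The two function names below are made up for this sketch; only nchAllocated and nchRead come from the code above.

```c
#include <stddef.h>

/* Hypothetical simulation of the for-loop version: returns how many
   bytes it would have allocated after reading file_size characters. */
size_t allocated_by_for_loop(size_t file_size)
{
    size_t nchAllocated = 0, nchRead = 0;

    while (nchRead < file_size) {       /* one trip per character read */
        if (nchAllocated <= nchRead)
            nchAllocated += 10;
        nchRead++;
    }
    return nchAllocated;
}

/* The terminating \0 goes at index file_size, so it only fits if more
   than file_size bytes were allocated. */
int terminator_fits(size_t file_size)
{
    return allocated_by_for_loop(file_size) > file_size;
}
```

terminator_fits() returns 1 for sizes such as 9, 11 and 12, but 0 for 0, 10 and 20, which are exactly the cases that are easy to miss when testing by hand.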
The way I usually fix this is to notice that the \0 byte I have to add to terminate the string is sort of balanced by the EOF character I read that indicated the end of the file. Maybe, when I read the EOF, I can use it to remind me to allocate space for the \0. That's actually easy enough to do, and it looks like this:
int c;
while(1) {
c = getc(file);
if(nchAllocated <= nchRead) {
nchAllocated += 10;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
if (tmp == NULL) {
printf("realloc() failed.\n");
exit(1);
}
}
if(c == EOF)
break;
tmp[nchRead++] = c;
}
printf("final %zu\n", nchRead);
tmp[nchRead] = '\0';
The trick here is that we don't test for EOF until after we've checked that there's enough space in the buffer, and called realloc if necessary. It's as if we allocate space in the buffer for the EOF -- except then we use that space for the \0 instead. This is what I meant by "use it to remind me to allocate space for the \0".
Now, I have to admit that there's still a drawback here, in that the loop is now somewhat unconventional. A loop that has while(1) at the top looks like an infinite loop. This one has
if(c == EOF) break;
down in the middle of it, so it is literally a "break in the middle" loop. (This is by contrast to conventional for and while loops, which are "break at the top", or a do/while loop, which is "break at the bottom".) Personally, I find this to be a useful idiom, and I use it all the time. But some programmers, and perhaps your instructor, would frown on it, because it's "weird", it's "different", it's "unconventional". And to some extent they're right: unconventional programming is somewhat dangerous programming, and is bad if later maintenance programmers can't understand it because they don't recognize or don't understand the idioms in it. (It's sort of the programming equivalent of the English word "ain't", or a split infinitive.)
Finally, if you're still with me, I have one more point to make. (And if you are still with me, thank you. I realize this answer has gotten very long, but I hope you're learning something.)
Earlier I said that "a loop that calls realloc to make the buffer bigger by 1, each time it reads a character, can end up being really inefficient." It turns out that a loop that makes the buffer bigger by 10 isn't much better, and can still be significantly inefficient. You can do a little better by incrementing it by 50 or 100, but if you're dealing with input that might be really big (thousands of characters or more), you're usually better off increasing the buffer size by leaps and bounds, perhaps by multiplying it by some factor, rather than adding. So here's the final version of that part of the loop:
if(nchAllocated <= nchRead) {
if(nchAllocated == 0) nchAllocated = 10;
else nchAllocated *= 2;
printf("realloc %zu\n", nchAllocated);
tmp = realloc(tmp, nchAllocated);
And even this improvement -- multiplying by 2, rather than adding something -- comes with a cost: we need an extra test, to special-case the first trip through the loop, because nchAllocated started out as 0, and 0 × 2 = 0.
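For reference, here is how the pieces might fit together in one function: the break-in-the-middle loop, the doubling, and the special case for the first allocation. This is a sketch with assumptions: read_all and its nread parameter are hypothetical names, the debugging printouts are left out, and it returns NULL on allocation failure instead of calling exit.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the rest of file into a malloc'ed,
   null-terminated buffer. Stores the character count in *nread if
   nread is not NULL. Returns NULL if realloc fails. */
char *read_all(FILE *file, size_t *nread)
{
    size_t nchAllocated = 0;
    size_t nchRead = 0;
    char *tmp = NULL;

    for (;;) {
        int c = getc(file);

        /* Make room before testing for EOF, so that the slot "reserved"
           for EOF is available for the terminating \0. */
        if (nchAllocated <= nchRead) {
            size_t newSize = (nchAllocated == 0) ? 10 : nchAllocated * 2;
            char *grown = realloc(tmp, newSize);
            if (grown == NULL) {
                free(tmp);
                return NULL;
            }
            tmp = grown;
            nchAllocated = newSize;
        }
        if (c == EOF)
            break;
        tmp[nchRead++] = (char)c;
    }
    tmp[nchRead] = '\0';
    if (nread)
        *nread = nchRead;
    return tmp;
}
```

An empty file yields an empty string rather than a NULL pointer, and a 10-character file triggers one extra reallocation on EOF so that the \0 has somewhere to go.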
Your reallocation scheme is incorrect: the array is always too short by one byte and the null terminator is written one position past the end of the string, instead of at the end of the string. This causes an extra byte to be printed, with whatever value happens to be in memory in the block returned by realloc(), which is uninitialized.
It is less confusing to use tmpLen as the length of the string read so far and allocate 2 extra bytes for the newly read character and the null terminator.
Furthermore the test c != NULL makes no sense: c is an int holding a byte value and NULL is a null pointer constant. Similarly, tmp[tmpSize - 1] = (char *)c; is incorrect: you should just write
tmp[tmpSize - 1] = c;
Here is a corrected version:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
char *tmp = (char *)malloc(tmpLen + 1);
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
char *new_tmp = (char *)realloc(tmp, tmpLen + 2);
if (new_tmp == NULL) {
printf("realloc() failure for %zu bytes.\n", tmpLen + 2);
free(tmp);
fclose(file);
return NULL;
}
tmp = new_tmp;
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
return tmp;
}
It is usually better to reallocate in chunks or with a geometric size increment. Here is a simple implementation:
char *readFile(const char *fileName) {
FILE *file = fopen(fileName, "r");
if (file == NULL) {
printf("File could not be opened for reading.\n");
return NULL;
}
size_t tmpLen = 0;
size_t tmpSize = 16;
char *tmp = (char *)malloc(tmpSize);
char *newTmp;
if (tmp == NULL) {
printf("malloc() could not be called on tmp.\n");
fclose(file);
return NULL;
}
int c;
while ((c = fgetc(file)) != EOF) {
if (tmpSize - tmpLen < 2) {
size_t newSize = tmpSize + tmpSize / 2;
newTmp = (char *)realloc(tmp, newSize);
if (newTmp == NULL) {
printf("realloc() failure for %zu bytes.\n", newSize);
free(tmp);
fclose(file);
return NULL;
}
tmpSize = newSize;
tmp = newTmp;
}
tmp[tmpLen++] = c;
}
tmp[tmpLen] = '\0';
fclose(file);
printf("TMP: %s\n", tmp);
// try to shrink allocated block to the minimum size
// if realloc() fails, return the current block
// it seems impossible for this reallocation to fail
// but the C Standard allows it.
newTmp = (char *)realloc(tmp, tmpLen + 1);
return newTmp ? newTmp : tmp;
}

How should I fix this interesting getdelim / getline (dynamic memory allocation) bug?

I have this C assignment and I am struggling a bit at this specific point. I have some background in C, but pointers and dynamic memory management still elude me.
The assignment asks us to write a program which would simulate the behaviour of the "uniq" command / filter in UNIX.
But the problem I am having is with the C library functions getline or getdelim (we need to use those functions according to the implementation specifications).
According to the specification, the user input might contain arbitrary amount of lines and each line might be of arbitrary length (unknown at compile-time).
The problem is, the following line for the while-loop
while (cap = getdelim(stream.linesArray, size, '\n', stdin))
compiles and "works" somehow when I leave it like that. By that I mean that when I run the program, I can enter an arbitrary number of lines of arbitrary length and the program does not crash, but it keeps looping until I stop it (whether the lines are correctly stored in char **linesArray is a different story that I am not sure about).
What I would like to be able to do is something like
while ((cap = getdelim(stream.linesArray, size, '\n', stdin)) && (cap != -1))
so that when getdelim does not read any characters at some line (besides EOF or \n) - aka the very first time when user enters an empty line -, the program would stop taking more lines from stdin.
(and then print the lines that were stored in stream.linesArray by getdelim).
The problem is, when I execute the program if I make the change I mentioned above, the program gives me "Segmentation Fault" and frankly I don't know why and how should I fix this (I have tried to do something about it so many times to no avail).
For reference:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdelim.html
https://en.cppreference.com/w/c/experimental/dynamic/getline
http://man7.org/linux/man-pages/man3/getline.3.html
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DEFAULT_SIZE 20
typedef unsigned long long int ull_int;
typedef struct uniqStream
{
char **linesArray;
ull_int lineIndex;
} uniq;
int main()
{
uniq stream = { malloc(DEFAULT_SIZE * sizeof(char)), 0 };
ull_int cap, i = 0;
size_t *size = 0;
while ((cap = getdelim(stream.linesArray, size, '\n', stdin))) //&& (cap != -1))
{
stream.lineIndex = i;
//if (cap == -1) { break; }
//print("%s", stream.linesArray[i]);
++i;
if (i == sizeof(stream.linesArray))
{
stream.linesArray = realloc(stream.linesArray, (2 * sizeof(stream.linesArray)));
}
}
ull_int j;
for (j = 0; j < i; ++j)
{
printf("%s\n", stream.linesArray[j]);
}
free(stream.linesArray);
return 0;
}
Ok, so the intent is clear - use getdelim to store the lines inside an array. getline itself uses dynamic allocation. The manual is quite clear about it:
getline() reads an entire line from stream, storing the address of the
buffer containing the text into *lineptr. The buffer is
null-terminated and includes the newline character, if one was found.
The getline() "stores the address of the buffer into *lineptr". So lineptr has to be a valid pointer to a char * variable (read that twice).
*lineptr and *n will be updated
to reflect the buffer address and allocated size respectively.
Also n needs to be a valid(!) pointer to a size_t variable, so the function can update it.
Also note that the lineptr buffer:
This buffer should be freed by the user program even if getline() failed.
So what do we do? We need an array of pointers to strings. Because I don't want to become a three star programmer, I use structs. I modified your code a bit and added some checks. You'll have to excuse me, I don't like typedefs, so I don't use them. I renamed uniq to struct lines_s:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
struct line_s {
char *line;
size_t len;
};
struct lines_s {
struct line_s *lines;
size_t cnt;
};
int main() {
struct lines_s lines = { NULL, 0 };
// loop breaks on error or feof(stdin)
while (1) {
char *line = NULL;
size_t size = 0;
// we pass a pointer to a `char*` variable
// and a pointer to `size_t` variable
// `getdelim` will update the variables inside it
// the initial values are NULL and 0
ssize_t ret = getdelim(&line, &size, '\n', stdin);
if (ret < 0) {
// check for EOF
if (feof(stdin)) {
// EOF found - break
break;
}
fprintf(stderr, "getdelim error %zd!\n", ret);
abort();
}
// new line was read - add it to our container "lines"
// always handle realloc separately
void *ptr = realloc(lines.lines, sizeof(*lines.lines) * (lines.cnt + 1));
if (ptr == NULL) {
// note that lines.lines is still a valid pointer here
fprintf(stderr, "Out of memory\n");
abort();
}
lines.lines = ptr;
lines.lines[lines.cnt].line = line;
lines.lines[lines.cnt].len = size; // note: this is the allocated buffer size; the string length is in ret
lines.cnt += 1;
// break if the line is "stop"
if (strcmp("stop\n", lines.lines[lines.cnt - 1].line) == 0) {
break;
}
}
// iterate over lines
for (size_t i = 0; i < lines.cnt; ++i) {
// note that the line has a newline in it
// so no additional is needed in this printf
printf("line %zu is %s", i, lines.lines[i].line);
}
// getdelim returns dynamically allocated strings
// we need to free them
for (size_t i = 0; i < lines.cnt; ++i) {
free(lines.lines[i].line);
}
free(lines.lines);
}
For such input:
line1 line1
line2 line2
stop
will output:
line 0 is line1 line1
line 1 is line2 line2
line 2 is stop
Tested on onlinegdb.
Notes:
if (i == sizeof(stream.linesArray)) - sizeof does not magically store the size of an array. sizeof(stream.linesArray) is just sizeof(char**), the size of a pointer. It's usually 4 or 8 bytes, depending on whether the architecture is 32-bit or 64-bit.
uniq stream = { malloc(DEFAULT_SIZE * sizeof(char)), - stream.linesArray is a char** variable. So if you want an array of pointers to char, you should allocate memory for pointers: malloc(DEFAULT_SIZE * sizeof(char*)).
typedef unsigned long long int ull_int; - size_t is the type for representing array sizes and the result of sizeof. ssize_t is used in the POSIX API to return either a size or a negative error status. Use those types; there is no need for unsigned long long.
ull_int cap, cap = getdelim - cap is unsigned, so it can never hold a negative value. The -1 that getdelim returns on failure or end of file is converted to a huge positive number, so while ((cap = getdelim(...))) never stops on it. Store the return value in an ssize_t and test for -1.
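If you do want the doubling growth the original code was attempting, the capacity has to be stored next to the pointer, since sizeof cannot recover it. A minimal sketch, assuming a made-up struct line_vec with helpers line_vec_push and line_vec_free (none of these names come from the question or from any library):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: the capacity is stored explicitly because
   sizeof(v->items) would only give the size of the pointer itself. */
struct line_vec {
    char **items;   /* array of owned strings */
    size_t len;     /* slots in use */
    size_t cap;     /* slots allocated */
};

/* Appends a copy of s; returns 0 on success, -1 if allocation fails
   (the stored lines are left untouched in that case). */
int line_vec_push(struct line_vec *v, const char *s)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc into a temporary so the old block is not lost on failure */
        char **tmp = realloc(v->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        v->items = tmp;
        v->cap = new_cap;
    }
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, n);
    v->items[v->len++] = copy;
    return 0;
}

/* Frees every stored string and the array itself. */
void line_vec_free(struct line_vec *v)
{
    for (size_t i = 0; i < v->len; ++i)
        free(v->items[i]);
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

Start from struct line_vec v = { NULL, 0, 0 }; and call line_vec_push(&v, line) for each line read. The starting capacity of 4 is an arbitrary choice. Doubling keeps the number of realloc calls logarithmic in the line count, at the cost of some unused slots.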

Reading from socket in C giving weird output

I'm having trouble reading from a socket. The code I'm using is below, sometimes it works just fine, but at other times, it just prints some unreadable characters, or some random readable ones... is there a better way?
char* answer = (char*) malloc(1024);
int answerLength = 0;
char prevChar = 0;
char newChar = 0;
while (answerLength < 1024 && read(sock, &newChar, 1) > 0 ) {
if (newChar == '\n' && prevChar == '\r') {
break;
}
printf("%d\n", answerLength);
answer[ answerLength ] = newChar;
answerLength++;
prevChar = newChar;
}
printf("%s\n", answer);
Strings in C must be null-terminated, which means they must have a symbol \0 as the last character.
Since you don't guarantee that it will happen anywhere in your code, answer may be padded with memory garbage next to the data you read.
To make sure it won't happen, use:
answer[answerLength] = '\0';
printf("%s\n", answer);
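Also, if you don't need to stop at \r\n, you could just read() straight into answer instead of going byte by byte:
ssize_t len;
/* stop at 1023 bytes so the terminator below still fits in the 1024-byte buffer */
while (answerLength < 1023 && (len = read(sock, &answer[answerLength], 1023 - answerLength)) > 0)
answerLength += len;
answer[answerLength] = '\0';
printf("%s\n", answer);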
The data you read isn't terminated with a '\0' character, so you can't treat it as a string.
Your char array is not guaranteed to be null terminated. This means that the printf may print more than just what is in your array since it looks for a null termination to stop outputting characters.
You also don't initialise the allocated memory before using it, which is bad practice since the memory can contain random garbage.
To get the code to work better and hopefully fix your problem, you should do the following:
char* answer = malloc(1024 + 1); /* add extra byte for null terminator */
/* other variables the same */
memset( answer, '\0', 1024 + 1 ); /* initialise memory before use */
while (answerLength < 1024 && read(sock, &newChar, 1) > 0 ) {
/* loop is the same */
}
printf("%s\n", answer);
printf also accepts a precision for %s that limits how many characters it prints, so it never reads past the bytes you actually stored. Like so:
printf( "%.*s\n", answerLength, answer );
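Putting the points from these answers together, here is a rough sketch of a byte-by-byte reader that stops at "\r\n" and always terminates the buffer. The name read_crlf_line and its parameters are hypothetical. Unlike the loop in the question, it drops the trailing '\r' instead of keeping it.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: reads bytes from fd into buf until "\r\n", EOF, an
   error, or until cap - 1 bytes are stored. The result is always
   null-terminated and excludes the "\r\n". Returns the length, or -1 if
   read() failed before anything was stored. Requires cap >= 1. */
ssize_t read_crlf_line(int fd, char *buf, size_t cap)
{
    size_t len = 0;
    char c;
    ssize_t r = 0;

    while (len + 1 < cap && (r = read(fd, &c, 1)) > 0) {
        if (c == '\n' && len > 0 && buf[len - 1] == '\r') {
            --len;          /* drop the '\r' stored on the previous pass */
            break;
        }
        buf[len++] = c;
    }
    buf[len] = '\0';        /* terminate before using buf as a string */
    if (r < 0 && len == 0)
        return -1;
    return (ssize_t)len;
}
```

With char answer[1024]; a call would look like ssize_t n = read_crlf_line(sock, answer, sizeof answer); and, when n >= 0, the result can be printed with either printf("%s\n", answer) or printf("%.*s\n", (int)n, answer). Reading one byte per read() call is slow, but it never consumes data past the end of the line.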

Resources