Program using read() entering into an infinite loop - c

void ReadBinary(char *infile,HXmap* AssetMap)
{
int fd;
size_t bytes_read, bytes_expected = 100000000*sizeof(char);
char *data;
if ((fd = open(infile,O_RDONLY)) < 0)
err(EX_NOINPUT, "%s", infile);
if ((data = malloc(bytes_expected)) == NULL)
err(EX_OSERR, "data malloc");
bytes_read = read(fd, data, bytes_expected);
if (bytes_read != bytes_expected)
printf("Read only %d of %d bytes %d\n", \
bytes_read, bytes_expected,EX_DATAERR);
/* ... operate on data ... */
printf("\n");
int i=0;
int counter=0;
char ch=data[0];
char message[512];
Message* newMessage;
while(i!=bytes_read)
{
while(ch!='\n')
{
message[counter]=ch;
i++;
counter++;
ch =data[i];
}
message[counter]='\n';
message[counter+1]='\0';
//---------------------------------------------------
newMessage = (Message*)parser(message);
MessageProcess(newMessage,AssetMap);
//--------------------------------------------------
//printf("idNUM %e\n",newMessage->idNum);
free(newMessage);
i++;
counter=0;
ch =data[i];
}
free(data);
}
Here, I have allocated 100 MB with malloc and passed in a reasonably large file (about 926 KB, nowhere near 500 MB). With small files the program reads and exits like a charm, but with a larger file it runs up to some point and then just hangs. I suspect it has either entered an infinite loop or there is a memory leak.
EDIT: For better understanding, I stripped away all unnecessary function calls and checked what happens when a large file is given as input. The modified code is below.
void ReadBinary(char *infile,HXmap* AssetMap)
{
int fd;
size_t bytes_read, bytes_expected = 500000000*sizeof(char);
char *data;
if ((fd = open(infile,O_RDONLY)) < 0)
err(EX_NOINPUT, "%s", infile);
if ((data = malloc(bytes_expected)) == NULL)
err(EX_OSERR, "data malloc");
bytes_read = read(fd, data, bytes_expected);
if (bytes_read != bytes_expected)
printf("Read only %d of %d bytes %d\n", \
bytes_read, bytes_expected,EX_DATAERR);
/* ... operate on data ... */
printf("\n");
int i=0;
int counter=0;
char ch=data[0];
char message[512];
while(i<=bytes_read)
{
while(ch!='\n')
{
message[counter]=ch;
i++;
counter++;
ch =data[i];
}
message[counter]='\n';
message[counter+1]='\0';
i++;
printf("idNUM \n");
counter=0;
ch =data[i];
}
free(data);
}
What seems to happen is that it prints a whole lot of idNUM lines and then, poof, a segmentation fault.
I think this is interesting behaviour, and to me it looks like there is some problem with memory.
FURTHER EDIT: When I change the condition back to i!=bytes_read, there is no segmentation fault. When I check for i<=bytes_read, it blows past the limits in the inner loop (courtesy of gdb).

The most glaring problem is this:
while(ch!='\n')
{
message[counter]=ch;
i++;
counter++;
ch =data[i];
}
Unless the last character of the file (or the block that you've just read) is \n, you'll go past the end of the data array, most probably smashing the stack along the way (since you're not checking whether your write to message is within bounds).
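To make the two missing checks concrete, here is a minimal sketch of a line extractor that stops at the end of the data as well as at '\n', and never writes past the end of the message buffer. The name next_line and its parameters are made up for illustration, and truncating overlong lines is just one possible policy.

```c
#include <stddef.h>

/* Copies the line that starts at data[start] into out (capacity out_cap),
 * NUL-terminates it, and returns the index just past that line's '\n',
 * or len if the data ends without a newline.
 * Lines longer than out_cap - 1 bytes are truncated (an assumed policy). */
size_t next_line(const char *data, size_t len, size_t start,
                 char *out, size_t out_cap)
{
    size_t i = start;
    size_t n = 0;

    /* Stop at the end of the data, not only at '\n'. */
    while (i < len && data[i] != '\n') {
        if (n + 1 < out_cap)   /* always leave room for the terminator */
            out[n++] = data[i];
        i++;
    }
    if (out_cap > 0)
        out[n] = '\0';

    return (i < len) ? i + 1 : len;
}
```

The caller would loop with something like `for (pos = 0; pos < bytes_read; ) pos = next_line(data, bytes_read, pos, message, sizeof message);`, so the outer loop ends exactly when the data is used up.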

Try the following loop. Basically, it refactors your implementation so there is only one place where i is incremented. Having two places is what's causing your trouble.
#include <stdio.h>
#include <string.h>
int main()
{
const char* data = "First line\nSecond line\nThird line";
unsigned int bytes_read = strlen(data);
unsigned int i = 0;
unsigned int counter = 0;
char message[512];
while (i < bytes_read)
{
message[counter] = data[i];
++counter;
if (data[i] == '\n')
{
message[counter] = '\0';
printf("%s", message);
counter = 0;
}
++i;
}
// If data didn't end with a newline
if (counter)
{
message[counter] = '\0';
printf("%s\n", message);
}
return 0;
}
Or, you could take the "don't reinvent the wheel" approach and use a standard strtok call:
#include <stdio.h>
#include <string.h>
int main()
{
char data[] = "First line\nSecond line\nThird line";
char* message = strtok(data, "\n");
while (message)
{
printf("%s\n", message);
message = strtok(NULL, "\n");
}
return 0;
}

Is it possible that on the system you're using, 500,000,000 is larger than the largest size_t? If so, bytes_expected may be rolling over to some smaller value. Then bytes_read is following suit, and you're ending up taking a smaller chunk of data than you actually expect. The result would be that for large data, the last character of data is unlikely to be a '\n', so you blow right past it in that inner loop and start accessing characters beyond the end of data. Segfault follows.
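One way to test this hypothesis is to compare the request against SIZE_MAX before calling malloc. The sketch below uses two hypothetical helpers, fits_in_size_t and format_short_read. On common 32-bit and 64-bit platforms SIZE_MAX is far above 500,000,000, so the check would be expected to pass there. The second helper shows %zu, the printf conversion for size_t; the question's code prints size_t values with %d, which is a type mismatch in its own right.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if count elements of elem_size bytes each fit in a size_t. */
int fits_in_size_t(unsigned long long count, size_t elem_size)
{
    if (elem_size == 0)
        return 1;
    return count <= SIZE_MAX / elem_size;
}

/* Builds the short-read diagnostic using %zu for the size_t arguments. */
int format_short_read(char *out, size_t out_cap, size_t got, size_t wanted)
{
    return snprintf(out, out_cap, "Read only %zu of %zu bytes", got, wanted);
}
```

Separately, read() returns ssize_t, so a failed read should be checked for -1 before the result is stored in a size_t.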

Related

Problem with printing first 10 rows of a file

So I am trying to make a function print_file_rows() that prints the first ten rows of a file. If the file has more than 10 rows it works perfectly fine but if there's 10 or less it starts printing garbage. Any ideas on how I can fix this? (MUST ONLY USE THE SYSTEM FUNCTIONS OPEN/READ/WRITE/CLOSE)
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
void print_file_rows(char *path)
{
int fd = open(path, O_RDONLY);
if (fd < 0)
{
return NULL;
}
size_t size = 100;
size_t offset = 0;
size_t res;
char *buff = malloc(size);
while((res = read(fd, buff + offset, 100)) != 0)
{
offset += res;
if (offset + 100 > size)
{
size *= 2;
buff = realloc(buff, size);
}
}
close(fd);
int j = 0;
for(int i = 0;buff[i] != '\0'; i++)
{
if(j == 10)
{
break;
}
if(buff[i] == '\n')
{
j++;
}
printf("%c", buff[i]);
}
free(buff);
}
int main()
{
print_file_rows("a.txt");
return 0;
}
You do not need any buffers. It is most likely buffered on the OS level so you may print char by char.
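#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int print_file_rows(char *path, size_t nrows)
{
int result = -1;
int fd = open(path, O_RDONLY);
char c;
// open() returns -1 on failure; 0 is a valid descriptor
if (fd >= 0)
{
while(nrows && read(fd, &c, 1) == 1)
{
write(STDOUT_FILENO, &c, 1);
if(c == '\n') nrows--;
}
result = nrows;
// only close a descriptor that was actually opened
close(fd);
}
return result;
}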
int main()
{
if(print_file_rows("a.txt", 10) == -1)
printf("Something has gone wrong\n");
return 0;
}
From man 2 read:
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
read is for reading raw bytes, and as such has no notion of strings. It does not place a NUL terminating byte ('\0') at the end of the buffer. If you are going to treat the data you read as a string, you must terminate the buffer yourself.
To make room for this NUL terminating byte you should always allocate one extra byte in your buffer (i.e., read one less byte than your maximum).
We can see the return value is actually of type ssize_t, rather than size_t, which allows for the error case the manual describes:
On error, -1 is returned, and errno is set to indicate the error.
This means we will need to check that the return value is greater than zero, rather than merely nonzero; otherwise an error would cause the offset to be decremented.
With all that said, note that this answer from a similar question posted just yesterday shows how to achieve this without the use of a dynamic buffer. You can simply read the file one byte at a time and stop reading when you've encountered 10 newline characters.
If you do want to understand how to read a file into a dynamic buffer, then here is an example using the calculated offset to NUL terminate the buffer as it grows. Note that reading the entire file this way is inefficient for this task (especially for a large file).
(Note: the call to write, instead of printf)
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
void print_file_rows(const char *path)
{
int fd = open(path, O_RDONLY);
const size_t read_size = 100;
size_t size = read_size;
size_t offset = 0;
ssize_t res;
char *buff = malloc(size + 1);
while ((res = read(fd, buff + offset, read_size)) > 0) {
offset += res;
buff[offset] = '\0';
if (offset + read_size > size) {
size *= 2;
buff = realloc(buff, size + 1);
}
}
close(fd);
int lines = 0;
for (size_t i = 0; lines < 10 && buff[i] != '\0'; i++) {
write(STDOUT_FILENO, &buff[i], 1);
if (buff[i] == '\n')
lines++;
}
free(buff);
}
int main(void)
{
print_file_rows("a.txt");
}
(Error handling omitted for code brevity. malloc, realloc, and open can all fail, and should normally be handled.)
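For completeness, here is a rough sketch of the same growing-buffer read with the error handling filled in. The function name read_all and the chunk size are illustrative choices, not part of the answer above. The result of realloc goes into a temporary pointer so that the original buffer can still be freed if the call fails.

```c
#include <stdlib.h>
#include <unistd.h>

/* Reads everything from fd into a malloc'ed, NUL-terminated buffer.
 * Returns NULL on allocation or read failure; the caller frees the result.
 * If out_len is not NULL it receives the number of bytes read. */
char *read_all(int fd, size_t *out_len)
{
    const size_t chunk = 100;
    size_t cap = chunk;
    size_t len = 0;
    char *buf = malloc(cap + 1);   /* +1 for the NUL terminator */

    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Grow first so there is always room for another chunk. */
        if (len + chunk > cap) {
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap + 1);
            if (tmp == NULL) {
                free(buf);         /* the old block is still valid here */
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        ssize_t res = read(fd, buf + len, chunk);
        if (res < 0) {             /* read error: errno is set */
            free(buf);
            return NULL;
        }
        if (res == 0)              /* end of file */
            break;
        len += (size_t)res;
    }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

This sketch treats every read() failure as fatal; retrying on EINTR is a refinement left out here.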

How to read all the strings first then output it?

I'm using fgets to get inputs.
Input:
station
cricket
expected output:
station
cricket
char str[size_limit];
scanf("%d", &n);
for(i=0; i<n; i++){
fgets(str, sizeof str, stdin);
}
printf("%s", ??);
It's not clear what "compute the outputs" means, but if you want to read all the lines of text in a file and then compute a result from those lines of text, something like this will work (in this case I just print out the lines of text from the in-memory array, but you can modify it as necessary):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char ** linesBuf = NULL;
size_t numSlotsInArray = 0;
int numLinesRead = 0;
char buf[512];
while(fgets(buf, sizeof(buf), stdin))
{
if (numLinesRead >= numSlotsInArray)
{
// Whoops, our allocated array is too short, make it larger
numSlotsInArray = (numSlotsInArray == 0) ? 100 : (numSlotsInArray*2);
linesBuf = realloc(linesBuf, numSlotsInArray*sizeof(char*));
if (linesBuf == NULL)
{
perror("realloc");
exit(10);
}
}
char * nextLine = strdup(buf);
if (nextLine == NULL)
{
perror("strdup");
exit(10);
}
linesBuf[numLinesRead++] = nextLine;
}
printf("total number of lines read: %i\n", numLinesRead);
for(size_t i=0; i<numLinesRead; i++)
{
printf("Line %zu is: [%s]\n", i, linesBuf[i]);
}
// Clean up to avoid leaking memory
for(size_t i=0; i<numLinesRead; i++) free(linesBuf[i]);
free(linesBuf);
return 0;
}
You can run this with stdin redirected from a file (e.g. ./a.out < input_file.txt) or if you run it from a Terminal window, press Ctrl-D to indicate end-of-file to the program.

C programming in Linux: not getting correct output for program that finds number of occurrences of substring in file

I am writing a program that finds the number of occurrences of input substrings from the command line inside a text file (also read from the command line) which is written into a buffer.
When I run the code in bash, I get the error: Segmentation fault (core dumped).
I am still learning how to code with C in this environment and have some sort of idea as to why the segmentation fault occurred (misuse of dynamic memory allocation?), but I could not find the problem with it. All I could conclude was that the problem is coming from within the for loop (I labeled where the potential error is being caused in the code).
EDIT: I managed to fix the segmentation fault error by changing argv[j] to argv[i], however when I run the code now, count1 always returns 0 even if the substring occurs multiple times in the text file and I am not sure what is wrong even though I have gone through the code multiple times.
$ more foo.txt
aabbccc
$ ./main foo.txt a
0
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
int main(int argc, char *argv[]) {
FILE *fp;
long lsize;
char *buf;
int count = 0, count1 = 0;
int i, j, k, l1, l2;
if (argc < 3) { printf("Error: insufficient arguments.\n"); return(1); };
fp = fopen(argv[1], "r");
if (!fp) {
perror(argv[1]);
exit(1);
}
//get size of file
fseek(fp, 0L, SEEK_END);
lsize = ftell(fp);
rewind(fp);
//allocate memory for entire content
buf = calloc(1, lsize+1);
if (!buf) {
fclose(fp);
fputs("Memory alloc fails.\n", stderr);
exit(1);
}
//copy the file into the buffer
if (1 != fread(buf, lsize, 1, fp)) {
fclose(fp);
free(buf);
fputs("Entire read fails.\n", stderr);
exit(1);
}
l1 = strlen(buf);
//error is somewhere here
for (i = 2; i < argc; i++) {
for (j = 0; j < l1;) {
k = 0;
count = 0;
while ((&buf[j] == argv[k])) {
count++;
j++;
k++;
}
if (count == strlen(argv[j])) {
count1++;
count = 0;
}
else
j++;
}
printf("%d\n", count1);
}
fclose(fp);
return 0;
}
fread(buf, lsize, 1, fp) will read 1 block of lsize bytes, however fread
doesn't care about the contents and won't add a '\0'-terminating byte for the
string, so l1 = strlen(buf); yields undefined behaviour, the rest of the
result can be ignored as a result of this (and your counting has errors as well).
Note that files usually don't have a 0-terminating byte at the end,
that applies even for files containing text, they usually end with a
newline.
You have to set the 0-terminating byte yourself:
if (1 != fread(buf, lsize, 1, fp)) {
fclose(fp);
free(buf);
fputs("Entire read fails.\n", stderr);
exit(1);
}
buf[lsize] = '\0';
And you can use strstr to get the location of the substring, like this:
for(i = 2; i < argc; ++i)
{
char *content = buf;
int count = 0;
while((content = strstr(content, argv[i])))
{
count++;
content++; // point to the next char in the substring
}
printf("The substring '%s' appears %d time(s)\n", argv[i], count);
}
Your counting is wrong, there are some errors. This comparison
&buf[j] == argv[k]
is wrong, you are comparing pointers, not the contents. You have to use strcmp
to compare strings. In this case you would have to use strncmp because you
only want to match the substring:
while(strncmp(&buf[j], argv[k], strlen(argv[k])) == 0)
{
// substring matched
}
but this is also wrong, because you are incrementing k as well, which will
give you the next argument, at the end you might read beyond the limits of
argv if the substring is longer than the number of arguments. Based on your
code, you would have to compare characters:
while(buf[j] == argv[i][k])
{
j++;
k++;
}
You would have to increment the counter only when a substring is matched, like
this:
l1 = strlen(buf);
for (i = 2; i < argc; i++) {
int count = 0;
int k = 0; // running index for inspecting argv[i]
for (j = 0; j < l1; ++j) {
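// stop at the end of argv[i] so the comparison
// never runs past either string's terminator
while(argv[i][k] != 0 && buf[j + k] == argv[i][k])
k++;
// if all characters of argv[i]
// matched, argv[i][k] will be the
// 0-terminating byte
if(argv[i][k] == 0)
count++;
// reset running index for argv[i]
// go to next char of buf
k = 0;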
}
printf("The substring '%s' appears %d time(s)\n", argv[i], count);
}
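The strstr approach can also be wrapped in a small function, which makes it easy to check against the foo.txt example from the question. This is only a sketch: count_occurrences is a made-up name, and counting overlapping matches (advancing by one character after each hit) is an assumption about what is wanted.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of needle in haystack, including overlapping ones.
 * An empty needle is treated as matching nothing. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;

    if (needle[0] == '\0')
        return 0;

    for (const char *p = strstr(haystack, needle);
         p != NULL;
         p = strstr(p + 1, needle))
        count++;

    return count;
}
```

With the buffer holding "aabbccc", count_occurrences(buf, "a") gives 2 and count_occurrences(buf, "c") gives 3.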

struct pointers to same memory address producing different data?

I have this simple code to read the lines of a file and store them in a struct:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct filedata {
char **items;
int lines;
};
struct filedata *read_file(char *filename) {
FILE* file = fopen(filename, "r");
if (file == NULL) {
printf("Can't read %s \n", filename);
exit(1);
}
char rbuff;
int nlines = 0; // amount of lines
int chr = 0; // character count
int maxlen = 0; // max line length (to create optimal buffer)
int minlen = 2; // min line length (ignores empty lines with just \n, etc)
while ((rbuff = fgetc(file) - 0) != EOF) {
if (rbuff == '\n') {
if (chr > maxlen) {
maxlen = chr + 1;
}
if (chr > minlen) {
nlines++;
}
chr = 0;
}
else {
chr++;
}
}
struct filedata *rdata = malloc(sizeof(struct filedata));
rdata->lines = nlines;
printf("lines: %d\nmax string len: %d\n\n", nlines, maxlen);
rewind(file);
char *list[nlines];
int buffsize = maxlen * sizeof(char);
char buff[buffsize];
int i = 0;
while (fgets(buff, buffsize, file)) {
if (strlen(buff) > minlen) {
list[i] = malloc(strlen(buff) * sizeof(char) + 1);
strcpy(list[i], buff);
i++;
}
}
rdata->items = (char **)list;
fclose(file);
int c = 0;
for (c; c < rdata->lines; c++) {
printf("line %d: %s\n", c + 1, rdata->items[c]);
}
printf("\n");
return rdata;
}
int main(void) {
char fname[] = "test.txt";
struct filedata *ptr = read_file(fname);
int c = 0;
for (c; c < ptr->lines; c++) {
printf("line %d: %s\n", c + 1, ptr->items[c]);
}
return 0;
}
This is the output when I run it:
lines: 2
max string len: 6
line 1: hello
line 2: world
line 1: hello
line 2: H��
For some reason when it reaches the second index in ptr->items, it prints gibberish output. But yet, if I throw some printf()'s in there to show the pointer addresses, they're exactly the same.
Valgrind also prints this when iterating over the char array the second time:
==3777== Invalid read of size 8
==3777== at 0x400AB3: main (test.c:81)
==3777== Address 0xfff000540 is on thread 1's stack
==3777== 240 bytes below stack pointer
But that really doesn't give me any clues in this case.
I'm using gcc 4.9.4 with glibc-2.24 if that matters.
list is a non-static local variable, and using it after exiting its scope (returning from read_file in this case) will invoke undefined behavior because it vanishes when its scope ends. Allocate it dynamically (typically on the heap) like
char **list = malloc(sizeof(char*) * nlines);
Adding code to check if malloc()s are successful will make your code better.
The variable list is local to read_file, but you store a pointer to list in rdata->items. When read_file returns, rdata->items is a dangling pointer, and accessing it is undefined behavior.
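A minimal sketch of the heap-based version follows. It reuses struct filedata from the question, but make_filedata and free_filedata are hypothetical helpers that build the struct from an in-memory array instead of a file, to keep the example short. The point is that both the pointer array and each string are allocated with malloc, so they stay valid after the function returns.

```c
#include <stdlib.h>
#include <string.h>

struct filedata {
    char **items;
    int lines;
};

/* Releases everything make_filedata allocated. */
void free_filedata(struct filedata *rdata)
{
    if (rdata == NULL)
        return;
    for (int i = 0; i < rdata->lines; i++)
        free(rdata->items[i]);
    free(rdata->items);
    free(rdata);
}

/* Copies n strings into heap storage that outlives this call. */
struct filedata *make_filedata(const char *const *src, int n)
{
    struct filedata *rdata = malloc(sizeof *rdata);
    if (rdata == NULL)
        return NULL;

    /* The pointer array lives on the heap, not on this function's stack. */
    rdata->items = malloc(sizeof *rdata->items * (size_t)(n > 0 ? n : 1));
    if (rdata->items == NULL) {
        free(rdata);
        return NULL;
    }

    rdata->lines = 0;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(src[i]);
        char *copy = malloc(len + 1);
        if (copy == NULL) {
            free_filedata(rdata);   /* frees the copies made so far */
            return NULL;
        }
        memcpy(copy, src[i], len + 1);
        rdata->items[rdata->lines++] = copy;
    }
    return rdata;
}
```

The caller owns the result and releases it with free_filedata once it is done with the lines.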

Weird behaviour of FIFO reading/writing in C

I've got a C program that reproduces a server using FIFOs. The program reads two lines from an input FIFO — a number n and a string str— and writes on an output FIFO n lines, each of which is a single occurrence of str. I wrote the following code.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#define MAX_SIZE 256
char *readline(int fd, char *buffer) {
char c;
int i = 0;
while (read(fd, &c, 1) != 0) {
if (c == '\n')
break;
buffer[i++] = c;
}
return buffer;
}
int main(int argc, char** argv) {
mkfifo(argv[1], 0666);
mkfifo(argv[2], 0666);
int in = open(argv[1], O_RDONLY);
int out = open(argv[2] ,O_WRONLY);
char line[MAX_SIZE];
memset(line, 0, MAX_SIZE);
int n, i;
while (1) {
strcpy(line, readline(in, line));
sscanf(line, "%d", &n);
strcpy(line, readline(in, line));
for (i = 0; i < n; i++) {
write(out, line, strlen(line));
write(out, "\n", 1);
}
}
close(in);
close(out);
return 0;
}
This program compiles and runs with no errors, but it outputs a different number of occurrences of the string at each execution. For example, if the two input lines in the input FIFO are 5\nhello, it then prints out from 1 to 25 occurrences of hello at each run (frequency appears to be completely random).
I've been stuck on this for two days. Please give me some help.
I make no claims or warranties that I even know what your program does, as it has been 20 years since I had any pressing need to work with FIFOs at the system level. But one thing is clear. Two days is a long time to work on something without ever running it in a debugger; doing so would have exposed a number of problems.
First, readline() never terminates the string it is passed. This isn't as important the first time around as the second and beyond, since shorter data may be present in the input lines. Furthermore, read() can fail, and in doing so does not return 0, the only condition on your loop which will break. That failure should break the loop and be reflected in the return result. Because you return the buffer pointer, a reasonable failure-result could be NULL:
Consider something like this:
char *readline(int fd, char *buffer)
{
ssize_t res = 0;
char c = 0;
int i = 0;
for(;;)
{
res = read(fd, &c, 1);
if (res < 0)
return NULL;
else if (res == 0 || c == '\n')
break;
buffer[i++] = c;
}
buffer[i] = 0;
return buffer;
}
One could argue that it should return NULL if the buffer is empty, since you can't put a 0-length packet on a FIFO. I leave that for you to decide, but it is a potential hole in your algorithm, to be sure.
Next, the strcpy() function has undefined behavior if the buffers submitted overlap. Since readline() returns the very buffer that was passed in, and that same buffer is also the target of strcpy(), your program is executing UB. From all that I see, strcpy() is useless in this program in the first place, and shouldn't be there at all.
This is clearly wrong:
strcpy(line, readline(in, line));
sscanf(line, "%d", &n);
strcpy(line, readline(in, line));
for (i = 0; i < n; i++) {
write(out, line, strlen(line));
write(out, "\n", 1);
}
The above should be this:
if (readline(in, line))
{
if (sscanf(line, "%d", &n) == 1)
{
if (readline(in, line))
{
for (i = 0; i < n; i++)
{
write(out, line, strlen(line));
write(out, "\n", 1);
}
}
}
}
assuming the changes to readline() as prescribed were made. These could be combined into a single three-expression if-clause, but as written above it is at-least debuggable. In other words, via short-circuit eval there should be no problems doing this:
if (readline(in, line) &&
sscanf(line, "%d", &n) == 1 &&
readline(in, line))
{
for (i = 0; i < n; i++)
{
write(out, line, strlen(line));
write(out, "\n", 1);
}
}
but I would advise you keep the former until you have thoroughly debugged this.
Finally, note that readline() is still a buffer overflow just waiting to happen. You really should pass a max-len to that function and limit that potential possibility, or dynamically manage the buffer.
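As a rough illustration of the max-len idea, here is a bounded variant. The name readline_n, the return convention, and the choice to read and discard the excess of an overlong line are all assumptions made for this sketch rather than anything prescribed above.

```c
#include <errno.h>
#include <unistd.h>

/* Reads one line from fd into buffer, which has room for cap bytes.
 * Stores at most cap - 1 bytes plus a NUL; the rest of an overlong line
 * is read and discarded. Returns the number of bytes stored, or -1 on a
 * read error, on cap == 0, or on end of input before any byte arrived. */
ssize_t readline_n(int fd, char *buffer, size_t cap)
{
    size_t i = 0;
    int got_any = 0;
    char c;

    if (cap == 0)
        return -1;

    for (;;) {
        ssize_t res = read(fd, &c, 1);
        if (res < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (res == 0)              /* end of input */
            break;
        got_any = 1;
        if (c == '\n')
            break;
        if (i + 1 < cap)           /* keep one byte free for the NUL */
            buffer[i++] = c;
    }

    buffer[i] = '\0';
    return got_any ? (ssize_t)i : -1;
}
```

A call such as readline_n(in, line, sizeof line) then cannot write past the end of line, whatever arrives on the FIFO.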
The code does not initialise line for each iteration, so if readline() does not read anything it leaves line's content untouched.
And you do not test whether sscanf() fails, so the code does not notice when n is left unchanged; the last value of line gets printed out n times and everything starts over again ...
Also, readline() fails to check whether read() failed.
The lesson from this exercise is to always test whether a (system) call failed.
int readline(int fd, char *buf, int nbytes) {
int numread = 0;
int value; /* return value of read(); the logic below acts on it */
/* Controls */
while (numread < nbytes - 1) {
value = read(fd, buf + numread, 1);
if ((value == -1) && (errno == EINTR))
continue;
if ( (value == 0) && (numread == 0) )
return 0;
if (value == 0)
break;
if (value == -1)
return -1;
numread++;
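/* Reallocate to expand the buffer. Note: allocSize is not declared in this
snippet; presumably it is defined elsewhere and tracks buf's capacity. */
if( numread == allocSize-2 ){
allocSize*=2; /* when allocSize is not enough, this lets us expand buf,
preventing a memory leak */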
buf=realloc(buf,allocSize);
if( buf == NULL ){
fprintf(stderr,"Failed to reallocate!\n");
return -1;
}
}
/* If the incoming character is \n, return the number of characters read */
if (buf[numread-1] == '\n') {
buf[numread] = '\0';
return numread;
}
}
errno = EINVAL;
return -1;
}
