reading a file that doesn't exist - c

I have a small program that prints the contents of a file using the read system call.
unsigned char buffer[8];
size_t offset=0;
size_t bytes_read;
int i;
int fd = open(argv[1], O_RDONLY);
do{
bytes_read = read(fd, buffer, sizeof(buffer));
printf("0x%06x : ", offset);
for(i=0; i<bytes_read; ++i)
{
printf("%c ", buffer[i]);
}
printf("\n");
offset = offset + bytes_read;
}while(bytes_read == sizeof(buffer));
Now, when I run it with a file name that doesn't exist,
it prints some kind of data mixed with environment variables and then ends with a segmentation fault.
How is this possible? What is the program printing?
Thanks,
John

It's printing rubbish because open fails and sets fd to -1, which is not a good thing to pass to read: read does nothing other than return -1 as well. Your buffer is left untouched, meaning that it's holding whatever rubbish was in there when you started.
You could probably put the entire do loop inside something like:
if (fd == -1) {
printf ("error here");
} else {
// do loop here
}

read is returning -1 because fd is invalid, you store that in bytes_read which is of type size_t which is unsigned, so your loop prints (size_t)-1 chars, which is a very large number, much larger than the size of buffer. So, you're printing a big chunk of your address space and then getting a segfault when you eventually reach the end and access an invalid address.
As others have mentioned (without answering your actual question), you should be checking the results of open for an error. e.g.,
int fd = open(argv[1], O_RDONLY);
if( fd < 0 ){
fprintf(stderr, "error opening %s: %s\n", argv[1], strerror(errno));
exit(1);
}
A caveat: if you do another system call, or call any routine that might do a system call (e.g., printf) before calling strerror, you must save errno and then pass the saved copy to strerror.
Another note about your program:
while(bytes_read == sizeof(buffer))
This is not a good test, because read can return less than the amount you ask for. Your loop should continue until read returns <= 0.
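To tie the points above together, here is a minimal sketch of the dump loop with the open check, a signed ssize_t for the result of read, errno saved before it can be overwritten, and a loop that runs until read returns <= 0. The helper name dump_file and the FILE *out parameter are my own additions for illustration; they are not in the original program.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Dump a file 8 bytes per line to `out`.
 * Returns the total number of bytes read, or -1 on error.
 * dump_file is a hypothetical helper name, not part of the original program. */
long dump_file(const char *path, FILE *out)
{
    unsigned char buffer[8];
    long offset = 0;
    ssize_t bytes_read;          /* signed: read() returns -1 on error */

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved = errno;       /* save errno before any other library call */
        fprintf(stderr, "error opening %s: %s\n", path, strerror(saved));
        return -1;
    }

    /* Continue until read() reports end of file (0) or an error (-1). */
    while ((bytes_read = read(fd, buffer, sizeof buffer)) > 0) {
        fprintf(out, "0x%06lx : ", (unsigned long)offset);
        for (ssize_t i = 0; i < bytes_read; ++i)
            fprintf(out, "%c ", buffer[i]);
        fprintf(out, "\n");
        offset += bytes_read;
    }

    if (bytes_read < 0) {
        int saved = errno;
        close(fd);
        fprintf(stderr, "error reading %s: %s\n", path, strerror(saved));
        return -1;
    }
    close(fd);
    return offset;
}
```

Calling dump_file(argv[1], stdout) from main would replace the original loop; with a missing file it prints the error and returns -1 instead of walking off the end of buffer.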

You should probably check that the file descriptor returned by open is valid before using it. As per these docs, you should get a non-negative response for a valid file. Reading from an invalid descriptor is likely the source of your problem.

Upon successful completion, open function shall open the file and return a non-negative integer representing the file descriptor. Otherwise, -1 shall be returned and errno set to indicate the error. So please check fd before entering the loop to perform the read.

Related

Incorrect fprintf results

In the code below, I am trying to read from a socket and store the results in a file.
What actually happens, is that my client sends a GET request to my server for a file.html. My server finds the file and writes the contents of it to the socket. Lastly my client reads the content from thread_fd and recreates the file.
For some reason the recreated file has less content than the original. I have traced the problem to some lines at the end that are missing. When I use printf("%s", buffer) inside the while loop, everything seems fine on STDOUT, but my fprintf misses roughly 3,000 bytes for a file of 81,000 bytes.
#define MAXSIZE 1000
int bytes_read, thread_fd;
char buffer[MAXSIZE];
FILE* new_file;
memset(buffer, 0, MAXSIZE);
if((new_file = fopen(path, "wb+")) == NULL)
{
printf("can not open file \n");
exit(EXIT_FAILURE);
}
while ((bytes_read = read(thread_fd, buffer, MAXSIZE)) > 0)
{
fprintf(new_file, "%s", buffer);
if(bytes_read < MAXSIZE)
break;
memset(buffer, 0, MAXSIZE);
}
You read binary data from the socket that may or may not contain a \0 byte. When you then fprintf that data with "%s", fprintf will stop at the first \0 it encounters. In your case that leaves the output roughly 3000 bytes short of the full file. If a chunk contains no \0 byte at all, fprintf will simply continue printing whatever follows the buffer in memory until it finds a \0 or segfaults.
Use write() to write the data back to the file and check for errors. Don't forget to close() the file and check that for errors too.
Your code should/could look like:
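int readfile(int thread_fd, char *path)
{
int bytes_read; /* signed, so a -1 from read() ends the loop */
char buffer[MAXSIZE];
int new_file;
/* open for writing; without _O_WRONLY the descriptor would be read-only */
if ((new_file = open(path, _O_WRONLY|_O_CREAT|_O_TRUNC|_O_BINARY, _S_IWRITE)) == -1) return -1;
while ((bytes_read = read(thread_fd, buffer, MAXSIZE)) > 0)
{
if (write(new_file, buffer, bytes_read) != bytes_read) {
close(new_file);
return -2;
}
}
if (bytes_read < 0) { /* read failed */
close(new_file);
return -3;
}
if (close(new_file) == -1) return -4; /* close can fail too */
return 0;
}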
There are a few issues with your code that can cause this.
The most likely cause is this :
if(bytes_read < MAXSIZE)
break;
This ends the loop when read returns fewer bytes than requested. This is perfectly normal behavior, however, and can happen e.g. when not enough bytes are available at the time of the read call (it's reading from a network socket, after all). Just let the loop continue as long as read returns a value > 0 (assuming the socket is a blocking socket; if not, you'll also have to check for EAGAIN and EWOULDBLOCK).
Additionally, if the file you're receiving contains binary data, then it's not a good idea to use fprintf with "%s" to write to the target file. This will stop writing as soon as it finds a '\0' byte (which is not uncommon in binary data). Use fwrite instead.
Even if you're receiving text (suggested by the html file extension), it's still not a good idea to use fprintf with "%s", since the received data won't be '\0' terminated.
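A rough sketch of what that could look like with fwrite, assuming a blocking socket (so EAGAIN/EWOULDBLOCK are not handled). The function name copy_fd_to_stream and the CHUNK constant are made up for this example:

```c
#include <stdio.h>
#include <unistd.h>

#define CHUNK 1000

/* Copy everything readable from in_fd to the stream `out`.
 * Returns the number of bytes copied, or -1 on a read or write error.
 * copy_fd_to_stream is a hypothetical name used only for this sketch. */
long copy_fd_to_stream(int in_fd, FILE *out)
{
    char buffer[CHUNK];
    ssize_t n;
    long total = 0;

    /* A short read is normal on a socket; only 0 (EOF) or -1 ends the loop. */
    while ((n = read(in_fd, buffer, sizeof buffer)) > 0) {
        /* fwrite takes an explicit length, so '\0' bytes are copied too. */
        if (fwrite(buffer, 1, (size_t)n, out) != (size_t)n)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Because the byte count comes from read and is passed straight to fwrite, the buffer needs neither a terminator nor the memset calls.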
This worked!
ps: I don't know if I should be doing this, since I am new here, but really there is no reason for negativity. Any question is a good question. Just answer it if you know it. Do not judge it.
#define MAXSIZE 1000
int bytes_read, thread_fd, new_file;
char buffer[MAXSIZE];
memset(buffer, 0, MAXSIZE);
if((new_file = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) /* O_CREAT requires a mode argument */
{
printf("can not open file \n");
exit(EXIT_FAILURE);
}
while ((bytes_read = read(thread_fd, buffer, MAXSIZE)) > 0)
write(new_file, buffer, bytes_read);
close(new_file);

Getting characters past a certain point in a file in C

I want to take all characters past location 900 from a file called WWW, and put all of these in an array:
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while((NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
I create a char array of length 1 because the read system call requires a pointer, so I cannot use a regular char. The above code does not work. In fact, it does not print any characters to the terminal, as I would expect the loop to do. I think my logic is correct, but perhaps a misunderstanding of what's going on behind the scenes is what is making this hard for me. Or maybe I missed something simple (hope not).
If you already know how many bytes to read (e.g. in appropriatesize) then just read in that many bytes at once, rather than reading in bytes one at a time.
char everythingPast900[appropriatesize];
ssize_t bytesRead = read(WWW, everythingPast900, sizeof everythingPast900);
if (bytesRead > 0 && bytesRead != appropriatesize)
{
// only everythingPast900[0] to everythingPast900[bytesRead - 1] is valid
}
I made a test version of your code and added bits you left out. Why did you leave them out?
I also made a file named www.txt that has a hundred lines of "This is a test line." in it.
And I found a potential problem, depending on how big your appropriatesize value is and how big the file is. If you write past the end of EverythingPast900 it is possible for you to kill your program and crash it before you ever produce any output to display. That might happen on Windows where stdout may not be line buffered depending on which libraries you used.
See the MSDN setvbuf page, in particular "For some systems, this provides line buffering. However, for Win32, the behavior is the same as _IOFBF - Full Buffering."
This seems to work:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
int main(void)
{
int WWW = open("www.txt", O_RDONLY);
if(WWW < 0) {
printf("Error opening www.txt\n");
return 1;
}
//Keep track of all characters past position 900 in WWW.
off_t Seek900InWWW = lseek(WWW, 900, SEEK_SET); //goes to position 900 in WWW
printf("%ld \n", (long)Seek900InWWW);
if(Seek900InWWW < 0) {
printf("Error seeking to position 900 in WWW.txt\n");
return 1;
}
int appropriatesize = 1000;
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while(i < appropriatesize && (NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
return 0;
}
As stated in another answer, read more than one byte. The theory behind "buffers" is to reduce the amount of read/write operations due to how slow disk I/O (or network I/O) is compared to memory speed and CPU speed. Look at it as if it is code and consider which is faster: adding 1 to the file size N times and writing N bytes individually, or adding N to the file size once and writing N bytes at once?
Another thing worth mentioning is the fact that read may read fewer than the number of bytes you requested, even if there is more to read. The answer written by @dreamlax illustrates this fact. If you want, you can use a loop to read as many bytes as possible, filling the buffer. Note that I used a function, but you can do the same thing in your main code:
#include <sys/types.h>
#include <unistd.h>
/* Read from a file descriptor, filling the buffer with the requested
* number of bytes. If the end-of-file is encountered, the number of
* bytes returned may be less than the requested number of bytes.
* On error, -1 is returned. See read(2) or read(3) for possible
* values of errno.
* Otherwise, the number of bytes read is returned.
*/
ssize_t
read_fill (int fd, char *readbuf, ssize_t nrequested)
{
ssize_t nread = 0, nsum = 0;
while (nrequested > 0
&& (nread = read (fd, readbuf, nrequested)) > 0)
{
nsum += nread;
nrequested -= nread;
readbuf += nread;
}
if (nread < 0)
return -1; /* read failed; errno was set by read */
return nsum;
}
Note that the buffer is not null-terminated as not all data is necessarily text. You can pass buffer_size - 1 as the requested number of bytes and use the return value to add a null terminator where necessary. This is useful primarily when interacting with functions that will expect a null-terminated string:
char readbuf[4096];
ssize_t n;
int fd;
fd = open ("WWW", O_RDONLY);
if (fd == -1)
{
perror ("unable to open WWW");
exit (1);
}
n = lseek (fd, 900, SEEK_SET);
if (n == -1)
{
fprintf (stderr,
"warning: seek operation failed: %s\n"
" reading 900 bytes instead\n",
strerror (errno));
n = read_fill (fd, readbuf, 900);
if (n < 900)
{
fprintf (stderr, "error: fewer than 900 bytes in file\n");
close (fd);
exit (1);
}
}
/* Read a file, printing its contents to the screen.
*
* Caveat:
* Not safe for UTF-8 or other variable-width/multibyte
* encodings since required bytes may get cut off.
*/
while ((n = read_fill (fd, readbuf, (ssize_t) sizeof readbuf - 1)) > 0)
{
readbuf[n] = 0;
printf ("Read\n****\n%s\n****\n", readbuf);
}
if (n == -1)
{
close (fd);
perror ("error reading from WWW");
exit (1);
}
close (fd);
I could also have avoided the null termination operation and filled all 4096 bytes of the buffer, electing to use the precision part of the format specifiers of printf in this case, changing the format specification from %s to %.4096s. However, this may not be feasible with unusually large buffers (perhaps allocated by malloc to avoid stack overflow) because the buffer size may not be representable with the int type.
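For what it's worth, the precision does not have to be hard-coded in the format string. A small sketch using %.*s, where the precision is passed as an int argument; print_chunk is a name I made up for the example:

```c
#include <limits.h>
#include <stdio.h>

/* Print at most n bytes of buf as text, without requiring a terminator.
 * Returns the fprintf result, or -1 if n does not fit in an int.
 * print_chunk is a hypothetical helper name. */
int print_chunk(FILE *out, const char *buf, size_t n)
{
    if (n > INT_MAX)
        return -1;               /* the precision argument must be an int */
    return fprintf(out, "%.*s", (int)n, buf);
}
```

Printing still stops early at the first null byte inside the buffer, so this only suits text data.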
Also, you can use a regular char just fine:
char c;
nread = read (fd, &c, 1);
Apparently you didn't know that the unary & operator gets the address of whatever variable is its operand, creating a value of type pointer-to-{typeof var}? Either way, it takes up the same amount of memory, but reading 1 byte at a time is something that normally isn't done as I've explained.
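Mixing declarations and code is a no no in C89 (it is only allowed from C99 on). Also, char EverythingPast900[appropriatesize] with a non-constant size is a variable-length array, which is not a valid declaration before C99. An older compiler should complain about it along the lines of it being variably defined.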
What you want is dynamically allocating the memory for your char buffer[]. You'll have to use pointers.
http://www.ontko.com/pub/rayo/cs35/pointers.html
Then read this one.
http://www.cprogramming.com/tutorial/c/lesson6.html
Then research a function called memcpy().
Enjoy.
Read through that guide, then you should be able to solve your problem in an entirely different way.
Pseudo code.
declare a buffer of char(pointer related)
allocate memory for said buffer(dynamic memory related)
Find location of where you want to start at
point to it(pointer related)
Figure out how much you want to store(technically a part of allocating memory^^^)
Use memcpy() to store what you want in the buffer
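One possible way to turn those steps into code is sketched below. Note that it swaps memcpy() for reading straight into the allocated buffer, and it assumes the descriptor refers to a regular file so that fstat can report its size. read_tail is a hypothetical helper name:

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read everything from `offset` to the end of the file into a malloc'd
 * buffer. On success returns the buffer and stores its length in *len;
 * returns NULL on error or if the file is not longer than `offset`.
 * The caller frees the result. read_tail is a hypothetical helper name. */
char *read_tail(int fd, off_t offset, size_t *len)
{
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= offset)
        return NULL;

    size_t want = (size_t)(st.st_size - offset);
    char *buf = malloc(want);
    if (buf == NULL)
        return NULL;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        free(buf);
        return NULL;
    }

    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {             /* read error */
            free(buf);
            return NULL;
        }
        if (n == 0)              /* file shrank; keep what was read */
            break;
        got += (size_t)n;
    }
    *len = got;
    return buf;
}
```

With the WWW descriptor from the question, read_tail(WWW, 900, &len) would return everything past position 900 without having to guess appropriatesize.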

reading from a file descriptor in C

(Correct me if I'm wrong on my terms.) I need to read from a file descriptor, but the read method takes an int for the number of bytes to read, or I can use O_NONBLOCK, but I still have to set up a buffer for data of unknown size, which makes it difficult. Here's what I have so far.
This is my method that handles all the polling and mkfifo; N is already defined in main.
struct pollfd pfd[N];
int i;
for(i = 0; i < N; i++)
{
char fileName[32];
snprintf (fileName, sizeof(fileName), "%d_%di", pid, i);
mkfifo(fileName, 0666);
pfd[i].fd = open(fileName, O_RDONLY | O_NDELAY);
pfd[i].events = POLLIN;
pfd[i].revents = 0;
snprintf (fileName, sizeof(fileName), "%d_%do", pid, i);
mkfifo(fileName, 0666);
i++;
pfd[i].fd = open(fileName, O_WRONLY | O_NDELAY);
pfd[i].events = POLLOUT;
pfd[i].revents = 0;
i--;
}
while(1)
{
int len, n;
n = poll(pfd, N, 2000);
if( n < 0 )
{
printf("ERROR on poll");
continue;
}
if(n == 0)
{
printf("waiting....\n");
continue;
}
for(i = 0; i < N; i++)
{
char buff[1024]; <---i dont want to do this
if (pfd[i].revents & POLLIN)
{
printf("Processing input....\n");
read(pfd[i].fd, buff, O_NONBLOCK);
readBattlefield(buff);
print_battleField_stats();
pfd[i].fd = 0;
}
}
}
i also read somewhere that once read() reads all the data coming, it empties the pipe, meaning i can use the same again for another incoming data. but it doesnt empty the pipe because i cant use the same pipe again. I asked my professor but all he says was to use something like scanf, but how do use scanf if scanf takes a FILE stream, and the poll.fd is an int? essentially my ultimate question is, how to read the incoming data through the file descriptor using scan or of other sort? using scan will help me more with handling the data.
EDIT:
In another terminal I have to run cat file > (named_file)
and my main program will read the input data. Here's what the input data looks like:
3 3
1 2 0
0 2 0
3 0 0
The first 2 numbers are the grid information and the player number, and after that comes the grid. This is a simplified version, though; I'll be dealing with hundreds of players and grids with sizes in the thousands.
char buff[1024]; <---i dont want to do this
What would you like to do then? This is how it works. This is not how it works:
read(pfd[i].fd, buff, O_NONBLOCK);
This will compile because O_NONBLOCK is an integer #define, but it is absolutely and unequivocally incorrect. The third argument to read() is a number of bytes to read. Not a flag. Period. It may be zero, but what you've done here is pass an arbitrary number: whatever the value of O_NONBLOCK is, which could easily be more than 1024, the size of your buffer. This does not make the read non-blocking. recv() is similar to read() and does take such flags as a fourth argument, but it only works on sockets, so you can't use it with a descriptor for a FIFO. If you want to set non-block on a file descriptor, you must do it with open() or fcntl().
how to read the incoming data through the file descriptor using scan or of other sort?
You can create a FILE* stream from an open descriptor with fdopen().
i also read somewhere that once read() reads all the data coming, it empties the pipe, meaning i can use the same again for another incoming data. but it doesnt empty the pipe because i cant use the same pipe again.
Once you reach EOF (because the writer closed the connection), read() will return 0, and continue to return 0 immediately until someone opens the pipe again.
If you set the descriptor non-block, read() will always return immediately; if there is someone connected and nothing to read, it will return -1 and errno will be set to EAGAIN. See man 2 read.
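Roughly, setting the flag with fcntl() and telling the three outcomes of read() apart could look like this. The names set_nonblocking and try_read, and the -2 return code, are conventions I invented for the sketch:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for an open descriptor. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read up to cap bytes.
 * Returns >0: bytes read; 0: end of file (all writers closed);
 * -1: real error; -2: no data available right now (try again later).
 * try_read and its -2 code are conventions made up for this sketch. */
long try_read(int fd, char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap);   /* third argument is a byte count */
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return (long)n;
}
```

In the poll loop, the call would be try_read(pfd[i].fd, buff, sizeof buff), with the return value deciding whether to process data, wait, or reopen the FIFO.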
man fifo is definitely something you should read; if there's anything you aren't sure about, ask a specific question based on that.
And don't forget: Fix that read() call. It's wrong. W R O N G. Your prof/TA/whoever will not miss that.

Reading a large file using C (greater than 4GB) using read function, causing problems

I have to write C code for reading large files. The code is below:
int read_from_file_open(char *filename,long size)
{
long read1=0;
int result=1;
int fd;
int check=0;
long *buffer=(long*) malloc(size * sizeof(int));
fd = open(filename, O_RDONLY|O_LARGEFILE);
if (fd == -1)
{
printf("\nFile Open Unsuccessful\n");
exit (0);;
}
long chunk=0;
lseek(fd,0,SEEK_SET);
printf("\nCurrent Position%d\n",lseek(fd,size,SEEK_SET));
while ( chunk < size )
{
printf ("the size of chunk read is %d\n",chunk);
if ( read(fd,buffer,1048576) == -1 )
{
result=0;
}
if (result == 0)
{
printf("\nRead Unsuccessful\n");
close(fd);
return(result);
}
chunk=chunk+1048576;
lseek(fd,chunk,SEEK_SET);
free(buffer);
}
printf("\nRead Successful\n");
close(fd);
return(result);
}
The issue I am facing here is that as long as the argument passed (size parameter) is less than 264000000 bytes, it seems to be able to read. I am getting the increasing sizes of the chunk variable with each cycle.
When I pass 264000000 bytes or more, the read fails, i.e.: according to the check used read returns -1.
Can anyone point me to why this is happening? I am compiling using cc in normal mode, not using DD64.
In the first place, why do you need lseek() in your cycle? read() will advance the cursor in the file by the number of bytes read.
And, to the topic: on a platform where long is 32 bits, long (and therefore chunk) has a maximum value of 2147483647; any number greater than that will typically wrap around and become negative.
You want to use off_t to declare chunk: off_t chunk, and size as size_t.
That's the main reason why lseek() fails.
And, then again, as other people have noticed, you do not want to free() your buffer inside the cycle.
Note also that you will overwrite the data you have already read.
Additionally, read() will not necessarily read as much as you have asked it to, so it is better to advance chunk by the amount of the bytes actually read, rather than amount of bytes you want to read.
Taking everything in regards, the correct code should probably look something like this:
// Edited: note comments after the code
#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif
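int read_from_file_open(char *filename,size_t size)
{
int fd;
long *buffer=(long*) malloc(size * sizeof(long));
if (buffer == NULL)
{
printf("\nOut of memory\n");
return 0;
}
fd = open(filename, O_RDONLY|O_LARGEFILE);
if (fd == -1)
{
printf("\nFile Open Unsuccessful\n");
free(buffer);
exit (0);
}
off_t chunk=0;
lseek(fd,0,SEEK_SET);
/* report the position without moving it; seeking to size here would make every read hit EOF */
printf("\nCurrent Position%lld\n",(long long)lseek(fd,0,SEEK_CUR));
while ( (size_t)chunk < size )
{
printf ("the size of chunk read is %lld\n",(long long)chunk);
size_t want=size-(size_t)chunk;
if (want > 1048576)
want=1048576; /* never ask for more than is left of size */
ssize_t readnow; /* signed, so the -1 error return is detectable */
readnow=read(fd,((char *)buffer)+chunk,want);
if (readnow < 0 )
{
printf("\nRead Unsuccessful\n");
free (buffer);
close (fd);
return 0;
}
if (readnow == 0)
break; /* end of file before size bytes were read */
chunk=chunk+readnow;
}
printf("\nRead Successful\n");
free(buffer);
close(fd);
return 1;
}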
I also took the liberty of removing result variable and all related logic since, I believe, it can be simplified.
Edit: I have noted that some systems (most notably, BSD) do not have O_LARGEFILE, since it is not needed there. So, I have added an #ifndef at the beginning, which makes the code more portable.
The lseek function may have difficulty in supporting big file sizes. Try using lseek64
Please check the link to see the associated macros which needs to be defined when you use lseek64 function.
If it's a 32-bit machine, reading a file larger than 4 GB will cause problems. So if you are using the gcc compiler, try compiling with -D_LARGEFILE_SOURCE=1 and -D_FILE_OFFSET_BITS=64.
Please check this link also
If you are using any other compiler check for similar types of compiler option.
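As an alternative to passing the macro on the command line, it can be defined at the top of the source file before any system header. A minimal sketch under that assumption, with file_size_bytes as a made-up helper name:

```c
/* Must come before any system header to take effect on 32-bit builds. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the size of the file in bytes as a long long, or -1 on error.
 * file_size_bytes is a hypothetical helper name. */
long long file_size_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);   /* off_t, never int or long */
    close(fd);
    return end == (off_t)-1 ? -1 : (long long)end;
}
```

The point of the sketch is that every offset stays in an off_t until it is printed or returned, so nothing is truncated to 32 bits along the way.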

Socket Read/Write error

I would install valgrind to tell me what the problem is, but unfortunately I can't install any new programs on this computer... Could anyone tell me if there's an obvious problem with this "echo" program? I'm doing this for a friend, so I'm not sure what the layout of the client is on the other side, but I know that both reads and writes are valid socket descriptors, and I've tested that n = write(writes,"I got your message \n",20); and n = write(reads,"I got your message \n",20); both work, so I can confirm that it's not a case of an invalid fd. Thanks!
int
main( int argc, char** argv ) {
int reads = atoi(argv[1]) ;
int writes = atoi(argv[3]) ;
int n ;
char buffer[MAX_LINE];
memset(buffer, 0, sizeof(buffer));
int i = 0 ;
while (1) {
read(reads, buffer, sizeof(buffer));
n = write(writes,buffer,sizeof(buffer));
if (n < 0) perror("ERROR reading from socket");
}
There are a few problems, the most pressing of which is that you're likely pushing garbage data down the write socket by using sizeof(buffer) when writing. Let's say you read data from the reads socket and it's less than MAX_LINE bytes. When you go to write that data, you'll be writing whatever you read plus the garbage at the end of the buffer (even though you memset at the very beginning, continual use of the same buffer without reacting to different read sizes will probably generate some garbage).
Try getting the return value from read and using it in your write. If the read return indicates an error, clean up and either exit or try again, depending on how you want your program to behave.
int n, size;
while (1) {
size = read(reads, buffer, sizeof(buffer));
if (size > 0) {
n = write(writes, buffer, size);
if (n != size) {
// write error, do something
}
} else {
// size == 0 is end of file, size < 0 is a read error: stop either way
break;
}
}
This, of course, assumes your writes and reads are valid file descriptors.
These two lines look very suspicious:
int reads = atoi(argv[1]) ;
int writes = atoi(argv[3]) ;
Do you really get file/socket descriptor numbers on the command line? From where?
Check the return value of your read(2) and write(2), and then the value of errno(3) - they probably tell you that your file descriptors are invalid (EBADF).
One point not made thus far: Although you know that the file descriptors are valid, you should include some sanity checking of the command line.
if (argc < 4) { /* argv[1] and argv[3] are used, so argc must be at least 4 */
printf("usage: foo: input output\n");
exit(0);
}
Even with this sanity checking passing parameters like this on a command line can be dangerous.
The memset() is not needed, provided you change the following (which you should do nevertheless).
read() has a result, telling you how much it has actually read. This you should give to write() in order to write only what you actually have, removing the need for zeroing.
MAX_LINE should be at least 512, if not more.
There probably are some more issues, but I think I have the most important ones.
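Putting those points together, an echo loop might look something like the sketch below. It also retries after short writes and EINTR, which goes a bit beyond what was said above. write_all and echo_fd are hypothetical names, and the 512-byte buffer just follows the suggested minimum for MAX_LINE:

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Echo everything from in_fd to out_fd until EOF.
 * Returns total bytes echoed, or -1 on error. */
long echo_fd(int in_fd, int out_fd)
{
    char buffer[512];
    long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buffer, sizeof buffer);
        if (n == 0)
            return total;        /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Only the n bytes actually read are written back. */
        if (write_all(out_fd, buffer, (size_t)n) == -1)
            return -1;
        total += n;
    }
}
```

Since only the bytes returned by read() are ever written, the memset() on the buffer is no longer needed.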

Resources