Reading File in C using System Call - c

I have tried several solutions from this site but cannot understand what is going wrong with this code.
I am simply trying to read the file data.txt and print it. The file contains just 12 characters, "abcd1234efgh".
fd comes out positive, but "br" is 0 after the read. Please help if anyone has a clue about this.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
int main(int args,char* vargs[])
{
int fd = 0;
fd = open("data.txt",O_RDONLY);
if(fd<=0)
printf("Invalid file name");
else{
off_t fs =lseek(fd, (off_t) 0, SEEK_END);
char buf[10];
off_t br = read(fd,buf,10);
printf("%s",buf);
}
return 0;
}

lseek(fd, 0, SEEK_END);
After this call the file offset is at the end of the file, so any subsequent read returns 0 bytes. Comment the call out, or change it to suit your needs.

This:
lseek(fd, (off_t) 0, SEEK_END);
seeks to (an offset of 0 from) the end of the file. When you subsequently try to read, there are no bytes available past that point. You should not need to seek at all if you want to read from the beginning of the file.
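As a minimal sketch of the fix described above, the helper below opens the file and reads from the start without any lseek call. The name read_from_start and its signature are made up for illustration. It also terminates the buffer, since read() does not add a '\0' and printf("%s") needs one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to cap-1 bytes from the start of path into
 * buf and NUL-terminate the result. Returns the byte count, or -1 on error. */
ssize_t read_from_start(const char *path, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)                 /* open() fails with -1; 0 is a valid descriptor */
        return -1;

    size_t total = 0;
    while (total < cap - 1) {   /* no lseek: a fresh descriptor starts at offset 0 */
        ssize_t n = read(fd, buf + total, cap - 1 - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of file */
            break;
        total += (size_t)n;
    }
    close(fd);

    buf[total] = '\0';          /* read() does not terminate the buffer */
    return (ssize_t)total;
}
```

With the 12-character data.txt from the question and a buffer of at least 13 bytes, this should return 12. If the size is needed first, seeking to SEEK_END is fine as long as it is followed by lseek(fd, 0, SEEK_SET) before reading.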

Related

Why does fgetc put the file offset at the end of the file?

I have a simple test program that uses fgetc() to read a character from a file stream and lseek() to read a file offset. It looks like this:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main() {
char buf[] = "hello world";
FILE *f;
int fd;
fd = open("test.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
write(fd, buf, sizeof(buf));
lseek(fd, 0, SEEK_SET);
f = fdopen(fd, "r");
printf("%c\n", fgetc(f));
printf("%ld\n", (long) lseek(fd, 0, SEEK_CUR));
}
When I run it, I get the following output:
h
12
The return value of fgetc(f), h, makes sense to me. But why is it repositioning the file offset to be at the end of the file? Why doesn't lseek(fd, 0, SEEK_CUR) give me 1?
If I repeat the first print statement, it works as expected and prints an e, then an l, and so on.
I don't see any mention of this weird behavior in the man page.
stdio functions like fgetc are buffered. They will read() a large block into a buffer and then return characters from the buffer on successive calls.
Since the default buffer size is more than 12 (usually many KB), the first time you fgetc(), it tries to fill its buffer which means reading the entire file. Thus lseek returns a position at the end of the file.
If you want to get a file position that takes into account what's still in the buffer, use ftell() instead.
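To make the difference concrete, here is a small sketch that reports both positions after a single fgetc(). The function name positions_after_fgetc is invented for this example. How far the descriptor offset runs ahead depends on the libc's buffer size, so the only safe expectation is that it is at least as large as the stream position.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one character through stdio, then report both
 * notions of "position". *stream_pos is what the program has consumed
 * (ftell); *fd_pos is how far the kernel offset has moved because of
 * stdio's read-ahead (lseek on the underlying descriptor). */
int positions_after_fgetc(FILE *f, long *stream_pos, off_t *fd_pos)
{
    assert(f != NULL && stream_pos != NULL && fd_pos != NULL);

    int c = fgetc(f);
    *stream_pos = ftell(f);
    *fd_pos = lseek(fileno(f), 0, SEEK_CUR);
    return c;
}
```

On a file holding "hello world", the stream position after one fgetc() is 1, while the descriptor offset typically sits at the end of the file because the whole file fit into the stdio buffer.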

Reading files to shared memory

I am reading a binary file that I want to offload directly to the Xeon Phi through Cilk and shared memory.
Since we read a fairly large amount of binary data each time, the preferred option is to use fread.
A very simple example would go like this:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
printf("Argv is %s\n", argv[1]);
FILE* infile = fopen(argv[1], "rb");
buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(2073600);
int len = fread(buf, 1, 2073600, infile);
if(ferror(infile)) {
perror("ferror");
}
printf("Len is %d and first value of buf is %d\n", len, *buf);
return 0;
}
The example is heavily simplified from the real code but is enough to exemplify the behavior.
This code prints
ferror: Bad address
Len is 0 and first value of buf is 0
However, if we swap the fread for an fgets (not very suitable for reading binary data, especially given its return value), things work great.
That is, if we switch to fgets((char *) buf, 2073600, infile); and drop len from the printout, we get
first value of buf is 46
This matches what we need, and I can run _Offload_cilk on a function with buf as an argument and do work on it.
Is there something I am missing, or is fread just not supported? I've tried to find information on this from Intel and from other sites, but sadly without success.
----EDIT----
After more research, it seems that calling fread on the shared memory with a size higher than 524287 (that is 2^19 - 1, the largest 19-bit value) produces the error above. At 524287 or lower things work, and you can call fread as many times as you want and read all the data.
I am utterly unable to find a documented reason for this anywhere.
I don't have a Phi, so I am unable to check whether this makes a difference. However, fread has its own buffering, and while that may be turned off for this type of reading, I don't see why you would accept the overhead of fread rather than just using the lower-level calls open and read, like
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h> // read, close
#include <stdlib.h>
#include <stdint.h>
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
    printf("Argv is %s\n", argv[1]);
    int infile = open(argv[1], O_RDONLY); // should test if open ok, but skip to make code similar to OP's
    int len, pos = 0, size = 2073600;
    buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(size);
    do {
        buf[pos] = 0; // force the address to be mapped to process memory before read
        len = read(infile, &buf[pos], size);
        if (len < 0) {
            perror("error");
            break;
        }
        if (len == 0) // end of file: stop instead of looping forever
            break;
        pos += len; // move position forward in case the first read did not return all the data
        size -= len;
    } while (size > 0);
    printf("Len is %d (%d) and first value of buf is %d\n", len, pos, *buf);
    close(infile);
    return 0;
}
read & write should work with shared memory allocated without the problem you are seeing.
Can you try to insert something like this before the fread calls?
memset(buf, 0, 2073600); // after including string.h
This trick worked for me, but I don't know why (lazy allocation?).
FYI, you can also post a MIC question on this forum.
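If fread has to stay, the observation in the question's edit (requests of 524287 bytes or fewer succeed, and repeated calls work) suggests reading in bounded chunks. The sketch below does that on an ordinary buffer. The name fread_chunked and the CHUNK_MAX constant are assumptions for illustration, and it has not been tried on a Xeon Phi or with _Offload_shared_malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed limit: the largest single fread() size the question reports as
 * working on the shared buffer. */
#define CHUNK_MAX 524287u

/* Hypothetical helper: fill buf with up to total bytes from f, never asking
 * fread() for more than CHUNK_MAX bytes at a time. Returns bytes stored. */
size_t fread_chunked(void *buf, size_t total, FILE *f)
{
    assert(buf != NULL && f != NULL);

    unsigned char *p = buf;
    size_t done = 0;
    while (done < total) {
        size_t want = total - done;
        if (want > CHUNK_MAX)
            want = CHUNK_MAX;
        size_t got = fread(p + done, 1, want, f);
        done += got;
        if (got < want)     /* short read: EOF or error, caller checks ferror() */
            break;
    }
    return done;
}
```

In the question's program this would replace the single fread(buf, 1, 2073600, infile) call, with the ferror() check left as it is.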

Finding the size of a file created by fmemopen

I'm using fmemopen to create a FILE* variable, fid, to pass to a function that reads data from an open file.
Somewhere in that function it uses the following code to find out the size of the file:
fseek(fid, 0, SEEK_END);
file_size = ftell(fid);
This works well for regular files, but for streams created by fmemopen I always get file_size = 8192.
Any ideas why this happens?
Is there a method to get the correct file size that works for both regular files and files created with fmemopen?
EDIT:
my call to fmemopen:
fid = fmemopen(ptr, memSize, "r");
where memSize != 8192
EDIT2:
I created a minimal example:
#include <cstdlib>
#include <stdio.h>
#include <string.h>
using namespace std;
int main(int argc, char** argv)
{
const long unsigned int memsize = 1000000;
void * ptr = malloc(memsize);
FILE *fid = fmemopen(ptr, memsize, "r");
fseek(fid, 0, SEEK_END);
long int file_size = ftell(fid);
printf("file_size = %ld\n", file_size);
free(ptr);
return 0;
}
btw, I am currently working on another computer, and here I get file_size=0
With fmemopen, if you open the stream with the b option, SEEK_END measures the size of the memory buffer. The value you see must be the default buffer size.
OK, I have got this mystery solved by myself. The documentation says:
If the opentype specifies append mode, then the initial file position is set to the first null character in the buffer
and later:
For a stream open for reading, null characters (zero bytes) in the buffer do not count as "end of file". Read operations indicate end of file only when the file position advances past size bytes.
It seems that fseek(fid, 0, SEEK_END) goes to the first zero byte in the buffer, and not to the end of the buffer.
Still looking for a method that will work on both standard and fmemopen files.
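For reference, the usual fseek/ftell probe can be wrapped so that it restores the caller's position, as in the sketch below (stream_size is a made-up name). This is not a complete answer for fmemopen: what SEEK_END means for such a stream depends on the libc version, and in the minimal example the malloc'ed buffer is uninitialised, so the position of the first zero byte is arbitrary. When the caller created the stream with fmemopen, the size passed to fmemopen is the more reliable number.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the size of a seekable stream as reported by
 * fseek/ftell, restoring the original position afterwards. Returns -1 if
 * the stream cannot be queried or repositioned. */
long stream_size(FILE *f)
{
    assert(f != NULL);

    long here = ftell(f);
    if (here < 0 || fseek(f, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(f);
    if (fseek(f, here, SEEK_SET) != 0)
        return -1;
    return size;
}
```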

Using fseek and fread

I am working on a project that reads data from bin files and processes the data. The bin file is huge and is about 150MB. I am trying to use fseek to skip unwanted processing of data.
I am wondering if the processing time of fseek is the same as fread.
Thanks!
fseek just repositions the internal file pointer whereas fread actually reads data. So I guess fseek should be much faster than fread
If you are really curious to see what's happening behind the screen, download glibc from here and check for yourself :)
I am wondering if the processing time of fseek is the same as fread.
Probably not though, of course, it's implementation-dependent.
Most likely, fseek will only set an in-memory "file pointer" without going out to the disk to read any information. fread, on the other hand, will read information.
An fseek to file position 149M followed by a 1M fread will probably be faster than 150 different 1M fread calls, throwing away all but the last.
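The skip-then-read pattern described in this answer can be sketched as follows. The helper name read_at is an assumption, and no timing claim is made here. It only shows that the skipped bytes are never copied into the caller's buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: skip to offset with fseek(), then read up to n bytes
 * into buf. Returns the number of bytes read (0 if the seek fails). */
size_t read_at(FILE *f, long offset, void *buf, size_t n)
{
    assert(f != NULL && buf != NULL);

    if (fseek(f, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, n, f);
}
```

For the 150MB file from the question, read_at(f, 149L * 1024 * 1024, buf, 1024 * 1024) would fetch only the last megabyte.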
I expect fseek to be somewhat faster than fread, since fseek only moves the file position to the offset you specify and no data is read.
If you are processing huge files have you considered alternatives to read/write?
You may find that mmap() (UNIX) or MapViewOfFile (Windows) is a more suitable alternative.
The following UNIX example demonstrates opening a file for reading and counting the occurrences of the ASCII character 'Q'. NOTE: all error checking has been omitted to make the example shorter.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    int fd, total;
    off_t i, len;
    char *map;
    fd = open("/tmp/mybigfile", O_RDONLY);
    len = lseek(fd, 0, SEEK_END); // offset first, then whence: returns the file size
    map = (char *)mmap(0, len, PROT_READ, MAP_SHARED, fd, 0);
    total = 0;
    for (i = 0; i < len; i++) {
        if (map[i] == 'Q') total++;
    }
    printf("Found %d instances of 'Q'\n", total);
    munmap(map, len);
    close(fd);
    return 0;
}

A simple sendfile program that does not work

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(){
int fd1,fd2,rc;
off_t offset = 0;
struct stat stat_buf;
fd1=open("./hello.txt",O_RDONLY); //read only
fd2=open("../",O_RDWR); //both read and write
fstat(fd1, &stat_buf); //get the size of hello.txt
printf("file size: %d\n",(int)stat_buf.st_size);
rc=sendfile (fd2, fd1, &offset, stat_buf.st_size);
}
As you can see, it's quite a simple program. But I just can't find hello.txt in ../
My aim is to see what happens if I pass an arbitrary count, say 10, instead of st_size, which may be hundreds of bytes.
Edit:
Thanks for your answers. Well, I followed your advice and changed
fd2=open("../",O_RDWR);
to
fd2=open("../hello.txt",O_RDWR);
Also, I checked the return value of fstat and sendfile, everything is ok.
But the problem is still the same.
You need to specify the filename in the second open, not just the directory name.
Please be sure to check the return values of all these functions, including fstat.
Have you tried fd2 = open("../hello.txt",O_RDWR);?
1>
fd1=open("./hello.txt",O_RDONLY); //read only
fd2=open("../",O_RDWR); //both read and write
replace with
fd1=open("../hello.txt",O_RDONLY); //read only
fd2=open("../your_file_name",O_RDWR);
2>
fstat(fd1, &stat_buf);
fills stat_buf with information about the file behind fd1. The file's size is returned in the st_size member of that structure.
Now, in
rc=sendfile (fd2, fd1, &offset, stat_buf.st_size);
a total of stat_buf.st_size bytes are sent to fd2. If you pass 10 instead, only 10 bytes go to fd2.
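One thing the answers above do not mention: open() with O_RDWR alone does not create a missing file, so the destination probably needs O_CREAT as well. The sketch below puts the pieces together. The name copy_prefix is made up, every return value is checked, and it assumes a Linux kernel that allows a regular file as the sendfile() output (2.6.33 or later).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy at most count bytes from src to dst with
 * sendfile(), creating or truncating dst. Returns bytes copied, or -1. */
ssize_t copy_prefix(const char *src, const char *dst, size_t count)
{
    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    /* O_CREAT makes open() create dst if it does not exist yet. */
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    off_t offset = 0;
    size_t left = count;
    ssize_t copied = 0;
    while (left > 0) {
        ssize_t n = sendfile(out, in, &offset, left);
        if (n < 0) {
            copied = -1;
            break;
        }
        if (n == 0)         /* source exhausted before count bytes */
            break;
        left -= (size_t)n;
        copied += n;
    }
    close(in);
    close(out);
    return copied;
}
```

Calling copy_prefix("./hello.txt", "../hello.txt", 10) answers the original experiment: the destination ends up with the first 10 bytes, or with the whole file if it is shorter than that.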

Resources