I am working on a project that reads data from .bin files and processes it. The file is huge, about 150 MB. I am trying to use fseek to skip data I don't need to process.
I am wondering if the processing time of fseek is the same as fread.
Thanks!
fseek just repositions the internal file pointer, whereas fread actually reads data. So I guess fseek should be much faster than fread.
If you are really curious to see what's happening behind the scenes, download the glibc source and check for yourself :)
I am wondering if the processing time of fseek is the same as fread.
Probably not, though of course it's implementation-dependent.
Most likely, fseek will only set an in-memory "file pointer" without going out to the disk to read any information. fread, on the other hand, will read information.
An fseek to file position 149M followed by a 1M fread will probably be faster than 150 different 1M fread calls, throwing away all but the last.
I feel fseek is probably a bit faster than fread, because fseek only moves the file position to the offset you specify; no data is read.
If you are processing huge files, have you considered alternatives to read/write?
You may find that mmap() (UNIX) or MapViewOfFile (Windows) is a more suitable alternative.
The following UNIX example demonstrates opening a file for reading and counting the occurrences of the ASCII character 'Q'. NOTE: all error checking has been omitted to make the example shorter.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char **argv)
{
    int fd;
    off_t len, i;
    long total;
    char *map;
    fd = open("/tmp/mybigfile", O_RDONLY);
    len = lseek(fd, 0, SEEK_END);   /* offset first, then whence: gives the file size */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    total = 0;
    for (i = 0; i < len; i++) {
        if (map[i] == 'Q') total++;
    }
    printf("Found %ld instances of 'Q'\n", total);   /* pass the count to printf */
    munmap(map, len);
    close(fd);
    return 0;
}
I have tried several answers from this site, but I am unable to understand what is going wrong with this code.
I am simply trying to read the file data.txt and print it. The file contains just 12 characters: "abcd1234efgh".
fd comes out positive, but "br" is 0 after the read. Please help if anyone has a clue about this.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
int main(int args,char* vargs[])
{
int fd = 0;
fd = open("data.txt",O_RDONLY);
if(fd<=0)
printf("Invalid file name");
else{
off_t fs =lseek(fd, (off_t) 0, SEEK_END);
char buf[10];
off_t br = read(fd,buf,10);
printf("%s",buf);
}
return 0;
}
lseek(fd, 0, SEEK_END);
After this call, the file offset is at the end of the file, so any further read returns 0 (end of file). Just comment this line out, or change it to suit your needs.
This:
lseek(fd, (off_t) 0, SEEK_END);
seeks to (an offset of 0 from) the end of the file. When you subsequently try to read, there are no bytes available past that point. You should not need to seek at all if you want to read from the beginning of the file.
Hi, I have a question about the following exercise. In the OS textbook "Operating Systems in Depth" by Thomas W. Doeppner, one of the chapter exercises asks us to find fault with the given code for reading a file's contents backwards, and to suggest a better way to do it. I have come across many ways to do that, but I can't work out why the following is not considered a good way of doing it.
I appreciate your time and help. Thank you!
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
int main() {
int fd;
off_t fptr;
fd = open("./file.txt", O_RDONLY);
char buf[3];
/* go to last char in file */
fptr = lseek(fd, (off_t)-1, SEEK_END);
while (fptr != -1) {
read(fd, buf, 1);
write(1, buf, 1);
fptr = lseek(fd, (off_t)-2, SEEK_CUR);
}
return 0;
}
The method illustrated in your code is inefficient because you make 3 system calls for each byte in the file. Furthermore, you do not check the return values of the read() and write() function calls, nor that the file was opened successfully.
To improve efficiency, you should buffer the input/output operations.
Using putchar() instead of write() would be both more efficient and more reliable.
Reading a chunk of file contents (from a few kilobytes to several megabytes) at a time would be more efficient too.
As always, benchmark the resulting code to measure actual performance improvements.
I am reading a binary file that I want to offload directly to the Xeon Phi through Cilk and shared memory.
As we are reading a fairly large amount of binary data each time, the preferred option is to use fread.
A very simple example looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
printf("Argv is %s\n", argv[1]);
FILE* infile = fopen(argv[1], "rb");
buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(2073600);
int len = fread(buf, 1, 2073600, infile);
if(ferror(infile)) {
perror("ferror");
}
printf("Len is %d and first value of buf is %d\n", len, *buf);
return 0;
}
The example is very simplified from the real code, but it is enough to exemplify the behavior.
This code would then return
ferror: Bad address
Len is 0 and first value of buf is 0
However, if we switch out the fread for an fgets (not very suitable for reading binary data, especially given its return value), things work great.
That is, if we switch to fgets((char *) buf, 2073600, infile); and drop len from the printout, we get
first value of buf is 46
Which fits with what we need and I can run _Offload_cilk on a function with buf as an argument and do work on it.
Is there something I am missing, or is fread just not supported? I've tried to find as much information on this as I can from both Intel and other sites, but sadly I have been unable to.
----EDIT----
After more research, it seems that when fread is called on the shared memory with a size greater than 524287 (2^19 - 1, the largest 19-bit value), it fails with the error above. At 524287 or lower things work, and you can call fread as many times as you want to read all the data.
I am utterly unable to find any reason written anywhere for this.
I don't have a Phi, so I can't check whether this would make a difference. But fread has its own buffering, and even if that may be turned off for this type of reading, I don't see why you would go through the overhead of using fread rather than just using the lower-level calls open and read, like this:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>   // read()
_Cilk_shared uint8_t* _Cilk_shared buf;
int main(int argc, char **argv) {
    printf("Argv is %s\n", argv[1]);
    int infile = open(argv[1], O_RDONLY); // should test if open ok, but skip to make code similar to OP's
    int len, pos = 0, size = 2073600;
    buf = (_Cilk_shared uint8_t*) _Offload_shared_malloc(size);
    do {
        buf[pos] = 0; // force the address to be mapped to process memory before read
        len = read(infile, &buf[pos], size);
        if (len < 0) {
            perror("error");
            break;
        }
        if (len == 0) // end of file: stop instead of looping forever
            break;
        pos += len; // move position forward in case we have not read the entire data in the first read
        size -= len;
    } while (size > 0);
    printf("Len is %d (%d) and first value of buf is %d\n", len, pos, *buf);
    return 0;
}
read and write should work with the shared memory you allocated, without the problem you are seeing.
Can you try to insert something like this before the fread calls?
memset(buf, 0, 2073600); // after including string.h
This trick worked for me, but I don't know why (lazy allocation?).
FYI, you can also post a MIC question on this forum.
This question already has answers here: Reading a text file backwards in C (5 answers). Closed 9 years ago.
I am supposed to create a program that takes a given file and creates a file with the text reversed. Is there a way I can start the read() from the end of the file and copy it to the first byte of the created file if I don't know the exact size of the file?
Also, I have googled this and came across many examples with fread, fopen, etc. However, I can't use those for this project; I can only use read, open, lseek, write, and close.
Here is my code so far. It's not much, but just for reference:
#include<stdio.h>
#include<unistd.h>
int main (int argc, char *argv[])
{
if(argc != 2)/*argc should be 2 for correct execution*/
{
printf("usage: %s filename",argv[0[]);}
}
else
{
int file1 = open(argv[1], O_RDWR);
if(file1 == -1){
printf("\nfailed to open file.");
return 1;
}
int reversefile = open(argv[2], O_RDWR | O_CREAT);
int size = lseek(argv[1], 0, SEEK_END);
char *file2[size+1];
int count=size;
int i = 0
while(read(file1, file2[count], 0) != 0)
{
file2[i]=*read(file1, file2[count], 0);
write(reversefile, file2[i], size+1);
count--;
i++;
lseek(argv[2], i, SEEK_SET);
}
I doubt that most filesystems are designed to support this operation effectively. Chances are, you'd have to read the whole file to get to the end. For the same reasons, most languages probably don't include any special feature for reading a file backwards.
Just come up with something. Try to read the whole file in memory. If it is too big, dump the beginning, reversed, into a temporary file and keep reading... In the end combine all temporary files into one. Also, you could probably do something smart with manual low-level manipulation of disk sectors, or at least with low-level programming directly against the file system. Looks like this is not what you are after, though.
Why don't you try fseek to navigate inside the file? This function is contained in stdio.h, just like fopen and fclose.
Another idea would be to implement a simple stack...
This has no error checking, which is really bad:
get the file size using stat
create a buffer with malloc
fread the file into the buffer
set a pointer to the end of the file
print each character going backwards through the buffer.
If you get creative with Google, you can find several examples just like this.
IMO the hints you are getting so far are not really even good hints.
This appears to be schoolwork, so beware of copying. Do some reading about the calls used here: stat (fstat) and fread (read).
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
int main(int argc, char **argv)
{
    struct stat st;
    char *buf;
    char *p;
    FILE *in = fopen(argv[1], "r");
    fstat(fileno(in), &st);            // get file size in bytes
    buf = malloc(st.st_size + 1);      // buffer for file
    memset(buf, 0x0, st.st_size + 1);
    fread(buf, st.st_size, 1, in);     // fill the buffer
    p = buf + st.st_size;              // one past the last byte
    while (p > buf) {                  // print traversing backwards, stopping at buf[0]
        p--;
        putchar(*p);
    }
    fclose(in);
    free(buf);
    return 0;
}
I've written code that should take data from one file, encrypt it, and save it in another file.
But when I run the code, it does not put the encrypted data in the new file; it just leaves it blank. Can someone please spot what's missing in the code? I tried, but I couldn't figure it out.
I think there is something wrong with the read/write calls, or maybe I'm implementing the do-while loop incorrectly.
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
int main (int argc, char* argv[])
{
int fdin,fdout,n,i,fd;
char* buf;
struct stat fs;
if(argc<3)
printf("USAGE: %s source-file target-file.\n",argv[0]);
fdin=open(argv[1], O_RDONLY);
if(fdin==-1)
printf("ERROR: Cannot open %s.\n",argv[1]);
fdout=open(argv[2], O_WRONLY | O_CREAT | O_EXCL, 0644);
if(fdout==-1)
printf("ERROR: %s already exists.\n",argv[2]);
fstat(fd, &fs);
n= fs.st_size;
buf=malloc(n);
do
{
n=read(fd, buf, 10);
for(i=0;i<n;i++)
buf[i] ^= '#';
write(fd, buf, n);
} while(n==10);
close(fdin);
close(fdout);
}
You are using fd instead of fdin in the fstat and read calls, and fd instead of fdout in the write call. fd is an uninitialized variable.
// Here...
fstat(fd, &fs);
// And here...
n=read(fd, buf, 10);
for(i=0;i<n;i++)
buf[i] ^= '#';
write(fd, buf, n);
You're reading and writing to fd instead of fdin and fdout. Make sure you enable all warnings your compiler will emit (e.g. use gcc -Wall -Wextra -pedantic). It will warn you about the use of an uninitialized variable if you let it.
Also, if you checked the return codes of fstat(), read(), or write(), you'd likely have gotten errors from using an invalid file descriptor. They would most likely fail with EBADF (bad file descriptor).
fstat(fd, &fs);
n= fs.st_size;
buf=malloc(n);
And since we're here: allocating enough memory to hold the entire file is unnecessary. You're only reading 10 bytes at a time in your loop, so you really only need a 10-byte buffer. You could skip the fstat() entirely.
// Just allocate 10 bytes.
buf = malloc(10);
// Or heck, skip the malloc() too! Change "char *buf" to:
char buf[10];
Everything said above is true; one more tip.
You should use a larger buffer that matches the disk block size, usually 8192 bytes.
This will speed up your program significantly, because it makes roughly 800 times fewer read/write calls (8192 / 10 ≈ 819). As you know, system calls and disk access are expensive in terms of time.
Another option is to use the stdio functions fread, fwrite, etc., which already take care of buffering, though you'll still have the function call overhead.
Roni