C copy file contents from EOF to SOF

My program is working almost as it should. The intended purpose is to read the file from the end and copy the contents to a destination file. What confuses me is lseek(), specifically how I should be setting the offset.
My src contents at the moment are:
Line 1
Line 2
Line 3
At the moment what I get in my destination file is:
Line 3
e 2
e 2...
From what I understand, calling int loc = lseek(src, -10, SEEK_END); will move the "cursor" in the source file to the end, then offset it 10 bytes from EOF toward SOF, and the value of loc will be the size of the file after I have deducted the offset. However, after 7h of C I'm almost brain dead here.
int main(int argc, char* argv[])
{
// Open source & destination file
int src = open(argv[1], O_RDONLY, 0777);
int dst = open(argv[2], O_CREAT|O_WRONLY, 0777);
// Check if either reported an error
if(src == -1 || dst == -1)
{
perror("There was a problem with one of the files.");
}
// Set buffer & block size
char buffer[1];
int block;
// Set offset from EOF
int offset = -1;
// Set file pointer location to the end of file
int loc = lseek(src, offset, SEEK_END);
// Read from source from EOF to SOF
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 1);
// Write to output file
write(dst, buffer, block);
// Move the pointer again
loc = lseek(src, loc-1, SEEK_SET);
}
}

lseek() doesn't change or return the file size. What it returns is the position where the 'cursor' is set to. So when you call
loc = lseek(src, offset, SEEK_END);
twice it will always set the cursor to the same position again. I guess you want to do something like this:
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 5);
// Write to output file
write(dst, buffer, block);
// Move the pointer again five bytes before the last offset
loc = lseek(src, loc+offset, SEEK_SET);
}
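To make the return value concrete, here is a minimal sketch, assuming the goal is a byte-by-byte reverse copy between two already-open descriptors. The name copy_reversed is hypothetical and not part of the question's code. It keeps its own position counter, seeded from the value that lseek() returns, instead of relying on SEEK_END each time:

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy descriptor `src` to `dst` one byte at a time,
   last byte first. Returns 0 on success, -1 on error. */
int copy_reversed(int src, int dst)
{
    /* lseek() returns the new position; at SEEK_END + 0 that equals the size. */
    off_t pos = lseek(src, 0, SEEK_END);
    if (pos == (off_t)-1)
        return -1;

    while (pos > 0) {
        char c;
        pos--;                                  /* the byte to copy next */
        if (lseek(src, pos, SEEK_SET) == (off_t)-1)
            return -1;
        if (read(src, &c, 1) != 1)              /* read() moves the cursor forward by 1 */
            return -1;
        if (write(dst, &c, 1) != 1)
            return -1;
    }
    return 0;
}
```

Because read() advances the cursor, the loop has to seek explicitly before every read. The loop condition is checked before the decrement, so the byte at position 0 is copied as well.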
If the line length is variable, you could do something like the following instead:
// define an offset that exceeds the maximum line length
int offset = 256;
char buffer[256];
// determine the file size
off_t size = lseek( src, 0, SEEK_END );
off_t pos = size;
// read block of offset bytes from the end
while( pos > 0 ) {
pos -= offset;
if( pos < 0 ) {
//pos must not be negative ...
offset += pos; // in fact decrements offset!!
pos = 0;
}
lseek( src, pos, SEEK_SET );
// add error checking here!!
read(src, buffer, offset );
// we expect the last byte read to be a newline but we are interested in the one BEFORE that,
// so scan backwards (memchr would return the FIRST newline in the buffer, or NULL)
char *p = buffer + offset - 1;
while( p > buffer && p[-1] != '\n' ) p--; // the beginning of the last line
int len = offset - (p-buffer); // and its length
write( dst, p, len );
pos += offset - len; // pos now marks the start of the line just written
}

From some of your comments it looks like you want to reverse the order of the lines in a text file. Unfortunately you're not going to get that with such a simple program. There are several approaches you can take, depending on how complicated you want to get, how big the files are, how much memory is on hand, how fast you want it to be, etc.
Here are some different ideas off the top of my head:
Read your whole source file at once into a single memory block. Scan through the memory block forwards looking for line breaks and recording the pointer and length for each line. Save these records onto a stack (you could use a dynamic array, or an STL vector in C++,) and then to write your output file, you just pop a line's record off the stack (moving backwards through the array) and write it until the stack is empty (you've reached the beginning of the array.)
Start at the end of your input file, but for each line, seek backwards character-by-character until you find the newline that starts the previous line. Seek forwards again past that newline and then read in the line. (You should now know its length.) Or, you could just build up the reversed characters in a buffer and then write them out backwards.
Pull in whole blocks (sectors perhaps) of the file at once, from end to beginning. Within each block, locate the newlines in a similar fashion to the method above except now you already have the characters in memory and so don't need to do any reversing or pulling them in redundantly. However, this solution will be much more complicated because lines can span across block boundaries.
There may be more elaborate/clever tricks, but those are the more obvious, straightforward approaches.
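As a rough illustration of the first idea, here is a minimal sketch that assumes the whole file fits in memory and that the stream is seekable. The name reverse_lines is hypothetical. It is a variation on the approach described above: instead of recording each line on a stack during a forward scan, it walks the in-memory block backwards and writes each line as it finds it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the lines of `in` to `out` in reverse order.
   Returns 0 on success, -1 on error. Assumes the file fits in memory. */
int reverse_lines(FILE *in, FILE *out)
{
    if (fseek(in, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(in);
    if (size < 0)
        return -1;
    rewind(in);
    if (size == 0)
        return 0;

    char *data = malloc((size_t)size);
    if (data == NULL)
        return -1;
    if (fread(data, 1, (size_t)size, in) != (size_t)size) {
        free(data);
        return -1;
    }

    /* `end` is one past the last byte of the line being written. */
    long end = size;
    while (end > 0) {
        long start = end - 1;                 /* skip this line's own terminator */
        while (start > 0 && data[start - 1] != '\n')
            start--;                          /* stop just after the previous newline */
        fwrite(data + start, 1, (size_t)(end - start), out);
        if (data[end - 1] != '\n')            /* final line had no newline: add one */
            fputc('\n', out);
        end = start;
    }
    free(data);
    return 0;
}
```

With the question's three-line input this writes Line 3, Line 2, Line 1. On platforms that translate line endings, opening the input in binary mode keeps the ftell() size and the fread() count in agreement.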

I think you should be using SEEK_CUR instead of SEEK_END in your final call to lseek():
// Set file pointer location to the end of file
int loc = lseek(src, offset, SEEK_END);
// Read from source from EOF to SOF
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 5);
// Write to output file
write(dst, buffer, block);
// Move the pointer again
loc = lseek(src, -10, SEEK_CUR); // net move of 5 bytes back; update loc so the loop can end
}
You could also do:
// Set file pointer location to the end of file
int loc = lseek(src, offset, SEEK_END);
// Read from source from EOF to SOF
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 5);
// Write to output file
write(dst, buffer, block);
// Move the pointer again
loc -= 5;
lseek(src, loc, SEEK_SET);
}

Related

How to read a file lines between two bytes position

I am currently reading a file's lines from an offset position (a quarter of the way into the file) until the end:
struct stat st;
stat("file.txt", &st);
int fileSize = st.st_size;
int minOffset = fileSize/4;
FILE* file_ptr = fopen("file.txt", "r");
fseek(file_ptr, minOffset, SEEK_SET);
size_t lineLength = 1000; // getline expects a size_t *
char * line;
ssize_t read;
line = malloc(lineLength);
while ((read = getline(&line, &lineLength, file_ptr)) != -1) {
printf("%s", line);
}
But what I need is to read all lines between two byte positions in the file. As Olaf stated in the comments, I also have the issue that my offset is not necessarily at a line boundary.
For example, this could be the maxOffset up to which I would like to read:
int maxOffset = fileSize / 2;
I want to read from the line that contains the minOffset position to the line before the maxOffset position.
The file consists of words (one per line) that always have a length smaller than 1000:
AA
AAS
ABACA
ABACAS
ABACOST
ABACOSTS
ABACULE
ABACULES
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
ABAISSAIENT
ABAISSAIS
ABAISSAIT
ABAISSAMES
ABAISSANT
ABAISSANTE
ABAISSANTES
ABAISSANTS
ABAISSAS
ABAISSASSE
ABAISSASSENT
ABAISSASSES
ABAISSASSIEZ
ABAISSASSIONS
ABAISSAT
ABAISSATES
....
How can I read a file's lines between two byte positions?
First you need to find the byte position of the start of a line at the 1/4, 1/2, and 3/4 points. To do that:
fseek to the approximate position (e.g. fseek(fp, filesize/4, SEEK_SET))
call fgets to read up to the next newline
call ftell to determine the offset
The offset returned is the end of one quarter and the beginning of the next.
To read one quarter of the file:
fseek to the beginning of the quarter
call fgets to read a line
call ftell to see if you've reached the end of the quarter
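The steps above can be sketched roughly as follows. The names seek_to_line_start and copy_lines_between are hypothetical, and the 1000-byte line buffer comes from the question's stated limit. One deliberate difference from the steps: the sketch seeks to offset - 1 before skipping to the next newline, so that a line starting exactly at the offset is not dropped. It also assumes a platform where ftell() on the stream yields byte offsets (POSIX, or a stream opened in binary mode).

```c
#include <stdio.h>

/* Hypothetical helper: position `fp` at the start of the first line that
   begins at or after byte `offset`. Returns that position, or -1 on error. */
long seek_to_line_start(FILE *fp, long offset)
{
    int c;
    if (offset <= 0) {
        rewind(fp);
        return 0;
    }
    /* Start one byte early: if that byte is a newline, `offset` is already
       a line start and nothing more is skipped. */
    if (fseek(fp, offset - 1, SEEK_SET) != 0)
        return -1;
    while ((c = fgetc(fp)) != EOF && c != '\n')
        ;                                   /* discard the partial line */
    return ftell(fp);
}

/* Hypothetical helper: copy every line that starts in [min_offset, max_offset)
   from `fp` to `out`. Returns the number of lines copied, or -1 on error. */
long copy_lines_between(FILE *fp, long min_offset, long max_offset, FILE *out)
{
    char line[1000];                        /* lines are shorter than 1000 bytes */
    long count = 0;
    long pos = seek_to_line_start(fp, min_offset);
    if (pos < 0)
        return -1;
    while (pos < max_offset && fgets(line, sizeof line, fp) != NULL) {
        fputs(line, out);
        count++;
        pos = ftell(fp);                    /* start of the next line */
    }
    return count;
}
```

Calling copy_lines_between(fp, fileSize / 4, fileSize / 2, stdout) would then print the second quarter of the file, with each boundary rounded forward to the next line start.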
You want function fread:
int byteStart = 100;
int byteEnd = 200;
line = malloc(byteEnd-byteStart); // Allocate enough space for your data.
fseek(file_ptr, byteStart, SEEK_SET); // Go to your starting point
fread(line, 1, byteEnd-byteStart, file_ptr); // Read until your ending point.

How to find the end of a binary file?

I have opened a binary file as below
FILE *p;
p=fopen("filename.format","rb");
How can I find the end of the file?
fread returns the number of items actually read. If that is lower than the number requested, you have either reached the end of the file or hit a read error.
Furthermore, the feof function tells you whether a previous read hit the end of the file; it only becomes true after a read has been attempted past the end.
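A minimal sketch of that pattern, assuming an already-open stream; count_bytes is a hypothetical name. It reads until fread returns 0 and only then asks ferror why the loop stopped:

```c
#include <stdio.h>

/* Hypothetical helper: read `fp` to the end and count the bytes seen.
   Returns the count, or -1 if a read error (not end of file) ended the loop. */
long count_bytes(FILE *fp)
{
    unsigned char buf[4096];
    long total = 0;
    size_t n;

    /* A short but nonzero count is still data; keep going until fread gives 0. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    if (ferror(fp))
        return -1;          /* stopped because of an error */
    return total;           /* stopped at end of file; feof(fp) is now nonzero */
}
```

Testing feof before the first read would report "not at end" even for an empty file, which is why the check comes after the loop.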
To find out the size of the file without actually reading it:
long Size;
FILE *p;
p = fopen("filename.format","rb");
fseek (p, 0 , SEEK_END);
Size = ftell (p) ;
rewind (p);
In C++ I usually jump to the end of the file using ifstream::seekg and giving it the ios::end argument for position. The ANSI-C equivalent of seekg is
int fseek ( FILE * stream, long int offset, int origin );
Where origin can be SEEK_SET, SEEK_CUR, SEEK_END.
Trying this would jump to the end of the file:
fseek(p, 0, SEEK_END);
Then, once at the end of the file, just use
long int ftell ( FILE * stream );
to tell your program where the file ends.
For an example, the following code would set the size variable to the physical size - in bytes - of the file, and then jump back to the beginning of the file:
FILE *p = fopen("filename.format", "rb"); // open binary file
fseek(p, 0, SEEK_END); // jump to end of file
long int size = ftell(p); // get size of file
fseek(p, 0, SEEK_SET); // jump back to beginning of file

Copying a file in C with fwrite

I am new to C and was trying to write a program just to copy a file so that I could learn the basics of files. My code takes a file as input and figures out its length by subtracting its start from its end using fseek and ftell. Then, it uses fwrite to write, based on what I could get from its man page, ONE element of data, (END - START) elements long, to the stream pointed to by OUT, obtaining them from the location given by FI. The problem is, although it does produce "copy output," the file is not the same as the original. I tried reading the input file into a variable and then writing from there, but that didn't help either. What am I doing wrong?
Thanks
int main(int argc, char* argv[])
{
FILE* fi = fopen(argv[1], "r"); //create the input file for reading
if (fi == NULL)
return 1; // check file exists
int start = ftell(fi); // get file start address
fseek(fi, 0, SEEK_END); // go to end of file
int end = ftell(fi); // get file end address
rewind(fi); // go back to file beginning
FILE* out = fopen("copy output", "w"); // create the output file for writing
fwrite(fi,end-start,1,out); // write the input file to the output file
}
Should this work?
{
FILE* out = fopen("copy output", "w");
int* buf = malloc(end-start); fread(buf,end-start,1,fi);
fwrite(buf,end-start,1,out);
}
This isn't how fwrite works.
To copy a file, you'd typically allocate a buffer, then use fread to read one buffer of data, followed by fwrite to write that data back out. Repeat until you've copied the entire file. Typical code is something on this general order:
#define SIZE (1024*1024)
char buffer[SIZE];
size_t bytes;
while (0 < (bytes = fread(buffer, 1, sizeof(buffer), infile)))
fwrite(buffer, 1, bytes, outfile);
The first parameter of fwrite is a pointer to the data to be written to the file not a FILE* to read from. You have to read the data from the first file into a buffer then write that buffer to the output file. http://www.cplusplus.com/reference/cstdio/fwrite/
Perhaps a look through an open-source copy tool in C would point you in the right direction.
Here is How It can be done:
Option 1: Dynamic "Array"
Nested Level: 0
// Variable Definition
char *cpArr;
FILE *fpSourceFile = fopen(<Your_Source_Path>, "rb");
FILE *fpTargetFile = fopen(<Your_Target_Path>, "wb");
// Code Section
// Get The Size In Bytes Of The Source File
fseek(fpSourceFile, 0, SEEK_END); // Go To The End Of The File
long lSize = ftell(fpSourceFile); // Size Of The File In Bytes
cpArr = (char *)malloc(sizeof(*cpArr) * lSize); // Create An Array At That Size
fseek(fpSourceFile, 0, SEEK_SET); // Return The Cursor To The Start
// Read From The Source File - "Copy"
fread(cpArr, sizeof(*cpArr), lSize, fpSourceFile); // Pass The Buffer Itself, Not The Address Of The Pointer
// Write To The Target File - "Paste"
fwrite(cpArr, sizeof(*cpArr), lSize, fpTargetFile);
// Close The Files
fclose(fpSourceFile);
fclose(fpTargetFile);
// Free The Used Memory
free(cpArr);
Option 2: Char By Char
Nested Level: 1
// Variable Definition
char cTemp;
FILE *fpSourceFile = fopen(<Your_Source_Path>, "rb");
FILE *fpTargetFile = fopen(<Your_Target_Path>, "wb");
// Code Section
// Read From The Source File - "Copy"
while(fread(&cTemp, 1, 1, fpSourceFile) == 1)
{
// Write To The Target File - "Paste"
fwrite(&cTemp, 1, 1, fpTargetFile);
}
// Close The Files
fclose(fpSourceFile);
fclose(fpTargetFile);

C: read only last line of a file. No loops

Using C, is there a way to read only the last line of a file without looping over its entire content?
The thing is that the file contains millions of lines, each of them holding an integer (long long int). The file itself can be quite large, I presume even up to 1000 MB. I know for sure that the last line won't be longer than 55 digits, but it could be only 2 digits as well. Using any kind of database is not an option... I've considered it already.
Maybe its a silly question, but coming from PHP background I find it hard to answer. I looked everywhere but found nothing clean.
Currently I'm using:
if ((fd = fopen(filename, "r")) != NULL) // open file
{
fseek(fd, 0, SEEK_SET); // make sure start from 0
while(!feof(fd))
{
memset(buff, 0x00, buff_len); // clean buffer
fscanf(fd, "%[^\n]\n", buff); // read file *prefer using fscanf
}
printf("Last Line :: %d\n", atoi(buff)); // for testing I'm using small integers
}
This way I'm looping over the file's content, and as soon as the file gets bigger than ~500k lines things slow down pretty badly....
Thank you in advance.
maxim
Just fseek to fileSize - 55 and read forward?
If there is a maximum line length, seek to that distance before the end.
Read up to the end, and find the last end-of-line in your buffer.
If there is no maximum line length, guess a reasonable value, read that much at the end, and if there is no end-of-line, double your guess and try again.
In your case:
/* max length including newline */
static const long max_len = 55 + 1;
/* space for all of that plus a nul terminator */
char buf[max_len + 1];
/* now read that many bytes from the end of the file */
fseek(fd, -max_len, SEEK_END);
size_t len = fread(buf, 1, max_len, fd); /* fd is a FILE *, so use fread rather than read */
/* don't forget the nul terminator */
buf[len] = '\0';
/* and find the last newline character (there must be one, right?) */
char *last_newline = strrchr(buf, '\n');
char *last_line = last_newline+1;
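A slightly more defensive sketch of the same idea follows. It assumes the stream is opened in binary mode and that no line exceeds MAX_LINE bytes (55 in the question). The names read_last_line and MAX_LINE are hypothetical. It handles a file that is shorter than the read window, and it drops the file's trailing newline before searching so that strrchr finds the newline in front of the last line rather than the one that ends it.

```c
#include <stdio.h>
#include <string.h>

enum { MAX_LINE = 55 };   /* longest line, not counting its newline (assumption) */

/* Hypothetical helper: copy the last line of `fp`, without its newline,
   into `out`. Returns 0 on success, -1 on error. */
int read_last_line(FILE *fp, char *out, size_t out_size)
{
    char buf[MAX_LINE + 2];                  /* line + newline + NUL terminator */
    long size, want;
    size_t len;
    const char *nl, *line;

    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;

    want = MAX_LINE + 1;                     /* last line plus its newline */
    if (want > size)
        want = size;                         /* short file: read all of it */
    if (fseek(fp, size - want, SEEK_SET) != 0)
        return -1;

    len = fread(buf, 1, (size_t)want, fp);
    if (ferror(fp))
        return -1;
    if (len > 0 && buf[len - 1] == '\n')
        len--;                               /* ignore the file's final newline */
    buf[len] = '\0';

    nl = strrchr(buf, '\n');                 /* newline ending the previous line */
    line = nl ? nl + 1 : buf;                /* no newline: buffer is one line */
    if (strlen(line) + 1 > out_size)
        return -1;
    strcpy(out, line);
    return 0;
}
```

A caller can pass a buffer of MAX_LINE + 1 bytes and convert the result with strtoll, since the values are described as long long integers.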
Open with "rb" to make sure you're reading binary. Then fseek(..., SEEK_END) and start reading bytes from the back until you find the first line separator (if you know the maximum line length is 55 characters, read 55 characters ...).
ok. It all worked for me. I learned something new. The last line of a file 41mb large and with >500k lines was read instantly. Thanks to you all guys, especially 'Useless' (love the controversy of your nickname, btw). I will post here the code in the hope that someone else in the future can benefit from it:
Reading ONLY the last line of the file:
the file is structured the way that there is a new line appended and I am sure that any line is shorter than, in my case, 55 characters:
file contents:
------------------------
2943728727
3129123555
3743778
412912777
43127787727
472977827
------------------------
notice the new line appended.
FILE *fd; // File pointer
char filename[] = "file.dat"; // file to read
static const long max_len = 55+ 1; // define the max length of the line to read
char buff[max_len + 1]; // define the buffer and allocate the length
if ((fd = fopen(filename, "rb")) != NULL) { // open file. I omit error checks
fseek(fd, -max_len, SEEK_END); // set pointer to the end of file minus the length you need. Presumably there can be more than one newline character
fread(buff, max_len-1, 1, fd); // read the contents of the file starting from where fseek() positioned us
fclose(fd); // close the file
buff[max_len-1] = '\0'; // close the string
char *last_newline = strrchr(buff, '\n'); // find last occurrence of newline
char *last_line = last_newline+1; // jump to it
printf("captured: [%s]\n", last_line); // captured: [472977827]
}
cheers!
maxim

read from file as char array

I am reading from a file, and when I read, it takes it line by line and prints it.
What I want exactly is an array of char holding all the chars in the file, so I can print it once.
This is the code I have:
if(strcmp(str[0],"#")==0)
{
FILE *filecomand;
//char fname[40];
char line[100];
int lcount;
///* Read in the filename */
//printf("Enter the name of a ascii file: ");
//fgets(History.txt, sizeof(fname), stdin);
/* Open the file. If NULL is returned there was an error */
if((filecomand = fopen(str[1], "r")) == NULL)
{
printf("Error Opening File.\n");
//exit(1);
}
lcount=0;
int i=0;
while( fgets(line, sizeof(line), filecomand) != NULL ) {
/* Get each line from the infile */
//lcount++;
/* print the line number and data */
//printf("%s", line);
}
fclose(filecomand); /* Close the file */
You need to determine the size of the file. Once you have that, you can allocate an array large enough and read it in a single go.
There are two ways to determine the size of the file.
Using fstat:
struct stat stbuffer;
if (fstat(fileno(filecommand), &stbuffer) != -1)
{
// file size is in stbuffer.st_size;
}
With fseek and ftell:
if (fseek(fp, 0, SEEK_END) == 0)
{
long size = ftell(fp);
if (size != -1)
{
// successfully got size
}
// Go back to start of file
fseek(fp, 0, SEEK_SET);
}
Another solution would be to map the entire file to the memory and then treat it as a char array.
Under windows MapViewOfFile, and under unix mmap.
Once you have mapped the file (there are plenty of examples), you get a pointer to the file's beginning in memory. Treat it as a char * and index it like an array.
Since you can't assume how big the file is, you need to determine the size and then dynamically allocate a buffer.
I won't post the code, but here's the general scheme. Use fseek() to navigate to the end of the file, ftell() to get the size of the file, and fseek() again to move back to the start of the file. Allocate a char buffer with malloc() using the size you found. Then use fread() to read the file into the buffer. When you are done with the buffer, free() it.
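That scheme could look roughly like the sketch below. The name read_all is hypothetical. It allocates one extra byte for a NUL terminator so the result can be printed as a string, which the scheme above does not mention, and it assumes the stream is seekable.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `fp` into a newly allocated,
   NUL-terminated buffer and store the byte count in *out_len.
   Returns NULL on error; the caller must free() the result. */
char *read_all(FILE *fp, size_t *out_len)
{
    long size;
    size_t got;
    char *buf;

    if (fseek(fp, 0, SEEK_END) != 0)         /* go to the end */
        return NULL;
    size = ftell(fp);                        /* position at the end = size */
    if (size < 0)
        return NULL;
    if (fseek(fp, 0, SEEK_SET) != 0)         /* back to the start */
        return NULL;

    buf = malloc((size_t)size + 1);          /* +1 for the terminator */
    if (buf == NULL)
        return NULL;

    got = fread(buf, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

After a successful call, a single printf("%s", buf) prints the entire file at once, followed by free(buf).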
Use a different open. i.e.
fd = open(str[1], O_RDONLY|O_BINARY); /* O_BINARY for MS */
The read statement would be for a buffer of bytes.
count = read(fd, buf, bytecount);
This will do a binary open and read on the file.
