I am currently reading the lines of a file from an offset (a quarter of the way into the file) until the end:
struct stat st;
stat("file.txt", &st);
long fileSize = st.st_size;
long minOffset = fileSize / 4;
FILE* file_ptr = fopen("file.txt", "r");
fseek(file_ptr, minOffset, SEEK_SET);
size_t lineLength = 1000; // getline() expects a size_t*
char * line;
line = malloc(lineLength);
ssize_t read;
// parenthesize the assignment so that read gets the length, not the comparison result
while ((read = getline(&line, &lineLength, file_ptr)) != -1) {
    printf("%s", line);
}
But what I need is to read all lines between two byte positions in the file. As Olaf stated in the comments, I also have the issue that my offset is not necessarily at a line boundary.
For example, this could be the maxOffset that I would like to read up to:
int maxOffset = fileSize / 2;
I want to read from the line containing the minOffset position up to the line before the maxOffset position.
The file consists of words (one per line) that are always shorter than 1000 characters:
AA
AAS
ABACA
ABACAS
ABACOST
ABACOSTS
ABACULE
ABACULES
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
ABAISSAIENT
ABAISSAIS
ABAISSAIT
ABAISSAMES
ABAISSANT
ABAISSANTE
ABAISSANTES
ABAISSANTS
ABAISSAS
ABAISSASSE
ABAISSASSENT
ABAISSASSES
ABAISSASSIEZ
ABAISSASSIONS
ABAISSAT
ABAISSATES
....
How can I read the lines of a file between two byte positions?
First you need to find the byte position of the start of a line at the 1/4, 1/2, and 3/4 points. To do that:
fseek to the approximate position (e.g fseek(filesize/4))
call fgets to read up to the next newline
call ftell to determine the offset
The offset returned is the end of one quarter and the beginning of the next.
To read one quarter of the file:
fseek to the beginning of the quarter
call fgets to read a line
call ftell to see if you've reached the end of the quarter
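The steps above can be sketched roughly as follows. This is only a sketch under some assumptions: the file is opened in binary mode ("rb") so that offsets are plain byte counts, lines are shorter than the 1000-byte buffer as in the question, and the helper names next_line_start and copy_lines_between are made up for illustration. Seeking to approx - 1 rather than approx is a small tweak so that an offset that already sits on a line start does not skip that line. Note that it selects the lines that start inside the range, so a line that merely contains the first offset is left to the previous range.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: offset of the first line that starts at or after
   `approx`. Returns -1 if the seek fails. */
static long next_line_start(FILE *fp, long approx)
{
    char buf[1000];
    if (approx <= 0)
        return 0;
    if (fseek(fp, approx - 1, SEEK_SET) != 0)
        return -1;
    /* Skip the rest of the current line; loop in case it exceeds buf. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strlen(buf);
        if (n > 0 && buf[n - 1] == '\n')
            break;
    }
    return ftell(fp);
}

/* Hypothetical helper: writes every line that starts in [from, to) to `out`.
   Returns the number of lines written, or -1 on error. */
static long copy_lines_between(FILE *fp, long from, long to, FILE *out)
{
    char buf[1000];
    long count = 0;
    long start = next_line_start(fp, from);
    long end = next_line_start(fp, to);
    if (start < 0 || end < 0 || fseek(fp, start, SEEK_SET) != 0)
        return -1;
    /* ftell() reports where the next line begins; stop at the boundary. */
    while (ftell(fp) < end && fgets(buf, sizeof buf, fp) != NULL) {
        fputs(buf, out);
        count++;
    }
    return count;
}
```

For the question's case this would be called as copy_lines_between(file_ptr, fileSize / 4, fileSize / 2, stdout). Because both boundaries are snapped to line starts the same way, adjacent ranges neither overlap nor drop a line.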
You want the fread function:
int byteStart = 100;
int byteEnd = 200;
line = malloc(byteEnd-byteStart); // Allocate enough space for your data.
fseek(file_ptr, byteStart, SEEK_SET); // Go to your starting point
fread(line, 1, byteEnd-byteStart, file_ptr); // Read until your ending point.
I'm making a simple sockets program to send a text file or a picture file over to another socket connected to a port. However, I want to also send the size of the file over to the client socket so that it knows how many bytes to receive.
I also want to implement something where I can send a certain number of bytes instead of the file itself. For example, if a file I wanted to send was 14,003 bytes and I felt like sending 400 bytes, then only 400 bytes would be sent.
I am implementing something like this:
#include <stdio.h>
int main(int argc, char* argv[]) {
FILE *fp;
char* file = "text.txt";
int offset = 40;
int sendSize = 5;
int fileSize = 0;
if ((fp = fopen(file, "r")) == NULL) {
printf("Error: Cannot open the file!\n");
return 1;
} else {
/* Seek from offset into the file */
//fseek(fp, 0L, SEEK_END);
fseek(fp, offset, sendSize + offset); // seek to sendSize
fileSize = ftell(fp); // get current file pointer
//fseek(fp, 0, SEEK_SET); // seek back to beginning of file
}
printf("The size is: %d", fileSize);
}
offset is pretty much going to go 40 bytes into the file and then send whatever sendSize bytes over to the other program.
I keep getting an output of 0 instead of 5. Any reason behind this?
You can try this.
#include <stdio.h>
int main(int argc, char* argv[]) {
FILE *fp;
char* file = "text.txt";
int offset = 40;
int sendSize = 5;
int fileSize = 0;
if ((fp = fopen(file, "r")) == NULL) {
printf("Error: Cannot open the file!\n");
return 1;
} else {
fseek(fp, 0L, SEEK_END);
fileSize = ftell(fp);
}
printf("The size is: %d", fileSize);
}
The fseek() to the end, then ftell() method is a reasonably portable way of getting the size of a file, but not guaranteed to be correct. It won't transparently handle newline / carriage return conversions, and as a result, the standard doesn't actually guarantee that the return from ftell() is useful for any purpose other than seeking to the same position.
The only portable way is to read the file until data runs out and keep a count of bytes. Or stat() the file using the (non-ANSI) Unix standard function.
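A minimal sketch of the read-and-count approach, assuming the stream was opened in binary mode and is positioned at its start. The function name count_bytes and the 4096-byte buffer are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Counts the bytes from the current position to the end of the stream by
   reading them. Returns -1 if a read error occurred. */
static long long count_bytes(FILE *fp)
{
    char buf[4096];
    long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long long)n;
    return ferror(fp) ? -1 : total;
}
```

This reads the whole file, so it is slow for large files, but it does not rely on ftell() returning a byte count.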
You may be opening the file in text mode as Windows can open a file in text mode even without the "t" option.
And you can't use ftell() to get the size of a file opened in text mode. Per 7.21.9.4 The ftell function of the C Standard:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file
position indicator for the stream to its position at the time
of the ftell call; the difference between two such return
values is not necessarily a meaningful measure of the number of
characters written or read.
Even if it does return the "size" of the file, the translation to "text" may change the actual number of bytes read.
It's also not portable or standard-conforming to use fseek() to find the end of a binary file. Per 7.21.9.2 The fseek function:
A binary stream need not meaningfully support fseek calls with a
whence value of SEEK_END.
I think your seek does not work because of the third parameter. Try seeking with:
fseek(fp, offset, SEEK_SET);
The third argument must be one of the three origin constants listed below (usually 0, 1 and 2). sendSize + offset is 45 here, which matches none of them, so the call typically fails and leaves the position at the start of the file. That is why ftell() keeps returning 0.
http://www.cplusplus.com/reference/cstdio/fseek/
Parameters
stream, offset, origin
origin: Position used as reference for the offset. It is specified by one of the following constants, defined in <cstdio> (<stdio.h> in C) exclusively to be used as arguments for this function:
Constant    Reference position
SEEK_SET    Beginning of file
SEEK_CUR    Current position of the file pointer
SEEK_END    End of file
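For the original goal (go offset bytes into the file and pick up sendSize bytes), something along these lines might work. It is a sketch only: read_chunk is a made-up helper name, and the file is assumed to be opened with "rb" so that offsets are byte counts.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to `size` bytes that start `offset` bytes
   into the file. Returns the number of bytes read, which is smaller than
   `size` if the file ends early, and 0 if the seek fails. */
static size_t read_chunk(FILE *fp, long offset, char *dst, size_t size)
{
    /* The third argument is the origin, not a byte count. */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    return fread(dst, 1, size, fp);
}
```

With the values from the question this would be read_chunk(fp, 40, buffer, 5), and the return value is the number of bytes that are actually available to send.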
I have opened a binary file as below
FILE *p;
p=fopen("filename.format","rb");
How can I find the end of the file?
The fread function returns the number of elements actually read (bytes, when the element size is 1). So if that number is lower than the number requested, you are likely at the end of the file, although a read error gives the same result.
Furthermore, the feof function tells you whether a previous read hit the end of the file.
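As a rough illustration of how the two fit together, here is a small sketch. The helper name read_some and its return codes are invented for this example. The point is that feof() only becomes true after a read has actually run into the end of the file, so the short count from fread() is checked first.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read `want` bytes into buf and stores the
   number obtained in *got. Returns 0 if the full amount was read, 1 if the
   read stopped at end of file, -1 if it stopped on a read error. */
static int read_some(FILE *fp, char *buf, size_t want, size_t *got)
{
    *got = fread(buf, 1, want, fp);
    if (*got == want)
        return 0;
    /* Short read: ask the stream why it stopped. */
    return ferror(fp) ? -1 : 1;
}
```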
To find out the size of the file without actually reading it:
long Size;
FILE *p;
p = fopen("filename.format","rb");
fseek (p, 0 , SEEK_END);
Size = ftell (p) ;
rewind (p);
In C++ I usually jump to the end of the file using ifstream::seekg and giving it the ios::end argument for position. The ANSI-C equivalent of seekg is
int fseek ( FILE * stream, long int offset, int origin );
Where origin can be SEEK_SET, SEEK_CUR, SEEK_END.
Trying this would jump to the end of the file:
fseek(p, 0, SEEK_END);
Then, once at the end of the file, just use
long int ftell ( FILE * stream );
to tell your program where the file ends.
For an example, the following code would set the size variable to the physical size - in bytes - of the file, and then jump back to the beginning of the file:
FILE *p = fopen("filename.format", "rb"); // open binary file
fseek(p, 0, SEEK_END); // jump to end of file
long int size = ftell(p); // get size of file
fseek(p, 0, SEEK_SET); // jump back to beginning of file
My program is working almost as it should. The intended purpose is to read the file from the end and copy the contents to destination file. However what confuses me is the lseek() method more so how I should be setting the offset.
My src contents at the moment are:
Line 1
Line 2
Line 3
At the moment what I get in my destination file is:
Line 3
e 2
e 2...
From what I understand, calling int loc = lseek(src, -10, SEEK_END); will move the "cursor" in the source file to the end, then offset it 10 bytes back from EOF towards SOF, and the value of loc will be the size of the file after I have deducted the offset. However, after 7h of C I'm almost brain dead here.
int main(int argc, char* argv[])
{
// Open source & destination file
int src = open(argv[1], O_RDONLY, 0777);
int dst = open(argv[2], O_CREAT|O_WRONLY, 0777);
// Check if either reported an error
if(src == -1 || dst == -1)
{
perror("There was a problem with one of the files.");
}
// Set buffer & block size
char buffer[1];
int block;
// Set offset from EOF
int offset = -1;
// Set file pointer location to the end of file
int loc = lseek(src, offset, SEEK_END);
// Read from source from EOF to SOF
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 1);
// Write to output file
write(dst, buffer, block);
// Move the pointer again
loc = lseek(src, loc-1, SEEK_SET);
}
}
lseek() doesn't change or return the file size. What it returns is the position where the 'cursor' is set to. So when you call
loc = lseek(src, offset, SEEK_END);
twice it will always set the cursor to the same position again. I guess you want to do something like this:
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 5);
// Write to output file
write(dst, buffer, block);
// Move the pointer again five bytes before the last offset
loc = lseek(src, loc+offset, SEEK_SET);
}
If the line length is variable, you could do something like the following instead:
// define a block size that exceeds the maximum line length
char buffer[256];
// determine the file size
off_t size = lseek( src, 0, SEEK_END );
off_t pos = size; // end of the part that has not been copied yet
// read blocks from the end
while( pos > 0 ) {
    // the block ends at pos; its start must not be negative ...
    off_t start = pos > (off_t)sizeof buffer ? pos - (off_t)sizeof buffer : 0;
    lseek( src, start, SEEK_SET );
    ssize_t n = read( src, buffer, (size_t)(pos - start) );
    if( n <= 0 ) break; // read error or unexpected end of file
    // we expect the last byte read to be a newline but we are interested in the one BEFORE that,
    // so search backwards (memchr would find the FIRST newline in the block)
    ssize_t i = n - 1;
    while( i > 0 && buffer[i-1] != '\n' ) i--;
    // buffer + i is the beginning of the last line, n - i its length
    write( dst, buffer + i, (size_t)(n - i) );
    pos = start + i; // repeat with the block that ends just before the last line
}
From some of your comments it looks like you want to reverse the order of the lines in a text file. Unfortunately you're not going to get that with such a simple program. There are several approaches you can take, depending on how complicated you want to get, how big the files are, how much memory is on hand, how fast you want it to be, etc.
Here are some different ideas off the top of my head:
Read your whole source file at once into a single memory block. Scan through the memory block forwards looking for line breaks and recording the pointer and length for each line. Save these records onto a stack (you could use a dynamic array, or an STL vector in C++,) and then to write your output file, you just pop a line's record off the stack (moving backwards through the array) and write it until the stack is empty (you've reached the beginning of the array.)
Start at the end of your input file, but for each line, seek backwards character-by-character until you find the newline that starts the previous line. Seek forwards again past that newline and then read in the line. (You should now know its length.) Or, you could just build up the reversed characters in a buffer and then write them out backwards.
Pull in whole blocks (sectors perhaps) of the file at once, from end to beginning. Within each block, locate the newlines in a similar fashion to the method above except now you already have the characters in memory and so don't need to do any reversing or pulling them in redundantly. However, this solution will be much more complicated because lines can span across block boundaries.
There may be more elaborate/clever tricks, but those are the more obvious, straightforward approaches.
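As an illustration, here is a sketch of a simplified variant of the first idea. Instead of recording line records on a stack, it reads the whole input into memory and then walks the buffer backwards, writing each line as it finds it. The function name reverse_lines and the starting capacity are arbitrary, and it assumes the file fits comfortably in memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes the lines of `in` to `out` in reverse order.
   Returns 0 on success, -1 on a read or allocation error. */
static int reverse_lines(FILE *in, FILE *out)
{
    size_t cap = 4096, len = 0, n;
    char *data = malloc(cap);
    if (data == NULL)
        return -1;
    /* Read the whole input, doubling the buffer whenever it fills up. */
    while ((n = fread(data + len, 1, cap - len, in)) > 0) {
        len += n;
        if (len == cap) {
            char *bigger = realloc(data, cap * 2);
            if (bigger == NULL) { free(data); return -1; }
            data = bigger;
            cap *= 2;
        }
    }
    if (ferror(in)) { free(data); return -1; }
    /* `end` is one past the last byte of the line being written. */
    size_t end = len;
    while (end > 0) {
        size_t start = end - 1;          /* step over the line's own newline */
        while (start > 0 && data[start - 1] != '\n')
            start--;
        fwrite(data + start, 1, end - start, out);
        if (data[end - 1] != '\n')
            fputc('\n', out);            /* input did not end with a newline */
        end = start;
    }
    free(data);
    return 0;
}
```

With the three-line example from the question, the output would be Line 3, Line 2, Line 1, each on its own line.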
I think you should be using SEEK_CUR instead of SEEK_END in your final call to lseek():
// Set file pointer location to the end of file
int loc = lseek(src, offset, SEEK_END);
// Read from source from EOF to SOF
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 5);
// Write to output file
write(dst, buffer, block);
// Move the pointer again
lseek(src, -10, SEEK_CUR);
}
You could also do:
// Set file pointer location to the end of file
int loc = lseek(src, offset, SEEK_END);
// Read from source from EOF to SOF
while( loc > 0 )
{
// Read bytes
block = read(src, buffer, 5);
// Write to output file
write(dst, buffer, block);
// Move the pointer again
loc -= 5;
lseek(src, loc, SEEK_SET);
}
I am new to C and was trying to write a program that just copies a file, so that I could learn the basics of files. My code takes a file as input and figures out its length by subtracting its start from its end using fseek and ftell. Then it uses fwrite to write, based on what I could get from its man page, ONE element of data, (END - START) bytes long, to the stream pointed to by OUT, obtaining it from the location given by FI. The problem is that, although it does produce "copy output", the file is not the same as the original. I tried reading the input file into a variable and then writing from there, but that didn't help either. What am I doing wrong?
Thanks
int main(int argc, char* argv[])
{
FILE* fi = fopen(argv[1], "r"); //create the input file for reading
if (fi == NULL)
return 1; // check file exists
int start = ftell(fi); // get file start address
fseek(fi, 0, SEEK_END); // go to end of file
int end = ftell(fi); // get file end address
rewind(fi); // go back to file beginning
FILE* out = fopen("copy output", "w"); // create the output file for writing
fwrite(fi,end-start,1,out); // write the input file to the output file
}
Should this work?
{
FILE* out = fopen("copy output", "w");
int* buf = malloc(end-start); fread(buf,end-start,1,fi);
fwrite(buf,end-start,1,out);
}
This isn't how fwrite works.
To copy a file, you'd typically allocate a buffer, then use fread to read one buffer of data, followed by fwrite to write that data back out. Repeat until you've copied the entire file. Typical code is something on this general order:
#define SIZE (1024*1024)
char buffer[SIZE];
size_t bytes;
while (0 < (bytes = fread(buffer, 1, sizeof(buffer), infile)))
fwrite(buffer, 1, bytes, outfile);
The first parameter of fwrite is a pointer to the data to be written to the file, not a FILE* to read from. You have to read the data from the first file into a buffer, then write that buffer to the output file. http://www.cplusplus.com/reference/cstdio/fwrite/
Perhaps a look through an open-source copy tool in C would point you in the right direction.
Here is How It can be done:
Option 1: Dynamic "Array"
Nested Level: 0
// Variable Definition
char *cpArr;
FILE *fpSourceFile = fopen(<Your_Source_Path>, "rb");
FILE *fpTargetFile = fopen(<Your_Target_Path>, "wb");
// Code Section
// Get The Size In Bytes Of The Source File
fseek(fpSourceFile, 0, SEEK_END); // Go To The End Of The File
long lSize = ftell(fpSourceFile); // The Position At The End Is The Size
cpArr = (char *)malloc(lSize); // Create An Array At That Size
fseek(fpSourceFile, 0, SEEK_SET); // Return The Cursor To The Start
// Read From The Source File - "Copy" (Pass The Buffer Itself, Not Its Address)
fread(cpArr, 1, lSize, fpSourceFile);
// Write To The Target File - "Paste"
fwrite(cpArr, 1, lSize, fpTargetFile);
// Close The Files
fclose(fpSourceFile);
fclose(fpTargetFile);
// Free The Used Memory
free(cpArr);
Option 2: Char By Char
Nested Level: 1
// Variable Definition
char cTemp;
FILE *fpSourceFile = fopen(<Your_Source_Path>, "rb");
FILE *fpTargetFile = fopen(<Your_Target_Path>, "wb");
// Code Section
// Read From The Source File - "Copy"
while(fread(&cTemp, 1, 1, fpSourceFile) == 1)
{
// Write To The Target File - "Paste"
fwrite(&cTemp, 1, 1, fpTargetFile);
}
// Close The Files
fclose(fpSourceFile);
fclose(fpTargetFile);
Using C, is there a way to read only the last line of a file without looping over its entire content?
The thing is that the file contains millions of lines, each of them holding an integer (long long int). The file itself can be quite large, I presume even up to 1000 MB. I know for sure that the last line won't be longer than 55 digits, but it could be only 2 digits as well. Using any kind of database is not an option... I've considered it already.
Maybe it's a silly question, but coming from a PHP background I find it hard to answer. I looked everywhere but found nothing clean.
Currently I'm using:
if ((fd = fopen(filename, "r")) != NULL) // open file
{
fseek(fd, 0, SEEK_SET); // make sure start from 0
while(!feof(fd))
{
memset(buff, 0x00, buff_len); // clean buffer
fscanf(fd, "%[^\n]\n", buff); // read file *prefer using fscanf
}
printf("Last Line :: %d\n", atoi(buff)); // for testing I'm using small integers
}
This way I'm looping file's content and as soon as file gets bigger than ~500k lines things slow down pretty bad....
Thank you in advance.
maxim
Just fseek to fileSize - 55 and read forward?
If there is a maximum line length, seek to that distance before the end.
Read up to the end, and find the last end-of-line in your buffer.
If there is no maximum line length, guess a reasonable value, read that much at the end, and if there is no end-of-line, double your guess and try again.
In your case:
/* max length including newline */
static const long max_len = 55 + 1;
/* space for all of that plus a nul terminator */
char buf[max_len + 1];
/* now read that many bytes from the end of the file */
fseek(fd, -max_len, SEEK_END);
/* fd is a FILE* here, so use fread rather than read */
size_t len = fread(buf, 1, max_len, fd);
/* don't forget the nul terminator */
buf[len] = '\0';
/* drop the newline that ends the last line, if there is one */
if (len > 0 && buf[len - 1] == '\n')
    buf[len - 1] = '\0';
/* and find the newline before the last line (there must be one, right?) */
char *last_newline = strrchr(buf, '\n');
char *last_line = last_newline+1;
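When no maximum line length is known, the guess-and-double approach described above could look roughly like this. It is a sketch, not a tuned implementation: the function is named last_line_of to keep it apart from the snippet's variables, the initial guess of 64 bytes is arbitrary, and the stream is assumed to be opened with "rb".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the last line of the file, without its
   newline, or NULL on error. The caller frees the result. */
static char *last_line_of(FILE *fp)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0)
        return NULL;
    long guess = 64;                       /* arbitrary first guess */
    for (;;) {
        long chunk = guess < size ? guess : size;
        char *buf = malloc((size_t)chunk + 1);
        if (buf == NULL)
            return NULL;
        if (fseek(fp, size - chunk, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)chunk, fp) != (size_t)chunk) {
            free(buf);
            return NULL;
        }
        long end = chunk;
        if (end > 0 && buf[end - 1] == '\n')
            end--;                         /* ignore the final newline */
        buf[end] = '\0';
        long start = end;
        while (start > 0 && buf[start - 1] != '\n')
            start--;
        if (start > 0 || chunk == size) {
            /* Found the previous newline, or the chunk is the whole file. */
            memmove(buf, buf + start, (size_t)(end - start) + 1);
            return buf;
        }
        free(buf);                         /* line is longer than the guess */
        guess *= 2;
    }
}
```

The loop always terminates because the chunk eventually covers the whole file, at which point the first line is also the last one.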
Open with "rb" to make sure you're reading binary. Then fseek(..., SEEK_END) and start reading bytes from the back until you find the first line separator (if you know the maximum line length is 55 characters, read 55 characters ...).
ok. It all worked for me. I learned something new. The last line of a file 41mb large and with >500k lines was read instantly. Thanks to you all guys, especially 'Useless' (love the controversy of your nickname, btw). I will post here the code in the hope that someone else in the future can benefit from it:
Reading ONLY the last line of the file:
The file is structured so that every line, including the last one, ends with a newline, and I am sure that any line is shorter than, in my case, 55 characters:
file contents:
------------------------
2943728727
3129123555
3743778
412912777
43127787727
472977827
------------------------
notice the new line appended.
FILE *fd; // File pointer
char filename[] = "file.dat"; // file to read
static const long max_len = 55+ 1; // define the max length of the line to read
char buff[max_len + 1]; // define the buffer and allocate the length
if ((fd = fopen(filename, "rb")) != NULL) { // open file. I omit error checks
fseek(fd, -max_len, SEEK_END); // set pointer to the end of file minus the length you need. Presumably there can be more than one newline character
fread(buff, max_len-1, 1, fd); // read the contents of the file starting from where fseek() positioned us
fclose(fd); // close the file
buff[max_len-1] = '\0'; // close the string
char *last_newline = strrchr(buff, '\n'); // find last occurrence of newline
char *last_line = last_newline+1; // jump to it
printf("captured: [%s]\n", last_line); // captured: [472977827]
}
cheers!
maxim