C: read only last line of a file. No loops

Using C, is there a way to read only the last line of a file without looping over its entire content?
The thing is that the file contains millions of lines, each of them holding an integer (long long int). The file itself can be quite large, I presume even up to 1000 MB. I know for sure that the last line won't be longer than 55 digits, but it could be only 2 digits as well. Using any kind of database is not an option; I've considered it already.
Maybe it's a silly question, but coming from a PHP background I find it hard to answer. I looked everywhere but found nothing clean.
Currently I'm using:
if ((fd = fopen(filename, "r")) != NULL) // open file
{
fseek(fd, 0, SEEK_SET); // make sure start from 0
while(!feof(fd))
{
memset(buff, 0x00, buff_len); // clean buffer
fscanf(fd, "%[^\n]\n", buff); // read file *prefer using fscanf
}
printf("Last Line :: %d\n", atoi(buff)); // for testing I'm using small integers
}
This way I'm looping over the file's content, and as soon as the file gets bigger than ~500k lines things slow down badly.
Thank you in advance.
maxim

Just fseek to fileSize - 55 and read forward?

If there is a maximum line length, seek to that distance before the end.
Read up to the end, and find the last end-of-line in your buffer.
If there is no maximum line length, guess a reasonable value, read that much at the end, and if there is no end-of-line, double your guess and try again.
In your case:
/* max length including newline */
static const long max_len = 55 + 1;
/* space for all of that plus a nul terminator */
char buf[max_len + 1];
/* now read that many bytes from the end of the file */
fseek(fd, -max_len, SEEK_END);
/* fd is a FILE *, so read with fread() rather than read() */
size_t len = fread(buf, 1, max_len, fd);
/* don't forget the nul terminator */
buf[len] = '\0';
/* if the file ends with a newline, drop it so it isn't mistaken for the separator */
if (len > 0 && buf[len - 1] == '\n') buf[len - 1] = '\0';
/* and find the last newline character (there must be one, right?) */
char *last_newline = strrchr(buf, '\n');
char *last_line = last_newline ? last_newline + 1 : buf;
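For the case without a known maximum line length, the "guess, then double" idea could be sketched roughly as follows. The function name read_last_line and the starting window of 64 bytes are my own choices for illustration, not something from the question, and the sketch assumes a regular file that can be seeked.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the last line of `path`
 * (without its line terminator), or NULL on error. The caller frees it. */
char *read_last_line(const char *path)
{
    FILE *fp = fopen(path, "rb");      /* binary mode: offsets are byte counts */
    if (fp == NULL)
        return NULL;

    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }

    long window = 64;                  /* assumed first guess, doubled as needed */
    char *buf = NULL;
    char *result = NULL;

    for (;;) {
        if (window > size)
            window = size;             /* never seek before the start */

        char *tmp = realloc(buf, (size_t)window + 1);
        if (tmp == NULL)
            break;
        buf = tmp;

        if (fseek(fp, size - window, SEEK_SET) != 0)
            break;
        size_t got = fread(buf, 1, (size_t)window, fp);
        if (got != (size_t)window)
            break;

        /* ignore one trailing "\n" or "\r\n" at the very end of the file */
        size_t end = got;
        if (end > 0 && buf[end - 1] == '\n') end--;
        if (end > 0 && buf[end - 1] == '\r') end--;

        /* walk backwards to the newline that precedes the last line */
        size_t start = end;
        while (start > 0 && buf[start - 1] != '\n')
            start--;

        if (start > 0 || window == size) {
            /* found a separator, or the window already covers the whole file */
            result = malloc(end - start + 1);
            if (result != NULL) {
                memcpy(result, buf + start, end - start);
                result[end - start] = '\0';
            }
            break;
        }
        window *= 2;                   /* no newline yet: look further back */
    }

    free(buf);
    fclose(fp);
    return result;
}
```

With the sample file from the question, read_last_line("file.dat") should give back the final number as a string; an empty file yields an empty string.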

Open with "rb" to make sure you're reading binary. Then fseek(..., SEEK_END) and start reading bytes from the back until you find the first line separator (if you know the maximum line length is 55 characters, read 55 characters ...).

ok. It all worked for me. I learned something new. The last line of a file 41mb large and with >500k lines was read instantly. Thanks to you all guys, especially 'Useless' (love the controversy of your nickname, btw). I will post here the code in the hope that someone else in the future can benefit from it:
Reading ONLY the last line of the file:
the file is structured the way that there is a new line appended and I am sure that any line is shorter than, in my case, 55 characters:
file contents:
------------------------
2943728727
3129123555
3743778
412912777
43127787727
472977827
------------------------
notice the new line appended.
FILE *fd; // File pointer
char filename[] = "file.dat"; // file to read
static const long max_len = 55+ 1; // define the max length of the line to read
char buff[max_len + 1]; // define the buffer and allocate the length
if ((fd = fopen(filename, "rb")) != NULL) { // open file. I omit error checks
fseek(fd, -max_len, SEEK_END); // set pointer to the end of file minus the length you need. Presumably there can be more than one newline character
fread(buff, max_len-1, 1, fd); // read max_len-1 bytes starting from where fseek() positioned us, which leaves out the file's final newline
fclose(fd); // close the file
buff[max_len-1] = '\0'; // terminate the string
char *last_newline = strrchr(buff, '\n'); // find last occurrence of newline
char *last_line = last_newline ? last_newline+1 : buff; // jump past it (or use the whole buffer if there is none)
printf("captured: [%s]\n", last_line); // captured: [472977827]
}
cheers!
maxim
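Since the lines are meant to hold long long values, it may be worth converting the captured text with strtoll() instead of atoi(), because atoi() cannot report errors. The sketch below is only one way to do that; the helper name parse_long_long is made up for this example. Note that a 55-digit number does not fit in a long long, so such a line is reported as an error here.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal long long from `text`.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_long_long(const char *text, long long *out)
{
    char *end;

    errno = 0;
    long long value = strtoll(text, &end, 10);

    if (end == text)
        return -1;                 /* no digits at all */
    if (errno == ERANGE)
        return -1;                 /* value does not fit in a long long */

    /* allow trailing whitespace such as "\r\n", reject anything else */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return -1;

    *out = value;
    return 0;
}
```

Called on the last_line pointer from the code above, it would fill a long long or tell you the line was not a usable number.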

Related

Check multiple files with "strstr" and "fopen" in C

Today I decided to learn to code for the first time in my life. I decided to learn C. I have created a small program that checks a txt file for a specific value. If it finds that value then it will tell you that that specific value has been found.
What I would like to do is that I can put multiple files go through this program. I want this program to be able to scan all files in a folder for a specific string and display what files contain that string (basically a file index)
I just started today and I'm 15 years old so I don't know if my assumptions are correct on how this can be done and I'm sorry if it may sound stupid but I have been thinking of maybe creating a thread for every directory I put into this program and each thread individually runs that code on the single file and then it displays all the directories in which the string can be found.
I have been looking into threading but I don't quite understand it. Here's the working code for one file at a time. Does anyone know how to make this work as I want it?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
//searches for this string in a txt file
char searchforthis[200];
//file name to display at output
char ch, file_name[200];
FILE *fp;
//Asks for full directory of txt file (example: C:\users\...) and reads that file.
//fp is content of file
printf("Enter name of a file you wish to check:\n");
gets(file_name);
fp = fopen(file_name, "r"); // read mode
//If there's no data inside the file it displays following error message
if (fp == NULL)
{
perror("Error while opening the file.\n");
exit(EXIT_FAILURE);
}
//asks for string (what has to be searched)
printf("Enter what you want to search: \n");
scanf("%s", searchforthis);
char* p;
// Find first occurrence of searchforthis in fp
p = strstr(searchforthis, fp);
// Prints the result
if (p) {
printf("This Value was found in following file:\n%s", file_name);
} else
printf("This Value has not been found.\n");
fclose(fp);
return 0;
}
This line,
p = strstr(searchforthis, fp);
is wrong. strstr() is declared as char *strstr(const char *haystack, const char *needle); it takes two strings, not a file pointer.
Forget about gets(); it's prone to buffer overflow. For reference, see "Why is the gets function so dangerous that it should not be used?".
Your scanf("%s",...) is just as dangerous as gets(), because you don't limit the number of characters to be read. Instead, you could write it as,
scanf("%199s", searchforthis); /* 199 characters + \0 to mark the end of the string */
Also check the return value of scanf() in case an input error occurs. The final code should look like this,
if (scanf("%199s", searchforthis) != 1)
{
exit(EXIT_FAILURE);
}
It is even better to use fgets() for this, though keep in mind that fgets() also stores the newline character in the buffer, so you will have to strip it manually.
To actually search the file, you have to read it line by line with a function like fgets(), fscanf(), or POSIX getline(), and then use strstr() on each line to determine whether you have a match. Something like this should work,
char *p;
char buff[500];
int flag = 0, lines = 1;
while (fgets(buff, sizeof(buff), fp) != NULL)
{
size_t len = strlen(buff); /* get the length of the string */
if (len > 0 && buff[len - 1] == '\n') /* check if the last character is the newline character */
{
buff[len - 1] = '\0'; /* place \0 in the place of \n */
}
p = strstr(buff, searchforthis);
if (p != NULL)
{
/* match - set flag to 1 */
flag = 1;
break;
}
}
if (flag == 0)
{
printf("This Value has not been found.\n");
}
else
{
printf("This Value was found in following file:\n%s", file_name);
}
flag is used to determine whether or not searchforthis exists in the file.
Side note, if the line contains more than 499 characters, you will need a larger buffer, or a different function, consider getline() for that case, or even a custom one reading character by character.
If you want to do this for multiple files, you have to place the whole process in a loop. For example,
for (int i = 0; i < 5; i++) /* this will execute 5 times */
{
printf("Enter name of a file you wish to check:\n");
...
}
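As an alternative to prompting in a loop, the per-file search can be wrapped in a function and called once per path. This is only a sketch: file_contains and report_matches are names I made up, and like the code above it can miss a match that straddles two 499-character chunks of a very long line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `needle` occurs on some line of the
 * file at `path`, 0 if it does not, and -1 if the file cannot be opened. */
int file_contains(const char *path, const char *needle)
{
    char buff[500];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(buff, sizeof(buff), fp) != NULL) {
        if (strstr(buff, needle) != NULL) {
            found = 1;             /* match: no need to read further */
            break;
        }
    }
    fclose(fp);
    return found;
}

/* Checks every path in `paths`, prints the ones containing `needle`,
 * and returns how many files matched. */
int report_matches(const char *needle, char *const paths[], int count)
{
    int matches = 0;

    for (int i = 0; i < count; i++) {
        int r = file_contains(paths[i], needle);
        if (r == 1) {
            printf("%s\n", paths[i]);
            matches++;
        } else if (r < 0) {
            fprintf(stderr, "cannot open %s\n", paths[i]);
        }
    }
    return matches;
}
```

From main(int argc, char *argv[]) you could call report_matches(argv[1], argv + 2, argc - 2) and let the shell expand a wildcard such as folder/*.txt into the list of paths, which avoids threads entirely.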

Stripping newlines from fread

I'm loading a file using this code, but it seems like removing the newlines also for some reason removes all lines but the first.
void load_script(char* path) {
FILE* file;
char* script;
int filesize = 0;
file = fopen(path, "r");
// determine file size
fseek(file, 0L, SEEK_END);
filesize = ftell(file);
fseek(file, 0L, SEEK_SET);
// allocate memory
script = malloc(filesize + 1);
// read script
size_t size = fread(script, 1, filesize, file);
script[size] = 0;
printf("Before stripping:\n%s\n", script);
// strip newlines
script[strcspn(script, "\n")] = 0;
printf("After stripping:\n%s\n", script);
fclose(file);
tokenize(script);
}
Here's the output:
Before stripping:
line 1
line 2
line 3
After stripping:
line 1
I'd love to know the best way to strip newlines from a multiline string. Thanks.
script[strcspn(script, "\n")] = 0;
This terminates the C string at the first newline, so everything after line 1 is cut off. You may want to loop over the string and replace each '\n' with ' ' instead.
Something like:
// strip newlines
for(size_t i = 0; script[i]; i++)
if (script[i] == '\n') script[i] = ' ';
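If the newlines should disappear entirely rather than turn into spaces, the string can be compacted in place with a read index and a write index. A minimal sketch (the name remove_newlines is hypothetical, and dropping '\r' as well is my assumption about what is wanted for files with Windows line endings):

```c
#include <stddef.h>

/* Hypothetical helper: deletes every '\n' and '\r' from `s` in place
 * and returns the new length of the string. */
size_t remove_newlines(char *s)
{
    size_t out = 0;

    for (size_t in = 0; s[in] != '\0'; in++) {
        if (s[in] != '\n' && s[in] != '\r')
            s[out++] = s[in];      /* keep this character */
    }
    s[out] = '\0';                 /* terminate the shortened string */
    return out;
}
```

Keep in mind that this joins the last word of one line directly to the first word of the next, which may matter for the tokenizer.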
By the way, you should not store the file size in an int, which may not be able to hold the size of a file. ftell() returns a long, so use that; on POSIX systems, ftello() returns an off_t, which can represent larger files.
In addition to the solution provided by l3x, I should add that the method used to determine the file size is not reliable:
ftell may fail and upon success, its return value is the number of bytes in the file only if the file is opened in binary mode. For text mode, the Standard does not guarantee anything beyond the fact that it can be used as an argument to fseek.
seeking to the end of file and back to the beginning will not work as expected if the stream does not refer to an actual file: for a terminal, a character device or a pipe, it may not work at all.
It is much more reliable to read the file with getc() into a buffer that you reallocate on demand, one chunk at a time.
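A rough sketch of the grow-on-demand approach follows. It uses fread() in chunks rather than getc() one character at a time, which is a substitution on my part; the name read_all and the 4096-byte starting capacity are assumptions as well.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from `fp` into a malloc'd,
 * NUL-terminated buffer and stores the byte count in *out_len
 * (if out_len is not NULL). Returns NULL on read or allocation failure. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096;             /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        /* always keep one byte free for the terminator */
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        len += n;
        if (n == 0)
            break;                 /* end of file or read error */

        if (len == cap - 1) {      /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Because it never calls fseek() or ftell(), the same function should also work on stdin or a pipe.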

reading from a binary file in C

I am currently working on a project in which I have to read from a binary file and send it through sockets and I am having a hard time trying to send the whole file.
Here is what I wrote so far:
FILE *f = fopen(line,"rt");
//size = lseek(f, 0, SEEK_END)+1;
fseek(f, 0L, SEEK_END);
int size = ftell(f);
unsigned char buffer[MSGSIZE];
FILE *file = fopen(line,"rb");
while(fgets(buffer,MSGSIZE,file)){
sprintf(r.payload,"%s",buffer);
r.len = strlen(r.payload)+1;
res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
return -1;
}
}
I think it has something to do with the size of the buffer that I read into, but I don't know what the correct formula for it is.
One more thing, is the sprintf done correctly?
If you are reading binary files, a NUL character may appear anywhere in the file.
Thus, using string functions like sprintf and strlen is a bad idea.
If you really need to use a second buffer (buffer), you could use memcpy.
You could also directly read into r.payload (if r.payload is already allocated with sufficient size).
You are looking for fread for a binary file.
The return value of fread tells you how many bytes were read into your buffer.
You may also want to call fseek again to return to the start of the file after measuring its size.
See "How can I get a file's size in C?"
Maybe your code could look like this:
#include <stdint.h>
#include <stdio.h>
#define MSGSIZE 512
struct r_t {
uint8_t payload[MSGSIZE];
int len;
};
int send_message(struct r_t *t);
int main() {
struct r_t r;
FILE *f = fopen("test.bin","rb");
if (f == NULL) { perror("fopen"); return -1; }
fseek(f, 0L, SEEK_END);
long size = ftell(f); /* ftell() returns a long; size is not used below */
fseek(f, 0L, SEEK_SET);
do {
r.len = fread(r.payload, 1, sizeof(r.payload), f);
if (r.len > 0) {
int res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
fclose(f);
return -1;
}
}
} while (r.len > 0);
fclose(f);
return 0;
}
No, the sprintf is not done correctly. It is prone to buffer overflow, a very serious security problem.
I would consider sending the file as e.g. 1024-byte chunks instead of as line-by-line, so I would replace the fgets call with an fread call.
Why are you opening the file twice? Apparently to get its size, but you could open it only once and jump back to the beginning of the file. And, you're not using the size you read for anything.
Is it a binary file or a text file? fgets() assumes you are reading a text file -- it stops on a line break -- but you say it's a binary file and open it with "rb" (actually, the first time you opened it with "rt", I assume that was a typo).
IMO you should never ever use sprintf. The number of characters written to the buffer depends on the parameters that are passed in, and in this case if there is no '\0' in buffer then you cannot predict how many bytes will be copied to r.payload, and there is a very good chance you will overflow that buffer.
I think sprintf() would be the first thing to fix. Use memcpy() and you can tell it exactly how many bytes to copy.
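To make the difference concrete, here is a small sketch of copying a chunk with memcpy() and an explicit byte count. The struct message type and the fill_message helper are hypothetical stand-ins for the r structure in the question, whose real definition is not shown.

```c
#include <string.h>

#define MSGSIZE 512

/* Hypothetical message type, modelled loosely on the `r` in the question. */
struct message {
    unsigned char payload[MSGSIZE];
    size_t len;
};

/* Copies exactly `count` bytes into the message (clamped to MSGSIZE so the
 * payload cannot overflow) and returns the number of bytes copied. */
size_t fill_message(struct message *m, const unsigned char *data, size_t count)
{
    if (count > MSGSIZE)
        count = MSGSIZE;
    memcpy(m->payload, data, count);   /* copies NUL bytes like any other */
    m->len = count;
    return count;
}
```

For the five bytes 0x41 0x42 0x00 0x43 0x44, strlen() would stop at the third byte and report 2, whereas fill_message() with the count returned by fread() keeps all 5.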

C, reading a multiline text file

I know this is a dumb question, but how would I load data from a multiline text file?
while (!feof(in)) {
fscanf(in,"%s %s %s \n",string1,string2,string3);
}
^^This is how I load data from a single line, and it works fine. I just have no clue how to load the same data from the second and third lines.
Again, I realize this is probably a dumb question.
Edit: Problem not solved. I have no idea how to read text from a file that's not on the first line. How would I do this? Sorry for the stupid question.
Try something like:
char line[512]; // or however large you think these lines will be
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
/* "rt" means open the file for reading text */
int cur_line = 0;
/* get a line, up to 511 chars plus the terminator, from in. done if NULL */
while(fgets(line, 512, in) != NULL) {
if (cur_line == 2) { // 3rd line
sscanf (line, "%s %s %s \n",string1,string2,string3);
// now you should store or manipulate those strings
break;
}
cur_line++;
}
fclose(in); /* close the file */
or maybe even...
char line[512];
in = fopen ("multilinefile.txt", "rt"); /* open the file for reading */
fgets(line, 512, in); // throw out line one
fgets(line, 512, in); // on line 2
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 2 is loaded into 'line'
// do stuff with line 2
fgets(line, 512, in); // on line 3
sscanf (line, "%s %s %s \n",string1,string2,string3); // line 3 is loaded into 'line'
// do stuff with line 3
fclose(in); // close file
Putting \n in a scanf format string has exactly the same effect as a space: it matches any amount of whitespace. You should use fgets to get the line, then sscanf on the string itself.
This also allows for easier error recovery. If it were just a matter of matching the newline, you could use "%*[ \t]%*1[\n]" instead of " \n" at the end of the string. You should probably use %*[ \t] in place of all your spaces in that case, and check the return value from fscanf. Using fscanf directly on input is very difficult to get right (what happens if there are four words on a line? what happens if there are only two?) and I would recommend the fgets/sscanf solution.
Also, as Delan Azabani mentioned... it's not clear from this fragment whether you're already doing so, but you have to either define space [e.g. in a large array or some dynamic structure with malloc] to store the entire dataset, or do all your processing inside the loop.
You should also be specifying how much space is available for each string in the format specifier. %s by itself in scanf is always a bug and may be a security vulnerability.
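As an illustration of a width-limited format combined with a return-value check, something like the following could be used on each line that fgets() delivers. WORD_MAX and parse_three_words are names invented for this sketch, and 31 is an arbitrary limit.

```c
#include <stdio.h>

#define WORD_MAX 31   /* assumed limit; each buffer holds WORD_MAX + 1 bytes */

/* Hypothetical helper: splits `line` into three whitespace-separated words.
 * Returns 0 if all three were found, -1 otherwise. */
int parse_three_words(const char *line,
                      char a[WORD_MAX + 1],
                      char b[WORD_MAX + 1],
                      char c[WORD_MAX + 1])
{
    /* %31s stores at most 31 characters plus the terminator;
     * keep the number in the format in sync with WORD_MAX */
    if (sscanf(line, "%31s %31s %31s", a, b, c) != 3)
        return -1;
    return 0;
}
```

A line with only two words is rejected, a fourth word is silently ignored, and a word longer than 31 characters is split across fields instead of overflowing a buffer.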
First off, you don't use feof() like that...it shows a probable Pascal background, either in your past or in your teacher's past.
For reading lines, you are best off using either POSIX 2008 (Linux) getline() or standard C fgets(). Either way, you try reading the line with the function, and stop when it indicates EOF:
while (fgets(buffer, sizeof(buffer), fp) != 0)
{
...use the line of data in buffer...
}
or, with getline():
char *bufptr = 0;
size_t buflen = 0;
while (getline(&bufptr, &buflen, fp) != -1)
{
...use the line of data in bufptr...
}
free(bufptr);
To read multiple lines, you need to decide whether you need previous lines available as well. If not, a single string (character array) will do. If you need the previous lines, then you need to read into an array, possibly an array of dynamically allocated pointers.
Every time you call fscanf, it reads more values. The problem you have right now is that you're re-reading each line into the same variables, so in the end, the three variables have the last line's values. Try creating an array or other structure that can hold all the values you need.
The best way to do this is to use a two-dimensional array and just read each line into its own row of the array. Here is an example reading from a .txt file of the poem Ozymandias:
#include <stdio.h>
int main() {
char line[15][255];
FILE * fpointer = fopen("ozymandias.txt", "rt");
for (int a = 0; a < 15; a++) {
fgets(line[a], 255, fpointer);
}
for (int b = 0; b < 15; b++) {
printf("%s", line[b]);
}
fclose(fpointer);
return 0;
}
This produces the poem output. Notice that the poem is only 14 lines long while the loop reads 15: the 15th fgets() call hits end of file and returns NULL without filling line[14], so printing that row outputs whatever garbage happened to be in memory. It is more difficult to print out a file whose length you do not know in advance. One option is to check whether the next line is NULL by writing
while (fgets(....) != NULL) {
but if you then call fgets() again inside the loop body, every other line will be skipped, because the call in the condition has already consumed a line. Use the line that the condition just read instead.
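One way to cope with a file of unknown length while keeping the fixed-size array is to count how many rows fgets() actually filled and print only those. A sketch, with read_lines as a made-up helper name and the array dimensions taken from the example above:

```c
#include <stdio.h>

#define MAX_LINES 15
#define MAX_LEN   255

/* Hypothetical helper: reads up to MAX_LINES lines from `fp` into `lines`
 * and returns how many rows were actually filled. */
int read_lines(FILE *fp, char lines[MAX_LINES][MAX_LEN])
{
    int count = 0;

    /* stop at end of file or when the array is full, whichever comes first */
    while (count < MAX_LINES && fgets(lines[count], MAX_LEN, fp) != NULL)
        count++;
    return count;
}
```

The printing loop would then run from 0 to the returned count instead of a hard-coded 15. A line longer than 254 characters ends up spread over several rows.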
I have an even EASIER solution with no confusing snippets of puzzling methods (no offense to the above stated). Note that it is C++ rather than C. Here it is:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string line;//read the line
ifstream myfile ("MainMenu.txt"); // make sure to put this inside the project folder with all your .h and .cpp files
if (myfile.is_open())
{
while ( getline (myfile,line) ) // stop as soon as a read fails
{
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
Happy coding

read from file as char array

I am reading from a file, and when I read, it takes it line by line and prints it.
What I want exactly is an array of char holding all the chars in the file, printed once.
This is the code I have:
if(strcmp(str[0],"#")==0)
{
FILE *filecomand;
//char fname[40];
char line[100];
int lcount;
///* Read in the filename */
//printf("Enter the name of a ascii file: ");
//fgets(History.txt, sizeof(fname), stdin);
/* Open the file. If NULL is returned there was an error */
if((filecomand = fopen(str[1], "r")) == NULL)
{
printf("Error Opening File.\n");
//exit(1);
}
lcount=0;
int i=0;
while( fgets(line, sizeof(line), filecomand) != NULL ) {
/* Get each line from the infile */
//lcount++;
/* print the line number and data */
//printf("%s", line);
}
fclose(filecomand); /* Close the file */
You need to determine the size of the file. Once you have that, you can allocate an array large enough and read it in a single go.
There are two ways to determine the size of the file.
Using fstat:
struct stat stbuffer;
if (fstat(fileno(filecommand), &stbuffer) != -1)
{
// file size is in stbuffer.st_size;
}
With fseek and ftell:
if (fseek(fp, 0, SEEK_END) == 0)
{
long size = ftell(fp);
if (size != -1)
{
// successfully got size
}
// Go back to start of file
fseek(fp, 0, SEEK_SET);
}
Another solution would be to map the entire file to the memory and then treat it as a char array.
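On Windows use MapViewOfFile, and on Unix use mmap.
Once you have mapped the file (there are plenty of examples), you get a pointer to the file's beginning in memory. Treat it as a char * and keep track of the size, since the mapping is not guaranteed to be NUL-terminated.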
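For the Unix side, a minimal mmap sketch might look like this. It is POSIX only, map_file is a name I made up, and treating an empty file as a failure is my own simplification (a zero-length mapping is not allowed).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): maps the file at `path` read-only.
 * On success returns a pointer to its bytes and stores the size in
 * *out_size; returns NULL on failure or for an empty file.
 * Release the mapping with munmap((void *)ptr, size). */
const char *map_file(const char *path, size_t *out_size)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *out_size = (size_t)st.st_size;
    return p;
}
```

The bytes can then be printed with fwrite(ptr, 1, size, stdout) rather than printf("%s"), which would rely on a terminator that may not be there.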
Since you can't assume how big the file is, you need to determine the size and then dynamically allocate a buffer.
I won't post the code, but here's the general scheme. Use fseek() to navigate to the end of the file, ftell() to get the size of the file, and fseek() again to move back to the start of the file. Allocate a char buffer with malloc() using the size you found. Then use fread() to read the file into the buffer. When you are done with the buffer, free() it.
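For readers who want that scheme spelled out, it could look roughly like the sketch below. The name load_file is hypothetical, and opening in binary mode is my assumption so that the ftell() result is a byte count.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: loads the whole file at `path` into a malloc'd,
 * NUL-terminated buffer and stores the byte count in *out_len
 * (if out_len is not NULL). Returns NULL on error; the caller frees it. */
char *load_file(const char *path, size_t *out_len)
{
    char *buf = NULL;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) == 0) {             /* go to the end */
        long size = ftell(fp);                     /* position == size */
        if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);        /* +1 for the terminator */
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, fp);
                buf[got] = '\0';
                if (out_len != NULL)
                    *out_len = got;
            }
        }
    }
    fclose(fp);
    return buf;
}
```

After char *text = load_file(str[1], NULL); the whole file can be printed in one go with fputs(text, stdout), followed by free(text).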
Use a different open, i.e.
fd = open(str[1], O_RDONLY|O_BINARY); /* O_BINARY for MS */
The read statement would then fill a buffer of bytes:
count = read(fd, buf, bytecount);
This will do a binary open and read on the file.
