I'm loading a file using this code, but it seems like removing the newlines also for some reason removes all lines but the first.
void load_script(char* path) {
FILE* file;
char* script;
int filesize = 0;
file = fopen(path, "r");
// determine file size
fseek(file, 0L, SEEK_END);
filesize = ftell(file);
fseek(file, 0L, SEEK_SET);
// allocate memory
script = malloc(filesize + 1);
// read script
size_t size = fread(script, 1, filesize, file);
script[size] = 0;
printf("Before stripping:\n%s\n", script);
// strip newlines
script[strcspn(script, "\n")] = 0;
printf("After stripping:\n%s\n", script);
fclose(file);
tokenize(script);
}
Here's the output:
Before stripping:
line 1
line 2
line 3
After stripping:
line 1
I'd love to know the best way to strip newlines from a multiline string. Thanks.
script[strcspn(script, "\n")] = 0;
This terminates the C-string after the first newline. You may want to loop over the string and replace \n' with ' ' instead.
Something like:
// strip newlines
for(size_t i = 0; script[i]; i++)
if (script[i] == '\n') script[i] = ' ';
By the way, you should be using off_t (POSIX defined) to store the file size (as the type of filesize), not an int. An int may not be able to hold the size of a file.
In addition to the solution provided by l3x, I should add that the method used is not reliable:
ftell may fail and upon success, its return value is the number of bytes in the file only if the file is opened in binary mode. For text mode, the Standard does not guarantee anything beyond the fact that it can be used as an argument to fseek.
seeking to the end of file and back to the beginning will not work as expected if the stream does not refer to an actual file: for a terminal, a character device or a pipe, it may not work at all.
It is much more reliable to read the file with getc() into a buffer that you reallocate on demand, one chunk at a time.
Related
I'm making a simple sockets program to send a text file or a picture file over to another socket connected to a port. However, I want to also send the size of the file over to the client socket so that it knows how many bytes to receive.
I also want to implement something where I can send a certain number of bytes instead of the file itself. For example, if a file I wanted to send was 14,003 bytes and I felt like sending 400 bytes, then only 400 bytes would be sent.
I am implementing something like this:
#include <stdio.h>
int main(int argc, char* argv[]) {
FILE *fp;
char* file = "text.txt";
int offset = 40;
int sendSize = 5;
int fileSize = 0;
if ((fp = fopen(file, "r")) == NULL) {
printf("Error: Cannot open the file!\n");
return 1;
} else {
/* Seek from offset into the file */
//fseek(fp, 0L, SEEK_END);
fseek(fp, offset, sendSize + offset); // seek to sendSize
fileSize = ftell(fp); // get current file pointer
//fseek(fp, 0, SEEK_SET); // seek back to beginning of file
}
printf("The size is: %d", fileSize);
}
offset is pretty much going to go 40 bytes into the file and then send whatever sendSize bytes over to the other program.
I keep getting an output of 0 instead of 5. Any reason behind this?
You can try this.
#include <stdio.h>
int main(int argc, char* argv[]) {
FILE *fp;
char* file = "text.txt";
int offset = 40;
int sendSize = 5;
int fileSize = 0;
if ((fp = fopen(file, "r")) == NULL) {
printf("Error: Cannot open the file!\n");
return 1;
} else {
fseek(fp, 0L, SEEK_END);
fileSize = ftell(fp);
}
printf("The size is: %d", fileSize);
}
The fseek() to the end, then ftell() method is a reasonably portable way of getting the size of a file, but not guaranteed to be correct. It won't transparently handle newline / carriage return conversions, and as a result, the standard doesn't actually guarantee that the return from ftell() is useful for any purpose other than seeking to the same position.
The only portable way is to read the file until data runs out and keep a count of bytes. Or stat() the file using the (non-ANSI) Unix standard function.
You may be opening the file in text mode as Windows can open a file in text mode even without the "t" option.
And you can't use ftell() to get the size of a file opened in text mode. Per 7.21.9.4 The ftell function of the C Standard:
For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file
position indicator for the stream to its position at the time
of the ftell call; the difference between two such return
values is not necessarily a meaningful measure of the number of
characters written or read.
Even if it does return the "size" of the file, the translation to "text" may changed the actual number of bytes read.
It's also not portable or standard-conforming to use fseek() to find the end of a binary file. Per 7.21.9.2 The
fseek
function:
A binary stream need not meaningfully support fseek calls with a
whence value of SEEK_END.
I think your Seek does not work due to the 3rd parameter:
try to seek with
(fp, offset, SEEK_SET);
as he will try to use the number sendSize+Offset as the "origin" constant, it will be compared to the 3 constant values as below (it is 0, 1 or 2) and as nothing compares it seem to return 0 all time.
http://www.cplusplus.com/reference/cstdio/fseek/
Parameters
stream, offset, origin
Position used as reference for the offset. It is specified by one of the following constants defined in exclusively to be used as arguments for this function:
Constant Reference position
SEEK_SET Beginning of file
SEEK_CUR Current position of the file pointer
SEEK_END End of file
I am currently working on a project in which I have to read from a binary file and send it through sockets and I am having a hard time trying to send the whole file.
Here is what I wrote so far:
FILE *f = fopen(line,"rt");
//size = lseek(f, 0, SEEK_END)+1;
fseek(f, 0L, SEEK_END);
int size = ftell(f);
unsigned char buffer[MSGSIZE];
FILE *file = fopen(line,"rb");
while(fgets(buffer,MSGSIZE,file)){
sprintf(r.payload,"%s",buffer);
r.len = strlen(r.payload)+1;
res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
return -1;
}
}
I think it has something to do with the size of the buffer that I read into,but I don't know what it's the correct formula for it.
One more thing,is the sprintf done correctly?
If you are reading binary files, a NUL character may appear anywhere in the file.
Thus, using string functions like sprintf and strlen is a bad idea.
If you really need to use a second buffer (buffer), you could use memcpy.
You could also directly read into r.payload (if r.payload is already allocated with sufficient size).
You are looking for fread for a binary file.
The return value of fread tells you how many bytes were read into your buffer.
You may also consider to call fseek again.
See here How can I get a file's size in C?
Maybe your code could look like this:
#include <stdint.h>
#include <stdio.h>
#define MSGSIZE 512
struct r_t {
uint8_t payload[MSGSIZE];
int len;
};
int send_message(struct r_t *t);
int main() {
struct r_t r;
FILE *f = fopen("test.bin","rb");
fseek(f, 0L, SEEK_END);
size_t size = ftell(f);
fseek(f, 0L, SEEK_SET);
do {
r.len = fread(r.payload, 1, sizeof(r.payload), f);
if (r.len > 0) {
int res = send_message(&r);
if (res < 0) {
perror("[RECEIVER] Send ACK error. Exiting.\n");
fclose(f);
return -1;
}
}
} while (r.len > 0);
fclose(f);
return 0;
}
No, the sprintf is not done correctly. It is prone to buffer overflow, a very serious security problem.
I would consider sending the file as e.g. 1024-byte chunks instead of as line-by-line, so I would replace the fgets call with an fread call.
Why are you opening the file twice? Apparently to get its size, but you could open it only once and jump back to the beginning of the file. And, you're not using the size you read for anything.
Is it a binary file or a text file? fgets() assumes you are reading a text file -- it stops on a line break -- but you say it's a binary file and open it with "rb" (actually, the first time you opened it with "rt", I assume that was a typo).
IMO you should never ever use sprintf. The number of characters written to the buffer depends on the parameters that are passed in, and in this case if there is no '\0' in buffer then you cannot predict how many bytes will be copied to r.payload, and there is a very good chance you will overflow that buffer.
I think sprintf() would be the first thing to fix. Use memcpy() and you can tell it exactly how many bytes to copy.
I using some C code that writes binary data to a file. In the process, it seeks around to different positions and then finally seeks to the end with fseeko(fp, 0, SEEK_END);.
However, in some cases, I want to work on a stream in memory instead. I use open_memstream for this, but seeking to the end pads the buffer with zeros and it ends up being twice as big as it should be.
An example just to demonstrate the effect of the fseek to the end of the stream is below. In the actual code, we also fseek to different parts of the stream, patching and editing bits of it, etc., as the stream is processed. Note also that writing the file at the end to the filesystem is just for demonstration to show the contents of the buffer – otherwise I wouldn't need the memory stream.
#include <stdio.h>
#include <stdlib.h>
#if (defined(BSD) || __APPLE__)
#include "open_memstream.h"
#endif
int main(void) {
FILE *stream;
FILE *outfile;
char *buf;
size_t buf_len;
int i;
stream = open_memstream(&buf, &buf_len);
for(i = 0; i < 1000; i++) {
fprintf(stream, "%d\n", i);
}
fseeko(stream, 0, SEEK_END);
fclose(stream);
outfile = fopen("out.txt", "w");
fwrite(buf, buf_len, 1, outfile);
fclose(outfile);
return 0;
}
I was testing this out on Mac OS X with this implementation of open_memstream and it worked as I expected, but when I run this on Linux the file is twice the size with zeros at the end.
What's the best way to deal with this? I'm not sure if it's reliable to divide the buffer length by two and truncate it.
I've just ran into the same problem on Linux.
// It seams that SEEK_END does not work with open_memstream()
fseek(stream, 0, SEEK_END);
I've ended up doing this:
off_t o = ftell(stream);
/* do some things with the stream */
fseek(stream, o, SEEK_SET);
Using C, is there a way to read only the last line of a file without looping it's entire content?
Thing is that file contains millions of lines, each of them holding an integer (long long int). The file itself can be quite large, I presume even up to 1000mb. I know for sure that the last line won't be longer than 55 digits, but could be 2 only digits as well. It's out of options to use any kind of database... I've considered it already.
Maybe its a silly question, but coming from PHP background I find it hard to answer. I looked everywhere but found nothing clean.
Currently I'm using:
if ((fd = fopen(filename, "r")) != NULL) // open file
{
fseek(fd, 0, SEEK_SET); // make sure start from 0
while(!feof(fd))
{
memset(buff, 0x00, buff_len); // clean buffer
fscanf(fd, "%[^\n]\n", buff); // read file *prefer using fscanf
}
printf("Last Line :: %d\n", atoi(buff)); // for testing I'm using small integers
}
This way I'm looping file's content and as soon as file gets bigger than ~500k lines things slow down pretty bad....
Thank you in advance.
maxim
Just fseek to fileSize - 55 and read forward?
If there is a maximum line length, seek to that distance before the end.
Read up to the end, and find the last end-of-line in your buffer.
If there is no maximum line length, guess a reasonable value, read that much at the end, and if there is no end-of-line, double your guess and try again.
In your case:
/* max length including newline */
static const long max_len = 55 + 1;
/* space for all of that plus a nul terminator */
char buf[max_len + 1];
/* now read that many bytes from the end of the file */
fseek(fd, -max_len, SEEK_END);
ssize_t len = read(fd, buf, max_len);
/* don't forget the nul terminator */
buf[len] = '\0';
/* and find the last newline character (there must be one, right?) */
char *last_newline = strrchr(buf, '\n');
char *last_line = last_newline+1;
Open with "rb" to make sure you're reading binary. Then fseek(..., SEEK_END) and start reading bytes from the back until you find the first line separator (if you know the maximum line length is 55 characters, read 55 characters ...).
ok. It all worked for me. I learned something new. The last line of a file 41mb large and with >500k lines was read instantly. Thanks to you all guys, especially 'Useless' (love the controversy of your nickname, btw). I will post here the code in the hope that someone else in the future can benefit from it:
Reading ONLY the last line of the file:
the file is structured the way that there is a new line appended and I am sure that any line is shorter than, in my case, 55 characters:
file contents:
------------------------
2943728727
3129123555
3743778
412912777
43127787727
472977827
------------------------
notice the new line appended.
FILE *fd; // File pointer
char filename[] = "file.dat"; // file to read
static const long max_len = 55+ 1; // define the max length of the line to read
char buff[max_len + 1]; // define the buffer and allocate the length
if ((fd = fopen(filename, "rb")) != NULL) { // open file. I omit error checks
fseek(fd, -max_len, SEEK_END); // set pointer to the end of file minus the length you need. Presumably there can be more than one new line caracter
fread(buff, max_len-1, 1, fd); // read the contents of the file starting from where fseek() positioned us
fclose(fd); // close the file
buff[max_len-1] = '\0'; // close the string
char *last_newline = strrchr(buff, '\n'); // find last occurrence of newlinw
char *last_line = last_newline+1; // jump to it
printf("captured: [%s]\n", last_line); // captured: [472977827]
}
cheers!
maxim
I write this C code so that I could test whether fwrite could update some values in a text file. I tested on Linux and it works fine. In Windows (vista 32bits), however, it simply does not work. The file remains unchanged after I write a different byte using: cont = fwrite(&newfield, sizeof(char), 1, fp);
The registers are written on the file using a "#" separator, in the format:
Reg1FirstField#Reg1SecondField#Reg2FirstField#Reg2SecondField...
The final file should be: First#1#Second#9#Third#1#
I also tried putc and fprintf, all with no result. Can someone please help me with this?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct test {
char field1[20];
char field2;
} TEST;
int main(void) {
FILE *fp;
TEST reg, regread;
char regwrite[22];
int i, cont, charwritten;
fp=fopen("testupdate.txt","w+");
strcpy(reg.field1,"First");
reg.field2 = '1';
sprintf(regwrite,"%s#%c#", reg.field1, reg.field2);
cont = (int)strlen(regwrite);
charwritten = fwrite(regwrite,cont,1,fp);
fflush(fp);
strcpy(reg.field1,"Second");
reg.field2 = '1';
sprintf(regwrite,"%s#%c#", reg.field1, reg.field2);
cont = (int)strlen(regwrite);
charwritten = fwrite(regwrite,cont,1,fp);
fflush(fp);
strcpy(reg.field1,"Third");
reg.field2 = '1';
sprintf(regwrite,"%s#%c#", reg.field1, reg.field2);
cont = (int)strlen(regwrite);
charwritten = fwrite(regwrite,cont,1,fp);
fflush(fp);
fclose(fp);
// open file to update
fp=fopen("testupdate.txt","r+");
printf("\nUpdate field 2 on the second register:\n");
char aux[22];
// search for second register and update field 2
for (i = 0; i < 3; i ++) {
fscanf(fp,"%22[^#]#", aux);
printf("%d-1: %s\n", i, aux);
if (strcmp(aux, "Second") == 0) {
char newfield = '9';
cont = fwrite(&newfield, sizeof(char), 1, fp);
printf("written: %d bytes, char: %c\n", cont, newfield);
// goes back one byte in order to read properly
// on the next fscanf
fseek(fp,-1,SEEK_CUR);
}
fscanf(fp,"%22[^#]#", aux);
printf("%d-2: %s\n",i, aux);
aux[0] = '\0';
}
fflush(fp);
fclose(fp);
// open file to see if the update was made
fp=fopen("testupdate.txt","r");
for (i = 0; i < 3; i ++) {
fscanf(fp,"%22[^#]#", aux);
printf("%d-1: %s\n", i, aux);
fscanf(fp,"%22[^#]#",aux);
printf("%d-2: %s\n",i, aux);
aux[0] = '\0';
}
fclose(fp);
getchar();
return 0;
}
You're missing a file positioning function between the read and write. The Standard says:
7.19.5.3/6
When a file is opened with update mode, both input and output may be performed on the associated stream. However, ... input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. ...
for (i = 0; i < 3; i ++) {
fscanf(fp,"%22[^#]#", aux); /* read */
printf("%d-1: %s\n", i, aux);
if (strcmp(aux, "Second") == 0) {
char newfield = '9';
/* added a file positioning function */
fseek(fp, 0, SEEK_CUR); /* don't move */
cont = fwrite(&newfield, sizeof(char), 1, fp); /* write */
I didn't know it but here they explain it:
why fseek or fflush is always required between reading and writing in the read/write "+" modes
Conclusion: You must either fflush or fseek before every write when you use "+".
fseek(fp, 0, SEEK_CUR);
// or
fflush(fp);
cont = fwrite(&newfield, sizeof(char), 1, fp);
Fix verified on Cygwin.
You're not checking any return values for errors. I'm guessing the file is read-only and is not even opening properly.
At least here on OSX, your value 9 is begin appended to the end of the file ... so you're not updating the actual register value for Second at it's position in the file. For some reason after the scan for the appropriate point to modify the values, your stream pointer is actually at the end of the file. For instance, running and compiling your code on OSX produced the following output in the actual text file:
First#1#Second#1#Third#1#9
The reason your initial read-back is working is because the data is being written, but it's at the end of the file. So when you write the value and then back-up the stream and re-read the value, that works, but it's not being written in the location you're assuming.
Update: I've added some calls to ftell to see what's happening to the stream pointer, and it seems that your calls to fscanf are working as you'd assume, but the call to fwrite is jumping to the end of the file. Here's the modified output:
Update field 2 on the second register:
**Stream position: 0
0-1: First
0-2: 1
**Stream position: 8
1-1: Second
**Stream position before write: 15
**Stream position after write: 26
written: 1 bytes, char: 9
1-2: 9
**Stream position after read-back: 26
Update-2: It seems by simply saving the position of the stream-pointer, and then setting the position of the stream-pointer, the call to 'fwrite` worked without skipping to the end of the file. So I added:
fpos_t position;
fgetpos(fp, &position);
fsetpos(fp, &position);
right before the call to fwrite. Again, this is on OSX, you may see something different on Windows.
With this:
fp=fopen("testupdate.txt","w+");
^------ Notice the + sign
You opened the file in "append" mode -- that is what the plus sign does in this parameter. As a result, all of your fwrite() calls will be relative to the end of the file.
Using "r+" for the fopen() mode doesn't make sense -- the + means nothing in this case.
This and other issues with fopen() are why I prefer to use the POSIX-defined open().
To fix your particular case, get rid of the + characters from the fopen() modes, and consider that you might need to specify binary format on Windows ("wb" and "rb" modes).