I wrote this code to test combining two files:
long getFileSize(char *filename)
{
FILE* fp=fopen(filename,"rb");
fseek(fp,0,SEEK_END);
long size=ftell(fp);
fclose(fp);
return size;
}
long lengthA = getFileSize(argv[1]);
long lengthB = getFileSize(argv[2]);
printf("sizeof %s is:%d\n",argv[1],lengthA);
printf("sizeof %s is %d\n",argv[2],lengthB);
void *pa;
void *pb;
FILE* fp=fopen(argv[1],"rb");
fread(pa,1,lengthA,fp);
fclose(fp);
FILE* fpn=fopen(argv[2],"rb");
fread(pb,1,lengthB,fpn);
fclose(fpn);
printf("pointerA is:%p;pointerB is:%p\n",pa,pb);
FILE *ff=fopen("test.pack","wb");
fwrite(pa,1,lengthA,ff);
fwrite(pb,1,lengthB,ff);
fclose(ff);
long lengthFinal = getFileSize("test.pack");
printf("Final size:%i\n",lengthFinal);
However, I don't know whether the data written matches the values returned by getFileSize. The console output clearly shows something is wrong, but I can't figure it out:
sizeof a.zip is:465235
sizeof b.zip is 107814
pointerA is:0x80484ec;pointerB is:0x804aff4
Final size:255270
Since I know the length of each file, I can then use fseek to restore them, right? That's the idea I had in mind.
pa and pb need to point to some memory into which the file content will be read.
So, do a malloc for these two buffers with lengthA*sizeof(char) and lengthB*sizeof(char) and pass these allocated buffers to fread:
pa = malloc(lengthA*sizeof(char));
pb = malloc(lengthB*sizeof(char));
...
fread(pa,sizeof(char),lengthA,fp);
...
fread(pb,sizeof(char),lengthB,fpn);
Furthermore, fread returns the number of items actually read. Also check this!
Excerpt from man fread:
fread() and fwrite() return the number of items successfully read or written (i.e., not the number of characters). If an error occurs, or the end-of-file is reached, the return value is a short item count (or zero).
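As a minimal sketch of the allocation and return-value check described above, the following reads one file into a malloc'd buffer. The helper name read_whole_file is hypothetical (it is not part of the question's code), and the sketch assumes the file fits in memory and that its size fits in a long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads an entire file into a freshly allocated buffer.
 * Returns the buffer (the caller must free it) and stores the byte count in
 * *out_len, or returns NULL on any failure. */
static unsigned char *read_whole_file(const char *filename, size_t *out_len)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the size by seeking to the end, then go back to the start. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }

    /* malloc(0) may return NULL, so always ask for at least one byte. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    /* With an item size of 1, fread's return value is a byte count. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return NULL; }

    *out_len = got;
    return buf;
}
```

A caller would use it as `size_t n; unsigned char *pa = read_whole_file(argv[1], &n);`, check for NULL, and call free(pa) once the data has been written out.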
Note that there's no real reason to load both source files into memory at once. Also, it's potentially very memory-inefficient to do so, since you're really reading all of the files in, and then all you do is write the contents out again.
A better algorithm, in my opinion, would be:
let C = a reasonable buffer size, say 128 KB
let B = a static buffer of C bytes
let R = the output file, opened for binary write
for each input file F:
    open F for binary read
    repeat
        let N be the number of bytes read, up to a maximum of C
        if N > 0
            write N first bytes of B into R
    until N = 0
    close F
close R
This does away with the need to allocate buffers dynamically; you could just declare char B[C] and have #define C (128 << 10).
The above assumes that reading from a file which has no more bytes to deliver returns 0 bytes.
Also note that by doing away with the need to load the entire file, you also no longer need to open each input file an extra time just to seek to the end in order to compute the file's size.
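The chunked approach described in this answer could look roughly like the following in C. This is only a sketch: the names CHUNK, append_file and concat_files are hypothetical, and error handling is reduced to a 0 or -1 return code.

```c
#include <stdio.h>

#define CHUNK (128 << 10) /* the buffer size: 128 KB */

/* Hypothetical helper: appends the contents of the file `path` to `out`.
 * Returns 0 on success, -1 on failure. */
static int append_file(FILE *out, const char *path)
{
    static char buf[CHUNK]; /* the static buffer, reused for every file */
    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(in);
            return -1;
        }
    }
    /* fread returns 0 both at end-of-file and on error; tell them apart. */
    int failed = ferror(in);
    fclose(in);
    return failed ? -1 : 0;
}

/* Hypothetical driver: concatenates `count` input files into `out_path`. */
static int concat_files(const char *out_path, const char *const *inputs, int count)
{
    FILE *out = fopen(out_path, "wb");
    if (out == NULL)
        return -1;
    int rc = 0;
    for (int i = 0; i < count && rc == 0; i++)
        rc = append_file(out, inputs[i]);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

From main, a call such as `concat_files("test.pack", (const char *const[]){argv[1], argv[2]}, 2)` would produce the combined file without ever holding more than one chunk in memory.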
pa and pb are not pointing to valid memory.
char* pa = malloc(lengthA * sizeof(char));
char* pb = malloc(lengthB * sizeof(char));
Remember to free() when no longer required.
Check all return values from functions fopen(), fread(), fwrite(), etc.
I recently started dabbling in C again, a language I'm not particularly proficient at and, in fact, keep forgetting (I mostly code in Python). My idea here is to read data from a hypothetically large file in chunks and then process the data accordingly. For now, I'm simulating this by actually loading the whole file into a buffer of type short with fread. This method will be changed, since it would be a very bad idea for, say, a file that's 1 GB, I'd think. The end goal is to read one chunk, process it, move the cursor, read another chunk and so on.
The file in question is 43 bytes and has the phrase "The quick brown fox jumps over the lazy dog". This size is convenient because it's a prime number, so no matter how many bytes I split it into, there will always be trailing garbage (due to the buffer having leftover space?). Data processing in this case is just printing out the shorts as two chars after byte manipulation (see code below)
#include <stdio.h>
#include <stdlib.h>
#define MAX_BUFF_SIZE 1024
long file_size(FILE *f)
{
if (fseek(f, 0, SEEK_END) != 0) exit(EXIT_FAILURE); // Move cursor to the end
long file_size = ftell(f); // Determine position to get file size
rewind(f);
return file_size;
}
int main(int argc, char* argv[])
{
short buff[MAX_BUFF_SIZE] = {0}; // Initialize to 0 remove trailing garbage
char* filename = argv[1];
FILE* fp = fopen(filename, "r");
if (fp)
{
size_t size = sizeof(buff[0]); // Size in bytes of each chunk. Fixed to 2 bytes
int nmemb = (file_size(fp) + size - 1) / size; // Number of chunks to read from file
// (ceil f_size/size)
printf("Should read at most %d chunks\n", nmemb);
short mask = 0xFF; // Mask to take first or second byte
size_t num_read = fread(buff, size, nmemb, fp);
printf("Read %lu chunks\n\n", num_read); // Seems to have read more? Look into.
for (int i=0; i<nmemb; i++) {
char first_byte = buff[i] & mask;
char second_byte = (buff[i] >> 8) & mask; // Identity for 2 bytes. Keep mask for consistency
printf("Chunk %02d: 0x%04x | %c %c\n", // Remember little endian (bytes reversed)
i, buff[i], first_byte, second_byte);
}
fclose(fp);
} else
{
printf("File %s not found\n", filename);
return 1;
}
return 0;
}
Now yesterday, on printing out the last chunk of data I was getting "Chunk 21: 0xffff9567 | g". The last (first?) byte (0x67) is g, and I did expect some trailing garbage, but I don't understand why it was printing out so many bytes when the variable buff has shorts in it. At that point I was just printing the hex as %x, not %04x, and buff was not initialized to 0. Today, I decided to initialize it to 0 and not only did the garbage disappear, but I can't recreate the problem even after leaving buff uninitialized again.
So here are my questions that hopefully aren't too abstract:
Does fread look beyond the file when reading data and does it remove trailing garbage itself, or is it up to us?
Why was printf showing an int when the buffer is a short? (I assume %x is for ints) and why can't I replicate the behaviour even after leaving buff without initialization?
Should I always initialize the buffer to zero to remove trailing garbage? What's the usual approach here?
I hope these aren't too many, or too vague, questions, and that I was clear enough. Like I said, I don't know much about C but find low-mid level programming very interesting, especially when it comes to direct data bit/byte manipulation.
Hope you have a great day!
EDIT 1:
Some of you wisely suggested I use num_read instead of nmemb on the loop, since that's the return value of fread, but that means I'll discard the rest of the file (nmemb is 22 but num_read is 21). Is that the usual approach? Also, thank you for pointing out that %x was casting to unsigned int, hence the 4 bytes instead of 2.
EDIT 2:
For clarification, and since I misspoke in a comment, I'd like to keep the remaining byte (or data) while discarding the rest, which is undefined. I don't know if this is the usual approach, since if I use num_read in the loop, whatever is left over at the end is discarded, data or not. I'm more interested in knowing what the usual approach is: discard leftover data, or remove anything that we know is undefined, in this case one of the bytes.
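One possible way to keep the leftover byte, offered only as a sketch and not as the established answer to this question, is to read with an item size of 1 so that fread reports an exact byte count, and then split that count into whole 2-byte chunks plus an optional tail. The helper name read_pairs is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to `cap` bytes from `fp` into `buf` and reports
 * how many complete 2-byte chunks arrived and whether one odd byte is left over. */
static size_t read_pairs(FILE *fp, unsigned char *buf, size_t cap, int *has_tail)
{
    /* Item size 1: the return value is the exact number of bytes read. */
    size_t nbytes = fread(buf, 1, cap, fp);
    *has_tail = (int)(nbytes % 2); /* 1 when a single trailing byte remains */
    return nbytes / 2;
}
```

For the 43-byte file this reports 21 complete chunks and one trailing byte, which sits at buf[42]; nothing beyond that index was written by fread, so it does not need to be printed or zeroed.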
int32_t a[MAX];
int main()
{
FILE *f = fopen("New.doc","rb");
FILE *g = fopen("temp.doc","wb");
if (f == NULL || g == NULL)
return 0;
int n;
while ((n = fread(a, 4, MAX, f)) > 0)
fwrite(a, 1, n, g);
fclose(f);
fclose(g);
return 0;
}
The size on disk of "new.doc" on my computer is 20 KB. When I ran the code above, I got a "temp.doc" of 5 KB, so data was lost. However, when I changed 4 to 1, I got a "temp.doc" that is identical to "new.doc". Can anyone explain what happened? Thank you.
fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
On success, fread() and fwrite() return the number of items read or
written. This number equals the number of bytes transferred only when
size is 1.
In your case
n = fread(a, 4, MAX, f)
would return the file size divided by 4, i.e. the number of complete 4-byte items read. When you then write with an element size of 1, you actually write only 1/4 of the file. You can fix this by changing the fread call to use size 1 instead of 4.
This is happening because you did not pay attention to the semantics of fread() and fwrite(). fread() returns the number of items read, while fwrite() takes the number of items to write. In your case, you are reading 4-byte elements and writing 1-byte elements, so of course you are missing 75% of your file. Use a matching element size in the calls to fread() and fwrite() and all will be well.
Note that if your files do not actually contain 4-byte elements (e.g. they are text files), you had better use an element size of 1 (and I suggest changing a to be of type char to be less confusing, and using sizeof(a[0]) as the element size in the function calls).
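A minimal sketch of a copy loop with matching element sizes follows. The helper name copy_stream and the 4096-byte buffer are assumptions made for illustration; the point is only that both calls use sizeof buf[0] as the element size, so the count returned by fread means the same thing to fwrite.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from `in` to `out` in byte-sized items,
 * so fread's return value can be passed straight to fwrite.
 * Returns the number of bytes copied, or -1 on a read or write error. */
static long copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    long total = 0;
    size_t n;

    /* The element size is sizeof buf[0] (1) in both calls, so n is a byte count. */
    while ((n = fread(buf, sizeof buf[0], sizeof buf, in)) > 0) {
        if (fwrite(buf, sizeof buf[0], n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

Because the element size is 1, a file whose length is not a multiple of 4 is still copied completely, including its last few bytes.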
I am completely new to C and badly need help with this.
I'm reading a file with fopen(), then obtaining its contents using fgetc(). What I want to know is how I can access the line fgetc() returns so that I can put the 4th through 8th characters into a char array. Below is an example I found online, but I am having a hard time parsing the data it returns. I still don't have a firm understanding of C and don't get how an int can be used to store a line of characters.
FILE *fr;
fr = fopen("elapsed.txt", "r");
int n = fgetc(fr);
while(n!= EOF){
printf("%c", n);
n = fgetc(fr);
} printf("\n");
Here are the steps:
1. First, open the file.
2. Get the size of the file.
3. Allocate that many bytes for a character pointer.
4. Read the data from the file.
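/*this fragment is assumed to sit inside main()*/
FILE *fr;
char *message;
fr = fopen("elapsed.txt", "r");
if (fr == NULL) {
    perror("elapsed.txt");
    return 1;
}
/*create variable of stat*/
struct stat stp = { 0 };
/*stat() returns information about a file. No permissions are required on the file itself*/
if (stat("elapsed.txt", &stp) != 0) {
    fclose(fr);
    return 1;
}
/*determine the size of data which is in file*/
size_t filesize = (size_t) stp.st_size;
/*allocate memory for the file contents plus a terminating '\0'*/
message = malloc(filesize + 1);
if (message == NULL) {
    fclose(fr);
    return 1;
}
/*fread returns the number of items read (never -1); use ferror() to detect errors*/
size_t nread = fread(message, 1, filesize, fr);
if (ferror(fr)) {
    printf("\nerror in reading\n");
    /*close the read file*/
    fclose(fr);
    /*free input string*/
    free(message);
    return 1;
}
/*terminate the string so that %s can print it*/
message[nread] = '\0';
fclose(fr);
printf("\n\tEntered Message for Encode is = %s", message);
free(message);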
PS: Don't forget to add #include <sys/stat.h> (plus <stdio.h> and <stdlib.h> for the other calls).
You're not retrieving a line with fgetc. You are retrieving one character at a time from the file. That sample keeps retrieving characters until fgetc returns EOF, a special value that signals the end of the file (it is not a character stored in the file). Look at this description of fgetc.
http://www.cplusplus.com/reference/clibrary/cstdio/fgetc/
On each iteration of the while loop, fgetc retrieves a single character and places it into the variable "n". Something that can help you with "characters" in C is to think of each one as a byte rather than an actual character. What you're not understanding here is that an int is typically 4 bytes and a char is 1 byte, but both can store the same bit pattern for the same ASCII character. The only difference is the size of the variable internally. fgetc returns an int so that the special value EOF can be distinguished from every valid character.
The sample you have above shows a printf with "%c", which means to take the value in "n" and treat it like an ASCII character.
http://www.cplusplus.com/reference/clibrary/cstdio/printf/
You can use a counter in the while loop to keep track of your position and pick out the 4th through 8th characters from the file. You should also think about what happens if the input file is smaller than your maximum position.
Hope that helps.
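The counter idea from this answer could be sketched as follows. The helper name read_char_range is hypothetical, and positions are assumed to be 1-based, so the "4th through 8th characters" correspond to first = 4 and last = 8.

```c
#include <stdio.h>

/* Hypothetical helper: copies the characters at 1-based positions
 * `first`..`last` of the stream into `out` and NUL-terminates it.
 * `out` must have room for (last - first + 1) characters plus the terminator.
 * Returns the number of characters stored, which is smaller than requested
 * when the file is too short. */
static size_t read_char_range(FILE *fp, size_t first, size_t last, char *out)
{
    size_t pos = 0;    /* how many characters have been read so far */
    size_t stored = 0;
    int c;             /* int, not char, so EOF stays distinguishable */

    while (pos < last && (c = fgetc(fp)) != EOF) {
        pos++;
        if (pos >= first)
            out[stored++] = (char)c;
    }
    out[stored] = '\0';
    return stored;
}
```

With `char part[6];` a call like `read_char_range(fr, 4, 8, part)` fills part with up to five characters, and the return value tells the caller whether the file was long enough.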
OK, look at it as box sizes. I have a 30cm x 30cm box that can hold one of my foam letters. Now, the function I am calling 'could' return a 60cm x 60cm letter, but it is 99% likely to return a 30cm x 30cm letter because I know what it's reading. I know that if I give it a 60cm x 60cm box, the result will always fit without surprises.
But if I am sure that the result will always be a 30cm x 30cm letter, then I know I can convert the result of a function that returns a 60cm x 60cm box without losing anything.
I find fwrite fails when I am trying to write somewhat big data as in the following code.
#include <stdio.h>
#include <unistd.h>
int main(int argc, char* argv[])
{
int size = atoi(argv[1]);
printf("%d\n", size);
FILE* fp = fopen("test", "wb");
char* c = "";
int i = fwrite(c, size, 1, fp);
fclose(fp);
printf("%d\n", i);
return 0;
}
The code is compiled into a binary named tw.
When I try ./tw 10000 it works well. But when I try something like ./tw 12000 it fails (fwrite() returns 0 instead of 1).
What's the reason of that? In what way can I avoid this?
EDIT: When I do fwrite(c, 1, size, fp) it returns 8192 instead of the larger size I give.
2nd EDIT: When I write a loop that runs size times and calls fwrite(c, 1, 1, fp) each time, it works perfectly OK.
It seems that when size is too large (as in the first EDIT), it only writes about 8192 bytes.
I guess something limits fwrite to a fixed number of bytes at a time.
3rd EDIT: The above is not clear.
The following fails (space - w_result != 0) when space is large, where space is determined by me and w_result is the total number of objects written.
w_result = 0;
char* empty = malloc(BLOCKSIZE * sizeof(char));
w_result = fwrite(empty, BLOCKSIZE, space, fp);
printf("%d lost\n", space - w_result);
While this works OK.
w_result = 0;
char* empty = malloc(BLOCKSIZE * sizeof(char));
for(i = 0; i < space; i ++)
w_result += fwrite(empty, BLOCKSIZE, 1, fp);
printf("%d lost\n", space - w_result);
(every variable has been declared.)
I corrected some errors that the answers mentioned. But the first one should work, according to you.
With fwrite(c, size, 1, fp); you state that fwrite should write 1 item that is size bytes big, out of the buffer c.
c is just a pointer to an empty string. It has a size of 1. When you tell fwrite to go look for more data than 1 byte in c , you get undefined behavior. You cannot fwrite more than 1 byte from c.
(undefined behavior means anything could happen, it could appear to work fine when you try with a size of 10000 and not with a size of 12000. The implementation dependent reason for that is likely that there is some memory available, perhaps the stack, starting at c and 10000 bytes forward, but at e.g. 11000 there is no memory and you get a segfault)
You are reading memory that doesn't belong to your program (and writing it to a file).
Test your program using valgrind to see the errors.
From that snippet of code, it looks like you're trying to write "size" bytes starting at c, which points to just a single NUL byte, to the file pointer. The fact that it doesn't crash with 10000 is coincidental. What are you trying to do?
As has been stated by others the code is performing an invalid memory read via c.
A possible solution would be to dynamically allocate a buffer that is size bytes in size, initialise it, and fwrite() it to the file, remembering to deallocate the buffer afterwards.
Remember to check return values from functions (fopen() for example).
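A sketch of that solution, assuming the goal is simply to write size zero bytes to a file (the question does not say what the data should be), might look like this. The helper name write_zeros is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes `size` zero bytes to the file `path`.
 * Returns 0 on success, -1 on failure. */
static int write_zeros(const char *path, size_t size)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* calloc gives `size` initialised bytes that really belong to the program. */
    char *buf = calloc(size > 0 ? size : 1, 1);
    if (buf == NULL) { fclose(fp); return -1; }

    /* Item size 1, so the return value is the number of bytes written. */
    size_t written = fwrite(buf, 1, size, fp);
    free(buf);

    if (fclose(fp) != 0 || written != size)
        return -1;
    return 0;
}
```

Here fwrite only ever reads memory that was actually allocated, so the result no longer depends on what happens to lie after a one-byte string literal.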
I'm trying to read a BMP image (greyscales) with C, save values into an array, and convert this array to a string with values separated with a comma.
My program worked well under Windows 7 64-bit, but I had to move to Windows XP 32-bit because of library compatibility problems.
I have 1,750 images to read, and I want to store all of them in a single string.
When I launch my program it goes fine until the 509th image, then I get a segmentation fault caused by fread(). Here's my code:
int i=0,j,k,num,len,length,l;
unsigned char *Buffer;
FILE *fp;
char *string,*finalstring;
char *query;
char tmp2[5],tmp[3];
query = (char *)malloc(sizeof(char)*200000000);
string = (char *)malloc(sizeof(char)*101376);
Buffer = (unsigned char *)malloc(sizeof(unsigned char)*26368);
BITMAPFILEHEADER bMapFileHeader;
BITMAPINFOHEADER bMapInfoHeader;
length = 0;
for (k =1;k<1751;k++)
{
strcpy(link,"imagepath");
//here just indexing the images from 0000 to 1750
sprintf(tmp2,"%.4d",k);
strcat(link,tmp2);
strcat(link,".bmp");
fp = fopen(link, "rb");
num = fread(&bMapFileHeader,sizeof(BITMAPFILEHEADER),1,fp);
num = fread(&bMapInfoHeader,sizeof(BITMAPINFOHEADER),1,fp);
//seek beginning of data in bitmap
fseek(fp,54,SEEK_SET);
//read in bitmap file to data
fread(Buffer,26368,1,fp);
l=0;
for(i=1024;i<26368;i++)
{
itoa(Buffer[i],tmp,10);
len = strlen(tmp);
memcpy(string+l,tmp,len);
memcpy(string+l+len,",",1);
l = l+len+1;
}
memcpy(query,"",1);
memcpy(string,"",1);
printf("%i\n",k);
}
Thanks
Make it tmp[4]; for three digits and the terminating 0.
Also: where is the fclose? I suspect that you're running out of file handles.
Check whether fp != NULL.
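Where did you get 101376 from? A signed byte would take up at most 5 characters as a decimal number with comma (e.g. -127,), and 5*26368 is 131840. Since Buffer is unsigned char and the loop covers 26368-1024 = 25344 bytes, each taking at most 4 characters (e.g. 255,), 4*25344 = 101376 is exactly enough, with no room left for a terminating 0.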
Get rid of the casts in malloc calls. And #include <stdlib.h>.
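To illustrate the buffer-sizing points above, here is a sketch that converts bytes to a comma-separated string with snprintf instead of itoa and a fixed 3-byte tmp. The helper name bytes_to_csv is hypothetical, and the trailing comma after the last value is kept because the question's loop produces one as well.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats `n` bytes as decimal values separated by commas,
 * e.g. {1, 20, 255} becomes "1,20,255,". Returns a malloc'd string (the caller
 * frees it) or NULL if allocation fails. */
static char *bytes_to_csv(const unsigned char *data, size_t n)
{
    /* Each unsigned byte needs at most 3 digits plus a comma; +1 for the '\0'. */
    size_t cap = n * 4 + 1;
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        /* snprintf never writes past the remaining space and always terminates. */
        int w = snprintf(out + len, cap - len, "%u,", (unsigned)data[i]);
        if (w < 0 || (size_t)w >= cap - len) { free(out); return NULL; }
        len += (size_t)w;
    }
    out[len] = '\0';
    return out;
}
```

In the image loop this would be called once per file, with fclose(fp) and free() of the returned string at the end of each iteration so that neither file handles nor memory accumulate.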
What's the output of this program, in both the 64-bit and 32-bit systems you're using?
#include <stdio.h>
int main(void) {
printf("sizeof (int) is %d\n", (int)(sizeof (int)));
printf("sizeof (int*) is %d\n", (int)(sizeof (int*)));
return 0;
}
Run your program in the debugger.
Set a breakpoint at the call to fread -- make it conditional on k==507 (this will stop it when you expect the fread to be successful).
When the program hits the breakpoint, examine the variables and check what is about to be passed to fread. The first one or two times you hit the breakpoint, the values will be good.
Then on the 509th time, you will probably see bogus values being passed to fread. Figure out where those bogus values are coming from -- possibly set a conditional breakpoint on the variable being set to whatever the bogus value is.