My code is:
#include <stdio.h>
#include<stdlib.h>
#include<sys/stat.h>
int main(int argc, char**argv) {
unsigned char *message = NULL;
struct stat stp = { 0 };
stat("huffman.txt", &stp);
/*determine the size of data which is in file*/
int filesize = stp.st_size;
printf("\nFile size id %d\n", filesize);
FILE* original_fileptr = fopen("huffman.txt", "r");
message = malloc(filesize);
fread(message, 1, filesize, original_fileptr);
printf("\n\tEntered Message for Encode is %s= and length %d", message,strlen(message));
return 0;
}
Here huffman.txt is 20 bytes long and contains the following characters:
άSUä5Ñ®qøá"F„œ
The output of this code is:
File size id 20
Entered Message for Encode is άSUä5Ñ®qøá"F„œ= and length 21
My question is: if the size is 20, why is the length 21?
Because C doesn't have native strings, only arrays of characters, and there is a hidden, ubiquitous assumption that the last array member is a zero.
Since you violate that assumption by reading only 20 bytes into an array of 20 elements, with no regard to whether the final byte is zero, and then using string features like %s and strlen, you get essentially undefined behavior.
Getting an answer of 21 is pure luck; anything (far worse) could have happened.
Correct code could be something like this (assuming the file is a text file):
char * buf = calloc(filesize + 1, 1); /* yay, already zeroed! */
fread(buf, 1, filesize, fp);
printf("File contents: '%s'\nFile content size: %zu.\n", buf, strlen(buf)); /* strlen returns size_t, so use %zu */
If you're reading arbitrary ("binary") files, this will generally not produce the expected result (unless you know what to expect).
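A slightly fuller sketch of the same idea, assuming a regular, seekable file small enough to hold in memory. The helper name read_text_file is hypothetical. It places the terminator after the number of bytes fread actually returned, rather than trusting the size obtained beforehand.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a NUL-terminated buffer.
 * Returns NULL on failure; the caller frees the result.
 * *out_len (if not NULL) receives the number of bytes actually read. */
char *read_text_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the size by seeking; assumes a regular, seekable file. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0L, SEEK_SET) != 0) { fclose(fp); return NULL; }

    char *buf = malloc((size_t)size + 1);   /* one extra byte for '\0' */
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t n = fread(buf, 1, (size_t)size, fp);
    buf[n] = '\0';                          /* terminate after the bytes read */
    fclose(fp);

    if (out_len != NULL)
        *out_len = n;
    return buf;
}
```

With a buffer built this way, %s and strlen stop at the terminator that was added on purpose, not at whatever byte happens to follow the allocation.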
I'm dabbling in a bit of Game Boy save state hacking, and I'm currently getting my head around reading from a binary file in C. I understand that char is one byte so I'm reading in chars using the following code
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
int main () {
FILE *fp = NULL;
char buffer[100];
int filesize;
fp = fopen("save file.srm", "r+"); //open
if(fp == NULL) printf("File loading error number %i\n", errno);
fseek(fp, 0L, SEEK_END); //seek to end to get file size
filesize = ftell(fp);
printf("Filesize is %i bytes\n",filesize);
fseek(fp, 0, SEEK_SET); //set read point at start of file
fread(buffer, sizeof buffer, 1, fp);//read into buffer
for(int i=70;i<80;i++) printf("%x\n", buffer[i]); //display
fclose(fp);
return(0);
}
The output I get is
Filesize is 32768 bytes
ffffff89
66
ffffff85
2
2
ffffff8b
44
ffffff83
c
0
I'm trying to load in bytes, so I want each row of the output to have maximum value 0xff. Can someone explain why I'm getting ffffff8b?
if(fp == NULL) printf("File loading error number %i\n", errno);
When you detect an error, do not just print a message. Either exit the program or do something to correct for the error.
char buffer[10];
Use unsigned char for working with raw data. char may be signed, which can cause undesired effects.
fread(buffer, strlen(buffer)+1, 1, fp);
buffer has not been initialized at this point, so the behavior of strlen(buffer) is not defined by the C standard. In any case, you do not want to use the length of the string currently in buffer as the size for fread. You want the size of the array. So use sizeof buffer (without the +1).
for(int i=0;i<10;i++)
Do not iterate to ten. Iterate to the number of bytes put into the buffer by fread. fread returns a size_t value that is the number of items read. If you call it as size_t n = fread(buffer, 1, sizeof buffer, fp);, the number of items (in n) will be the number of bytes read, since passing 1 as the second argument says each item to read is one byte.
printf("%x\n", buffer[i]);
To print an unsigned char, use %hhx. Because your buffer had signed char elements, some of them were negative. When used in this printf, they were promoted to negative int values. Then, because of the %x, printf attempted to print them as unsigned int values. All the extra bits from the negative values in two’s complement form showed up.
Very simply: char can be signed or unsigned by default: that's down to the compiler. In your case, it appears to be signed.
When you pass the char of buffer[i] to printf(), it is promoted to int, and sign-extended if the original char value had its top bit set. Hence anything that's in the range 0x80-0xff gets a lot of fs prefixing the value.
If you declare buffer to be unsigned char, this problem should not occur. But you should, in combination with that, use "%hhx" rather than "%x" for your printf() format, since the hh length modifier forces printf() to mask the input value so that only those bits applicable to an unsigned char (given that you're using the x specifier) are used.
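The promotion described in these answers can be reproduced in isolation. This is a minimal sketch, assuming a 32-bit int (which the question's output suggests); the helper names are hypothetical.

```c
#include <stdio.h>

/* What printf("%x", c) effectively shows for a signed char:
 * the value is promoted to int (sign-extended), and the resulting
 * bit pattern is then interpreted as an unsigned int. */
unsigned int as_printed_signed(signed char c)
{
    return (unsigned int)(int)c;
}

/* An unsigned char is promoted to a non-negative int, so no extra bits appear. */
unsigned int as_printed_unsigned(unsigned char c)
{
    return (unsigned int)c;
}

/* Format one raw byte as two hex digits; out must hold at least 3 chars. */
void byte_to_hex(unsigned char b, char out[3])
{
    snprintf(out, 3, "%02x", (unsigned int)b);
}
```

With a 32-bit int, as_printed_signed(-117) yields 0xffffff8b, matching the question's output, while as_printed_unsigned(0x8b) yields 0x8b.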
I need to be able to make sure my array is correctly receiving values from the file card.raw through fread.
I am not confident about using arrays with pointers, so if anybody could help me with the theory here, it would be GREATLY appreciated. Thanks in advance.
The code is supposed to take literally one block of size 512 bytes and stick it into the array. Then I am just using a debugger and printf to examine the array's output.
/**
* recover.c
*
* Computer Science 50
* Problem Set 4
*
* Recovers JPEGs from a forensic image.
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char* argv[])
{
//Size of EACH FAT JPEG in bytes
#define FILESIZE 512
unsigned char* buffer[FILESIZE];
///Step 1: Open jpeg
FILE* readfrom = fopen("card.raw", "rb");
if (readfrom == NULL)
{
printf("Could not open");
}
///Step 2: Find Beginning of JPEG. The first digits will be 255216255 Then 224 or 225
fread(&buffer, FILESIZE, 1, readfrom);
for(int x = 0; x < FILESIZE; x++)
{
printf("%d = %c\n", x, buffer[x]);
}
fclose(readfrom);
}
Use return values from input functions. fread() reports how many elements were read - code might not have read 512. Swap FILESIZE, 1 to detect the number of characters/bytes read.
// fread(&buffer, FILESIZE, 1, readfrom);
size_t count = fread(&buffer, 1, FILESIZE, readfrom);
Only print out up to the number of elements read. Recommend hexadecimal (and maybe decimal) output rather than character.
for(size_t x = 0; x < count; x++) {
// printf("%d = %c\n", x, buffer[x]);
printf("%3zu = %02X % 3u\n", x, buffer[x], buffer[x]);
}
If the fopen() failed, best to not continue with for() and fclose().
if (readfrom == NULL) {
printf("Could not open");
return -1;
}
The second parameter is size, in bytes, of each element to be read.
The third parameter is the number of elements, each one with a size of <second parameter> bytes.
So, swap your second and third parameters.
Replace unsigned char* buffer[FILESIZE]; with unsigned char buffer[FILESIZE];. As written, you have an array of unsigned char * when you need an array of unsigned char. Because an array name converts to a pointer to its first element when passed to a function, you don't need to take its address. In the fread call, replace &buffer with buffer.
It must go like this: fread(buffer, 1, FILESIZE, readfrom);
One more thing: add a return with a specific error code after printf("Could not open");, because if the file hasn't been opened, you cannot read from it. And add return 0; at the end of main.
And take your #define out of main.
Read more about fread here: http://www.cplusplus.com/reference/cstdio/fread/
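Putting these suggestions together, here is a minimal sketch of the read-and-check step. The signature bytes are the ones the question lists (255, 216, 255, then 224 or 225); the helper names and the BLOCK_SIZE macro are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

#define BLOCK_SIZE 512

/* Hypothetical helper: does this block start with the byte pattern
 * given in the question (0xff 0xd8 0xff, then 0xe0 or 0xe1)? */
int looks_like_jpeg_start(const unsigned char *block, size_t n)
{
    if (n < 4)
        return 0;
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] == 0xe0 || block[3] == 0xe1);
}

/* Read up to one block; returns the number of bytes actually read. */
size_t read_block(FILE *fp, unsigned char buffer[BLOCK_SIZE])
{
    /* Element size 1, so the return value is a byte count. */
    return fread(buffer, 1, BLOCK_SIZE, fp);
}
```

A caller would open card.raw in "rb" mode, return early if fopen fails, and pass the count from read_block to looks_like_jpeg_start so that a short read is never inspected past its end.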
Let's say I have a file with a size of 5000 bytes, which I am trying to read from.
I have this code:
int main()
{
char file_path[] = "/path/to/my/file"; /* array of char, not array of pointers */
FILE *fp= fopen(file_path,"rb"); //open the file
fseek(fp, 0, SEEK_END); // seek to end of file
unsigned long fullsize = ftell(fp); //get the file size (5000 for this example)
fseek(fp, 0, SEEK_SET); //bring the stream back to the beginning
char *buf = (char*)malloc(5000);
fread(buf,5000,1,fp);
free(buf);
return 0;
}
I can also replace the fread call with
fread(buf,1000,5,fp);
What is better? And why?
I am asking in terms of optimization; I understand that the return value is different.
If you exchange those two arguments, you still request to read the same number of bytes. However, the behaviour differs in two other respects: what happens if the file is shorter than that amount, and the return value.
Since you should always be checking the return value of fread, this is important :)
If you use the form result = fread(buf, 1, 5000, fp);, i.e. read 5000 units of size 1, but the file size is only 3000, then what will happen is that those 3000 bytes are placed in your buffer, and 3000 is returned.
In other words you can detect a partial read and still use the partial result.
However if you use result = fread(buf, 5000, 1, fp);, i.e. read 1 unit of size 5000, then the contents of the buffer are indeterminate (i.e. the same status as an uninitialized variable), and the return value is 0.
In both cases, a partial read leaves the file pointer in an indeterminate state, i.e. you will need to fseek before doing any further reads.
Using the latter form (i.e. any size other than 1) is probably best used for when you either want to abort if the full size is not available, or if you're reading a file with fixed-size records.
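The difference between the two call forms can be observed directly. This is a minimal sketch, assuming tmpfile() is usable on the system; demo_partial_read is a hypothetical helper, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Create a temporary file holding file_size bytes, then try to read
 * `request` bytes from it in two ways. Returns 0 on success, -1 on failure. */
int demo_partial_read(size_t file_size, size_t request,
                      size_t *bytewise, size_t *blockwise)
{
    char *buf = malloc(request > 0 ? request : 1);
    FILE *fp = tmpfile();
    if (buf == NULL || fp == NULL) {
        free(buf);
        if (fp != NULL)
            fclose(fp);
        return -1;
    }

    for (size_t i = 0; i < file_size; i++)
        fputc('x', fp);

    rewind(fp);
    *bytewise = fread(buf, 1, request, fp);   /* `request` items of size 1 */

    rewind(fp);
    *blockwise = fread(buf, request, 1, fp);  /* one item of size `request` */

    fclose(fp);
    free(buf);
    return 0;
}
```

For a 3000-byte file and a 5000-byte request, the byte-wise call reports 3000 and the single-block call reports 0.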
I've always found it best to use 1 for the element size. If fread()
can't read a complete element at the end of the file, it will skip the
last, partial element. This is not desirable when the last element is
short. On the other hand, using 1 for element size does no harm.
Sample code that prints itself and demonstrates this behavior:
#include <stdio.h>
#include <string.h>
#define SIZE 100
#define N 1
int main()
{
FILE *fin;
int ct;
char buf[SIZE * N + 1];
fin = fopen("size_n.c", "r");
while (1) {
ct = fread(buf, SIZE, N, fin);
if (!ct)
break;
buf[ct * SIZE] = '\0';
fputs(buf, stdout);
}
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
FILE *fp;
char ch;
char buffer[80] ;
fp = fopen("c:\\Rasmi Personal\\hello.txt", "w");
if(fp == NULL)
{
printf("File not found");
exit(1);
}
else
{
while(1)
{
gets(buffer);
fwrite(buffer, strlen(buffer), 2, fp); /* I made, size_t nitems = 2 (third element/argument)*/
fwrite("\n", 1, 1, fp);
}
}
fclose(fp);
return 0;
}
Input:
Rasmi Ranjan Nayak
Output:
Rasmi Ranjan Nayak 0# ÿ" 8ÿ"
Why is this garbage appearing?
According to the fwrite() documentation, if size_t nitems is greater than 1, the entered text should be written more than once.
So why am I getting garbage here?
You're telling fwrite() to write two times strlen(buffer) bytes from the buffer (by setting nmemb = 2 you're making it write two "objects", each of which is strlen(buffer) bytes long), so it reads twice the number of bytes that are actually present.
The "garbage" is simply whatever happens to appear in memory after the string ends in buffer.
This is broken code, nmemb should be 1.
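A minimal sketch of writing each line exactly once, assuming the text is a NUL-terminated string. The helper write_line is hypothetical; it also checks the count returned by fwrite.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the string once, followed by a newline.
 * Returns 0 on success, -1 if fewer bytes than requested were written. */
int write_line(FILE *fp, const char *text)
{
    size_t len = strlen(text);

    /* size = 1, count = len: exactly len bytes are taken from text. */
    if (fwrite(text, 1, len, fp) != len)
        return -1;
    if (fwrite("\n", 1, 1, fp) != 1)
        return -1;
    return 0;
}
```

Because size times count never exceeds strlen(text), fwrite never reads past the end of the string, so no stray memory contents reach the file.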
The signature of fwrite function is
size_t fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
ptr
Pointer to the array of elements to be written.
size
Size in bytes of each element to be written.
count
Number of elements, each one with a size of size bytes.
stream
Pointer to a FILE object that specifies an output stream.
In this case, if you try to write count * size bytes and that is more than the length of the original string, you get this garbage. If you clear the buffer first with
memset(buffer,0,80*sizeof(char));
gets(buffer);
you will probably see a different result:
$ ./a.out
asdadsadasdsad
$ cat -v hello.txt
asdadsadasdsad^#^#^#^#^#^#^#^#^#^#^#^#^#^#
However, the text is always written once; the difference is in how many bytes are written.
When I get the file size using stat(), it gives a different result than I expect. Why does it behave like this?
When "huffman.txt" contains a simple string like "Hi how are you" it gives file_size = 14. But when "huffman.txt" contains a string like "άSUä5Ñ®qøá"F" it gives file size = 30.
#include <sys/stat.h>
#include <stdio.h>
int main()
{
int size = 0;
FILE* original_fileptr = fopen("huffman.txt", "rb");
if (original_fileptr == NULL) {
printf("ERROR: fopen fail in %s at %d\n", __FUNCTION__, __LINE__);
return 1;
}
/*create variable of stat*/
struct stat stp = { 0 };
stat("huffman.txt", &stp);
/*determine the size of data which is in file*/
int filesize = stp.st_size;
printf("\nFile size is %d\n", filesize);
}
This has to do with encoding.
Plain English text fits in ASCII, where each character is one byte.
Characters outside ASCII, however, take more than one byte each. In UTF-8, a common encoding for such files, they take two to four bytes.
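The easiest way to see what is happening is to print each byte of the file:
int c; /* int, not char, so that EOF can be told apart from data */
/* Read the file one byte at a time and show each byte in hex. */
while ((c = fgetc(original_fileptr)) != EOF)
    printf("%02x ", (unsigned)c);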
You'll understand why the file size is different.
If you're asking why different strings with the same number of characters could have different sizes in bytes, read up on UTF-8
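To make the byte-versus-character distinction concrete, here is a minimal sketch that counts code points in a string, assuming the input is valid UTF-8 (an assumption; the question does not say which encoding the file uses). The function name utf8_code_points is hypothetical.

```c
#include <stddef.h>

/* Count code points in a string assumed to be valid UTF-8:
 * every byte that is not a continuation byte (10xxxxxx) starts a new one. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the ASCII string "Hi how are you", the byte count and the code point count are both 14. For "ä" encoded as UTF-8 (bytes 0xC3 0xA4), the count is 1 while the size is 2 bytes, which is the kind of difference stat() reports.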