I'm writing a Huffman algorithm and when I write my file header, I store the length of my file because there will be some spare bits and I need to know where to stop.
The problem appears when I write the length of my file: it writes 8 bytes, but when I read it back, only 6 are read.
long totChar;
long size;
fprintf(outfile, "%ld", totChar);
fscanf(cmpfile, "%ld", &size);
I'm sure that works because if I add for example:
fgetc(cmpfile); // compressed file
fgetc(cmpfile);
and then I start reading, the decompression is successful.
You're reading and writing characters, not binary.
With "%ld" the number is stored as decimal text, so the number of bytes depends on the value. For example, when you write data you might write the number 57,843,249 (8 digits), but when you read data you might read 875,345 (6 digits).
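One way to avoid the variable-width text field is to store the length as a fixed number of raw bytes. The following is a minimal sketch, not the asker's code: the helper names write_length and read_length are hypothetical, uint64_t is used in place of long so the field is always 8 bytes, and the file is assumed to be written and read on the same machine (same byte order) through streams opened in binary mode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes length as exactly sizeof(uint64_t) raw bytes.
   Returns 1 on success, 0 on failure. */
int write_length(FILE *out, uint64_t length)
{
    assert(out != NULL);
    return fwrite(&length, sizeof length, 1, out) == 1;
}

/* Reads back exactly the bytes written by write_length.
   Returns 1 on success, 0 on failure or end of file. */
int read_length(FILE *in, uint64_t *length)
{
    assert(in != NULL && length != NULL);
    return fread(length, sizeof *length, 1, in) == 1;
}
```

Because the header always occupies the same number of bytes, the reader stops exactly where the compressed data begins, even if the first data byte happens to look like a digit.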
Say I have a 90 megabyte file. It's not encrypted, but it is binary.
I want to store this file into a table as an array of byte values so I can process the file byte by byte.
I can spare up to 2 GB of RAM, so an approach that keeps track of which bytes have been processed, which bytes have yet to be processed, and the processed bytes themselves would be fine. I don't particularly care how long the processing takes.
How should I approach this?
Note I've expanded and rewritten this answer due to Egor's comment.
You first need the file open in binary mode. The distinction is important on Windows, where the default text mode will change line endings from CR+LF into C newlines. You do this by specifying a mode argument to io.open of "rb".
Although you can read a file one byte at a time, in practice you will want to work through the file in buffers. Those buffers can be fairly large, but unless you know you are handling only small files in a one-off script, you should avoid reading the entire file into a buffer with file:read"*a" since that will cause various problems with very large files.
Once you have a file open in binary mode, you read a chunk of it using buffer = file:read(n), where n is an integer count of bytes in the chunk. Using a moderately sized power of two will likely be the most efficient. The return value will either be nil, or will be a string of up to n bytes. If it is fewer than n bytes long, that was the last buffer in the file. (If reading from a socket, pipe, or terminal, however, a read of fewer than n bytes may only indicate that no more data has arrived yet, depending on lots of other factors too complex to explain in this sentence.)
The string in buffer can be processed any number of ways. As long as #buffer is not too big, then {buffer:byte(1,-1)} will return an array of integer byte values for each byte in the buffer. Too big partly depends on how your copy of Lua was configured when it was built, and may depend on other factors such as available memory as well. #buffer > 1E6 is certainly too big. In the example that follows, I used buffer:byte(i) to access each byte one at a time. That works for any size of buffer, at least as long as i remains an integer.
Finally, don't forget to close the file.
Here's a complete example, lightly tested. It reads a file a buffer at a time, and accumulates the total size and the sum of all bytes. It then prints the size, sum, and average byte value.
-- sum all bytes in a file
local name = ...
assert(name, "Usage: "..arg[0].." filename")
file = assert(io.open(name, "rb"))
local sum, len = 0,0
repeat
    local buffer = file:read(1024)
    if buffer then
        len = len + #buffer
        for i = 1, #buffer do
            sum = sum + buffer:byte(i)
        end
    end
until not buffer
file:close()
print("length:",len)
print("sum:",sum)
print("mean:", sum / len)
Run with Lua 5.1.4 on my Windows box using the example as its input, it reports:
length: 402
sum: 30374
mean: 75.557213930348
To split the contents of a string s into an array of bytes use {s:byte(1,-1)}.
I have a question about fprintf and fwrite.
How many bytes are written when this code runs (assuming fp has been correctly set up)?
int i = 10000;
fprintf(fp,"%d",i);
fwrite(&i,sizeof(int),1,fp);
When I checked, I got 5 bytes and 9 bytes respectively. Maybe I am wrong. I thought it would be 4 bytes, since it is an int. Can someone explain, please? Thanks.
fprintf writes the string 10000 (5 bytes) to the file, while fwrite writes the binary representation of 10000 (sizeof(int) bytes) to the file.
How are you checking the number of bytes written? sizeof(int) depends on platform.
Given below is the function signature for fwrite.
size_t fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
fwrite writes an array of count elements, each one size bytes long, from the block of memory pointed to by ptr to stream. The return value is the number of elements successfully written, not the number of bytes. On success it equals count, and the number of bytes written is then size * count.
Similarly fprintf returns the number of characters written/printed.
fprintf(fp,"%d",i); writes 5 bytes: it writes 10000 as a string of 5 characters.
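The return values described above can be checked directly. This is a small sketch under stated assumptions: write_both is a hypothetical helper name, and fp is assumed to be a stream opened for writing in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Writes i first as decimal text, then as raw bytes. Stores the byte count
   of each write in *text_bytes and *binary_bytes. Returns 1 on success. */
int write_both(FILE *fp, int i, long *text_bytes, long *binary_bytes)
{
    assert(fp != NULL && text_bytes != NULL && binary_bytes != NULL);

    int printed = fprintf(fp, "%d", i);          /* characters written */
    if (printed < 0)
        return 0;

    size_t items = fwrite(&i, sizeof i, 1, fp);  /* elements written, not bytes */
    if (items != 1)
        return 0;

    *text_bytes = printed;
    *binary_bytes = (long)(items * sizeof i);
    return 1;
}
```

For i = 10000 the text count is 5 and the binary count is sizeof(int). On a platform where int is 4 bytes the file ends up 9 bytes long after both calls, which may be where the 9 in the question came from if the size was checked after the second write.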
Say I'm calling a program:
$ ./dataset < filename
where filename is any file containing some number of line pairs, where the first line of each pair contains a string and the second line contains 10 numbers separated by spaces. The last line ends with "END".
How can I then start putting the first lines of pairs (string) into:
char *experiments[20] // max of 20 pairs
and the second lines of the pairs (numbers) into:
int data[10][20] // max of 20, 10 integers each
Any guidance? I don't even understand how I'm supposed to scan the file into my arrays.
Update:
So say this is my file:
Test One
0 1 2 3 4 5 6 7 8 9
END
Then redirecting this file would mean if I want to put the first line into my *experiments, that I would need to scan it as such?
scanf("%s", *experiments[0]);
Doing so gives me an error: Segmentation fault (core dumped)
What is incorrect about this?
Say my file is simply numbers, for ex:
0 1 2 3 4 5 6 7 8 9
Then,
scanf("%d", data[0][0]); works, and will hold value of '1'. Is there an easier way to do this for the whole line of data? i.e. data[0-9][0].
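Here is a skeleton in pseudo-code style; the comments explain how to read the input:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[100]; // make sure that this size is enough to hold a single line
    int no_line = 1;
    // fgets is used instead of gets, which cannot limit the read to the buffer size
    while (fgets(str, sizeof str, stdin) != NULL)
    {
        str[strcspn(str, "\r\n")] = '\0'; // strip the trailing newline kept by fgets
        if (strcmp(str, "END") == 0)
            break;
        if (no_line % 2 == 0)
        {
            /* read integer values from the string "str" using sscanf; sscanf can be called in a loop with %d until it fails */
        }
        else
        {
            /* store the string in your variable "experiments"; allocate memory for each entry before copying */
        }
        no_line++;
    }
    return 0;
}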
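The "call sscanf in a loop" step for the number lines could look roughly like this. It is a sketch, not the answerer's code: parse_ints is a hypothetical helper, and it relies on the standard %n conversion to learn how far each call advanced through the line.

```c
#include <assert.h>
#include <stdio.h>

/* Parses up to max whitespace-separated integers from line into out.
   Returns how many integers were stored. */
int parse_ints(const char *line, int *out, int max)
{
    assert(line != NULL && out != NULL);

    int count = 0;
    int consumed = 0;
    /* %n records how many characters this call consumed, so the next
       call can continue where this one stopped. */
    while (count < max && sscanf(line, "%d%n", &out[count], &consumed) == 1) {
        line += consumed;
        count++;
    }
    return count;
}
```

A caller would typically parse each number line into a temporary int row[10], check that the return value is 10, and then copy the values into the matching slots of data.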
The redirected file is associated with the FILE * stdin. It's already opened for you.
Otherwise, you can treat it the same as any other text file, and/or use the functions that are dedicated to standard input, with the one exception that you cannot count on being able to seek in it or to retrieve the size of the input.
For the data sizes you're talking about, by far the easiest thing to do is just slurp all of the content into a buffer and work on that: you don't have to be super-stingy, just make sure that you don't overrun.
If you want to be super-stingy with memory, preallocate a 4kB buffer with malloc(), progressively read() into it from stdin, and realloc() another 4kB every time the input exceeds what you've already read. If you don't care so much about being stingy with memory (e.g. on a modern machine with gigabytes of memory), just malloc() something much bigger than the expected input (e.g. a megabyte) and bug out if the input is more than that: this is far simpler to implement but less general/elegant.
You then have all of the input in a buffer and you can do what you like with it, which depends too strongly on the format of the input for me to say how you should approach that part.
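The grow-as-you-go approach can be sketched as follows. Note the substitution: this version uses fread() on a FILE * rather than the read() call mentioned above, so it can be handed stdin directly. The name slurp and the CHUNK constant are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096  /* grow the buffer 4 kB at a time */

/* Reads everything from in into a malloc'd buffer. Stores the byte count in
   *len and returns the buffer (NUL-terminated for convenience), or NULL on
   allocation or read failure. The caller frees the result. */
char *slurp(FILE *in, size_t *len)
{
    assert(in != NULL && len != NULL);

    size_t cap = CHUNK, used = 0;
    char *buf = malloc(cap + 1);
    if (buf == NULL)
        return NULL;

    for (;;) {
        used += fread(buf + used, 1, cap - used, in);
        if (used < cap)
            break;                      /* short read: end of input or error */
        char *bigger = realloc(buf, cap + CHUNK + 1);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap += CHUNK;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }

    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Calling slurp(stdin, &len) then gives one buffer holding the whole redirected file, which can be split into lines in memory.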
I am trying to read a binary file in C 1 byte at a time and after searching the internet for hours I still can not get it to retrieve anything but garbage and/or a seg fault. Basically the binary file is in the format of a list that is 256 items long and each item is 1 byte (an unsigned int between 0 and 255). I am trying to use fseek and fread to jump to the "index" within the binary file and retrieve that value. The code that I have currently:
unsigned int buffer;
int index = 3; // any index value
size_t indexOffset = 256 * index;
fseek(file, indexOffset, SEEK_SET);
fread(&buffer, 256, 1, file);
printf("%d\n", buffer);
Right now this code is giving me random garbage numbers and seg faulting. Any tips as to how I can get this to work right?
You're confusing bytes with ints. The usual C type for a byte is unsigned char. Most bytes are 8 bits wide. If the data you are reading is 8 bits wide, you need to read it in 8-bit units:
#define BUFFER_SIZE 256
unsigned char buffer[BUFFER_SIZE];
/* Read in 256 8-bit numbers into the buffer */
size_t bytes_read = 0;
bytes_read = fread(buffer, sizeof(unsigned char), BUFFER_SIZE, file_ptr);
// Note: sizeof(unsigned char) is for emphasis
The reason for reading all the data into memory is to keep the I/O flowing. There is an overhead associated with each input request, regardless of the quantity requested. Reading one byte at a time, or seeking to one position at a time is the worst case.
Here is an example of the overhead required for reading 1 byte:
Tell OS to read from the file.
OS searches to find the file location.
OS tells disk drive to power up.
OS waits for disk drive to get up to speed.
OS tells disk drive to position to the correct track and sector.
-->OS tells disk to read one byte and put into drive buffer.
OS fetches data from drive buffer.
Disk spins down to a stop.
OS returns 1 byte to your program.
In your program design, the above steps will be repeated 256 times. With everybody's suggestion, the line marked with "-->" will read 256 bytes. Thus the overhead is executed only once instead of 256 times to get the same quantity of data.
In your code you are trying to read 256 bytes to the address of one int. If you want to read one byte at a time, call fread(&buffer, 1, 1, file); (See fread).
But a simpler solution will be to declare an array of bytes, read it all together and process it after that.
unsigned char buffer; // note: 1 byte
fread(&buffer, 1, 1, file);
I believe it is time to read the man pages.
Couple of problems with the code as it stands.
The prototype for fread is:
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
You've set the size to 256 (bytes) and the count to 1. That's fine, that means "read one lump of 256 bytes, shove it into the buffer".
However, your buffer is on the order of 2-8 bytes long (or, at least, vastly smaller than 256 bytes), so you have a buffer overrun. You probably want to use fread(&buffer, 1, 1, file).
Furthermore, you're writing byte data through an int pointer. This will work on one endianness (little-endian, in fact), so you'll be fine on Intel architecture and from that learn bad habits that WILL come back and bite you one of these days.
Try real hard to only write byte data into byte-organised storage, rather than into ints or floats.
You are trying to read 256 bytes into a 4-byte integer variable called "buffer". You are overwriting the next 252 bytes of other data.
It seems like buffer should either be unsigned char buffer[256]; or you should be doing fread(&buffer, 1, 1, f) and in that case buffer should be unsigned char buffer;.
Alternatively, if you just want a single character, you could just leave buffer as int (unsigned is not needed because C99 guarantees a reasonable minimum range for plain int) and simply say:
buffer = fgetc(f);
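Putting the seek and the single-byte read together might look like the sketch below. It assumes the format described in the question, where each item is exactly one byte, so the file offset of an item is simply its index. The name read_byte_at is hypothetical, and the file is assumed to be open in "rb" mode.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the byte stored at position index in file (a value 0..255),
   or EOF if the seek or the read fails. */
int read_byte_at(FILE *file, long index)
{
    assert(file != NULL);

    if (fseek(file, index, SEEK_SET) != 0)
        return EOF;
    return fgetc(file);   /* int return type keeps EOF distinct from byte 255 */
}
```

Keeping the result in an int until it has been compared against EOF is what allows a stored byte of 255 to be told apart from a failed read.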
The user should input some file names in the command line and the program will read each file name from argv[] array. I have to perform error checking etc.
I want to read each filename. For example, if argv[2] is 'myfile.txt', the program should read the content of 'myfile.txt' and store value in char buffer[BUFSIZ] and then write the content of buffer into another file.
However before the content is written, the program should also write the name of the file and the size. Such that the file can be easily extracted later. A bit like the tar function.
The file I write the content of buffer, depending on the number of files added by user, should be a string like:
myfile.txt256Thisisfilecontentmyfile2.txt156Thisisfile2content..............
My question is
1) How do I write the value of argv[2] into the file using write()? I'm having problems writing a char array: what should I put as the size (sizeof(?)) in write()? See below; I don't know the length of the file name entered by the user.
2) Do I use '&' to write an integer value into the file after the name, for example to write 4 bytes after the file name for the size of the file?
Here is the code I have written,
char buffer[BUFSIZ];
int numfiles=5; //say this is no of files user entered at command
open(file.....
lseek(fdout, 0, SEEK_SET); //start at beginning of file and move along for each file in some for loop
for(i=0; ......
//for each file write filename,filesize,data....filename,filesize,data......
int bytesread=read(argv[i],buffer,sizeof(buffer));
write(outputfile, argv[i], sizeof(argv)); //write filename size of enough to store value of filename
write(outputfile, &bytesread, sizeof(bytesread));
write(outputfile, buffer, sizeof(buffer));
But the code is not working as I expected.
Any suggestions?
Since argv consists of null-terminated strings, the length you can write is strlen(argv[2])+1 to write both the argument and its null terminator:
size_t sz = strlen (argv[2]);
write (fd, argv[2], sz + 1);
Alternatively, if you want the length followed by the characters, you can write the size_t itself returned from strlen followed by that many characters.
size_t sz = strlen (argv[2]);
write (fd, &sz, sizeof (size_t));
write (fd, argv[2], sz);
You probably also need to write the length of the file as well so that you can locate the next file when reading it back.
1. You can write the string the following way:
size_t size = strlen(string);
write(fd, string, size);
However, most of the time it's not this simple: you will need the size of the string so you'll know how much you need to read. So you should write the string size too.
2. An integer can be written the following way:
write(fd, &integer, sizeof(integer));
This is simple, but if you plan to use the file on different architectures, you'll need to deal with endianness too.
It sounds like your best bet is to use a binary format. In your example, is the file called myfile.txt with a content length of 256, or myfile.txt2 with a content length of 56, or myfile.txt25 with a content length of 6? There's no way to distinguish between the end of the filename and the start of the content length field. Similarly there is no way to distinguish between the end of the content length and the start of the content. If you must use a text format, fixed width fields will help with this. I.e. 32 characters of filename followed by 6 digits of content length. But binary format is more efficient.
You get the filename length using strlen(), don't use sizeof(argv) as you will get completely the wrong result. sizeof(argv[i]) will also give the wrong result.
So write 4 bytes of filename length followed by the filename then 4 bytes of content length followed by the content.
If you want the format to be portable you need to be aware of byte order issues.
Lastly, if the file won't all fit in your buffer then you are stuffed. You need to get the size of the file you are reading to write it to your output file first, and then make sure you read that number of bytes from the first file into the second file. There are various techniques to do this.
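The length-prefixed layout suggested above (4 bytes of filename length, the filename, 4 bytes of content length, the content) could be sketched like this. Everything named here is hypothetical: write_all, read_all, write_record and read_name are illustration helpers, not part of any library. The lengths are stored as uint32_t in the machine's native byte order, so this version is not portable across architectures with different endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Loops until all n bytes are written; returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Loops until all n bytes are read; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Record layout: 4-byte name length, name, 4-byte content length, content. */
int write_record(int fd, const char *name, const void *data, uint32_t data_len)
{
    assert(name != NULL);
    uint32_t name_len = (uint32_t)strlen(name);   /* strlen, not sizeof(argv) */
    if (write_all(fd, &name_len, sizeof name_len) != 0) return -1;
    if (write_all(fd, name, name_len) != 0) return -1;
    if (write_all(fd, &data_len, sizeof data_len) != 0) return -1;
    return write_all(fd, data, data_len);
}

/* Reads one name field into name, whose capacity cap includes the terminator.
   Returns 0 on success, -1 on error or if the name does not fit. */
int read_name(int fd, char *name, size_t cap)
{
    assert(name != NULL);
    uint32_t name_len;
    if (read_all(fd, &name_len, sizeof name_len) != 0) return -1;
    if (name_len >= cap) return -1;
    if (read_all(fd, name, name_len) != 0) return -1;
    name[name_len] = '\0';   /* the stored name has no terminator */
    return 0;
}
```

When reading back, the array itself (name, not name[name_len]) is what gets passed to read(), and the content length that follows the name tells the reader how many bytes to copy before the next record starts.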
Thanks for the replies, guys.
I decided not to use size_t and instead just used int and char types, so I know the exact number of bytes to read() back. That is, I start at the beginning of the file and read 4 bytes (an int) to get the length of the filename, which I use as the size in the next read().
So, when I write the user's input file to the output file (copying the file under the same name), I write it as one long string (without spaces, obviously; they are only here to make it readable):
filenamesize filename filecontentsize filecontent
i.e. 10 myfile.txt 5 hello
So when it comes to reading that data back, I start at the beginning of the file using lseek(), and I know the first 4 bytes are an int holding the length of the filename, so I read that into int namelen using read().
My problem is that I want to use the value read for the filename size (the first 4 bytes) to declare an array of the right length to store the filename. How do I pass this array to read() so that read() stores the value inside that char array? Please see below:
int namelen; // length of filename, read from the first 4 bytes of the file; used as the size in the next read()
char filename[namelen];
read(fd, filename[namelen], namelen);//filename should have 'myfile.txt' if user entered that filename
So my question is: once I have read the first 4 bytes from the file, giving me the length of the filename in namelen, how do I then read namelen bytes to get the filename of the original file, so I can create the copied file inside the directory?
Thanks