How is the buffer allocated by gzbuffer used in zlib? - zlib

Trying to understand how gzbuffer is used in zlib. This is cut from the manual (https://www.zlib.net/manual.html):
ZEXTERN int ZEXPORT gzbuffer OF((gzFile file, unsigned size));
Set the internal buffer size used by this library's functions. The default buffer size is 8192 bytes. This function must be called after gzopen() or gzdopen(), and before any other calls that read or write the file. The buffer memory allocation is always deferred to the first read or write. Three times that size in buffer space is allocated. A larger buffer size of, for example, 64K or 128K bytes will noticeably increase the speed of decompression (reading).
The new buffer size also affects the maximum length for gzprintf().
gzbuffer() returns 0 on success, or -1 on failure, such as being called too late.
So three times the buffer size is allocated. When I call gzwrite(), is compressed data written to the buffer (i.e. compression is done at every call to gzwrite()), or is uncompressed data written to the buffer (and compression is delayed until the buffer is filled and gzflush() is called internally, or until I call gzflush() myself)?
When I continue to call gzwrite(), what happens when the buffer is filled? Is there some allocation of new buffer memory at this point, or is the buffer simply flushed to the file and then re-used?

When reading, a size input buffer is allocated, and a 2*size output buffer is allocated. When writing, the same thing, but reversed.
If len is less than size in gzwrite(state, buf, len), then the provided data goes into the input buffer. That input buffer is compressed once it has accumulated size bytes. If len is greater than or equal to size, what remains in the input buffer is compressed, followed by all of the provided len bytes of data. If a flush is requested, then all of the data in the input buffer is compressed.
Compressed data is accumulated in the size output buffer, which is written every time size compressed bytes have been accumulated, or when the gzFile is flushed or closed.
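For illustration, here is a minimal sketch of the write path under those assumptions (the output filename and payload are placeholders): gzbuffer() must be called between gzopen() and the first gzwrite(), and the data handed to gzwrite() is the uncompressed data, which zlib compresses and flushes to the file on its own schedule.

#include <string.h>
#include <zlib.h>

int main(void)
{
    const char *text = "example payload";      /* placeholder data */
    gzFile gz = gzopen("out.gz", "wb");        /* placeholder filename */
    if (gz == NULL)
        return 1;

    /* must be called before the first read or write; the internal
       buffers are only allocated on that first call */
    if (gzbuffer(gz, 128 * 1024) != 0) {
        gzclose(gz);
        return 1;
    }

    /* uncompressed data goes into the internal input buffer; it is
       compressed when the buffer fills, on gzflush(), or on gzclose() */
    if (gzwrite(gz, text, (unsigned)strlen(text)) == 0) {
        gzclose(gz);
        return 1;
    }

    return gzclose(gz) == Z_OK ? 0 : 1;
}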

Related

Fully-buffered stream gets flushed when it is not full

I'm really confused about how exactly a buffer works, so I wrote a little snippet to verify:
#include <stdio.h>

#define BUF_SIZE 1024

char buf[BUF_SIZE];
char arr[20];

int main()
{
    FILE *fs = fopen("test.txt", "r");
    setvbuf(fs, buf, _IOFBF, 1024);
    fread(arr, 1, 1, fs);
    printf("%s", arr);
    getchar();
    return 0;
}
As you can see, I set the file stream fs to be fully buffered (I know it would default to fully buffered most of the time; I'm just making sure). I also set its buffer to size 1024, which means the stream would not be flushed until it contains 1024 bytes of data (right?).
My understanding of fread() is that it reads data from the file stream, stores it in the buffer buf, and then the data in buf is copied to arr as soon as it holds 1024 bytes of data (right?).
But here I read only one character from the stream, and there are only four characters in the file test.txt. Why is there something in arr when only one character was requested? (I can print that one character out.)
The distinctions between fully-buffered, line-buffered, and unbuffered really only matter for output streams. I'm pretty sure input streams pretty much always act as if they're fully buffered.
But even for fully-buffered input streams, there's at least one case where the buffer won't be completely full, and as you've discovered, that's when there aren't enough characters left in the input to fill the buffer. If there are only 4 characters in the file, then when the system goes to fill the buffer, it gets those 4 characters, puts them in the buffer, and then you can start taking them out, as usual.
(The same situation would arise any time the file contains a number of characters that's not an exact multiple of the buffer size. For example, if the input file contained 1028 characters, then after filling the buffer with the first 1024 characters and letting you read them, the next time it filled the buffer, it'd end up with 4 again.)
What were you expecting it to do in this case? Block waiting to read 1,020 more characters from the file (that were never going to come)?
P.S. You said "the stream would not be flushed until it contained 1024 bytes of stuff, right?" But flushing is only defined for output streams; it doesn't mean anything for input streams.
From what I understand, an input buffer works differently from what you suggested: if you request one byte to be read, the system reads 1023 more bytes into the buffer, so the next 1023 read calls can return data directly from the buffer instead of having to read from the file.
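A small sketch along those lines (it assumes the same four-character test.txt as above): ask fread() for far more than the file holds and look at what it actually returns, rather than printing the buffer.

#include <stdio.h>

int main(void)
{
    static char buf[1024];                  /* stdio buffer, as in the question */
    char data[1024];
    FILE *fs = fopen("test.txt", "r");      /* assumed to hold only 4 characters */
    if (fs == NULL)
        return 1;

    setvbuf(fs, buf, _IOFBF, sizeof buf);   /* fully buffered input stream */

    /* request up to 1024 bytes; stdio fills its buffer with the 4 bytes
       that exist and fread() reports a short count instead of blocking */
    size_t got = fread(data, 1, sizeof data, fs);
    printf("fread returned %zu bytes\n", got);

    fclose(fs);
    return 0;
}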

What does write() write if null terminator is already reached?

For write(fd[1], string, size) - what would happen if string is shorter than size?
I looked up the man page, but it doesn't clearly specify that situation. I know that for read it would simply stop there and read whatever the string is, but that's certainly not the case for write. So what is write doing? The return value is still size, so is it appending a null terminator? Why doesn't it just stop like read?
When you call write(), the system assumes you are writing generic data to some file - it doesn't care that you have a string. A null-terminated string is seen as a bunch of non-zero bytes followed by a zero byte - the system will keep writing out until it's written size bytes.
Thus, specifying a size which is longer than your string could be dangerous. The system will likely write data from beyond the end of the string out to your file, probably filled with garbage.
write will write size bytes of data starting at string. If you define string to be an array shorter than size, the behaviour is undefined. But in your previous question, char *line = "apple"; contains 6 characters (i.e. a, p, p, l, e and the null character).
So it is best to call write with the value of size set to the correct length.
write(int fildes, const void *buf, size_t nbyte) does not write null terminated strings. It writes the content of a buffer. If there are any null characters in the buffer they will be written as well.
read(int fildes, void *buf, size_t nbyte) also pays no attention to null characters. It reads a number of bytes into the given buffer, up to a maximum of nbyte. It does not add any null terminating bytes.
These are low level routines, designed for reading and writing arbitrary data.
The write call outputs a buffer of the given size. It does not attempt to interpret the data in the buffer. That is, you give it a pointer to a memory location and a number of bytes to write (the length) then, as long as those memory locations exist in a legal portion of your program's data, it will copy those bytes to the output file descriptor.
Unlike the string manipulation routines, write (and read, for that matter) ignores null bytes, that is, bytes with the value zero. read does pay attention to end-of-file and, on certain devices, will only read the amount of data available at the time, perhaps returning less data than requested, but both operate on raw bytes without interpreting them as "strings".
If you attempt to write more data than the buffer contains, it may or may not work depending on the position of the memory. At best the behavior is undefined. At worst you'll get a segmentation fault and your program will crash.
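A short sketch of the difference (writing to standard output here just to have a valid descriptor): the byte count you pass is the only thing write() looks at, so whether the terminating null byte goes out, stays behind, or garbage past the object gets sent depends entirely on that count.

#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *line = "apple";                 /* 5 characters plus a null byte */

    /* writes exactly the 5 visible characters; the null byte is not sent */
    write(STDOUT_FILENO, line, strlen(line));

    /* writes 6 bytes: the characters plus the terminating null byte */
    write(STDOUT_FILENO, line, strlen(line) + 1);

    /* write(STDOUT_FILENO, line, 100); -- would read past the end of the
       string literal: undefined behaviour, likely garbage or a crash */

    return 0;
}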

How to save a specific length string from a file and work with it in C

So what I'm trying to do is open a file and read it until the end in blocks that are 256 bytes long each time it is called. My dilemma is using fgets() or fread() to do it.
I was using fgets() initially, because it returns a string of the bytes that were read, which is great because I can store that data and work with it. However, in the particular file that I'm reading, the 256 bytes often span more than two lines, which is a problem because fgets() stops reading when it hits a newline character or the end of the file.
I then thought of using fread(), but I don't know how to save the data I'm referring to with it, because fread() just returns the number of elements successfully read (according to its documentation).
I've searched and thought of solutions for a while now and can't find anything that works with my particular scenario. I would like some guidance on how to go about this issue, how would you go about this in my position?
You can use fread() to read each 256 bytes block and keep a lineCount variable to keep track of the number of new line characters you have encountered so far in the input. Since you have to process the blocks already this wouldn't mean much of an overhead in the processing.
To read a block of 256 chars, which is what I think you are doing, you just need to create a buffer of chars that can hold 256 of them, in other words a char array of size 256.
#define BLOCK_SIZE 256
char block[BLOCK_SIZE];
Then if you check the documentation for fread() it shows the following signature:
Following is the declaration for fread() function.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
Parameters
ptr -- This is the pointer to a block of memory with a minimum size of size*nmemb bytes.
size -- This is the size in bytes of each element to be read.
nmemb -- This is the number of elements, each one with a size of size bytes.
stream -- This is the pointer to a FILE object that specifies an input stream.
So this means it takes a pointer to the buffer where it will write the read information, the size of each element it's supposed to read, the maximum number of elements you want it to read, and the file pointer. In your case it would be:
size_t read = fread(block, sizeof(char), BLOCK_SIZE, file);
This will copy the information from the file into the block array, which you can later process while keeping track of the lines. The characters read by fread() are in the block array, so the first char of the block just read is block[0], the second block[1], and so on. The value returned in read indicates how many elements (in your case chars) were placed in block by that call to fread(); this number will be equal to BLOCK_SIZE for every call, unless you reach the end of the file or there's an error.
I suggest you read some documentation for a full example, play a little with the code and do some reading on pointers in C to gain a better understanding of how everything works in general. If you still have questions after that, we can take it from there or you can create a new SO question.
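Putting both answers together, here is a rough sketch (the file name is a placeholder) that reads the file in 256-byte blocks, handles the short final block via the return value, and keeps the lineCount-style tally of newlines suggested above.

#include <stdio.h>

#define BLOCK_SIZE 256

int main(void)
{
    char block[BLOCK_SIZE];
    size_t got;
    long line_count = 0;
    FILE *file = fopen("input.txt", "r");    /* placeholder file name */
    if (file == NULL)
        return 1;

    /* fread() reports how many chars it actually placed in block; the
       last block of the file is usually shorter than BLOCK_SIZE */
    while ((got = fread(block, sizeof(char), BLOCK_SIZE, file)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (block[i] == '\n')
                line_count++;
            /* process block[i] here */
        }
    }

    printf("newlines seen: %ld\n", line_count);
    fclose(file);
    return 0;
}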

zlib: how to dimension avail_out

I would like to deflate a small block of memory (<= 16 KiB) using zlib. The output is stored in a block of memory as well. No disk or database access here.
According to the documentation, I should call deflate() repeatedly until the whole input is deflated. In between, I have to increase the size of the memory block where the output goes.
However, that seems unnecessarily complicated and perhaps even inefficient. As I know the size of the input, can't I predetermine the maximum size needed for the output, and then do everything with just one call to deflate()?
If so, what is the maximum output size? I assume it's something like: size of input + a few bytes of overhead.
zlib has a function to calculate the maximum size a buffer will deflate to. Your assumption is correct - the returned value is the size of the input buffer + header sizes. After deflation you can realloc the buffer to reclaim the 'wasted' memory.
From zlib.h:
ZEXTERN uLong ZEXPORT deflateBound OF((z_streamp strm, uLong sourceLen));
/*
deflateBound() returns an upper bound on the compressed size after
deflation of sourceLen bytes. It must be called after deflateInit() or
deflateInit2(), and after deflateSetHeader(), if used. This would be used
to allocate an output buffer for deflation in a single pass, and so would be
called before deflate().
*/
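As a sketch of that single-pass pattern (deflate_block is a made-up helper name, and the caller supplies the input block): size the output with deflateBound(), then a single deflate() call with Z_FINISH is enough, since the zlib documentation guarantees completion when the output buffer is at least that large.

#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* Compresses src[0..src_len) into a freshly malloc'd buffer, storing the
   compressed length in *out_len; returns NULL on failure (caller frees). */
unsigned char *deflate_block(const unsigned char *src, uLong src_len,
                             uLong *out_len)
{
    z_stream strm;
    memset(&strm, 0, sizeof strm);              /* zalloc/zfree/opaque = NULL */
    if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
        return NULL;

    uLong bound = deflateBound(&strm, src_len); /* worst-case output size */
    unsigned char *out = malloc(bound);
    if (out == NULL) {
        deflateEnd(&strm);
        return NULL;
    }

    strm.next_in   = (Bytef *)src;
    strm.avail_in  = (uInt)src_len;
    strm.next_out  = out;
    strm.avail_out = (uInt)bound;

    /* with avail_out >= deflateBound(), one Z_FINISH call finishes */
    if (deflate(&strm, Z_FINISH) != Z_STREAM_END) {
        free(out);
        deflateEnd(&strm);
        return NULL;
    }

    *out_len = strm.total_out;                  /* realloc down here if desired */
    deflateEnd(&strm);
    return out;
}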

fgetc(): Reading and storing a string of unknown length

What I need to do for an assignment is:
open a file (using fopen())
read the name of a student (using fgetc())
store that name in some part of a struct
The problem I have is that I need to read an arbitrary long string into name, and I don't know how to store that string without wasting memory (or writing into non-allocated memory).
EDIT
My first idea was to allocate a 1-byte (char) memory block, then call realloc() whenever more bytes are needed, but this doesn't seem very efficient. Or maybe I could double the array whenever it is full and then, at the end, copy the chars into a new block of memory of the exact size.
Don't worry about wasting 100 or 1000 bytes; that is likely to be long enough for all names.
I'd probably just put the buffer that you're reading into on the stack.
Do worry about writing over the end of the buffer, i.e. a buffer overrun. Program to prevent that!
When you come to store the name into your structure, you can malloc a buffer of the exact length needed to hold the name (don't forget to add an extra byte for the null terminator).
But if you really must store names of any length at all then you could do it with realloc.
i.e. Allocate a buffer with malloc of some size say 50 bytes.
Then when you need more space, use realloc to increase its length. Increase the length in blocks of, say, 50 bytes, and keep track of how big it is with an int so that you know when you need to grow it again. At some point, you will have to decide how long that buffer is allowed to get, because it can't grow indefinitely.
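A rough sketch of that growth scheme (read_name is a made-up helper, and it assumes a name ends at a newline or end of file, per the assignment): start with a 50-byte block and realloc() in 50-byte steps, tracking length and capacity as you go.

#include <stdio.h>
#include <stdlib.h>

#define GROW_BY 50

/* Reads one name from fp into a malloc'd, null-terminated string that the
   caller must free; returns NULL if memory runs out. */
char *read_name(FILE *fp)
{
    size_t capacity = GROW_BY;
    size_t length = 0;
    char *name = malloc(capacity);
    int c;

    if (name == NULL)
        return NULL;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (length + 1 >= capacity) {           /* keep room for the terminator */
            char *bigger = realloc(name, capacity + GROW_BY);
            if (bigger == NULL) {
                free(name);
                return NULL;
            }
            name = bigger;
            capacity += GROW_BY;
        }
        name[length++] = (char)c;
    }

    name[length] = '\0';
    return name;    /* could realloc down to length + 1 to trim the slack */
}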
You could read the string character by character until you find the end, then rewind to the beginning, allocate a buffer of the right size, and re-read it into that, but unless you are on a tiny embedded system this is probably silly. For one thing, the fgetc, fread, etc functions create buffers in the O/S anyway.
You could allocate a temporary buffer that's large enough, use a length limited read (for safety) into that, and then allocate a buffer of the precise size to copy it into. You probably want to allocate the temporary buffer on the stack rather than via malloc, unless you think it might exceed your available stack space.
If you are writing single-threaded code for a tiny system, you can allocate a scratch buffer at startup or statically, and re-use it for many purposes - but be really careful that your uses can't overlap!
Given the implementation complexity of most systems, unless you really research how things work it's entirely possible to write memory optimized code that actually takes more memory than doing things the easy way. Variable initializations can be another surprisingly wasteful one.
My suggestion would be to allocate a buffer of sufficient size:
char name_buffer [ 80 ];
Generally, most names (at least common English names) will be less than 80 characters in size. If you feel that you may need more space than that, by all means allocate more.
Keep a counter variable to know how many characters you have already read into your buffer:
int chars_read = 0; /* always initialize your counter explicitly */
At this point, read character by character with fgetc() until you either hit the end of file marker or read 80 characters (79 really, since you need room for the null terminator). Store each character you've read into your buffer, incrementing your counter variable.
int c;
while ( ( chars_read < 79 ) && ( ( c = fgetc( stdin ) ) != EOF ) ) {
    name_buffer [ chars_read ] = (char) c;  /* store only real characters, never EOF */
    chars_read++;
}
name_buffer [ chars_read ] = '\0'; /* terminating null character */
I am assuming here that you are reading from stdin. A more complete example would also check for errors, verify that the character you read from the stream is valid for a person's name (no numbers, for example), etc. If you try to read more data than for which you allocated space, print an error message to the console.
I understand wanting to maintain as small a buffer as possible and only allocate what you need, but part of learning how to program is understanding the trade-offs in code/data size, efficiency, and code readability. You can malloc and realloc, but it makes the code much more complex than necessary, and it introduces places where errors may come in - NULL pointers, array index out-of-bounds errors, etc. For most practical cases, allocate what should suffice for your data requirements plus a small amount of breathing room. If you find that you are encountering a lot of cases where the data exceeds the size of your buffer, adjust your buffer to accommodate it - that is what debugging and test cases are for.
