I am currently trying to use zlib for compression in one of my projects. I had a look at the basic zlib tutorial and I am confused by the following statements:
CHUNK is simply the buffer size for feeding data to and pulling data
from the zlib routines. Larger buffer sizes would be more efficient,
especially for inflate(). If the memory is available, buffer sizes on
the order of 128K or 256K bytes should be used.
#define CHUNK 16384
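For context, the inflate loop from the tutorial feeds and drains CHUNK-sized buffers roughly like this (a trimmed sketch of my understanding, with error handling omitted):

#include <stdio.h>
#include <zlib.h>

/* Trimmed sketch of the tutorial-style inflate loop: read up to CHUNK bytes of
   compressed input, then drain CHUNK-sized pieces of decompressed output. */
int decompress(FILE *source, FILE *dest)
{
    unsigned char in[CHUNK], out[CHUNK];
    z_stream strm = {0};                      /* zalloc/zfree/opaque default to Z_NULL */
    if (inflateInit(&strm) != Z_OK)
        return -1;
    int ret;
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        strm.next_in = in;
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = inflate(&strm, Z_NO_FLUSH); /* real code checks for errors here */
            fwrite(out, 1, CHUNK - strm.avail_out, dest);
        } while (strm.avail_out == 0);        /* output buffer was filled: keep draining */
    } while (ret != Z_STREAM_END && !feof(source));
    inflateEnd(&strm);
    return ret == Z_STREAM_END ? 0 : -1;
}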
In my case I will always have a small buffer already available at the output end (around 80 bytes) and will continually feed very small amounts of data (a few bytes at a time) from the input side through zlib. This means I will not have a larger buffer on either side; I am planning on using much smaller ones.
However I am not sure how to interpret "larger buffer sizes would be more efficient". Is this referring to the efficiency of the encoding, or to time/space efficiency?
One idea I have to remedy this situation would be to add another layer of buffering that accumulates data from the input and flushes to the output repeatedly. However this would mean accumulating data and adding more levels of copying, which would also hurt performance.
Now if efficiency is just referring to time/space efficiency, I could simply measure the impact of both methods and decide which one to use. However, if the actual encoding could be affected by the smaller buffer size, this might be really hard to detect.
Does anyone have any experience using zlib with very small buffers?
It means time efficiency. If you give inflate large input and output buffers, it will use faster inflation code internally. It will work just fine with buffers as small as you like (even size 1), but it will be slower.
It is probably worthwhile for you to accumulate input and feed it to inflate in larger chunks. You would also need to provide larger output buffers.
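A minimal sketch of that accumulate-then-inflate approach, with an illustrative staging size (the size and helper names are mine, not zlib's):

#include <string.h>
#include <zlib.h>

#define STAGE_SIZE 16384                 /* illustrative; tune by measuring */

static unsigned char stage[STAGE_SIZE];
static size_t staged = 0;

/* Accumulate the few bytes that arrive at a time and only call inflate()
   once the staging buffer is full.  'out' is the larger output buffer;
   'consume' receives each decompressed piece. */
int feed(z_stream *strm, const unsigned char *data, size_t len,
         unsigned char *out, size_t out_size,
         void (*consume)(const unsigned char *, size_t))
{
    while (len > 0) {
        size_t n = STAGE_SIZE - staged;
        if (n > len)
            n = len;
        memcpy(stage + staged, data, n);
        staged += n;
        data += n;
        len -= n;
        if (staged == STAGE_SIZE) {              /* staging buffer full */
            strm->next_in = stage;
            strm->avail_in = (uInt)staged;
            do {
                strm->next_out = out;
                strm->avail_out = (uInt)out_size;
                int ret = inflate(strm, Z_NO_FLUSH);
                if (ret != Z_OK && ret != Z_STREAM_END && ret != Z_BUF_ERROR)
                    return ret;
                consume(out, out_size - strm->avail_out);
            } while (strm->avail_out == 0);      /* keep going while the output fills up */
            staged = 0;
        }
    }
    return Z_OK;
}

At end of stream you would make one final pass with whatever is still sitting in the staging buffer.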
Related
Let's assume a scenario where I have a lot of log files for a given system; let's imagine that it's petabytes of data. This is my scenario.
Used Technology
For my purposes, I'm going to use C/C++ for this.
My Problem
I have the need to read these files, which are on disk, and do some processing later, whether sending them to a topic on some pub/sub system or simply displaying these logs on screen.
Questions
What is the best buffer size for me to have the best performance in reading this data and which saves hardware resources such as disk and RAM memory?
I just don't know if I should choose 64 Kilobytes, 128 Kilobytes, 5 Megabytes, 10 Megabytes, how do I calculate this?
And if this calculation depends on how much available resource I have, then how to calculate from these resources?
The optimal buffer size depends on many factors, most notably the hardware. You can find out which size is optimal by picking one size, measuring how long the operation takes, then picking another size, measuring, and comparing. Repeat until you find the optimal size; a measurement sketch follows the list of typical sizes below.
Caveats:
You need to measure with the hardware matching the target system to have meaningful measurements.
You also need to measure with inputs comparable to the target task. You may reduce the size of the input by using a subset of real data to make measuring faster, but at some size this may affect the quality of the measurement.
It's possible to encounter a local optimum: a buffer size that is faster than slightly larger or slightly smaller buffers, but not as fast as some other buffer size that is much larger or smaller. General global optimisation techniques, such as simulated annealing, may be used to avoid getting stuck in the search for the optimal value.
Although benchmarking is a simple concept, it's actually quite difficult to do correctly. It's possible and likely that your measurements are biased by incidental factors that may cause differences in performance of the target system. Environment randomisation may help reduce this.
Typical sizes that may be a good starting point to measure are the size of the caches on the system:
Cache line size
L1 cache size
L2 cache size
L3 cache size
Memory page size
SSD DRAM cache size
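A rough C sketch of such a measurement loop (the file name and candidate sizes are placeholders; repeat the runs, and drop or warm the page cache consistently between them, for the numbers to mean anything):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time how long it takes to read a whole file with a given buffer size. */
static double time_read(const char *path, size_t bufsize)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1.0;
    char *buf = malloc(bufsize);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (fread(buf, 1, bufsize, f) == bufsize)
        ;                                       /* discard the data; we only measure I/O */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    fclose(f);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    /* Candidate sizes from page-size-ish up to a few megabytes. */
    size_t sizes[] = { 4096, 65536, 131072, 1 << 20, 5 << 20, 10 << 20 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%zu bytes: %.3f s\n", sizes[i],
               time_read("sample.log", sizes[i]));   /* hypothetical file */
    return 0;
}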
I saw this answer regarding the same question in C#: basically, buffer size doesn't really matter performance-wise (as long as it's a reasonable value). Regarding RAM and disk usage, you will have the same quantity of data to read/write whatever your buffer size might be. Again, as long as you stay within reasonable values you shouldn't have a problem.
Actually, you don't have to load all your data into memory to do anything with it. You only have to read the parts that are concerned.
I have the need to read these files, which are on disk, and do some processing later
Just load them later and pass them to the subsystem at that instant. If you want to display them, simply read, process, and display.
What is the best buffer size for me to have the best performance in reading this data and which saves hardware resources such as disk and RAM memory?
Why do you want to save disk resources? Isn't that where your files are? You have to load data from there to RAM in small quantities, such as one particular log file, then do whatever you want with it and finally flush it all. Repeat.
I just don't know if I should choose 64 Kilobytes, 128 Kilobytes, 5 Megabytes, 10 Megabytes, how do I calculate this?
Again, load the files one by one, not their data in specific amounts.
And if this calculation depends on how much available resource I have, then how to calculate from these resources?
No calculation needed. Just handle RAM resources smartly by focusing on one or maybe two files at a time. Don't worry about disk resources.
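A minimal sketch of that one-file-at-a-time approach with a fixed, modest buffer (the 64 KB size and the file names are placeholders, not a recommendation):

#include <stdio.h>

#define BUF_SIZE (64 * 1024)          /* any reasonable fixed size works */

/* Read one log file in chunks and hand each chunk to the processing step. */
static void process_file(const char *path)
{
    static char buf[BUF_SIZE];
    FILE *f = fopen(path, "rb");
    if (!f)
        return;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        /* publish to the pub/sub topic or display, then forget the chunk */
        fwrite(buf, 1, n, stdout);
    }
    fclose(f);
}

int main(void)
{
    const char *files[] = { "app-2023-01-01.log", "app-2023-01-02.log" };   /* hypothetical */
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        process_file(files[i]);
    return 0;
}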
I need billions of random bytes from arc4random_buf, and my strategy is to request X random bytes at a time, and repeat this many times.
My question is how large X should be. Since the nbytes argument to arc4random_buf can be arbitrarily large, I suppose there must be some kind of internal loop that generates some entropy each time its body is executed. Say, if X is a multiple of the number of random bytes generated in each iteration, performance could be improved because I'm not wasting any entropy.
I’m on macOS, which is unfortunately closed-source, so I cannot simply read the source code. Is there any portable way to determine the optimal X?
Doing some benchmarks on typical target systems is probably the best way to figure this out, but looking at a couple of implementations, it seems unlikely that the buffer size will make much difference to the cost of arc4random_buffer.
The original implementation implements arc4random_buffer as a simple loop around a function which generates one byte. As long as the buffer is big enough to avoid excessive call overhead, it should make little difference.
The FreeBSD library implementation appears to attempt to optimise by periodically computing about 1K of random bytes. Then arc4random_buffer uses memcpy to copy the bytes from the internal buffer to the user buffer.
For the FreeBSD implementation, the optimal buffer size would be the amount of data available in the internal buffer, because that minimizes the number of calls to memcpy. However, there's no way to know how much that is, and it will not be the same on every call because of the rekeying algorithm.
My guess is that you will find very little difference between buffer sizes greater than, say, 16K, and probably even less. For the FreeBSD implementation, it will be very slightly more efficient if your buffer size is a multiple of 8.
Addendum: All the implementations I know of have a global rekey threshold, so you cannot influence the cost of rekeying by changing the buffer size in arc4random_buffer. The library simply rekeys every X bytes generated.
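If you want to settle it empirically rather than guess the internal buffer size, a benchmark along these lines is enough (a sketch for macOS/BSD, where arc4random_buf() is declared in <stdlib.h>; the total and the chunk sizes are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TOTAL (1ULL << 30)            /* 1 GiB per trial; scale as needed */

/* Generate TOTAL random bytes by calling arc4random_buf() in chunks of 'chunk' bytes. */
static double trial(size_t chunk)
{
    unsigned char *buf = malloc(chunk);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long long done = 0; done < TOTAL; done += chunk)
        arc4random_buf(buf, chunk);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    size_t chunks[] = { 256, 1024, 4096, 16384, 65536 };
    for (size_t i = 0; i < sizeof chunks / sizeof chunks[0]; i++)
        printf("chunk %6zu: %.2f s\n", chunks[i], trial(chunks[i]));
    return 0;
}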
edit: I have tried to rephrase this to make it as clear as I can :)
I need to find a suitable way / choose a suitable compression to store a blob of data (say approx. 900KB) in a ROM where the available amount of free space is only about 700KB. If I compress the blob with some modern compression tool (e.g. WinZip/WinRAR) I can achieve the required compression easily.
The catch here is that the decompression will take place on very, VERY, VERY limited hardware, where I can't afford to have more than a few bytes of RAM available (say no more than 100 bytes, for the sake of it).
I already tried RLE'ing the data... the data hardly compresses.
While I'm working on changing the data blob format so that it could potentially have more redundancy and achieve a better compression ratio, I'm at the same time seeking a compression method that will let me decompress on my limited hardware. I have limited knowledge of compression algorithms, so I'm seeking suggestions/pointers to continue with my hunt.
Thanks!
Original question was "I need info/pointers on decompression algorithms that can work without using the uncompressed data, as this will be unavailable right after decompression. LZ like approaches would still be preferred."
I'm afraid this is off topic because it is too broad.
LZW uses a sizable state that is not very different from keeping a slice of the uncompressed data. Even if the state is constant and read from ROM, handling it with just registers seems difficult. There are many different algorithms that can use a constant state, but if you really have NO RAM, then only the most basic algorithms can be used.
Look up RLE, run length encoding.
EDIT: OK, no sliding window, but if you can access ROM, 100 bytes of RAM give you quite some possibilities. You want to implement this in assembly, so stick with very simple algorithms. RLE plus a dictionary. Given your requirements, the choice of algorithm should be based on the type of data you need to decompress.
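To give an idea of how little state such a decoder needs, here is a minimal RLE decoder sketch; the (count, value) byte-pair format is an assumption, not a standard:

/* Decode ROM-resident RLE data encoded as (count, value) byte pairs.
   Working state is two counters and one byte, so it can live entirely in registers. */
void rle_decode(const unsigned char *src, unsigned len,
                void (*emit)(unsigned char))
{
    unsigned i = 0;
    while (i + 1 < len) {
        unsigned char count = src[i++];
        unsigned char value = src[i++];
        while (count--)
            emit(value);              /* write the byte to the output device/port */
    }
}

A dictionary variant works the same way, except that one of the codes selects a fixed string stored in ROM instead of a repeated byte.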
I'm working with radio-to-radio communications where bandwidth is really, really precious. It's all done with on-the-metal C code (no OS, small Atmel 8-bit microprocessors). So the idea of compression becomes appealing for some large, but rare, transmissions.
I'm no compression expert. I've used the command line tools to shrink files and looked at how much I get. And linked a library or two over the years. But never anything this low level.
In one example, I want to move about 28K over the air between processors. If I just do a simple bzip2 -9 on a representative file, I get about 65% of the original size.
But I'm curious whether I can do better. I am (naively?) under the impression that most basic compression formats must have some declaration of metadata up front that describes how to inflate the bitstream that follows. What I don't know is how much space that metadata itself takes up. I histogrammed said file, and a number of other ones, and found that due to the nature of what's being transmitted, the histogram is almost always about the same. So I'm curious whether I could hard-code these frequencies in my code so that the table was no longer dynamic, and also wasn't transmitted as part of the packet.
For example, my understanding of Huffman encoding is that usually there's a "dictionary" up front, followed by a bitstream. And that if a compressor works by blocks, each block will have its own dictionary.
On top of this, it's a small processor, with a small footprint, I'd like to keep whatever I do small, simple, and straightforward.
So I guess the basic question is: what basic compression algorithm, if any, would you implement in this kind of environment/scenario, especially taking into account that you can basically precompile a representative histogram of the bytes per transmission?
What you are suggesting, providing preset frequency data, would help very little. Or more likely it would hurt, since you will take a hit by not using optimal codes. As an example, only about 80 bytes at the start of a deflate block is needed to represent the literal/length and distance Huffman codes. A slight increase in the, say, 18 KB of your compressed data could easily cancel that.
With zlib, you could use a representative one of your 28K messages as a dictionary in which to search for matching strings. This could help the compression quite a bit, if there are many common strings in your messages. See deflateSetDictionary() and inflateSetDictionary().
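A sketch of that preset-dictionary approach (the dictionary contents and buffer sizes are placeholders; both ends must be built with the same dictionary bytes):

#include <string.h>
#include <zlib.h>

/* A representative past message, baked into both transmitter and receiver. */
static const unsigned char dict[] = "...representative message bytes...";

/* One-shot compress of msg into out using the shared preset dictionary.
   Returns the compressed size; assumes out_size is large enough. */
size_t compress_with_dict(const unsigned char *msg, size_t len,
                          unsigned char *out, size_t out_size)
{
    z_stream s;
    memset(&s, 0, sizeof s);
    deflateInit(&s, Z_BEST_COMPRESSION);
    deflateSetDictionary(&s, dict, sizeof dict - 1);   /* before the first deflate() call */
    s.next_in = (Bytef *)msg;
    s.avail_in = (uInt)len;
    s.next_out = out;
    s.avail_out = (uInt)out_size;
    deflate(&s, Z_FINISH);
    size_t n = out_size - s.avail_out;
    deflateEnd(&s);
    return n;
}

On the receiving side of a zlib-wrapped stream, inflate() returns Z_NEED_DICT, at which point the same dictionary is supplied with inflateSetDictionary() and inflation continues.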
I am writing a ZLIB-like API for an embedded hardware compressor which uses the deflate algorithm to compress a given input stream.
Before going further I would like to explain data compression ratio. Data compression ratio is defined as the ratio between the uncompressed size and the compressed size.
Compression ratio is usually greater than one, which means compressed data is usually smaller than uncompressed data, which is the whole point of compression. But this is not always the case. For example, using the ZLIB library on pseudo-random data generated on a Linux machine gives a compression ratio of roughly 0.996, which means 9960 bytes compressed into 10000 bytes.
I know ZLIB handles this situation by using a type 0 block, where it simply returns the original uncompressed data with a roughly 5-byte header, so it gives only 5 bytes of overhead per data block of up to 64KB. This is an intelligent solution to the problem, but for some reason I cannot use it in my API. I have to provide extra safe space in advance to handle this situation.
Now if I knew the lowest possible data compression ratio, it would be easy for me to calculate the extra space I have to provide. Otherwise, to be safe, I have to provide more extra space than needed, which can be crucial in an embedded system.
While calculating the data compression ratio, I am not concerned with header, footer, extremely small datasets, or system-specific details, as I am handling those separately. What I am particularly interested in: does there exist any real dataset, with a minimum size of 1K, which gives a compression ratio of less than 0.99 using the deflate algorithm? In that case the calculation would be:
Compression ratio = uncompressed size / (compressed size using deflate, excluding header, footer and system-specific overhead)
Please provide feedback. Any help would be appreciated. It would be great if a reference to such a dataset could be provided.
EDIT:
@MSalters' comment indicates that the hardware compressor is not following the deflate specification properly, and this could be a bug in the microcode.
Because of the pigeonhole principle
http://en.wikipedia.org/wiki/Pigeonhole_principle
you will always have strings that get compressed and strings that get expanded.
http://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/
Theoretically you can achieve the best compression with zero-entropy data (an unbounded compression ratio) and the worst compression with maximum-entropy data (e.g. AWGN noise), where the compression ratio falls to about 1, or slightly below once the overhead is counted.
I can't tell from your question whether you're using zlib or not. If you're using zlib, it provides a function, deflateBound(), which does exactly what you're asking for, taking an uncompressed size and returning the maximum compressed size. It takes into account how the deflate stream was initialized with deflateInit() or deflateInit2() in computing the proper header and trailer sizes.
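For reference, a small sketch of sizing the output buffer with deflateBound() up front (the helper name is mine, not zlib's):

#include <string.h>
#include <zlib.h>

/* Compute a safe worst-case output-buffer size for a one-shot deflate of src_len bytes. */
unsigned long safe_output_size(unsigned long src_len)
{
    z_stream s;
    memset(&s, 0, sizeof s);
    deflateInit(&s, Z_DEFAULT_COMPRESSION);   /* must match the settings of the real stream */
    unsigned long bound = deflateBound(&s, src_len);
    deflateEnd(&s);
    return bound;                             /* includes the zlib header and trailer */
}

Because the bound depends on how the stream was initialized, the temporary stream here has to use the same deflateInit()/deflateInit2() parameters as the stream that will actually do the compression.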
If you're writing your own deflate, then you will already know what the maximum compressed size is based on how often you allow it to use stored blocks.
Update: The only way to know for sure the maximum data expansion of a hardware deflator is to obtain the algorithm used. Then through inspection you can determine how often it will emit stored blocks for random data.
The only alternative is empirical and unreliable. You can feed the hardware compressor random data, and examine the results. You can use infgen to disassemble the deflate output and see the stored blocks and their sizes. Then you can write a linear bounding formula for the expansion. Then add some margin to the additive and multiplicative terms to cover for situations that you did not observe in your tests.
This will only work if the hardware deflate algorithm is well behaved, which means that it will not write a fixed or dynamic deflate block if a stored block would be smaller. If it is not well behaved, then all bets are off.
The deflate format takes the same approach that the question describes for ZLIB. Each block starts with a 3-bit header, and the two block-type bits are 00 when the following block is stored: length-prefixed but otherwise uncompressed.
This means the worst case is a one-byte input that blows up to 6 bytes (3 bits of header, 5 bits of padding to the byte boundary, 32 bits of length, 8 bits of data), so the worst ratio is 1/6 ≈ 0.17.
This is of course assuming an optimal encoder. A suboptimal encoder would transmit a Huffman table for that one byte.
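For larger inputs the stored-block overhead amortizes to roughly 5 bytes per block of at most 65535 bytes, which matches the figure quoted in the question. A rough worst-case bound, assuming a well-behaved encoder that falls back to stored blocks and ignoring the zlib/gzip wrapper, is sketched below:

/* Worst-case deflate output size for n input bytes with a well-behaved encoder:
   5 bytes of overhead per stored block of at most 65535 bytes, plus the data itself.
   The zlib or gzip wrapper is not included. */
unsigned long deflate_stored_bound(unsigned long n)
{
    unsigned long blocks = (n + 65534) / 65535;   /* ceil(n / 65535) */
    if (blocks == 0)
        blocks = 1;                               /* even empty input needs one block */
    return n + 5 * blocks;
}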