Can anybody give pointers on how I can implement LZW compression/decompression in low-memory conditions (< 2 KB)? Is that possible?
The zlib library that everyone uses is bloated, among other problems (for embedded use). I am pretty sure it won't work for your case. I had a little more memory, maybe 16K, and couldn't get it to fit. It allocates and zeros large chunks of memory, keeps copies of things, etc. The algorithm can maybe do it, but finding existing code is the challenge.
I went with http://lzfx.googlecode.com. The decompression loop is tiny; it is the older LZ type of compression that relies on prior results, so you need access to the already-uncompressed output: the next byte is a 0x5, the next byte is a 0x23, the next 15 bytes are a copy of the 15 bytes that came 200 bytes ago, the next 6 bytes are a copy of the bytes from 127 bytes ago, and so on. The newer LZ algorithms are variable-width and table-based, and the tables can be big or can grow depending on how they are implemented.
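To make that concrete, here is a minimal sketch of that kind of back-reference copy loop in C. It is not the lzfx code; the token parsing around it is omitted and the function name is made up for illustration.

#include <stddef.h>
#include <stdint.h>

/* Copy `len` bytes starting `dist` bytes back in the already-decoded output.
   A byte-by-byte copy is required because the source and destination ranges
   may overlap (e.g. dist < len repeats a short pattern). */
static size_t copy_backref(uint8_t *out, size_t out_pos, size_t dist, size_t len)
{
    while (len--) {
        out[out_pos] = out[out_pos - dist];
        out_pos++;
    }
    return out_pos;   /* new write position */
}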
I was dealing with repetitive data and trying to squeeze a few K down into a few hundred bytes; I think the compression was about 50%. Not great, but it did the job, and the decompression routine was tiny. The lzfx package above is small, not like zlib: two main functions with the code right there, not dozens of files. You could likely change the depth of the buffer, and perhaps improve the compression algorithm if you so desire. I did have to modify the decompression code (maybe 20 or 30 lines): it was pointer-heavy and I switched it to arrays, because in my embedded environment the pointers were in the wrong place. That burns maybe an extra register or not, depending on how you implement it and on your compiler. I also did it so I could abstract the fetches and stores of the bytes, since I had them packed into memory that wasn't byte-addressable.
If you find something better, please post it here or ping me through Stack Overflow; I am also very interested in other embedded solutions. I searched quite a bit, and the above was the only useful one I found. I was lucky that my data was such that it compressed well enough using that algorithm... for now.
Why LZW? LZW needs lots of memory. It is based on a hash/dictionary, and the compression ratio is proportional to the hash/dictionary size: more memory, better compression; less memory, and the output can even be larger than the input.
I haven't touched encoding for a very long time, but IIRC Huffman coding is a little bit better when it comes to memory consumption.
But it all depends on the type of information you want to compress.
I have used LZSS. I used code from Haruhiko Okumura as a base. It uses the last portion of the uncompressed data (2K) as the dictionary. The code I linked can be modified to use almost no extra memory if you have all the uncompressed data available in memory. With a bit of googling you will find a lot of different implementations.
If the choice of compression algorithm isn't set in stone, you might try gzip/LZ77 instead. Here's a very simple implementation I used and adapted once:
ftp://quatramaran.ens.fr/pub/madore/misc/myunzip.c
You'll need to clean up the way it reads input, the error handling, etc., but it's a good start. It's probably also way too big if your data AND code need to fit in 2k, but at least the data side is already small.
Big plus is that it's public domain so you can use it however you like!
It has been over 15 years since I last played with the LZW compression algorithm, so take the following with a grain of salt.
Given the memory constraints, this is going to be difficult at best. The dictionary you build is going to consume the vast majority of what you have available. (Assuming that code + memory <= 2k.)
Pick a small fixed size for your dictionary. Say 1024 entries.
Let each dictionary entry take the form of ....
struct entry {
    intType  prevIdx;   /* index of the dictionary entry this one extends */
    charType newChar;   /* character appended to that earlier entry */
};
This structure makes the dictionary recursive. You need the item at the previous index to be valid in order for it to work properly. Is this workable? I'm not sure. However, let us assume for the moment that it is and find out where it leads us ....
If you use the standard types for int and char, you are going to run out of memory fast. You will want to pack things together as tightly as possible. 1024 entries will take 10 bits to store. Your new character will likely take 8 bits. Total = 18 bits.
18 bits * 1024 entries = 18432 bits or 2304 bytes.
At first glance this appears too large. What do we do? Take advantage of the fact that the first 256 entries are already known: your typical extended ASCII set, or what have you. This means we really need only 768 entries.
768 * 18 bits = 13824 bits or 1728 bytes.
This leaves you with about 320 bytes to play with for code. Naturally, you can play around with the dictionary size and see what's good for you, but you will not end up with very much space for your code. Since you are looking at so little code space, I would expect that you would end up coding in assembly.
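For illustration, here is a rough sketch of how those 18-bit entries could be packed into a flat byte array in C. Everything here (the names, the 10+8 bit layout, the padding bytes) is an assumption layered on the numbers above, not a reference implementation; entry i corresponds to code 256 + i.

#include <stdint.h>

#define ENTRY_BITS    18u
#define DICT_ENTRIES  768u                                     /* codes 256..1023 */
#define DICT_BYTES    ((DICT_ENTRIES * ENTRY_BITS + 7u) / 8u)  /* 1728 bytes */

static uint8_t dict[DICT_BYTES + 3];   /* +3 so 4-byte accesses never overrun */

/* Entry i holds prevIdx (10 bits) in the high bits and newChar (8 bits) low. */
static void dict_set(unsigned i, unsigned prevIdx, uint8_t newChar)
{
    uint32_t value  = ((uint32_t)(prevIdx & 0x3FFu) << 8) | newChar;
    uint32_t bitpos = (uint32_t)i * ENTRY_BITS;
    unsigned byte   = (unsigned)(bitpos >> 3);
    unsigned shift  = (unsigned)(bitpos & 7u);
    uint32_t mask   = (((uint32_t)1 << ENTRY_BITS) - 1u) << shift;
    uint32_t word   =  (uint32_t)dict[byte]
                    | ((uint32_t)dict[byte + 1] << 8)
                    | ((uint32_t)dict[byte + 2] << 16)
                    | ((uint32_t)dict[byte + 3] << 24);
    word = (word & ~mask) | (value << shift);
    dict[byte]     = (uint8_t)word;
    dict[byte + 1] = (uint8_t)(word >> 8);
    dict[byte + 2] = (uint8_t)(word >> 16);
    dict[byte + 3] = (uint8_t)(word >> 24);
}

static void dict_get(unsigned i, unsigned *prevIdx, uint8_t *newChar)
{
    uint32_t bitpos = (uint32_t)i * ENTRY_BITS;
    unsigned byte   = (unsigned)(bitpos >> 3);
    unsigned shift  = (unsigned)(bitpos & 7u);
    uint32_t word   =  (uint32_t)dict[byte]
                    | ((uint32_t)dict[byte + 1] << 8)
                    | ((uint32_t)dict[byte + 2] << 16)
                    | ((uint32_t)dict[byte + 3] << 24);
    uint32_t value  = (word >> shift) & (((uint32_t)1 << ENTRY_BITS) - 1u);
    *prevIdx = (unsigned)((value >> 8) & 0x3FFu);
    *newChar = (uint8_t)(value & 0xFFu);
}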
I hope this helps.
My best recommendation is to examine the BusyBox source and see if their LZW implementation is sufficiently small to work in your environment.
The smallest dictionary for LZW is a trie built on a linked list. See the original implementation in LZW AB. I've rewritten it in my fork, LZWS. The fork is compatible with compress. Detailed documentation is here.
An n-bit dictionary requires (2 ** n) * sizeof(code) + ((2 ** n) - 257) * sizeof(code) + ((2 ** n) - 257) bytes.
So:
9 bit code - 1789 bytes.
12 bit code - 19709 bytes.
16 bit code - 326909 bytes.
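As a sanity check, the formula reproduces those numbers when sizeof(code) is 2 bytes (a 16-bit code type, which is an assumption here). A quick throwaway sketch:

#include <stdint.h>
#include <stdio.h>

typedef uint16_t code_t;   /* assumed: 2-byte code type */

static unsigned long lzw_dict_bytes(unsigned n_bits)
{
    unsigned long codes = 1ul << n_bits;
    return codes * sizeof(code_t)             /* (2**n) * sizeof(code)         */
         + (codes - 257) * sizeof(code_t)     /* ((2**n) - 257) * sizeof(code) */
         + (codes - 257);                     /* (2**n) - 257                  */
}

int main(void)
{
    printf("%lu %lu %lu\n",
           lzw_dict_bytes(9), lzw_dict_bytes(12), lzw_dict_bytes(16));
    /* prints: 1789 19709 326909 */
    return 0;
}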
Please be aware that this is the requirement for the dictionary alone. You also need about 100-150 bytes for state and stack variables.
The decompressor will use less memory than the compressor.
So I think you can try to compress your data with the 9-bit version, but it won't provide a good compression ratio. The more bits you have, the better the ratio.
typedef unsigned int  UINT;
typedef unsigned char BYTE;

/* Prototypes for an LZW encoder/decoder. The `long &totalsize` reference
   parameters are C++ syntax; in plain C you would pass `long *totalsize`. */
BYTE *lzw_encode(BYTE *input, BYTE *output, long filesize, long &totalsize);
BYTE *lzw_decode(BYTE *input, BYTE *output, long filesize, long &totalsize);
Related
To start off: this might be a duplicate, but I can't seem to find a definitive answer to this question after having searched for it on Google.
For a project I am designing a program that makes two ATmega328P chips communicate. At this moment I'm testing the best speed to do this, but my goal is to achieve really high baud rates. I have plenty of experience with making code efficient, but not with the memory-management part. The problem:
I want to store a multiple of 8 bits (e.g. 48 bits). My first thought was to use an array of length 6 and type uint8_t, but I don't know how efficient arrays are compared to other types. Some people say pointers are more efficient and others say it doesn't matter, but I can't find a definitive answer on what the case is for really small amounts of memory. Last question: I know the size of the data sent will never be bigger than 64 bits, so would it matter if I just always used uint64_t?
Edit:
To clarify: my goal is to minimize the storage size, not the transmission size.
Edit 2:
What I meant by having a varying size: the size is determined at compile time, not while the program is running.
The ATmega328P is an 8-bit processor; its registers and data operations are 8 bits wide. Nothing will be faster than simply using a uint8_t array.
What you can do is, when you compile, look at your .lss file: it will show you the assembly code. Then you can look up the AVR instruction set and see the clock cycles each instruction will take. I think you will find that using a uint64_t will just add unnecessary overhead unless you are very careful with the way you put the bytes into it.
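As a rough illustration (all names made up), keeping the payload in a plain uint8_t array maps directly onto the 8-bit registers, while pulling a byte out of a uint64_t forces the compiler to build multi-byte shift sequences:

#include <stdint.h>

#define PAYLOAD_BYTES 6   /* 48 bits */

/* Storing the payload as bytes: reading or writing one element is a single
   8-bit load/store on the AVR. */
static uint8_t payload[PAYLOAD_BYTES];

static void payload_set_byte(uint8_t idx, uint8_t value)
{
    payload[idx] = value;
}

/* For comparison: extracting byte idx from a uint64_t needs a variable shift,
   which the 8-bit core has to synthesize from many instructions. */
static uint8_t u64_get_byte(uint64_t v, uint8_t idx)
{
    return (uint8_t)(v >> (8u * idx));
}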
If the length of your packets might vary, the most efficient approach would be to compress the packet before communication.
For example, the first 3 bits of each packet could encode the size of that packet.
The compressed packets are transmitted faster and use up less memory.
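For what it's worth, here is a tiny sketch of the 3-bit length-field idea; the exact header layout (length stored as bytes minus one, low bits left for flags) is invented for illustration:

#include <stdint.h>

/* One header byte: top 3 bits = payload length in bytes minus one
   (so 1..8 bytes, i.e. up to 64 bits), low 5 bits free for flags. */
static uint8_t make_header(uint8_t payload_len, uint8_t flags)
{
    return (uint8_t)((((payload_len - 1u) & 0x07u) << 5) | (flags & 0x1Fu));
}

static uint8_t header_payload_len(uint8_t header)
{
    return (uint8_t)((header >> 5) + 1u);
}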
I need to allocate memory on the order of 10^15 elements to store integers which can be of long long type.
If I use an array and declare something like
long long a[1000000000000000];
that's never going to work. So how can I allocate such a huge amount of memory?
Really large arrays generally aren't a job for memory, more one for disk. 10^15 array elements at 64 bits apiece is (I think) 8 petabytes. You can pick up 8G memory modules for about $15 at the moment so, even if your machine could handle that much memory or address space, you'd be outlaying about $15 million.
In addition, with upcoming DDR4 being clocked at up to about 4 GT/s (giga-transfers per second), even if each transfer were a full 64-bit value it would still take about 250,000 seconds, roughly three days, just to initialise that array to zero. Do you really want to be waiting around for days before your code even starts doing anything useful?
And, even if you go the disk route, that's quite a bit. At (roughly) $50 per TB, you're still looking at $400,000 and you'll possibly have to provide your own software for managing those 8,000 disks somehow. And I'm not even going to contemplate figuring out how long it would take to initialise the array on disk.
You may want to think about rephrasing your question to indicate the actual problem rather than what you currently have, a proposed solution. It may be that you don't need that much storage at all.
For example, if you're talking about an array where many of the values are left at zero, a sparse array is one way to go.
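If the sparse route fits your problem, one rough sketch is to keep only the nonzero elements in a hash table keyed by the logical index. Everything here (names, the fixed table size, the hash) is illustrative, and the table never resizes, so it must not fill up:

#include <stdint.h>

#define TABLE_SIZE (1u << 20)   /* room for up to ~1M nonzero entries */

typedef struct {
    uint64_t  index;   /* logical array index */
    long long value;
    int       used;
} slot_t;

static slot_t table[TABLE_SIZE];

static uint64_t slot_for(uint64_t index)
{
    /* Fibonacci hashing: multiply by a large odd constant, keep the top 20 bits. */
    return (index * 11400714819323198485ull) >> 44;
}

/* Insert or overwrite; the table must never become completely full. */
static void sparse_set(uint64_t index, long long value)
{
    uint64_t h = slot_for(index);
    while (table[h].used && table[h].index != index)
        h = (h + 1) % TABLE_SIZE;          /* linear probing */
    table[h].index = index;
    table[h].value = value;
    table[h].used  = 1;
}

/* Elements that were never set read back as zero. */
static long long sparse_get(uint64_t index)
{
    uint64_t h = slot_for(index);
    while (table[h].used) {
        if (table[h].index == index)
            return table[h].value;
        h = (h + 1) % TABLE_SIZE;
    }
    return 0;
}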
You can't. You don't have all this memory, and you won't have it for a while. Simple.
EDIT: If you really want to work with data that does not fit into your RAM, you can use a library that works with mass-storage data, like STXXL, but it will be a lot slower, and you are still limited by disk size.
MPI is what you need; that's actually a small size for parallel computing problems. The Blue Gene/Q monster at Lawrence Livermore National Laboratory holds around 1.5 PB of RAM. You need to use block decomposition to divide up your problem, and voilà!
The basic approach is dividing the array into equal blocks or chunks among many processors.
You need to upgrade to a 64-bit system, get a 64-bit-capable compiler, and then put an L suffix at the end of 1000000000000000.
Have you heard of sparse matrix implementations? With a sparse matrix, you only store a very small part of the matrix despite the matrix being huge.
Here are some libraries for you.
Here is some basic info about sparse matrices. You don't actually use all of the matrix, just the few points you need.
I am writing a program in C to solve an optimisation problem, for which I need to create an array of type float with on the order of 10^13 elements. Is it practically possible to do so on a machine with 20 GB of memory?
A float in C occupies 4 bytes (assuming IEEE floating-point arithmetic, which is pretty close to universal nowadays). That means 10^13 elements are naïvely going to require 4×10^13 bytes of space. That's quite a bit (40 TB, a.k.a. quite a lot of disk for a desktop system, and rather more than most people can afford when it comes to RAM), so you need to find another approach.
Is the data sparse (i.e., mostly zeroes)? If it is, you can try using a hash table or tree to store only the values which are anything else; if your data is sufficiently sparse, that'll let you fit everything in. Also be aware that processing 10^13 elements will take a very long time. Even if you could process a billion items a second (very fast, even now) it would still take 10^4 seconds (several hours), and I'd be willing to bet that in any non-trivial situation you'll not be able to get anything near that speed. Can you find some way to make not just the data storage sparse but also the processing, so that you can leave that massive bulk of zeroes alone?
Of course, if the data is non-sparse then you're doomed. In that case, you might need to find a smaller, more tractable problem instead.
I suppose if you had a 64 bit machine with a lot of swap space, you could just declare an array of size 10^13 and it may work.
But for a data set of this size it becomes important to consider carefully the nature of the problem. Do you really need random access read and write operations for all 10^13 elements? Is the array at all sparse? Could you express this as a map/reduce problem? If so, sequential access to 10^13 elements is much more practical than random access.
I would like to represent a structure containing 250M states (1 bit each) in as little memory as possible (100K maximum). The operations on it are set/get. I could not say whether it's dense or sparse; it may vary.
The language I want to use is C.
I looked at other threads here to find something suitable. A probabilistic structure like a Bloom filter, for example, would not fit because of the possible false answers.
Any suggestions please?
If you know your data might be sparse, then you could use run-length encoding. But otherwise, there's no way you can compress it.
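A rough sketch of what run-length encoding the bit data could look like, using an invented format (alternating run lengths, one byte each, starting with a run of zero bits):

#include <stddef.h>
#include <stdint.h>

/* Encode nbits bits (packed LSB-first in `bits`) as alternating run lengths.
   Output bytes are run lengths (0..255); runs alternate 0s and 1s, starting
   with 0s. `out` must be big enough for the worst case (one byte per bit). */
static size_t rle_encode_bits(const uint8_t *bits, size_t nbits, uint8_t *out)
{
    size_t out_len = 0, run = 0;
    int current = 0;                        /* the first run counts 0 bits */
    for (size_t i = 0; i < nbits; i++) {
        int bit = (bits[i >> 3] >> (i & 7)) & 1;
        if (bit == current && run < 255) {
            run++;
        } else if (bit == current) {        /* run longer than 255 bits */
            out[out_len++] = 255;
            out[out_len++] = 0;             /* empty run of the other value */
            run = 1;
        } else {
            out[out_len++] = (uint8_t)run;
            current = bit;
            run = 1;
        }
    }
    out[out_len++] = (uint8_t)run;          /* flush the final run */
    return out_len;
}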
The size of the structure depends on the entropy of the information. You cannot squeeze information into less than a given size if there is no repeated pattern. The worst case would still be about 32 MB of storage in your case. If you know something about the relation between the bits, then it may be possible...
I don't think it's possible to do what you're asking. If you need to cover 250 million states of 1 bit each, you'd need 250Mbits/8 = 31.25MBytes. A far cry from 100KBytes.
You'd typically create a large array of bytes, and use functions to determine the byte (index >> 3) and bit position (index & 0x07) to set/clear/get.
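That approach, in a minimal sketch (the bitmap is sized for 250M bits just to show the scale):

#include <stdint.h>

#define NUM_BITS 250000000ull

static uint8_t bitmap[NUM_BITS / 8 + 1];   /* ~31.25 MB */

static void bit_set(uint64_t index)
{
    bitmap[index >> 3] |= (uint8_t)(1u << (index & 0x07));
}

static void bit_clear(uint64_t index)
{
    bitmap[index >> 3] &= (uint8_t)~(1u << (index & 0x07));
}

static int bit_get(uint64_t index)
{
    return (bitmap[index >> 3] >> (index & 0x07)) & 1;
}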
250M bits will take 31.25 megabytes to store (assuming 8 bits/byte, of course), much much more than your 100k goal.
The only way to beat that is to start taking advantage of some sparseness or pattern in your data.
The max number of bits you can store in 100K of mem is 819,200 bits. This is assuming that 1 K = 1024 bytes, and 1 byte = 8 bits.
Are files possible in your environment? If so, you might swap in, say, 4K-sized segments of the bit buffer. Your solution should access those bits in a serialized way to minimize disk load/save operations.
I have a generic growing buffer intended to accumulate "random" string pieces and then fetch the result. Code to handle that buffer is written in plain C.
Pseudocode API:
void write(buffer_t *buf, const unsigned char *bytes, size_t len);   /* appends */
const unsigned char *buffer(buffer_t *buf);                          /* returns accumulated data */
I'm thinking about the growth strategy I should pick for that buffer.
I do not know whether my users would prefer memory or speed, or what the nature of the user's data would be.
I've seen two strategies in the wild: grow the buffer by fixed-size increments (that is what I've currently implemented) or grow it exponentially. (There is also the strategy of allocating exactly the amount of memory needed, but that is not very interesting in my case.)
Perhaps I should let the user pick the strategy... but that would make the code a bit more complex...
Once upon a time, Herb Sutter wrote (referencing Andrew Koenig) that the best strategy is, probably, exponential growth with factor 1.5 (search for "Growth Strategy"). Is this still the best choice?
Any advice? What does your experience say?
Unless you have a good reason to do otherwise, exponential growth is probably the best choice. Using 1.5 for the growth factor isn't really magical, and in fact that's not what Andrew Koenig originally said. What he originally said was that the growth factor should be less than (1+sqrt(5))/2 (about 1.618, the golden ratio); below that, the memory freed by earlier reallocations can eventually add up to enough to satisfy a later one and be reused.
Pete Becker says that, when he was at Dinkumware, P.J. Plauger (the owner of Dinkumware) told him they had done some testing and found that 1.5 worked well. When you allocate a block of memory, the allocator will usually hand out a block that's at least slightly larger than you requested, to give it room for a little bookkeeping information. My guess (though unconfirmed by any testing) is that reducing the factor a little lets the real block size still fit within that limit.
References:
I believe Andrew originally published this in a magazine (the Journal of Object Oriented Programming, IIRC) which hasn't been published in years now, so getting a re-print would probably be quite difficult.
Andrew Koenig's Usenet post, and P.J. Plauger's Usenet post.
The exponential growth strategy is used throughout the STL and it seems to work fine. I'd say stick with that, at least until you find a definite case where it won't work.
I usually use a combination of adding a small fixed amount and multiplying by 1.5, because it is efficient to implement and leads to reasonable step widths, which are proportionally bigger at first and more memory-conscious as the buffer grows. As the fixed offset I usually use the initial size of the buffer, and I start with rather small initial sizes:
new_size = old_size + ( old_size >> 1 ) + initial_size;
As initial_size I use 4 for collection types; 8, 12, or 16 for string types; and 128 to 4096 for input/output buffers, depending on the context.
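A minimal sketch of that rule applied to a growable buffer (the buffer_t fields are assumptions about the asker's type, and error handling is reduced to a return code):

#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;
    size_t len;        /* bytes used */
    size_t cap;        /* bytes allocated */
    size_t initial;    /* fixed increment, e.g. 128 for an output buffer */
} buffer_t;

/* Grow capacity using new_size = old_size + old_size/2 + initial_size. */
static int buffer_reserve(buffer_t *buf, size_t extra)
{
    if (buf->len + extra <= buf->cap)
        return 0;
    size_t step = buf->initial ? buf->initial : 16;
    size_t new_cap = buf->cap ? buf->cap : step;
    while (new_cap < buf->len + extra)
        new_cap = new_cap + (new_cap >> 1) + step;
    unsigned char *p = realloc(buf->data, new_cap);
    if (!p)
        return -1;
    buf->data = p;
    buf->cap = new_cap;
    return 0;
}

static int buffer_write(buffer_t *buf, const unsigned char *bytes, size_t len)
{
    if (buffer_reserve(buf, len) != 0)
        return -1;
    memcpy(buf->data + buf->len, bytes, len);
    buf->len += len;
    return 0;
}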
Here is a little chart that shows that this grows much faster (yellow+red) in the early steps compared to multiplying by 1.5 only (red).
So if you started with 100, you would need, for example, 6 increases to accommodate 3000 elements, while multiplying by 1.5 alone would need 9.
At larger sizes the influence of the addition becomes negligible, so both approaches then scale equally well at a factor of 1.5. These are the effective growth factors of the first steps if you use the initial size as the fixed amount for the addition:
2.5
1.9
1.7
1.62
1.57
1.54
1.53
1.52
1.51
1.5
...
The key point is that the exponential growth strategy lets you avoid expensive copies of the buffer content every time you hit the current capacity, at the cost of some wasted memory. The article you link has the numbers for the trade-off.
The answer, as always, is "it depends".
The idea behind exponential growth, i.e. allocating a new buffer that is x times the current size, is that as you require more buffer, you'll keep needing more buffer, and the chances are that you'll be needing much more than a small fixed increment provides.
So, if you have an 8-byte buffer and need more, allocating an extra 8 bytes is OK, and then allocating an additional 16 bytes is probably a good idea: someone with a 16-byte buffer is not likely to require just 1 extra byte. And if they do, all that's happening is that you're wasting a little memory.
I thought the best growth factor was 2, i.e. double your buffer, but if Koenig/Sutter say 1.5 is optimal, then I agree with them. You may want to tweak your growth rate after getting some usage statistics, though.
So exponential growth is a good trade-off between performance and keeping memory usage low.
Double the size until a threshold (~100 MB?) and then lower the growth factor toward 1.5, ..., 1.3.
Another option would be to make the default buffer size configurable at runtime.
The point of using exponential growth (whether the factor is 1.5 or 2) is to avoid copies. Each time you realloc the array, you can trigger an implicit copy of its contents, which, of course, gets more expensive the larger the array gets. By using exponential growth, you get an amortized constant number of copies per element; that is, you rarely end up copying.
As long as you're running on a desktop computer of some kind, you can expect an essentially unlimited amount of memory, so time is probably the right side of that tradeoff. For hard real-time systems, you would probably want to find a way to avoid the copies altogether -- a linked list comes to mind.
There's no way anyone can give good advice without knowing something about the allocations, runtime environment, execution characteristics, etc., etc.
Code which works is way more important than highly optimized code that is still under development. Choose some algorithm, any workable algorithm, and try it! If it proves suboptimal, then change the strategy. Placing this in the control of the library user often does them no favors. But if you already have some option scheme in place, adding it could be useful, unless you hit on a good algorithm (and growth by a factor of 1.5 is a pretty good one).
Also, in C (not C++) an external function named write collides with the POSIX write() (declared in <unistd.h>, or <io.h> on Windows) that stdio itself relies on. It's fine if nothing uses that, but it would also be hard to add later. Best to use a more descriptive name.
As a wild idea, for this specific case, you could change the API to require the caller to allocate the memory for each chunk, and then remember the chunks instead of copying the data.
Then, when it's time to actually produce the result, you know exactly how much memory is going to be needed and can allocate exactly that.
This has the benefit that the caller will need to allocate memory for the chunks anyway, and so you might as well make use of that. This also avoids copying data more than once.
It has the disadvantage that the caller will have to dynamically allocate each chunk. To get around that, you could allocate memory for each chunk yourself and remember those, rather than keeping one large buffer that gets resized when it fills up. This way, you'll copy data twice (once into the chunk you allocate, and once more into the resulting string), but no more than that. If you have to resize a single buffer several times, you may end up with more than two copies.
Further, really large areas of free memory may be difficult for the memory allocator to find. Allocating smaller chunks may well be easier: there might not be space for a one-gigabyte chunk of memory, but there might be space for a thousand one-megabyte chunks.
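A rough sketch of that chunk-remembering approach, following the first variant (the library only records the caller's chunks and concatenates once at the end); all names are invented and the ownership rules are just one possible choice:

#include <stdlib.h>
#include <string.h>

typedef struct {
    const unsigned char *data;
    size_t len;
} chunk_t;

typedef struct {
    chunk_t *chunks;
    size_t count, cap;
    size_t total_len;
} chunk_list_t;

/* Record a chunk; the caller keeps the bytes alive until flatten() is called. */
static int chunk_list_append(chunk_list_t *cl, const unsigned char *data, size_t len)
{
    if (cl->count == cl->cap) {
        size_t new_cap = cl->cap ? cl->cap * 2 : 16;
        chunk_t *p = realloc(cl->chunks, new_cap * sizeof *p);
        if (!p)
            return -1;
        cl->chunks = p;
        cl->cap = new_cap;
    }
    cl->chunks[cl->count].data = data;
    cl->chunks[cl->count].len = len;
    cl->count++;
    cl->total_len += len;
    return 0;
}

/* Allocate exactly total_len bytes and copy every chunk once. */
static unsigned char *chunk_list_flatten(const chunk_list_t *cl)
{
    unsigned char *out = malloc(cl->total_len ? cl->total_len : 1);
    if (!out)
        return NULL;
    size_t pos = 0;
    for (size_t i = 0; i < cl->count; i++) {
        memcpy(out + pos, cl->chunks[i].data, cl->chunks[i].len);
        pos += cl->chunks[i].len;
    }
    return out;
}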