I'm developing an embedded system that can test a large number of wires (up to 360) - essentially a continuity-checking system. The system works by clocking in a test vector and reading the output from the other end. The output is then compared with a stored result (which would be on an SD card) that tells what the output should have been. The test-vectors are just walking ones, so there's no need to store them anywhere. The process would be a bit like the following (sketched in code after the list):
Clock out the test-vector (walking ones).
Read in the output test-vector.
Read the corresponding expected test-vector from the SD card, which tells what the output vector should be.
Compare the test-vectors from steps 2 and 3.
Note down the errors/faults in a separate array.
Continue back to step 1 until all wires are checked.
Output the errors/faults to the LCD.
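In outline, the loop might look something like this (a rough sketch; clock_out_walking_one, read_shift_register, sd_read_expected and display_faults are hypothetical placeholders for the hardware and SD-card routines):

#include <stdint.h>
#include <string.h>

#define NUM_WIRES 360
#define VEC_BYTES ((NUM_WIRES + 7) / 8)   /* 45 bytes per 360-bit vector */

/* Hypothetical helpers - to be implemented against the real hardware. */
void clock_out_walking_one(uint16_t wire);
void read_shift_register(uint8_t *buf);
void sd_read_expected(uint16_t wire, uint8_t *buf);
void display_faults(const uint8_t *faults);

uint8_t fault_map[VEC_BYTES];             /* one fault bit per wire */

void run_test(void) {
    uint8_t result[VEC_BYTES];
    uint8_t expected[VEC_BYTES];

    memset(fault_map, 0, sizeof fault_map);
    for (uint16_t wire = 0; wire < NUM_WIRES; wire++) {
        clock_out_walking_one(wire);      /* step 1: drive a single 1 on this wire */
        read_shift_register(result);      /* step 2: read all 360 outputs */
        sd_read_expected(wire, expected); /* step 3: fetch the expected vector */
        for (uint8_t i = 0; i < VEC_BYTES; i++)
            fault_map[i] |= result[i] ^ expected[i]; /* steps 4-5: record mismatches */
    }
    display_faults(fault_map);            /* step 7: show the faults on the LCD */
}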
My hardware consists of a large shift register that's clocked into the AVR microcontroller. For every test-vector (which would also be 360 bits), I will need to read in 360 bits. So, for 360 wires the total amount of data would be 360*360 bits = 16 kB or so. I already know I cannot do this in one pass (i.e. read the entire data and then compare), so it will have to be done test-vector by test-vector.
As there are no inherent types that can hold such large numbers, I intend to use a bit-array 360 bits long. Now, my question is: how should I store this bit-array in a text file?
One way is to store raw values, i.e. on each line store the raw binary data that I read in from the shift register. So, for 8 wires, it would be 0b10011010. But this can get ugly for up to 360 wires - each line would contain 360 characters.
Another way is to store hex values - this would be just two characters for 8 bits (9A for the above) and 90 characters for 360 bits. This would, however, require me to read in the text - line by line - and convert the hex value to be represented in the bit-array somehow (a possible conversion is sketched below).
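For the conversion, something like this would work (a minimal sketch assuming 90 hex characters per line and well-formed input; hex_digit and hex_line_to_bits are names I made up):

#include <stdint.h>

static uint8_t hex_digit(char c) {
    if (c >= '0' && c <= '9') return (uint8_t)(c - '0');
    if (c >= 'A' && c <= 'F') return (uint8_t)(c - 'A' + 10);
    return (uint8_t)(c - 'a' + 10);      /* assumes well-formed input */
}

/* Convert one 90-character hex line into a 45-byte bit-array. */
void hex_line_to_bits(const char *line, uint8_t *bits) {
    for (uint8_t i = 0; i < 45; i++)
        bits[i] = (uint8_t)((hex_digit(line[2 * i]) << 4) | hex_digit(line[2 * i + 1]));
}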
So what's the best solution for this sort of problem? I need the solution to be completely "deterministic" - I can't have calls to malloc or such; they are a bit of a no-no in embedded systems, from what I've read.
SUMMARY
I need to store large values that can't be represented by any traditional variable types. Currently I intend to store these values in a bitarray. What's the best way to store these values in a text file on an SD Card?
These are not integer values but rather bit maps; they have no arithmetic meaning. What you are suggesting is simply a byte array of length 360/8 = 45, and not related to "large integers" at all. However, a more appropriate data structure or representation may be possible.
If the test vector is a single bit in 360, then it is both inefficient and unnecessary to store 360 bits for each vector; a value 0 to 359 is sufficient to unambiguously define each vector. If the correct output is also a single bit, then that could also be stored as a bit index; if not, then you could store it as a list of indices for each bit that should be set, with some sentinel value >= 360 or < 0 to indicate the end of the list. Where most vectors contain fewer than 22 set bits, this structure will be more efficient than storing a 45-byte array.
From any bit index value, you can determine the address and mask of the individual wire by:
byte_address = base_address + bit_index / 8 ;
bit_mask = 0x01 << (bit_index % 8) ;
You could either test each of the 360 bits iteratively or generate a 360-bit vector on the fly from the list of bits, as in the sketch below.
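A minimal sketch of the on-the-fly expansion, assuming 16-bit indices with 0xFFFF as the end-of-list sentinel (both assumptions, not part of the question):

#include <stdint.h>
#include <string.h>

#define VEC_BYTES 45          /* 360 bits */
#define END_OF_LIST 0xFFFFu   /* sentinel: any value >= 360 would do */

void indices_to_vector(const uint16_t *indices, uint8_t *vec) {
    memset(vec, 0, VEC_BYTES);
    for (; *indices != END_OF_LIST; indices++)
        vec[*indices / 8] |= (uint8_t)(0x01 << (*indices % 8));
}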
I can see no need for dynamic memory allocation in this, but whether or not it is advisable in an embedded system depends largely on the application and the target's resources. A typical AVR system has very little memory, and dynamic memory allocation carries an overhead for heap management and block alignment that you may not be able to afford. Dynamic memory allocation is also ill-suited to situations where hard real-time deterministic timing is required. And in all cases you should have a well-defined strategy or architecture for avoiding memory leaks (i.e. repeatedly allocating memory that never gets released).
Related
As the title says, I am searching for the optimal way of storing sets in memory. I am only interested in sets of bytes (arrays of integers from 0 to 255 where order is not important). It is not required that encoding/decoding be fast; the only necessary thing is that sets should require as little memory as possible.
The first method I came up with is to allocate an array of 256 bits (32 bytes) for each set, where the bit at position n tells whether n is in the set or not. The problem with this approach is that it requires the same amount of memory even if the sets are mostly empty (have only a few elements).
The second approach I tried is to store sets as regular arrays. So, if a set contains n elements, it will require n + 1 bytes to be stored: the first byte represents the number of elements and the other bytes represent the elements. But, as we know, order in sets is not important, so something strongly tells me that there must be a way to improve this.
My third attempt is to enumerate all possible sets and then just store the index of the set (an integer representing its index in the list of all possible sets of bytes). But it turned out that this is absolutely equivalent to the first approach: I still need 32 bytes to store any set, so it is not very useful.
My fourth attempt is based on the second approach. I noticed that if a set contains n elements it will, of course, require n + 1 bytes (using my second method). But if, for example, element k appears in the set (actually in the array, because in my second attempt I store sets as arrays), then it cannot appear again. So if k appears again, it must mean something different (maybe k - 1). Based on this I did some optimizations, and I noticed that I can save some bytes if I encode each next element differently. For example, [3, 3, 5, 7] is interpreted as a set of 3 elements whose elements are {3, 4, 5} (every next element is decreased by its index), and [3, 3, 5, 6] is interpreted as {3, 4, 2} (notice that 3 and 4 already exist, so 6 is decreased by 2 and becomes 4, but 4 exists and 3 exists, so it must be 2). But how can this approach actually save bytes? I experimented and realized that I can order the elements in the array to make it possible, in some cases, to avoid using the high bit to encode an element, so I save 1 bit per element in about half the cases, which is about n / 16 bytes saved (n / 2 * 1 / 8).
My fifth approach is similar to the second, but it interprets the number of elements differently. If the number of elements is less than 128, then it reads all the elements from the following array in memory as usual. But if the number of elements is greater than 128, it creates a full set and then just removes the elements listed in the following array in memory. On average this saves a lot of bytes, but it is still far from optimal.
My last (sixth) attempt is to enumerate just some sets (for example, create a list of base sets containing: the full set, the set of even numbers, the set of odd numbers, the set of elements less than 128, the set of elements greater than 128, etc.) and then use elements from that list and basic set operations (union, intersection, etc.) to reconstruct the original set. This requires a few bytes for each base set used from the list, a few bits for each union or intersection operation, and one byte for the length of the sequence. It depends heavily on the number of elements in the base-set list, which would have to be hardcoded, and it seems hard to properly create and choose the sets in that list. Anyway, something tells me that this is not a very clever approach.
But what is actually the most optimal way? Something tells me that my fourth attempt is not so bad, but can we do better? The sets I operate on have a random number of elements, on average 128 elements per set, so I am looking for a way to get down to about 128 bits (16 bytes) per set. The best I did so far is my fourth approach, which is far from that goal.
Just to mention again: speed is not important. Encoding/decoding may be extremely slow; the only important thing is that sets require as little memory as possible. When I said "in memory" I meant encoded in memory (compressed). Also, I am interested in as few bits as possible (not only bytes), because I want to store billions of sets compressed on my HDD, so it is important to calculate the average number of bits I need per set to know how many resources are available for what I want to achieve.
P.S. If you want some code (though I don't see why you would), I can post my C solutions for all of these approaches. Anyway, I am not asking for code or technical details of how to implement this in a specific programming language; I am just asking for a method/algorithm for compressing sets.
Thank you in advance.
Your first method (and the third method, which is equivalent) is already optimal. It cannot be improved.
There are 2^256 possible sets of numbers you're working with. By the pigeonhole principle, you need 2^256 numbers to identify them all, and you'll need 256 bits to represent those numbers. Any method of identifying the sets which used fewer than 256 bits would leave at least one pair (and probably many pairs) of sets sharing the same identifier.
There are 2^256 possible sets of bytes.
If all sets are equally likely, then the best you can do is to use a constant 256 bits (32 bytes) to indicate which of the 2^256 possibilities you have.
You seem not to like this idea, because you think that sets with only a few elements should take fewer bits. But if they are no more likely to occur than any other sets, then that would not be optimal.
If sets with fewer elements are more likely, then using a constant 32-bytes is not optimal, but the optimal encoding depends on the precise probability distribution of possible sets, which you haven't given. The relevant concept from information theory is "entropy": https://en.wikipedia.org/wiki/Entropy_(information_theory)
Succinctly, in an optimal encoding, the average number of bits required will be Σ Pᵢ · (−log₂ Pᵢ), summed over all 2^256 possible sets, where each Pᵢ is the probability of having to encode a particular set (and all the Pᵢ must sum to 1).
If the number of elements is the only thing that you think should affect the size of the encoding, then you can't go too far wrong with something like this:
1) Use 1 byte to write out the number of elements in the set. There are 257 possible set sizes, but you can use 0 for both 0 and 256 elements.
2) Write out the index of the set in an enumeration of all sets of that size. (If you wrote a 0, then you need 1 extra bit to distinguish the empty set from the full set.) If the set is known to have N elements, then the number of bits required for this index will be log₂(256! / (N! * (256-N)!)), rounded up.
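As an illustration of step 2, here is a sketch of computing that enumeration index (the rank in the combinatorial number system) for a universe small enough that the rank fits in 64 bits, e.g. sets over {0..63}; the full 256-element case needs big-integer arithmetic, and binomial/rank_set are names I made up:

#include <stdint.h>

/* Binomial coefficient C(n, k); exact for small inputs, no overflow checks. */
static uint64_t binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;   /* stays exact: r equals C(n-k+i, i) after step i */
    return r;
}

/* Rank of a strictly increasing n-element set in the enumeration of all
   n-element sets: rank = C(e[0],1) + C(e[1],2) + ... + C(e[n-1],n). */
uint64_t rank_set(const unsigned *elems, unsigned n) {
    uint64_t rank = 0;
    for (unsigned i = 0; i < n; i++)
        rank += binomial(elems[i], i + 1);
    return rank;
}

For example, the 3-element set {1, 3, 4} gets rank C(1,1) + C(3,2) + C(4,3) = 1 + 3 + 4 = 8.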
I'm assured that I get numbers from 0-7, and I'm interested in making the code as efficient as possible.
I want to write only the three least significant bits into the binary file, not the whole byte.
Is there any way I can write only 3 bits? I get a huge number of numbers...
The other way I found is to try to mash the numbers together ((00000001 shl 3) | next number).
Though there's always an odd one out.
Files work at the byte level; there's no way to output single bits¹. You have to read the original bytes containing the bits of interest, modify them with the bits you want to change (using bitwise operations), and write them back where they were.
1. And it would not be efficient to do so anyway. Hard disks work best with large chunks to write; flash disks actually require working with large blocks (a single bit change requires a full block erase and rewrite). These are some of the reasons why operating systems and disk controllers do a lot of write caching.
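At the application level, though, you can still pack the 3-bit values into whole bytes yourself and write those out. A minimal sketch (BitWriter is a hypothetical name; the final partial byte is zero-padded on flush):

#include <cstdint>
#include <cstdio>

// Packs 3-bit values into bytes, writing each byte out as it fills up.
struct BitWriter {
    std::FILE *f;
    unsigned acc = 0;    // bit accumulator, newest bits at the bottom
    unsigned nbits = 0;  // number of not-yet-written bits in acc

    void put3(unsigned v) {
        acc = (acc << 3) | (v & 0x7u);
        nbits += 3;
        if (nbits >= 8) {
            nbits -= 8;
            std::fputc((int)((acc >> nbits) & 0xFFu), f); // emit the top 8 unwritten bits
        }
    }
    void flush() {       // pad the final partial byte with zeros
        if (nbits > 0) {
            std::fputc((int)((acc << (8 - nbits)) & 0xFFu), f);
            nbits = 0;
        }
    }
};

// Usage: eight 3-bit values pack into exactly three bytes.
//   BitWriter w{fp}; w.put3(5); /* ... */ w.flush();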
I have a peripheral connected to my Altera FPGA and am able to read data from it using SPI. I would like to store this incoming data into an array, preferably as floating-point values. Further, I have a CSV file on my computer and want to store that data in another array, and then, after triggering a 'start' signal, multiply both arrays and send the output via RS-232 to my PC. Any suggestions on how to go about this? The code for reading data from the peripheral is as follows:
// we sample on negative edge of clock
always @(negedge SCL)
begin
// data comes as MSB first.
MOSI_reg[31:0] <= {MOSI_reg[30:0], MOSI}; // left shift for MOSI data
MISO_reg[31:0] <= {MISO_reg[30:0], MISO}; // left shift for MISO data
end
thank you.
A 1024x28 matrix with 32 bits per element requires 917504 bits of RAM in your FPGA, plus another 28*32 = 896 bits for the SPI data. Multiplying these will result in a 1024x1 vector, so add another 32768 bits for the result. That sums to 951168 bits you will need in your device. Does your FPGA chip have this much memory?
Assuming you do, yes: you can instantiate a ROM inside your design and initialize it with $readmemh or $readmemb (for values in hexadecimal or binary form, respectively).
If precision is not an issue, go for fixed point, as implementing multiplication and addition in floating point is a hard job.
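In software terms, fixed-point multiplication is just an integer multiply followed by a shift; for example, with a hypothetical Q16.16 format (16 integer bits, 16 fractional bits):

#include <cstdint>

typedef int32_t q16_16;  // Q16.16 fixed point

static inline q16_16 q_from_double(double d) { return (q16_16)(d * 65536.0); }
static inline double q_to_double(q16_16 q)   { return q / 65536.0; }

// Widen to 64 bits for the multiply, then shift the extra 16 fraction bits away.
static inline q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) >> 16);
}

The same structure (multiply, then shift) maps naturally onto FPGA fabric or a DSP block.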
You then need an FSM to fill your source vector with SPI data, do the multiplication, and store the result in your destination vector. You may consider instantiating a small processor to do the job more easily.
Multiplication is non-trivial in hardware, and 'assign c = a*b' is not necessarily going to produce what you want.
If your FPGA has DSP blocks, you can use one of Altera's customizable IP cores to do your multiplication in a DSP block. If not, you can still use an IP core to tune the multiplier the way you want (with regards to signed/unsigned, latency, etc.) and likely produce a better result.
I have a bit array that can be very dense in some parts and very sparse in others. The array can get as large as 2**32 bits. I am turning it into a bunch of tuples containing offset and length to make it more efficient to deal with in memory. However, this is sometimes less efficient with inputs like 10101010100011. Any ideas on a good way of storing this in memory?
If I understand correctly, you're using tuples of (offset, length) to represent runs of 1 bits? If so, a better approach would be to use runs of packed bitfields. For dense areas, you get a nice efficient array, and in non-dense areas you get implied zeros. For example, in C++, the representation might look like:
// The map key is the offset; the vector's length gives you the length
std::map<unsigned int, std::vector<uint32_t> >
A lookup would consist of finding the key before the bit position in question, and seeing if the bit falls in its vector. If it does, use the value from the vector. Otherwise, return 0. For example:
#include <cstdint>
#include <map>
#include <vector>

typedef std::map<unsigned int, std::vector<uint32_t> > bitmap; // for convenience
typedef std::vector<uint32_t> bitfield; // also convenience

bool get_bit(const bitmap &bm, unsigned int idx) {
    unsigned int offset = idx / 32;
    bitmap::const_iterator it = bm.upper_bound(offset);
    // 'it' is the element /after/ the one we want
    if (it == bm.begin()) {
        // but it's the first, so we don't have the target element
        return false;
    }
    --it;
    // make offset relative to this element's start
    offset -= it->first;
    // does our bit fall within this element?
    if (offset >= it->second.size())
        return false; // nope
    uint32_t bf = it->second[offset];
    // extract the bit of interest (idx % 32 is the bit within the word)
    return (bf & (1u << (idx % 32))) != 0;
}
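For instance, populating the map with one dense 64-bit region starting at bit 320 (a hypothetical example):

#include <cstdio>

int main() {
    bitmap bm;
    bm[10] = bitfield(2, 0xFFFFFFFFu);  // words 10 and 11: bits 320..383 all set
    std::printf("%d %d %d\n",
                (int)get_bit(bm, 100),   // 0: before the region
                (int)get_bit(bm, 321),   // 1: inside the region
                (int)get_bit(bm, 400));  // 0: past the region's end
    return 0;
}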
It would help to know more. By "very sparse/dense," do you mean millions of consecutive zeroes/ones, or do you mean local (how local?) proportions of 0's very close to 0 or 1? Does one or the other value predominate? Are there any patterns that might make run-length encoding effective? How will you use this data structure? (Random access? What kind of distribution of accessed indexes? Are huge chunks never or very rarely accessed?)
I can only guess you aren't going to be randomly accessing and modifying all 4 billion bits at rates of billions of bits/second. Unless it is phenomenally sparse/dense on a local level (such as any million consecutive bits are likely to be the same except for 5 or 10 bits) or full of large scale repetition or patterns, my hunch is that the choice of data structure depends more on how the array is used than on the nature of the data.
How to structure things will depend on your data. To represent large amounts of data compactly, you need long runs of zeros or ones, which removes the need to represent every bit explicitly. If that is not the case and you have approximately the same number of ones and zeros, you would be better off just storing all of the memory.
It might help to think of this as a compression problem. For compression to be effective there has to be a pattern (or a limited set of items used out of the entire space) and an uneven distribution in order for compression to work. If all the elements are used and evenly distributed, compression is hard to do, or could take more space than the actual data.
If there are only runs of zeros and ones (longer than just one bit), using offset and length might make some sense. If the runs are inconsistent, you could just copy the bits as a bit array where you have offset, length, and values.
How efficient the above is will depend on whether you have large runs of ones or zeros. You will want to be careful to make sure you are not using more memory to represent your data than simply storing the data itself would take.
Check out the bison source code and look at its bitset implementation. It provides several flavors of implementation to deal with bit arrays of different densities.
How many of these do you intend to keep in memory at once?
As far as I can see, 2**32 bits = 512M, only half a gig, which isn't very much memory nowadays. Do you have anything better to do with it?
Assuming your server has enough RAM, allocate it all at startup, then keep it in memory; the network-handling thread can then execute in just a few instructions in constant time, so it should be able to keep up with any workload.
Is there a historical reason or something? I've quite often seen things like char foo[256]; or #define BUF_SIZE 1024. I myself mostly use 2^n-sized buffers, mainly because I think it looks more elegant and that way I don't have to think of a specific number. But I'm not quite sure if that's the reason most people use them; more information would be appreciated.
There may be a number of reasons, although many people will, as you say, just do it out of habit.
One place where it is very useful is in the efficient implementation of circular buffers, especially on architectures where the % operator is expensive (those without a hardware divide - primarily 8-bit microcontrollers). By using a 2^n buffer in this case, the modulo is simply a matter of masking off the upper bits, or in the case of, say, a 256-byte buffer, simply using an 8-bit index and letting it wrap around (see the sketch at the end of this answer).
In other cases, alignment with page boundaries, caches, etc. may provide opportunities for optimisation on some architectures, but that would be very architecture-specific. It may just be that such buffers provide the compiler with optimisation possibilities, so all other things being equal, why not?
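For the 256-byte case mentioned above, the wraparound comes for free (a sketch; full/empty checks omitted):

#include <stdint.h>

static uint8_t ring[256];
static uint8_t head, tail;  // uint8_t wraps from 255 back to 0 by itself

void ring_put(uint8_t byte) { ring[head++] = byte; }  // no masking or modulo needed
uint8_t ring_get(void)      { return ring[tail++]; }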
Cache lines are usually a power of 2 in size (often 32 or 64 bytes). Data that is an integral multiple of that size can fit into (and fully utilize) the corresponding number of cache lines. The more data you can pack into your cache, the better the performance, so I think people who design their structures that way are optimizing for that.
Another reason in addition to what everyone else has mentioned is, SSE instructions take multiple elements, and the number of elements input is always some power of two. Making the buffer a power of two guarantees you won't be reading unallocated memory. This only applies if you're actually using SSE instructions though.
I think in the end though, the overwhelming reason in most cases is that programmers like powers of two.
Hash Tables, Allocation by Pages
This really helps for hash tables, because you compute the index modulo the size, and if that size is a power of two, the modulus can be computed with a simple bitwise AND (&) rather than a much slower divide-class instruction implementing the % operator.
Looking at an old Intel i386 book, and is 2 cycles and div is 40 cycles. The disparity persists today due to the much greater fundamental complexity of division, even though the 1000x faster overall cycle times tend to hide the impact of even the slowest machine ops.
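For example (a sketch; table_size is assumed to be a power of two):

#include <stdint.h>

// Map a hash to a table slot; the mask gives the same result as
// hash % table_size, but only when table_size is a power of two.
static inline uint32_t slot_for(uint32_t hash, uint32_t table_size) {
    return hash & (table_size - 1);
}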
There was also a time when malloc overhead was avoided at great lengths. Allocations available directly from the operating system would be (and still are) a specific number of pages, and so a power of two would be likely to make the most of the allocation granularity.
And, as others have noted, programmers like powers of two.
I can think of a few reasons off the top of my head:
1) 2^n is a very common value in computing. It is directly related to the way bits are represented in computers (2 possible values), which means variables tend to have ranges of values whose boundaries are 2^n.
2) Because of the point above, you'll often find the value 256 as the size of a buffer. This is because it is the number of distinct values a byte can hold. So, if you want to store a string together with its size, you'll be most efficient if you store it as SIZE_BYTE+ARRAY, where the size byte tells you the size of the array. This means the array can be any size from 1 to 256.
3) Many other times, sizes are chosen based on physical things (for example, the amount of memory an operating system can address is related to the size of the registers of the CPU, etc.) and these are also going to be a specific number of bits. Meaning, the amount of memory you can use will usually be some value of 2^n (for a 32-bit system, 2^32).
4) There might be performance benefits and alignment issues with such values. Most processors can access a certain number of bytes at a time, so even if you have a variable whose size is (let's say) 20 bits, a 32-bit processor will still read 32 bits, no matter what. So it's often more efficient to just make the variable 32 bits. Also, some processors require variables to be aligned to a certain number of bytes (because they can't read memory from, for example, odd addresses). Of course, sometimes it's not about odd memory locations, but locations that are multiples of 4, or 8, etc. So in these cases it's more efficient to just make buffers that will always be aligned.
Ok, those points came out a bit jumbled. Let me know if you need further explanation, especially point 4 which IMO is the most important.
Because of the simplicity (read also cost) of base 2 arithmetic in electronics: shift left (multiply by 2), shift right (divide by 2).
In the CPU domain, lots of constructs revolve around base-2 arithmetic. Buses (control and data) used to access memory structures are often aligned on powers of 2. The cost of implementing logic in electronics (e.g. in a CPU) makes base-2 arithmetic compelling.
Of course, if we had analog computers, the story would be different.
FYI: the attributes of a system sitting at layer X are a direct consequence of the attributes of the layers below it (i.e. layers < X). The reason I am stating this stems from some comments I received on my post.
E.g. the properties that can be manipulated at the "compiler" level are inherited and derived from the properties of the system below it, i.e. the electronics in the CPU.
I was going to use the shift argument, but couldn't think of a good reason to justify it.
One thing that is nice about a buffer that is a power of two is that circular buffer handling can use simple ands rather than divides:
#define BUFSIZE 1024
++index;                 // increment the index.
index &= (BUFSIZE - 1);  // Make sure it stays in the buffer.
If it weren't a power of two, a divide would be necessary. In the olden days (and currently on small chips) that mattered.
It's also common for page sizes to be powers of 2.
On linux I like to use getpagesize() when doing something like chunking a buffer and writing it to a socket or file descriptor.
It makes a nice, round number in base 2, just as 10, 100 or 1000000 are nice, round numbers in base 10.
If it weren't a power of 2 (or something close, such as 96 = 64 + 32 or 192 = 128 + 64), then you could wonder why there's the added precision. A size not rounded to base 2 can come from external constraints or programmer ignorance, and you'll want to know which one it is.
Other answers have pointed out a bunch of technical reasons as well that are valid in special cases. I won't repeat any of them here.
In hash tables, 2^n makes it easier to handle key collisions in a certain way. In general, when there is a key collision, you either make a substructure, e.g. a list, of all entries with the same hash value, or you find another free slot. You could just add 1 to the slot index until you find a free slot, but this strategy is not optimal because it creates clusters of blocked places. A better strategy is to calculate a second hash number h2, such that gcd(n, h2) = 1; then add h2 to the slot index until you find a free slot (with wraparound). If n is a power of 2, finding an h2 that fulfills gcd(n, h2) = 1 is easy: every odd number will do.
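A sketch of that probe sequence (hash1, hash2 and occupied are hypothetical; n must be a power of two, and at least one free slot is assumed to exist):

#include <stdint.h>

extern uint32_t hash1(uint32_t key);  // hypothetical primary hash
extern uint32_t hash2(uint32_t key);  // hypothetical secondary hash
extern int occupied(uint32_t idx);    // hypothetical slot-occupancy test

// Find a free slot for 'key' in a power-of-two table of size n.
uint32_t find_slot(uint32_t key, uint32_t n) {
    uint32_t idx  = hash1(key) & (n - 1);
    uint32_t step = hash2(key) | 1;       // any odd step is coprime with 2^k
    while (occupied(idx))
        idx = (idx + step) & (n - 1);
    return idx;
}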