Random formula based on 15 seeds - c

I am working on my university degree and I got stuck on a random function.
I am using a microcontroller that has no configured clock, so I decided to use ADC (analog-to-digital conversion) readings as seeds for my random function.
So I have 15 two-byte variables that store some 'random' values (the conversion result is not always the same, and the difference is in the LSB, the last bit in my case: e.g. now the value of an ADC read is 700, in 5 ms it is 701, then back to 700, then 702, etc.). So I was thinking of building a random function that uses, let's say, the last 4 bits of those variables.
My question is: Can you give me an example of a good random formula?
Like ( Variable1 >> 4 ) ^ ( Variable2 << 4 ) and so on ...
I want to be able to obtain a fairly random 1-byte number (this is the best case). It will be used in an RSA algorithm, which I have already implemented (I have a big lookup table of prime numbers, and I need 2 random numbers from that table).

Usually a cryptographic hash function like SHA or MD5 is used for this purpose. As long as your input data contains enough entropy, you will get a random output. See https://en.wikipedia.org/wiki/Entropy_(computing)
However, that may be a little too much work for your use case. If you only need 8 bits, you could use an 8-bit cyclic redundancy code (CRC). It will have similar properties -- since any 8 of your input bits can be used to completely determine the output, the output will be random as long as at least 8 of your input bits are random. See http://www.sunshine2k.de/articles/coding/crc/understanding_crc.html
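A minimal sketch of the CRC idea. It assumes the 15 readings sit in a `uint16_t` array and uses the common CRC-8 polynomial 0x07 (initial value 0, no reflection, no final XOR). The names `crc8` and `mix_adc_samples` are made up for illustration. Both bytes of every sample are fed in, so the noisy low bits of all 15 readings influence the result.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SAMPLES 15   /* assumption: the 15 ADC readings from the question */

/* CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), init 0, no reflection, no final XOR. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00;
    size_t i;
    int bit;
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: folds all 15 samples (both bytes each) into one byte. */
uint8_t mix_adc_samples(const uint16_t samples[NUM_SAMPLES])
{
    uint8_t bytes[2 * NUM_SAMPLES];
    size_t i;
    for (i = 0; i < NUM_SAMPLES; i++) {
        bytes[2 * i]     = (uint8_t)(samples[i] & 0xFF); /* low byte: holds the noisy LSBs */
        bytes[2 * i + 1] = (uint8_t)(samples[i] >> 8);   /* high byte: mostly constant */
    }
    return crc8(bytes, sizeof bytes);
}
```

The output can only be as unpredictable as the noise in the inputs. The CRC spreads that noise over 8 bits but does not add any.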
That will do what you ask for... but beware! It sounds like you are writing a completely insecure implementation of RSA. Under no circumstances could you use only 8 bits of randomness to securely generate an RSA key.

If you think that the LS bit of every word is truly random (which is likely), and if they are uncorrelated, pack 8 LS bits into 1 byte. There is no use for the remaining 15 x 16 - 8 bits.
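A sketch of that packing. It assumes the first 8 of the 15 samples are used and that sample i supplies bit i of the result. `pack_lsbs` is a hypothetical name.

```c
#include <stdint.h>

/* Packs bit 0 of samples[0..7] into one byte; sample i supplies bit i.
   Only meaningful if each LSB is an independent, unbiased random bit. */
uint8_t pack_lsbs(const uint16_t *samples)
{
    uint8_t out = 0;
    int i;
    for (i = 0; i < 8; i++)
        out |= (uint8_t)((samples[i] & 1u) << i);
    return out;
}
```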

Related

How to generate an integer random value within an interval, starting from random byte

I looked at the other answers I found, but they don't seem suitable to my case.
I am working with ANSI C, on an embedded 32bit ARM system.
I have a register that generates a random 8bit value (generated from thermal noise in the chip). From that value I would like to generate evenly distributed integer values within certain ranges, those are:
0,1,2,3,4,5
0,1,2,3,4,5,6,7,8,9
"true" randomness is very important in my application, I need to generate white noise that could make a measurement drift.
Thanks!
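Taking RandomValue % SizeOfRange will not produce a uniformly distributed value, because in general the 256 possible byte values cannot be bucketed evenly into the possible results. For example, 256 % 6 = 4, so with % 6 the results 0..3 each occur 43 times out of 256 while 4 and 5 occur only 42 times.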
I would suggest using a bit mask to ignore all bits outside the range of interest, then repeatedly getting a new random number until the masked value falls within the desired range.
For the range 0..5, look at the right-most 3 bits. That will produce a value in the range 0..7. "Reroll" results of 6 or 7.
For the range 0..9, look at the right-most 4 bits. That will produce a value in the range 0..15. "Reroll" results of 10..15.
As a real-world analog, think of trying to get a random number between 1 and 5 with a 6-sided die. There is no "fair" algorithm to map a roll of 6 into one of the desired numbers 1..5. Simply reroll a 6 until you get something in the desired range.
Masking high bits ensures that the number of "rerolls" is minimal.
Be sure to pay attention to any physical limitations on how often you can pull the special register and expect to get an entirely random value.
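The mask-and-reroll procedure might look like this in C. This is a sketch: the byte array stands in for successive reads of the thermal-noise register, since the register's actual name and access method aren't given. `uniform_below` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest all-ones mask covering 0..n-1, e.g. 7 for n = 6 and 15 for n = 10. */
static unsigned mask_for_range(unsigned n)
{
    unsigned mask = 0;
    while (mask < n - 1)
        mask = (mask << 1) | 1u;
    return mask;
}

/* Returns a uniformly distributed value in 0..n-1 (1 <= n <= 256), or -1 if
   every supplied byte was rejected. `samples` stands in for successive reads
   of the hardware noise register (hypothetical source). */
int uniform_below(const uint8_t *samples, size_t count, unsigned n)
{
    unsigned mask = mask_for_range(n);
    size_t i;
    for (i = 0; i < count; i++) {
        unsigned candidate = samples[i] & mask;  /* keep only the low bits */
        if (candidate < n)
            return (int)candidate;               /* in range: accept */
        /* out of range: "reroll" with the next byte */
    }
    return -1;
}
```

On the target you would replace the array walk with a loop that reads the register on each iteration, respecting its refill rate. For 0..5 the expected cost is 8/6 ≈ 1.33 reads per value, and for 0..9 it is 16/10 = 1.6 reads.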

lightweight (quasi-random) integer fingerprint of C string

I would like to generate a nicely-mixed-up integer fingerprint of an arbitrary C string (s). Most C strings will consist of ASCII text characters:
I want very different fingerprints for similar strings, especially such similar strings as "ab" and "ba";
I want it to be difficult to invert back from the fingerprint to the string (well, my string is typically longer than 32 bits, which means that many strings would map into the same integer), which means again that I want similar strings to yield very different codes;
I want to use the 32 bits available to me efficiently in the integer result,
I want the function source to be small
I want the function to be fast.
One use is security (but not encryption) related. I can ask a user for a text password, convert it into an integer for storage, and later test whether this integer is correct. (I know I could store strings, but I don't want to. Guessing a 32-bit integer correctly is impossible if my program can slow down incorrect attempts to the point where brute force cannot work faster than password guessing. Another use of this function is as the start of a hash index function (mod array length) into an array.)
Alas, I am probably reinventing the wheel here. Such functions have probably been written a million times, and by people who are much more versed in cryptography. I don't need AES, of course, but something much more lightweight; the use is different.
My first thought was:
Take each character mod 64 to take advantage of the ASCII text aspect. Now I have 6 bits; call this x.
I can place a 6-bit string into 5 locations in a 32-bit space, leaving 2 bits over.
Take the current string index position (0, 1, 2, ...) mod 5 to determine where to start placing x in my running integer result, and XOR x into this running result.
Use the remaining 2 bits as a counter (mod 4 to prevent overflow) that is incremented for each character processed.
Then I thought that bit operations may be fast for the computer but take more source code. I can think of other choices: take each index position i, multiply it by the ASCII code of each character (or the x from above), and call this y[i]. Then do one of the following:
Calculate the natural logarithm of the sum of the y values (or this sum plus the running result), and just pretend that the first 32 bits of this result (maybe leaving off the first few bits), which is really a double, are an integer. I can XOR each bitint(log(y[i])) into the running integer result.
Do it even more cheaply: just add up the y values, and do the logarithm with the 32-bit pickoff just once at the end. Alternatively, run the sum of y through srand as a seed and grab a rand().
There are probably a few other ways to do it, too. In sum, the function should map strings into very different integers, be short to code, and be very fast.
Any pointers?
A common method of generating a non-reversible digest or hash of a string is to generate a Cyclic Redundancy Checksum (CRC).
Source for CRC is widely available; in this case you should use a common CRC-32 such as the one used by Ethernet. Different CRCs work on the same principle but use different polynomials. Do not be tempted to invent your own polynomial; the distribution is likely to be sub-optimal.
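For illustration, here is a compact bitwise CRC-32 with the Ethernet/zlib parameters: reflected polynomial 0xEDB88320, with initial value and final XOR both 0xFFFFFFFF. It is a sketch that favours small source over speed. `crc32_str` is a made-up name.

```c
#include <stdint.h>

/* Bitwise CRC-32 (Ethernet/zlib parameters) over a NUL-terminated string. */
uint32_t crc32_str(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    int bit;
    while (*s) {
        crc ^= (unsigned char)*s++;
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

If the bit loop turns out to be too slow, a 256-entry lookup table is the usual speed-up.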
What you're looking for is called a "hash". Two examples of hash functions I'm aware of that return short integers are MurmurHash and SipHash. MurmurHash, as I recall, is not designed to be a cryptographic hash, while SipHash, on the other hand, is indeed designed with security in mind, as stated on its homepage. MurmurHash has 2 versions that return a 32-bit and a 64-bit output. SipHash returns a 64-bit output.
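Below is a portable C sketch of the 32-bit MurmurHash3 variant (MurmurHash3_x86_32), written from the published algorithm. The function name and the byte-by-byte little-endian block loading are choices made here so the result doesn't depend on the CPU's byte order. Check it against the reference implementation's test vectors before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* MurmurHash3_x86_32 sketch; not a cryptographic hash. */
uint32_t murmur3_32(const void *key, size_t len, uint32_t seed)
{
    const uint8_t *data = (const uint8_t *)key;
    const size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
    uint32_t h1 = seed, k1;
    size_t i;

    /* body: 4-byte blocks, assembled little-endian */
    for (i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;
        k1 = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64u;
    }

    /* tail: the remaining 0..3 bytes */
    {
        const uint8_t *tail = data + 4 * nblocks;
        k1 = 0;
        switch (len & 3) {
        case 3: k1 ^= (uint32_t)tail[2] << 16; /* fall through */
        case 2: k1 ^= (uint32_t)tail[1] << 8;  /* fall through */
        case 1: k1 ^= tail[0];
                k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
        }
    }

    /* finalization (fmix32): forces all bits to avalanche */
    h1 ^= (uint32_t)len;
    h1 ^= h1 >> 16;
    h1 *= 0x85ebca6bu;
    h1 ^= h1 >> 13;
    h1 *= 0xc2b2ae35u;
    h1 ^= h1 >> 16;
    return h1;
}
```

For a C string, call it as `murmur3_32(s, strlen(s), seed)` with any fixed 32-bit seed.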

Huffman table entropy decoding simplification (in C)

First time using this site to ask a question, but I have gotten many many answers!
Background:
I am decoding a variable-length video stream that was encoded using RLE and Huffman encoding. The stream is 10 to 20 kilobytes long, so I am trying to "squeeze" as much time out of every step as I can so that it can be decoded efficiently in real time.
Right now the step I am working on involves converting the bitstream into a number based on a Huffman table. I do this by counting the number of leading zeros to determine the number of trailing bits to include. The table looks like:
001xs range -3 to 3
0001xxs range -7 to 7
00001xxxs range -15 to 15
And so on, up to 127. The s is a sign bit: 0 means positive, 1 means negative. For example, if clz = 2, I read the next 3 bits: 2 for the value and 1 for the sign.
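Here is a sketch of that procedure. It assumes the stream is packed MSB-first into bytes and that the pattern in the table continues: after n leading zeros, the next n + 1 bits hold the leading 1, the magnitude bits and the sign. `BitReader`, `read_bit` and `decode_one` are hypothetical names. The bit-at-a-time reader is written for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *buf;
    size_t nbits;   /* total bits available */
    size_t pos;     /* index of the next bit to read */
} BitReader;

static int read_bit(BitReader *br)
{
    int bit;
    if (br->pos >= br->nbits)
        return -1;                               /* out of data */
    bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Decodes one code 0..01x..xs. Returns 0 and stores the value in *out,
   or -1 if the data runs out. No check on clz against the table's maximum. */
int decode_one(BitReader *br, int *out)
{
    unsigned clz = 0, value, i;
    int bit;

    while ((bit = read_bit(br)) == 0)            /* count leading zeros */
        clz++;
    if (bit < 0)
        return -1;

    value = 1;                                   /* the 1 that ended the zero run */
    for (i = 0; i < clz; i++) {                  /* clz more bits: magnitude, then sign */
        if ((bit = read_bit(br)) < 0)
            return -1;
        value = (value << 1) | (unsigned)bit;
    }
    *out = (value & 1) ? -(int)(value >> 1) : (int)(value >> 1);
    return 0;
}
```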
Question:
Right now the nasty expression I created to do this is:
int outblock[64];
unsigned int value;
//example value 7 -> 111 (xxs) which translates to -3
value=7;
outblock[index]=(((value&1)?-1:1)*(value>>1)); //expression
Is there a simpler and faster way to do this?
Thanks for any help!
Tim
EDIT: Expression edited because it was not generating proper positive values. Generates positive and negative properly now.
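One branch-free alternative for the sign step is sketched below (the helper name is made up). It relies on two's-complement integers, which every mainstream target uses. Whether it is actually faster than the ternary version depends on the compiler and CPU, so it is worth timing both.

```c
/* Same result as ((value & 1) ? -1 : 1) * (value >> 1), without a branch.
   sign is 0 for a positive code and -1 (all ones) for a negative one:
   (x ^ 0) - 0 == x, and (x ^ -1) - (-1) == ~x + 1 == -x. */
static inline int sign_magnitude_to_int(unsigned value)
{
    int magnitude = (int)(value >> 1);
    int sign = -(int)(value & 1u);
    return (magnitude ^ sign) - sign;
}
```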
I just quickly googled "efficient huffman decoding" and found the following links which may be useful:
Efficient Huffman Decoding with Table Lookup
Previous question - how to decode huffman efficiently
It seems the most efficient way to huffman decode is to use table lookup. Have you tried a method like this?
I'd be interested to see your times of the original algorithm before doing any optimisations. Finally, what hardware / OS are you running on?
Best regards,

Storing Large Integers/Values in an Embedded System

I'm developing an embedded system that can test a large number of wires (up to 360): essentially a continuity checking system. The system works by clocking in a test vector and reading the output from the other end. The output is then compared with a stored result (which would be on an SD card) that tells what the output should have been. The test vectors are just walking ones, so there's no need to store them anywhere. The process is roughly as follows:
Clock out test-vector (walking ones)
Read in output test-vector.
Read corresponding output test-vector from SD Card which tells what the output vector should be.
Compare the test-vectors from step 2 and 3.
Note down the errors/faults in a separate array.
Continue back to step 1 unless all wires are checked.
Output the errors/faults to the LCD.
My hardware consists of a large shift register that's clocked into the AVR microcontroller. For every test vector (which would also be 360 bits), I will need to read in 360 bits. So, for 360 wires the total amount of data would be 360*360 = 129,600 bits, or about 16 kB. I already know I cannot do this in one pass (i.e. read the entire data and then compare), so it will have to be test vector by test vector.
As there are no inherent types that can hold such large numbers, I intend to use a bit-array of length 360 bit. Now, my question is, how should I store this bit array in a txt file?
One way is to store raw values, i.e. on each line store the raw binary data that I read in from the shift register. So, for 8 wires, it would be 0b10011010. But this can get ugly for up to 360 wires: each line would contain 360 characters.
Another way is to store hex values - this would just be two characters for 8 bits (9A for the above) and about 90 characters for 360 bits. This would, however, require me to read in the text - line by line - and convert the hex value to be represented in the bit-array, somehow.
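Reading the hex form does not need any dynamic allocation. Here is a sketch, assuming each line holds exactly 90 hex digits and that the first two digits map to wires 0..7. That byte order is an assumption; it just has to match whatever wrote the file. `parse_hex_vector` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_WIRES 360
#define VECTOR_BYTES (NUM_WIRES / 8)   /* 45 bytes, statically sized: no malloc */

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Converts one line of 2 * VECTOR_BYTES hex digits into out[].
   Returns 0 on success, -1 on a short line or a non-hex character
   (a '\0' is caught before reading past the end of the string). */
int parse_hex_vector(const char *line, uint8_t out[VECTOR_BYTES])
{
    size_t i;
    for (i = 0; i < VECTOR_BYTES; i++) {
        int hi = hex_value(line[2 * i]);
        int lo;
        if (hi < 0)
            return -1;
        lo = hex_value(line[2 * i + 1]);
        if (lo < 0)
            return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return 0;
}
```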
So what's the best solution for this sort of problem? I need the solution to be completely deterministic: I can't have calls to malloc or similar. From what I've read, they are a bit of a no-no in embedded systems.
SUMMARY
I need to store large values that can't be represented by any traditional variable types. Currently I intend to store these values in a bitarray. What's the best way to store these values in a text file on an SD Card?
These are not integer values but rather bit maps; they have no arithmetic meaning. What you are suggesting is simply a byte array of length 360/8, and not related to "large integers" at all. However some more appropriate data structure or representation may be possible.
If the test vector is a single bit in 360, then it is both inefficient and unnecessary to store 360 bits for each vector; a value 0 to 359 is sufficient to unambiguously define each vector. If the correct output is also a single bit, then that could also be stored as a bit index. If not, you could store it as a list of indices for each bit that should be set, with some sentinel value >=360 or <0 to indicate the end of the list. Where most vectors contain fewer than 22 set bits, this structure will be more efficient than storing a 45-byte array (each index needs 2 bytes, so 21 indices plus the sentinel take 44 bytes).
From any bit index value, you can determine the address and mask of the individual wire by:
byte_address = base_address + bit_index / 8 ;
bit_mask = 0x01 << (bit_index % 8) ;
You could either test each of the 360 bits iteratively or generate a 360 bit vector on the fly from the list of bits.
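Here is a sketch of both ideas using fixed-size buffers only. It assumes wire indices are stored as `uint16_t` and the list ends with any value >= 360. `END_OF_LIST`, `expand_indices` and `find_faults` are made-up names. The expected vector is built from the index list, then XOR-ed against the measured vector so that every set bit marks a faulty wire.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_WIRES 360
#define VECTOR_BYTES (NUM_WIRES / 8)
#define END_OF_LIST 0xFFFFu            /* any value >= 360 ends the list */

/* Sets one bit using the byte_address / bit_mask formula above. */
static void set_bit(uint8_t *base, uint16_t bit_index)
{
    base[bit_index / 8] |= (uint8_t)(0x01 << (bit_index % 8));
}

/* Expands a sentinel-terminated list of wire indices into a 45-byte vector. */
void expand_indices(const uint16_t *indices, uint8_t out[VECTOR_BYTES])
{
    memset(out, 0, VECTOR_BYTES);
    for (; *indices < NUM_WIRES; indices++)
        set_bit(out, *indices);
}

/* Records each wire whose measured bit differs from the expected bit.
   Returns the total number of faults (may exceed max_faults). */
size_t find_faults(const uint8_t measured[VECTOR_BYTES],
                   const uint8_t expected[VECTOR_BYTES],
                   uint16_t *faults, size_t max_faults)
{
    size_t count = 0;
    uint16_t wire;
    for (wire = 0; wire < NUM_WIRES; wire++) {
        uint8_t mask = (uint8_t)(0x01 << (wire % 8));
        if ((measured[wire / 8] ^ expected[wire / 8]) & mask) {
            if (count < max_faults)
                faults[count] = wire;
            count++;
        }
    }
    return count;
}
```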
I can see no need for dynamic memory allocation in this, but whether or not it is advisable in an embedded system is largely dependent on the application and target resources. A typical AVR system has very little memory, and dynamic memory allocation carries an overhead for heap management and block alignment that you may not be able to afford. Dynamic memory allocation is not suited to situations where hard real-time deterministic timing is required. And in all cases you should have a well-defined strategy or architecture for avoiding memory leak issues (repeatedly allocating memory that never gets released).

Will MD5 ever return the same output as its input?

Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x?
Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, the probability that any given 128-bit string is a fixed point is $1/2^{128}$.
Thus, the probability that no 128-bit string is a fixed point is $(1 - 1/2^{128})^{2^{128}}$, so the probability that there is a fixed point is $1 - (1 - 1/2^{128})^{2^{128}}$.
Since $\lim_{n \to \infty} (1 - 1/n)^n = 1/e$, and $2^{128}$ is most certainly a very large number, this probability is almost exactly $1 - 1/e \approx 63.21\%$.
Of course, there is no randomness actually involved – either there is a fixed point or there isn't. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace – if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits).
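The claim that the answer barely depends on the size of the output space can be checked with a small simulation. The sketch below draws uniformly random functions on an n-bit domain and counts how often at least one fixed point appears. These random functions stand in for an idealised MD5; they are not MD5 itself. xorshift32 is used only as a convenient deterministic generator, and `fixed_point_probability` is a made-up name.

```c
#include <stdint.h>

/* xorshift32: tiny deterministic PRNG (state must be nonzero). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Estimates P(random function on a `bits`-bit domain has a fixed point).
   Each trial draws f(0), f(1), ... independently and stops at the first
   f(x) == x. Keep bits small (e.g. <= 16) and trials > 0. */
double fixed_point_probability(unsigned bits, unsigned trials, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;
    uint32_t size = (uint32_t)1 << bits;
    uint32_t mask = size - 1;
    unsigned t, hits = 0;
    uint32_t x;

    for (t = 0; t < trials; t++) {
        for (x = 0; x < size; x++) {
            if ((xorshift32(&state) & mask) == x) {  /* f(x) == x */
                hits++;
                break;
            }
        }
    }
    return (double)hits / trials;
}
```

For 4-bit and 8-bit domains the estimates should come out close to the exact values $1 - (15/16)^{16} \approx 0.644$ and $1 - (255/256)^{256} \approx 0.633$, both already near $1 - 1/e$.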
My brute-force attempt found a 12-hex-digit prefix match and a 12-hex-digit suffix match.
prefix 12:
54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762
suffix 12:
df12c1434cec7850a7900ce027af4b78 -> b2f6053087022898fe920ce027af4b78
Blog post:
https://plus.google.com/103541237243849171137/posts/SRxXrTMdrFN
Since the hash is irreversible, this would be very hard to figure out. The only way to solve this would be to calculate the hash of every possible output of the hash and see whether you come up with a match.
To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) = 2^128 ≈ 3.4 * 10^38 combinations. If it took 1 millisecond to compute the hash of a 16-byte value, it would take about 1.079 * 10^28 years (using 365-day years) to calculate all those hashes.
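The arithmetic can be reproduced with a few lines of C. Like the answer, this sketch assumes 1 ms per hash and 365-day years. `years_to_enumerate` is a made-up name.

```c
/* Years needed to hash all 2^bits inputs at seconds_per_hash each. */
double years_to_enumerate(unsigned bits, double seconds_per_hash)
{
    const double seconds_per_year = 365.0 * 24 * 60 * 60;  /* 31,536,000 s */
    double count = 1.0;
    unsigned i;
    for (i = 0; i < bits; i++)
        count *= 2.0;                  /* 2^bits, exact in a double */
    return count * seconds_per_hash / seconds_per_year;
}
```

`years_to_enumerate(128, 1e-3)` gives roughly 1.079e28, matching the figure above.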
While I don't have a yes/no answer, my guess is "yes", and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string interpretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute-force search right away).
My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized for the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time and watching out for contradictions. You only need to see a few bits of the output to know that it is not matching the input!
Probably, but finding it would take longer than we have or would involve compromising MD5.
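There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to about 86.5%: treating the two as independent events, each with probability $1 - 1/e$, the chance that at least one holds is $1 - 1/e^2 \approx 86.47\%$.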
Interpretation 1: does the MD5 of a MD5 output in binary match its input?
Interpretation 2: does the MD5 of a MD5 output in hex match its input?
Strictly speaking, since the input of MD5 is 512 bits long and the output is 128 bits, I would say that's impossible by definition.

Resources