Huffman table entropy decoding simplification (in C) - c

First time using this site to ask a question, but I have gotten many many answers!
Background:
I am decoding a variable length video stream that was encoded using RLE and Huffman encoding. The stream is 10 to 20 Kilobytes long and therefore I am trying to "squeeze" as much time out of every step that I can so it can be decoded efficiently in real time.
Right now the step I am working on involves converting the bitstream into a number based on a Huffman table. I do this by counting the number of leading zeros to determine the number of trailing bits to include. The table looks like:
001xs range -3 to 3
0001xxs range -7 to 7
00001xxxs range -15 to 15
And on till 127. The s is a sign bit, 0 means positive, 1 means negative. So for example if clz=2 then I would read the next 3 bits, 2 for value and 1 for sign.
Question:
Right now the nasty expression I created to do this is:
int outblock[64];
unsigned int value;
//example value 7 -> 111 (xxs) which translates to -3
value=7;
outblock[index]=(((value&1)?-1:1)*(value>>1)); //expression
Is there a simpler and faster way to do this?
Thanks for any help!
Tim
EDIT: Expression edited because it was not generating proper positive values. Generates positive and negative properly now.
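One possible way to avoid both the multiply and the conditional is a two's-complement "conditional negate". The following is only a sketch, not a measured improvement: the function names are made up for illustration, it assumes a two's-complement target, and it assumes the bitstream is held in a 32-bit window with the next unread bit in bit 31 and that clz has already been counted (for example with a compiler builtin such as GCC's __builtin_clz).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 'value' holds the magnitude bits followed by one
   sign bit (0 = positive, 1 = negative), as in the question. */
static int32_t sign_magnitude_to_int(uint32_t value)
{
    int32_t magnitude = (int32_t)(value >> 1);
    int32_t mask = -(int32_t)(value & 1u);  /* 0 if positive, -1 (all ones) if negative */
    return (magnitude ^ mask) - mask;       /* negates magnitude only when mask is -1 */
}

/* Hypothetical helper: skip 'clz' leading zeros, then take the next
   clz + 1 bits (clz magnitude bits plus the sign bit) and convert them.
   Assumes the next unread bit of the stream sits in bit 31 of 'window'. */
static int32_t decode_coefficient(uint32_t window, unsigned clz)
{
    assert(clz >= 1 && clz <= 15);          /* whole code must fit in 32 bits */
    uint32_t field = (window << clz) >> (32u - (clz + 1u));
    return sign_magnitude_to_int(field);
}
```

With these definitions, sign_magnitude_to_int(7) gives -3, matching the example in the question. Whether this is actually faster than the original expression depends on the compiler and target, so it is worth timing both.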

I just quickly googled "efficient huffman decoding" and found the following links which may be useful:
Efficient Huffman Decoding with Table Lookup
Previous question - how to decode huffman efficiently
It seems the most efficient way to huffman decode is to use table lookup. Have you tried a method like this?
I'd be interested to see your times of the original algorithm before doing any optimisations. Finally, what hardware / OS are you running on?
Best regards,

Related

Converting from Base 16-8 to Base 2 without Functions nor Bitwise op in C

I have an assignment to make a Full Adder; it was chosen for us to practice loops and conditionals in C.
So I did the easiest part, checking whether the number is in Base-2 and printing C-Out and Sum. But for Base-16 and Base-8 I couldn't figure out how to convert them to a smaller base.
No advanced techniques are allowed, rules as follows:
You are not allowed to use data structures such as arrays to store values for the conversion
operation.
You are not allowed to use bitwise operators.
You are not allowed to define your own functions.
I hope that you don't give me the full solution for this step; only help me with converting one base to another, and I will try figuring out the rest of it by myself.
Think of it this way: you must be familiar with base 10, or decimal numbers. You use them every day. So how do they work? First, the number of symbols to represent them is the base number, 10. This is why, as you are counting the numbers, whenever you get to a power of 10, you need to increase the number of symbols used to represent the number. What you are asked to do here is kind of the reverse of that process. If you had to write down the digits of a number in base 10 without being allowed to see the number, how would you do it? I will give you the first step: you can get the least significant digit by dividing the number by 10 and taking the remainder. This will give you the number of times you had to change the symbol used since the last time you had to increase the number of symbols used.
If you do num%2 you will get the right most bit (LSBit) -- depending on how you want to return the bit pattern (string etc) -- save this bit.
If you divide by two then you will lose the right most bit (LSBit) .. keep doing this in a loop until the number becomes zero.
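The remainder-and-divide loop described above can be sketched without arrays or bitwise operators by building the bit pattern as a decimal-looking number (so 5 becomes 101). This is only an illustration of the one conversion step: the function name is made up, and the function wrapper is there just so the loop can be shown in isolation; the same loop body can be written inline in main.

```c
#include <assert.h>

/* Hypothetical helper: returns the binary digits of n written as a
   decimal number, e.g. 5 -> 101. Uses only %, / and *. */
static unsigned long long binary_digits_as_decimal(unsigned int n)
{
    assert(n <= 0xFFFFu);            /* 16 digits still fit in unsigned long long */
    unsigned long long result = 0;
    unsigned long long place = 1;    /* 1, 10, 100, ... position of the next bit */
    while (n > 0) {
        result += (unsigned long long)(n % 2) * place;  /* rightmost bit */
        place *= 10;
        n /= 2;                      /* drop the rightmost bit */
    }
    return result;
}
```

Note that the base the number was typed in (16 or 8) only matters when reading it; once it is stored in an unsigned int, the same loop applies.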

Random formula based of 15 seeds

I am working on my university degree and I got stuck on a random function.
I am using a microcontroller which has no configured clock. So, I decided to use the ADC (analog to digital conversion) readings as seeds for my random function.
So I have 15 two-byte variables which store some 'random' values (the conversion is not always the same, and the difference is in the LSB, the last bit in my case; e.g. the value of an ADC read is 700 now, 701 in 5 ms, then back to 700, then 702, etc.). So I was thinking of building a random function which uses, let's say, the last 4 bits of those variables.
My question is: Can you give me an example of a good random formula?
Like ( Variable1 >> 4 ) ^ ( Variable2 << 4 ) and so on ...
I want to be able to obtain a pretty random number on 1 byte ( this is the best case ). It will be used in a RSA algorithm, which I have already implemented ( I have a big look up table with prime numbers, and I need 2 random numbers from that table ).
Usually a cryptographic hash function like SHA or MD5 is used for this purpose. As long as your input data contains enough entropy, you will get a random output. See https://en.wikipedia.org/wiki/Entropy_(computing)
However, that may be a little too much work for your use case. If you only need 8 bits, you could use an 8-bit cyclic redundancy code (CRC). It will have similar properties -- since any 8 of your input bits can be used to completely determine the output, the output will be random as long as at least 8 of your input bits are random. See http://www.sunshine2k.de/articles/coding/crc/understanding_crc.html
That will do what you ask for... but beware! It sounds like you are writing a completely insecure implementation of RSA. Under no circumstances could you use only 8 bits of randomness to securely generate an RSA key.
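As a rough sketch of the CRC idea, the code below folds all the ADC samples into one byte with a bitwise CRC-8. The polynomial 0x07 with a zero initial value is just one common choice, and the function names are made up for illustration; nothing here makes the result suitable for key generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Feed one byte into a CRC-8 (polynomial 0x07, MSB first). */
static uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                            : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical helper: fold 16-bit ADC samples into a single byte,
   low byte of each sample first, then the high byte. */
static uint8_t fold_samples_crc8(const uint16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);
    uint8_t crc = 0;
    for (size_t i = 0; i < count; ++i) {
        crc = crc8_update(crc, (uint8_t)(samples[i] & 0xFFu));
        crc = crc8_update(crc, (uint8_t)(samples[i] >> 8));
    }
    return crc;
}
```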
If you think that the LS bit of every word is truly random (which is likely), and if they are uncorrelated, pack 8 LS bits into 1 byte. There is no use for the remaining 15 x 16 - 8 bits.
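Packing the least significant bits could look something like this. It is a minimal sketch with a made-up function name, and it takes the answer's assumption at face value that each LS bit is random and uncorrelated with the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one byte from the LS bit of the first
   eight samples; the first sample ends up in the top bit. */
static uint8_t pack_lsbs(const uint16_t *samples, size_t count)
{
    assert(count >= 8);
    uint8_t out = 0;
    for (size_t i = 0; i < 8; ++i) {
        out = (uint8_t)((out << 1) | (samples[i] & 1u));
    }
    return out;
}
```

For the readings 700, 701, 700, 702, 703, 701, 700, 705 the LS bits are 0, 1, 0, 0, 1, 1, 0, 1, so the packed byte is 0x4D.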

Normalising 18 bit input between 0-9999

I'm writing a program in which I need to normalise an 18-bit input to the range 0-9999. This is something I have never come across before.
I have searched the internet, and correct me if I am wrong here, but is this as simple as converting the 18-bit binary (000000000000000000) input into a natural number and then dividing it by 1000?
Is there a different and more efficient method?
Thank you
No, what you want to do is multiply your input by 0.03814697265 (that is, 10000/2^18) and drop the fractional part.
The reasoning is pretty simple: you take your range of inputs (0..2^18 - 1) and split it in 10000 "slices". Thus each slice will have a range of just over 26. Then if you divide your input from the original range by this 26.2144 (or multiply it by 1/26.2144), you'll get your number in the 0..9999 range.
Edit: depending on your background, you may need to know that here I use ^ with the meaning of exponentiation. Might be moot since this question is tagged C and it has no first-class concept of exponentiation, but it's definitely not XOR!
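If floating point is not wanted, the same scaling can be sketched in integer arithmetic: multiplying by 10000 and dividing by 2^18 (here written as a right shift by 18, a real shift this time, not exponentiation). The function name is made up; the largest intermediate value, 262143 * 10000, still fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an 18-bit value (0..262143) onto 0..9999. */
static uint32_t normalise_18bit(uint32_t raw)
{
    assert(raw < (1u << 18));        /* only 18-bit inputs are valid */
    return (raw * 10000u) >> 18;     /* raw * 10000 / 2^18, truncated */
}
```

So 0 maps to 0, 131072 (half scale) maps to 5000, and 262143 maps to 9999.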

HDLC Frames - Octets/modulo 8 doubts

I am trying to implement the HDLC frame format type 3 and I have some doubts as regards Octets/Modulo 8 encoding of frames.
Firstly, Is the HDLC frame transmitted entirely in Octets?
What do they mean by a frame is 'n' Octet in length? Please give an example.
I believe that Octet and Modulo is the same, so assuming that we have a frame X of one byte, what then do they mean by the encoding of X shall be modulo 8.
I am getting a little confused with all this, so i need more clarifications. Example and illustration will be of great help.
Thanks in Advance.
Thanks @clifford and @masoud. Your answers were really helpful. But I had to read this Octet String: What is it? (though it sounds funny because it is explained in a simple way), and I came back to read your comments, then I understood all you explained. All the same, wish me a happy coding.
In HDLC every field length must be a multiple of 8 bits. For example, an HDLC frame looks like this:
[FLAG(8bits)|ADDRESS(8bits)|CONTROL(8/16bits)|INFORMATION(n*8bits)|FCS(16/32bits)|FLAG(8bits)]
Each field is a multiple of 8 bits; even the length of INFORMATION must be a multiple of 8.
It means that if you want to send data with a length of 1 bit, you must consume a whole byte (8 bits).
If you are looking for some HDLC frame sample, look at this link: Click me!
and read this: Click me
Firstly, Is the HDLC frame transmitted entirely in Octets?
Yes, it is. That simply means that the data length is a multiple of 8 bits.
What do they mean by a frame is 'n' Octet in length?
Who is "they"? Cite your reference material. An octet is simply a group of eight bits. It is a less ambiguous term than byte (which can, rarely, be used to refer to a machine word of length other than eight bits). The term octet is widely used in telecommunications, and is also used in languages other than English to mean "byte" (when a byte is eight bits).
I believe that Octet and Modulo
Not at all, modulo is a mathematical term, used here perhaps inaccurately to mean exactly divisible by (or an exact multiple of) eight.
[...] what then do they mean by the encoding of X shall be modulo 8.[?]
Again who are "they"? If we can see where you are reading this in context, you may get a better explanation.
Edit:
I have not gone to the length of referencing ISO 3309, which is the standard defining HDLC frame structure, but the term "Modulo 8" in at least the Wikipedia article is used only in the context of frame sequence numbers, where it simply means that a sequence number increments from 0 to 7, then restarts at 0 (i.e. it is the frame number modulo 8, or the remainder of frame_num/8, or simply frame_num % 8 in C code). I wonder whether you are confusing terms; again, a citation or extract would help.
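To make the two meanings concrete, here is a small sketch with made-up helper names: one function for the modulo 8 sequence number, and one for how many octets are needed to carry a given number of bits.

```c
#include <assert.h>
#include <stddef.h>

/* Modulo 8 sequence numbering: 0, 1, ..., 7, then back to 0. */
static unsigned next_sequence_number(unsigned seq)
{
    assert(seq < 8u);
    return (seq + 1u) % 8u;
}

/* Octets needed to carry 'bits' bits: even 1 bit costs a whole octet. */
static size_t octets_for_bits(size_t bits)
{
    return (bits + 7u) / 8u;
}
```

So a frame that is "n octets in length" is simply n * 8 bits long, and the sequence number following 7 is 0.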

Will MD5 ever return the same output as its input? [duplicate]

Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x?
Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, then the probability that any given 128-bit string is a fixed point is 1/2^128.
Thus, the probability that no 128-bit string is a fixed point is (1 − 1/2^128)^(2^128), so the probability that there is a fixed point is 1 − (1 − 1/2^128)^(2^128).
Since the limit as n goes to infinity of (1 − 1/n)^n is 1/e, and 2^128 is most certainly a very large number, this probability is almost exactly 1 − 1/e ≈ 63.21%.
Of course, there is no randomness actually involved – either there is a fixed point or there isn't. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace – if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits).
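The claim that the answer barely depends on the hash size can be checked numerically. The sketch below (function name made up) evaluates 1 − (1 − 1/n)^n for n = 2^bits by squaring (1 − 1/n) bits times. It is limited to 32 bits because a double cannot represent 1 − 1/2^128 distinctly from 1, so it illustrates the trend rather than computing the 128-bit case directly.

```c
#include <assert.h>

/* Hypothetical helper: probability that a uniformly random function on
   'bits'-bit strings has at least one fixed point, 1 - (1 - 1/n)^n. */
static double fixed_point_probability(unsigned bits)
{
    assert(bits >= 1 && bits <= 32);    /* double precision runs out beyond this */
    double inv_n = 1.0;
    for (unsigned i = 0; i < bits; ++i) {
        inv_n /= 2.0;                   /* ends as 1 / 2^bits */
    }
    double none = 1.0 - inv_n;
    for (unsigned i = 0; i < bits; ++i) {
        none *= none;                   /* squaring 'bits' times raises to 2^bits */
    }
    return 1.0 - none;
}
```

For 1 bit this gives 0.75, for 5 bits about 0.638, and for 32 bits it already agrees with 1 − 1/e to several decimal places.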
My brute force attempt found a 12-hex-digit prefix match and a 12-hex-digit suffix match.
prefix 12:
54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762
suffix 12:
df12c1434cec7850a7900ce027af4b78 -> b2f6053087022898fe920ce027af4b78
Blog post:
https://plus.google.com/103541237243849171137/posts/SRxXrTMdrFN
Since the hash is irreversible, this would be very hard to figure out. The only way to solve this, would be to calculate the hash on every possible output of the hash, and see if you came up with a match.
To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) = 3.4 * 10 ^ 38 combinations. If it took 1 millisecond to compute a hash on a 16 byte value, it would take 10790283070806014188970529154.99 years to calculate all those hashes.
While I don't have a yes/no answer, my guess is "yes" and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string interpretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute force search right away).
My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized to the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time, watching out for contradictions. You only need to see a few bits of the output to know that it is not matching the input!
Probably, but finding it would take longer than we have or would involve compromising MD5.
There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to 81.5%.
Interpretation 1: does the MD5 of a MD5 output in binary match its input?
Interpretation 2: does the MD5 of a MD5 output in hex match its input?
Strictly speaking, since the input of MD5 is 512 bits long and the output is 128 bits, I would say that's impossible by definition.
