HDLC Frames - Octets/Modulo 8 doubts (C)

I am trying to implement the HDLC frame format type 3 and I have some doubts regarding the Octets/Modulo 8 encoding of frames.
Firstly, is the HDLC frame transmitted entirely in octets?
What do they mean by "a frame is n octets in length"? Please give an example.
I believe that octet and modulo are the same thing, so assuming that we have a frame X of one byte, what then do they mean by "the encoding of X shall be modulo 8"?
I am getting a little confused by all this, so I need more clarification. Examples and illustrations would be of great help.
Thanks in advance.
Thanks @Clifford and @masoud. Your answers were really helpful. But I first had to read Octet String: What is it? (though it sounds funny, it is explained in a simple way), and then I came back to read your comments and understood everything you explained. All the same, wish me happy coding.

In HDLC, every field length must be a multiple of 8 bits ("modulo 8"). For example:
An HDLC frame looks like this:
[FLAG(8 bits)|ADDRESS(8 bits)|CONTROL(8/16 bits)|INFORMATION(n*8 bits)|FCS(16/32 bits)|FLAG(8 bits)]
Each field is a whole number of octets; even the length of INFORMATION must be a multiple of 8 bits.
This means that if you want to send data that is only 1 bit long, you must still consume a whole byte (8 bits).
If you are looking for some HDLC frame samples, look at this link: Click me!
And read this: Click me
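To make the octet alignment concrete, here is a minimal C sketch of laying a frame out in a byte buffer (the function name and the precomputed 16-bit FCS parameter are my assumptions for illustration; bit stuffing and the FCS computation itself are omitted):

#include <stdint.h>
#include <string.h>

#define HDLC_FLAG 0x7E  /* 01111110 delimiter */

/* Hypothetical helper: every field occupies whole octets, so even a
   1-bit payload would have to be padded out to a full INFORMATION octet. */
size_t hdlc_build_frame(uint8_t *buf, uint8_t addr, uint8_t ctrl,
                        const uint8_t *info, size_t info_len,
                        uint16_t fcs /* assumed precomputed CRC-16 */)
{
    size_t i = 0;
    buf[i++] = HDLC_FLAG;                /* opening FLAG, 8 bits   */
    buf[i++] = addr;                     /* ADDRESS, 8 bits        */
    buf[i++] = ctrl;                     /* CONTROL, 8 bits here   */
    memcpy(buf + i, info, info_len);     /* INFORMATION, n*8 bits  */
    i += info_len;
    buf[i++] = (uint8_t)(fcs & 0xFF);    /* FCS, low octet         */
    buf[i++] = (uint8_t)(fcs >> 8);      /* FCS, high octet        */
    buf[i++] = HDLC_FLAG;                /* closing FLAG, 8 bits   */
    return i;                            /* frame length in octets */
}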

Firstly, is the HDLC frame transmitted entirely in octets?
Yes, it is - that simply means that the frame length is a multiple of 8 bits.
What do they mean by a frame is 'n' Octet in length?
Who is "they"? Cite your reference material. An octet is simply a group of eight bits. It is a less ambiguous term than byte (which, rarely, can refer to a machine word of a length other than eight bits). The term octet is widely used in telecommunications, and is also used in languages other than English to mean "byte" (when a byte is eight bits).
I believe that octet and modulo are the same [...]
Not at all, modulo is a mathematical term, used here perhaps inaccurately to mean exactly divisible by (or an exact multiple of) eight.
[...] what then do they mean by the encoding of X shall be modulo 8.[?]
Again who are "they"? If we can see where you are reading this in context, you may get a better explanation.
Edit:
I have not gone to the length of referencing ISO 3309, which is the standard defining the HDLC frame structure, but the term "modulo 8", in at least the Wikipedia article, is used only in the context of frame sequence numbers, where it simply means that a sequence number increments from 0 to 7 and then restarts at 0 (i.e. it is the frame number modulo 8 - the remainder of frame_num/8, or simply frame_num % 8 in C code). I wonder whether you are confusing terms - again, a citation or extract would help.
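In C that wrap-around is just the remainder operation; a one-line sketch:

/* Modulo-8 sequence numbering: the 3-bit N(S)/N(R) fields count
   0,1,...,7,0,1,... - i.e. the frame number modulo 8. */
unsigned next_seq(unsigned seq)
{
    return (seq + 1) % 8;   /* equivalently: (seq + 1) & 0x07 */
}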

Related

how do I know if the case is true?

Let's say we are given a byte of binary data; how can you know what that data represents?
Is it true that you can't really know what the data represents, because you need to know how the one byte of binary data is represented - whether it is unsigned, signed, etc.?
Or is it that you can know what it represents, since binary is base 2?
I am sorry to tell you that a byte of data says nothing about its intended representation.
You state that because it's a byte, it's a binary representation; this is purely an assumption.
It depends on the intention of whoever stored the data.
It might represent anything. As @nos told you, it really depends on the convention the writer used to store it.
You may have a two's-complement number, a signed byte with 7 value bits, an unsigned byte on 8 bits, an octal representation (or a partial one), or a mask (each group of bits within the byte may describe something totally different from another). It could also be a representation in some special coding. Etc.
This is truly unlimited.
In order to interpret it properly you need to know the underlying convention (a spec). @fede1024 told you about files, which use special characters (magic numbers) so that you can double-check against the convention.
One more thing… Bear in mind that even binary data can be stored in natural order or in reverse order: that's endianness. So when you examine a number stored in at least 2 bytes, you have to know whether the most significant byte is stored first or second in memory. If you misinterpret this, you won't understand the underlying piece of data. Endianness is a constant for a given processor.
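To see why that matters, here is a small C sketch that interprets the same two bytes under both conventions (the helper names are mine, for illustration):

#include <stdint.h>

/* The same pair of bytes yields different numbers depending on
   whether the first byte is taken as most or least significant. */
uint16_t read_be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
uint16_t read_le16(const uint8_t *p) { return (uint16_t)((p[1] << 8) | p[0]); }

/* For the bytes {0x01, 0x00}: read_be16 -> 256, read_le16 -> 1. */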
Base-2 and binary refer to the same thing. Typically, you do need to know whether the byte is signed or unsigned at least (in C). As for what the data represents - well, "it depends". Whether you want to interpret it as a single byte, as a character (or not), etc. With multi-byte data, you often also have to take endianness (ordering of the bytes into larger words) into account.
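A minimal sketch of how much the chosen interpretation changes the result (the example byte value is arbitrary):

#include <stdio.h>

int main(void)
{
    unsigned char raw = 0xC8;                       /* one byte of raw data   */
    printf("as unsigned: %u\n", raw);               /* 200                    */
    printf("as signed:   %d\n", (signed char)raw);  /* -56 (two's complement) */
    return 0;
}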
Some file formats start with a magic number; for example, all PNG files start with 89 50 4E 47 0D 0A 1A 0A. That said, if you have a general binary file without any kind of magic number, you can only guess at its contents.
You can try to open it with a hexadecimal editor, but there is no automatic way to understand what the data represents.
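For example, a hedged sketch of checking the PNG signature mentioned above (the function name is mine):

#include <stdio.h>
#include <string.h>

/* Returns 1 if the file starts with the 8-byte PNG signature. */
int is_png(const char *path)
{
    static const unsigned char sig[8] =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    unsigned char buf[8];
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return n == sizeof buf && memcmp(buf, sig, sizeof buf) == 0;
}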
You know it's base 2, since it's a byte of binary data, as you said. As for whether it is "true": in C, everything but 0 is true, and if it's 0, then it's false.

big-endian && little-endian?

Can any one tell me what this statement means:
"Specify the endianness of the object files. This only affects disassembly. This can be useful when disassembling a file format which does not describe endianness information, such as S-records. "
This article explains really well what endianness is and how to program in an endian-independent way.
Different environments store numbers in different ways:
Big endian environments store information with the most significant byte first.
Little endian environments store information with the least significant byte first.
Nowadays most high-level frameworks take care of all this for you; however, if you're programming at a lower level, this will be important.
Have a peek at the Wikipedia entry for it; it's really not as bad as some others.
I don't understand all of the question, but endianness describes which way round the bytes of a number are stored.
For example, the number 256 is too big for 1 byte, so it is represented in 2 bytes: a 1 in one byte and a 0 in the other, representing 1 * 256 plus 0 units.
The way round these bytes are stored is the endianness.
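A quick way to see which order your own machine uses, following the 256 example (a minimal sketch):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t n = 256;                        /* 0x0100: needs two bytes */
    const unsigned char *p = (const unsigned char *)&n;
    /* A little-endian machine prints "00 01"; a big-endian one, "01 00". */
    printf("%02X %02X\n", p[0], p[1]);
    return 0;
}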

Huffman table entropy decoding simplification (in C)

First time using this site to ask a question, but I have gotten many, many answers!
Background:
I am decoding a variable-length video stream that was encoded using RLE and Huffman encoding. The stream is 10 to 20 kilobytes long, and therefore I am trying to "squeeze" as much time out of every step as I can so it can be decoded efficiently in real time.
Right now the step I am working on involves converting the bitstream into a number based on a Huffman table. I do this by counting the number of leading zeros to determine the number of trailing bits to include. The table looks like:
001xs range -3 to 3
0001xxs range -7 to 7
00001xxxs range -15 to 15
And so on, up to 127. The s is a sign bit: 0 means positive, 1 means negative. So, for example, if clz=2 then I would read the next 3 bits, 2 for value and 1 for sign.
Question:
Right now the nasty expression I created to do this is:
int outblock[64];
unsigned int index = 0; // added: index into outblock (assumed to start at 0)
unsigned int value;
//example value 7 -> 111 (xxs) which translates to -3
value=7;
outblock[index]=(((value&1)?-1:1)*(int)(value>>1)); //expression (cast avoids unsigned arithmetic on the product)
Is there a simpler and faster way to do this?
Thanks for any help!
Tim
EDIT: Expression edited because it was not generating proper positive values. Generates positive and negative properly now.
I just quickly googled "efficient huffman decoding" and found the following links which may be useful:
Efficient Huffman Decoding with Table Lookup
Previous question - how to decode huffman efficiently
It seems the most efficient way to do Huffman decoding is to use a table lookup. Have you tried a method like this?
I'd be interested to see your timings for the original algorithm before any optimisations. Finally, what hardware/OS are you running on?
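For what it's worth, a rough sketch of the table-lookup idea those links describe (all names, the 9-bit table width, and the bit-buffer layout are assumptions for illustration):

#include <stdint.h>

/* Each table entry holds the decoded value and how many bits the
   matching code actually consumed; the table is precomputed so that
   every possible LOOKUP_BITS-wide prefix maps to its entry. */
typedef struct {
    int8_t  value;      /* decoded signed value    */
    uint8_t bits_used;  /* bits actually consumed  */
} huff_entry;

#define LOOKUP_BITS 9   /* assumed: longest code is at most 9 bits */

int decode_one(const huff_entry table[1 << LOOKUP_BITS],
               uint32_t bitbuf, unsigned *consumed)
{
    /* peek at the top LOOKUP_BITS bits of a left-aligned 32-bit buffer */
    const huff_entry *e = &table[bitbuf >> (32 - LOOKUP_BITS)];
    *consumed = e->bits_used;
    return e->value;
}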

How do I work with bit data in C

In class I've been tasked with writing a C program that decompresses a text file and prints out the characters it contains. Each character in the file is represented by 2 bits (4 possible characters).
I've recently been informed that a byte is not necessarily 8 bits on all systems, and a char is not necessarily 1 byte. This then makes me wonder how on earth I'm supposed to know how many bits got loaded from a file when I loaded 1 byte. Also, how am I supposed to keep the loaded data in memory when there are no data types that can guarantee a set number of bits?
How do I work with bit data in C?
A byte is not necessarily 8 bits. That much is certainly true. A char, on the other hand, is defined to be a byte - C does not differentiate between the two things.
However, the systems you will write for will almost certainly have 8-bit bytes. Bytes of other sizes are essentially non-existent outside of really, really old systems or certain embedded systems.
If you have to write your code to work for multiple platforms, and one or more of those have differently sized chars, then you write code specifically to handle that platform - using e.g. CHAR_BIT to determine how many bits each byte contains.
Given that this is for a class, assume 8-bit bytes, unless told otherwise. The point is not going to be extreme platform independence, the point is to teach you something about bit fiddling (or possibly bit fields, but that depends on what you've covered in class).
This then makes me wonder how on earth I'm supposed to know how many bits got loaded from a file when I loaded 1 byte.
You'll be hard-pressed to find a platform where a byte is not 8 bits (though, as noted above, CHAR_BIT can be used to verify that). Also, clarify the portability requirements with your instructor, or state your assumptions.
Usually bits are extracted using shifts and bitwise operations; e.g. (x & 3) is the rightmost 2 bits of x, and ((x>>2) & 3) is the next two bits. Pick the right data type for the platforms you are targeting, or, as others say, use something like uint8_t if it is available for your compiler.
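For the 2-bits-per-character task specifically, a minimal sketch under the assumption of 8-bit bytes, most-significant pair packed first, and a made-up 4-character alphabet:

#include <stdio.h>

int main(void)
{
    const char alphabet[4] = { 'a', 'b', 'c', 'd' };  /* hypothetical mapping */
    unsigned char byte = 0x1B;   /* bits 00 01 10 11 -> "abcd" */

    /* extract the four 2-bit symbols, highest pair first */
    for (int shift = 6; shift >= 0; shift -= 2)
        putchar(alphabet[(byte >> shift) & 3]);
    putchar('\n');
    return 0;
}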
Also see:
Type to use to represent a byte in ANSI (C89/90) C?
I would recommend not using bit fields. Also see here:
When is it worthwhile to use bit fields?
You can use bit fields in C. These let you explicitly specify the number of bits in each part of the field, if you are truly concerned about width. This page gives a discussion: http://msdn.microsoft.com/en-us/library/yszfawxh(v=vs.80).aspx
As an example, check out ieee754.h for bit fields used in the context of implementing IEEE 754 floats.
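A minimal sketch of a bit field splitting one storage unit into four 2-bit parts (keeping in mind that layout and packing are implementation-defined, which is one reason bit fields are often avoided for file formats):

#include <stdio.h>

struct four_symbols {
    unsigned s0 : 2;  /* each member is explicitly 2 bits wide */
    unsigned s1 : 2;
    unsigned s2 : 2;
    unsigned s3 : 2;
};

int main(void)
{
    struct four_symbols f = { 0, 1, 2, 3 };
    printf("%u %u %u %u\n", f.s0, f.s1, f.s2, f.s3);  /* prints: 0 1 2 3 */
    return 0;
}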

Will MD5 ever return the same output as its input? [duplicate]

Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x?
Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, the probability that any given 128-bit string is a fixed point is 1/2^128.
Thus, the probability that no 128-bit string is a fixed point is (1 − 1/2^128)^(2^128), so the probability that there is a fixed point is 1 − (1 − 1/2^128)^(2^128).
Since the limit as n goes to infinity of (1 − 1/n)^n is 1/e, and 2^128 is most certainly a very large number, this probability is almost exactly 1 − 1/e ≈ 63.21%.
Of course, there is no randomness actually involved - either there is a fixed point or there isn't. But we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace - if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits.)
My brute-force attempt found a 12-hex-digit prefix match and a 12-hex-digit suffix match.
prefix 12:
54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762
suffix 12:
df12c1434cec7850a7900ce027af4b78 -> b2f6053087022898fe920ce027af4b78
Blog post:
https://plus.google.com/103541237243849171137/posts/SRxXrTMdrFN
Since the hash is irreversible, this would be very hard to figure out. The only way to solve it would be to calculate the hash of every possible output of the hash and see if you come up with a match.
To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) ≈ 3.4 × 10^38 combinations. If it took 1 millisecond to compute the hash of each 16-byte value, it would take 10790283070806014188970529154.99 years (about 1.08 × 10^28) to calculate all those hashes.
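As a rough illustration of what such a search looks like (not a practical attack), here is a sketch using OpenSSL's MD5() - note that MD5() is deprecated in OpenSSL 3.x, and the iteration scheme and the 4-byte reporting threshold are my own choices:

#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>   /* link with -lcrypto */

int main(void)
{
    unsigned char x[16] = { 0 }, digest[16];

    for (unsigned long long i = 0; i < 1000000ULL; i++) {
        memcpy(x, &i, sizeof i);        /* vary the 16-byte input */
        MD5(x, sizeof x, digest);

        /* count how many leading bytes of md5(x) agree with x */
        size_t match = 0;
        while (match < 16 && x[match] == digest[match])
            match++;
        if (match >= 4)
            printf("%zu-byte prefix match at i=%llu\n", match, i);
    }
    return 0;
}

A true fixed point would need all 16 bytes to match, which is exactly why the exhaustive search above is hopeless in practice.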
While I don't have a yes/no answer, my guess is "yes", and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string interpretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute-force search right away).
My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized for the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time and watching out for contradictions. You only need to see a few bits of the output to know that it is not matching the input!
Probably, but finding it would take longer than we have or would involve compromising MD5.
There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to 81.5%.
Interpretation 1: does the MD5 of a MD5 output in binary match its input?
Interpretation 2: does the MD5 of a MD5 output in hex match its input?
Strictly speaking, since MD5 processes its input in 512-bit blocks while its output is 128 bits, I would say a literal fixed point is impossible by definition.
