I have 4 characters that I want to encode:
Is there a way to give them an "encoded version" instead of ASCII? Binary would be best, but I only have 0 and 1 for binary, and if I then used a sequence it wouldn't be clear which character is 0, which is 1, and which is 11, for example. Is there any other way to encode efficiently, with a minimum number of bits?
Thanks
There are 4 different values. 2 bits can encode 4 values.
00
01
10
11
That means each byte can hold 4 such values.
+---+---+---+---+---+---+---+---+
| 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 |
+---+---+---+---+---+---+---+---+
For example, we could choose the following encoding scheme:
T = 00
G = 01
A = 10
C = 11
The byte 110 (0b01101110) would therefore mean ACAG (assuming the first value is stored in the least significant bits).
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
+---+---+---+---+---+---+---+---+
---G--- ---A--- ---C--- ---A---
That means the sequence would use as little as 25% of the space used when using ASCII.
Except this doesn't quite work. There's no way to know the length of the sequence. For example, how would you encode ACA using the above scheme?
There are options:
Somehow prefix the sequence by its length.
This could end up doubling the length of the encoded string if it's really short.
Introduce a 5th, sentinel value to indicate the end of the string.
This complicates the encoding (since we no longer have a power of 2). It also reduces the compression factor (8 values per 3 bytes, so as little as 37.5% of the space used when using ASCII).
Use the first 2 bits of each byte to indicate how many values are actually present in the byte. This reduces the compression factor (3 values per byte, so as little as 33% of the space used when using ASCII).
You can use real compression techniques (e.g. use frequency analysis to assign shorter codes to more common subsequences), probably using zlib or a more modern equivalent. This method is very effective (perhaps using as little as 1/10th of what ASCII would), but it's only effective if you have very long sequences. It also prevents random access. This means you can't get the Nth value without first reading all the previous ones. In short, you'd have to decode the string to ASCII to search it.
You indicated in a comment that you want to search the sequences for subsequences, but none of these approaches makes that easier (and the fourth one prevents it, as mentioned above). In fact, they make it very complicated. Converting the sequence to ASCII before searching it is highly recommended.
I am completely new to embedded programming. I'm examining the code below and trying to understand how it works, but I'm really stuck.
The program is used to count and print out the numbers from 0 to 9.
So can someone please explain the line starting with const uint8_t? Why do I need an array of hexadecimal numbers here?
#include <avr/io.h>
#include <util/delay.h>
#include "debug.h"

const uint8_t segments[10] = {0xFC,0x60,0xDA,0xF2,0x66,0xB6,0xBE,0xE4,0xFE,0xF6};

int main(void) {
    uint8_t i = 0;
    int g;
    init_debug_uart0();
    /* set PORT A pins as outputs */
    DDRA = 0xFF;
    for (;;) {
        PORTA = segments[i];
        printf("%d\n\r", i);
        _delay_ms(1000);
        if (i >= 9) {
            fprintf(stderr, "Count Overflow\n\r");
            i = 0;
            scanf("%d", &g);
        } else {
            i++;
        }
    }
}
And a final question: does anyone know good sources to read about embedded programming? Currently I'm learning about the IIMatto, an 8-bit processor with 32 registers, designed in a Harvard architecture with a 1-level pipeline.
The line const uint8_t segments[10] = {0xFC,0x60,0xDA, ... simply defines a constant 10-byte array.
The code does not need an array of hexadecimal values; it could have been decimal.
But consider the benefit of
0xFC,0x60,0xDA,0xF2,0x66,0xB6,0xBE,0xE4,0xFE,0xF6
// versus
252,96,218,...
A casual inspection shows that the number of bits set in each byte is
6,2,5,5,...
This just happens to match the number of segments set in a 7-segment display of the digits 0,1,2,3 ...
Closer inspection of the bits set will reveal which bit activates which segment.
Other methods could be employed to get this mapping of 7-segment to digit, but showing the data in hexadecimal is one step closer than decimal.
Perhaps the code could look like the following (the proper segment mapping is TBD).
typedef enum {
LED7_a = 1 << 0,
LED7_b = 1 << 1,
LED7_c = 1 << 2,
LED7_d = 1 << 3,
LED7_e = 1 << 4,
LED7_f = 1 << 5,
LED7_g = 1 << 6,
LED7_dp = 1 << 7
} LED7_Segment_t;
/* ****************************************************************************
7 Segment Pattern
Layout
aaa
f b
f b
ggg
e c
e c
ddd dp
**************************************************************************** */
const uint8_t segments[] = {
/*'0*/ LED7_a | LED7_b | LED7_c | LED7_d | LED7_e | LED7_f ,
/*'1*/ LED7_b | LED7_c ,
/*'2*/ LED7_a | LED7_b | LED7_d | LED7_e | LED7_g,
/*'3*/ LED7_a | LED7_b | LED7_c | LED7_d | LED7_g,
/*'4*/ LED7_b | LED7_c | LED7_f | LED7_g,
/*'5*/ LED7_a | LED7_c | LED7_d | LED7_f | LED7_g,
/*'6*/ LED7_a | LED7_c | LED7_d | LED7_e | LED7_f | LED7_g,
/*'7*/ LED7_a | LED7_b | LED7_c ,
/*'8*/ LED7_a | LED7_b | LED7_c | LED7_d | LED7_e | LED7_f | LED7_g,
/*'9*/ LED7_a | LED7_b | LED7_c | LED7_d | LED7_f | LED7_g};
First of all, ask yourself: where is segments used, what is it used for, and how is it used?
Hints:
Where: Used to assign a value to PORTA
What: PORTA is an output port of the embedded system, perhaps driving an external device. segments is used to store the output values.
How: Each time around the loop, the value i is incremented. i is used as the index for segments when its value is assigned to PORTA.
Also: a hexadecimal number that is exactly 2 digits long is a byte, which is 8 bits. Look on your microcontroller for up to 8 pins labelled "PORTA".
When writing to PORTA (an I/O port) you are concerned with the state of individual I/O lines associated with each bit.
To display a specific set of segments representing a digit on a 7-segment display, you have to write a specific bit pattern - one bit for each segment you wish to light. The segments array is indexed by the digit you want to display, and the value at that index represents the bit pattern on PORTA required to light the segments that represent that digit.
The reason hexadecimal is used is that there is a direct mapping of a single hexadecimal digit to exactly four binary digits, so hex is a compact way of representing bit patterns. For an experienced embedded developer, mentally converting a bit pattern to hex and vice versa becomes second nature. If the values were decimal, the representation would bear no direct relationship to the bit pattern, making conversion or mental visualisation of the bit pattern less simple.
The hex digit to binary pattern conversion is as follows:
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
A 1010
B 1011
C 1100
D 1101
E 1110
F 1111
0xFC = 11111100
0x60 = 01100000
0xDA = 11011010
....
You need an array of hex numbers because, most probably, you have connected some sort of device to PORTA, and in this device:
0xFC or 11111100 means display 0 (when PORTA pins 7,6,5,4,3,2 are high and 1,0 are low, this device will display 0).
0x60 or 01100000 means display 1 (when PORTA pins 6,5 are high and the rest low, this device will display 1).
and so on...
The memory locations in the microcontroller are 8 bits wide, so 16 I/O lines would require two 8-bit registers, called PORTA and PORTB. In your case each port is only 8 bits wide, i.e. 8 I/O pins for PORTA, another 8 for PORTB, and so on.
The outputs of these ports are controlled by those 8 bits.
Suppose we wish to turn on an LED connected to bit 4 of PORTB. We first have to instruct the microcontroller that PORTB bit 4 is an output (the micro needs to know whether port pins are outputs or inputs; you can't just plug stuff in).
On a PIC micro you would clear bit 4 of the TRISB register to tell the micro that PORTB bit 4 is an output (on PICs a TRIS bit of 0 means output, 1 means input). Now that it is an output, PORTB = 0b00010000 sets PORTB bit 4 to a high voltage; in other words, this will illuminate the LED or whatever you have connected to the port.
PORTA = 0b00000000; // turns off all PORTA outputs
PORTA = 0b10000000; // turns on PORTA pin 7
The hexadecimal value is just a notation: the first hex digit represents the first 4 bits of the value and the second digit the other 4 bits. You are declaring a const uint8_t variable, which means an unsigned 8-bit integer: an integer with no sign bit, so it can store 2^8 = 256 possible values.
I'm just answering the first question. Hope it helps!
I'm literally confused about htonl(). In many links I found that the code to implement htonl is:
#define HTONL(n) (((((unsigned long)(n) & 0xFF)) << 24) | \
((((unsigned long)(n) & 0xFF00)) << 8) | \
((((unsigned long)(n) & 0xFF0000)) >> 8) | \
((((unsigned long)(n) & 0xFF000000)) >> 24))
If the same code is run on both machines, it is going to swap the byte order.
Example : uint32_t a = 0x1;
On Little Endian:
Addr value
100 1
101 0
102 0
103 0
After htonl(a)
Addr value
100 0
101 0
102 0
103 1
============================================
On Big Endian machine:
Addr value
100 0
101 0
102 0
103 1
After htonl(a)
Addr value
100 1
101 0
102 0
103 0
Does that mean that htonl() will change the order of the bytes irrespective of machine architecture ?
If you use it correctly then it should not swap bytes on big endian machines.
htonl is defined in a header which is architecture specific. Normally machine/endian.h will include your architecture specific header. If you redefine it then it will do what you set it to. If you want the real behaviour then you should always use the right architecture header. On big endian machines it's a no op. On little endian machines it's often linked to a specific processor instruction.
I think key to understanding the function is understanding its name.
The function htonl means Host to Network, Long.
That is: it converts a long (32-bit) value from the host's byte order to the network-defined (big-endian) order.
Different hosts could have different representations:
Big-Endian
Little-Endian
even some other representation (imagine some new machine that works in base 3, or has a middle-outwards representation?)
Whatever the machine format, this function converts to a common Network format so the data can be easily, reliably sent to other machines on the network that may have different representations.
Once you understand the concept of Host / Network, then it shouldn't be hard to understand that Network order is Big-Endian, and any Host that is Big-Endian doesn't need any conversion at all.
I have two problems related to my implementation -
I need a function which can convert a given link-layer address from text to a standard format, like the similar function at the network layer for IP addresses, inet_pton(), which converts a given IP address from text to the standard IPv4/IPv6 format.
Is there any difference between a link-layer address and a 48-bit MAC address
(in the case of IPv6, specifically)?
If not, then a link-layer address should always be 48 bits in length, if I am not wrong.
Thanks in advance. Please excuse if I am missing something trivial.
EDIT :
Ok.. I am clear on the difference between a link-layer address and an Ethernet MAC address. There are several types of data-link-layer addresses; an Ethernet MAC address is just one.
Now, this raises one more problem... As I said in my first question, I need to convert a link-layer address given on the command line to its standard form. The solution provided here will only work for Ethernet MAC addresses.
Isn't there any standard function for the purpose? What I would like to do is create an application where the user will enter values for the different options present in an ICMP router advertisement message, as stated in RFC 4861.
Option Formats
Neighbor Discovery messages include zero or more options, some of
which may appear multiple times in the same message. Options should
be padded when necessary to ensure that they end on their natural
64-bit boundaries. All options are of the form:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ ... ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields:
Type 8-bit identifier of the type of option. The
options defined in this document are:
Option Name Type
Source Link-Layer Address 1
Target Link-Layer Address 2
Prefix Information 3
Redirected Header 4
MTU 5
Length 8-bit unsigned integer. The length of the option
(including the type and length fields) in units of
8 octets. The value 0 is invalid. Nodes MUST
silently discard an ND packet that contains an
option with length zero.
4.6.1. Source/Target Link-layer Address
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Link-Layer Address ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields:
Type
1 for Source Link-layer Address
2 for Target Link-layer Address
Length The length of the option (including the type and
length fields) in units of 8 octets. For example,
the length for IEEE 802 addresses is 1
[IPv6-ETHER].
Link-Layer Address
The variable length link-layer address.
The content and format of this field (including
byte and bit ordering) is expected to be specified
in specific documents that describe how IPv6
operates over different link layers. For instance,
[IPv6-ETHER].
One more thing I am not quite handy with C++, can you please provide a C alternative ?
Thanks.
For your first question: it's not that hard to write, and since MAC addresses are represented by a 6-byte array you don't need to take machine dependencies into account (like endianness and such).
void str2MAC(std::string str, char* mac) {
    // Parse the first five "xx:" groups (assumes lowercase hex digits).
    for (int i = 0; i < 5; i++) {
        std::string b = str.substr(0, str.find(':'));
        str = str.substr(str.find(':') + 1);
        mac[i] = 0;
        for (std::size_t j = 0; j < b.size(); j++) {
            mac[i] *= 0x10;
            mac[i] += (b[j] > '9' ? b[j] - 'a' + 10 : b[j] - '0');
        }
    }
    // The last group has no trailing ':'.
    mac[5] = 0;
    for (std::size_t i = 0; i < str.size(); i++) {
        mac[5] *= 0x10;
        mac[5] += (str[i] > '9' ? str[i] - 'a' + 10 : str[i] - '0');
    }
}
About your second question: IP (and IPv6 specifically) is a network-layer protocol and sits above the link layer, so it doesn't have anything to do with the link-layer address format.
If by link layer you mean Ethernet, then yes, an Ethernet address is always 48 bits, but there are other link-layer protocols present which may use other formats.
I have been given the following bit of code as an example:
Make port 0 bits 0-2 outputs, the others inputs.
FIO0DIR = 0x00000007;
Set P0.0, P0.1, P0.2 all low (0)
FIO0CLR = 0x00000007;
I have been told that the port has 31 LEDs attached to it. I can't understand why, to enable the first 3 outputs, it is 0x00000007 and not 0x00000003.
These GPIO config registers are bitmaps.
Use your Windows calculator to convert the hex to binary:
0x00000007 = 111, or with 32 bits - 00000000000000000000000000000111 // three outputs
0x00000003 = 11, or with 32 bits - 00000000000000000000000000000011 // only two outputs
Because the value you write to the register is a binary bit-mask, with a 1 bit meaning "this is an output". You don't write "the number of outputs I'd like to have"; you are setting 32 individual flags at the same time.
The number 7 in binary is 00000111, so it has the lower-most three bits set to 1, which here seems to mean "this is an output". The decimal value 3, on the other hand, is just 00000011 in binary, thus only having two bits set to 1, which clearly is one too few.
Bits are indexed from the right, starting at 0. The decimal value of bit number n is 2^n. The decimal value of a binary number with more than one bit set is simply the sum of the values of all the set bits.
So, for instance, the decimal value of the number with bits 0, 1 and 2 set is 2^0 + 2^1 + 2^2 = 1 + 2 + 4 = 7.
Here is an awesome ASCII table showing the 8 bits of a byte and their individual values:
+---+---+---+---+---+---+---+---+
index | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---+---+---+---+---+---+---+---+
value |128| 64| 32| 16| 8 | 4 | 2 | 1 |
+---+---+---+---+---+---+---+---+