inet_pton() counterpart for link layer address - c

I have two questions related to my implementation:
1. I need a function that converts a link-layer address from text to its standard binary form, the way inet_pton() does at the network layer for IPv4/IPv6 addresses.
2. Is there any difference between a link-layer address and a 48-bit MAC address (in the case of IPv6 specifically)? If not, then a link-layer address should always be 48 bits long, if I am not wrong.
Thanks in advance. Please excuse me if I am missing something trivial.
EDIT :
OK, I am clear on the difference between a link-layer address and an Ethernet MAC address. There are several types of data-link-layer addresses; an Ethernet MAC address is just one of them.
This raises one more problem. As I said in my first question, I need to convert a link-layer address given on the command line to its standard form. The solution provided here works for Ethernet MAC addresses only.
Isn't there a standard function for this purpose? What I would like to do is create an application where the user enters values for the different options present in an ICMPv6 Router Advertisement message, as specified in RFC 4861:
Option Formats
Neighbor Discovery messages include zero or more options, some of
which may appear multiple times in the same message. Options should
be padded when necessary to ensure that they end on their natural
64-bit boundaries. All options are of the form:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |              ...              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                              ...                              ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields:
   Type           8-bit identifier of the type of option. The
                  options defined in this document are:

                     Option Name                  Type
                     Source Link-Layer Address       1
                     Target Link-Layer Address       2
                     Prefix Information              3
                     Redirected Header               4
                     MTU                             5

   Length         8-bit unsigned integer. The length of the option
                  (including the type and length fields) in units of
                  8 octets. The value 0 is invalid. Nodes MUST
                  silently discard an ND packet that contains an
                  option with length zero.
4.6.1. Source/Target Link-layer Address
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |    Link-Layer Address ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields:
   Type
                  1 for Source Link-layer Address
                  2 for Target Link-layer Address
   Length         The length of the option (including the type and
                  length fields) in units of 8 octets. For example,
                  the length for IEEE 802 addresses is 1
                  [IPv6-ETHER].
   Link-Layer Address
                  The variable length link-layer address.
                  The content and format of this field (including
                  byte and bit ordering) is expected to be specified
                  in specific documents that describe how IPv6
                  operates over different link layers. For instance,
                  [IPv6-ETHER].
One more thing: I am not very comfortable with C++. Can you please provide a C alternative?
Thanks.

For your first question: it's not hard to write yourself, and since a MAC address is represented as a 6-byte array you don't need to take machine dependencies (such as endianness) into account.
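#include <string>
using std::string;

// Parse "aa:bb:cc:dd:ee:ff" into the 6-byte array mac.
// Assumes well-formed input with lowercase hex digits; nothing is validated.
void str2MAC(string str, char* mac) {
    // First five octets: each one is followed by a ':'.
    for (int i = 0; i < 5; i++) {
        string b = str.substr(0, str.find(':'));  // text of the current octet
        str = str.substr(str.find(':') + 1);      // drop it and its ':'
        mac[i] = 0;
        for (size_t j = 0; j < b.size(); j++) {
            mac[i] *= 0x10;
            mac[i] += (b[j] > '9' ? b[j] - 'a' + 10 : b[j] - '0');
        }
    }
    // Last octet: whatever remains after the fifth ':'.
    mac[5] = 0;
    for (size_t i = 0; i < str.size(); i++) {
        mac[5] *= 0x10;
        mac[5] += (str[i] > '9' ? str[i] - 'a' + 10 : str[i] - '0');
    }
}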
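Since the question asks for a C alternative, here is a minimal sketch of the same conversion in plain C. The name str2mac and its return convention are made up for illustration; it handles only the colon-separated 48-bit Ethernet form and is not a standard library function.

```c
#include <stdio.h>

/* Hypothetical helper: parse "aa:bb:cc:dd:ee:ff" into mac[0..5].
 * Returns 0 on success, -1 if the text is not six colon-separated
 * hex fields of at most two digits each. */
int str2mac(const char *str, unsigned char mac[6])
{
    unsigned int b[6];
    char extra;   /* only filled in if something follows the sixth field */

    /* %2x reads up to two hex digits in either case. sscanf returns the
     * number of converted items, so exactly 6 means a clean match. */
    if (sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;

    for (int i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}
```

Note that sscanf is fairly lenient (it skips leading whitespace, for example), so stricter validation would need a hand-written parser.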
About your second question: IP (and IPv6 specifically) is a network-layer protocol that sits above the link layer, so it is not tied to any particular link-layer address format.
If by link layer you mean Ethernet, then yes, an Ethernet address is always 48 bits, but there are other link-layer protocols that use other formats.
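Because the option quoted in the question carries a variable-length address and counts its Length in units of 8 octets, code that builds it does not have to assume 48 bits. The following is a rough sketch only: build_lladdr_option and its parameters are hypothetical names, and padding with zero bytes is an assumption (the exact layout for each link type is defined by the corresponding IPv6-over-link document).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write Type, Length and the address into out,
 * padded to a multiple of 8 octets. Returns the total option size in
 * bytes, or 0 if out is too small or the length does not fit in 8 bits. */
size_t build_lladdr_option(unsigned char type,
                           const unsigned char *addr, size_t addr_len,
                           unsigned char *out, size_t out_size)
{
    size_t units = (2 + addr_len + 7) / 8;   /* round up to 8-octet units */
    size_t total = units * 8;

    if (units > 255 || total > out_size)
        return 0;

    memset(out, 0, total);           /* zero padding (assumption) */
    out[0] = type;                   /* 1 = source, 2 = target */
    out[1] = (unsigned char)units;   /* length in units of 8 octets */
    memcpy(out + 2, addr, addr_len);
    return total;
}
```

For a 6-byte Ethernet address this yields an 8-byte option with Length 1, matching the IEEE 802 example in the quoted RFC text; an 8-byte address would give Length 2.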


What is the most efficient way to encode a character ? (with regards to memory)

I have 4 characters that I want to encode:
Is there a way to give them an "encoded version", instead of ASCII? Binary would be the best but I have only 0 and 1 for the binary, and if I would then use the sequence it wouldn't be clear which character is 0 and which 1 and which 11 for example. Is there any other way to encode efficiently, with minimum number of bits?
Thanks
There are 4 different values. 2 bits can encode 4 values.
00
01
10
11
That means each byte can hold 4 such values.
+---+---+---+---+---+---+---+---+
| 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 |
+---+---+---+---+---+---+---+---+
For example, we could choose the following encoding scheme:
T = 00
G = 01
A = 10
C = 11
110 (0b01101110) would therefore mean ACAG (assuming the first value is found in the least significant bits).
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
+---+---+---+---+---+---+---+---+
---G--- ---A--- ---C--- ---A---
That means the string would use as little as 25% of the space it would take in ASCII.
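The packing described above can be sketched in C as follows. The function names are hypothetical; the mapping (T=00, G=01, A=10, C=11) and the "first value in the least significant bits" convention are the ones chosen in this answer.

```c
/* 2-bit code for one letter, or -1 if it is not one of T, G, A, C. */
static int base_code(char c)
{
    switch (c) {
    case 'T': return 0;
    case 'G': return 1;
    case 'A': return 2;
    case 'C': return 3;
    default:  return -1;
    }
}

/* Pack exactly four letters into one byte, first letter in the lowest
 * two bits. Returns the byte value (0..255), or -1 on an invalid letter. */
int pack4(const char s[4])
{
    int byte = 0;
    for (int i = 0; i < 4; i++) {
        int code = base_code(s[i]);
        if (code < 0)
            return -1;
        byte |= code << (2 * i);
    }
    return byte;
}

/* Extract the letter at position i (0..3) from a packed byte. */
char unpack_at(int byte, int i)
{
    static const char letters[4] = { 'T', 'G', 'A', 'C' };
    return letters[(byte >> (2 * i)) & 3];
}
```

With this mapping, pack4("ACAG") gives 110, the value used in the example above.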
Except this doesn't quite work. There's no way to know the length of the sequence. For example, how would you encode ACA using the above scheme?
There are options:
1. Somehow prefix the sequence by its length. This could end up doubling the length of the encoded string if it's really short.
2. Introduce a 5th, sentinel value to indicate the end of the string. This complicates the encoding (since we no longer have a power of 2). It also reduces the compression factor (8 values per 3 bytes, so as little as 37.5% of the space used when using ASCII).
3. Use the first 2 bits of each byte to indicate how many values are actually present in the byte. This reduces the compression factor (3 values per byte, so as little as 33% of the space used when using ASCII).
4. Use real compression techniques (e.g. frequency analysis that assigns shorter codes to more common subsequences), probably via zlib or a more modern equivalent. This method is very effective (perhaps even using 1/10th of what ASCII would), but only if you have very long sequences. It also prevents random access: you can't get the Nth value without first reading all the previous ones. In short, you'd have to decode the string to ASCII to search it.
You indicated in a comment that you wanted to search the sequences for subsequences, but none of these approaches makes that easier (and the fourth one prevents it, as mentioned above). They make it very complicated, in fact. Converting the sequence to ASCII to search it is highly recommended.
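As an illustration of the length-prefix option, here is a small sketch that stores the count in one leading byte followed by the packed 2-bit values. The names encode_seq and seq_at, the single-byte length (which limits a sequence to 255 letters) and the T/G/A/C mapping are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode s (letters T, G, A, C only, at most 255 of them) as one length
 * byte followed by the letters packed four per byte, first letter in the
 * lowest two bits. Returns the number of bytes written, or 0 on error. */
size_t encode_seq(const char *s, uint8_t *out, size_t out_size)
{
    static const char letters[] = "TGAC";   /* index = 2-bit code */
    size_t n = strlen(s);
    size_t total = 1 + (n + 3) / 4;

    if (n > 255 || total > out_size)
        return 0;
    memset(out, 0, total);
    out[0] = (uint8_t)n;
    for (size_t i = 0; i < n; i++) {
        const char *p = strchr(letters, s[i]);
        if (p == NULL)
            return 0;                        /* not one of T, G, A, C */
        out[1 + i / 4] |= (uint8_t)((p - letters) << (2 * (i % 4)));
    }
    return total;
}

/* Read letter i (0-based) straight from the encoded form. */
char seq_at(const uint8_t *enc, size_t i)
{
    return "TGAC"[(enc[1 + i / 4] >> (2 * (i % 4))) & 3];
}
```

Here "ACA" takes two bytes (length 3, then one packed byte), so the unused top two bits of the last byte are no longer ambiguous.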

Confusion about a memory alignment example

While reading some posts to learn about memory alignment, I came across a good answer by @dan04 to "What is aligned memory allocation?" and have a question about it.
Reading the example he gives,
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|  bytes
|       |       |  words
The problem is that on some CPU architectures, the instruction to load a 4-byte integer from memory only works on word boundaries. So your program would have to fetch each half of b with separate instructions.
Why can't it (or can it?) directly read the 4 bytes (one word, assuming 32 bits) that contain b?
For example, if I want b:
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|  bytes
    |       |      a word (assume it's 32 bits, get b directly)
read 1 word starting at address 2.
If I want a:
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|  bytes
|       |          a word
read 1 word starting at address 0, keep the first 2 bytes and discard the last 2 bytes.
If I want c and d:
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|  bytes
        |       |  a word
read 1 word starting at address 4, keep the last 2 bytes and discard the first 2 bytes.
Then it seems alignment is not needed, which is definitely incorrect.
I must have misunderstood something or be missing some other knowledge; please help correct me.
"Why can't (Can it?) read the 4 bytes(a word, assume 32bits) directly that contains b?"
The answer is already in the text you quoted above. The key is "on word boundaries", which is not the same as "in word size": those CPUs can read a full word only from an address of exactly N*wordwidth, not from N*wordwidth+2.
A word boundary (only applicable on the mentioned platforms) is a clean multiple of the word width: 0, 4, 8, 12... but not 2, 6, 10...
Picking up your phrasing from comment, yes.
Those CPUs can only read from address 0, 4, 8, 12, 16 and so on.
E.g. one word from addresses 0-3, one word from address 4-7.
(Note the added 12.)
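In C, the usual way to stay independent of whether the CPU allows such loads is to test the address or to copy the bytes out with memcpy. The sketch below assumes a 4-byte word, and the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* True if addr is a multiple of the 4-byte word width (0, 4, 8, 12, ...). */
int is_word_aligned(uintptr_t addr)
{
    return addr % 4 == 0;
}

/* Read a 32-bit value from any address, aligned or not. As far as the
 * language is concerned memcpy copies bytes, so the compiler is free to
 * emit whatever loads the target CPU actually supports. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Dereferencing a uint32_t pointer that points at offset 2 of the example buffer would instead rely on the hardware tolerating the misaligned access, which is exactly what the quoted answer says some architectures do not.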

Time-based UUID does not follow creation order when implementing RFC 4122

I am creating a custom algorithm to embed information into a time-based UUID and have been studying RFC 4122. In the spec, the version 1 UUID has the following structure:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          time_low                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       time_mid                |         time_hi_and_version   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         node (2-5)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
I've found that the lower part of the timestamp (its rightmost 32 bits) goes at the front of the ID, making it the most significant part when sorting UUIDs.
What I do not understand is how, with this layout, sorting UUIDs is supposed to follow creation order.
To illustrate the question, here are two timestamps where t1 > t2, yet the UUIDs created from them sort in the reverse order.
t1 = 137601405637595834 // 0x1e8dbbfd79f92ba
t2 = 3617559227 // 0xd79f92bb
are transformed to the following parts
t1_low: Uint = 3617559226 // 0xd79f92ba
t1_mid: Ushort = 56255 // 0xdbbf
t1_hi: Ushort = 488 // 0x1e8
t2_low: Uint = 3617559227 // 0xd79f92bb
t2_mid: Ushort = 0 // 0x0
t2_hi: Ushort = 0 // 0x0
Since the least significant bytes are not relevant for the order in this case, I will ignore that for the sake of simplification.
The UUIDs generated using these timestamps are
UUID1 = d79f92ba-dbbf-11e8-8808-000000000002
UUID2 = d79f92bb-0000-1000-a68b-000000000004
Clearly UUID1 < UUID2 even though the timestamps are in the reverse order.
What is wrong with my analysis?
The UUIDv1 spec deliberately puts the most entropy in the high-order bits so that keys do not sort as you expected; instead, they will be seemingly randomly yet roughly evenly distributed across the full number range regardless of creation order--just like UUIDv3/v4/v5.
If you want a sortable timestamp, add another column; using UUID as anything but an opaque identifier will end up biting you later.
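The field split from the question can be reproduced with a few shifts and masks. This is only a sketch of the timestamp part of a version 1 UUID; the struct and function names are hypothetical, and the clock sequence and node fields are left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The three timestamp fields of a version 1 UUID. */
struct v1_time_fields {
    uint32_t time_low;             /* timestamp bits 0-31 */
    uint16_t time_mid;             /* timestamp bits 32-47 */
    uint16_t time_hi_and_version;  /* version 1 in the top 4 bits,
                                      timestamp bits 48-59 below */
};

struct v1_time_fields split_timestamp(uint64_t ts)
{
    struct v1_time_fields f;
    f.time_low = (uint32_t)(ts & 0xFFFFFFFFu);
    f.time_mid = (uint16_t)((ts >> 32) & 0xFFFFu);
    f.time_hi_and_version = (uint16_t)(((ts >> 48) & 0x0FFFu) | 0x1000u);
    return f;
}

/* Write the first three groups of the textual UUID into buf
 * (18 characters plus the terminator, so size must be at least 19). */
void format_prefix(struct v1_time_fields f, char *buf, size_t size)
{
    snprintf(buf, size, "%08" PRIx32 "-%04x-%04x",
             f.time_low,
             (unsigned)f.time_mid,
             (unsigned)f.time_hi_and_version);
}
```

Feeding in t1 and t2 from the question gives the prefixes d79f92ba-dbbf-11e8 and d79f92bb-0000-1000, so a plain string or byte comparison orders them by time_low first, not by the full timestamp.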

Why do I need to use type** to point to type*?

I've been reading Learn C The Hard Way for a few days, but here's something I want to really understand. Zed, the author, wrote that char ** is for a "pointer to (a pointer to char)", and saying that this is needed because I'm trying to point to something 2-dimensional.
Here is what's exactly written in the webpage
A char * is already a "pointer to char", so that's just a string. You however need 2 levels, since names is 2-dimensional, that means you need char ** for a "pointer to (a pointer to char)" type.
Does this mean that I have to use a variable that can point to something 2-dimensional, which is why I need two **?
Just a little follow-up, does this also apply for n dimension?
Here's the relevant code
char *names[] = { "Alan", "Frank", "Mary", "John", "Lisa" };
char **cur_name = names;
No. That tutorial is of questionable quality, and I wouldn't recommend continuing with it.
A char** is a pointer-to-pointer. It is not a 2D array.
It is not a pointer to an array.
It is not a pointer to a 2D array.
The author of the tutorial is likely confused because there is a wide-spread bad and incorrect practice saying that you should allocate dynamic 2D arrays like this:
// BAD! Do not do like this!
int** heap_fiasco;
heap_fiasco = malloc(X * sizeof(int*));
for(int x=0; x<X; x++)
{
    heap_fiasco[x] = malloc(Y * sizeof(int));
}
This is however not a 2D array, it is a slow, fragmented lookup table allocated all over the heap. The syntax of accessing one item in the lookup table, heap_fiasco[x][y], looks just like array indexing syntax, so therefore a lot of people for some reason believe this is how you allocate 2D arrays.
The correct way to allocate a 2D array dynamically is:
// correct
int (*array2d)[Y] = malloc(sizeof(int[X][Y]));
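You can tell that the first is not an array because if you do memcpy(heap_fiasco, heap_fiasco2, sizeof(int[X][Y])) the code will crash and burn: it reads and writes far past the end of the pointer table, because the items are not allocated in adjacent memory.
Similarly, memcpy(heap_fiasco, heap_fiasco2, sizeof(*heap_fiasco)) will not do what you want either, but for another reason: you get the size of a pointer, not of an array, so only a single pointer is copied.
In contrast, memcpy(array2d, array2d_2, sizeof(int[X][Y])) will work, because it is a real 2D array in one contiguous block (and sizeof(*array2d) correctly gives the size of one row of Y ints).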
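To make the contiguous case concrete, here is a small sketch that allocates such an array and duplicates it with a single memcpy. The dimensions, the row_t typedef and the function names are assumptions for the example.

```c
#include <stdlib.h>
#include <string.h>

enum { X = 3, Y = 4 };        /* example dimensions (assumption) */
typedef int row_t[Y];         /* one row of Y ints */

/* Allocate an X-by-Y array as a single block and fill cell [x][y]
 * with x*Y + y. Returns NULL on allocation failure; caller frees. */
row_t *make_array2d(void)
{
    row_t *a = malloc(sizeof(int[X][Y]));
    if (a == NULL)
        return NULL;
    for (int x = 0; x < X; x++)
        for (int y = 0; y < Y; y++)
            a[x][y] = x * Y + y;
    return a;
}

/* Duplicate the whole array with one memcpy; this is valid because
 * all X*Y ints sit next to each other in memory. */
row_t *copy_array2d(row_t *src)
{
    row_t *dst = malloc(sizeof(int[X][Y]));
    if (dst != NULL)
        memcpy(dst, src, sizeof(int[X][Y]));
    return dst;
}
```

A row_t * here is the same type as int (*)[Y]; the typedef only keeps the function signatures readable. One call to free() releases the whole array.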
Pointers took me a while to understand. I strongly recommend drawing diagrams.
Please have a read and understand this part of the C++ tutorial (at least with respect to pointers the diagrams really helped me).
Telling you that you need a pointer to a pointer to char for a two dimensional array is a lie. You don't need it but it is one way of doing it.
Memory is sequential. If you want to put 5 chars (letters) in a row like in the word hello you could define 5 variables and always remember in which order to use them, but what happens when you want to save a word with 6 letters? Do you define more variables? Wouldn't it be easier if you just stored them in memory in a sequence?
So what you do is you ask the operating system for 5 chars (and each char just happens to be one byte) and the system returns to you a memory address where your sequence of 5 chars begins. You take this address and store it in a variable which we call a pointer, because it points to your memory.
The problem with pointers is that they are just addresses. How do you know what is stored at that address? Is it 5 chars or is it a big binary number that is 8 bytes? Or is it a part of a file that you loaded? How do you know?
This is where the programming language like C tries to help by giving you types. A type tells you what the variable is storing and pointers too have types but their types tell you what the pointer is pointing to. Hence, char * is a pointer to a memory location that holds either a single char or a sequence of chars. Sadly, the part about how many chars are there you will need to remember yourself. Usually you store that information in a variable that you keep around to remind you how many chars are there.
So when you want to have a 2 dimensional data structure how do you represent that?
This is best explained with an example. Let's make a matrix:
1 2 3 4
5 6 7 8
9 10 11 12
It has 4 columns and 3 rows. How do we store that?
Well, we can make 3 sequences of 4 numbers each. The first sequence is 1 2 3 4, the second is 5 6 7 8 and the third and last sequence is 9 10 11 12. So if we want to store 4 numbers we will ask the system to reserve 4 numbers for us and give us a pointer to them. These will be pointers to numbers. However, since we need 3 of those sequences, we will also ask the system for room to hold 3 such pointers, and what we get back is a pointer to pointers to numbers.
And that's how you end up with the proposed solution...
The other way to do it would be to realize that you need 4 times 3 numbers and just ask the system for 12 numbers to be stored in a sequence. But then how do you access the number in row 2 and column 3? This is where maths comes in but let's try it on our example:
1 2 3 4
5 6 7 8
9 10 11 12
If we store them next to each other they would look like this:
offset from start:  0  1  2  3    4  5  6  7    8  9   10  11
numbers in memory: [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]
So our mapping is like this:
row | column | offset | value
1 | 1 | 0 | 1
1 | 2 | 1 | 2
1 | 3 | 2 | 3
1 | 4 | 3 | 4
2 | 1 | 4 | 5
2 | 2 | 5 | 6
2 | 3 | 6 | 7
2 | 4 | 7 | 8
3 | 1 | 8 | 9
3 | 2 | 9 | 10
3 | 3 | 10 | 11
3 | 4 | 11 | 12
And we now need to work out a nice and easy formula for converting a row and column to an offset... I'll come back to it when I have more time... Right now I need to get home (sorry)...
Edit: I'm a little late but let me continue. To find the offset of each of the numbers from a row and column you can use the following formula:
offset = (row - 1) * 4 + (column - 1)
If you notice the two -1's here and think about it you will come to understand that it is because our row and column numberings start with 1 that we have to do this and this is why computer scientists prefer zero based offsets (because of this formula). However with pointers in C the language itself applies this formula for you when you use a multi-dimensional array. And hence this is the other way of doing it.
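The formula can be written down directly in C. This sketch stores the example matrix as one flat sequence of 12 ints; the function names are hypothetical.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };   /* shape of the example matrix */

/* 1-based row and column, as in the table above:
 * offset = (row - 1) * 4 + (column - 1) */
size_t offset_1based(size_t row, size_t column)
{
    return (row - 1) * COLS + (column - 1);
}

/* The same mapping with zero-based indices: no -1 corrections needed. */
size_t offset_0based(size_t row, size_t column)
{
    return row * COLS + column;
}

/* Look up a value in a flat array using 1-based coordinates. */
int value_at(const int *flat, size_t row, size_t column)
{
    return flat[offset_1based(row, column)];
}
```

For the matrix above, row 2 and column 3 map to offset 6, which holds the value 7.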
From your question, what I understand is that you are asking why you need char ** for a variable declared as char *names[]. The answer is that names[] is array syntax, and in most expressions an array is converted ("decays") to a pointer to its first element.
char *names[] declares an array whose elements are pointers to char. When that array decays, you get a pointer to its first element, that is, a pointer to a pointer to char, and that is why the compiler does not complain if you write
char ** cur_name = names ;
In the line above you are declaring a pointer to a character pointer and initializing it with the address of the first element of the array (an array is not itself a pointer, but it converts to one here).
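A short sketch of what that char ** can then be used for: stepping through the array one string at a time. The function name total_length is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Sum the lengths of count strings by walking a char ** over the
 * array of char * elements. */
size_t total_length(char **first, size_t count)
{
    size_t total = 0;
    for (char **cur = first; cur < first + count; cur++)
        total += strlen(*cur);   /* *cur is a char *, i.e. one string */
    return total;
}
```

Called as total_length(names, 5) with the array from the question, each increment of cur moves to the next char * in names, not to the next character.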

Matrices - memory

Let's say that I have a matrix A=[];
I want to know if there is any way to represent it in a way where only the filled blocks must occupy memory and remaining must not, e.g.:
A = 1 0 0
    0 1 0
    0 0 1
Now, every block would take 1 bit of memory to store the matrix,
hence I would like to know is it possible to store matrix as:
A = 1
      1
        1
and the empty spaces must not occupy any memory at all. Is there any file format to represent a matrix in such a way?
No. You're dealing with bits. It would take MORE memory to store a list of the "filled" bits than it would to simply store the bits. E.g., a simple 1x8 matrix
     0 1 2 3 4 5 6 7   <--- bit-wise addresses
m = [0,1,0,0,0,1,1,1]
could be stored as a SINGLE byte of memory, at a storage ratio of 1 bit per bit.
To store just the locations of the SET bits would take 4 bytes (one byte per location). If all of the bits were set, you'd need 8 bytes to store those locations. So you've gone from a constant 1-byte requirement to a variable 0 to 8 bytes.
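The comparison can be checked with a few lines of C. In this sketch the function names are hypothetical, and mapping cell i to bit i (bit 0 being the least significant) is an assumption; any fixed mapping gives the same sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Dense form: pack 8 cells (each 0 or 1) into one byte, cell i -> bit i. */
uint8_t pack_bits(const int cells[8])
{
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++)
        if (cells[i])
            byte |= (uint8_t)(1u << i);
    return byte;
}

/* Sparse form: store the position of every set cell, one byte each.
 * Returns the number of bytes used. */
size_t list_positions(const int cells[8], uint8_t positions[8])
{
    size_t n = 0;
    for (int i = 0; i < 8; i++)
        if (cells[i])
            positions[n++] = (uint8_t)i;
    return n;
}
```

For m above, the dense form always takes 1 byte, while the position list takes 4 bytes (positions 1, 5, 6 and 7).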
You could develop a scheme that stores the positions of the set entries in a list, but for a matrix like this it would consume more memory than it saves. So, in short: no.
