What does this line of code do? - c

for (nbyte=0; nbyte<6; nbyte++) {
mac->byte[nbyte] = (char) (strtoul(string+nbyte*3, 0, 16) & 0xFF);
}
This is a small piece of code found in macchanger. string is a char pointer that points to a MAC address. What I don't understand is why it must be converted to an unsigned long int, why the index is multiplied by 3, and why the result is ANDed with 0xFF.

Most likely the string is a mac address in the form of
XX:YY:ZZ:AA:BB:CC
Doing nbyte*3 moves the "starting offset" pointer up 3 characters in the string each iteration, skipping over the :. Then strtoul reads the 2 hex characters (8 bits) up to the next : and converts them to an unsigned long, which is then ANDed with 0xFF to strip off all but the lowest byte, which gets cast to a char.

It's parsing from a hexadecimal string. The third parameter of strtoul is the base of the conversion (16 in this case). The input is presumably in this form:
12:34:56:78:9a:bc
The pointer is incremented by 3 each time to start at each pair of hexadecimal digits, which are three apart including the colon.
I don't think the & 0xFF is strictly necessary here. It was presumably there to attempt to correctly handle the case where an input contains a number larger than 0xFF, but the algorithm will still fail for this case for other reasons.

string+nbyte*3
string is a pointer to char (as all C strings are). When you add an integer, x, to a pointer you get the location pointer+x. By adding nbyte*3 you add 0 to the pointer, then 3, then 6, then 9, and so on.
strtoul converts strings to integers. Specifically here, by passing 16, it's specifying base 16 (hex) as the format in the string. By passing string+nbyte*3, your pointer points to the substring beginning at offset 0, 3, 6, 9, etc. of string.
After the conversion at each location, the & 0xFF unsets any bits past the 8 LSB, and the cast then converts that value to a char.
The result is then stored in a location in the byte array.
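The steps above can be sketched as a small standalone function. This is only an illustration: struct mac_addr and parse_mac are hypothetical names standing in for whatever macchanger actually defines, and the input is assumed to be a well-formed XX:XX:XX:XX:XX:XX string.

```c
#include <stdlib.h>

/* Hypothetical struct mirroring the mac->byte[] field used in the question. */
struct mac_addr {
    unsigned char byte[6];
};

/* Parse "XX:XX:XX:XX:XX:XX" into six bytes. Assumes well-formed input. */
static void parse_mac(const char *string, struct mac_addr *mac)
{
    int nbyte;

    for (nbyte = 0; nbyte < 6; nbyte++) {
        /* Offsets 0, 3, 6, 9, 12, 15: each points at a pair of hex digits.
           strtoul stops at the first non-hex character (':' or '\0'). */
        unsigned long value = strtoul(string + nbyte * 3, NULL, 16);

        /* Keep only the low 8 bits before storing into the byte array. */
        mac->byte[nbyte] = (unsigned char)(value & 0xFF);
    }
}
```

Calling parse_mac("12:34:56:78:9a:bc", &m) would leave m.byte[0] equal to 0x12 and m.byte[5] equal to 0xbc.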

Related

Convert a char* to uppercase in C without using a loop

Is it possible to convert a char* to uppercase without traversing character by character in a loop ?
Assumption:
1. Char pointer points to fixed size string array.
2. The array pointed to contains only lowercase characters
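In the ASCII encoding, a lowercase letter differs from its uppercase counterpart only in the bit of weight 32 (i.e. 20H, the code of the space character): it is set in lowercase and clear in uppercase. Converting lowercase to uppercase therefore amounts to clearing that bit.
With a bitwise operator,
Char&= ~0x20;
You can process several characters at a time by mapping longer data types on the array. For instance, to convert an array of 11 characters,
unsigned ToUpper= 0xDFDFDFDF;
*(unsigned*) &Char[0]&= ToUpper;
*(unsigned*) &Char[4]&= ToUpper;
*(unsigned short*)&Char[8]&= ToUpper;
Char[10]&= ToUpper;
This assumes a 4-byte int and a suitably aligned array. Strictly speaking, such pointer casts violate C's aliasing rules; copying through memcpy to and from an integer is the portable way to do the same thing.
You can go to 64 bit ints and even larger (up to 512 bits = 64 characters at a time) with the SIMD intrinsics (SSE, AVX).
If your code allows it, it is better to extend the buffer length to the next larger data type so that all bytes can be updated in a single operation. The terminating null is unaffected by the mask, since 0 & 0xDF is still 0.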
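As a rough sketch of the multi-byte idea that avoids alignment and aliasing problems, the bytes can be moved through fixed-width integers with memcpy. Note that turning lowercase ASCII into uppercase means clearing bit 0x20, hence the AND with 0xDF per byte. The function name upper11 is made up for this example, and it assumes exactly 11 lowercase ASCII letters, as in the question's assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: uppercase exactly 11 lowercase ASCII letters in place,
   without a loop. Clearing bit 0x20 in each byte maps 'a'..'z' to 'A'..'Z'. */
static void upper11(char *s)
{
    uint32_t w;
    uint16_t h;

    memcpy(&w, s, 4);      /* characters 0..3 */
    w &= 0xDFDFDFDFu;
    memcpy(s, &w, 4);

    memcpy(&w, s + 4, 4);  /* characters 4..7 */
    w &= 0xDFDFDFDFu;
    memcpy(s + 4, &w, 4);

    memcpy(&h, s + 8, 2);  /* characters 8..9 */
    h &= 0xDFDFu;
    memcpy(s + 8, &h, 2);

    s[10] = (char)(s[10] & 0xDF);  /* character 10 */
}
```

Because every byte gets the same mask, byte order does not matter here. Compilers typically turn these small fixed-size memcpy calls into plain loads and stores.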

Using int to print character constants [duplicate]

I wrote the following program,
#include<stdio.h>
int main(void)
{
int i='A';
printf("i=%c",i);
return 0;
}
and I got the result as,
i=A
So I tried another program,
#include<stdio.h>
int main(void)
{
int i='ABC';
printf("i=%c",i);
return 0;
}
As I understand it, an int is stored in 32 bits and each of 'A', 'B' and 'C' has an 8-bit ASCII code, 24 bits in total, so all three fit in one 32-bit unit. So I expected the output to be,
i=ABC
but the output instead was
i=C
and I can't understand why?
'ABC' in this case is an integer character constant as per section 6.4.4.4, paragraph 10, of the standard.
An integer character constant has type int. The value of an integer
character constant containing a single character that maps to a
single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g.,'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.
In this case, 'A'==0x41, 'B'==0x42, 'C'==0x43, and your compiler then interprets i to be 0x414243. As said in the other answer, this value is implementation dependent.
When you print it using '%c', the value is converted to unsigned char, so the upper bytes are cut off and you are left with only 0x43, which is 'C'.
To get more insight to it, read the answers to this question as well.
The conversion specifier c used in this call
printf("i=%c",i);
in fact extracts one character from the integer argument. So using this specifier you in any case can not get three characters as the output.
From the C Standard (7.21.6.1 The fprintf function)
c If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written
Take into account that the internal representation of a multi-byte character constant is implementation defined. From the C Standard (6.4.4.4 Character constants)
...The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character, is
implementation-defined.
'ABC' is an integer character constant. Depending on the code set (overwhelmingly ASCII), endianness, and int width (apparently 32 bits in OP's case), it may have a value such as one of those below. It is implementation-defined behavior.
'ABC'
0x41424300
0x434241
or others.
The "%c" directs printf() to take the int value, cast it to unsigned char and print the associated character. This is the main reason for apparent loss of information.
In OP's case, where the output was C, it appears that i took on the value of 0x414243.
int i='ABC';
printf("i=%c",i); --> 'C'
// same as
printf("i=%c",0x414243); --> 'C'
If you want i to contain 3 characters, you need to initialize an array that holds 3 characters:
char i[3];
i[0]= 'A';
i[1]= 'B';
i[2]='C';
The ' ' can contain only one char. Your code stores a single converted 8-bit character in your 32-bit integer. But I think you want to separate the 32 bits into 8-bit containers, so make a char array like char i[3]. You will then see that
int j=i;
will not compile cleanly, because a char array cannot be implicitly converted to an integer.
In C, 'A' is an int constant that's guaranteed to fit into a char.
'ABC' is a multicharacter constant. It has an int type, but an implementation-defined value. With %c, printf converts the int argument to unsigned char, so only the low-order byte of that value is printed.
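To make the conversion concrete, here is a small sketch. Both helper names are hypothetical, and the value 0x414243 is only what some compilers (GCC, for instance) produce for 'ABC'; the standard does not guarantee it, so the example passes the number explicitly instead of relying on the constant.

```c
/* Hypothetical helper: the byte that printf's %c would print for v,
   i.e. the int argument converted to unsigned char. */
static unsigned char percent_c_byte(int v)
{
    return (unsigned char)v;
}

/* Hypothetical helper: extract the three low-order bytes of v,
   most significant first, as a null-terminated string. */
static void low3_bytes(int v, char out[4])
{
    out[0] = (char)((v >> 16) & 0xFF);
    out[1] = (char)((v >> 8) & 0xFF);
    out[2] = (char)(v & 0xFF);
    out[3] = '\0';
}
```

With v equal to 0x414243, percent_c_byte gives 0x43 ('C'), while low3_bytes fills out with "ABC", which could then be printed with %s.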

uint64 to string in C

I have a uint64 value that I want to convert into a string because it has to be inserted as the payload of an HTTP POST request.
I've already tried many solutions (ltoa, this solution ) but my problem still remains.
My function is the following:
void check2(char* fingerprint, guint64 s_id) {
//stuff
char poststr[400] = "action=CheckFingerprint&sessionid=";
//convert s_id to, for example, char* myChar
strcat(poststr, myChar);
}
I want to convert s_id to char*. I've tried:
1) char ses[8]; ltoa(s_id,ses,10) but I have a segmentation fault;
2) char *buf; sprintf(buf, "%" PRIu64, s_id);
I'm working with an API, and I have seen that when this guint64 variable is printed, it has the following form:
JANUS_LOG(LOG_INFO, "Creating new session: %"SCNu64"\n", session_id);
sprintf is the right way to go with an unsigned 64 bit format specifier.
You'll need to allocate enough space for 16 hex digits and the null byte. A leading 0x brings that to 19 bytes, and I rounded it up to 20 for no good reason other than it feels better than 19.
char foo[20];
sprintf(foo, "0x%016" PRIx64, (uint64_t)numberToConvert);
This prints the number in hex with a leading 0x and leading zeros padded up to 16 digits. You do not need the cast if numberToConvert is already a uint64_t.
I have a uint64 value that I want to convert into char* because it has to be inserted as the payload of an HTTP POST request.
What you have is a fundamental misunderstanding.
To insert a text representation of your value into a document, you need to convert it to a sequence of characters, which is quite a different thing from a pointer to a character (char *). One of your options, which seems to be what you're really after, is to convert the value to a sequence of characters in the form of a C string -- that is, a null-terminated array of characters. You would then have or be able to obtain a pointer to the first character in the sequence.
That explains what's wrong with this attempted solution:
char *buf;
sprintf(buf, "%" PRIu64, s_id);
You are trying to write the string representation of your number into the array pointed-to by buf, but it doesn't point to one. Not having been initialized or assigned, its value is indeterminate.
Even if your buf pointed to an array, it is essential that the array be long enough to accommodate all the digits of the value's decimal representation, plus a terminator. That's probably what's wrong with your other attempt:
char ses[8]; ltoa(s_id,ses,10)
An unsigned, 64-bit binary number may require up to 20 decimal digits, plus you need space for a terminator. The array you're providing is not nearly large enough, unless you can be confident that the actual values you're going to write will not exceed 9,999,999 (which is well within the range of a 32-bit integer).
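Putting both points together (a real array to write into, and one that is large enough), a minimal sketch might look like the following. The function name build_poststr is made up for illustration, uint64_t stands in for guint64, and the 400-byte buffer size is taken from the question.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the POST payload for a session id.
   poststr must have room for at least 400 bytes, as in the question. */
static void build_poststr(char poststr[400], uint64_t s_id)
{
    /* Up to 20 decimal digits for a 64-bit value, plus the terminator. */
    char ses[21];

    /* snprintf never writes past the end of ses. */
    snprintf(ses, sizeof ses, "%" PRIu64, s_id);

    strcpy(poststr, "action=CheckFingerprint&sessionid=");
    strcat(poststr, ses);
}
```

The intermediate ses array is kept only to mirror the question's strcat approach; a single snprintf into poststr with the prefix in the format string would work as well.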

How does memset( ) work even for an array as well for a string

memset takes the address of a string or array and treats it as a buffer of characters.
How does it know whether the given value should be assigned in units of 1 byte (character) or in units of 4 bytes (integer)?
Except it doesn't. In the third parameter you must specify how many bytes to write. It uses the second parameter converted to unsigned char (one byte). So if you used memset(ptr, 257, 4) you would set 4 bytes to 0x01.
memset always sets that same byte value in every single byte. It has no way of differentiating between byte and integer arrays.
So if you memset a 4-byte integer with the value 0x02, the integer will be set to 0x02020202.
It doesn't need to. You have to provide the exact number of bytes to be set as the last argument to memset. If you provide a smaller byte count, it will not set all the bytes.
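A short sketch of the behaviour described in these answers. The helper name fill_u32 is hypothetical, and uint32_t is used so that the object is exactly 4 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill every byte of a 32-bit integer via memset
   and return the resulting value. */
static uint32_t fill_u32(int value)
{
    uint32_t x;

    /* memset converts value to unsigned char and writes that one byte
       into each of the sizeof x bytes. */
    memset(&x, value, sizeof x);
    return x;
}
```

fill_u32(0x02) yields 0x02020202, and fill_u32(257) yields 0x01010101 because 257 converted to unsigned char is 1. Since all four bytes are identical, the result does not depend on byte order.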

Get length of multibyte UTF-8 sequence

I am parsing some UTF-8 text but am only interested in characters in the ASCII range, i.e., I can just skip multibyte sequences.
I can easily detect the beginning of a sequence because the sign bit is set, so the char value is < 0. But how can I tell how many bytes are in the sequence so I can skip over it?
I do not need to perform any validation, i.e., I can assume the input is valid UTF-8.
Just strip out all bytes which are not valid ASCII; don't try to get cute and interpret bytes >127 at all. This works as long as you don't have any combining sequences with a base character in the ASCII range. For those you would need to interpret the codepoints themselves.
Although Deduplicator's answer is more appropriate to the specific purpose of skipping over multibyte sequences, if there is a need to get the length of each such character, pass the first byte to this function:
int getUTF8SequenceLength (unsigned char firstPoint) {
firstPoint >>= 4;
firstPoint &= 7;
if (firstPoint == 4) return 2;
return firstPoint - 3;
}
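This returns the total length of the sequence, including the first byte. It is meant only for the first byte of a multibyte sequence; it does not return 1 for an ASCII byte. I'm using an unsigned char value as the firstPoint parameter here for clarity, but note that a signed char argument works the same way, since it is converted to unsigned char on the call.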
To explain:
In the first byte of a multibyte sequence the top bit (bit 8) is always set, and UTF-8 uses bits 5, 6, and 7 to indicate the remaining length. If bit 7 is set and bit 6 is clear (110xxxxx), the sequence is 1 additional byte; in this case bit 5 already belongs to the payload and may be 0 or 1. If bits 7 and 6 are set and bit 5 is clear (1110xxxx), the sequence is 2 additional bytes. If all three are set (11110xxx), the sequence is 3 additional bytes. Hence, we want to examine these three bits (the value here is just an example):
11110111
 ^^^
The value is shifted down by 4 then AND'd with 7. This leaves only the 1st, 2nd, and 3rd bits from the right as the only possible ones set. The values of these bits are 1, 2, and 4 respectively.
00000111
     ^^^
If the value is now 4, we know only the first bit from the left (of the three we are considering) is set and can return 2.
After this, the value is 5, 6, or 7, and subtracting 3 gives the total length. A value of 5 is a two-byte first byte whose payload bit happens to be set, so the sequence is 2 bytes in total; 6 means the first two from the left are set, so the sequence is 3 bytes in total; 7 means all three bits are set, so the sequence is 4 bytes in total.
This covers the range of valid Unicode characters expressed in UTF-8.
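For the original goal of keeping only the ASCII characters, the length function can drive a simple skipping loop. The sketch below is an assumption-laden illustration: utf8_seq_len adds an ASCII check that the function above does not have, copy_ascii is a hypothetical name, and the input is assumed to be valid, null-terminated UTF-8 as stated in the question.

```c
#include <stddef.h>

/* Length of the UTF-8 sequence that starts with byte `first`.
   Unlike the function above, this variant also returns 1 for ASCII. */
static int utf8_seq_len(unsigned char first)
{
    if (first < 0x80)
        return 1;           /* plain ASCII byte */
    first >>= 4;
    first &= 7;
    if (first == 4)
        return 2;           /* 1100xxxx */
    return first - 3;       /* 5 -> 2, 6 -> 3, 7 -> 4 */
}

/* Hypothetical helper: copy only the ASCII characters of `in` to `out`,
   skipping whole multibyte sequences. `out` must be at least as large
   as `in`. Returns the number of characters copied. */
static size_t copy_ascii(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        int len = utf8_seq_len((unsigned char)*in);

        if (len == 1)
            out[n++] = *in;
        in += len;          /* safe only because the input is valid UTF-8 */
    }
    out[n] = '\0';
    return n;
}
```

If the input could be malformed, the jump by len might run past the terminator, so a validating version would have to check each continuation byte.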
