How to convert a hexadecimal char into a 4-bit binary representation? - c

I wish to compare a SHA-256 hash, which is stored in a u8[32] (after being calculated in kernel space), with a 64-character string that the user passes in.
For example: the user passes the SHA-256 hash "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839" as a char*, which takes 64 bytes. But this has to be compared with a hash inside kernel space that is represented as u8 hash[32].
The hash inside the kernel gets properly printed in ASCII by the following code:
int i;
u8 hash[32];

for (i = 0; i < 32; i++)
    printk(KERN_CONT "%hhx ", hash[i]);
Output :
"49 45 4b cd a1 0e 0d d4 54 3c fa 39 da 96 15 a1 99 50 57 01 29 42 0f 95 63 52 a5 87 80 55 08 39 "
As the complete hash is stored in 32 bytes but printed as 64 characters (2 characters per u8), I assume that each u8 currently stores the information of 2 hex characters, i.e. 00101111 prints as 2f.
Is there a way to store the 64-character string in 32 bytes so that it can be compared?

Here is how to use sscanf to do the conversion:

/* Requires <stdio.h> and <inttypes.h> (for uint8_t and SCNx8). */
const char *shaStr = "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839";
uint8_t sha[32];

for (int i = 0; i != 32; i++) {
    sscanf(shaStr + 2 * i, "%2" SCNx8, &sha[i]);  /* parse two hex characters into one byte */
    printf("%02x ", sha[i]);
}
The approach here is to call sscanf repeatedly with the "%2" SCNx8 format specifier, which means "two hex characters converted to uint8_t". The position is determined by the index of the loop iteration, i.e. shaStr+2*i

Characters are often stored in ASCII, so start by having a look at an ASCII chart. This will show you the relationship between a character like 'a' and the number 97.
You will note that all of the digits sit right next to each other. This is why you often see people write c - '0' or c - 48, since it converts an ASCII-encoded digit into the number it represents.
However, the letters and the digits are far apart from each other, which is slightly less convenient. If you look at the bit patterns, you may notice a regularity: bit 6 (& 64) is set for letters but clear for digits. With that observation, converting a hex ASCII character into its numeric value is straightforward:
int h2i(char c) { return 9 * !!(c & 64) + (c & 15); }
Once you have converted a single character, converting a string is also straightforward:
void hs(char *d, char *s)
{
    while (*s) {
        *d = (h2i(s[0]) * 16) + h2i(s[1]);
        s += 2;
        ++d;
    }
}
Adding support for embedded non-hex characters (like whitespace) is a useful exercise you can do to convince yourself you understand what is going on.
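Tying this back to the original question: below is a minimal user-space sketch (the helper names hex_digit_to_int and hex_to_bytes are mine, not from any library) that converts the 64-character string into 32 raw bytes and compares them against a u8[32]-style hash with memcmp. In kernel code you would drop the stdio parts and keep only the conversion loop and the memcmp.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Same trick as h2i above: bit 6 distinguishes letters from digits. */
static int hex_digit_to_int(char c) {
    return 9 * !!(c & 64) + (c & 15);
}

/* Convert 2*out_len hex characters into out_len raw bytes. */
static void hex_to_bytes(uint8_t *out, const char *hex, size_t out_len) {
    for (size_t i = 0; i < out_len; i++)
        out[i] = (uint8_t)((hex_digit_to_int(hex[2 * i]) << 4) |
                           hex_digit_to_int(hex[2 * i + 1]));
}

int main(void) {
    const char *user_hash =
        "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839";
    uint8_t kernel_hash[32] = {
        0x49, 0x45, 0x4b, 0xcd, 0xa1, 0x0e, 0x0d, 0xd4,
        0x54, 0x3c, 0xfa, 0x39, 0xda, 0x96, 0x15, 0xa1,
        0x99, 0x50, 0x57, 0x01, 0x29, 0x42, 0x0f, 0x95,
        0x63, 0x52, 0xa5, 0x87, 0x80, 0x55, 0x08, 0x39
    };
    uint8_t converted[32];

    hex_to_bytes(converted, user_hash, sizeof converted);
    printf("hashes %s\n",
           memcmp(converted, kernel_hash, sizeof converted) == 0 ? "match" : "differ");
    return 0;
}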

Printing out stack gives weird positioning of

I'm currently trying to understand format string vulnerabilities in C, but to get there, I have to understand some (at least for me) weird behaviour of the memory stack.
I have a program
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    char buffer[200];
    char key[] = "secret";

    printf("Location of key: %p\n", key);
    printf("Location of buffer: %p\n", &buffer);

    strcpy(buffer, argv[1]);
    printf(buffer);
    printf("\n");
    return 0;
}
which I call with
./form AAAA.BBBE.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
What I would expect is to get something like
... .41414141.42424245. ...
but I get
... .41414141.4242422e.30252e45. ... (there is some character in between B and E).
What is happening here?
I disabled ASLR and stack protection and compiled it with the -m32 flag.
I think your output is just fine. x86 is little-endian - least significant byte of a number has smaller address in memory, so 1000 (0x3E8) is stored as E8 03, not 03 E8 (that would be big-endian).
Let's assume that the compiler passes all arguments to printf through stack and variadic arguments are expected to be laid on the stack from its top to its end (on x86 that means "from lower addresses to higher addresses").
So, before calling printf our stack would look like this:
<return address><something>AAAA.BBBE.%08x.%<something>
^ - head of the stack
Or, if we spell each byte in hex:
<return address><something>414141412e424242452e253038782e25<something>
^ - head of the stack A A A A . B B B E . % 0 8 x . %
Then you ask printf to take a lot of unsigned ints from the stack (32-bit, presumably) and print them in hexadecimal, separated by dots. It skips the <return address> and some other details of the stack frame and starts from some point on the stack before buffer (because buffer is in the caller's stack frame). Suppose that at some point it takes the following chunk as a 4-byte int:
<return address><something>414141412e424242452e253038782e25<something>
^ - head of the stack A A A A . B B B E . % 0 8 x . %
^^^^^^^^
That is, our int is represented in memory by four bytes. Their values, starting from the byte with the smallest address, are: 41 41 41 2e. As x86 is little-endian, 2e is the most significant byte, which means this sequence is interpreted as 0x2e414141 and printed as such.
Now, if we look at your output:
41414141.4242422e.30252e45
We see that there are three ints: 0x41414141 (stored as 41 41 41 41 in memory), 0x4242422e (stored as 2e 42 42 42 in memory because the least significant byte has the smallest address) and 0x30252e45 (stored as 45 2e 25 30 in memory). That is, in that case printf read the following bytes:
number one |number two |number three|
41 41 41 41|2e 42 42 42|45 2e 25 30 |
A A A A |. B B B |E . % 0 |
Which looks perfectly correct to me - it's the beginning of buffer, as expected.
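If you want to convince yourself of the byte order, here is a small stand-alone sketch (not from the question) that stores the bytes 41 41 41 2e in memory and reads them back as a single 32-bit value:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    /* The four bytes "AAA." as they sit in memory, lowest address first. */
    unsigned char bytes[4] = { 0x41, 0x41, 0x41, 0x2e };
    uint32_t value;

    memcpy(&value, bytes, sizeof value);  /* reinterpret the bytes as one integer */
    printf("%08x\n", value);              /* prints 2e414141 on a little-endian machine */
    return 0;
}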
This is essentially what you're outputting with the %08x formats, and you're on a little-endian machine:
41 41 41 41 2e 42 42 42 45 2e 25 30 38 78 2e 25 30 38 78 2e 25 30 38 78 2e
The first four bytes are all 41s, so after the little-endian byte swap they are still all 41s.
The next four bytes are 2e424242, which become 4242422e.
Then, 452e2530 becomes 30252e45.
It's easier to figure this out if you look at buffer in a memory window in your debugger.
By the way, you can print the address of buffer like this (without the &):
printf("Location of buffer: %p\n", buffer);
You're passing AAAA.BBBE.%08x... to printf as the format string. printf therefore expects an additional unsigned integer argument for every %08x, but you don't provide any, so the behaviour is undefined.
You can read in the C Draft Standard (n1256):
If there are insufficient arguments for the format, the behavior is undefined.
In practice, you're getting hexadecimal output from wherever the arguments would have been, which in your case is the stack.
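As an aside (not part of the original question): the usual fix is to never pass user input as the format string. A minimal sketch of the safe pattern:

#include <stdio.h>

int main(int argc, char *argv[]) {
    char buffer[200];

    if (argc > 1) {
        /* Bounded copy, then print the input as data, not as a format string. */
        snprintf(buffer, sizeof buffer, "%s", argv[1]);
        printf("%s\n", buffer);
    }
    return 0;
}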

int to hex conversion not working properly for high values: 225943 is being converted into 0x000372ffffff97

My C program takes a large random int value, converts it into hex and writes it to a file. Everything goes well if the value is 225919 or less,
e.g. 225875 is 00 03 72 53,
but if the value is above 225919 it starts writing an extra ffffff for the last byte of the hex value. For example, 225943 comes out as 00 03 72 ffffff97, while the right value would have been 00 03 72 97.
Code that writes the value into file is as follows:
char *temp = NULL;
int cze = 225943;
/* ii is an int and in is the output FILE*, both declared elsewhere. */

temp = (char *)(&cze);
for (ii = 3; ii >= 0; ii--) {
    printf(" %02x ", temp[ii]);    /* just for printing the values */
    fprintf(in, "%02x", temp[ii]);
}
Output is: 00 03 72 ffffff97
Expected output: 00 03 72 97
Please help, any pointer is appreciated.
Your temp pointer points at char values, which in most cases means signed char. The bytes are therefore treated as signed chars, so any byte greater than 0x7f is a negative value. When such a value is passed to printf, it is implicitly converted to int, and sign extension fills the upper bytes with 1 bits when the number is negative.
Change the data type to unsigned char. The value promoted to int is then never negative and you'll get the correct output.
unsigned char *temp = NULL;
int cze = 225943;

temp = (unsigned char *)(&cze);
for (ii = 3; ii >= 0; ii--) {
    printf(" %02x ", temp[ii]);    /* just for printing the values */
    fprintf(in, "%02x", temp[ii]);
}
Alternatively, you can use the hh length modifier in printf, which tells it that the argument is a char or unsigned char. This will restrict it to printing 1 byte's worth of data.
printf(" %02hhx ",temp[ii] );

Compress a struct into a binary file? [C]

This is part of my homework that I'm having difficulty solving.
I have a simple structure:
typedef struct Client {
    char *lname;
    unsigned int id;
    unsigned int car_id;
} Client;
And the exercise is:
1. Create a text file named after the company name followed by the branch number, with a .txt extension. The file contains all the clients' details.
2. The file you created in exercise 1 will be compressed; as a result, a binary file with a .cmpr extension will be created.
I don't really have an idea how to implement step 2.
I remember from the lectures that the professor said we have to use "all" of each variable, with the bitwise operators (<<, >>, |, &, ~), but I don't know how to use them.
I'm using Ubuntu, with GCC and Eclipse. I'm using C.
I'd be glad to get some help. Thanks!
Let's say the file from step 1 looks like:
user1798362
2324
462345
where the three fields were simply printed on three lines. Note that the above is the text/readable (i.e. ASCII) representation of that file.
Looking at the contents of this file in hex(adecimal) representation we get (with the ASCII character printed below each byte value):
75 73 65 72 31 37 39 38 33 36 32 0a 32 33 32 34 0a 34 36 32 33 34 35 0a
u s e r 1 7 9 8 3 6 2 nl 2 3 2 4 nl 4 6 2 3 4 5 nl
here nl is of course the newline character. You can count that there are 24 bytes.
In step 2 you have to invent another format that saves as many bits as possible. The simplest way to do this is to compress each of the three fields individually.
Similar to where the text format uses a nl to mark the end of a field, you also need a way to define where a binary field begins and ends. A common way is to put a length in front of the binary field data. As a first step we could replace the nl's with a length and get:
58 75 73 65 72 31 37 39 38 33 36 32 20 32 33 32 34 30 34 36 32 33 34 35
-- u s e r 1 7 9 8 3 6 2 -- 2 3 2 4 -- 4 6 2 3 4 5
For now we simply use a whole byte for the length, in bits. Note that 58 hex is 88 decimal (11 characters * 8 bits), the bit length of lname; 20 hex equals 4 * 8 = 32, and 30 hex is 6 * 8 = 48. This does not compress anything, as it's still 24 bytes in total. But we already have a binary format, because 58, 20 and 30 now have a special meaning.
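If it helps, here is a minimal sketch of writing such length-prefixed fields before any bit packing is applied (the helper name write_field and the output file name are mine, not part of the exercise):

#include <stdio.h>
#include <string.h>

/* Write one field as: one length byte (in bits, 8 bits per character) + the raw bytes. */
static void write_field(FILE *out, const char *field) {
    unsigned char bit_length = (unsigned char)(strlen(field) * 8);
    fputc(bit_length, out);
    fwrite(field, 1, strlen(field), out);
}

int main(void) {
    FILE *out = fopen("client.cmpr", "wb");
    if (!out)
        return 1;
    write_field(out, "user1798362");  /* 0x58 = 88 bits, followed by 11 bytes */
    write_field(out, "2324");         /* 0x20 = 32 bits, followed by 4 bytes  */
    write_field(out, "462345");       /* 0x30 = 48 bits, followed by 6 bytes  */
    fclose(out);
    return 0;
}

This still writes the same 24 bytes as the text file; the actual saving only appears once the bits themselves are packed, as described next.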
The next step would be to compress each field. This is where it gets tricky. The lname field consists of ASCII characters. In ASCII only 7 of the 8 bits are actually needed (have a look at an ASCII table). For example the letter u in binary is 01110101. We can safely chop off the leftmost bit, which is always 0. This yields 1110101. The same can be done for all the characters, so you end up with 11 7-bit values -> 77 bits.
These 77 bits now must be fit in 8-bit bytes. Here are the first 4 bytes user in binary representation, before chopping the leftmost bit off:
01110101 01110011 01100101 01110010
Chopping off a bit in C is done by shifting the byte (i.e. unsigned char) to the left with:
unsigned char byte = lname[0];
byte = byte << 1;
When you do this for all characters you get:
1110101- 1110011- 1100101- 1110010-
Here I use - to indicate the bits in these bytes that are now available to be filled; they became available by shifting all bits one place to the left. You now use one or more bit from the right side of the next byte to fill up these - gaps. When doing this for these four bytes you'll get:
11101011 11001111 00101111 0010----
So now there's a gap of 4 bits that should be filled with the bits of the next character, '1', and so on.
Filling up these gaps is done by using the binary operators in C which you mention. We already use the shift left <<. To combine 1110101- and 1110011- for example we do:
unsigned char *name;       /* name MUST be unsigned to avoid problems with the binary operators */
/* ... allocate memory for name and read it from the text file ... */
unsigned char bytes[10];   /* 10 is just an arbitrary size that gives us enough space */

name[0] = name[0] << 1;    /* shift to the left in place, so name is overwritten */
name[1] = name[1] << 1;    /* idem */
bytes[0] = name[0] | (name[1] >> 7);
bytes[1] = name[1] << 1;
With name[1] >> 7 we have 1110011- >> 7, which gives 00000001: the original leftmost bit moved into the rightmost position. With the bitwise OR operator | we then 'add' this bit to 1110101-, resulting in 11101011.
You have to do things like this in a loop to get all the bits in the correct bytes.
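As an illustration only (the bit_writer type and the put_bits helper are my own names, not part of the exercise), here is a minimal sketch of such a loop, which appends the low nbits of each value to a byte array, most significant bit first:

#include <stdio.h>
#include <string.h>

/* A very small bit writer: appends the low nbits of value, MSB first. */
typedef struct {
    unsigned char buf[64];
    unsigned bit_pos;          /* number of bits written so far */
} bit_writer;

static void put_bits(bit_writer *w, unsigned value, unsigned nbits) {
    for (unsigned i = nbits; i-- > 0; ) {
        unsigned bit = (value >> i) & 1u;
        w->buf[w->bit_pos / 8] |= bit << (7 - w->bit_pos % 8);
        w->bit_pos++;
    }
}

int main(void) {
    bit_writer w;
    const char *name = "user";

    memset(&w, 0, sizeof w);
    for (const char *p = name; *p; p++)
        put_bits(&w, (unsigned char)*p, 7);    /* 7 significant bits per ASCII character */

    for (unsigned i = 0; i < (w.bit_pos + 7) / 8; i++)
        printf("%02x ", w.buf[i]);             /* prints: eb cf 2f 20 */
    printf("\n");
    return 0;
}

For "user" this prints eb cf 2f 20, matching the hand-packed bytes above (the last four bits of the final byte are still unfilled and would receive the bits of the next character).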
The new length of this name field is 11 * 7 = 77, so we've lost a massive 11 bits :-) Note that with a byte length, we assume that the lname field will never be more than 255 / 7 = 36 characters long.
As with the bytes above, you can then coalesce the second length against the final bits of the lname field.
To compress the numbers you first read them in with fscanf(file, "%u", ...) into an unsigned int. There will be many 0 bits on the left side of this 4-byte unsigned int. The first number, for example, is (shown in chunks of 4 bits only for readability):
0000 0000 0000 0000 0000 1001 0001 0100
which has 20 unused bits at the left.
You need to get rid of these. Take 32 minus the number of zeros on the left, and you get the bit length of this number. Add this length to the bytes array by coalescing its bits against those of the previous field. Then add only the significant bits of the number to the bytes. For 2324 those would be:
1001 0001 0100
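A minimal sketch (my own helper, assuming a 32-bit unsigned int) of computing that significant bit length:

#include <stdio.h>

/* Number of significant bits in v, i.e. 32 minus the leading zero bits;
 * 0 is treated as needing 1 bit. */
static unsigned bit_length(unsigned v) {
    unsigned n = 0;
    while (v) {
        v >>= 1;
        n++;
    }
    return n ? n : 1;
}

int main(void) {
    printf("%u\n", bit_length(2324));    /* prints 12 */
    printf("%u\n", bit_length(462345));  /* prints 19 */
    return 0;
}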
In C, when working with the bits of an 'int' (but also 'short', 'long', ... any variable/number larger than 1 byte), you must take byte-order or endianness into account.
When you have done the above step twice, for both numbers, you're done. You then have a bytes array you can write to a file. Of course you must have kept track of how many bytes you filled in the steps above, so you know how many bytes to write. Note that in most cases there will be a few bits in the last byte that are not filled with data. That doesn't hurt; it is simply unavoidable waste caused by the fact that files are stored in whole bytes (chunks of 8 bits).
When reading the binary file back, you go through the reverse process. You read the file into an unsigned char array. You then know that the first byte (i.e. bytes[0]) contains the bit length of the name field, and you reconstruct the lname field byte by byte by shifting and masking, and so on.
Good luck!

Converting large decimal input to hexadecimal output in a 256 bit array in C

I have written code to convert a large number, based on user input, to a hexadecimal number. However, when the result is printed, only part of the number is converted to hexadecimal and there are other random values in the array.
For example:
decimal = 1234567891012 ---- the hex output = 00 00 00 02 00 00 00 65 00 6b 42 48 71 fb 08 44
The last four values (71 FB 08 44) are the correct hexadecimal value, but the others are incorrect.
I am using uint8_t buf[];
Code:
int main()
{
    uint8_t buf[] = {0};
    long int i, a;

    printf("Enter Number: ");
    scanf("%d", &buf);
    printf("\n");

    printf("Input #: ");
    /* put a test vector */
    for (i = 15; i >= 0; i--)
    {
        printf("%02x ", buf[i]);
    }
    printf("\n\n");
    printf("\n\n");
    printf("%d\n", sizeof(buf));

    system("pause");
    return 0;
}
Disclaimer: since you've not provided the source code, I shall assume a few things:
This happens because you've used an unsigned int to store the decimal, which is only 32 bits on your computer. Use an unsigned long to store a decimal that big:
unsigned long decimal = 1234567891012UL;
And for a 16-byte decimal, use the GMP library.
--- edit ---
You must use scanf("%lu", &decimal) to read into the unsigned long decimal. scanf("%d", &decimal) only reads a signed int, which is probably 32 bits on your machine!
After you posted code the problems become more apparent:
First this:
uint8_t buf[] = {0};
Is no good. You need to assign a size to your array, (or make it dynamic), that's why you're getting "garbage" when you go to access the elements. For now, we can just give it an arbitrary size:
uint8_t buf[100] = {0};
That fixes the "garbage" values problem.
The second problem is that your scanf is expecting a normal int-sized value ("%d"). You need to tell it to look for a bigger value, something like:
scanf("%llu", &buf[0]);
Still, you should validate your input against the limits. Make sure what the user inputs is in the range of LONG_MAX or INT_MAX or whatever type you have.
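Putting both fixes together, here is a minimal sketch (my own version, assuming the input fits in 64 bits) that reads the number into a uint64_t and then spreads its bytes into the array, most significant byte first:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint8_t buf[16] = { 0 };  /* 16 bytes shown, as in the question */
    uint64_t value;

    printf("Enter Number: ");
    if (scanf("%" SCNu64, &value) != 1)
        return 1;

    /* Fill the last 8 bytes of buf, most significant byte first. */
    for (int i = 0; i < 8; i++)
        buf[15 - i] = (uint8_t)(value >> (8 * i));

    printf("Input #: ");
    for (int i = 0; i < 16; i++)
        printf("%02x ", buf[i]);  /* 1234567891012 -> ... 01 1f 71 fb 08 44 */
    printf("\n");
    return 0;
}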

memory layout - C union

I have a union containing an array of three integers (4 bytes each), a float (4 bytes), a double (8 bytes) and a character (1 byte).
If I assign 0x31313131 to each of the three integer elements and then print the union's character, I get the character 1. Why?
I don't understand the output. I know that the bits of the three 0x31313131 values are:
001100010011000100110001001100010011000100110001001100010011000100110001001100010011000100110001
Because '1' == 0x31. You are printing it as a character, not an integer.
Since it is a union, the ints and the char all share the same starting memory location (the float and double don't matter in this context). So assigning 0x31313131 to the first int does affect the char value; nothing much confusing there.
Every member of a union has the same starting address; different members may have different sizes. The size of the union as a whole is at least the maximum size of any member; there may be extra padding at the end for alignment requirements.
You store the value 0x31313131 in the first three int-sized memory areas of your union object. 0x31313131 is 4 bytes, each of which has the value 0x31.
You then read the first byte (from offset 0) by accessing the character member. That byte has the value 0x31, which happens to be the encoding for the character '1' in ASCII and similar character sets. (If you ran your program on an EBCDIC-based system, you'd see different results.)
Since you haven't shown us any actual source code, I'll write some based on your description:
#include <stdio.h>
#include <string.h>

void hex_dump(char *name, void *base, size_t size) {
    unsigned char *arr = base;

    printf("%-8s : ", name);
    for (size_t i = 0; i < size; i++) {
        printf("%02x", arr[i]);
        if (i < size - 1) {
            putchar(' ');
        }
        else {
            putchar('\n');
        }
    }
}

int main(void) {
    union u {
        int arr[3];
        float f;
        double d;
        char c;
    };
    union u obj;

    memset(&obj, 0xff, sizeof obj);
    obj.arr[0] = 0x31323334;
    obj.arr[1] = 0x35363738;
    obj.arr[2] = 0x393a3b3c;

    hex_dump("obj", &obj, sizeof obj);
    hex_dump("obj.arr", &obj.arr, sizeof obj.arr);
    hex_dump("obj.f", &obj.f, sizeof obj.f);
    hex_dump("obj.d", &obj.d, sizeof obj.d);
    hex_dump("obj.c", &obj.c, sizeof obj.c);
    printf("obj.c = %d = 0x%x = '%c'\n",
           (int)obj.c, (unsigned)obj.c, obj.c);
    return 0;
}
The hex_dump function dumps the raw representation of any object, regardless of its type, by showing the value of each byte in hexadecimal.
I first fill the union object with 0xff bytes. Then, as you describe, I initialize each element of the int[3] member arr -- but to show more clearly what's going on, I use different values for each byte.
The output I get on one system (which happens to be little-endian) is:
obj : 34 33 32 31 38 37 36 35 3c 3b 3a 39 ff ff ff ff
obj.arr : 34 33 32 31 38 37 36 35 3c 3b 3a 39
obj.f : 34 33 32 31
obj.d : 34 33 32 31 38 37 36 35
obj.c : 34
obj.c = 52 = 0x34 = '4'
As you can see, the initial bytes of each member are consistent with each other, because they're stored in the same place. The trailing ff bytes are unaffected by assigning values to arr (this is not the only valid behavior; the standard says they take unspecified values). Because the system is little-endian, the high-order byte of each int value is stored at the lowest position in memory.
The output on a big-endian system is:
obj : 31 32 33 34 35 36 37 38 39 3a 3b 3c ff ff ff ff
obj.arr : 31 32 33 34 35 36 37 38 39 3a 3b 3c
obj.f : 31 32 33 34
obj.d : 31 32 33 34 35 36 37 38
obj.c : 31
obj.c = 49 = 0x31 = '1'
As you can see, the high-order byte of each int is at the lowest position in memory.
In all cases, the value of obj.c is the first byte of obj.arr[0] -- which will be either the high-order or the low-order byte, depending on endianness.
There are a lot of ways this can vary across different systems. The sizes of int, float, and double can vary. The way floating-point numbers are represented can vary (though this example doesn't show that). Even the number of bits in a byte can vary; it's at least 8, but it can be bigger. (It's exactly 8 on any system you're likely to encounter). And the standard allows padding bits in integer representations; there are none in the examples I've shown.
