When a variable is associated with a union, the compiler allocates memory based on the union's largest member, so the size of a union equals the size of its largest member. That means altering the value of one member should alter the values seen through the other members.
But when I execute the following code, the output is 4 5 7.000000:
#include <stdio.h>

union job
{
    int a;
    struct data
    {
        double b;
        int x;
    } q;
} w;

int main(void)
{
    w.q.b = 7;
    w.a = 4;
    w.q.x = 5;
    printf("%d %d %f", w.a, w.q.x, w.q.b);
    return 0;
}
The issue is this: I first assign a value to a and later modify q.x, so I expected the value of a to be overridden by q.x. But the output still shows the original value of a as well as that of q.x. I am not able to understand why this is happening.
Your understanding is correct - the numbers should change. I took your code, and added a little bit more, to show you exactly what is going on.
The real issue is quite interesting, and has to do with the way floating point numbers are represented in memory.
First, let's create a map of the bytes used in your struct:
aaaa
bbbbbbbbxxxx
As you can see, the first four bytes of b overlap with a. This will turn out to be important.
Now we have to take a look at the way a double is typically stored (I am writing this from the perspective of a Mac with a 64-bit Intel architecture; it so happens that the in-memory format is indeed the IEEE 754 format): one sign bit, followed by an 11-bit exponent, followed by a 52-bit fraction ("mantissa").
The important thing to note here is that Intel machines are "little endian" - that is, the number that will be stored first is the "thing on the right", i.e. the least significant bits of the "fraction".
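If you want to see those sign/exponent/fraction fields for yourself, here is a small stand-alone sketch (my addition, not part of the answer's program) that copies the bytes of the double into a 64-bit integer and masks out the fields:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    double d = 7.0;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret the 8 bytes of the double */

    printf("raw bits : 0x%016llx\n", (unsigned long long)bits);                        /* 0x401c000000000000 */
    printf("sign     : %llu\n", (unsigned long long)(bits >> 63));                     /* 0 */
    printf("exponent : 0x%03llx\n", (unsigned long long)((bits >> 52) & 0x7FF));       /* 0x401 = 1023 + 2 */
    printf("fraction : 0x%013llx\n", (unsigned long long)(bits & 0xFFFFFFFFFFFFFULL)); /* 0xc000000000000 */
    return 0;
}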
Now let's look at a program that does the same thing that your code did - but prints out the contents of the structure so we see what is happening:
#include <stdio.h>
#include <string.h>
void dumpBytes(void *p, int n) {
    int ii;
    char hex[9];
    for (ii = 0; ii < n; ii++) {
        /* %02x of a negative char is sign-extended by the promotion to int and
           prints e.g. ffffff90; taking the last two characters keeps just the byte */
        sprintf(hex, "%02x", (char)*((char*)p + ii));
        printf("%s ", hex + strlen(hex) - 2);
    }
    printf("\n");
}
int main(void) {
    static union job
    {
        int a;
        struct data
        {
            double b;
            int x;
        } q;
    } w;

    printf("initial value:\n");
    dumpBytes(&w, sizeof(w));
    w.q.b = 7;
    printf("setting w.q.b = 7:\n");
    dumpBytes(&w, sizeof(w));
    w.a = 4;
    printf("setting w.a = 4:\n");
    dumpBytes(&w, sizeof(w));
    w.q.x = 5;
    printf("setting w.q.x = 5:\n");
    dumpBytes(&w, sizeof(w));
    printf("values are now %d %d %.15lf\n", w.a, w.q.x, w.q.b);
    w.q.b = 7;
    printf("setting w.q.b = 7:\n");
    dumpBytes(&w, sizeof(w));
    printf("values are now %d %d %.15lf\n", w.a, w.q.x, w.q.b);
    return 0;
}
And the output:
initial value:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
All zeros (I declared the variable static - that makes sure everything will be initialized). Note that the function prints out 16 bytes, even though you might have thought that a union whose biggest member is a struct of a double plus an int should only be 12 bytes long. This is related to alignment - since the largest basic member is 8 bytes long, the struct (and therefore the union) is padded to a multiple of 8 bytes, i.e. 16.
setting w.q.b = 7:
00 00 00 00 00 00 1c 40 00 00 00 00 00 00 00 00
Let's look at the bytes representing the double in their correct order:
40 1c 00 00 00 00 00 00
Sign bit = 0
exponent = 100 0000 0001b (binary for 1025, i.e. the bias 1023 plus 2)
fraction = 1100 0000 ... 0000b (with the implicit leading 1, that gives 1.11b x 2^2 = 7)
setting w.a = 4:
04 00 00 00 00 00 1c 40 00 00 00 00 00 00 00 00
When we now write a, we overwrite the first four bytes, and only the first of them actually changes (from 00 to 04). Those bytes hold the least significant bits of the mantissa, whose 52 bits now read (most significant first, in hex):
c 0000 0000 0004
The format implies a 1 to the left of the mantissa, so the stored value is 1.c000000000004 (hex) x 2^2; changing the last bits from 0 to 4 changed the magnitude of the number by just a tiny fraction - you need to look at the 15th decimal place to see it.
setting w.q.x = 5:
04 00 00 00 00 00 1c 40 05 00 00 00 00 00 00 00
The value 5 is written into its own separate space - the four bytes belonging to x - so it disturbs neither a nor b.
values are now 4 5 7.000000000000004
Note - now that a large number of digits is printed, you can see that b is no longer exactly 7 - even though a double is perfectly capable of representing the integer 7 exactly.
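To double-check the size of that perturbation (my own arithmetic, under the IEEE 754 layout described above): writing 4 into the lowest mantissa bits of a value near 7 changes it by 4 x 2^-52 x 2^2 = 2^-48, which is roughly 3.6e-15 - exactly the difference that shows up in the 15th decimal place:

#include <stdio.h>

int main(void)
{
    /* 0x1p-48 is 2 to the power -48: the change caused by writing 4 into the
     * low mantissa bits of a double whose exponent is 2 (i.e. a value near 7). */
    printf("%.15f\n", 7.0);             /* 7.000000000000000 */
    printf("%.15f\n", 7.0 + 0x1p-48);   /* 7.000000000000004 */
    return 0;
}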
setting w.q.b = 7:
00 00 00 00 00 00 1c 40 05 00 00 00 00 00 00 00
values are now 0 5 7.000000000000000
After writing 7 into the double again, you can see that the first byte is once again 00, and now the result of the printf statement is indeed 7.0 exactly.
So - your understanding was correct. The problem was in your diagnosis - the number was different but you couldn't see it.
Usually a good way to look for these things is to just store the number in a temporary variable, and look at the difference. You would have found it easily enough, then.
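To illustrate that suggestion, here is a self-contained sketch (my own example built around the union from the question, not the original poster's code) of storing the value in a temporary and printing the difference:

#include <stdio.h>

int main(void)
{
    union job {
        int a;
        struct data {
            double b;
            int x;
        } q;
    } w;

    w.q.b = 7;
    double before = w.q.b;   /* keep a copy before writing through the other member */
    w.a = 4;
    double after = w.q.b;

    /* The subtraction makes the otherwise invisible change obvious. */
    printf("before = %.17g\nafter  = %.17g\ndiff   = %.17g\n",
           before, after, after - before);
    return 0;
}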
You can see the altered values if you run the code below:
#include <stdio.h>

union job
{
    struct data
    {
        int x;
        double b;
    } q;
    int a;
} w;

int main() {
    w.q.b = 7;
    w.a = 4;
    w.q.x = 5;
    printf("%d %d %f", w.a, w.q.x, w.q.b);
    return 0;
}
OUTPUT: 5 5 7.000000
I have only slightly modified the struct inside the union - by putting x first, a and q.x now share the same bytes, so the assignment w.q.x = 5 really does overwrite w.a, which is the behaviour you expected to see.
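To make the overlap visible without dumping bytes, one could print the member offsets with offsetof. The following small comparison of the two layouts is just an illustration of mine (the union tags original and reordered are made up for the example):

#include <stdio.h>
#include <stddef.h>

union original  { int a; struct { double b; int x; } q; };
union reordered { struct { int x; double b; } q; int a; };

int main(void)
{
    /* In the original layout, a overlaps the low bytes of b; x lives at offset 8. */
    printf("original : a@%zu  q.b@%zu  q.x@%zu\n",
           offsetof(union original, a),
           offsetof(union original, q.b),
           offsetof(union original, q.x));    /* a@0  q.b@0  q.x@8 */

    /* After the reordering, a and q.x both sit at offset 0, so they overwrite each other. */
    printf("reordered: a@%zu  q.b@%zu  q.x@%zu\n",
           offsetof(union reordered, a),
           offsetof(union reordered, q.b),
           offsetof(union reordered, q.x));   /* a@0  q.b@8  q.x@0 */
    return 0;
}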
Actually, the instruction w.a = 4 overwrites part of the data of w.q.b. Here is what your memory looks like on a little-endian machine (one byte per row, lowest address at the top, most significant bit of each byte on the left):
After w.q.b=7;       After w.a=4;         After w.q.x=5;
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|1|0|0|    |0|0|0|0|0|1|0|0|   \        \
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|   | w.a    |
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|   |        |
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|   /        | w.q.b
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|            |
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|            |
|0|0|0|1|1|1|0|0|    |0|0|0|1|1|1|0|0|    |0|0|0|1|1|1|0|0|            |
|0|1|0|0|0|0|0|0|    |0|1|0|0|0|0|0|0|    |0|1|0|0|0|0|0|0|            /
-----------------    -----------------    -----------------
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|1|0|1|   \
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|   | w.q.x
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|   |
|0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|    |0|0|0|0|0|0|0|0|   /
As you can see, assigning 4 to w.a (the first 4 bytes) only flips one low-order bit in the least significant byte of w.q.b's mantissa. That change is far too small to show up at the precision with which printf prints w.q.b, so it still displays as 7.000000.
Is there a function in a C lib to print data packets similar to Wireshark format (position then byte by byte)
I looked up their (Wireshark's) code and they use trees, which was too complex for my task. I could also write my own version from scratch, but I don't want to reinvent the wheel, so I was wondering if there is some existing code I can use. Any suggestions for a library?
*The data I have is in a buffer of unsigned ints.
0000 01 02 ff 45 a3 00 90 00 00 00 00 00 00
0010 00 00 00 00 00 00 00 00 00 00 00 00 00
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 ... etc
Thanks!
I doubt such a specific function exists in libc, but rolling your own is rather simple:
for (unsigned k = 0; k < len; k++)
{
    if (k % 0x10 == 0)
        printf("\n%04x", k);
    if (k % 0x4 == 0)
        printf(" ");
    printf(" %02x", buffer[k] & 0xff);
}
Replace the first modulo by the line length, and the second by the word length and you're good (of course, try to make one a multiple of the other)
EDIT:
As I just noticed that you mentioned the data is in a buffer of unsigned ints: you will have to cast it to an unsigned char buffer for this part.
Of course, you can do it with an unsigned int buffer using bitwise shifts and four prints per loop, but that really makes for cumbersome code where it isn't necessary.
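To tie the loop and the edit together, here is a minimal, self-contained sketch of what that could look like; the function name hexdump and the sample data are my own, not from any library:

#include <stdio.h>

/* Hypothetical helper: dump 'count' unsigned ints as a Wireshark-style hex view. */
static void hexdump(const unsigned int *words, size_t count)
{
    const unsigned char *buffer = (const unsigned char *)words;  /* byte view of the data */
    size_t len = count * sizeof *words;

    for (size_t k = 0; k < len; k++)
    {
        if (k % 0x10 == 0)            /* new line every 16 bytes, prefixed by the offset */
            printf("\n%04zx", k);
        if (k % 0x4 == 0)             /* extra space every 4 bytes to group into words */
            printf(" ");
        printf(" %02x", buffer[k]);
    }
    printf("\n");
}

int main(void)
{
    /* On a little-endian machine this prints bytes similar to the question's sample. */
    unsigned int data[] = { 0x45ff0201u, 0x009000a3u, 0u, 0u };
    hexdump(data, sizeof data / sizeof data[0]);
    return 0;
}

Note that because the unsigned ints are reinterpreted byte by byte, the byte order within each 4-byte group reflects the endianness of the machine the code runs on.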
I'm in serious trouble with a heap/stack corruption. To be able to set a data breakpoint and find the root of the problem, I want to take two core dumps using gdb and then compare them.
The first one when I think the heap and stack are still okay, and a second one shortly before my program crashes.
How can I compare those dumps?
Information about my project:
using gcc 5.x
Plugin for a legacy, third-party program with RT support. No sources of that program are available (to me).
The legacy program is C, my plugin is C++.
Other things I tried:
Using address sanitizers -> won't work because the legacy program won't start with them.
Using undefined behavior sanitizers -> same.
Figuring out which memory gets corrupted, to set a data breakpoint on it -> no success, because the corrupted memory does not belong to my code.
Ran Valgrind -> no errors around my code.
Thank you for your help
Independent of your underlying motivation, I'd like to get into your question. You ask how the difference between two core dumps can be identified. This is going to be lengthy, but it will hopefully give you your answer.
A core dump is represented by an ELF file that contains metadata and a specific set of memory regions (on Linux, this can be controlled via /proc/[pid]/coredump_filter) that were mapped into the given process at the time of dump creation.
The obvious way to compare the dumps would be to compare a hex-representation:
$ diff -u <(hexdump -C dump1) <(hexdump -C dump2)
--- /dev/fd/63 2020-05-17 10:01:40.370524170 +0000
+++ /dev/fd/62 2020-05-17 10:01:40.370524170 +0000
@@ -90,8 +90,9 @@
000005e0 00 00 00 00 00 00 00 00 00 00 00 00 80 1f 00 00 |................|
000005f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The result is rarely useful because you're missing the context. More specifically, there's no straightforward way to get from the offset of a value change in the file to the offset corresponding to the process virtual memory address space.
So, more context is needed. The optimal output would be a list of VM addresses with their before and after values.
Before we can get to that, we need a test scenario that loosely resembles yours. The following application includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation of the same size hides the issue). The idea is to create a core dump using gdb (its generate-core-file command, abbreviated generate below) during each phase, based on breakpoints triggered by the code:
dump1: Correct state
dump2: Incorrect state, no segmentation fault
dump3: Segmentation fault
The code:
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <stdint.h>

int **g_state;

int main()
{
    int value = 1;
    g_state = malloc(sizeof(int*));
    *g_state = &value;
    if (g_state && *g_state) {
        printf("state: %d\n", **g_state);
    }
    printf("no corruption\n");
    raise(SIGTRAP);

    free(g_state);
    char **unrelated = malloc(sizeof(int*));
    *unrelated = "val";
    if (g_state && *g_state) {
        printf("state: %d\n", **g_state);
    }
    printf("use-after-free hidden by new allocation (invalid value)\n");
    raise(SIGTRAP);

    printf("use-after-free (segfault)\n");
    free(unrelated);
    int *unrelated2 = malloc(sizeof(intptr_t));
    *unrelated2 = 1;
    if (g_state && *g_state) {
        printf("state: %d\n", **g_state);
    }
    return 0;
}
Now, the dumps can be generated:
Starting program: test
state: 1
no corruption
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump1
Saved corefile dump1
(gdb) cont
Continuing.
state: 7102838
use-after-free hidden by new allocation (invalid value)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump2
Saved corefile dump2
(gdb) cont
Continuing.
use-after-free (segfault)
Program received signal SIGSEGV, Segmentation fault.
main () at test.c:31
31 printf("state: %d\n", **g_state);
(gdb) generate dump3
Saved corefile dump3
A quick manual inspection shows the relevant differences:
# dump1
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x7fffffffe2bc
# dump2
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x4008c1
# dump3
$2 = (int **) 0x602260
(gdb) print *g_state
$3 = (int *) 0x1
Based on that output, we can clearly see that *g_state changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we'd like to automate this comparison.
Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we'll do:
Open a dump
Identify PROGBITS sections of the dump
Remember the data and address information
Repeat the process with the second dump
Compare the two data sets and print the diff
Based on elf.h, it's relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two hexdump outputs using diff. The sample makes some assumptions (x86_64, mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity.
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define MAX_MAPPINGS 1024

struct dump
{
    char *base;
    Elf64_Shdr *mappings[MAX_MAPPINGS];
};

unsigned readdump(const char *path, struct dump *dump)
{
    unsigned count = 0;
    int fd = open(path, O_RDONLY);
    if (fd != -1) {
        struct stat stat;
        fstat(fd, &stat);
        dump->base = mmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        Elf64_Ehdr *header = (Elf64_Ehdr *)dump->base;
        Elf64_Shdr *secs = (Elf64_Shdr*)(dump->base + header->e_shoff);
        for (unsigned secinx = 0; secinx < header->e_shnum; secinx++) {
            if (secs[secinx].sh_type == SHT_PROGBITS) {
                if (count == MAX_MAPPINGS) {
                    count = 0;
                    break;
                }
                dump->mappings[count] = &secs[secinx];
                count++;
            }
        }
        dump->mappings[count] = NULL;
    }
    return count;
}
#define DIFFWINDOW 16
void printsection(struct dump *dump, Elf64_Shdr *sec, const char mode,
                  unsigned offset, unsigned sizelimit)
{
    unsigned char *data = (unsigned char *)(dump->base + sec->sh_offset);
    uintptr_t addr = sec->sh_addr + offset;
    unsigned size = sec->sh_size;
    data += offset;
    if (sizelimit) {
        size = sizelimit;
    }
    unsigned start = 0;
    for (unsigned i = 0; i < size; i++) {
        if (i % DIFFWINDOW == 0) {
            /* addr + i is the virtual address of the first byte on this line */
            printf("%c%016lx ", mode, (unsigned long)(addr + i));
            start = i;
        }
        printf(" %02x", data[i]);
        if ((i + 1) % DIFFWINDOW == 0 || i + 1 == size) {
            printf(" [");
            for (unsigned j = start; j <= i; j++) {
                putchar((data[j] >= 32 && data[j] < 127) ? data[j] : '.');
            }
            printf("]\n");
        }
    }
}
void printdiff(struct dump *dump1, Elf64_Shdr *sec1,
               struct dump *dump2, Elf64_Shdr *sec2)
{
    unsigned char *data1 = (unsigned char *)(dump1->base + sec1->sh_offset);
    unsigned char *data2 = (unsigned char *)(dump2->base + sec2->sh_offset);
    unsigned difffound = 0;
    unsigned start = 0;
    for (unsigned i = 0; i < sec1->sh_size; i++) {
        if (i % DIFFWINDOW == 0) {
            start = i;
            difffound = 0;
        }
        if (!difffound && data1[i] != data2[i]) {
            difffound = 1;
        }
        if ((i + 1) % DIFFWINDOW == 0 || i + 1 == sec1->sh_size) {
            if (difffound) {
                printsection(dump1, sec1, '-', start, DIFFWINDOW);
                printsection(dump2, sec2, '+', start, DIFFWINDOW);
            }
        }
    }
}
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "Usage: compare DUMP1 DUMP2\n");
        return 1;
    }
    struct dump dump1;
    struct dump dump2;
    if (readdump(argv[1], &dump1) == 0 ||
        readdump(argv[2], &dump2) == 0) {
        fprintf(stderr, "Failed to read dumps\n");
        return 1;
    }
    unsigned sinx1 = 0;
    unsigned sinx2 = 0;
    while (dump1.mappings[sinx1] || dump2.mappings[sinx2]) {
        Elf64_Shdr *sec1 = dump1.mappings[sinx1];
        Elf64_Shdr *sec2 = dump2.mappings[sinx2];
        if (sec1 && sec2) {
            if (sec1->sh_addr == sec2->sh_addr) {
                // in both
                printdiff(&dump1, sec1, &dump2, sec2);
                sinx1++;
                sinx2++;
            }
            else if (sec1->sh_addr < sec2->sh_addr) {
                // in 1, not 2
                printsection(&dump1, sec1, '-', 0, 0);
                sinx1++;
            }
            else {
                // in 2, not 1
                printsection(&dump2, sec2, '+', 0, 0);
                sinx2++;
            }
        }
        else if (sec1) {
            // in 1, not 2
            printsection(&dump1, sec1, '-', 0, 0);
            sinx1++;
        }
        else {
            // in 2, not 1
            printsection(&dump2, sec2, '+', 0, 0);
            sinx2++;
        }
    }
    return 0;
}
With the sample implementation, we can re-evaluate our scenario above. An excerpt from the first diff:
$ ./compare dump1 dump2
-0000000000601020 86 05 40 00 00 00 00 00 50 3e a8 f7 ff 7f 00 00 [..@.....P>......]
+0000000000601020 00 6f a9 f7 ff 7f 00 00 50 3e a8 f7 ff 7f 00 00 [.o......P>......]
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
-0000000000602280 6e 6f 20 63 6f 72 72 75 70 74 69 6f 6e 0a 00 00 [no corruption...]
+0000000000602280 75 73 65 2d 61 66 74 65 72 2d 66 72 65 65 20 68 [use-after-free h]
-0000000000602290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602290 69 64 64 65 6e 20 62 79 20 6e 65 77 20 61 6c 6c [idden by new all]
The diff shows that *g_state (stored at address 0x602260) was changed from 0x7fffffffe2bc to 0x4008c1:
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
The second diff with only the relevant offset:
$ ./compare dump2 dump3
-0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..@.............]
+0000000000602260 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
The diff shows that *g_state (stored at address 0x602260) was changed from 0x4008c1 to 0x1.
There you have it, a core dump diff. Now, whether or not that can prove to be useful in your scenario depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will possibly be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.
The more context you have, the easier the analysis will turn out to be. For example, the relevant scope of the diff could be reduced by limiting the diff to addresses of the .data and .bss sections of the library in question if changes in there are relevant to your situation.
Another approach to reduce the scope: excluding changes to memory that is not referenced by the library. The relationship between arbitrary heap allocations and specific libraries is not immediately apparent. Based on the addresses of changes in your initial diff, you could search for pointers to those regions in the .data and .bss sections of the library, right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, and register and stack references of library-owned threads), but it's a start.
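As a rough illustration of that last idea, here is a small sketch (not part of the answer's compare tool; the function and parameter names are made up) of how one could scan a byte range, such as a library's .data or .bss contents, for aligned 8-byte values that fall into a given address range:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report every aligned 8-byte value inside [data, data+size)
 * that lies in the address range [lo, hi) - i.e. a potential pointer to the
 * region whose changes showed up in the diff. */
static void scan_for_references(const unsigned char *data, size_t size,
                                uintptr_t base_addr, uintptr_t lo, uintptr_t hi)
{
    for (size_t off = 0; off + sizeof(uintptr_t) <= size; off += sizeof(uintptr_t)) {
        uintptr_t value;
        memcpy(&value, data + off, sizeof value);   /* avoid alignment assumptions */
        if (value >= lo && value < hi) {
            printf("possible reference at 0x%016lx -> 0x%016lx\n",
                   (unsigned long)(base_addr + off), (unsigned long)value);
        }
    }
}

One would call this for each changed address range reported by the diff, with data and base_addr taken from the section being searched.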
#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    size_t i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]); //line:data:show_bytes_printf
    printf("\n");
}

void show_integer(int *p, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        printf(" %d", p[i]);
    }
    printf("\n");
}
Suppose I have the two functions above, and I use the following main function to test them:
int main(int argc, char *argv[])
{
    int a[5] = {12345, 123, 23, 45, 1};
    show_bytes((byte_pointer)a, sizeof(a));
    show_integer(a, 5);
}
I got the following results in my terminal:
ubuntu@ubuntu:~/OS_project$ ./show_bytes
39 30 00 00 7b 00 00 00 17 00 00 00 2d 00 00 00 01 00 00 00
12345 123 23 45 1
Can someone tell me why I got this result? I understand the second function, but I have no idea why the first one prints 39 30 00 00 7b 00 00 00 17 00 00 00 2d 00 00 00 01 00 00 00. I do know that this byte sequence is the hexadecimal representation of 12345, 123, 23, 45, 1. What I don't understand is why start[i] doesn't point to a whole number such as 12345 or 123; instead, start[0] seems to point to the least significant byte of the first number, 12345. Can someone explain why these two functions behave differently?
12345 is 0x3039 in hex. Because int is 32 bits on your machine, it is represented as 0x00003039. And because your machine is little endian, it is stored in memory as the byte sequence 39 30 00 00. You can read more about big and little endian at: https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html
The same applies to the other results.
On your platform, sizeof(int) is 4 and your platform uses a little-endian byte order. The binary representation of 12345 in 32 bits is:
00000000 00000000 00110000 00111001
In a little endian system, that is captured using the following byte sequence.
00111001 00110000 00000000 00000000
In hex, those bytes are:
39 30 00 00
That's what you are seeing as the output corresponding to the first number.
You can do similar processing of the other numbers in the array to understand the output corresponding to them.
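If you want to convince yourself of the byte order on your own machine, a tiny check along these lines (just a sketch of mine, not part of the original question) works:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = 12345;                     /* 0x00003039 */
    unsigned char bytes[sizeof n];
    memcpy(bytes, &n, sizeof n);       /* copy the object representation */

    /* On a little-endian machine this prints "39 30 00 00",
     * on a big-endian machine it prints "00 00 30 39". */
    for (size_t i = 0; i < sizeof n; i++)
        printf("%02x ", bytes[i]);
    printf("\n");

    printf("first byte is the %s significant one\n",
           bytes[0] == 0x39 ? "least" : "most");
    return 0;
}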
I was writing a function that prints the "hexdump" of a given file. The function is as stated below:
bool printhexdump (FILE *fp) {
    long unsigned int filesize = 0;
    char c;

    if (fp == NULL) {
        return false;
    }

    while (! feof (fp)) {
        c = fgetc (fp);
        if (filesize % 16 == 0) {
            if (filesize >= 16) {
                printf ("\n");
            }
            printf ("%08lx ", filesize);
        }
        printf ("%02hx ", c);
        filesize++;
    }
    printf ("\n");
    return true;
}
However, on certain files, invalid-looking values get printed, for example:
00000000 4d 5a ff90 00 03 00 00 00 04 00 00 00 ffff ffff 00 00
00000010 ffb8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 ff80 00 00 00
00000040 ffff
Except for the last ffff, which is caused by the EOF return value, the ff90, ffff, ffb8, etc. are wrong. However, if I change char to unsigned char, I get the correct representation:
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00
00000040 ff
Why would the above behaviour happen?
Edit: the treatment of c by printf() should be the same since the format specifiers don't change. So I'm not sure how char would get sign extended while unsigned char won't?
Q: the treatment of c by printf() should be the same since the format specifiers don't change.
A: OP is correct, the treatment of c by printf() did not change. What changed is what was passed to printf(). Whether c is char or unsigned char, it goes through the usual integer promotions, typically to int. A char, if signed, gets a sign extension: a char holding the bit pattern 0xFF has the value -1, while an unsigned char holding 0xFF remains 255.
Q: So I'm not sure how char would get sign extended while unsigned char won't?
A: They both got a sign extension. char may be negative, so its sign extension may be 0 or 1 bits. unsigned char is always positive, so its sign extension is 0 bits.
Solution
char c;
printf ("%02x ", (unsigned char) c);
// or
printf ("%02hhx ", c);
// or
unsigned char c;
printf ("%02x ", c);
// or
printf ("%02hhx ", c);
char can be a signed type, and in that case values 0x80 to 0xff get sign-extended before being passed to printf.
(char)0x80 is sign-extended to -128, which in unsigned short is 0xff80.
[edit] To be clearer about promotion: the value stored in a char is eight bits, and in that eight-bit representation a value like 0x90 represents either -112 or 144, depending on whether the char is signed or unsigned. This is because the most significant bit is taken as the sign bit for signed types, and as a magnitude bit for unsigned types. If that bit is set, it either makes the value negative (by contributing -128) or larger (by contributing +128), depending on whether or not the type is signed.
The promotion from char to int always happens, but if char is signed then converting it to int requires that the sign bit be copied up into all the higher bits of the int, so that the int represents the same value the char did.
Then printf gets ahold of it, but that doesn't know whether the original type was signed or unsigned, and it doesn't know that it used to be a char. What it does know is that the format specifier is for an unsigned hexadecimal short, so it prints that number as if it were unsigned short. The bit pattern for -112 in a 16-bit int is 1111111110010000, formatted as hex, that's ff90.
If your char is unsigned then 0x90 does not represent a negative value, and when you convert it to an int nothing needs to be changed in the int to make it represent the same value. The rest of the bit pattern is all zeroes and printf doesn't need those to display the number correctly.
Because in unsigned char the most significant bit has a different meaning than that of signed char.
For example, 0x90 in binary is 10010000, which is 144 decimal when unsigned, but -112 decimal when signed.
Whether or not char is signed is platform-dependent. This means that the sign bit may or may not be extended depending on your machine, and thus you can get different results.
However, using unsigned char ensures that there is no sign extension (because there is no sign bit anymore).
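If you want to check which behaviour your platform gives you, a quick illustrative test (my own snippet) is:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 if plain char is unsigned, negative if it is signed. */
    printf("char is %s on this platform\n",
           CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}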
The problem is simply caused by the format. %02hx takes an int. When you pass a character below 128, all is fine: it is positive and does not change when converted to an int.
Now let's take a char above 127, say 0x90. As an unsigned char, its value is 144; it is converted to an int with the value 144 and printed as 90. But as a signed char, its value is -112 (still the bit pattern 0x90); it is converted to an int with the value -112, which %hx then prints as the 16-bit unsigned value ff90.
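To see the difference concretely, here is a small stand-alone demonstration (my own example, not taken from the question's program) of how the same bit pattern ends up printed differently:

#include <stdio.h>

int main(void)
{
    signed char   sc = (signed char)0x90;   /* -112 on two's-complement machines */
    unsigned char uc = 0x90;                /* 144 */

    /* Both are promoted to int before reaching printf; only sc carries a sign. */
    printf("%%02hx of signed char   0x90: %02hx\n", sc);                  /* typically ff90 */
    printf("%%02hx of unsigned char 0x90: %02hx\n", uc);                  /* 90 */
    printf("%%02x  of (unsigned char)sc : %02x\n", (unsigned char)sc);    /* 90 */
    return 0;
}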
I thought the shift operator shifts the memory representation of the integer or char it is applied to, but the output of the following code came as a surprise to me.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    uint64_t number = 33550336;
    unsigned char *p = (unsigned char *)&number;
    size_t i;

    for (i = 0; i < sizeof number; ++i)
        printf("%02x ", p[i]);
    printf("\n");

    // shift operation
    number = number << 4;

    p = (unsigned char *)&number;
    for (i = 0; i < sizeof number; ++i)
        printf("%02x ", p[i]);
    printf("\n");

    return 0;
}
The system on which it ran is little endian and produced the following output:
00 f0 ff 01 00 00 00 00
00 00 ff 1f 00 00 00 00
Can somebody provide some reference to the detailed working of the shift operators?
I think you've answered your own question. The machine is little endian, which means the bytes are stored in memory with the least significant byte first (at the lowest address) - the leftmost byte in your printout. So your memory represents:
00 f0 ff 01 00 00 00 00 => 0x0000000001fff000
00 00 ff 1f 00 00 00 00 => 0x000000001fff0000
As you can see, the second is the same as the first value, shifted left by 4 bits.
Everything is right:
(1 * (256^3)) + (0xff * (256^2)) + (0xf0 * 256) = 33 550 336
(0x1f * (256^3)) + (0xff * (256^2)) = 536 805 376
33 550 336 * (2^4) = 536 805 376
Shifting left by 4 bits is the same as multiplying by 2^4.
I think your printf output confuses you. Here are the values:
33550336 = 0x01FFF000
33550336 << 4 = 0x1FFF0000
Can you read your output now?
It doesn't shift the memory, but the bits. So you have the number:
00 00 00 00 01 FF F0 00
After shifting this number 4 bits (one hexadecimal digit) to the left you have:
00 00 00 00 1F FF 00 00
Which is exactly the output you get, when transformed to little endian.
Your loop is printing bytes in the order they are stored in memory, and the output would be different on a big-endian machine. If you want to print the value in hex just use %016llx. Then you'll see what you expect:
0000000001fff000
000000001fff0000
The second value is left-shifted by 4.
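For completeness, here is a minimal sketch of that suggestion (my own example; it uses the portable PRIx64 macro from <inttypes.h>, which is equivalent to %llx on platforms where long long is 64 bits):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t number = 33550336;

    /* Print the numeric value, independent of how the bytes are laid out in memory. */
    printf("%016" PRIx64 "\n", number);        /* 0000000001fff000 */
    printf("%016" PRIx64 "\n", number << 4);   /* 000000001fff0000 */
    return 0;
}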