I thought the shift operator shifts the memory representation of the integer or char it is applied to, but the output of the following code came as a surprise to me.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    uint64_t number = 33550336;
    unsigned char *p = (unsigned char *)&number;
    size_t i;

    for (i = 0; i < sizeof number; ++i)
        printf("%02x ", p[i]);
    printf("\n");

    // shift operation
    number = number << 4;
    p = (unsigned char *)&number;
    for (i = 0; i < sizeof number; ++i)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}
The system on which it ran is little endian and produced the following output:
00 f0 ff 01 00 00 00 00
00 00 ff 1f 00 00 00 00
Can somebody provide some reference to the detailed working of the shift operators?
I think you've answered your own question. The machine is little endian, which means the bytes are stored in memory with the least significant byte at the lowest address (leftmost in your printout). So your memory represents:
00 f0 ff 01 00 00 00 00 => 0x0000000001fff000
00 00 ff 1f 00 00 00 00 => 0x000000001fff0000
As you can see, the second is the same as the first value, shifted left by 4 bits.
Everything is right:
(1 * (256^3)) + (0xff * (256^2)) + (0xf0 * 256) = 33 550 336
(0x1f * (256^3)) + (0xff * (256^2)) = 536 805 376
33 550 336 * (2^4) = 536 805 376
Shifting left by 4 bits is the same as multiplying by 2^4.
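If you want to convince yourself numerically, here is a minimal sketch (same value as in the question) showing that the shift and the multiplication agree:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t number = 33550336;
    // A left shift by 4 bits multiplies by 2^4 = 16,
    // as long as no bits fall off the top of the 64-bit value.
    printf("%llu\n", (unsigned long long)(number << 4));   // 536805376
    printf("%llu\n", (unsigned long long)(number * 16));   // 536805376
    return 0;
}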
I think your printf output is what confuses you. Here are the values:
33550336 = 0x01FFF000
33550336 << 4 = 0x1FFF0000
Can you read your output now?
It doesn't shift the memory, but the bits. So you have the number:
00 00 00 00 01 FF F0 00
After shifting this number 4 bits (one hexadecimal digit) to the left you have:
00 00 00 00 1F FF 00 00
Which is exactly the output you get, when transformed to little endian.
Your loop is printing bytes in the order they are stored in memory, so the output would be different on a big-endian machine. If you want to print the value itself in hex, just use %016llx (or, more portably for uint64_t, "%016" PRIx64 from <inttypes.h>). Then you'll see what you expect:
0000000001fff000
000000001fff0000
The second value is left-shifted by 4.
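As an aside, the portable way to print a uint64_t is the PRIx64 macro from <inttypes.h>; a small sketch of what that looks like:

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint64_t number = 33550336;
    // Print the value itself, not its memory layout.
    printf("%016" PRIx64 "\n", number);        // 0000000001fff000
    printf("%016" PRIx64 "\n", number << 4);   // 000000001fff0000
    return 0;
}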
Related
Is there a function in a C library to print data packets in a format similar to Wireshark's (offset, then byte by byte)?
I looked up their code and they use trees, which was too complex for my task. I could also write my own version from scratch, but I don't want to reinvent the wheel, so I was wondering if there is existing code I can use. Any suggestions for a library?
*The data I have is in a buffer of unsigned ints.
0000 01 02 ff 45 a3 00 90 00 00 00 00 00 00
0010 00 00 00 00 00 00 00 00 00 00 00 00 00
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 ... etc
Thanks!
I doubt such a specific function exists in libc, but the scheme is rather simple:
for (unsigned k = 0; k < len; k++)
{
    if (k % 0x10 == 0)
        printf("\n%04x", k);
    if (k % 0x4 == 0)
        printf(" ");
    printf(" %02x", buffer[k] & 0xff);
}
Replace the first modulo by the line length, and the second by the word length and you're good (of course, try to make one a multiple of the other)
EDIT:
As I just noticed you mentioned that the data you have is in a buffer of unsigned ints: you will have to cast it to an unsigned char buffer for this part.
Of course, you could do it with the unsigned int buffer directly, using bitwise shifts and four prints per loop iteration, but that makes for cumbersome code where it isn't necessary.
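Putting the pieces together, a rough sketch of the whole thing could look like the following; the function name hexdump and the sample data are just made up for illustration:

#include <stdio.h>

/* Hypothetical wrapper around the loop above: dumps any buffer byte by byte,
 * 16 bytes per line, grouped in 4-byte words, with the offset at the start of each line. */
static void hexdump(const void *data, unsigned len)
{
    const unsigned char *buffer = (const unsigned char *)data;  /* view the buffer as bytes */

    for (unsigned k = 0; k < len; k++)
    {
        if (k % 0x10 == 0)
            printf("\n%04x", k);
        if (k % 0x4 == 0)
            printf(" ");
        printf(" %02x", buffer[k]);
    }
    printf("\n");
}

int main(void)
{
    /* Arbitrary sample words; on a little-endian machine these print as the
       byte sequence 01 02 ff 45  a3 00 90 00  00 ... , i.e. in memory order. */
    unsigned int packet[] = { 0x45ff0201, 0x009000a3, 0, 0 };
    hexdump(packet, sizeof packet);   /* the unsigned int buffer is read as bytes */
    return 0;
}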
typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len) {
    size_t i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]); //line:data:show_bytes_printf
    printf("\n");
}

void show_integer(int *p, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        printf(" %d", p[i]);
    }
    printf("\n");
}
Suppose I have two functions above, and I use main function to test my functions:
int main(int argc, char *argv[])
{
    int a[5] = {12345, 123, 23, 45, 1};
    show_bytes((byte_pointer)a, sizeof(a));
    show_integer(a, 5);
}
I got the following results in my terminal:
ubuntu#ubuntu:~/OS_project$ ./show_bytes
39 30 00 00 7b 00 00 00 17 00 00 00 2d 00 00 00 01 00 00 00
12345 123 23 45 1
Can someone tell me why I got this result? I understand the second function, but I have no idea why I got 39 30 00 00 7b 00 00 00 17 00 00 00 2d 00 00 00 01 00 00 00 from the first one. I do know that this byte sequence is the hexadecimal representation of 12345, 123, 23, 45, 1. What I don't understand is why start[i] doesn't point to a whole number such as 12345 or 123 in the first function; instead, start[0] seems to point to the least significant byte of the first number, 12345. Can someone explain why the two functions behave so differently?
12345 is 0x3039 in hex. Because int is 32 bits on your machine, it is represented as 0x00003039. Then, because your machine is little endian, it is stored in memory as the byte sequence 39 30 00 00. You can read more about big and little endian here: https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html
The same applies to the other results.
On your platform, sizeof(int) is 4 and the byte order is little endian. The binary representation of 12345 as a 32-bit value is:
00000000 00000000 00110000 00111001
In a little endian system, that is captured using the following byte sequence.
00111001 00110000 00000000 00000000
In hex, those bytes are:
39 30 00 00
That's what you are seeing as the output corresponding to the first number.
You can do similar processing of the other numbers in the array to understand the output corresponding to them.
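If it helps, here is a tiny experiment along those lines, using the 123 from your array (the variable names are just for illustration):

#include <stdio.h>

int main(void) {
    int value = 123;                              /* 0x0000007B as a 32-bit int */
    unsigned char *bytes = (unsigned char *)&value;

    /* On a little-endian machine this prints 7b 00 00 00:
       the least significant byte sits at the lowest address. */
    for (size_t i = 0; i < sizeof value; i++)
        printf(" %.2x", bytes[i]);
    printf("\n");
    return 0;
}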
I was writing a function that prints the "hexdump" of a given file. The function is as stated below:
bool printhexdump (FILE *fp) {
    long unsigned int filesize = 0;
    char c;
    if (fp == NULL) {
        return false;
    }
    while (! feof (fp)) {
        c = fgetc (fp);
        if (filesize % 16 == 0) {
            if (filesize >= 16) {
                printf ("\n");
            }
            printf ("%08lx ", filesize);
        }
        printf ("%02hx ", c);
        filesize++;
    }
    printf ("\n");
    return true;
}
However, on certain files, invalid-looking values seem to get printed, for example:
00000000 4d 5a ff90 00 03 00 00 00 04 00 00 00 ffff ffff 00 00
00000010 ffb8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 ff80 00 00 00
00000040 ffff
Except for the last ffff, which is caused by the EOF return value, the ff90, ffff, ffb8, etc. are wrong. However, if I change char to unsigned char, I get the correct representation:
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00
00000040 ff
Why would the above behaviour happen?
Edit: the treatment of c by printf() should be the same, since the format specifier doesn't change. So I'm not sure how char would get sign-extended while unsigned char wouldn't.
Q: the treatment of c by printf() should be the same since the format specifiers don't change.
A: OP is correct, the treatment of c by printf() did not change. What changed was what was passed to printf(). As char or unsigned char, c goes through the usual integer promotions, typically to int. char, if signed, gets a sign extension: a char holding the bit pattern 0xFF has the value -1. An unsigned char holding 0xFF remains 255.
Q: So I'm not sure how char would get sign extended while unsigned char won't?
A: They both get a sign extension, in a sense. char may be negative, so its sign extension may fill the new upper bits with 0s or 1s. unsigned char is never negative, so it is always extended with 0 bits.
Solution
char c;
printf ("%02x ", (unsigned char) c);
// or
printf ("%02hhx ", c);
// or
unsigned char c;
printf ("%02x ", c);
// or
printf ("%02hhx ", c);
char can be a signed type, and in that case values 0x80 to 0xff get sign-extended before being passed to printf.
(char)0x80 is sign-extended to -128, which in unsigned short is 0xff80.
[edit] To be clearer about promotion: the value stored in a char is eight bits, and in that eight-bit representation a value like 0x90 will represent either -112 or 144, depending on whether the char is signed or unsigned. This is because the most significant bit is taken as the sign bit for signed types, and as a magnitude bit for unsigned types. If that bit is set, it either makes the value negative (by subtracting 128) or it makes it larger (by adding 128), depending on whether or not it's a signed type.
The promotion from char to int will always happen, but if char is signed then converting it to int requires that the sign bit be replicated up to the sign bit of the int, so that the int represents the same value the char did.
Then printf gets ahold of it, but that doesn't know whether the original type was signed or unsigned, and it doesn't know that it used to be a char. What it does know is that the format specifier is for an unsigned hexadecimal short, so it prints that number as if it were unsigned short. The bit pattern for -112 in a 16-bit int is 1111111110010000, formatted as hex, that's ff90.
If your char is unsigned then 0x90 does not represent a negative value, and when you convert it to an int nothing needs to be changed in the int to make it represent the same value. The rest of the bit pattern is all zeroes and printf doesn't need those to display the number correctly.
Because in unsigned char the most significant bit has a different meaning than that of signed char.
For example, 0x90 in binary is 10010000, which is 144 decimal when unsigned, but -112 decimal when signed.
Whether or not char is signed is platform-dependent. This means that the sign bit may or may not be extended depending on your machine, and thus you can get different results.
However, using unsigned char ensures that there is no sign extension (because there is no sign bit anymore).
The problem is simply caused by the format. %02hx takes an int (after the usual promotion). When the character is below 128, all is fine: it is positive and does not change when converted to an int.
Now take a char above 0x7f, say 0x90. As an unsigned char, its value is 144; it is converted to an int with value 144 and printed as 90. But as a signed char its value is -112 (still bit pattern 0x90); it is converted to an int with value -112 (0xff90 in the low 16 bits) and printed as ff90.
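To see both promotions side by side, here is a small sketch, assuming a two's-complement machine where plain char may be signed:

#include <stdio.h>

int main(void) {
    signed char   sc = (signed char)0x90;   /* value -112 on a two's-complement machine */
    unsigned char uc = 0x90;                /* value 144 */

    /* Both are promoted to int before reaching printf.
       -112 becomes 0xffffff90 as a 32-bit int; %hx then shows the low 16 bits: ff90.
       144 stays 0x00000090, so %hx shows 90. */
    printf("%02hx\n", sc);
    printf("%02hx\n", uc);

    /* The usual fixes: cast to unsigned char, or use the hh length modifier. */
    printf("%02x\n", (unsigned char)sc);    /* 90 */
    printf("%02hhx\n", sc);                 /* 90 */
    return 0;
}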
When a variable is part of a union, the compiler allocates memory based on the largest member, so the size of the union equals the size of its largest member. That means altering the value of one member should alter the values of the other members.
But when I execute the following code, I get this output:
4 5 7.000000
#include <stdio.h>

union job
{
    int a;
    struct data
    {
        double b;
        int x;
    } q;
} w;

int main() {
    w.q.b = 7;
    w.a = 4;
    w.q.x = 5;
    printf("%d %d %f", w.a, w.q.x, w.q.b);
    return 0;
}
The issue is that I first assign a value to a and later modify q.x, so the value of a should be overridden by q.x. But the output still shows the original value of a as well as that of q.x. I am not able to understand why this is happening.
Your understanding is correct - the numbers should change. I took your code, and added a little bit more, to show you exactly what is going on.
The real issue is quite interesting, and has to do with the way floating point numbers are represented in memory.
First, let's create a map of the bytes used in your struct:
aaaa
bbbbbbbbxxxx
As you can see, the first four bytes of b overlap with a. This will turn out to be important.
Now we have to take a look at the way a double is typically stored (I am writing this from the perspective of a Mac with a 64-bit Intel architecture; it so happens that the in-memory format is indeed the IEEE 754 format).
The important thing to note here is that Intel machines are "little endian" - that is, the number that will be stored first is the "thing on the right", i.e. the least significant bits of the "fraction".
Now let's look at a program that does the same thing that your code did - but prints out the contents of the structure so we see what is happening:
#include <stdio.h>
#include <string.h>

void dumpBytes(void *p, int n) {
    int ii;
    char hex[9];
    for (ii = 0; ii < n; ii++) {
        /* The cast to char may sign-extend, so format into a scratch buffer
           and print only the last two hex digits. */
        sprintf(hex, "%02x", (char)*((char *)p + ii));
        printf("%s ", hex + strlen(hex) - 2);
    }
    printf("\n");
}

int main(void) {
    static union job
    {
        int a;
        struct data
        {
            double b;
            int x;
        } q;
    } w;

    printf("initial value:\n");
    dumpBytes(&w, sizeof(w));

    w.q.b = 7;
    printf("setting w.q.b = 7:\n");
    dumpBytes(&w, sizeof(w));

    w.a = 4;
    printf("setting w.a = 4:\n");
    dumpBytes(&w, sizeof(w));

    w.q.x = 5;
    printf("setting w.q.x = 5:\n");
    dumpBytes(&w, sizeof(w));
    printf("values are now %d %d %.15lf\n", w.a, w.q.x, w.q.b);

    w.q.b = 7;
    printf("setting w.q.b = 7:\n");
    dumpBytes(&w, sizeof(w));
    printf("values are now %d %d %.15lf\n", w.a, w.q.x, w.q.b);
    return 0;
}
And the output:
initial value:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
All zeros (I declared the variable static - that makes sure everything is initialized). Note that the function prints out 16 bytes, even though you might have thought that a struct whose biggest element is a double plus an int should only be 12 bytes long. This is related to alignment - when the largest element is 8 bytes long, the structure is aligned on 8-byte boundaries, so its size is padded from 12 up to 16 bytes.
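(As an aside: if you want to check the size and the padding on your own system, something like the following will do it; the exact numbers are implementation-dependent.)

#include <stdio.h>
#include <stddef.h>

union job {
    int a;
    struct data {
        double b;
        int x;
    } q;
};

int main(void) {
    /* On my system these print 16 and 8; other ABIs may differ. */
    printf("sizeof(union job)        = %zu\n", sizeof(union job));
    printf("offsetof(struct data, x) = %zu\n", offsetof(struct data, x));
    return 0;
}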
setting w.q.b = 7:
00 00 00 00 00 00 1c 40 00 00 00 00 00 00 00 00
Let's look at the bytes representing the double in their correct (most significant first) order:
40 1c 00 00 00 00 00 00
Sign bit = 0
exponent = 100 0000 0001b (1025; subtract the bias of 1023 to get 2)
mantissa = 1100 0000 ... 0b (with the implied leading 1, that is 1.75)
1.75 * 2^2 = 7
setting w.a = 4:
04 00 00 00 00 00 1c 40 00 00 00 00 00 00 00 00
When we now write a, we have modified the first byte. This corresponds to the least significant byte of the mantissa, whose 52 bits are now (in hex):
c 00 00 00 00 00 04
The format of the mantissa implies a 1 to the left of this fraction; so changing the last byte from 0 to 4 changed the magnitude of the number by just a tiny fraction - you need to look at the 15th decimal place to see it.
setting w.q.x = 5:
04 00 00 00 00 00 1c 40 05 00 00 00 00 00 00 00
The value 5 is written in its own little space
values are now 4 5 7.000000000000004
Note - when printed with a large number of digits, you can see that b is no longer exactly 7 - even though a double is perfectly capable of representing an integer of this size accurately.
setting w.q.b = 7:
00 00 00 00 00 00 1c 40 05 00 00 00 00 00 00 00
values are now 0 5 7.000000000000000
After writing 7 into the double again, you can see that the first byte is once again 00, and now the result of the printf statement is indeed 7.0 exactly.
So - your understanding was correct. The problem was in your diagnosis - the number was different but you couldn't see it.
Usually a good way to look for these things is to just store the number in a temporary variable, and look at the difference. You would have found it easily enough, then.
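For instance, a quick sketch of that tip, reusing the union from the question (and assuming IEEE 754 doubles on a little-endian machine, as above):

#include <stdio.h>

union job {
    int a;
    struct data {
        double b;
        int x;
    } q;
} w;

int main(void) {
    w.q.b = 7;
    double before = w.q.b;   /* keep a copy in a separate variable */
    w.a = 4;                 /* overwrites the low 4 bytes of b */
    /* The difference is tiny but nonzero: about 3.6e-15 here. */
    printf("%g\n", w.q.b - before);
    return 0;
}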
You can see the values being altered if you run the code below:
#include <stdio.h>

union job
{
    struct data
    {
        int x;
        double b;
    } q;
    int a;
} w;

int main() {
    w.q.b = 7;
    w.a = 4;
    w.q.x = 5;
    printf("%d %d %f", w.a, w.q.x, w.q.b);
    return 0;
}
OUTPUT: 5 5 7.000000
I have slightly modified the struct inside the union (x now comes first, so it shares its bytes with a), and that demonstrates the overwriting you expected: the 4 written to a is itself overwritten by the 5 written to q.x.
Actually, the instruction w.a = 4 does overwrite part of w.q.b. Here is what your memory looks like on a little-endian machine (one byte per row, lowest address first, bits within each byte shown most significant first):
          After w.q.b=7;      After w.a=4;        After w.q.x=5;
byte  0   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|1|0|0|   |0|0|0|0|0|1|0|0|  \       \
byte  1   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|  | w.a   |
byte  2   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|  |       |
byte  3   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|  /       | w.q.b
byte  4   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|          |
byte  5   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|          |
byte  6   |0|0|0|1|1|1|0|0|   |0|0|0|1|1|1|0|0|   |0|0|0|1|1|1|0|0|          |
byte  7   |0|1|0|0|0|0|0|0|   |0|1|0|0|0|0|0|0|   |0|1|0|0|0|0|0|0|          /
          -----------------   -----------------   -----------------
byte  8   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|1|0|1|  \
byte  9   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|  | w.q.x
byte 10   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|  |
byte 11   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|   |0|0|0|0|0|0|0|0|  /
As you can see, assigning 4 to the first four bytes only changes one low-order bit of w.q.b (bit 2 of its 64-bit pattern) from 0 to 1. That bit sits at the very bottom of the mantissa, so the change is far too small to show up at the precision %f uses when printing w.q.b.
I need a function that will print out the binary representation of a file that has been read, like the xxd program in Unix, but I want to write my own. Hexadecimal works just fine with %x, but there is no built-in format for binary. Does anyone know how to do this?
I usually do not believe in answering these sorts of questions with full code implementations, however I was handed this bit of code many years ago and I feel obligated to pass it on. I have removed all the comments except for the usage, so you can try to figure out how it works yourself.
Code base 16
#include <stdio.h>
#include <ctype.h>

// Takes a pointer to an arbitrary chunk of data and prints the first len bytes.
void dump (void* data, unsigned int len)
{
    printf ("Size: %u\n", len);
    if (len > 0) {
        unsigned width = 16;
        char *str = (char *)data;
        unsigned int j, i = 0;
        while (i < len) {
            printf (" ");
            for (j = 0; j < width; j++) {
                if (i + j < len)
                    printf ("%02x ", (unsigned char) str [j]);
                else
                    printf (" ");
                if ((j + 1) % (width / 2) == 0)
                    printf (" - ");
            }
            for (j = 0; j < width; j++) {
                if (i + j < len)
                    printf ("%c", isprint (str [j]) ? str [j] : '.');
                else
                    printf (" ");
            }
            str += width;
            i += j;
            printf ("\n");
        }
    }
}
Output base 16 (Excerpt from first 512 bytes* of a flash video)
Size: 512
00 00 00 20 66 74 79 70 - 69 73 6f 6d 00 00 02 00 - ... ftypisom....
69 73 6f 6d 69 73 6f 32 - 61 76 63 31 6d 70 34 31 - isomiso2avc1mp41
00 06 e8 e6 6d 6f 6f 76 - 00 00 00 6c 6d 76 68 64 - ....moov...lmvhd
00 00 00 00 7c 25 b0 80 - 7c 25 b0 80 00 00 03 e8 - ....|%..|%......
00 0c d6 2a 00 01 00 00 - 01 00 00 00 00 00 00 00 - ...*............
00 00 00 00 00 01 00 00 - 00 00 00 00 00 00 00 00 - ................
00 00 00 00 00 01 00 00 - 00 00 00 00 00 00 00 00 - ................
00 00 00 00 40 00 00 00 - 00 00 00 00 00 00 00 00 - ....@...........
00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 - ................
00 01 00 02 00 01 9f 38 - 74 72 61 6b 00 00 00 5c - .......8trak...\
I assume you already know how to tell the size of a file and read a file in binary mode, so I will leave that out of the discussion. Depending on your terminal width you may need to adjust the variable: width -- the code is currently designed for 80 character terminals.
I am also assuming that when you mentioned xxd in conjunction with "binary" you meant non-text as opposed to base 2. If you want base 2, set width to 6 and replace printf ("%02x ", (unsigned char) str [j]); with this:
{
    for (int k = 7; k >= 0; k--)
        printf ("%d", ((unsigned char)str [j] >> k) & 1);
    printf (" ");
}
The required change is pretty simple, you just need to individually shift all 8 bits of your octet and mask off all but the least-significant bit. Remember to do this in an order that seems counter-intuitive at first, since we print left-to-right.
Output base 2 (Excerpt from first 512 bytes* of a flash video)
Size: 512
00000000 00000000 00000000 - 00100000 01100110 01110100 - ... ft
01111001 01110000 01101001 - 01110011 01101111 01101101 - ypisom
00000000 00000000 00000010 - 00000000 01101001 01110011 - ....is
01101111 01101101 01101001 - 01110011 01101111 00110010 - omiso2
01100001 01110110 01100011 - 00110001 01101101 01110000 - avc1mp
00110100 00110001 00000000 - 00000110 11101000 11100110 - 41....
01101101 01101111 01101111 - 01110110 00000000 00000000 - moov..
00000000 01101100 01101101 - 01110110 01101000 01100100 - .lmvhd
00000000 00000000 00000000 - 00000000 01111100 00100101 - ....|%
10110000 10000000 01111100 - 00100101 10110000 10000000 - ..|%..
00000000 00000000 00000011 - 11101000 00000000 00001100 - ......
*For the sake of simplicity, let us pretend that a byte is always 8-bits.
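If you prefer to keep the base-2 printing in its own function, a standalone helper along these lines (the name is made up) does the same job as the snippet above:

#include <stdio.h>

/* Print one byte as 8 binary digits, most significant bit first. */
static void print_byte_base2(unsigned char byte)
{
    for (int bit = 7; bit >= 0; bit--)
        putchar(((byte >> bit) & 1) ? '1' : '0');
    putchar(' ');
}

int main(void)
{
    /* A few bytes taken from the flash-video excerpt above. */
    const unsigned char sample[] = { 0x00, 0x20, 0x66, 0x74 };
    for (size_t i = 0; i < sizeof sample; i++)
        print_byte_base2(sample[i]);
    printf("\n");   /* 00000000 00100000 01100110 01110100 */
    return 0;
}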
Depending on the language, and assuming you have bitwise operations that let you act on each bit of a variable, you can do the following: read the file into a buffer (or a line; if encoding is an issue, force it to extended ASCII, i.e. 8-bit, one-byte characters). Then, for each byte in the buffer, loop from bit 7 down to 0 and use a shift plus a bitwise AND to check each bit's value. Let me give an example in C:
// gcc -Wall -Wextra -std=c99 xxd.c
#include <stdio.h>
#include <string.h>

int main() {
    // Whatever buffer size you choose.
    char buffer[32];

    // Feel free to replace stdin with a file pointer or any other stream.
    // Reading into chars means reading one byte at a time.
    // Loop on fgets itself rather than feof, so the last buffer isn't processed twice.
    while (fgets(buffer, sizeof(buffer), stdin) != NULL) {
        // Iterate through each character in the string/buffer.
        const size_t len = strlen(buffer);
        for (size_t j = 0; j < len; j++) {
            // Print the most significant bit first.
            for (int i = 7; i >= 0; i--) {
                // Check if the i-th bit is set.
                printf(buffer[j] & (1 << i) ? "1" : "0");
            }
        }
    }
    return 0;
}
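One caveat: fgets stops at newlines and strlen stops at the first zero byte, so this works best on text. For arbitrary binary files you may prefer fread, which reports exactly how many bytes it read; a sketch under that assumption:

#include <stdio.h>

int main(void) {
    unsigned char buffer[32];
    size_t n;

    /* fread returns the number of bytes actually read, so embedded '\0' bytes
       and missing trailing newlines are handled naturally. */
    while ((n = fread(buffer, 1, sizeof buffer, stdin)) > 0) {
        for (size_t j = 0; j < n; j++) {
            for (int i = 7; i >= 0; i--)
                printf(buffer[j] & (1u << i) ? "1" : "0");
        }
    }
    printf("\n");
    return 0;
}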