I'm currently stuck on this problem.
I have the following code:
int v[10] = {-1, 1000, 2};
int a;
a = strlen((char *)v+1);
printf("strlen result for (char *) v+1: %d\n", a);
I cannot understand why the final result is always 5 on a little-endian machine.
According to the strlen documentation, I expected the call with the char * cast to return 4.
I also tried an online compiler and other machines, but the result remains the same. What am I missing?
Most likely the bytes in v are represented as:
0xff, 0xff, 0xff, 0xff, 0xe8, 0x03, 0x00, 0x00, ...
So after skipping the first byte (not the first int, since you cast to char *) there are 5 bytes before the first '\0' character.
You should not use strlen on an int array. The function's name already suggests it should only be used on properly formed C strings.
That being said, for the sake of study, array v's memory layout looks something like this on a little-endian machine:
ff ff ff ff  e8 03 00 00  02 00 00 00  00 ... (zeros onward)
\---v[0]--/  \---v[1]--/  \---v[2]--/  v[3] .. v[9]
   ^
   (char *)v + 1 points here; addresses run from low (left) to high (right)
Since you are calling strlen((char *)v + 1), the scan starts at the byte I marked with ^. There are 5 non-zero bytes (ff, ff, ff, e8, 03) before the first zero byte, so the result is 5, not 4. Remember that 1000 == 0x3e8.
This little demo can help you see how bytes are ordered on a little-endian machine (presuming everything is as described above):
char *z = (char *)v + 1;
for (int i = 0; i < 10; i++)
    printf("%02x ", (uint8_t)z[i]);  /* uint8_t comes from <stdint.h> */
The printout should look like this:
ff ff ff e8 03 00 00 02 00 00 (matching the bytes Harlan Wei showed above, minus the skipped first byte).
Run on a big-endian machine, the same snippet would show you how bytes are stored there, too. Either way, simply counting the non-zero bytes gives the 5 that strlen reported, not the 4 you expected.
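For completeness, here is a self-contained version of the experiment; the byte values in the comments assume 32-bit int on a little-endian machine:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    int v[10] = {-1, 1000, 2};

    /* Dump the first three ints byte by byte, exactly as stored:
       expect ff ff ff ff e8 03 00 00 02 00 00 00 on little-endian. */
    const uint8_t *b = (const uint8_t *)v;
    for (size_t i = 0; i < 3 * sizeof(int); i++)
        printf("%02x ", b[i]);
    printf("\n");

    /* strlen counts bytes until the first 0x00; starting one byte in,
       it sees ff ff ff e8 03 and then a zero, so it returns 5. */
    printf("strlen((char *)v + 1) = %zu\n", strlen((char *)v + 1));
    return 0;
}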
Let's say we have some 128-bit floating point number, for example x = 2.6 (1.3 × 2^1 in IEEE 754).
I put it in a union like this:
union flt {
    long double flt;
    int64_t byte8[OCTALC];
} d;
d.flt = x;  /* assign to the union member */
Then I run this to get its hexadecimal representation in memory:
void print_bytes(void *ptr, int size)
{
    unsigned char *p = ptr;
    int i;
    for (i = 0; i < size; i++) {
        printf("%02hhX ", p[i]);
    }
    printf("\n");
}
// somewhere in the code
print_bytes(&d.byte8[0], 16);
And I get something like:
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
So I expected to see one of the leading bits (the left ones) be 1 (because the exponent of 2.6 is 1), but in fact bits on the right are 1 (as if the value were treated as big-endian). If I flip the sign, the output changes to:
66 66 66 66 66 66 66 A6 00 C0 00 00 00 00 00 00
So it seems the sign bit is further right than I thought. And if you count the bytes, it seems only 10 are used; the remaining 6 look truncated or something.
I am trying to figure out why this happens. Any help?
You have a number of misconceptions.
First of all, you don't have a 128-bit floating point number. On x86-64, long double is most likely the x86 extended-precision format. This is an 80-bit (10-byte) value, padded out to 16 bytes. (I suspect this is for alignment purposes.)
And of course, it's going to be in little-endian byte order (since this is an x86/x86-64). This doesn't refer to the order of the bits in each byte, it refers to the order of the bytes in the whole.
And finally, the exponent is biased. An exponent of 1 isn't stored as 1. It's stored as 1+0x3FFF. This allows for negative exponents.
So we get the following:
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
Demo on Compiler Explorer
If we remove the padding and reverse the bytes to better match the image in the Wikipedia page, we get
4000A666666666666666
This translates to
+0x1.4CCCCCCCCCCCCCCC × 2^(0x4000-0x3FFF)
(0xA66...6 = 0b1010 0110 0110...0110 ⇒ 0b1.0100 1100 1100...110[0] = 0x1.4CC...C)
or
+1.29999999999999999995663191310057982263970188796520233154296875 × 2^1
Decimal conversion obtained using
perl -Mv5.10 -e'
use Math::BigFloat;
Math::BigFloat->div_scale( 1000 );
say
Math::BigFloat->from_hex( "4CCCCCCCCCCCCCCC" ) /
Math::BigFloat->from_hex( "10000000000000000" )
'
or
perl -Mv5.10 -e'
use Math::BigFloat;
Math::BigFloat->div_scale( 1000 );
say
Math::BigFloat->from_hex( "A666666666666666" ) /
Math::BigFloat->from_hex( "8000000000000000" )
'
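If you prefer to stay in C, the same fields can be picked straight out of the bytes. Here is a minimal sketch, assuming an x86/x86-64 little-endian machine where long double is the 80-bit extended format:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    long double x = 2.6L;
    unsigned char b[sizeof x];
    memcpy(b, &x, sizeof x);

    /* Little-endian layout: bytes 0..7 hold the 64-bit significand,
       bytes 8..9 hold the sign bit plus the 15-bit biased exponent,
       and anything beyond byte 9 is padding. */
    uint64_t significand;
    memcpy(&significand, b, 8);
    unsigned se = b[8] | ((unsigned)b[9] << 8);
    unsigned sign = se >> 15;
    int exponent = (int)(se & 0x7FFF) - 0x3FFF;  /* remove the bias */

    /* For 2.6 this prints: sign=0 exponent=1 significand=a666666666666666 */
    printf("sign=%u exponent=%d significand=%016llx\n",
           sign, exponent, (unsigned long long)significand);
    return 0;
}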
You've been bamboozled by some very strange aspects of the way extended-precision floating-point is typically implemented in C on Intel architectures. So don't feel too bad. :-)
What you're seeing is that although sizeof(long double) may be 16 (== 128 bits), deep down inside what you're really getting is the 80-bit Intel extended format. It's being padded out with 6 bytes, which in your case happen to be 0. So, yes, "the sign bit is righter than you thought".
I see the same thing on my machine, and it's something I've always wondered about. It seems like a real waste, doesn't it? I used to think it was for some kind of compatibility with machines which actually do have 128-bit long doubles. But that can't be it, because this 0-padded 16-byte format is not binary-compatible with true IEEE 128-bit floating point, among other things because the padding is on the wrong end.
When I use fwrite to stdout, why do I get garbage results?
What is the use of the size_t count argument in the fwrite and fread functions?
#include <stdio.h>

struct test
{
    char str[20];
    int num;
    float fnum;
};

int main()
{
    struct test test1 = {"name", 12, 12.334};
    fwrite(&test1, sizeof(test1), 1, stdout);
    return 0;
}
output:
name XEA
Process returned 0 (0x0) execution time : 0.180 s
Press any key to continue.
And when I use fwrite with a file:
#include <stdio.h>
#include <stdlib.h>

struct test
{
    char str[20];
    int num;
    float fnum;
};

int main()
{
    struct test test1 = {"name", 12, 12.334};
    FILE *fp;
    fp = fopen("test.txt", "w");
    if (fp == NULL)
    {
        printf("FILE Error");
        exit(1);
    }
    fwrite(&test1, sizeof(test1), 1, fp);
    return 0;
}
The file also contains something like this:
name XEA
Why is the output like this?
And when I pass 5 as the size_t count, I also get garbage. What is the purpose of this argument?
In the way that you're using it, fwrite writes "binary" output, which is a byte-for-byte copy of your struct test as it is laid out in memory. This representation is not generally human-readable.
When you write
struct test test1 = {"name", 12, 12.334};
you get a structure in memory which might be represented like this (with all byte values in hexadecimal):
test1: str: 6e 61 6d 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
num: 0c 00 00 00
fnum: 10 58 45 41
Specifically: 0x6e, 0x61, 0x6d, and 0x65 are the ASCII codes for the letters in "name". Your str array was size 20, so the string name is 0-terminated and then padded out with 15 more 0 characters. 0x0c is the hexadecimal representation of the number 12, and I'm assuming that type int is 32 bits or 4 bytes on your machine, so there are 3 more 0's there, also. (Also I'm assuming your machine is "little endian", with the least-significant byte of a 4-byte quantity like 0x0000000c coming first in memory.) Finally, in the IEEE-754 format which your machine uses, the number 12.334 is represented in 32-bit single-precision floating point as 0x41455810, which is again stored in the opposite order in memory.
So those 28 bytes (plus possibly some padding, which we won't worry about for now) are precisely the bytes that get written to the screen, or to your file. The string "name" will probably be human-readable, but all the rest will be, literally, "binary garbage". It just so happens that three of the bytes making up the float number 12.334, namely 0x58, 0x45, and 0x41, correspond in ASCII to the capital letters X, E, and A, so that's why you see those characters in the output.
Here's the result of passing the output of your program through a "hex dump" utility:
0 6e 61 6d 65 00 00 00 00 00 00 00 00 00 00 00 00 name............
16 00 00 00 00 0c 00 00 00 10 58 45 41 .........XEA
You can see the letters name at the beginning, and the letters XEA at the end, and all the 0's and other binary characters in between.
If you're on a Unix or Linux (or Mac OS X) system, you can use tools like od or hexdump to get a hex dump similar to the one I've shown.
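If you'd rather stay in C than use od or hexdump, a minimal sketch of the same dump (reusing the struct from the question) could be:

#include <stdio.h>

struct test
{
    char str[20];
    int num;
    float fnum;
};

int main(void)
{
    struct test test1 = {"name", 12, 12.334};
    const unsigned char *p = (const unsigned char *)&test1;

    /* Print every byte of the in-memory representation in hex. */
    for (size_t i = 0; i < sizeof test1; i++)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}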
You asked about the "count" argument to fwrite. fwrite is literally designed for writing out binary structures, just like you're doing. Its function signature is
size_t fwrite(const void *ptr, size_t sz, size_t n, FILE *fp);
As you know, ptr is a pointer to the data structure(s) you're writing, and fp is the file pointer you're writing it/them to. And then you're saying you want to write n (or 'count') items, each of size sz.
You called
fwrite(&test1, sizeof(test1), 1, fp);
The expression sizeof(test1) gives the size of test1 in bytes. (It will probably be 28 or so, as I mentioned above.) And of course you're writing one struct, so passing sizeof(test1) and 1 as sz and n is perfectly reasonable and correct.
It would also not be unreasonable or incorrect to call
fwrite(&test1, 1, sizeof(test1), fp);
Now you're telling fwrite to write 28 bytes, each of size 1. (A byte is of course always size 1.)
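One consequence of the choice worth knowing: fwrite returns the number of complete items written, not bytes, so the two calls report different numbers on success. A small sketch (the file name test.bin is just an example):

#include <stdio.h>

struct test
{
    char str[20];
    int num;
    float fnum;
};

int main(void)
{
    struct test test1 = {"name", 12, 12.334};
    FILE *fp = fopen("test.bin", "wb");
    if (fp == NULL)
        return 1;

    /* One item of sizeof(test1) bytes: returns 1 on success. */
    size_t n1 = fwrite(&test1, sizeof(test1), 1, fp);

    /* sizeof(test1) items of one byte each: returns 28 (or so). */
    size_t n2 = fwrite(&test1, 1, sizeof(test1), fp);

    printf("n1 = %zu, n2 = %zu\n", n1, n2);
    fclose(fp);
    return 0;
}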
One other thing about fwrite (as noted by @AnttiHaapala in a comment): when you're writing binary data to a file, you should specify the b flag when you open the file:
fp = fopen("test.txt", "wb");
Finally, if this isn't what you want, that is, if you want human-readable text output instead, then fwrite is not the right tool here. You could use something like
printf("str: %s\n", test1.str);
printf("num: %d\n", test1.num);
printf("fnum: %f\n", test1.fnum);
Or, to your file:
fprintf(fp, "str: %s\n", test1.str);
fprintf(fp, "num: %d\n", test1.num);
fprintf(fp, "fnum: %f\n", test1.fnum);
I'm working on an assignment in which I need to dissect a binary file and retrieve the source address from the header data. I was able to write the hex data out of the file as we were instructed, but I can't make heads or tails of what I'm looking at. Here's the code I used to print it:
FILE *ptr_myfile;
char buf[8];
ptr_myfile = fopen("packets.1", "rb");
if (!ptr_myfile)
{
    printf("Unable to open file!");
    return 1;
}
size_t rb;
do {
    rb = fread(buf, 1, 8, ptr_myfile);
    if (rb) {
        size_t i;
        for (i = 0; i < rb; ++i) {
            printf("%02x", (unsigned int)buf[i]);
        }
        printf("\n");
    }
} while (rb);
And here's a small portion of the output:
120000003c000000
4500003c195d0000
ffffff80011b60ffffff8115250b
4a7d156708004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c00000000
ffffffff01ffffffb5ffffffbc4a7d1567
ffffff8115250b00005556
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195d0000
ffffff8001775545ffffffcfffffffbe29
ffffff8115250108004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195f0000
......
We are using this diagram of the IP header layout to aid in the assignment:
I'm really having difficulty translating the information from the binary file into something useful that I can work with, and searching the web hasn't yielded much. I just need some help pointing me in the right direction.
OK, it looks like you are actually reversing parts of an IP packet, based on the diagram. The diagram is laid out in 32-bit words; each bit is shown as one of the small ticks along the ruler-like scale at the top, and bytes are shown as the big ticks.
So, if you were to read the first byte of the file, the high-order nibble (the top four bits) contains the version, and the low-order nibble contains the number of 32-bit words in the header (assuming we can interpret this as an IP header).
So, from your diagram, you can see that the source address is in the fourth word. To read it, you can advance the file pointer to that point and read in four bytes. So in pseudo-code you should be able to do this:
fp = fopen("the file name")
fseek(fp, 12) // advance the file pointer 12 bytes
fread(buf, 1, 4, fp) // read in four bytes from the file.
Now you should have the source address in buf.
OK, to make this a bit more concrete, here is a packet I captured off my home network:
0000 00 15 ff 2e 93 78 bc 5f f4 fc e0 b6 08 00 45 00 .....x._......E.
0010 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 5e 1f .(..@.........^.
0020 1d 9a fd d3 00 50 bd 72 7e e9 cf 19 6a 19 50 10 .....P.r~...j.P.
0030 41 10 3d 81 00 00 A.=...
The first 14 bytes are the Ethernet II header: the first six bytes (00 15 ff 2e 93 78) are the destination MAC address, the next six bytes (bc 5f f4 fc e0 b6) are the source MAC address, and the next two bytes (08 00) denote that the next header is of type IP.
The next twenty bytes are the IP header (which your figure shows); these bytes are:
0000 45 00 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 E..(..@.........
0010 5e 1f 1d 9a ^...
So, to interpret this, let's look at it in 4-byte words.
The first 4-byte word (45 00 00 28), according to your figure is:
first byte : version & length, we have 0x45 meaning IPv4, and 5 4-byte words in length
second byte : Type of Service 0x00
3rd & 4th bytes: total length 0x00 0x28 or 40 bytes.
The second 4-byte word (18 c7 40 00), according to your figure is:
1st & 2nd bytes: identification 0x18 0xc7
3rd & 4th bytes: flags (3-bits) & fragmentation offset (13-bits)
flags - 0x40 is 0100 0000 in binary, and taking the top three bits, 010, gives us 0x02 for the flags.
offset - 0x00
The third 4-byte word (80 06 00 00), according to your figure is:
first byte : TTL, 0x80 or 128 hops
second byte : protocol 0x06 or TCP
3rd & 4th bytes: 0x00 0x00
The fourth 4-byte word (c0 a8 01 05), according to your figure is:
1st to 4th bytes: source address, in this case 192.168.1.5
notice that each byte corresponds to one of the octets in the IP address.
The fifth 4-byte word (5e 1f 1d 9a), according to your figure is:
1st to 4th bytes: destination address, in this case 94.31.29.154
Doing this type of programming is a bit confusing at first. I recommend doing the parsing by hand a few times (like I did above) to get the hang of it.
One final thing: in the line of code printf("%02x", (unsigned int)buf[i]);, I'd recommend changing it to printf("%02x ", (unsigned char)buf[i]);. Remember that each element in your buf array represents a single byte read from the file; since char may be signed, the cast to unsigned int sign-extends bytes like 0x80 into the ffffff80 runs you see in your output.
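Putting the pseudo-code into real C, a minimal sketch might look like the following. It assumes the source address sits 12 bytes from the start of the file, i.e. that the file begins with a bare IP header; your dump suggests each packet may be preceded by some extra record data, so you may need to adjust the offset:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("packets.1", "rb");
    if (!fp) {
        printf("Unable to open file!");
        return 1;
    }

    unsigned char addr[4];
    /* The source address starts at byte offset 12 of the IP header. */
    if (fseek(fp, 12, SEEK_SET) != 0 || fread(addr, 1, 4, fp) != 4) {
        fclose(fp);
        return 1;
    }

    /* Each byte is one octet of the dotted-quad address. */
    printf("source address: %u.%u.%u.%u\n",
           addr[0], addr[1], addr[2], addr[3]);
    fclose(fp);
    return 0;
}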
Hope this helps,
T.
I thought the shift operator shifted the memory representation of the integer or char it is applied to, but the output of the following code came as a surprise to me.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    uint64_t number = 33550336;
    unsigned char *p = (unsigned char *)&number;
    size_t i;
    for (i = 0; i < sizeof number; ++i)
        printf("%02x ", p[i]);
    printf("\n");

    // shift operation
    number = number << 4;
    p = (unsigned char *)&number;
    for (i = 0; i < sizeof number; ++i)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}
The system on which it ran is little endian and produced the following output:
00 f0 ff 01 00 00 00 00
00 00 ff 1f 00 00 00 00
Can somebody provide a reference on how the shift operators actually work?
I think you've answered your own question. The machine is little-endian, which means the bytes are stored in memory with the least significant byte at the lowest address. So your memory holds:
00 f0 ff 01 00 00 00 00 => 0x0000000001fff000
00 00 ff 1f 00 00 00 00 => 0x000000001fff0000
As you can see, the second is the same as the first value, shifted left by 4 bits.
Everything is right:
(1 * (256^3)) + (0xff * (256^2)) + (0xf0 * 256) = 33 550 336
(0x1f * (256^3)) + (0xff * (256^2)) = 536 805 376
33 550 336 * (2^4) = 536 805 376
Shifting left by 4 bits is the same as multiplying by 2^4.
I think your printf output confuses you. Here are the values:
33550336 = 0x01FFF000
33550336 << 4 = 0x1FFF0000
Can you read your output now?
It doesn't shift the memory, it shifts the bits. So you have the number:
00 00 00 00 01 FF F0 00
After shifting this number 4 bits (one hexadecimal digit) to the left you have:
00 00 00 00 1F FF 00 00
which is exactly the output you get once it is laid out in little-endian byte order.
Your loop prints the bytes in the order they are stored in memory, so the output would look different on a big-endian machine. If you want to print the value itself in hex, just use %016llx (or, for strict portability with uint64_t, "%016" PRIx64 from <inttypes.h>). Then you'll see what you expect:
0000000001fff000
000000001fff0000
The second value is left-shifted by 4.
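As a sketch of that suggestion, printing the values rather than the memory bytes (using PRIx64 from <inttypes.h> so the format is portable for uint64_t):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint64_t number = 33550336;

    printf("%016" PRIx64 "\n", number);       /* 0000000001fff000 */
    printf("%016" PRIx64 "\n", number << 4);  /* 000000001fff0000 */
    return 0;
}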
I want these two printf calls to print the same thing:
unsigned int Arraye[] = {0xffff,0xefef,65,66,67,68,69,0};
char Arrage[] = {0xffff,0xefef,65,66,67,68,69,0};
printf("%s", (char*)(2+ Arraye));
printf("%s", (char*)(2+ Arrage));
where Arraye is an array of unsigned int. Normally I would just change the type, but the problem is that most of the array holds numbers; only this particular section should be printed as ASCII. Currently the unsigned array prints as "A" and the char array prints the desired "ABCDE".
This is how the unsigned int version will be arranged in memory, assuming 32-bit big-endian integers:
00 00 ff ff 00 00 ef ef 00 00 00 41 00 00 00 42
00 00 00 43 00 00 00 44 00 00 00 45 00 00 00 00
This is how the char version will be arranged in memory, assuming 8-bit characters. Note that 0xffff does not fit in a char:
ff ef 41 42 43 44 45 00
So you can see, casting is not enough. You'll need to actually convert the data.
If you know that your system uses 32-bit wchar_t, you can use the l length modifier for printf.
printf("%ls", 2 + Arraye);
This is NOT portable. The alternative is to copy the unsigned int array into a char array by hand, something like this:
#include <stdio.h>
#include <stdlib.h>

void print_istr(unsigned int const *s)
{
    unsigned int const *p;
    char *s2, *p2;

    for (p = s; *p; p++)                   /* find the terminating 0 */
        ;
    s2 = malloc(p - s + 1);                /* one char per element, plus the 0 */
    if (s2 == NULL)
        return;
    for (p = s, p2 = s2; (*p2 = *p); p2++, p++)
        ;                                  /* narrow each element to a char */
    fputs(s2, stdout);
    free(s2);
}
As Dietrich said, a simple cast will not do, but you don't need a complicated conversion either. Simply loop over your array.
unsigned int Arraye[] = {0xffff, 0xefef, 65, 66, 67, 68, 69, 0};
char Arrage[] = {0xffff, 0xefef, 65, 66, 67, 68, 69, 0};
unsigned int *p;

for (p = Arraye + 2; *p; p++)   /* stop at the 0 terminator */
    printf("%c", *p);           /* print each element as a character */
printf("%s", (char *)(2 + Arrage));