Converting AMF Number from char to double and back - c

In the context of a protocol I receive messages in AMF format.
The AMF Object Type "Number" is defined as
number-type = number-marker DOUBLE
The data following a Number type marker is always an 8 byte IEEE-754 double [...] in network byte order.
The following Examples are captured using Wireshark:
Hex: 40 00 00 00 00 00 00 00
Number: 2
Hex: 40 08 00 00 00 00 00 00
Number: 3
Hex: 3f f0 00 00 00 00 00 00
Number: 1
I tried to treat these as double, long long and int64_t, but none of these types seems to use the correct byte order/format.
The implementation needs to be in C, so I can't use any libraries (there are none, as it seems).
What would be the correct approach?

Likely your platform supports 8-byte IEEE-754 doubles but requires them to be in little-endian format. Your examples are in big-endian format. If you store them in an aligned array of unsigned characters from last to first and cast the pointer to a double *, you should get the right value.
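A minimal sketch of that idea, in both directions. Instead of casting the pointer, it assembles the eight bytes into a 64-bit integer and copies the bit pattern with memcpy, which sidesteps alignment and aliasing concerns. The function names are made up for illustration, and the code assumes the host uses 8-byte IEEE-754 doubles with the same byte order for doubles and integers:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode 8 bytes in network (big-endian) order into a host double.
 * Assumption: double is an 8-byte IEEE-754 type on this platform. */
static double amf_decode_number(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;

    assert(sizeof value == sizeof bits);
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];        /* most significant byte first */
    memcpy(&value, &bits, sizeof value);   /* reinterpret the bit pattern */
    return value;
}

/* Encode a host double as 8 bytes in network (big-endian) order. */
static void amf_encode_number(double value, unsigned char out[8])
{
    uint64_t bits;

    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(bits & 0xff);  /* least significant byte last */
        bits >>= 8;
    }
}
```

Feeding the captured bytes 40 00 00 00 00 00 00 00 to amf_decode_number should give 2.0, matching the Wireshark capture.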


Is there a function in a C lib to print data packets similar to Wireshark format (position then byte by byte)

I looked up their code and they use trees which was too complex for my task. I could also write my own version from scratch but I don't wanna be reinventing the wheel, so I was wondering if there is some code written that I can utilize. Any suggestions of a lib that I can use?
*The data I have is in a buffer of unsigned ints.
0000 01 02 ff 45 a3 00 90 00 00 00 00 00 00
0010 00 00 00 00 00 00 00 00 00 00 00 00 00
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 ... etc
Thanks!
I doubt such a specific function exists in the C standard library, but the logic is rather simple:
for (unsigned k = 0; k < len; k++)
{
    if (k % 0x10 == 0)
        printf("\n%04x", k);
    if (k % 0x4 == 0)
        printf(" ");
    printf(" %02x", buffer[k] & 0xff);
}
Replace the first modulus with the line length and the second with the word length and you're good (of course, try to make one a multiple of the other).
EDIT:
As I just noticed, you mentioned that the data is in a buffer of unsigned ints, so you will have to cast it to an unsigned char buffer for this part.
Of course, you can also work on the unsigned int buffer directly with bitwise shifts and four prints per loop iteration, but that makes for cumbersome code where it isn't necessary.
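For reference, a self-contained sketch of the same loop that takes any buffer through a const void * (so a buffer of unsigned ints can be passed without an explicit cast) and formats one row at a time. The names format_hex_row and hexdump are illustrative, and the 16-bytes-per-row width is an assumption, not something Wireshark mandates:

```c
#include <assert.h>
#include <stdio.h>

/* Format one row: a hex offset (at least 4 digits) followed by up to 16 bytes.
 * Returns the number of characters written, excluding the terminator. */
static int format_hex_row(char *dst, size_t dstsz, size_t offset,
                          const unsigned char *data, size_t n)
{
    assert(n <= 16 && dstsz >= 80);   /* 80 bytes is ample for one row */
    int used = snprintf(dst, dstsz, "%04zx", offset);
    for (size_t i = 0; i < n; i++)
        used += snprintf(dst + used, dstsz - (size_t)used, " %02x",
                         (unsigned)data[i]);
    return used;
}

/* Print len bytes starting at buf, 16 bytes per row. */
static void hexdump(const void *buf, size_t len)
{
    const unsigned char *p = buf;     /* view the data byte by byte */
    char row[80];

    for (size_t off = 0; off < len; off += 16) {
        size_t n = len - off < 16 ? len - off : 16;
        format_hex_row(row, sizeof row, off, p + off, n);
        puts(row);
    }
}
```

With an array declared as unsigned int words[32], a call such as hexdump(words, sizeof words) would dump it in the host's byte order.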

Incorrect hex representations of characters with char but correct with unsigned char

I was writing a function that prints the "hexdump" of a given file. The function is as stated below:
bool printhexdump (FILE *fp) {
    long unsigned int filesize = 0;
    char c;
    if (fp == NULL) {
        return false;
    }
    while (! feof (fp)) {
        c = fgetc (fp);
        if (filesize % 16 == 0) {
            if (filesize >= 16) {
                printf ("\n");
            }
            printf ("%08lx ", filesize);
        }
        printf ("%02hx ", c);
        filesize++;
    }
    printf ("\n");
    return true;
}
However, on certain files, certain invalid integer representations seem to get printed, for example:
00000000 4d 5a ff90 00 03 00 00 00 04 00 00 00 ffff ffff 00 00
00000010 ffb8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 ff80 00 00 00
00000040 ffff
Except for the last ffff, which is caused by the EOF value returned by fgetc, the ff90, ffff, ffb8 etc. are wrong. However, if I change char to unsigned char, I get the correct representation:
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00
00000040 ff
Why would the above behaviour happen?
Edit: the treatment of c by printf() should be the same since the format specifiers don't change. So I'm not sure how char would get sign extended while unsigned char won't?
Q: the treatment of c by printf() should be the same since the format specifiers don't change.
A: OP is correct, the treatment of c by printf() did not change. What changed was what was passed to printf(). As char or unsigned char, c goes through the usual integer promotions typically to int. char, if signed, gets a sign extension. A char value like 0xFF is -1. An unsigned char value like 0xFF remains 255.
Q: So I'm not sure how char would get sign extended while unsigned char won't?
A: They both got a sign extension. char may be negative, so its sign extension may be 0 or 1 bits. unsigned char is always positive, so its sign extension is 0 bits.
Solution
char c;
printf ("%02x ", (unsigned char) c);
// or
printf ("%02hhx ", c);
// or
unsigned char c;
printf ("%02x ", c);
// or
printf ("%02hhx ", c);
char can be a signed type, and in that case values 0x80 to 0xff get sign-extended before being passed to printf.
(char)0x80 is sign-extended to -128, which in unsigned short is 0xff80.
[edit] To be clearer about promotion: the value stored in a char is eight bits, and in that eight-bit representation a value like 0x90 will represent either -112 or 144, depending on whether the char is signed or unsigned. This is because the most significant bit is taken as the sign bit for signed types, and as a magnitude bit for unsigned types. If that bit is set, it either makes the value negative (by subtracting 128) or it makes it larger (by adding 128), depending on whether or not it's a signed type.
The promotion from char to int will always happen, but if char is signed then converting it to int requires that the sign bit be extended up to the sign bit of the int so that the int represents the same value as the char did.
Then printf gets ahold of it, but that doesn't know whether the original type was signed or unsigned, and it doesn't know that it used to be a char. What it does know is that the format specifier is for an unsigned hexadecimal short, so it prints that number as if it were unsigned short. The bit pattern for -112 in a 16-bit int is 1111111110010000, formatted as hex, that's ff90.
If your char is unsigned then 0x90 does not represent a negative value, and when you convert it to an int nothing needs to be changed in the int to make it represent the same value. The rest of the bit pattern is all zeroes and printf doesn't need those to display the number correctly.
Because in unsigned char the most significant bit has a different meaning than that of signed char.
For example, 0x90 in binary is 10010000, which is 144 decimal when unsigned, but -112 decimal when signed (two's complement).
Whether or not char is signed is implementation-defined. This means that the sign bit may or may not be extended depending on your platform, and thus you can get different results.
However, using unsigned char ensures that there is no sign extension (because there is no sign bit anymore).
The problem is simply caused by the format. %02hx takes an int. When you take a character below 128, all is fine: it is positive and will not change when converted to an int.
Now, let's take a char of 128 or above, say 0x90. As an unsigned char, its value is 144; it will be converted to an int value of 144 and be printed as 90. But as a signed char, its value is -112 (still 0x90); it will be converted to an int of value -112 (0xff90 for a 16-bit int) and be printed as ff90.
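Putting the answers above together, here is a hedged sketch of a dump loop that avoids both problems in the question: the byte is converted through unsigned char before printing, and the return value of fgetc is kept in an int so that EOF is never printed as data. The names hex_of_byte and dump_stream are invented for this example:

```c
#include <assert.h>
#include <stdio.h>

/* Write the two hex digits of one byte into out.
 * The cast to unsigned char is what prevents sign extension. */
static void hex_of_byte(char out[3], char c)
{
    assert(out != NULL);
    snprintf(out, 3, "%02x", (unsigned)(unsigned char)c);
}

/* Dump a stream, 16 bytes per row; returns the number of bytes read. */
static unsigned long dump_stream(FILE *fp)
{
    unsigned long count = 0;
    char hex[3];
    int c;                          /* int, so EOF differs from byte 0xff */

    while ((c = fgetc(fp)) != EOF) {
        if (count % 16 == 0) {
            if (count > 0)
                putchar('\n');
            printf("%08lx ", count);
        }
        hex_of_byte(hex, (char)c);
        printf("%s ", hex);
        count++;
    }
    putchar('\n');
    return count;
}
```

Testing the result of fgetc against EOF, rather than calling feof before reading, is also what removes the stray trailing ffff from the original output.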

Get specific byte from M68k ram address with C language

Through the IDA disassembler I've reached this address:
0010FD74 00 00 00 00 00 00 03 00 00 00 00 00 82 03 80 02
Now I need, given the address to get particular bytes; for example the 7th position where there is "03".
I've tried using C language to do this:
char *dummycharacter;
*dummycharacter = *(char*)0x10FD74;
Now if I try to access 7th value with this:
dummycharacter[6]
I don't get 0x03…where am I going wrong?
You're assigning to the location that dummycharacter points to (which is pretty much nowhere, since the pointer is not initialized). Assign to the pointer itself instead: dummycharacter = (char*)0x10FD74;.
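As a small sketch of the corrected access: the helper below treats an integer address as the start of a byte array and reads one element. The name peek_byte is hypothetical; unsigned char is used so that values such as 0x82 do not come back negative, and volatile is an assumption that the location may be hardware-mapped and should really be read each time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the byte at base + index. The caller is responsible for passing an
 * address that is actually readable on the target. */
static unsigned char peek_byte(uintptr_t base, size_t index)
{
    const volatile unsigned char *p = (const volatile unsigned char *)base;

    assert(base != 0);
    return p[index];
}
```

On the target from the question, peek_byte(0x10FD74, 6) would then read the seventh byte of the line shown in the disassembler.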

Is accessing a global array outside its bound undefined behavior?

I just had an exam in my class today --- reading C code and input, and the required answer was what will appear on the screen if the program actually runs. One of the questions declared a[4][4] as a global variable and at a point of that program, it tries to access a[27][27], so I answered something like "Accessing an array outside its bounds is an undefined behavior" but the teacher said that a[27][27] will have a value of 0.
Afterwards, I tried some code to check whether "all uninitialized global variables are set to 0" is true or not. Well, it seems to be true.
So now my question:
Seems like some extra memory had been cleared and reserved for the code to run. How much memory is reserved? Why does a compiler reserve more memory than it should, and what is it for?
Will a[27][27] be 0 in all environments?
Edit :
In that code, a[4][4] is the only global variable declared and there are some more local ones in main().
I tried that code again in DevC++. All of them are 0. But that is not true in VSE, in which most values are 0 but some have a random value, as Vyktor has pointed out.
You were right: it is undefined behavior and you cannot count on it always producing 0.
As for why you are seeing zero in this case: modern operating systems allocate memory to processes in relatively coarse-grained chunks called pages that are much larger than individual variables (at least 4KB on x86). When you have a single global variable, it will be located somewhere on a page. Assuming a is of type int[4][4] and ints are four bytes on your system, a[27][27] will be located 540 bytes ((27 * 4 + 27) * 4) from the beginning of a. So as long as a is near the beginning of the page, accessing a[27][27] will be backed by actual memory and reading it won't cause a page fault / access violation.
Of course, you cannot count on this. If, for example, a is preceded by nearly 4KB of other global variables then a[27][27] will not be backed by memory and your process will crash when you try to read it.
Even if the process does not crash, you cannot count on getting the value 0. If you have a very simple program on a modern multi-user operating system that does nothing but allocate this variable and print that value, you probably will see 0. Operating systems set memory contents to some benign value (usually all zeros) when handing over memory to a process so that sensitive data from one process or user cannot leak to another.
However, there is no general guarantee that arbitrary memory you read will be zero. You could run your program on a platform where memory isn't initialized on allocation, and you would see whatever value happened to be there from its last use.
Also, if a is followed by enough other global variables that are initialized to non-zero values then accessing a[27][27] would show you whatever value happens to be there.
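The distance quoted above comes from ordinary row-major index arithmetic. The helper below (a hypothetical name, for illustration only) just computes the byte offset that the expression a[row][col] would correspond to; it never dereferences anything, so it is well defined even for out-of-range indices:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [row][col] from the start of a row-major
 * two-dimensional array with `cols` columns of `elem_size`-byte elements. */
static size_t element_offset(size_t row, size_t col,
                             size_t cols, size_t elem_size)
{
    assert(cols > 0 && elem_size > 0);
    return (row * cols + col) * elem_size;
}
```

For int a[4][4] with 4-byte ints, element_offset(27, 27, 4, 4) is 540, far past the 64 bytes the array actually occupies but still well inside a 4KB page if a starts near the beginning of one.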
Accessing an array out of bounds is undefined behavior, which means the results are unpredictable so this result of a[27][27] being 0 is not reliable at all.
clang tells you this very clearly if we use -fsanitize=undefined:
runtime error: index 27 out of bounds for type 'int [4][4]'
Once you have undefined behavior the compiler can really do anything at all; we have even seen examples where gcc has turned a finite loop into an infinite loop based on optimizations around undefined behavior. Both clang and gcc can, in some circumstances, generate an undefined-instruction opcode if they detect undefined behavior.
As for why it is undefined behavior, "Why is out-of-bounds pointer arithmetic undefined behaviour?" provides a good summary of reasons. For example, the resulting pointer may not be a valid address, the pointer could now point outside the assigned memory pages, you could be working with memory-mapped hardware instead of RAM, etc.
Most likely the segment where static variables are stored is much larger than the array you are allocating, or the segment that you are stomping through just happens to be zeroed out, so you are just lucky in this case; but again, this is completely unreliable behavior. Most likely your page size is 4k and the access of a[27][27] is within that bound, which is probably why you are not seeing a segmentation fault.
What the standard says
The draft C99 standard tells us this is undefined behavior in section 6.5.6 Additive operators, which covers pointer arithmetic, which is what an array access comes down to. It says:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined. If the result points one past the last element of the
array object, it shall not be used as the operand of a unary *
operator that is evaluated.
and the standard's definition of undefined behavior tells us that the standard imposes no requirements on the behavior and notes that possible behavior is unpredictable:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, [...]
Here is the quote from the standard, that specifies what is undefined behavior.
J.2 Undefined behavior
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).
Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that points just beyond the array object and is used as
the operand of a unary * operator that is evaluated (6.5.6).
In your case the array subscript is completely outside of the array. Depending on the value being zero is completely unreliable.
Furthermore, the behavior of the entire program is in question.
I just ran your code in Visual Studio 2012 and got a result like this (different on each run):
Address of a: 00FB8130
Address of a[4][4]: 00FB8180
Address of a[27][27]: 00FB834C
Value of a[27][27]: 0
Address of a[1000][1000]: 00FBCF50
Value of a[1000][1000]: <<< Unhandled exception at 0x00FB3D8F in GlobalArray.exe:
0xC0000005: Access violation reading location 0x00FBCF50.
When you look at the Modules window you see that your application module's memory range is 00FA0000-00FBC000. And unless you have CRT checks turned on, nothing will control what you do inside your memory (as long as you don't violate memory protection).
So you got 0 at a[27][27] purely by chance. When you open memory view from position 00FB8130 (a) you will probably see something like this:
0x00FB8130 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8180 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
0x00FB8190 c0 90 45 00 b0 e9 45 00 00 00 00 00 00 00 00 00 À.E.°éE.........
0x00FB81A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB81B0 00 00 00 00 80 5c af 0f 00 00 00 00 00 00 00 00 ....€\¯.........
0x00FB81C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
..........
0x00FB8330 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8340 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ <<<<
0x00FB8350 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
.......... ^^ ^^ ^^ ^^
It's possible that with your compiler you will always get 0 for that code because of how it uses memory, but just a few bytes away you can find another variable.
For example, with the memory shown above, a[6][0] points to address 0x00FB8190, which contains the integer value 4559040.
Then get your teacher to explain this one.
I don't know if this will work on your system, but overwriting the memory AFTER the array a with non-zero bytes gives a different result for a[27][27].
On my system, when I printed the contents of a[27][27] it was 0xFFFFFFFF, i.e. -1 converted to unsigned, which is all bits set in two's complement.
#include <stdio.h>
#include <string.h>
#define printer(expr) { printf(#expr" = %u\n", expr); }
unsigned int d[8096];
int a[4][4]; /* assuming an int is 4 bytes, next 4 x 4 x 4 bytes will be initialised to zero */
unsigned int b[8096];
unsigned int c[8096];
int main() {
    /* make sure next bytes do not contain zero'd bytes */
    memset(b, -1, 8096*4);
    memset(c, -1, 8096*4);
    memset(d, -1, 8096*4);
    /* lets check normal access */
    printer(a[0][0]);
    printer(a[3][3]);
    /* Now we disrespect the machine - undefined behaviour shall result */
    printer(a[27][27]);
    return 0;
}
This is my output:
a[0][0] = 0
a[3][3] = 0
a[27][27] = 4294967295
I saw a comment asking about viewing memory in Visual Studio. The easiest way is to add a breakpoint somewhere in your code (to halt execution), then go into the Debug... Windows... Memory menu and select e.g. Memory 1. You then find the memory address of your array a. In my case the address was 0x0130EFC0, so you enter 0x0130EFC0 in the address field and press Enter. This shows the memory at that location.
E.g. in my case:
0x0130EFC0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ..................................
0x0130EFE2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff ..............................ÿÿÿÿ
0x0130F004 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x0130F026 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x0130F048 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
The zeros are of course the array a, which has a byte size of 4 x 4 x sizeof an int (4 in my case) = 64 bytes. The bytes from address 0x0130F000 onwards are 0xFF each (from the contents of b, c, or d).
Note that:
0x0130EFC0 + 64 = 0x0130EFC0 + 0x40 = 0x0130F000
which is the start of all those ff bytes you see. Probably array b.
For common compilers, accessing an array beyond its bounds can give predictable results only in very special cases, and you should not rely on that. Example :
int a[4][4];
int b[4][4];
Provided there are no alignment problems, b is placed immediately after a, and you ask for neither aggressive optimisation nor sanitization checks, a[6][1] should in reality be b[2][1]. But please never do that in production code!
On a particular system, your teacher may be correct -- that may be how your particular compiler and operating system would behave.
On a generic system (i.e. without "insider" knowledge) then your answer is correct: this is UB.
First of all, the C language has no bounds checking. In effect it has almost no checks at all on anything. This is the joy and the doom of C.
Now, going back to the issue: overflowing memory doesn't necessarily mean that you trigger a segfault.
Let's have a closer look at how it works.
When you start a program, or enter a subroutine, the processor saves on the stack the address to return to when the function ends.
The stack is set up by the OS during process memory allocation and is given a range of legal memory that you can read or write as you like, not only to store return addresses.
The common practice used by compilers to create local (automatic) variables is to reserve some space on the stack and use that space for the variables. Look at the following well-known 32-bit assembler sequence, named the prologue, that you'll find at the entry of a typical function:
push ebp ;save register on the stack
mov ebp,esp ;get actual stack address
sub esp,4 ;move the stack pointer down by 4 bytes, which will be used to store a 4-char array
Considering that the stack grows in the reverse direction of data, the layout of memory is:
0x0.....1C [Parameters (if any)] ;former function
0x0.....18 [Return Address]
0x0.....14 EBP
0x0.....10 0x0......x ;Local DWORD parameter
0x0.....0C [Parameters (if any)] ;our function
0x0.....08 [Return Address]
0x0.....04 EBP
0x0.....00 0, 'c', 'b', 'a' ;our string of 3 chars plus final nul
This is known as a stack frame.
Now consider the string of four bytes starting at 0x0....0 and ending at 0x....3. If we write more than 3 chars into the array, we will sequentially overwrite: the saved copy of EBP, the return address, the parameters, the local variables of the previous function, then its EBP, return address, etc.
The most spectacular effect is that, on function return, the CPU tries to jump back to a wrong address, generating a segfault. The same behaviour can occur if one of the local variables is a pointer; in this case we will try to read from, or write to, a wrong location, again triggering the segfault.
When a segfault might not happen:
when the overflowed variable is not on the stack, or when you have so many local variables that you overwrite them without touching the return address (and they are not pointers).
Another case is that the compiler reserves a guard space between the local variables and the return address; in this case the buffer overflow doesn't reach the address.
Another possibility is accessing array elements at random indices; in this case an out-of-bounds access can go beyond the stack space and land on other data, but with luck we don't touch the elements that are mapped where the return address is saved (anything can happen...).
When can we get a segfault by overflowing variables that are not on the stack?
When overflowing array bounds or pointers.
I hope this is useful info...

How can I access members of a struct when it's not aligned properly?

I'm afraid that I'm not very good at low-level C stuff; I'm more used to using objects in Obj-C, so please excuse me if this is an obvious question, or if I've completely misunderstood something...
I am attempting to write an application in Cocoa/Obj-C which communicates with an external bit of hardware (a cash till.) I have the format of the data the device sends and receives - and have successfully got some chunks of data from the device.
For example: the till exchanges PLU (price data) in chunks of data in the following format: (from the documentation)
Name             Bytes  Type
PLU code h       4      long
PLU code L       4      long
desc             19     char
Group            3      char
Status           5      char
PLU link code h  4      long
PLU link code l  4      long
M&M Link         1      char
Min. Stock.      2      int
Price 1          4      long
Price 2          4      long
Total            54 Bytes
So I have a struct in the following form in which to hold the data from the till:
typedef struct MFPLUStructure {
    UInt32 pluCodeH;
    UInt32 pluCodeL;
    unsigned char description[19];
    unsigned char group[3];
    unsigned char status[5];
    UInt32 linkCodeH;
    UInt32 linkCodeL;
    unsigned char mixMatchLink;
    UInt16 minStock;
    UInt32 price[2];
} MFPLUStructure;
I have some known sample data from the till (below) which I have checked by hand and is valid
00 00 00 00 4E 61 BC 00 54 65 73 74 20 50 4C 55 00 00 00 00 00 00 00 00 00 00 00 09 08 07 17 13 7C 14 04 00 00 00 00 09 03 00 00 05 BC 01 7B 00 00 00 00 00 00 00
i.e.
bytes 46 to 49 are <7B 00 00 00> == 123 as I would expect as the price is set to '123' on the till.
byte 43 is <05> == 5 as I would expect as the 'mix and match link' is set to 5 on the till.
bytes 39 to 42 are <09 03 00 00> == 777 as I would expect as the 'link code' is set to '777' on the till.
Bytes 27,28,29 are <09 08 07> which are the three groups (7,8 & 9) that I would expect.
The problem comes when I try to get some of the data out of the structure programmatically: The early members work correctly right up to, and including the five 'status' bytes. However, members after that don't come out properly. (see debugger screenshot below.)
Image 1 - http://i.stack.imgur.com/nOdER.png
I assume that the reason for this is because the five status bytes push later members out of alignment - i.e. they are over machine word boundaries. Is this right?
Image 2 - i.imgur.com/ZhbXU.png
Am I right in making that assumption?
And if so, how can I get the members in and out correctly?
Thanks for any help.
Either access the data a byte at a time and assemble it into larger types, or memcpy it into an aligned variable. The latter is better if the data is known to be in a format specific to the host's endianness, etc. The former is better if the data follows an external specification that might not match the host.
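A minimal sketch of the first option, reading each field from its documented offset one byte at a time. It assumes the till sends multi-byte integers in little-endian order, as the sample data in the question suggests (7B 00 00 00 == 123); the type PLURecord, the helper names, and the use of the fixed-width stdint.h types in place of UInt32/UInt16 are choices made for this example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct PLURecord {
    uint32_t pluCodeH, pluCodeL;
    unsigned char description[19];
    unsigned char group[3];
    unsigned char status[5];
    uint32_t linkCodeH, linkCodeL;
    unsigned char mixMatchLink;
    uint16_t minStock;
    uint32_t price[2];
} PLURecord;

/* Assemble little-endian integers byte by byte; no alignment is required. */
static uint16_t rd_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offsets follow the 54-byte layout from the till documentation. */
static void parse_plu(const unsigned char raw[54], PLURecord *out)
{
    assert(raw != NULL && out != NULL);
    out->pluCodeH = rd_le32(raw + 0);
    out->pluCodeL = rd_le32(raw + 4);
    memcpy(out->description, raw + 8, 19);
    memcpy(out->group, raw + 27, 3);
    memcpy(out->status, raw + 30, 5);
    out->linkCodeH = rd_le32(raw + 35);
    out->linkCodeL = rd_le32(raw + 39);
    out->mixMatchLink = raw[43];
    out->minStock = rd_le16(raw + 44);
    out->price[0] = rd_le32(raw + 46);
    out->price[1] = rd_le32(raw + 50);
}
```

Because the in-memory struct is filled field by field, the compiler is free to pad it however it likes; only the wire offsets are fixed.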
If you're sure that endianness of host and wire agree, you could also use a packed structure to read the data in a single pass. However, this is compiler-specific and will most likely impact performance of member access.
Assuming gcc, you'd use the following declarations:
struct __attribute__ ((__packed__)) MFPLUStructure { ... };
typedef struct MFPLUStructure MFPLUStructure;
If you decide to use a packed structure, you should also verify that it is of correct size:
assert(sizeof (MFPLUStructure) == 54);
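If a C11 compiler is available, the same size check can be done at compile time. The sketch below is GCC/Clang-specific because of the packed attribute, uses stdint.h types as stand-ins for UInt32/UInt16, and names the struct PackedPLU purely for illustration; the offset checks are an extra sanity test against the documented layout:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct __attribute__((__packed__)) PackedPLU {
    uint32_t pluCodeH;
    uint32_t pluCodeL;
    unsigned char description[19];
    unsigned char group[3];
    unsigned char status[5];
    uint32_t linkCodeH;
    uint32_t linkCodeL;
    unsigned char mixMatchLink;
    uint16_t minStock;
    uint32_t price[2];
};

/* Fail the build, rather than the run, if the layout is not as documented. */
static_assert(sizeof(struct PackedPLU) == 54, "PLU record must be 54 bytes");
static_assert(offsetof(struct PackedPLU, linkCodeL) == 39, "link code l at byte 39");
static_assert(offsetof(struct PackedPLU, price) == 46, "price 1 at byte 46");
```

Without the packed attribute, typical compilers insert padding before linkCodeH and minStock, and the first static_assert would reject the build.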
