Dissecting a binary file in C

I'm working on an assignment in which I need to dissect a binary file and retrieve the source address from the header data. I was able to write out the hex data from the file as we were instructed, but I can't make heads or tails of what I am looking at. Here's the printing code I used.
FILE *ptr_myfile;
char buf[8];
ptr_myfile = fopen("packets.1","rb");
if (!ptr_myfile)
{
printf("Unable to open file!");
return 1;
}
size_t rb;
do {
rb = fread(buf, 1, 8, ptr_myfile);
if( rb ) {
size_t i;
for(i = 0; i < rb; ++i) {
printf("%02x", (unsigned int)buf[i]);
}
printf("\n");
}
} while( rb );
And here's a small portion of the output:
120000003c000000
4500003c195d0000
ffffff80011b60ffffff8115250b
4a7d156708004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c00000000
ffffffff01ffffffb5ffffffbc4a7d1567
ffffff8115250b00005556
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195d0000
ffffff8001775545ffffffcfffffffbe29
ffffff8115250108004d56
0001000561626364
65666768696a6b6c
6d6e6f7071727374
7576776162636465
666768693c000000
4500003c195f0000
......
So we are using this diagram to aid in the assignment
I'm really having difficulty translating information from the binary file into something useful that I can manage, and searching the website hasn't yielded much. I just need some help pointing me in the right direction.

Ok, it looks like you actually are reversing parts of an IP packet based on the diagram. This diagram is based on 32-bit words, with each bit being shown as the small 'ticks' along the horizontal ruler looking thing at the top. Bytes are shown as the big 'ticks' on the top ruler.
So, if you were to read the first byte of the file, the high-order nibble (the high-order four bits) contains the version, and the low-order nibble contains the number of 32-bit words in the header (assuming we can interpret this as an IP header).
So, from your diagram, you can see that the source address is in the fourth word, so to read it you can advance the file pointer to that point and read in four bytes. So in pseudo-code you should be able to do this:
fp = fopen("the file name", "rb"); // open the file in binary mode
fseek(fp, 12, SEEK_SET); // advance the file pointer 12 bytes from the start
fread(buf, 1, 4, fp); // read in four bytes from the file.
Now you should have the source address in buf.
OK, to make this a bit more concrete, here is a packet I captured off my home network:
0000 00 15 ff 2e 93 78 bc 5f f4 fc e0 b6 08 00 45 00 .....x._......E.
0010 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 5e 1f .(..@.........^.
0020 1d 9a fd d3 00 50 bd 72 7e e9 cf 19 6a 19 50 10 .....P.r~...j.P.
0030 41 10 3d 81 00 00 A.=...
The first 14 bytes are the Ethernet II header: the first six bytes (00 15 ff 2e 93 78) are the destination MAC address, the next six bytes (bc 5f f4 fc e0 b6) are the source MAC address, and the next two bytes (08 00) denote that the next header is of type IP.
The next twenty bytes are the IP header (which you show in your figure). These bytes are:
0000 45 00 00 28 18 c7 40 00 80 06 00 00 c0 a8 01 05 E..(..@.........
0010 5e 1f 1d 9a ^...
So to interpret this, let's look at 4-byte words.
The first 4-byte word (45 00 00 28), according to your figure is:
first byte : version & length, we have 0x45 meaning IPv4, and 5 4-byte words in length
second byte : Type of Service 0x00
3rd & 4th bytes: total length 0x00 0x28 or 40 bytes.
The second 4-byte word (18 c7 40 00), according to your figure is:
1st & 2nd bytes: identification 0x18 0xc7
3rd & 4th bytes: flags (3-bits) & fragmentation offset (13-bits)
flags - 0x02: the third byte, 0x40, is 0100 0000 in binary, and taking the first three bits 010 gives us 0x02 for the flags.
offset - 0x00
The third 4-byte word (80 06 00 00), according to your figure is:
first byte : TTL, 0x80 or 128 hops
second byte : protocol 0x06 or TCP
3rd & 4th bytes: header checksum, 0x00 0x00
The fourth 4-byte word (c0 a8 01 05), according to your figure is:
1st to 4th bytes: source address, in this case 192.168.1.5
notice that each byte corresponds to one of the octets in the IP address.
The fifth 4-byte word (5e 1f 1d 9a), according to your figure is:
1st to 4th bytes: destination address, in this case 94.31.29.154
Doing this type of programming is a bit confusing at first. I recommend parsing a packet by hand (like I did above) a few times to get the hang of it.
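One final thing: in this line of code, printf("%02x", (unsigned int)buf[i]);, I'd recommend changing it to printf("%02x ", (unsigned char)buf[i]);. Remember that each element in your buf array represents a single byte read from the file. Where plain char is signed, bytes of 0x80 and above are sign-extended by the cast to unsigned int, which is where the ffffff prefixes in your output come from.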
Hope this helps,
T.

Related

PIC Embedded C printf Corrupted Output - Very odd

I have a 69 element array MBRespon[i] of hex data which I'm sending out of two USARTs on a PIC18F46K40. The first USART loops through the data and transmits everything fine, and when that's done the code goes through a 2nd loop where it prints the formatted data out of the 2nd UART using:
printf(" Byte %02i : 0x%02x \r\n", i, MBRespon[i]);
At first it looks like the data is being printed out fine; however, upon closer inspection, around the 57th byte it sends the wrong thing. At first I thought this might have been a EUSART2_TX_BUFFER_SIZE issue, and indeed changing the buffer sizes does have an impact; more gets corrupted when it's anything other than 32.
#define EUSART2_TX_BUFFER_SIZE 32 // 32 Works (Sort of)
#define EUSART2_RX_BUFFER_SIZE 32 // 32 Works (Sort of)
If I reduce the number of elements in the array to 58 or fewer it's fine; anything more and it's corrupt.
My code:
void PrintModRespon(){
int i=0;
printf("Modbus Response Count %i:\r\n",MBResCnt);
while(!EUSART2_is_tx_ready()); // Hold the program until TX is ready
for(i=0; i< MBResCnt ; i++ ){
while(!EUSART2_is_tx_ready()); // Hold the program until TX is ready
printf(" Byte %02i : 0x%02x \r\n", i, MBRespon[i]);
while(!EUSART2_is_tx_done()); // Hold until done.
}
while(!EUSART2_is_tx_ready()); // Hold the program until TX is ready
printf("\r\n\n");
while(!EUSART2_is_tx_done());
}
I added in while(!EUSART2_is_tx_ready()); and while(!EUSART2_is_tx_done()); as I thought it may be the UART wasn't ready/busy, but they didn't make any difference.
UART1 Output (The good one):
06 03 40 00 01 00 02 00 03 ....etc.... 00 1a 00 1b 00 1c 00 1d 00 1e 00 1f 00 20 5c 30
UART2 Output (The bad one):
Modbus Response Count 69:
Byte 00 : 0x06
Byte 01 : 0x03
Byte 02 : 0x40
Byte 03 : 0x00
Byte 04 : 0x01
Byte 05 : 0x00
Byte 06 : 0x02
Byte 07 : 0x00
Byte 08 : 0x03
..etc...
Byte 53 : 0x00
Byte 54 : 0x1a
Byte 55 : 0x00
Byte 56 : 0x1b
Byte 57 : 0x14 // <-- This is NOT in the array!
Byte 58 : 0x1c
Byte 59 : 0x00
Byte 60 : 0x1d
Byte 61 : 0x00
Byte 62 : 0x1e
Byte 63 : 0x00
Byte 64 : 0x1f
Byte 65 : 0x00
Byte 66 : 0x20
Byte 67 : 0x5c
Byte 68 : 0x30
Oddly, as another test, I changed the printf function to:
printf(" Byte %02i : 0x%02x \r\n", i, 0x00 +i);
i.e. not using the array of data, and it output all 69 incrementing values fine, which suggests it's not a UART buffer/timing issue?
Modbus Response Count 69:
Byte 00 : 0x00
Byte 01 : 0x01
Byte 02 : 0x02
...etc.
Byte 51 : 0x33
Byte 52 : 0x34
Byte 53 : 0x35
Byte 54 : 0x36
Byte 55 : 0x37
Byte 56 : 0x38
Byte 57 : 0x39
Byte 58 : 0x3a
Byte 59 : 0x3b
Byte 60 : 0x3c
Byte 61 : 0x3d
Byte 62 : 0x3e
Byte 63 : 0x3f
Byte 64 : 0x40
Byte 65 : 0x41
Byte 66 : 0x42
Byte 67 : 0x43
Byte 68 : 0x44
Any suggestions greatly appreciated. I've been stuck on this for days now!

This is very likely a padding issue since the first element of your structure has an odd number of bytes.
When defining structures, the compiler will create one-byte padding between elements if they do not fall on the default alignment of the processor. A 16-bit processor will want most things aligned on word boundaries (i.e. even addresses), including byte arrays (e.g. arrays of char). The exception is elements that only occupy one byte.
On a 16-bit Microchip MPU, if you try to access word or dword data at an odd address, it will cause a memory fault. The compiler is trying to keep this from happening.
If your structure only contains byte-sized elements (char, uint8_t, etc.), or arrays of the same, you can force byte alignment by adding the qualifier __attribute__((packed)) to the structure declaration. This can be dangerous because you may end up with elements having odd addresses. This is OK as long as the odd-addressed elements are only accessed as bytes (e.g. a char array), but proceed with caution.

Understanding fwrite() behaviour

When I use fwrite to write to stdout, why does the output look undefined?
What is the use of the size_t count argument in fwrite and fread function?
#include<stdio.h>
struct test
{
char str[20];
int num;
float fnum;
};
int main()
{
struct test test1={"name",12,12.334};
fwrite(&test1,sizeof(test1),1,stdout);
return 0;
}
output:
name XEA
Process returned 0 (0x0) execution time : 0.180 s
Press any key to continue.
And when I use fwrite to write to a file:
#include<stdio.h>
#include<stdlib.h>
struct test
{
char str[20];
int num;
float fnum;
};
int main()
{
struct test test1={"name",12,12.334};
FILE *fp;
fp=fopen("test.txt","w");
if(fp==NULL)
{
printf("FILE Error");
exit(1);
}
fwrite(&test1,sizeof(test1),1,fp);
return 0;
}
The file also contains this:
name XEA
Why is the output like this?
And when I put size_t count as 5, it also becomes undefined result. What is the purpose of this argument?

In the way that you're using it, fwrite is writing "binary" output, which is a byte-for-byte representation of your struct test as in memory. This representation is not generally human-readable.
When you write
struct test test1 = {"name", 12, 12.334};
you get a structure in memory which might be represented like this (with all byte values in hexadecimal):
test1: str: 6e 61 6d 65 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
num: 0c 00 00 00
fnum: 10 58 45 41
Specifically: 0x6e, 0x61, 0x6d, and 0x65 are the ASCII codes for the letters in "name". Your str array was size 20, so the string name is 0-terminated and then padded out with 15 more 0 characters. 0x0c is the hexadecimal representation of the number 12, and I'm assuming that type int is 32 bits or 4 bytes on your machine, so there are 3 more 0's there, also. (Also I'm assuming your machine is "little endian", with the least-significant byte of a 4-byte quantity like 0x0000000c coming first in memory.) Finally, in the IEEE-754 format which your machine uses, the number 12.334 is represented in 32-bit single-precision floating point as 0x41455810, which is again stored in the opposite order in memory.
So those 28 bytes (plus possibly some padding, which we won't worry about for now) are precisely the bytes that get written to the screen, or to your file. The string "name" will probably be human-readable, but all the rest will be, literally, "binary garbage". It just so happens that three of the bytes making up the float number 12.334, namely 0x58, 0x45, and 0x41, correspond in ASCII to the capital letters X, E, and A, so that's why you see those characters in the output.
Here's the result of passing the output of your program into a "hex dump" utility:
0 6e 61 6d 65 00 00 00 00 00 00 00 00 00 00 00 00 name............
16 00 00 00 00 0c 00 00 00 10 58 45 41 .........XEA
You can see the letters name at the beginning, and the letters XEA at the end, and all the 0's and other binary characters in between.
If you're on a Unix or Linux (or Mac OS X) system, you can use tools like od or hexdump to get a hex dump similar to the one I've shown.
You asked about the "count" argument to fwrite. fwrite is literally designed for writing out binary structures, just like you're doing. Its function signature is
size_t fwrite(const void *ptr, size_t sz, size_t n, FILE *fp);
As you know, ptr is a pointer to the data structure(s) you're writing, and fp is the file pointer you're writing it/them to. And then you're saying you want to write n (or 'count') items, each of size sz.
You called
fwrite(&test1, sizeof(test1), 1, fp);
The expression sizeof(test1) gives the size of test1 in bytes. (It will probably be 28 or so, as I mentioned above.) And of course you're writing one struct, so passing sizeof(test1) and 1 as sz and n is perfectly reasonable and correct.
It would also not be unreasonable or incorrect to call
fwrite(&test1, 1, sizeof(test1), fp);
Now you're telling fwrite to write 28 bytes, each of size 1. (A byte is of course always size 1.)
One other thing about fwrite (as noted by @AnttiHaapala in a comment): when you're writing binary data to a file, you should specify the b flag when you open the file:
fp = fopen("test.txt", "wb");
Finally, if this isn't what you want, if you want human-readable text output instead, then fwrite is not what you want here. You could use something like
printf("str: %s\n", test1.str);
printf("num: %d\n", test1.num);
printf("fnum: %f\n", test1.fnum);
Or, to your file:
fprintf(fp, "str: %s\n", test1.str);
fprintf(fp, "num: %d\n", test1.num);
fprintf(fp, "fnum: %f\n", test1.fnum);

C program Reading Hex values from Fat MBR

I am new to the C language and am trying to create a forensic tool. So far I have this. It reads a dd file which is a dump of a FAT16 MBR. I am able to read certain bytes properly, but not others.
What I need help with is the sizeOfFat variable, which needs to get the values of bytes 0x16 and 0x17 read as little endian. How would I have it read FB00, convert it to 00FB, and then print the value?
char buf[64];
int startFat16 = part_entry[0].start_sect,sectorSize = 512,dirEntrySize=32;
fseek(fp, startFat16*512, SEEK_SET);
fread(buf, 1, 64, fp);
int rootDir = *(int*)(buf+0x11);
int sectorPerCluster = *(int*)(buf+0x0D);
int sizeOfFat = *(int*)(buf+0x16);
int fatCopies = *(int*)(buf+0x10);
printf("\n Phase 2 \n no of sectors per cluster : %d \n",(unsigned char)sectorPerCluster);
printf("size of fat : %d \n",(unsigned char)sizeOfFat);
printf("no of Fat copies : %d \n",(unsigned char)fatCopies);
printf("maximum number of root directories : %d \n",(unsigned char)rootDir);
The hex values I'm working with here are:
EB 3C 90 4D 53 44 4F 53 35 2E 30 00 02 08 02 00
02 00 02 00 00 F8 FB 00 3F 00 FF 00 3F 00 00 00
E1 D7 07 00 80 00 29 CD 31 52 F4 4E 4F 20 4E 41

With int, the only guarantee is that it can hold at least a 16-bit signed range; on most current systems it is 32 bits wide. In your code you read sizeof(int) bytes for every one of your variables, even though the fields differ in size. Most systems provide the fixed-width types uint8_t, uint16_t, and uint32_t (in <stdint.h>). Use those for fixed-width data. Note also that they are unsigned. You don't want negative sectors per cluster, do you?

Unable to Display Packet's Ethertype

So, I have a program to sniff packets, and so far it has been working very well. I am now trying to add additional functionality, but I keep running into problems when I try to decode the ethertype of a captured packet's Ethernet header. To isolate the location in which the error is occurring, I wrote a modified program to dump only the Ethernet header in hexadecimal, and then, using custom headers, decode the source MAC address, destination MAC address, and ethertype, but this is the output:
Sniffing on device wlan0
308 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 0000
328 BYTE PACKET
33 33 00 00 00 fb 04 e5 36 4a 5d 3a 86 dd | 33......6J]:..
SRC 33:33:00:00:00:fb DST 04:e5:36:4a:5d:3a TYPE ff11
194 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 0000
Captured 3 packets
Apparently, the program has no problem decoding the MAC addresses, which are the first twelve bytes of the fourteen byte header, but, for some reason, when it reaches the final two bytes comprising the ethertype, the program fails to display them properly. The first and third packets' ethertypes are 0x0800, or IP, and the second packet's ethertype is 0x86dd, or IPv6. I have tried displaying the ethertype in numerous formats, but none have yielded proper results.
These are the lines of code responsible for displaying the MAC addresses and ethertype:
ethernet_header = (const struct sniff_ethernet_hdr *)packet;
printf ("SRC %02x", ethernet_header->ether_shost[0]);
for (i = 1; i < ETH_ALEN; i++)
printf (":%02x", ethernet_header->ether_shost[i]);
printf (" DST %02x", ethernet_header->ether_dhost[0]);
for (i = 1; i < ETH_ALEN; i++)
printf (":%02x", ethernet_header->ether_dhost[i]);
printf (" TYPE %.4x", ethernet_header->ether_type);
printf ("\n");
Does anyone have any suggestions or notice any problems with the code?
EDIT: I continued to play with this program, and I discovered something strange. I set a pointer called eth_ptr to point to the thirteenth byte of the packet, where the ethertype begins. When I dereferenced this pointer and printed the result, it was, indeed, the first byte of the ethertype. So, I added a line to the program that prints the addresses of eth_ptr and the ethertype of the Ethernet header struct. These are the results:
Sniffing on device wlan0
104 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 8
eth_ptr @ 0x7ffafd0b1944 ether_type @ 0x7ffafd0b194c
124 BYTE PACKET
33 33 00 00 00 fb 04 e5 36 4a 5d 3a 86 dd | 33......6J]:..
SRC 33:33:00:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 86
eth_ptr @ 0x7ffafd0b1944 ether_type @ 0x7ffafd0b194c
319 BYTE PACKET
01 00 5e 00 00 fb 04 e5 36 4a 5d 3a 08 00 | ..^.....6J]:..
SRC 01:00:5e:00:00:fb DST 04:e5:36:4a:5d:3a TYPE 8
eth_ptr @ 0x7ffafd0b1944 ether_type @ 0x7ffafd0b194c
Captured 3 packets
$ gdb -q
(gdb) p /x 0x7ffafd0b194c - 0x7ffafd0b1944
$1 = 0x8
The type that is being printed is really the dereferenced pointer, which points to the first byte of the ethertype. This pointer is located eight bytes before ethernet_header->ether_type in memory; therefore, the problem is that the ethertype element of the struct is located eight bytes ahead of where it is supposed to be. I do not know why this is, or how to fix this. Can anyone offer an explanation?
EDIT AGAIN: Well, I am a fool. I just took a close look at the Ethernet header structure:
struct sniff_ethernet_hdr
{
uint8_t ether_shost[ETH_ALEN];
uint8_t ether_dhost[ETH_HLEN];
uint16_t ether_type;
} __attribute__ ((__packed__));
The macros:
#define ETH_ALEN 6
#define ETH_HLEN 14
After I fixed this careless mistake, the program ran flawlessly.

How can I access members of a struct when it's not aligned properly?

I'm afraid that I'm not very good at low level C stuff, I'm more used to using objects in Obj-c, so please excuse me if this is an obvious question, or if I've completely misunderstood something...
I am attempting to write an application in Cocoa/Obj-C which communicates with an external bit of hardware (a cash till.) I have the format of the data the device sends and receives - and have successfully got some chunks of data from the device.
For example: the till exchanges PLU (price data) in chunks of data in the following format: (from the documentation)
Name              Bytes  Type
PLU code h        4      long
PLU code L        4      long
desc              19     char
Group             3      char
Status            5      char
PLU link code h   4      long
PLU link code l   4      long
M&M Link          1      char
Min. Stock.       2      int
Price 1           4      long
Price 2           4      long
Total             54 Bytes
So I have a struct in the following form in which to hold the data from the till:
typedef struct MFPLUStructure {
UInt32 pluCodeH;
UInt32 pluCodeL;
unsigned char description[19];
unsigned char group[3];
unsigned char status[5];
UInt32 linkCodeH;
UInt32 linkCodeL;
unsigned char mixMatchLink;
UInt16 minStock;
UInt32 price[2];
} MFPLUStructure;
I have some known sample data from the till (below) which I have checked by hand and is valid
00 00 00 00 4E 61 BC 00 54 65 73 74 20 50 4C 55 00 00 00 00 00 00 00 00 00 00 00 09 08 07 17 13 7C 14 04 00 00 00 00 09 03 00 00 05 BC 01 7B 00 00 00 00 00 00 00
i.e.
bytes 46 to 49 are <7B 00 00 00> == 123 as I would expect as the price is set to '123' on the till.
byte 43 is <05> == 5 as I would expect as the 'mix and match link' is set to 5 on the till.
bytes 39 to 42 are <09 03 00 00> == 777 as I would expect as the 'link code' is set to '777' on the till.
Bytes 27,28,29 are <09 08 07> which are the three groups (7,8 & 9) that I would expect.
The problem comes when I try to get some of the data out of the structure programmatically: The early members work correctly right up to, and including the five 'status' bytes. However, members after that don't come out properly. (see debugger screenshot below.)
Image 1 - http://i.stack.imgur.com/nOdER.png
I assume that the reason for this is because the five status bytes push later members out of alignment - i.e. they are over machine word boundaries. Is this right?
Image 2 - i.imgur.com/ZhbXU.png
Am I right in making that assumption?
And if so, how can I get the members in and out correctly?
Thanks for any help.

Either access the data a byte at a time and assemble it into larger types, or memcpy it into an aligned variable. The latter is better if the data is known to be in a format specific to the host's endianness, etc. The former is better if the data follows an external specification that might not match the host.

If you're sure that endianness of host and wire agree, you could also use a packed structure to read the data in a single pass. However, this is compiler-specific and will most likely impact performance of member access.
Assuming gcc, you'd use the following declarations:
struct __attribute__ ((__packed__)) MFPLUStructure { ... };
typedef struct MFPLUStructure MFPLUStructure;
If you decide to use a packed structure, you should also verify that it is of correct size:
assert(sizeof (MFPLUStructure) == 54);
