How to parse this Apple-specific MPEG file header? - file-format

I have an MPEG file which starts like this:
0: 00 0f 6d 79 5f 66 69 6c 65 6e 61 6d 65 2e 6d 70 ..my_filename.mp
10: 67 00 04 fc 00 00 f0 00 b2 10 39 a8 b2 10 39 ad g.........9...9.
20: 0f 6d 79 5f 66 69 6c 65 6e 61 6d 65 2e 6d 70 67 .my_filename.mpg
30: 03 92 3b 40 00 00 00 00 03 7a b5 7c 03 7a d7 d0 ..;#.....z.|.z..
40: 00 4d 6f 6f 56 54 56 4f 44 01 00 01 2a 00 80 00 .MooVTVOD...*...
50: 00 00 00 00 36 b2 83 00 00 04 fc b2 10 39 a8 b2 ....6........9..
60: 10 39 ad 00 00 00 00 00 00 00 00 00 00 00 00 00 .9..............
70: 00 00 00 00 00 00 00 00 00 00 81 81 35 d3 00 00 ............5...
80: 00 36 b2 83 6d 64 61 74 00 00 01 ba 21 00 01 00 .6..mdat....!...
90: 05 80 2b 81 00 00 01 bb 00 0c 80 2f d9 04 e1 ff ..+......../....
a0: c0 c0 20 e0 e0 2e 00 00 01 c0 07 ea ff ff ff ff .. .............
What is the file format of the beginning of the file (first 0x80 bytes), and how do I parse it?
I've run a Google search on MooVTVOD, it looks like something related to QuickTime and iTunes.
What I understand already:
There is 4 bytes of big endian file size in front mdat, according to the QuickTime .mov file format when the .mov contains an MPEG.
Right after mdat there is the MPEG-PS header 00 00 01 ba. Shortly after there is the MPEG-PES header 00 00 01 c0 indicating an audio stream.
However, the first 0x80 bytes in this file seem to be in a different file file format (not QuickTime .mov, not MPEG-PS, not MPEG-PES), and in this question I'm only interested in the file format of the first 0x80 bytes.
Media players such as VLC routinely ignore junk at the beginning of the file, and start playing the MPEG-PS stream at offset 0x80. However, I'm interested in the 0x80 bytes they ignore.

The file format is "Quicktime movie atom," that contains meta information about the media file, or the media itself. mdat is a Media Data atom.
Media atoms describe and define a track’s media type and sample data.
the spec is here: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap2/qtff2.html
You can parse it with this python script:
https://github.com/kzahel/quicktime-parse
or this software, mentioned in the linked question:
https://archive.codeplex.com/?p=mp4explorer

The file format of the first 0x80 bytes in the question is MacBinary II.
Based on the file format description https://github.com/mietek/theunarchiver/wiki/MacBinarySpecs , it's not MacBinary I, because MacBinary II has data[0x7a] == '\x81' and data[0x7b] == '\x81' (and MacBinary I has data[0x7a] == '\x00', and MacBinary III has data[0x7a] == '\x82'); and it's also not MacBinary III, because that one has data[0x66 : 0x6a] == 'mBIN'.
The CRC value data[0x7c : 0x7e] is incorrect because the filename was modified before posting to StackOverflow. FYI The CRC algorithm used is CRC16/XMODEM (on https://crccalc.com/), also implemented as CalcCRC in
MacBinary.c.
The data[0x41 : 0x45] == 'MooV' is the file type code. According to the Excel spreadsheet downloadable from TCDB, it means QuickTime movie (video and maybe audio).
The data[0x45 : 0x49] == 'TVOD' is the file creator code. TCDB and this database indicate that it means QuickTime Player.
More info and links about MacBinary: http://fileformats.archiveteam.org/wiki/MacBinary
Please note that all these headers are unnecessary to play the video: by removing the first 0x88 bytes we get an MPEG-PS video file, which many video players can play (not only QuickTime Player, and not only players running on macOS).

Related

C program in Docker: fwrite(3) and write(2) fail to modify files on Windows but not on MacOS

I am writing a guest OS on top of Linux (Ubuntu distribution) within a Docker container. The filesystem is implemented as a single file resting inside the host OS, so anytime a file is changed in the guest OS filesystem, the file on the host OS must be opened, the correct block(s) must be overwritten, and the file must be closed.
My partner and I have developed the following recursive helper function to take in a block number and offset to abstract away all details at the block-level for higher level functions:
/**
* Recursive procedure to write n bytes from buf to the
* block specified by block_num. Also updates FAT to
* reflect changes.
*
* #param block_num identifier for block to begin writing
* #param buf buffer to write from
* #param n number of bytes to write
* #param offset number of bytes to start writing from as
* measured from start of file
*
* #returns number of bytes written
*/
int write_bytes(int block_num, const char *buf, int n, int offset) {
BlockTuple red_tup = reduce_block_offset(block_num, offset);
block_num = red_tup.block;
offset = red_tup.offset;
FILE *fp = fopen(fat->fname, "r+");
int bytes_to_write = min(n, fat->block_size - offset);
int write_n = max(bytes_to_write, 0);
fseek(fp, get_block_start(block_num) + offset, SEEK_SET);
fwrite(buf, 1, write_n, fp); // This line is returning 48 bytes written
fclose(fp);
// Check if there are bits remaining
int bytes_left = n - write_n;
if (bytes_left > 0) {
// Recursively write on next block
int next_block = get_free_block();
set_fat_entry(block_num, next_block); // point block to next block
set_fat_entry(next_block, 0xFFFF);
return write_bytes(next_block, buf + write_n, bytes_left, max(0, offset - fat->block_size)) + write_n;
} else {
set_fat_entry(block_num, 0xFFFF); // mark file as terminated
return write_n;
}
}
The issue is that fwrite(3) is reporting 48 bytes written (when n is passed as 48) but hexdumping the file on the host OS reveals no bytes have been changed:
00000000 00 01 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000
This is particularly wacky because when my partner runs the code on the exact same commit (with no uncommitted changes), her write goes through and the file on the host OS hexdumps to:
00000000 00 01 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 66 31 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |f1..............|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 00 01 00 00 02 00 01 06 e7 36 75 63 00 00 00 00 |.........6uc....|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 e0 53 f8 88 c0 0d 37 ca 84 1f 19 b0 6c a8 68 7b |.S....7.....l.h{|
00000210 57 83 cf 13 f0 42 21 d3 21 e1 da de d4 8a f1 e6 |W....B!.!.......|
00000220 f0 12 98 fb 1c 30 4c 04 b3 16 1d 96 17 ba d7 5a |.....0L........Z|
00000230 7e f3 8a f5 6a 42 6b ef 58 f6 bc 01 db 0c 02 53 |~...jBk.X......S|
00000240 e5 10 7e f3 4a d5 3f ac 8e 38 82 c3 95 f8 11 8e |..~.J.?..8......|
00000250 a6 82 eb 3b 24 56 9a 75 44 36 8b 25 60 83 4c 04 |...;$V.uD6.%`.L.|
00000260 07 9e 14 99 9c 9f 87 3c 8a d4 c3 e8 17 60 81 0e |.......<.....`..|
00000270 bc eb 1d 35 68 fc d5 be 4f 1c 9d 5e 72 57 65 01 |...5h...O..^rWe.|
00000280 b7 43 54 26 d6 6d ba 51 bf 12 8c a1 03 d5 66 b3 |.CT&.m.Q......f.|
00000290 90 0d 60 b8 95 8d 15 bd 53 9a 70 77 4f 7a 04 1e |..`.....S.pwOz..|
000002a0 9e b2 4c 9a 79 dd de 48 cd fe 1e dc 57 7d d1 7f |..L.y..H....W}..|
000002b0 3f f5 77 96 fa e7 d7 33 33 48 ce 0a 4d 61 ab 96 |?.w....33H..Ma..|
000002c0 5f c4 88 bf c6 3a 09 37 76 c4 b8 db bc 6a 7d c0 |_....:.7v....j}.|
000002d0 c4 89 68 e7 b4 70 f8 a6 a8 00 9d c4 63 da fb 66 |..h..p......c..f|
000002e0 be d2 cd 68 1c d2 ff bf 00 e9 37 ab 6b 1a 3c f2 |...h......7.k.<.|
000002f0 7b c1 a2 c4 46 ae db 93 b4 4f 64 79 14 2a 1a d4 |{...F....Ody.*..|
00000300 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00008000
The 48 bytes I'm referring to that don't get written are the bytes written to the directory block running from address 00000100-0000012E (the bytes below that represent the actual file being written, the code seg faults on my end before reaching that write). It's worth noting my container can still format the filesystem file, so all writes aren't broken. This snippet just represents the first write that did not work.
We are both running the code in an identical Docker container. The only difference I could imagine is my computer is a Windows and hers is a Mac. What could possibly be the issue?
The very first thing I believed was that there was some conflict with the host OS that blocked my write, but assigning and printing the return value of fwrite(3) returned that 48 bytes were indeed written on both machines.
I was also expecting that my buffer was simply all 0s (it is initially allocated using calloc(3)), but printing out the first 48 bytes of the buffer proved that theory false.
I finally considered that this was some issue with the higher level interface in <stdio.h> instead of the lower level one in <unistd.h>. I replaced fopen(3), fwrite(3), flseek(3), fclose(3) each with their lower-level equivalents (write(2) etc) and it still turned up 48 bytes written with no actual change to the files.
EDIT:
The guest OS filesystem can be formatted with respect to user parameters. All testing above was performed with a block size of 256 bytes and 128 blocks total. I've attempted the exact same write sequence again with a block size of 1024 bytes and 16384 blocks total, and there was no error. It's still unclear why the code works on my partner's machine for both format configs and not mine, but this may narrow it down.
Running strace reveals the following excerpt around the write:
openat(AT_FDCWD, "minfs", O_RDWR) = 4
newfstatat(4, "", {st_mode=S_IFREG|0777, st_size=32768, ...}, AT_EMPTY_PATH) = 0
lseek(4, 0, SEEK_SET) = 0
read(4, "\0\1\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
write(4, "f1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48) = 48
close(4)
It again appears the bytes get written, but a hd after the program finishes reveals the same output as above. My thought was perhaps the bytes written in the excerpt are overwritten later on, but the only write after the excerpt above in the strace is:
lseek(4, 0, SEEK_SET) = 0
read(4, "\0\1\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 320) = 320
write(4, "\0", 1) = 1
close(4)
which should be at address 320, squarely after the write at address 256 above.
It turns out the mismatch was due to undefined behavior concerning when changes were synchronized in mmap(2). There was a section of code where a region of memory mapped via mmap(2) was changed and then immediately followed by reads/writes to the file on the host OS containing the mapped region of memory. It seems the Mac would write through the changes before the following section while the Windows wouldn't synchronize until after the fact, resulting in the undefined behavior.
The problem was fixed by making a call to msync(2) immediately after modifying the mapped region from mmap(2) with the MS_SYNC flag forcing the write-through behavior.
Links to documentation here: mmap(2), msync(2).

What is the Informix database file extension

I've got an image file of a hard drive from a client that wants a database extracted out of it. The client does not know any details except that the database was once installed on the server from which the image was created.
I found out that it is a UNIX system with a Informix DBS installed, but I am unable to find any database files. I'm not sure about the version of Informix, but it seem that it was installed around 15 years ago.
I'm not able to boot from the image. I'm just viewing the files.
Do informix database files have an extension and what could it be? Any other tips how to identify the database files?
Do you know whether the database was Informix Standard Engine (SE) or Informix (Informix Dynamic Server — IDS — or one of its multitude of namesakes over the years)?
Standard Engine
If it was SE, then the database files are in a directory database.dbs and the files holding the indexes and data for the tables have extensions .idx and .dat. That's pretty much fixed and very easy. The files and the directory should belong to group informix; the owner will be whoever created the database or the table within the database.
Informix Dynamic Server
If it was IDS, then there is no guaranteed naming convention, and Informix didn't even recommend one. Depending on the state of the disk, I'd look for large files that are owned by user informix and belong to group informix and have 660 (-rw-rw----) permissions. The files will have structure, but it isn't easy to discern it.
For example, I have 'chunk 0 of the root dbspace' in a file toru_31.rootdbs.c0 (server name toru_31 — a naming standard I imposed on my systems). It starts with:
0x0000: 00 00 00 00 01 00 AB 89 03 00 00 18 30 01 C0 06 ............0...
0x0010: 00 00 00 00 00 00 00 00 49 42 4D 20 49 6E 66 6F ........IBM Info
0x0020: 72 6D 69 78 20 44 79 6E 61 6D 69 63 20 53 65 72 rmix Dynamic Ser
0x0030: 76 65 72 20 43 6F 70 79 72 69 67 68 74 20 32 30 ver Copyright 20
0x0040: 30 31 2C 20 32 30 31 31 20 20 49 42 4D 20 43 6F 01, 2011 IBM Co
0x0050: 72 70 6F 72 61 74 69 6F 6E 2E 00 00 00 00 00 00 rporation.......
0x0060: 00 00 00 00 00 00 00 00 00 00 03 00 00 08 00 00 ................
0x0070: BA 2D CC 50 1A 00 00 00 00 00 00 00 C8 00 00 00 .-.P............
0x0080: 31 31 37 33 05 00 00 00 00 00 00 00 00 00 00 00 1173............
0x0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
However, 'chunk 1 of the root dbspace' (file toru_31.rootdbs.c1) starts:
0x0000: 00 00 00 00 02 00 F1 C6 00 00 00 18 18 00 E4 07 ................
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
* (125)
0x07F0: 00 00 00 00 00 00 00 00 00 00 00 00 61 C6 92 00 ............a...
0x0800: 69 6E 66 6F 72 6D 69 78 69 6E 66 6F 72 6D 69 78 informixinformix
* (127)
0x1000: 02 00 00 00 02 00 BE B6 06 00 08 08 20 00 D8 07 ............ ...
0x1010: 00 00 00 00 00 00 00 00 10 06 00 00 01 00 00 00 ................
0x1020: EA 06 00 00 03 00 00 00 75 08 00 00 08 00 00 00 ........u.......
0x1030: A1 08 00 00 08 00 00 00 B5 0A 00 00 30 03 00 00 ............0...
There is very little information there to give the game away. Although the appearance of informix (128 lines containing informix twice) looks like a tell-tale sign, it isn't — I created the chunk with a program that writes informix over the disk space. It simply shows where the server has not yet written data to this chunk.
You can look for $INFORMIXDIR. In fact, if there's a directory /INFORMIXTMP, you can look in the text file /INFORMIXTMP/.infxdirs to see where Informix has been installed. From those directories, you can look in $INFORMIXDIR/etc for onconfig files - they're text files that contain an entry ROOTPATH, which is the pathname of chunk 0 of the root dbspace — basically, the starting point for the whole system. There's usually an onconfig.std which is a template; the naming convention is not firmly fixed, though I always use onconfig.servername (so the config file for toru_31 is onconfig.toru_31). You may also find other files around in $INFORMIXDIR/etc. For example, there's a file oncfg_toru_31.31 (prefix oncfg_, followed by server name toru_31 followed by dot and the server number 31) which contains information about the chunks and other disk space etc used by the server. You might also see binary files analogous to .conf.toru_31 and .infos.toru_31 — the .infos file is normally present only while the server is up but the .conf file persists. These files have some limited information in them, most notably the name of the onconfig file.
If you can find these files on the disk, then you can proceed to identify where the data was stored on the disk.

Xbee sending wrong ZDO responses

I´m playing around with two Xbees, one defined as coordinator, another as router. I want to read information about the network interoperably so i decided to use the ZDO messages.
I send a message like this ((profile ID 0x00 00, cluster ID 0x 00 31) and receive for example the following response from the router:
7E 00 2D 91 00 13 A2 00 40 E5 F0 B4 FB CE 00 00 80 31 00 00 01 2C 00 01 00 01 58 CE C1 8D 7A 3F 2D 40 AB F0 E5 40 00 A2 13 00 00 00 04 02 00 FF 33
Correct answer cluster ID: 0x 80 31
Focussing on the RF Data i have the following:
2C 00 01 00 01 58 CE C1 8D 7A 3F 2D 40 AB F0 E5 40 00 A2 13 00 00 00 04 02 00 FF
I now try to decode this hex string and face some problems.
From my point of view, this string should be encoded like defined within the ZigBee Spec from 2012, at Table 2.126 and 2.127
Unfortunately this don´t work for me. If i ignore, that the first byte should be the status and take the first two of them, i can read out NeighborTableEntries, StartIndex, NeighborTabelListCount. But when it comes to the NeighTableList i only can read out the Extended PAN id, the extended address and the network address, the rest of the string does not fit to the standard. Am i doing something wrong here or does the xbee´s don´t stick to the standard?
2C = Sequence Number
00 = Status (Success)
01 = 1 entry (total)
00 = starting at index 0
01 = 1 entry (in packet)
58 CE C1 8D 7A 3F 2D 40 = Extended Pan ID
AB F0 E5 40 00 A2 13 00 = IEEE address
00 00 = NodeId
04 = (Coordinator, RxOnWhenIdle)
02 = (Unknown Permit Join)
00 = (Coordinator)
FF = (LQI)
The values after the NodeId are bitmasks, not bytes.

Visual C++, Extract text information from a DLL database (sentence mining)

The 2006 version of the free offline Chinese sentence dictionary Jukuu contains a collection of 100,000 publicly sourced example sentences in Chinese and English in a .dll file.
The application size is about 80mb, but once installed a 500mb DLL dictionary file is created with the source text. For whatever reason the application doesn't run on my computer, and I'd like to extract all the example sentences so I can do some POS analysis on them.
Opening the 500mb .DLL file is mostly gibberish, except for some fragments of text here and there and references to other resources.
I'm wondering if there is any way I can extract the information in plain text?
The application can be downloaded here: http://www.jukuu.com/down/download.html
Thanks!
Edit: Never-mind, It looks like when viewed in HEX, the file is ordered in a way that is not conducive to sentence mining at all:
00 06 00 02 00 07 03 00 01 00 00 00 FF FE 6E 65 76 65 72 7E 73 74 61
6E 64 20 75 70 0B 00 00 80 00 00 00 00 00 00 00 00 FF FE 33 30 32
32 31 38 32 36 38 2D 00 16 00 06 00 02 00 07 03 00 01 00 00 00 FF
FE 65 78 74 72 65 6D 65 6C 79 7E 63 6C 6F 73 65 0B 00 00 80 00 00
00 00 00 00 00
Something like ÿþnever~stand upÿþ302218268-ÿþextremely~close
Any other ideas on how to mine sentences from the application? Maybe a batch script?

Can anyone guess what protocol these packets belong to?

We see these packets being injected in an FTP-DTP channel during a downlink file transfer on Telstra's NEXTG mobile network. We are not sure if these are network level packets, a problem with our 3G modem (HC25 based) or something like our firewall injecting in the stream.
Using a tool we noticed that the PPP framing fails with protocol length errors, so they are mostly likely mobile network packets.
I am hoping someone here can identify the signature of the packets so that I can chase this up with the appropriate vendor.
There is definitely a format to these packets: -
Packet1:
00 00 00 24 c4 b8 7b 1a 00 90 7f 43 0f a1 08 00 45 00 01 10 f4 4e 00 00 40 06 2f 13 cb 7a 9d e9 7b d0 71 52 7a ed 04 06 8c 61 5d a9 01 f7 0c eb 50 10 ff ff 58 b9 00 00
Packet2:
00 00 00 24 c4 b8 7b 1a 00 90 7f 43 0f a1 08 00 45 00 00 ff 6b 50 00 00 40 06 b8 22 cb 7a 9d e9 7b d0 71 52 7a ed 04 06 8c 61 7b 82 01 f7 0c eb 50 10 ff ff a3 79 00 00
Packet3:
00 00 00 24 c4 b8 7b 1a 00 90 7f 43 0f a1 08 00 45 00 02 20 5b 50 00 00 40 06 c7 01 cb 7a 9d e9 7b d0 71 52 7a ed 04 06 8c 61 7c 59 01 f7 0c eb 50 10 ff ff e2 5d 00 00
Packet4:
00 00 00 24 c4 b8 7b 1a 00 90 7f 43 0f a1 08 00 45 00 01 38 d8 52 00 00 40 06 4a e7 cb 7a 9d e9 7b d0 71 52 7a ed 04 06 8c 62 42 f9 01 f7 0c eb 50 10 ff ff 20 91 00 00
Packet5:
00 00 00 24 c4 b8 7b 1a 00 90 7f 43 0f a1 08 00 45 00 00 d0 4d 58 00 00 40 06 d6 49 cb 7a 9d e9 7b d0 71 52 7a ee 04 08 4b fb 0b 8f 03 5d 51 1a 50 10 ff ff e9 88 00 00
I converted your packet trace snippet into a format understood by text2pcap so I could convert them into the pcap format for viewing in Wireshark (a very handy packet capture and analysis tool):
Looks like some sort of IPv4 multicast traffic at a very rough guess. Here's what I got from the first packet (rest came up as malformed):
No. Time Source Destination Protocol Info
1 0.000000 7b:1a:00:90:7f:43 00:00:00_24:c4:b8 0x0fa1 Ethernet II
Frame 1 (31 bytes on wire, 31 bytes captured)
Arrival Time: Dec 1, 2009 00:33:05.000000000
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 31 bytes
Capture Length: 31 bytes
[Frame is marked: False]
[Protocols in frame: eth:data]
Ethernet II, Src: 7b:1a:00:90:7f:43 (7b:1a:00:90:7f:43), Dst: 00:00:00_24:c4:b8 (00:00:00:24:c4:b8)
Destination: 00:00:00_24:c4:b8 (00:00:00:24:c4:b8)
Address: 00:00:00_24:c4:b8 (00:00:00:24:c4:b8)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: 7b:1a:00:90:7f:43 (7b:1a:00:90:7f:43)
Address: 7b:1a:00:90:7f:43 (7b:1a:00:90:7f:43)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
Type: Unknown (0x0fa1)
Data (17 bytes)
0000 08 00 45 00 01 10 f4 4e 00 00 40 06 2f 13 cb 7a ..E....N..#./..z
0010 9d .
Data: 080045000110F44E000040062F13CB7A9D
These look like ordinary TCP packets but with two extra 00 bytes tagged on at the front. Not sure why that would happen, but they appear to be from 00-90-7f-43-0f-a1 (Watchguard) to 00-24-c4-b8-7b-1a (Cisco).
IP header is 45 00 01 10 f4 4e 00 00 40 06 2f 13 cb 7a 9d e9 7b d0 71 52
TCP header is 7a ed 04 06 8c 61 5d a9 01 f7 0c eb 50 10 ff ff 58 b9 00 00
So you can get the rest of the details from there.
00:24:c4 is a NIC from Cisco and 00:90:7F is a NIC from WatchGuard.
From the IEEE OUI Registry.
How much help that might be ... don't know. Might therefore be an attempted VPN connection.
As already decoded by others:
first 6+6+2 bytes identifying NIC and Ethernet II.
bytes 0x0800 EtherType telling that it is IP. http://en.wikipedia.org/wiki/EtherType
next octet starting with nibble "4" is IPv4
etc.

Resources