Parsing a .pcap file in plain C

Parsing a .pcap file in plain C - c

I'm trying to create my own pcap files parser. According to Wireshark's docs:
Global Header
This header starts the libpcap file and will be followed by the first packet header:
typedef struct pcap_hdr_s {
guint32 magic_number; /* magic number */
guint16 version_major; /* major version number */
guint16 version_minor; /* minor version number */
gint32 thiszone; /* GMT to local correction */
guint32 sigfigs; /* accuracy of timestamps */
guint32 snaplen; /* max length of captured packets, in octets */
guint32 network; /* data link type */
} pcap_hdr_t;
magic_number: used to detect the file format itself and the byte ordering. The writing application writes 0xa1b2c3d4 with it's native byte ordering format into this field. The reading application will read either 0xa1b2c3d4 (identical) or 0xd4c3b2a1 (swapped). If the reading application reads the swapped 0xd4c3b2a1 value, it knows that all the following fields will have to be swapped too. For nanosecond-resolution files, the writing application writes 0xa1b23c4d, with the two nibbles of the two lower-order bytes swapped, and the reading application will read either 0xa1b23c4d (identical) or 0x4d3cb2a1 (swapped).
version_major, version_minor: the version number of this file format (current version is 2.4)
thiszone: the correction time in seconds between GMT (UTC) and the local timezone of the following packet header timestamps. Examples: If the timestamps are in GMT (UTC), thiszone is simply 0. If the timestamps are in Central European time (Amsterdam, Berlin, ...) which is GMT + 1:00, thiszone must be -3600. In practice, time stamps are always in GMT, so thiszone is always 0.
sigfigs: in theory, the accuracy of time stamps in the capture; in practice, all tools set it to 0
snaplen: the "snapshot length" for the capture (typically 65535 or even more, but might be limited by the user), see: incl_len vs. orig_len below
network: link-layer header type, specifying the type of headers at the beginning of the packet (e.g. 1 for Ethernet, see tcpdump.org's link-layer header types page for details); this can be various types such as 802.11, 802.11 with various radio information, PPP, Token Ring, FDDI, etc.
/!\ Note: if you need a new encapsulation type for libpcap files (the value for the network field), do NOT use ANY of the existing values! I.e., do NOT add a new encapsulation type by changing an existing entry; leave the existing entries alone. Instead, send mail to tcpdump-workers#lists.tcpdump.org , asking for a new link-layer header type value, and specifying the purpose of the new value.
The first integer in the file should be either 0xA1B2C3D4 or 0xD4C3B2A1, but my code's output:
#include <stdio.h>
typedef unsigned int guint32;
typedef unsigned short guint16;
int main()
{
FILE * file = fopen("test.pcap", "rb");
guint32 magic_number;
fscanf(file, "%d", &magic_number);
printf("%x\n", magic_number);
return 0;
}
Is 0x8. Why is that?

The magic number are the first 4 bytes of the file. With fscanf(...%d you don't read these 4 bytes but you instead try to interpret the beginning of the file as the ASCII representation of a number, i.e. "1234" instead of "\x01\x02\x03\x04". Thus instead of fscanf you need to use fread to read exactly 4 bytes.
fread((void*)&magic_number, 4, 1, file)

Related

google protobuf - how to access size of buffer as specified in .proto file?

My protobuf file is
message Msg{
// User Authentication data as bytes.
bytes MsgData = 1 [(nanopb).max_size = 2048];
}
When I generate the C API, the relevant parts are:
#define PB_BYTES_ARRAY_T(n) struct { pb_size_t size; pb_byte_t bytes[n]; }
/* Struct definitions */
typedef PB_BYTES_ARRAY_T(2048) Msg_AuthenticationData_t;
/* Maximum encoded size of messages (where known) */
#define Msg_size 2051
QUESTION - 1
The size of the buffer, is 2048; how do I access that value using the API?
I know I can do sizeof(Msg_AuthenticationData_t.bytes) because it's set at 2048 at compile time but seems like there should be an API.
QUESTION - 2
And what is with the 2051? How is that usable?

sizeof(Msg_AuthenticationData_t.bytes) is the API to get the size of the array. That's the way it works for any C type.
The encoded message size is the maximum size of the output of pb_encode(). Because of the field type and length headers in protobuf encoding, it can be a few bytes larger than the payload.

XErrorEvent structure field meaning

I'm currently having some issues with Xlib and CEF and I need to investigate the XErrorEvent that is sent to the function registered with XSetErrorHandler.
typedef struct {
int type;
Display *display; /* Display the event was read from */
XID resourceid; /* resource id */
unsigned long serial; /* serial number of failed request */
unsigned char error_code; /* error code of failed request */
unsigned char request_code; /* Major op-code of failed request */
unsigned char minor_code; /* Minor op-code of failed request */
} XErrorEvent;
I would like to know the meaning of the type, request_code, and minor_code fields. There is a book on C language interface for the X window system but I couldn't find anything about this field.

type is what identifies a typeless memory pointer as pointer to an XErrorEvent - its value is always X_Error.
request_code is a protocol request of the procedure that failed, as defined in X11/Xproto.h, basically what kind of request generated the error (line 2020 and forward):
/* Request codes */
#define X_CreateWindow 1
#define X_ChangeWindowAttributes 2
#define X_GetWindowAttributes 3
#define X_DestroyWindow 4
#define X_DestroySubwindows 5
#define X_ChangeSaveSet 6
#define X_ReparentWindow 7
#define X_MapWindow 8
...
minor_code is similar to request_code except being used by extensions. Each extension gets its own request_code in the range 128-255. The minor_code identifies a particular request defined by that extension. So, X11 support up to 127 extensions, and each extension may define up to 255 requests. The exact paragraph:
Each extension is assigned a single opcode from that range, also known
as it's “major opcode.” For each operation provided by that extension,
typically a second byte is used as a “minor opcode.” Minor opcodes for
each extension are defined by the extension.

Reading a FAT16 file system

I am trying to read a FAT16 file system to gain information about it like number of sectors, clusters, bytespersector etc...
I am trying to read it like this:
FILE *floppy;
unsigned char bootDisk[512];
floppy = fopen(name, "r");
fread(bootDisk, 1, 512, floppy);
int i;
for (i = 0; i < 80; i++){
printf("%u,",bootDisk[i]);
}
and it outputs this:
235,60,144,109,107,100,111,115,102,115,0,0,2,1,1,0,2,224,0,64,11,240,9,0,18,0,2,0,0,0,0,0,0,0,0,0,0,0,41,140,41,7,68,32,32,32,32,32,32,32,32,32,32,32,70,65,84,49,50,32,32,32,14,31,190,91,124,172,34,192,116,11,86,180,14,187,7,0,205,16,
What do these numbers represent and what type are they? Bytes?

You are not reading the values properly. Most of them are longer than 1 byte.
From the spec you can obtain the length and meaning of every attributes in the boot sector:
Offset Size (bytes) Description
0000h 3 Code to jump to the bootstrap code.
0003h 8 Oem ID - Name of the formatting OS
000Bh 2 Bytes per Sector
000Dh 1 Sectors per Cluster - Usual there is 512 bytes per sector.
000Eh 2 Reserved sectors from the start of the volume.
0010h 1 Number of FAT copies - Usual 2 copies are used to prevent data loss.
0011h 2 Number of possible root entries - 512 entries are recommended.
0013h 2 Small number of sectors - Used when volume size is less than 32 Mb.
0015h 1 Media Descriptor
0016h 2 Sectors per FAT
0018h 2 Sectors per Track
001Ah 2 Number of Heads
001Ch 4 Hidden Sectors
0020h 4 Large number of sectors - Used when volume size is greater than 32 Mb.
0024h 1 Drive Number - Used by some bootstrap code, fx. MS-DOS.
0025h 1 Reserved - Is used by Windows NT to decide if it shall check disk integrity.
0026h 1 Extended Boot Signature - Indicates that the next three fields are available.
0027h 4 Volume Serial Number
002Bh 11 Volume Label - Should be the same as in the root directory.
0036h 8 File System Type - The string should be 'FAT16 '
003Eh 448 Bootstrap code - May schrink in the future.
01FEh 2 Boot sector signature - This is the AA55h signature
You should probably use a custom struct to read the boot sector.
Like:
typedef struct {
unsigned char jmp[3];
char oem[8];
unsigned short sector_size;
unsigned char sectors_per_cluster;
unsigned short reserved_sectors;
unsigned char number_of_fats;
unsigned short root_dir_entries;
[...]
} my_boot_sector;
Keep in mind your endianness and padding rules in your implementation. This struct is an example only.
If you need more details this is a thorough example.

libpcap Radiotap header extraction

I've got some code that is using the functions ieee80211_radiotap_iterator_init() and ieee80211_radiotap_iterator_next() from radiotap-parser.c,
I'm not sure what I'm doing incorrectly, perhaps someone can educate me? I'm using the sample code from the documentation more or less without modification, it fits very well to what I was trying to achieve:
/* where packet is `const u_char *packet' */
struct ieee80211_radiotap_iterator rti;
struct ieee80211_radiotap_header *rth = ( struct ieee80211_radiotap_header * ) packet;
/* 802.11 frame starts here */
struct wi_frame *fr= ( struct wi_frame * ) ( packet + rth->it_len );
/* Set up the iteration */
int ret = ieee80211_radiotap_iterator_init(&rti, rth, rth->it_len);
/* Loop until we've consumed all the fields */
while(!ret) {
printf("Itteration: %d\n", count++);
ret = ieee80211_radiotap_iterator_next(&rti);
if(ret) {
/*
* The problem is here, I'm dropping into this clause with
* a value of `1` consistently, that doesn't match my platform's
* definition of EINVAL or ENOENT.
*/
continue;
}
switch(rti.this_arg_index) {
default:
printf("Constant: %d\n", *rti.this_arg);
break;
}
}
There's limited scope for having screwed something up in that code, I think, I'm confused by the 1 being returned from ieee80211_radiotap_iterator_next() which according to the implenentation doesn't seem like an error condition in the implementation.
I'm filtering for packets of type "(type mgt) and (not type mgt subtype beacon)", and I'm not even certain if libpcap will include these attributes when the data link is set to DLT_IEEE802_11_RADIO?

First:
I'm filtering for packets of type "(type mgt) and (not type mgt subtype beacon)", and I'm not even certain if libpcap will include these attributes when the data link is set to DLT_IEEE802_11_RADIO?
It will. The filter code generated for DLT_IEEE802_11_RADIO fetches the radiotap header length and skips the radiotap header, so it'll just skip the radiotap header and check the 802.11 header following it.
Second:
the implementation
You linked to two different implementations of ieee80211_radiotap_iterator_next() - the one at radiotap-parser.c, in which ieee80211_radiotap_iterator_next() returns the "next present arg index" on success, and the one from the Linux kernel, in which ieee80211_radiotap_iterator_next() returns 0 on success. If the radiotap iterator code you're using is the one at radiotap-parser.c, pay no attention whatsoever to the one from the Linux kernel, as it doesn't behave the way the one you're using behaves.

Getting width and height from jpeg image file

I wrote this function to given filename(a jpeg file) shall print its size in pixels, w and h. According to tutorial that I'm reading,
//0xFFC0 is the "Start of frame" marker which contains the file size
//The structure of the 0xFFC0 block is quite simple [0xFFC0][ushort
length][uchar precision][ushort x][ushort y]
So, I wrote this struct
#pragma pack(1)
struct imagesize {
unsigned short len; /* 2-bytes */
unsigned char c; /* 1-byte */
unsigned short x; /* 2-bytes */
unsigned short y; /* 2-bytes */
}; //sizeof(struct imagesize) == 7
#pragma pack()
and then:
#define SOF 0xC0 /* start of frame */
void jpeg_test(const char *filename)
{
FILE *fh;
unsigned char buf[4];
unsigned char b;
fh = fopen(filename, "rb");
if(fh == NULL)
fprintf(stderr, "cannot open '%s' file\n", filename);
while(!feof(fh)) {
b = fgetc(fh);
if(b == SOF) {
struct imagesize img;
#if 1
ungetc(b, fh);
fread(&img, 1, sizeof(struct imagesize), fh);
#else
fread(buf, 1, sizeof(buf), fh);
int w = (buf[0] << 8) + buf[1];
int h = (buf[2] << 8) + buf[3];
img.x = w;
img.y = h;
#endif
printf("%dx%d\n",
img.x,
img.y);
break;
}
}
fclose(fh);
}
But I'm getting 520x537 instead of 700x537, that's the real size.
Can someone point and explain where I'm wrong?

A JPEG file consists of a number of sections. Each section starts with 0xff, followed by 1-byte section identifier, followed by number of data bytes in the section (in 2 bytes), followed by the data bytes. The sequence 0xffc0, or any other 0xff-- two-byte sequence, inside the data byte sequence, has no significance and does not mark a start of a section.
As an exception, the very first section does not contain any data or length.
You have to read each section header in turn, parse the length, then skip corresponding number of bytes before starting to read next section. You cannot just search for 0xffc0, let alone just 0xc0, without regard to the section structure.
Source.

There are several issues to consider, depending on how "universal" you want your program to be. First, I recommend using libjpeg. A good JPEG parser can be a bit gory, and this library does a lot of the heavy lifting for you.
Next, to clarify n.m.'s statement, you have no guarantee that the first 0xFFCO pair is the SOF of interest. I've found that modern digital cameras like to load up the JPEG header with a number of APP0 and APP1 blocks, which can mean that the first SOF marker you encounter during a sequential read may actually be the image thumbnail. This thumbnail is usually stored in JPEG format (as far as I have observed, anyway) and is thus equipped with its own SOF marker. Some cameras and/or image editing software can include an image preview that is larger than a thumbnail (but smaller than the actual image). This preview image is usually JPEG and again has it's own SOF marker. It's not unusual for the image SOF marker to be the last one.
Most (all?) modern digital cameras also encode the image attributes in the EXIF tags. Depending upon your application requirements, this might be the most straightforward, unambiguous way to obtain the image size. The EXIF standard document will tell you all you need to know about writing an EXIF parser. (libExif is available, but it never fit my applications.) Regardless, if you roll your own EXIF or rely on a library, there are some good tools for inspecting EXIF data. jhead is very good tool, and I've also had good luck with ExifTool.
Lastly, pay attention to endianess. SOF and other standard JPEG markers are big-endian, but EXIF markers may vary.

As you mention, the spec states that the marker is 0xFFC0. But it seems that you only ever look for a single byte with the code if (b==SOF)
If you open the file up with a hex editor, and search for 0xFFC0 you'll find the marker. Now as long as the first 0xC0 in the file is the marker, your code will work. If it's not though, you get all sorts of undefined behaviour.
I'd be inclined to read the whole file first. It's a jpg right, how big could it be? (thought this is important if on an embedded system) Then just step through it looking for the first char of my marker. When found, I'd use a memcmp to see if the next 3bytes mathed the rest of the sig.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight