Weird error when using fread with struct - c

So, I was trying to read a BMP image using fread and structs. I created the following struct to read the header:
struct head{
char sigBM[2];//This will get the 'B' and 'M' chars
int fileSize;
int reserved;
int offset;
...
};
And in the main function I used
fread(pointerToStruct,sizeof(struct head),1,image);
And I just got some weird results. But then I decided to take the char sigBM[2] from the struct and read it with a different fread. Something like:
char sigBM[2];
struct head *p = malloc(sizeof(struct head)); /* struct head now without the char sigBM[2] */
fread(sigBM,sizeof(char),2,image);
fread(p,sizeof(struct head),1,image);
And it worked!
I already got it working, I just want to know why it worked like that

Your data seem to be written to disk without padding. That is, the integer fileSize comes directly after the two chars. This is normally not how structs are laid out in memory.
The size of
struct head{
char sigBM[2];//This will get the 'B' and 'M' chars
// two padding bytes hide here
int fileSize;
};
is 8 on my machine, not 2+4 as you might expect. If you read and write with the same compiler options on the same platform, you can expect the struct to be read in correctly. If not, you need to control details like these yourself.
Most architectures require (or prefer) that numeric types start at an address that is a multiple of some power of two (for example, the size of the type itself).
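The padding described above can be measured rather than guessed. The following is a minimal sketch using offsetof and sizeof on the two-field struct from this answer; the helper names head_padding and print_head_layout are made up for illustration, and the numbers printed depend on the compiler and platform.

```c
#include <stddef.h>
#include <stdio.h>

struct head {
    char sigBM[2];   /* 'B' and 'M' */
    int  fileSize;   /* usually preceded by padding so that it is aligned */
};

/* Number of padding bytes the compiler placed between sigBM and fileSize. */
size_t head_padding(void)
{
    return offsetof(struct head, fileSize) - sizeof(((struct head *)0)->sigBM);
}

/* Print the layout the current compiler chose for struct head. */
void print_head_layout(void)
{
    printf("offsetof(fileSize)  = %zu\n", offsetof(struct head, fileSize));
    printf("sizeof(struct head) = %zu\n", sizeof(struct head));
    printf("padding after sigBM = %zu\n", head_padding());
}
```

Calling print_head_layout() on the target platform shows whether the in-memory layout can match the on-disk layout at all.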

In a 16-bit int, 32-bit long world with
struct head{
char sigBM[2];
long fileSize;
long reserved;
long offset;
...
};
all works well. But trying to read into a struct that has padding (extra space between fields) causes OP's issue.
A solution is to read each field one at a time, like OP's approach. Another approach is to use a compiler-specific option or keyword to "pack" the structure.
In general, because the size of int varies between C implementations, it is better to think of these fields as int8_t, int16_t, int32_t rather than char, int, etc.
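Reading field by field with fixed-width types could look like the sketch below. It assumes the 14-byte BMP file header (2 signature bytes followed by three 32-bit fields, treating the two 16-bit reserved words as one field as the question does) and relies on BMP storing integers in little-endian order. The names bmp_file_header, le32, parse_bmp_file_header and read_bmp_file_header are invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

struct bmp_file_header {
    char     sigBM[2];
    uint32_t fileSize;
    uint32_t reserved;
    uint32_t offset;
};

/* Decode a 32-bit little-endian value from 4 raw bytes. */
uint32_t le32(const unsigned char *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Fill the struct from the 14 raw header bytes, one field at a time. */
void parse_bmp_file_header(const unsigned char raw[14], struct bmp_file_header *h)
{
    h->sigBM[0] = (char)raw[0];
    h->sigBM[1] = (char)raw[1];
    h->fileSize = le32(raw + 2);
    h->reserved = le32(raw + 6);
    h->offset   = le32(raw + 10);
}

/* Read the header from an open file; returns 0 on success, -1 on a short read. */
int read_bmp_file_header(FILE *image, struct bmp_file_header *h)
{
    unsigned char raw[14];
    if (fread(raw, 1, sizeof raw, image) != sizeof raw)
        return -1;
    parse_bmp_file_header(raw, h);
    return 0;
}
```

Because the struct is filled member by member, its in-memory padding no longer matters, and the same code works regardless of the host's int size or byte order.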

The OP has a workable idea, but it needs some extension.
Several fields within a .bmp file do not have fixed offsets from the start of the file, so a single struct will not properly handle the whole file.
Of critical interest is the number of 'special' color entries, as that changes the offsets for all the rest of the file.
The actual image, depending on the pixel width and the image width, can have from 0 to 3 bytes of filler at the end of each pixel line.
Certain fields in the second section of the file are optional, and some fields within the first section contain offsets into the file for other key areas.
In general, it is best to treat the file as a long string of bytes and use offsets (and field lengths) to access specific fields.
This has been demonstrated, with full C code, in several answers on stackoverflow.com; I suggest performing a search.
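Two of the points above lend themselves to a short sketch: computing the per-row filler, and reading a field at a byte offset. This assumes the standard BMP rule that each pixel row is padded to a multiple of 4 bytes and that multi-byte fields are little-endian. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one stored pixel row, rounded up to a multiple of 4.
   Assumes width_px * bits_per_pixel does not overflow 32 bits. */
uint32_t bmp_row_stride(uint32_t width_px, uint32_t bits_per_pixel)
{
    return ((width_px * bits_per_pixel + 31u) / 32u) * 4u;
}

/* Filler bytes at the end of each pixel row (0 to 3). */
uint32_t bmp_row_padding(uint32_t width_px, uint32_t bits_per_pixel)
{
    uint32_t used = (width_px * bits_per_pixel + 7u) / 8u;
    return bmp_row_stride(width_px, bits_per_pixel) - used;
}

/* Read a little-endian 32-bit field at byte offset off in the file image. */
uint32_t get_le32(const unsigned char *buf, size_t off)
{
    return (uint32_t)buf[off] | ((uint32_t)buf[off + 1] << 8) |
           ((uint32_t)buf[off + 2] << 16) | ((uint32_t)buf[off + 3] << 24);
}
```

For example, with the whole file loaded into a byte array, get_le32(file, 10) would fetch the pixel-data offset stored in the file header.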

Related

How to create a C struct with specific size to send over socket to DalmatinerDB?

I'm trying to create a C client for dalmatinerdb but am having trouble understanding how to combine the variables, write them to a buffer and send it to the database. The fact that dalmatinerdb is written in Erlang makes it more difficult. However, by looking at a Python client for dalmatinerdb I have (probably) found the necessary variable sizes and order.
The erlang client has a function called "encode", see below:
encode({stream, Bucket, Delay}) when
is_binary(Bucket), byte_size(Bucket) > 0,
is_integer(Delay), Delay > 0, Delay < 256->
<<?STREAM,
Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>;
According to the official dalmatinerdb protocol we can see the following:
-define(STREAM, 4).
-define(DELAY_SIZE, 8). % bits
-define(BUCKET_SS, 8). % bits
Let's say I would like to create this kind of structure in C.
Would it look something like the following?
struct package {
unsigned char[1] mode; // = "4"
unsigned char[1] delay; // = for example "5"
unsigned char[1] bucketNameSize; // = "5"
unsigned char[1] bucketName; // for example "Test1"
};
Update:
I realized that the dalmatinerdb frontend (web interface) only reacts and updates when values have been sent to the bucket. In other words, just sending the first struct won't give me any clue whether it's right or wrong. Therefore I will try to create a second struct with the actual values.
The Erlang code snippet which encodes values looks like this:
encode({stream, Metric, Time, Points}) when
is_binary(Metric), byte_size(Metric) > 0,
is_binary(Points), byte_size(Points) rem ?DATA_SIZE == 0,
is_integer(Time), Time >= 0->
<<?SENTRY,
Time:?TIME_SIZE/?SIZE_TYPE,
(byte_size(Metric)):?METRIC_SS/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?DATA_SS/?SIZE_TYPE, Points/binary>>;
The different sizes:
-define(SENTRY, 5)
-define(TIME_SIZE, 64)
-define(METRIC_SS, 16)
-define(DATA_SS, 32)
Which gives me:
<<5,
Time:64/?SIZE_TYPE,
(byte_size(Metric)):16/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):32/?SIZE_TYPE, Points/binary>>;
My guess is that my struct containing a value should look like this:
struct Package {
unsigned char sentry;
uint64_t time;
unsigned char metricSize;
uint16_t metric;
unsigned char pointSize;
uint32_t point;
};
Any comments on this structure?
The binary created by the encode function has this form:
<<?STREAM, Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>
First let's replace all the preprocessor macros with their actual values:
<<4, Delay:8/unsigned-integer,
(byte_size(Bucket)):8/unsigned-integer, Bucket/binary>>
Now we can more easily see that this binary contains:
a byte of value 4
the value of Delay as a byte
the size of the Bucket binary as a byte
the value of the Bucket binary
Because of the Bucket binary at the end, the overall binary is variable-sized.
A C99 struct that resembles this value can be defined as follows:
struct EncodedStream {
unsigned char mode;
unsigned char delay;
unsigned char bucket_size;
unsigned char bucket[];
};
This approach uses a C99 flexible array member for the bucket field, since its actual size depends on the value set in the bucket_size field, and you are presumably using this structure by allocating memory large enough to hold the fixed-size fields together with the variable-sized bucket field, where bucket itself is allocated to hold bucket_size bytes. You could also replace all uses of unsigned char with uint8_t if you #include <stdint.h>. In traditional C, bucket would be defined as a 0- or 1-sized array.
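Allocating and filling such a struct might look like the following sketch. The helper make_stream is hypothetical; the constant 4 is the ?STREAM value quoted in the question, and the checks mirror the guards in the Erlang encode clause (non-empty bucket, delay between 1 and 255).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct EncodedStream {
    unsigned char mode;
    unsigned char delay;
    unsigned char bucket_size;
    unsigned char bucket[];      /* C99 flexible array member */
};

/* Hypothetical helper: builds a stream message that the caller must free().
   Returns NULL if the name is empty or longer than 255 bytes, if delay is 0,
   or if allocation fails. On success *total_len is the byte count to send. */
struct EncodedStream *make_stream(unsigned char delay, const char *bucket_name,
                                  size_t *total_len)
{
    size_t n = strlen(bucket_name);
    if (n == 0 || n > 255 || delay == 0)
        return NULL;

    struct EncodedStream *m = malloc(sizeof *m + n);
    if (m == NULL)
        return NULL;

    m->mode = 4;                          /* ?STREAM */
    m->delay = delay;
    m->bucket_size = (unsigned char)n;
    memcpy(m->bucket, bucket_name, n);    /* no terminating NUL on the wire */
    *total_len = offsetof(struct EncodedStream, bucket) + n;
    return m;
}
```

Since every fixed field is a single byte, this struct needs neither padding nor byte swapping, which is why it can be sent as-is.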
Update: the OP extended the question with another struct, so I've extended my answer below to cover it too.
The obvious-but-wrong way to write a struct corresponding to the metric/time/points binary is:
struct Wrong {
unsigned char sentry;
uint64_t time;
uint16_t metric_size;
unsigned char metric[];
uint32_t points_size;
unsigned char points[];
};
There are two problems with the Wrong struct:
Padding and alignment: Normally, fields are aligned on natural boundaries corresponding to their sizes. Here, the C compiler will align the time field on an 8-byte boundary, which means there will be padding of 7 bytes following the sentry field. But the Erlang binary contains no such padding.
Illegal flexible array field in the middle: The metric field size can vary, but we can't use the flexible array approach for it as we did in the earlier example because such arrays can only be used for the final field of a struct. The fact that the size of metric can vary means that it's impossible to write a single C struct that matches the Erlang binary.
Solving the padding and alignment issue requires using a packed struct, which you can achieve with compiler support such as the gcc and clang __packed__ attribute (other compilers might have other ways of achieving this). The variable-sized metric field in the middle of the struct can be solved by using two structs instead:
typedef struct __attribute((__packed__)) {
unsigned char sentry;
uint64_t time;
uint16_t size;
unsigned char metric[];
} Metric;
typedef struct __attribute((__packed__)) {
uint32_t size;
unsigned char points[];
} Points;
Packing both structs means their layouts will match the layouts of the corresponding data in the Erlang binary.
There's still a remaining problem, though: endianness. By default, fields in an Erlang binary are big-endian. If you happen to be running your C code on a big-endian machine, then things will just work, but if not — and it's likely you're not — the data values your C code reads and writes won't match Erlang.
Fortunately, endianness is easily handled: you can use byte swapping to write C code that can portably read and write big-endian data regardless of the endianness of the host.
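One portable way to do that conversion, sketched here without relying on platform headers such as those providing htons or htobe64, is to build the big-endian byte sequence with shifts and copy it back into an integer. The names to_be16, to_be32, to_be64 and from_be32 are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Return a value whose in-memory bytes are v in big-endian order. */
uint16_t to_be16(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
    uint16_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

uint32_t to_be32(uint32_t v)
{
    unsigned char b[4];
    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char)(v >> (24 - 8 * i));
    uint32_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

uint64_t to_be64(uint64_t v)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    uint64_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

/* Inverse direction: interpret the in-memory bytes of v as big-endian. */
uint32_t from_be32(uint32_t v)
{
    unsigned char b[4];
    memcpy(b, &v, sizeof b);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

On a big-endian host these functions return their argument unchanged; on a little-endian host they swap the bytes, so the calling code does not need to know which kind of machine it runs on.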
To use the two structs together, you'd first have to allocate enough memory to hold both structs and both the metric and the points variable-length fields. Cast the pointer to the allocated memory — let's call it p — to a Metric*, then use the Metric pointer to store appropriate values in the struct fields. Just make sure you convert the time and size values to big-endian as you store them. You can then calculate a pointer to where the Points struct is in the allocated memory as shown below, assuming p is a pointer to char or unsigned char:
Points* points = (Points*)(p + sizeof(Metric) + <length of Metric.metric>);
Note that you can't just use the size field of your Metric instance for the final addend here since you stored its value as big-endian. Then, once you fill in the fields of the Points struct, again being sure to store the size value as big-endian, you can send p over to Erlang, where it should match what the Erlang system expects.
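As an alternative to the packed-struct approach, the same message can be written byte by byte into a plain buffer, which sidesteps packing, alignment and host endianness altogether. This is a sketch of that swapped-in technique, not the layout code from the answer. The function encode_metric is hypothetical, the constant 5 is the ?SENTRY value quoted in the question, and the Erlang guard on ?DATA_SIZE is not checked because its value is not given here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the low nbytes bytes of v (nbytes <= 8) in big-endian order. */
static void put_be(unsigned char *out, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(v >> (8 * (nbytes - 1 - i)));
}

/* Encode <<5, Time:64, MetricLen:16, Metric, PointsLen:32, Points>>.
   Returns the number of bytes written, or 0 if the metric is empty
   or the output buffer is too small. */
size_t encode_metric(unsigned char *out, size_t cap, uint64_t time,
                     const void *metric, uint16_t metric_len,
                     const void *points, uint32_t points_len)
{
    size_t need = 1 + 8 + 2 + (size_t)metric_len + 4 + (size_t)points_len;
    if (metric_len == 0 || need > cap)
        return 0;

    unsigned char *p = out;
    *p++ = 5;                                   /* ?SENTRY */
    put_be(p, time, 8);        p += 8;
    put_be(p, metric_len, 2);  p += 2;
    memcpy(p, metric, metric_len);
    p += metric_len;
    put_be(p, points_len, 4);  p += 4;
    if (points_len != 0)
        memcpy(p, points, points_len);
    return need;
}
```

The returned length is what would be passed to send() together with the buffer.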

Data Structure inside a Union (C Programming)

Anytime structures are thrown inside other structures I just get confused for some reason. I'm writing a driver for an I2C (2-wire Serial Interface) device and I'm using the manufacturer's drivers as a reference for creating mine. I have the union statement below (which is defined in a header file) and I just can't understand a few lines inside it. As brief background: the main snippet below sets up the TWI_statusReg variable, which holds the information from a status register every time I'm transmitting or receiving data across the I2C bus. This register is 8 bits long and belongs to an Atmel ATmega328P microcontroller. Here are my questions...
1.) It's hard to formulate this question in words, but can you explain in easy terms why you would declare a struct inside a union like this? What key points should I pick out from this?
2.) In the ".c" header definition file which is too long to post here, there is a single line that says the following
TWI_statusReg.all = 0;
I know there is a char variable in the header file called 'all' as seen in the main snippet of code below. However, I'm not understanding what happens when it gets assigned a zero. Is this setting all the bits in the status register to zero?
3.) The two lines
unsigned char lastTransOK:1;
unsigned char unusedBits:7;
are confusing to me, specifically what the colon operator is doing.
The main snippet of CODE
/****************************************************************************
Global definitions
****************************************************************************/
union TWI_statusReg // Status byte holding flags.
{
unsigned char all;
struct
{
unsigned char lastTransOK:1;
unsigned char unusedBits:7;
};
};
extern union TWI_statusReg TWI_statusReg;
1) The main reason for writing such a union is convenience. Instead of manually applying bit masks every time you need to access specific bits, you now have named aliases for those bits.
2) Unions let you refer to memory as if its components were different variables representing different types. Unions only allocate space for the biggest component inside them. So if you have
union Example {
char bytes[3];
uint32_t num;
};
such a union would take 4 bytes, since its biggest type uint32_t takes 4 bytes of space. It would probably make more sense to have a union like this though, since you're using that space anyway and it's more convenient:
union Example {
char bytes[4];
uint32_t num;
};
bytes array will let you access individual bytes of num.
Your guess is correct - writing value to all will set the corresponding bits of the union.
3) This construct is called a bit field, and is an optimization of memory usage - if you were to use a struct of 2 chars it would actually take 2 bytes of memory space, instead if you declare a bit field it will only take 1 byte (and you still have 6 more "unused" bits)
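The three points can be seen together in a small sketch built on the union from the question. The helper names status_clear and status_set_ok are invented here. Note two assumptions: the unnamed inner struct requires C11 or a compiler extension, and which physical bit lastTransOK occupies within all is implementation-defined, so the code only relies on it being some bit of that byte.

```c
union TWI_statusReg {
    unsigned char all;                  /* whole-byte view */
    struct {
        unsigned char lastTransOK : 1;  /* one-bit flag */
        unsigned char unusedBits  : 7;  /* remaining seven bits */
    };                                  /* anonymous struct: shares storage with all */
};

/* Clear every flag at once through the whole-byte view. */
void status_clear(union TWI_statusReg *s)
{
    s->all = 0;
}

/* Set or clear a single flag through the bit-field view. */
void status_set_ok(union TWI_statusReg *s, int ok)
{
    s->lastTransOK = ok ? 1 : 0;
}
```

So TWI_statusReg.all = 0 resets lastTransOK and unusedBits in a single store, while TWI_statusReg.lastTransOK = 1 changes one bit without an explicit mask.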

Packing a C Struct [duplicate]

I am porting an application to an ARM platform in C, the application also runs on an x86 processor, and must be backward compatible.
I am now having some issues with variable alignment. I have read the gcc manual for
__attribute__((aligned(4),packed)) and I interpret it as saying that the start of the struct is aligned to a 4-byte boundary and the inside remains untouched because of the packed attribute.
Originally I had this, but occasionally it gets placed off a 4-byte boundary.
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((packed)) CHALLENGE;
So I changed it to this.
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((aligned(4),packed)) CHALLENGE;
The understanding I stated earlier seems to be incorrect: both the struct and the data inside it are now aligned to a 4-byte boundary but, because of the endianness, the size of the struct has increased from 42 to 44 bytes. This size is critical as we have other applications that depend on the struct being 42 bytes.
Could someone describe to me how to perform the operation that I require? Any help is much appreciated.
If you're depending on sizeof(yourstruct) being 42 bytes, you're about to be bitten by a world of non-portable assumptions. You haven't said what this is for, but it seems likely that the endianness of the struct contents matters as well, so you may also have a mismatch with the x86 there too.
In this situation I think the only sure-fire way to cope is to use unsigned char[42] in the parts where it matters. Start by writing a precise specification of exactly what fields are where in this 42-byte block, and what endian, then use that definition to write some code to translate between that and a struct you can interact with. The code will likely be either all-at-once serialisation code (aka marshalling), or a bunch of getters and setters.
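A sketch of such all-at-once serialisation code follows. It covers only the six fields listed in the question, which add up to 34 bytes with 32-bit ints; the question's actual 42-byte layout is not shown, so the offsets here are illustrative. Little-endian byte order is an assumption (chosen because the format originated on x86), and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define CHALLENGE_WIRE_SIZE 34   /* 4*4 + 16 + 2 for the fields listed */

typedef struct {
    uint32_t code, length, seq, request;
    unsigned char nonce[16];
    uint16_t crc;
} CHALLENGE;                      /* ordinary struct: no packing needed */

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;         p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16); p[3] = (unsigned char)(v >> 24);
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Struct to wire format (assumed little-endian). */
void challenge_pack(const CHALLENGE *c, unsigned char out[CHALLENGE_WIRE_SIZE])
{
    put_le32(out + 0,  c->code);
    put_le32(out + 4,  c->length);
    put_le32(out + 8,  c->seq);
    put_le32(out + 12, c->request);
    memcpy(out + 16, c->nonce, 16);
    out[32] = (unsigned char)c->crc;
    out[33] = (unsigned char)(c->crc >> 8);
}

/* Wire format back to struct. */
void challenge_unpack(CHALLENGE *c, const unsigned char in[CHALLENGE_WIRE_SIZE])
{
    c->code    = get_le32(in + 0);
    c->length  = get_le32(in + 4);
    c->seq     = get_le32(in + 8);
    c->request = get_le32(in + 12);
    memcpy(c->nonce, in + 16, 16);
    c->crc = (uint16_t)(in[32] | (in[33] << 8));
}
```

With this split, the in-memory struct can have whatever alignment the compiler prefers on ARM or x86, and only the byte array has a fixed size.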
This is one reason why reading whole structs instead of memberwise fails, and should be avoided.
In this case, packing plus aligning at 4 means there will be two bytes of padding. This happens because the size must be compatible for storing the type in an array with all items still aligned at 4.
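The two bytes of tail padding can be observed directly. This is a sketch for GCC or Clang (the attribute syntax is compiler-specific), using fixed-width types so that the six fields from the question total 34 bytes; the type names CHALLENGE_PACKED and CHALLENGE_ALIGNED are made up to show the two variants side by side.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t code, length, seq, request;
    unsigned char nonce[16];
    uint16_t crc;
} __attribute__((packed)) CHALLENGE_PACKED;

typedef struct {
    uint32_t code, length, seq, request;
    unsigned char nonce[16];
    uint16_t crc;
} __attribute__((aligned(4), packed)) CHALLENGE_ALIGNED;

/* Bytes added at the end of the struct by aligned(4). */
size_t challenge_tail_padding(void)
{
    return sizeof(CHALLENGE_ALIGNED) - sizeof(CHALLENGE_PACKED);
}
```

The member offsets are identical in both variants; only sizeof changes, because the size of a type must be a multiple of its alignment.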
I imagine you have something like:
read(fd, &obj, sizeof obj);
Because you don't want to read those 2 padding bytes which belong to different data, you have to specify the size explicitly:
read(fd, &obj, 42);
Which you can keep maintainable:
typedef struct {
//...
enum { read_size = 42 };
} __attribute__((aligned(4),packed)) CHALLENGE;
// ...
read(fd, &obj, obj.read_size);
Or, if you can't use some features of C++ in your C:
typedef struct {
//...
} __attribute__((aligned(4),packed)) CHALLENGE;
enum { CHALLENGE_read_size = 42 };
// ...
read(fd, &obj, CHALLENGE_read_size);
At the next refactoring opportunity, I would strongly suggest you start reading each member individually, which can easily be encapsulated within a function.
I've been moving structures back and forth from Linux, Windows, Mac, C, Swift, Assembly, etc.
The problem is NOT that it can't be done, the problem is that you can't be lazy and must understand your tools.
I don't see why you can't use:
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((packed)) CHALLENGE;
You can use it and it doesn't require any special or clever code. I write a LOT of code that communicates to ARM. Structures are what make things work. __attribute__ ((packed)) is my friend.
The odds of being in a "world of hurt" are nil if you understand what is going on with both.
Finally, I can't for the life of me make out how you get 42 or 44. An int is either 4 or 8 bytes (depending on the compiler). That puts the number at either 16+16+2=34 or 32+16+2=50, assuming it is truly packed.
As I say, knowing your tools is part of your problem.
What is your true goal?
If it's to deal with data that's in a file or on the wire in a particular format what you should do is write up some marshaling/serialization routines that move the data between the compiler struct that represents how you want to deal with the data inside the program and a char array that deals with how the data looks on the wire/file.
Then all that needs to be dealt with carefully and possibly have platform specific code is the marshaling routines. And you can write some nice-n-nasty unit tests to ensure that the marshaled data gets to and from the struct properly no matter what platform you might have to port to today and in the future.
I would guess that the problem is that 42 isn't divisible by 4, and so they get out of alignment if you put several of these structs back to back (e.g. allocate memory for several of them, determining the size with sizeof). Having the size as 44 forces the alignment in these cases as you requested. However, if the internal offset of each struct member remains the same, you can treat the 44 byte struct as though it was 42 bytes (as long as you take care to align any following data at the correct boundary).
One trick to try might be putting both of these structs inside a single union type and only use 42-byte version from within each such union.
As I am using Linux, I have found that running echo 3 > /proc/cpu/alignment makes the kernel issue a warning and fix up the alignment fault. This is a workaround, but it is very helpful for locating where the structures are misaligned.

Copying structure to char* buffer

Basically I have a custom structure that contains different kinds of data. For example:
typedef struct example_structure{
uint8_t* example_1[4];
int example_2[4];
int example_3;
} example_structure;
What I need to do is copy the contents of this structure to a const char* buffer so I can send that copied data (buffer) using winsock2's send(SOCKET s, const char* buffer, int len, int flags) function. I tried using memcpy(), but wouldn't I just copy the addresses held in the pointers and not the data?
Yes, if you copied or sent that structure through a socket you would end up copying/sending pointers, which would obviously be meaningless to the recipient, however, if the recipient is running on different hardware (e.g. not the same endian), all of the data may be meaningless anyway. On top of that, differences in the amount of padding between structure members may also become a problem.
For non-trivial situations it is best to use an existing protocol (such as protobuf), or roll your own protocol, keeping in mind the potential differences in hardware representation of your data.
You need to design a protocol before you can encode the data in accord with that protocol. Decide exactly how the data will be encoded at the byte level. Then write code to encode and decode to that format that you decided on.
Do not skip the step of actually documenting the wire protocol at the byte level. It will save you pain later, I promise.
See this answer for a bit more detail.
const char* buffer
This buffer points to constant data, so you can't copy anything into it through that pointer. You probably don't need to copy anything. Just call send like this, where ex_struc is a variable of type example_structure:
send(s, (const char*)&ex_struc, sizeof(ex_struc), flags);
But here is the problem with pointers in your structure (uint8_t* example_1[4];).
Sending pointers between different applications / machine does not make sense.
Hmm, your struct contains uint8_t * fields, which look like C strings... It does not make sense to copy or send a pointer, which is just a memory address in the sending process's address space.
If your struct had been (note, no pointers):
typedef struct example_structure{
uint8_t example_1[4];
int example_2[4];
int example_3;
} example_structure;
and provided you transfer it on exactly same architecture (same hardware, same compiler, same compiler options), you could do simply:
example_structure ex_struc;
// initialize the struct
...
send(s, (const char *)&ex_struc, sizeof(ex_struc), flags);
And even in that case, I would strongly advise you to define and use a protocol. As already said by @DavidSchwartz, it could save you time and headaches later...
But as you have pointers, you cannot do that and must define a protocol.
It could be (but you are free to prefer little-endian order, or 2 or 8 bytes for each int, depending on your actual data):
one byte (or two) for length of first uint8_t array, followed by the array
above repeated 3 more times
four bytes in big endian order for first int of example_2
repeated 3 more times for the remaining ints of example_2
four bytes in big endian order for int of example_3
This clearly defines the format of a message.
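An encoder for the message format listed above might be sketched as follows. The struct does not record how long each example_1 array is, so the lens parameter (one length byte per array) is an assumption added for this example, as are the names put_be32 and encode_example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct example_structure {
    uint8_t *example_1[4];
    int example_2[4];
    int example_3;
} example_structure;

/* Store v as four big-endian bytes and return the next write position. */
static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
    return p + 4;
}

/* lens[i] is the (assumed) byte length of example_1[i].
   Returns the number of bytes written, or 0 if out is too small. */
size_t encode_example(const example_structure *s, const uint8_t lens[4],
                      unsigned char *out, size_t cap)
{
    size_t need = 4 + 5 * 4;   /* four length bytes plus five 32-bit ints */
    for (int i = 0; i < 4; i++)
        need += lens[i];
    if (need > cap)
        return 0;

    unsigned char *p = out;
    for (int i = 0; i < 4; i++) {
        *p++ = lens[i];                        /* length prefix */
        if (lens[i] != 0) {
            memcpy(p, s->example_1[i], lens[i]);   /* pointed-to data, not the pointer */
            p += lens[i];
        }
    }
    for (int i = 0; i < 4; i++)
        p = put_be32(p, (uint32_t)s->example_2[i]);
    p = put_be32(p, (uint32_t)s->example_3);
    return (size_t)(p - out);
}
```

The resulting buffer and length can be passed to send() after casting the buffer to const char *; the receiver decodes the fields in the same order.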

Why does such a struct contain two array fields containing only one element?

Please Note: This question is not a duplicate of ( One element array in struct )
The following code is excerpted from the Linux kernel source (version: 3.14)
struct files_struct
{
atomic_t count;
struct fdtable __rcu *fdt;
struct fdtable fdtab;
spinlock_t file_lock ____cacheline_aligned_in_smp;
int next_fd;
unsigned long close_on_exec_init[1];
unsigned long open_fds_init[1];
struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};
I just wonder why close_on_exec_init and open_fds_init are defined as arrays containing one element, rather than just defined as unsigned long close_on_exec_init; and unsigned long open_fds_init;.
These fields are an optimization so Linux doesn't have to perform as many allocations for a typical process that has no more than BITS_PER_LONG open file descriptors.
The close_on_exec_init field provides the initial storage for fdt->close_on_exec when a files_struct is allocated. (See dup_fd in fs/file.c.)
Each bit of fdt->close_on_exec is set if the corresponding file descriptor has the “close-on-exec” flag set. Thus Linux only needs to allocate additional space for fdt->close_on_exec if the process has more open file descriptors than the number of bits in an unsigned long.
The open_fds_init field serves the same function for the fdt->open_fds field. The fd_array field serves the same function for the fdt->fd field. (Note that fd_array has a size of BITS_PER_LONG.)
The close_on_exec_init and open_fds_init fields formerly had type struct embedded_fd_set, but were changed to bare arrays in this commit. The commit message doesn't explain why the author chose to use one-element arrays instead of bare scalars. Perhaps the author (David Howells) simply wanted to avoid using the & operator.
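The inline-storage optimization this answer describes can be illustrated with a user-space analogy. The following is not kernel code: struct bitset and its functions are hypothetical, and the sketch only shows the general idea of embedding one word of initial storage and allocating from the heap once the set outgrows it.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct bitset {
    unsigned long *bits;     /* points at init[] until the set outgrows it */
    size_t nwords;
    unsigned long init[1];   /* inline storage for the common small case */
};

void bitset_init(struct bitset *b)
{
    b->init[0] = 0;
    b->bits = b->init;       /* array name used directly, no & needed */
    b->nwords = 1;
}

/* Set a bit, moving to heap storage on first overflow. Returns 0 or -1. */
int bitset_set(struct bitset *b, size_t bit)
{
    size_t word = bit / BITS_PER_WORD;
    if (word >= b->nwords) {
        unsigned long *p = calloc(word + 1, sizeof *p);
        if (p == NULL)
            return -1;
        memcpy(p, b->bits, b->nwords * sizeof *p);
        if (b->bits != b->init)
            free(b->bits);
        b->bits = p;
        b->nwords = word + 1;
    }
    b->bits[word] |= 1UL << (bit % BITS_PER_WORD);
    return 0;
}

int bitset_test(const struct bitset *b, size_t bit)
{
    size_t word = bit / BITS_PER_WORD;
    return word < b->nwords && ((b->bits[word] >> (bit % BITS_PER_WORD)) & 1UL);
}

void bitset_free(struct bitset *b)
{
    if (b->bits != b->init)
        free(b->bits);
}
```

As long as no bit index reaches BITS_PER_WORD, no allocation happens at all, which is the saving the answer attributes to the _init fields.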
My best guess: The addresses of these fields are used much more often than their actual values. In this case, making them size-1 arrays saves typing & every time their address is needed, since in C using the name of an array in an expression is in nearly all cases exactly equivalent to taking the address of its first element:
int x;
int y[1];
function_that_needs_address_of_int(&x);
function_that_needs_address_of_int(y);
function_that_needs_address_of_int(&y[0]); // Identical to previous line
(As others have pointed out in the comments, it can't be that the fields are being used as a hack for variable-length arrays, since there is more than one and they don't appear at the end of the struct.)
[EDIT: As pointed out by user3477950, an array name is not always identical to the address of its first element -- in certain contexts, like the argument to sizeof, they mean different things. (That's the only context I can think of for C; in C++, passing an array name as an argument can also enable a template parameter's type to be inferred to be a reference type.)]
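Both the convenience and the sizeof exception can be checked in a few lines. This is a small sketch; store_answer stands in for any function that needs the address of an int, and array_decay_demo is a made-up name.

```c
#include <stddef.h>

/* Stand-in for any function that needs the address of an int. */
void store_answer(int *p)
{
    *p = 42;
}

/* Returns 1 if every property discussed above holds. */
int array_decay_demo(void)
{
    int x = 0;
    int y[1] = { 0 };

    store_answer(&x);   /* scalar: the address-of operator is required */
    store_answer(y);    /* array: the name decays to &y[0] */

    /* sizeof is a context where y does not decay: it yields the array's size. */
    size_t array_size = sizeof y;
    size_t elem_ptr_size = sizeof &y[0];

    return x == 42 && y[0] == 42 && y == &y[0] &&
           array_size == sizeof(int) && elem_ptr_size == sizeof(int *);
}
```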
