fwrite'ing a struct results in mixed data being saved - c

I'm obtaining data from an accelerometer and trying to log it to a file. However, I'm a little perplexed by the output I'm getting. I'm only logging one sample to ensure the data is written correctly.
I've created the following struct to group data:
struct AccData
{
    int16_t x;
    int16_t y;
    int16_t z;
    unsigned int time;
};
The above should amount to 10 bytes in total.
I'm writing to stdout and getting the following data from the sensor:
I (15866) Accelerometer: Measurement: X25 Y252 Z48 Time: 10
The data that's stored to the sd card looks like so:
1900FC00300000000A00
Splitting those up gives us:
1900 FC00 3000 00000A00
This is where I'm starting to get confused. The first 3 groups only make sense if I reverse the order of the bytes, such that:
X:    1900     -> 0019     = 25
Y:    FC00     -> 00FC     = 252
Z:    3000     -> 0030     = 48
Time: 00000A00 -> 000A0000 = 655,360
First, this may be due to my limited C knowledge, but is it normal for the output to be swapped like above?
Additionally, I can't get the time to make sense at all. It almost looks like only 3 bytes are being allocated for the unsigned integer, which would give the correct result if you didn't reverse it.

Like @Someprogrammerdude pointed out in the comments, this had to do with endianness and the fact that my struct was being padded, resulting in the struct being 12 bytes instead of 10.
Accounting for the padding, the data now looks like so:
1900FC00 30000000 0A000000
Reading the above as little-endian values makes it all make sense: x = 0x0019 = 25, y = 0x00FC = 252, z = 0x0030 = 48 followed by 2 padding bytes, and time = 0x0000000A = 10.
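For what it's worth, a minimal sketch of a padding-proof write is to fwrite the fields one by one instead of the whole struct (write_sample is a hypothetical helper; each field is still written in the host's byte order):
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes exactly 2 + 2 + 2 + 4 = 10 bytes per
 * sample, so no struct padding ends up in the file. */
int write_sample(FILE *fp, const struct AccData *s)
{
    uint32_t t = s->time; /* pin the time field down to 4 bytes */

    return fwrite(&s->x, sizeof s->x, 1, fp) == 1 &&
           fwrite(&s->y, sizeof s->y, 1, fp) == 1 &&
           fwrite(&s->z, sizeof s->z, 1, fp) == 1 &&
           fwrite(&t,    sizeof t,    1, fp) == 1;
}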

Related

Parsing ID3V2 Frames in C

I have been attempting to retrieve ID3V2 Tag Frames by parsing through the mp3 file and retrieving each frame's size. So far I have had no luck.
I have allocated memory to a buffer to aid in reading the file and have been successful in printing out the header version, but am having difficulty retrieving both the header and frame sizes. For the header size field I get 1347687723, although viewing the file in a hex editor I see 05 2B 19.
Two snippets of my code:
typedef struct { // typedef structure used to read tag information
    char tagid[3];               // 0-2 "ID3"
    unsigned char tagversion;    // 3 $04
    unsigned char tagsubversion; // 4 00
    unsigned char flags;         // 5 %abc00000
    uint32_t size;               // 6-9 4 * %0xxxxxxx
} ID3TAG;
if (buff) {
    fseek(filename, 0, SEEK_SET);
    fread(&Tag, 1, sizeof(Tag), filename);
    if (memcmp(Tag.tagid, "ID3", 3) == 0)
    {
        printf("ID3V2.%02x.%02x.%02x \nHeader Size:%lu\n", Tag.tagversion,
               Tag.tagsubversion, Tag.flags, Tag.size);
    }
}
Due to memory alignment, the compiler has inserted 2 bytes of padding between flags and size. If your struct were mapped directly onto the file bytes, size would sit at offset 6 (from the beginning of the struct). Since a 4-byte element must be placed at an address that is a multiple of 4, the compiler adds 2 bytes so that size moves to the closest multiple-of-4 offset, which here is 8. So when you read from your file, size receives bytes 8-11 instead of bytes 6-9. If you print the four bytes starting two bytes earlier, e.g. *(uint32_t *)((char *)&Tag.size - 2), you should see the bytes you expected (note that plain &Tag.size - 2 would step back two whole uint32_t, i.e. 8 bytes, not 2).
To fix that, you can read fields one by one.
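A minimal sketch of such a field-by-field read, reusing the ID3TAG struct from the question (read_id3_header is a hypothetical helper; the size bytes still need synchsafe decoding afterwards, as explained below):
#include <stdio.h>

/* Hypothetical helper: fills ID3TAG field by field so that struct
 * padding can no longer shift the size member. Returns 0 on success. */
int read_id3_header(FILE *fp, ID3TAG *tag)
{
    if (fseek(fp, 0, SEEK_SET) != 0)               return -1;
    if (fread(tag->tagid, 1, 3, fp) != 3)          return -1;
    if (fread(&tag->tagversion, 1, 1, fp) != 1)    return -1;
    if (fread(&tag->tagsubversion, 1, 1, fp) != 1) return -1;
    if (fread(&tag->flags, 1, 1, fp) != 1)         return -1;
    /* Reads the 4 size bytes raw; they are big-endian and synchsafe,
     * so they must be decoded before use (see below). */
    if (fread(&tag->size, 1, 4, fp) != 4)          return -1;
    return 0;
}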
The ID3v2 header structure is consistent across all ID3v2 versions (ID3v2.0, ID3v2.3 and ID3v2.4).
Its size is stored as a big-endian synchsafe int32.
Synchsafe integers are
integers that keep its highest bit (bit 7) zeroed, making seven bits
out of eight available. Thus a 32 bit synchsafe integer can store 28
bits of information.
Example:
255 (%11111111) encoded as a 16 bit synchsafe integer is 383
(%00000001 01111111).
Source : http://id3.org/id3v2.4.0-structure § 6.2
Below is a straightforward, real-life C# implementation that you can easily adapt to C
public int DecodeSynchSafeInt32(byte[] bytes)
{
    return
        bytes[0] * 0x200000 + // 2^21
        bytes[1] * 0x4000 +   // 2^14
        bytes[2] * 0x80 +     // 2^7
        bytes[3];
}
=> Using values you read on your hex editor (00 05 EB 19), the actual tag size should be 112025 bytes.
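A direct C adaptation might look like this (a sketch; decode_synchsafe_int32 is a hypothetical helper name, and the & 0x7F masks simply enforce the spec's guarantee that the top bit of each byte is zero):
#include <stdint.h>

/* Hypothetical helper: decodes a 4-byte big-endian synchsafe integer
 * (7 useful bits per byte, per the ID3v2 spec). */
uint32_t decode_synchsafe_int32(const uint8_t b[4])
{
    return ((uint32_t)(b[0] & 0x7F) << 21) |
           ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) << 7)  |
            (uint32_t)(b[3] & 0x7F);
}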
By coincidence I am also working on an ID3V2 reader. The doc says that the size is encoded in four 7-bit bytes. So you need another step to convert the byte array into an integer... I don't think just reading those bytes as an int will work because of the null bit on top.

Why wrap a struct with a union?

I saw a code snippet in a good answer to the question "Is it possible to insert three numbers into 2 bytes variable?":
For example, I want to store date which contain days, months, years.
days -> 31, months -> 12, years -> 99.
I want to store 31, 12, 99 in one variable, and will use shift operators << and >> to manipulate it.
// Quoted: the C code from that answer
union mydate_struct {
    struct {
        uint16_t day   : 5; // 0 - 31
        uint16_t month : 4; // 0 - 12
        uint16_t year  : 7; // 0 - 127
    };
    uint16_t date_field;
};
Now, my question is: why wrap the struct with a union? What are the special benefits, besides memory-related concerns?
PS: I know the typical usage of a union to ensure memory size.
Because if it is just the struct that's needed, it seems more direct and simpler to use:
typedef struct {
    uint16_t day   : 5; // 0 - 31
    uint16_t month : 4; // 0 - 12
    uint16_t year  : 7; // 0 - 127
} mydate_struct;
Update 1:
Some conclusions about the benefits of wrapping with a union:

1. You can initialize the year, month and day simultaneously:

The advantage of using the union is that, given union mydate_struct u;, you can write u.date_field = 0x3456; and initialize the year, month and day fields simultaneously. It is defined by the implementation what that does, and different implementations could define it differently. There's a modest chance that the year will be 0x56, the month 0x08, and the day 0x06 (aka 86-08-06; century not clearly defined); there's also a modest chance that the year will be 0x1A, the month 0x02, and the day 0x16 (aka 26-02-22; century still not clearly defined). People have forgotten Y2K already. (comment of @Jonathan Leffler)

2. You can read/write the whole number at once. (comment of @StenSoft)
A union means that every member in it uses the same memory, so you can use either the first or the second member (which can be completely different things). In your case, it's either the whole struct or the uint16_t date_field.
In the context of the linked question, the writer intended to use it to convert between a two-byte struct and a two-byte integer: assign something to the struct and read the int value from the same memory. But this is not allowed in C++ and may not work (for a multitude of reasons...). It's not possible to arbitrarily switch between which member is used.
A union shares its memory among the member variables, so the size of a union is the size of its biggest member. That is why the struct is wrapped in the union together with the variable uint16_t date_field;:
the user can treat the same 16 bits of memory either as the struct or as the variable date_field.
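To illustrate both benefits, a small sketch (a sketch only: which bits day, month and year land on is implementation-defined, as the quoted comment stresses):
#include <stdint.h>
#include <stdio.h>

union mydate_struct {
    struct {
        uint16_t day   : 5; // 0 - 31
        uint16_t month : 4; // 0 - 12
        uint16_t year  : 7; // 0 - 127
    };
    uint16_t date_field;
};

int main(void)
{
    union mydate_struct d;

    d.day = 31;   // write through the bit-fields...
    d.month = 12;
    d.year = 99;
    printf("packed: 0x%04x\n", (unsigned)d.date_field); // ...read all 16 bits at once

    d.date_field = 0; // clears day, month and year simultaneously
    printf("day after clear: %u\n", (unsigned)d.day);
    return 0;
}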

Unexpected Union behaviour

The code below outputs different numbers each time.
apples.num prints 2, which is correct, but apples.weight prints a different number on every run; it once even printed "nan", and I don't know why this is happening.
The really strange thing is that the double (apples.volume) prints out 2.0.
Can anybody explain things to me?
#include <stdio.h>

typedef union {
    short num;
    float weight;
    double volume;
} Count;

int main(int argc, char const *argv[])
{
    Count apples;
    apples.num = 2;
    printf("Num: %d\nWeight: %f\nVolume: %f\n", apples.num, apples.weight, apples.volume);
    return 0;
}
It seems to me you don't quite understand what a union is. The members of a union are overlapping values (in other words, the three members of a Count union share the same space).
Assuming, just for the sake of demonstration, a short is 16 bits (2 bytes), a float is 32 bits (4 bytes) and a double is 64 bits (8 bytes), then the union is 8 bytes in size. In little-endian format, the num member refers to the first 2 bytes, the weight member refers to the first 4 bytes (including the 2 bytes of num) and the volume member refers to the full 8 bytes (including the 2 bytes of num and the four bytes of weight).
Initially, your union contains garbage, i.e. some unknown bit pattern. Let's display it like this (in hex):
GG GG GG GG GG GG GG GG // GG stands for garbage, i.e. unknown bit patterns
If you set num to 2, then the first two bytes are 0x02 0x00, but the other bytes are still garbage:
02 00 GG GG GG GG GG GG
If you read weight, you are simply reading the first four bytes, interpreted as a float, so the float contains the bytes
02 00 GG GG
Since floating point values have a totally different format from integral types like short, you can't predict what those bytes (i.e. that particular bit pattern) represent. They do not represent the floating point value 2.0f, which is what you probably expected. Actually, the "more significant" part of a float is stored in the upper bytes, i.e. in the "garbage" part of weight, so it can be almost anything, including a NaN, +infinity, -infinity, etc.
Similarly, if you read volume, you have a double that consists of the bytes
02 00 GG GG GG GG GG GG
and that does not necessarily represent 2.0 either (although, by chance, it MAY come very close, if by coincidence the right bits are set at the right places, and if the low bits are rounded away when you display such a value).
Unions are not meant to do a proper conversion from int to float or double. They are merely meant to be able to store different kinds of values in the same space, and reading from a member other than the one you set simply means you are reinterpreting the bits present in the union as something completely different. You are not converting.
So how do you convert? It is quite simple and does not require a union:
short num = 2;
float weight = num; // the compiler inserts code that performs a conversion to float
double volume = num; // the compiler inserts code that performs a conversion to double
If you access a union via the "wrong" member (i.e. a member other than the one it was assigned through), the result will depend on the semantics of the particular bit pattern for that type. Where the assigned type has a smaller bit-width than the accessed type, some of those bits will be undefined.
You are accessing uninitialized data. It will provide undefined behavior (ie: unknown values in this case). You also likely mean to use a struct instead of a union.
#include <stdio.h>

typedef union {
    short num;
    float weight;
    double volume;
} Count;

int main(int argc, char const *argv[])
{
    Count apples = { 0 };
    apples.num = 2;
    printf("Num: %d\nWeight: %f\nVolume: %f\n", apples.num, apples.weight, apples.volume);
    return 0;
}
Initialize the union by either zeroing it out, or setting the largest member to a value. Even if you set the largest member, the other values might not even make sense. This is commonly used for creating a byte/word/nibble/long-word data type and making the individual bits accessible.
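For instance, a rough sketch of that byte-with-accessible-bits pattern (which physical bit each field maps to is implementation-defined, so treat this as illustrative rather than portable):
#include <stdint.h>
#include <stdio.h>

typedef union {
    uint8_t byte;       /* the whole byte at once */
    struct {
        uint8_t b0 : 1; /* individual bits; the mapping of b0..b7 to
                           physical bits is implementation-defined */
        uint8_t b1 : 1;
        uint8_t b2 : 1;
        uint8_t b3 : 1;
        uint8_t b4 : 1;
        uint8_t b5 : 1;
        uint8_t b6 : 1;
        uint8_t b7 : 1;
    } bits;
} ByteBits;

int main(void)
{
    ByteBits b = { 0 }; /* zero the union before use */
    b.byte = 0x05;      /* write the whole byte, then read single bits */
    printf("b0=%u b1=%u b2=%u\n", b.bits.b0, b.bits.b1, b.bits.b2);
    return 0;
}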

decoding data from old school measurement instrument

I am trying to recover raw data from an older measurement instrument that is interfaced through a printer port.
For example, the instrument's software will produce a text output file like this:
S 11/08/08 22:27:58 100 2 U 061
D ___^PR_^_^_]PP_]_^_]_^_____^_^_____^_[_\_\_[_Z_Z_X
D _W_U_T_Q^]^]^Z^V^S^T^S]]]Y]U]R]T]Q]V]Z]\]]^R^]_ZPX
D QSQYQ^RSRYSQSWS\S]SZSWSSSPR\RZRXRTQ^QWQPP[PUPRPQ_^
D _\_]_^_____\_\_Z_X_W_Y_X_X_Z_W_U_V_W_X_[_X_W_W_W
F 2
S 11/08/08 22:35:03 100 2 E 049
D QSQQP_P^QPQPQRQUQUQUQVQZQ[Q\Q]RSR\STSXSWSQR_SQSRR[
D RTQ_QWQUQWQUQZRSSQR]RTRSRQQZQRPZPVPTPTPSPWPTPQPQ_^
D _^_^__PPPPPP__PP__PR__PPPQ_____^_]_]PP_^_]_]_]_Y_^
D ___^_^_\_______^PP__PRPQPPPRPP__PPPP___]_^_^__PP
F 2
The "S" line is all good - provides the appropriate time the measurement
was taken along with some other values.
I'm interested in recovering whatever is hidden in
the "D" lines. The software generates a plot using this data, but
does not provide the raw data.
The only code I have detailing the data encoding contains the comment:
/* Packs the 8-bit data into two 7-bit ASCII chars, encoding the channel
* number into it as well, in the format:
*
* 1CCMMMM and 1CCLLLL, where CC = chn, MMMM/LLLL = Most/Least sig nibble
*/
I can send the actual packing code too - just trying to keep the question as small as possible.
Any help - even a point in the right direction would be appreciated...
The encoding is actually pretty clever*: every combination of two letters (2*8 bits, or 2*7 bits, depending on how you look at it) is a single measurement. The comment tells us how the encoding works. For example, take 'QS':
Pattern: 01CCMMMM 01CCLLLL
Example: 01010001 01010011 = Q S
Channel: ..CC.... ..CC....
         ..01.... ..01.... = Channel 1
Data:    ....0001 ....0011 = 00010011 = 19
You simply have to take the bits labeled M and the bits labeled L, put them after each other, treat the whole thing as a single-byte number and you've got the original data. Conversely, extract the bits labeled C to get the channel number.
Here's an example of how you could parse a single measurement, assuming two bytes of input are in a and b:
/* To get the channel, mask with 00110000 = 0x30, then shift */
char channel = (a & 0x30) >> 4;

/* To get the data, mask both with 00001111 = 0xF, then combine */
char orgdata = ((a & 0xF) << 4) | (b & 0xF);
Putting all that together here gives the following data for the first 'frame' in your example, all on channel 1:
I'm hoping that matches what you're seeing on your plot :)
*: I'm not being sarcastic, either - this encoding packs 10 bits of useful data into 14 bits of usable space, while being a good deal simpler than something like base64 and probably faster.
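Putting the two masks into a loop over a whole "D" line might look like the sketch below (decode_line is a hypothetical helper; it assumes the leading "D " has already been stripped so the string contains only encoded character pairs):
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decodes a "D" line, two characters per
 * measurement, printing channel and value for each pair. */
void decode_line(const char *line)
{
    size_t len = strlen(line);

    for (size_t i = 0; i + 1 < len; i += 2) {
        char a = line[i], b = line[i + 1];
        int channel = (a & 0x30) >> 4;              /* ..CC.... */
        int value   = ((a & 0xF) << 4) | (b & 0xF); /* MMMMLLLL */
        printf("channel %d: %d\n", channel, value);
    }
}

int main(void)
{
    decode_line("QSQYQ^RSRY"); /* first pair 'Q','S' should print 19 */
    return 0;
}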

C programming: words from byte array

I have some confusion regarding reading a word from a byte array. The background context is that I'm working on a MIPS simulator written in C for an intro computer architecture class, but while debugging my code I ran into a surprising result that I simply don't understand from a C programming standpoint.
I have a byte array called mem defined as follows:
uint8_t *mem;
//...
mem = calloc(MEM_SIZE, sizeof(uint8_t)); // MEM_SIZE is predefined as 1024x1024
During some of my testing I manually stored a uint32_t value into four of the blocks of memory at an address called mipsaddr, one byte at a time, as follows:
for (int i = 3; i >= 0; i--) {
    *(mem + mipsaddr + i) = value;
    value = value >> 8;
    // in my test, value = 0x1084
}
Finally, I tested trying to read a word from the array in one of two ways. In the first way, I basically tried to read the entire word into a variable at once:
uint32_t foo = *(uint32_t*)(mem+mipsaddr);
printf("foo = 0x%08x\n", foo);
In the second way, I read each byte from each cell manually, and then added them together with bit shifts:
uint8_t test0 = mem[mipsaddr];
uint8_t test1 = mem[mipsaddr + 1];
uint8_t test2 = mem[mipsaddr + 2];
uint8_t test3 = mem[mipsaddr + 3];
uint32_t test4 = (mem[mipsaddr] << 24) + (mem[mipsaddr + 1] << 16) +
                 (mem[mipsaddr + 2] << 8) + mem[mipsaddr + 3];
printf("test4= 0x%08x\n", test4);
The output of the code above came out as this:
foo = 0x84100000
test4= 0x00001084
The value of test4 is exactly as I expect it to be, but foo seems to have reversed the order of the bytes. Why would this be the case? In the case of foo, I expected the uint32_t* pointer to point to mem[mipsaddr], and since it's 32-bits long, it would just read in all 32 bits in the order they exist in the array (which would be 00001084). Clearly, my understanding isn't correct.
I'm new here, and I did search for the answer to this question but couldn't find it. If it's already been posted, I apologize! But if not, I hope someone can enlighten me here.
It is (among others) explained here: http://en.wikipedia.org/wiki/Endianness
When storing data larger than one byte into memory, it depends on the architecture (meaning, the CPU) in which order the bytes are stored. Either the most significant byte is stored first and the least significant byte last, or vice versa. When you read back the individual bytes through byte access operations and then merge them to form the original value again, you need to consider the endianness of your particular system.
In your for loop, you are storing your value byte-wise, starting with the most significant byte (counting down the index is a bit misleading ;-). Your memory looks like this afterwards: 0x00 0x00 0x10 0x84.
You are then reading the word back with a single 32-bit (four-byte) access. Depending on your architecture, this will either become 0x00001084 (big endian) or 0x84100000 (little endian). Since you get the latter, you are working on a little-endian system.
In your second approach, you are using the same order in which you stored the individual bytes (most significant first), so you get back the same value which you stored earlier.
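A quick way to see this on your own machine (a minimal sketch; the output depends on the host's byte order):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint32_t word = 0x00001084;
    uint8_t bytes[sizeof word];

    memcpy(bytes, &word, sizeof word); /* copy out the in-memory representation */

    /* Little endian prints "84 10 00 00"; big endian prints "00 00 10 84". */
    for (size_t i = 0; i < sizeof word; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}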
It seems to be a problem of endianness, maybe coming from casting (uint8_t *) to (uint32_t *).
