I have C source code for a microcontroller. Here is the first part of the header:
#define USART_RX_BUFFER_SIZE 256
#define USART_TX_BUFFER_SIZE 256
#define USART_RX_BUFFER_MASK (USART_RX_BUFFER_SIZE - 1)
#define USART_TX_BUFFER_MASK (USART_TX_BUFFER_SIZE - 1)
#if (USART_RX_BUFFER_SIZE & USART_RX_BUFFER_MASK)
#error RX buffer size is not a power of 2
#endif
#if (USART_TX_BUFFER_SIZE & USART_TX_BUFFER_MASK)
#error TX buffer size is not a power of 2
#endif
typedef struct USART_Buffer {
volatile uint8_t RX[USART_RX_BUFFER_SIZE];
volatile uint8_t TX[USART_TX_BUFFER_SIZE];
volatile uint16_t RX_Head;
volatile uint16_t RX_Tail;
volatile uint16_t TX_Head;
volatile uint16_t TX_Tail;
} USART_Buffer_t;
typedef struct Usart_and_buffer {
USART_t *usart;
USART_DREINTLVL_t dreIntLevel;
USART_TXCINTLVL_t txcIntLevel;
USART_Buffer_t buffer;
PORT_t *rs485_Port;
uint8_t rs485_Pin;
bool rs485;
} USART_data_t;
uint8_t USART_RXBuffer_GetByte(USART_data_t *usart_data);
bool USART_RXComplete(USART_data_t *usart_data);
void USART_TransmitComplete(USART_data_t *usart_data);
...
There are several other functions like this.
Their implementations often use both USART_data_t and USART_Buffer_t, for example:
bool USART_TXBuffer_PutByte(USART_data_t *usart_data, uint8_t data) {
uint8_t tempCTRLA;
uint16_t tempTX_Head;
bool TXBuffer_FreeSpace;
USART_Buffer_t * TXbufPtr;
if (usart_data->rs485) usart_data->rs485_Port->OUTSET = usart_data->rs485_Pin;
TXbufPtr = &usart_data->buffer;
...
In my actual applications I need to declare a lot of USART_data_t structs but only a couple of them require such a big buffer (256 bytes). Most will work with very small ones, like 64 or even 32 bytes.
I easily run out of memory and all that space is actually unused.
My goal is to find a way to save space.
The dirty way is to clone the whole file, rename all the variables/functions in order to have two different versions, say one with the buffers of 256 bytes and another with the buffers of 64 bytes.
But then I would also have to change every call in my code, and I would like to avoid that.
This library is very fast because it works on circular buffers whose size is a power of 2; with the defines above, the required index math takes only a few clock cycles. So I really don't want to rewrite the whole library to use dynamic memory allocation.
Any other idea?
Example of use:
USART_data_t RS232B_USART_data;
USART_data_t RS232A_USART_data;
#define RS232B_BUFFER_SIZE 4
char RS232B_RxBuffer[RS232B_BUFFER_SIZE];
char RS232B_TxBuffer[RS232B_BUFFER_SIZE];
#define RS232A_BUFFER_SIZE 64
char RS232A_RxBuffer[RS232A_BUFFER_SIZE];
char RS232A_TxBuffer[RS232A_BUFFER_SIZE];
ISR(USARTC1_RXC_vect) { USART_RXComplete(&RS232B_USART_data); }
ISR(USARTC1_DRE_vect) { USART_DataRegEmpty(&RS232B_USART_data); }
ISR(USARTC1_TXC_vect) { USART_TransmitComplete(&RS232B_USART_data); }
ISR(USARTD0_RXC_vect) { USART_RXComplete(&RS232A_USART_data); }
ISR(USARTD0_DRE_vect) { USART_DataRegEmpty(&RS232A_USART_data); }
ISR(USARTD0_TXC_vect) { USART_TransmitComplete(&RS232A_USART_data); }
// ...
USART_InterruptDriver_Initialize(&RS232B_USART_data, &RS232B_USART, USART_DREINTLVL_LO_gc, USART_TXCINTLVL_LO_gc, false);
USART_Format_Set(RS232B_USART_data.usart, USART_CHSIZE_8BIT_gc, USART_PMODE_DISABLED_gc, false);
USART_RxdInterruptLevel_Set(RS232B_USART_data.usart, USART_RXCINTLVL_HI_gc);
USART_Baudrate_Set(&RS232B_USART, 2094, -7);
USART_Rx_Enable(RS232B_USART_data.usart);
USART_Tx_Enable(RS232B_USART_data.usart);
USART_InterruptDriver_Initialize(&RS232A_USART_data, &RS232A_USART, USART_DREINTLVL_LO_gc, USART_TXCINTLVL_LO_gc, false);
USART_Format_Set(RS232A_USART_data.usart, USART_CHSIZE_8BIT_gc, USART_PMODE_DISABLED_gc, false);
USART_RxdInterruptLevel_Set(RS232A_USART_data.usart, USART_RXCINTLVL_HI_gc);
USART_Baudrate_Set(&RS232A_USART, 2094, -7);
USART_Rx_Enable(RS232A_USART_data.usart);
USART_Tx_Enable(RS232A_USART_data.usart);
You could rewrite USART_Buffer_t to contain just pointers to the RX/TX buffers and add two variables that hold the sizes of the "attached" buffers.
This is written from memory, so expect typos, but I hope you get the idea:
typedef struct USART_Buffer {
volatile uint8_t* pRX;
volatile uint8_t* pTX;
volatile uint16_t RX_Size;
volatile uint16_t TX_Size;
volatile uint16_t RX_Head;
volatile uint16_t RX_Tail;
volatile uint16_t TX_Head;
volatile uint16_t TX_Tail;
} USART_Buffer_t;
Then you write a helper function like
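void USART_InitBuffers(USART_Buffer_t *data, uint8_t *pTxBuffer, uint16_t sizeTxBuffer, uint8_t *pRxBuffer, uint16_t sizeRxBuffer) {
data->pTX = pTxBuffer;
data->TX_Size = sizeTxBuffer;
data->pRX = pRxBuffer;
data->RX_Size = sizeRxBuffer;
// start with both ring buffers empty
data->RX_Head = data->RX_Tail = data->TX_Head = data->TX_Tail = 0;
}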
This way, you can specify your arrays of different size for each USART:
uint8_t TxData1[128]; // sizes kept as powers of 2 so the index masking still works
uint8_t RxData1[16];
uint8_t TxData2[256];
uint8_t RxData2[64];
USART_Buffer_t data1;
USART_Buffer_t data2;
int main(void) {
USART_InitBuffers(&data1, TxData1, sizeof(TxData1), RxData1, sizeof(RxData1));
USART_InitBuffers(&data2, TxData2, sizeof(TxData2), RxData2, sizeof(RxData2));
}
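Finally, you have to adjust the library functions to use RX_Size and TX_Size instead of USART_RX_BUFFER_SIZE and USART_TX_BUFFER_SIZE. As long as each size is still a power of 2, the masks simply become RX_Size - 1 and TX_Size - 1, and the index math stays as cheap as before.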
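To make the adjustment above concrete, here is a minimal sketch of one buffer direction with a per-instance mask instead of a global define. The names ring_t, ring_init, ring_put and ring_get are hypothetical and not part of the library; the real functions would also have to deal with the interrupt handling that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one direction of USART_Buffer_t. */
typedef struct {
    volatile uint8_t *data;   /* caller-supplied storage */
    uint16_t mask;            /* size - 1; size must be a power of 2 */
    volatile uint16_t head;   /* next write position */
    volatile uint16_t tail;   /* next read position */
} ring_t;

static void ring_init(ring_t *r, volatile uint8_t *storage, uint16_t size)
{
    /* Same check as the #error in the header, moved to run time. */
    assert(size != 0 && (size & (size - 1)) == 0);
    r->data = storage;
    r->mask = (uint16_t)(size - 1);
    r->head = 0;
    r->tail = 0;
}

static bool ring_put(ring_t *r, uint8_t byte)
{
    uint16_t next = (uint16_t)((r->head + 1) & r->mask);
    if (next == r->tail) {
        return false;         /* full: one slot is always left empty */
    }
    r->data[r->head] = byte;
    r->head = next;
    return true;
}

static bool ring_get(ring_t *r, uint8_t *byte)
{
    if (r->head == r->tail) {
        return false;         /* empty */
    }
    *byte = r->data[r->tail];
    r->tail = (uint16_t)((r->tail + 1) & r->mask);
    return true;
}
```

The only extra cost compared with the compile-time mask is one load of r->mask per operation, which is usually much cheaper than a modulo by a run-time size.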
I am trying to build an MPSC lock-free ring buffer for learning purposes, and I am running into race conditions.
A description of the MPSC ring buffer:
It is guaranteed that poll() is never called when the buffer is empty.
Instead of mod'ing head and tail like a traditional ring buffer, it lets them proceed linearly, and AND's them before using them (since the buffer size is a power of 2, this works ok with overflow).
We keep MAX_PRODUCERS-1 slots open in the queue so that if multiple producers come and see one slot is available and proceed, they can all place their entries.
It uses 32-bit quantities for head and tail, so that it can snapshot them with a 64-bit atomic read without a lock.
My test involves a couple of threads writing some known set of values to the queue, and a consumer thread polling (when the buffer is not empty) and summing all, and verifying the correct result is obtained. With 2 or more producers, I get inconsistent sums (and with 1 producer, it works).
Any help would be much appreciated. Thank you!
Here is the code:
struct ring_buf_entry {
uint32_t seqn;
};
struct __attribute__((packed, aligned(8))) ring_buf {
union {
struct {
volatile uint32_t tail;
volatile uint32_t head;
};
volatile uint64_t snapshot;
};
volatile struct ring_buf_entry buf[RING_BUF_SIZE];
};
#define RING_SUB(x,y) ((x)>=(y)?((x)-(y)):((x)+(1ULL<<32)-(y)))
static void ring_buf_push(struct ring_buf* rb, uint32_t seqn)
{
size_t pos;
while (1) {
// rely on aligned, packed, and no member-reordering properties
uint64_t snapshot = __atomic_load_n(&(rb->snapshot), __ATOMIC_SEQ_CST);
// little endian.
uint64_t snap_head = snapshot >> 32;
uint64_t snap_tail = snapshot & 0xffffffffULL;
if (RING_SUB(snap_tail, snap_head) < RING_BUF_SIZE - MAX_PRODUCERS + 1) {
uint32_t exp = snap_tail;
if (__atomic_compare_exchange_n(&(rb->tail), &exp, snap_tail+1, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
pos = snap_tail;
break;
}
}
asm volatile("pause\n": : :"memory");
}
pos &= RING_BUF_SIZE-1;
rb->buf[pos].seqn = seqn;
asm volatile("sfence\n": : :"memory");
}
static struct ring_buf_entry ring_buf_poll(struct ring_buf* rb)
{
struct ring_buf_entry ret = rb->buf[__atomic_load_n(&(rb->head), __ATOMIC_SEQ_CST) & (RING_BUF_SIZE-1)];
__atomic_add_fetch(&(rb->head), 1, __ATOMIC_SEQ_CST);
return ret;
}
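As an aside on the index arithmetic described above (free-running 32-bit counters that are only masked when used), the following sketch isolates that part. It assumes both counters are held as uint32_t, in which case plain unsigned subtraction already yields the occupancy across a wrap. DEMO_RING_SIZE, ring_used and ring_slot are hypothetical names, and the sketch says nothing about the ordering between producers and the consumer.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u   /* hypothetical size; must be a power of 2 */

static_assert((DEMO_RING_SIZE & (DEMO_RING_SIZE - 1u)) == 0,
              "ring size must be a power of 2");

/* Number of occupied slots; stays correct after tail wraps past 2^32 - 1. */
static uint32_t ring_used(uint32_t tail, uint32_t head)
{
    return tail - head;     /* unsigned arithmetic is modulo 2^32 */
}

/* Slot index for a free-running counter. */
static uint32_t ring_slot(uint32_t counter)
{
    return counter & (DEMO_RING_SIZE - 1u);
}
```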
I have an array of structs that I need to initialize at compile-time (no memset) to 0xFF. This array will be written as part of the program over erased flash. By setting it to 0xFF, it will remain erased after programming, and the app can use it as persistent storage. I've found two ways to do it, one ugly and one a workaround. I'm wondering if there's another way with syntax I haven't found yet. The ugly way is to use a nested initializer setting every field of the struct. However, it's error prone and a little ugly. My workaround is to allocate the struct as an array of bytes and then use a struct-typed pointer to access the data. Linear arrays of bytes are much easier to initialize to a non-zero value.
To aid anyone else doing the same thing, I'm including the gcc attributes used and the linker script portion.
Example struct:
struct BlData_t {
uint8_t version[3];
uint8_t reserved;
uint8_t markers[128];
struct AppData_t {
uint8_t version[3];
uint8_t reserved;
uint32_t crc;
} appInfo[512] __attribute__(( packed ));
} __attribute__(( packed ));
Initialize to 0xFF using the best way I know:
// Allocate the space as an array of bytes
// because it's a simpler syntax to
// initialize to 0xFF.
__attribute__(( section(".bootloader_data") ))
uint8_t bootloaderDataArray[sizeof(struct BlData_t)] = {
[0 ... sizeof(struct BlData_t) - 1] = 0xFF
};
// Use a correctly typed pointer set to the
// array of bytes for actual usage
struct BlData_t *bootloaderData = (struct BlData_t *)&bootloaderDataArray;
No initialization necessary because of (NOLOAD):
__attribute__(( section(".bootloader_data") ))
volatile const struct BlData_t bootloader_data;
Addition to linker script:
.bootloader_data (NOLOAD):
{
FILL(0xFF); /* Doesn't matter because (NOLOAD) */
. = ALIGN(512); /* start on a 512B page boundary */
__bootloader_data_start = .;
KEEP (*(.bootloader_data)) /* .bootloader_data sections */
KEEP (*(.bootloader_data*)) /* .bootloader_data* sections */
. = ALIGN(512); /* end on a 512B boundary to support
runtime erasure, if possible */
__bootloader_data_end = .;
__bootloader_data_size = ABSOLUTE(. - __bootloader_data_start);
} >FLASH
How to use the starting address, ending address and size in code:
extern uint32_t __bootloader_data_start;
extern uint32_t __bootloader_data_end;
extern uint32_t __bootloader_data_size;
uint32_t _bootloader_data_start = (uint32_t)&__bootloader_data_start;
uint32_t _bootloader_data_end = (uint32_t)&__bootloader_data_end;
uint32_t _bootloader_data_size = (uint32_t)&__bootloader_data_size;
Update:
It turns out that I was asking the wrong question. I didn't know about the (NOLOAD) linker section attribute which tells the program loader not to burn this section into flash. I accepted this answer to help others realize my mistake and possibly theirs. By not even programming the section, I don't have to worry about the initialization at all.
I've upvoted the union answers since they seem to be a good solution to the question I asked.
I would use a union of your struct together with an array of the correct size, then initialize the array member.
union {
struct BlData_t data;
uint8_t bytes[sizeof(struct BlData_t)];
} data_with_ff = {
.bytes = {
[0 ... sizeof(struct BlData_t) - 1] = 0xff
}
};
You can then access your struct as data_with_ff.data, defining a pointer or macro for convenience if you wish.
(Readers should note that the ... in a designated initializer is a GCC extension; since the question was already using this feature and is tagged gcc I assume that is fine here. If using a compiler that doesn't have it, I don't know another option besides .bytes = { 0xff, 0xff, 0xff ... } with the actual correct number of 0xffs; you'd probably want to generate it with a script.)
The sensible way to do this is to find the command in the linker script that tells it not to touch that memory in the first place. Why would you want it to be erased only to be filled with 0xFF again? That only causes unnecessary flash wear.
Something along the lines of this:
.bootloader_data (NOLOAD) :
{
. = ALIGN(512);
*(.bootloader_data*)
} >FLASH
If you truly need to do this initialization and in pure standard C, then you can wrap your inner struct inside an anonymous union (C11), then initialize that one using macro tricks:
struct BlData_t {
uint8_t version[3];
uint8_t reserved;
uint8_t markers[128];
union {
struct AppData_t {
uint8_t version[3];
uint8_t reserved;
uint32_t crc;
} appInfo[512];
uint8_t raw [512];
};
};
#define INIT_VAL 0xFF, // note the comma
#define INIT_1 INIT_VAL
#define INIT_2 INIT_1 INIT_1
#define INIT_5 INIT_2 INIT_2 INIT_1
#define INIT_10 INIT_5 INIT_5
/* ... you get the idea */
#define INIT_512 INIT_500 INIT_10 INIT_2
const struct BlData_t bld = { .raw = {INIT_512} };
This method can also be applied to whole structs, for example if you want to initialize a struct array with all items set to the same values at compile time.
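A minimal sketch of the whole-struct variant, kept small so the macro ladder fits in a few lines. The struct record type and the REC_* macros are hypothetical names chosen for illustration; the static_assert needs C11.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record type, small enough to keep the macro ladder short. */
struct record {
    uint8_t id;
    uint8_t flags;
};

#define REC_INIT { 0xFF, 0xFF },   /* one element; note the trailing comma */
#define REC_2    REC_INIT REC_INIT
#define REC_4    REC_2 REC_2
#define REC_8    REC_4 REC_4

#define RECORD_COUNT 8

/* The array length is deduced from the ladder, then checked below. */
static const struct record records[] = { REC_8 };

static_assert(sizeof(records) / sizeof(records[0]) == RECORD_COUNT,
              "macro ladder must produce RECORD_COUNT elements");
```

Letting the compiler deduce the array length and asserting it afterwards catches the case where the ladder and the intended element count drift apart.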
I am trying to port a generic C driver for an IMU to my embedded platform, which is based on a Nordic module. The cleanest way would be to adapt the interface functions to my system. The driver, available on GitHub here, has this interface for writing and reading registers:
typedef int32_t (*lsm6dso_write_ptr)(void *, uint8_t, uint8_t*, uint16_t);
typedef int32_t (*lsm6dso_read_ptr) (void *, uint8_t, uint8_t*, uint16_t);
typedef struct {
/** Component mandatory fields **/
lsm6dso_write_ptr write_reg;
lsm6dso_read_ptr read_reg;
/** Customizable optional pointer **/
void *handle;
} lsm6dso_ctx_t;
My read/write register functions are:
void write_i2c_data(nrf_drv_twi_t const *m_twi, uint8_t reg, uint8_t val)
{
uint8_t cmd[2] = {0, 0};
cmd[0] = reg;
cmd[1] = val;
nrf_drv_twi_tx(m_twi, ADDR, cmd, 2, true);
nrf_delay_ms(1);
}
void read_i2c_data(nrf_drv_twi_t const *m_twi, uint8_t reg, uint8_t *val)
{
nrf_drv_twi_tx(m_twi, ADDR, &reg, 1, true);
nrf_delay_ms(1);
nrf_drv_twi_rx(m_twi, ADDR, val, 1);
nrf_delay_ms(1);
}
Questions -
1 - I am not sure how to pass along the m_twi driver instance function along to the lsm6dso_ctx_t struct. It says the struct is customizable, but I am not sure how to augment it.
2 - The function pointers confuse me too: how can I point my function to the lsm6dso_write_ptr pointer? I know I will need to modify my functions to support multi-byte reads and writes, which I think is doable.
You should implement two functions:
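static int32_t your_callback_lsm6dso_read_reg(void *ctx, uint8_t reg, uint8_t* data,
uint16_t len) {
// read len bytes starting at register reg into data
// return 0 on success
// something like this (I don't know the nrf_* interface well; the calls follow the
// pattern from the question, and ADDR is the device address used there)
nrf_drv_twi_t const *m_twi = ctx;
if (nrf_drv_twi_tx(m_twi, ADDR, &reg, 1, true) != NRF_SUCCESS) return -1;
if (nrf_drv_twi_rx(m_twi, ADDR, data, (uint8_t)len) != NRF_SUCCESS) return -1;
return 0;
}
static int32_t your_callback_lsm6dso_write_reg(void *ctx, uint8_t reg, uint8_t* data,
uint16_t len) {
// write len bytes from data to register reg
// return 0 on success
// something like this (same caveat as above)
nrf_drv_twi_t const *m_twi = ctx;
uint8_t cmd[1 + 16]; // register address plus payload; 16 is an assumed upper bound
if (len > 16) return -1;
cmd[0] = reg;
for (uint16_t i = 0; i < len; i++) cmd[1 + i] = data[i];
if (nrf_drv_twi_tx(m_twi, ADDR, cmd, (uint8_t)(len + 1), false) != NRF_SUCCESS) return -1;
return 0;
}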
Then instantiate the structure:
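// m_twi is assumed to be the pointer to your TWI driver instance, as in the question
lsm6dso_ctx_t lsm6dso_ctx = { .write_reg = your_callback_lsm6dso_write_reg, .read_reg = your_callback_lsm6dso_read_reg, .handle = (void *)m_twi };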
and use it like:
lsm6dso_some_function_from_the_library(&lsm6dso_ctx, ...)
The function from the library will call the function pointers from lsm6dso_ctx with the first argument as the void* pointer from the structure. The void* pointer from the structure is used to pass your custom data along. You can then cast the handle from void* pointer into a custom pointer and call the appropriate functions.
how can I point my function to the lsm6dso_write_ptr pointer.
I think this is where your confusion comes from: it is the other way round. The function pointers inside lsm6dso_ctx_t should point to your functions.
Then you have just an instance of lsm6dso_ctx_t structure you use with all functions from the driver. The driver has some logic and it calls your functions as passed with the structure to do input/output operations.
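The mechanism can be tried out without any hardware. The sketch below reuses the shape of the driver's context struct under hypothetical names (demo_ctx_t, fake_bus_t, fake_write, fake_read, demo_driver_set_and_check) and replaces the I2C bus with an in-memory register file. It is not the lsm6dso driver or the nrf API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the driver's interface, under hypothetical names. */
typedef int32_t (*demo_write_ptr)(void *, uint8_t, uint8_t *, uint16_t);
typedef int32_t (*demo_read_ptr)(void *, uint8_t, uint8_t *, uint16_t);

typedef struct {
    demo_write_ptr write_reg;
    demo_read_ptr read_reg;
    void *handle;             /* opaque to the driver, meaningful to you */
} demo_ctx_t;

/* Hypothetical stand-in for a bus instance: a 256-byte register file. */
typedef struct {
    uint8_t regs[256];
} fake_bus_t;

static int32_t fake_write(void *handle, uint8_t reg, uint8_t *data, uint16_t len)
{
    fake_bus_t *bus = handle; /* recover the custom type from void* */
    assert(bus != NULL && data != NULL);
    if ((uint32_t)reg + len > sizeof bus->regs) {
        return -1;            /* would run past the register file */
    }
    memcpy(&bus->regs[reg], data, len);
    return 0;
}

static int32_t fake_read(void *handle, uint8_t reg, uint8_t *data, uint16_t len)
{
    fake_bus_t *bus = handle;
    assert(bus != NULL && data != NULL);
    if ((uint32_t)reg + len > sizeof bus->regs) {
        return -1;
    }
    memcpy(data, &bus->regs[reg], len);
    return 0;
}

/* What a driver function does internally: it only ever sees the context. */
static int32_t demo_driver_set_and_check(demo_ctx_t *ctx, uint8_t reg, uint8_t value)
{
    uint8_t readback = 0;
    if (ctx->write_reg(ctx->handle, reg, &value, 1) != 0) return -1;
    if (ctx->read_reg(ctx->handle, reg, &readback, 1) != 0) return -1;
    return readback == value ? 0 : -1;
}
```

A caller would fill in the context once, for example `demo_ctx_t ctx = { fake_write, fake_read, &bus };`, and pass `&ctx` to every driver function.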
I am trying to design a data structure (I have made it much shorter to save space here but I think you get the idea) to be used for byte level communication:
/* PACKET.H */
#define CM_HEADER_SIZE 3
#define CM_DATA_SIZE 16
#define CM_FOOTER_SIZE 3
#define CM_PACKET_SIZE (CM_HEADER_SIZE + CM_DATA_SIZE + CM_FOOTER_SIZE)
// + some other definitions
typedef struct cm_header{
uint8_t PacketStart; //Start Indicator 0x5B [
uint8_t DeviceId; //ID Of the device which is sending
uint8_t PacketType;
} CM_Header;
typedef struct cm_footer {
uint16_t DataCrc; //CRC of the 'Data' part of CM_Packet
uint8_t PacketEnd; //should be 0X5D or ]
} CM_Footer;
//Here I am trying to convert a few u8[4] to u32 (4*u32 = 16 bytes, hence the data size)
typedef struct cm_data {
union {
struct{
uint8_t Value_0_0:2;
uint8_t Value_0_1:2;
uint8_t Value_0_2:2;
uint8_t Value_0_3:2;
};
uint32_t Value_0;
};
//same thing for Value_1, 2 and 3
} CM_Data;
typedef struct cm_packet {
CM_Header Header;
CM_Data Data;
CM_Footer Footer;
} CM_Packet;
typedef struct cm_inittypedef{
uint8_t DeviceId;
CM_Packet Packet;
} CM_InitTypeDef;
typedef struct cm_appendresult{
uint8_t Result;
uint8_t Reason;
} CM_AppendResult;
extern CM_InitTypeDef cmHandler;
The goal here is to make a reliable structure for transmitting data over a USB interface. In the end, the CM_Packet should be converted to a uint8_t array and handed to the data transmit register of an MCU (STM32).
In the main.c file I try to init the structure as well as some other stuff related to this packet:
/* MAIN.C */
uint8_t packet[CM_PACKET_SIZE];
int main(void) {
//use the extern defined in packet.h to init the struct
cmHandler.DeviceId = 0x01; //assign device id
CM_Init(&cmHandler); //construct the handler
//rest of stuff
while(1) {
CM_GetPacket(&cmHandler, (uint8_t*)packet);
CDC_Transmit_FS(&packet, CM_PACKET_SIZE);
}
}
And here is the implementation of packet.h which screws up everything so bad. I added the packet[CM_PACKET_SIZE] to watch but it is like it is just being generated randomly. Sometimes by pure luck I can see in this array some of the values that I am interested in! but it is like 1% of the time!
/* PACKET.C */
CM_InitTypeDef cmHandler;
void CM_Init(CM_InitTypeDef *cm_initer) {
cmHandler.DeviceId = cm_initer->DeviceId;
static CM_Packet cmPacket;
cmPacket.Header.DeviceId = cm_initer->DeviceId;
cmPacket.Header.PacketStart = CM_START;
cmPacket.Footer.PacketEnd = CM_END;
cm_initer->Packet = cmPacket;
}
CM_AppendResult CM_AppendData(CM_InitTypeDef *handler, uint8_t identifier,
uint8_t *data){
CM_AppendResult result;
switch(identifier){
case CM_VALUE_0:
handler->Packet.Data.Value_0_0 = data[0];
handler->Packet.Data.Value_0_1 = data[1];
handler->Packet.Data.Value_0_2 = data[2];
handler->Packet.Data.Value_0_3 = data[3];
break;
//Also cases for CM_VALUE_1, 2, 3
//to build up the CM_Data struct of CM_Packet
default:
result.Result = CM_APPEND_FAILURE;
result.Reason = CM_APPEND_CASE_ERROR;
return result;
break;
}
result.Result = CM_APPEND_SUCCESS;
result.Reason = 0x00;
return result;
}
void CM_GetPacket(CM_InitTypeDef *handler, uint8_t *packet){
//copy the whole struct in the given buffer and later send it to USB host
memcpy(packet, &handler->Packet, sizeof(CM_PACKET_SIZE));
}
So, the problem is that this code gives me random stuff 99% of the time. The CM_START byte, which is the start indicator of the packet, never has the value I set. But most of the time the CM_END byte is correct! I am really confused and can't find the reason. Working on an embedded platform that is hard to debug, I am kind of lost here...
If you transfer data to another (different) architecture, do not just pass a structure as a blob. That is the way to hell: endianness, alignment, padding bytes, etc. all can (and likely will) cause trouble.
Better serialize the struct in a conforming way, possibly using some interpreted control stream so you do not have to write every field out manually. (But still use standard functions to generate that stream.)
Some areas of potential or likely trouble:
CM_Footer: The second field might very well start at a 32- or 64-bit boundary, so the preceding field will be followed by padding. Also, the end of that struct is very likely to be padded by at least 1 byte on a 32-bit architecture to allow for proper alignment if used in an array (the compiler does not care whether you actually need this). It might even be 8-byte aligned.
CM_Data: Here you likely (not guaranteed) get one uint8_t with 4*2 bits, with the ordering not standardized. The field may be followed by 3 unused bytes, which are required for the uint32_t interpretation of the union.
How do you guarantee the same endianness (for >uint8_t: high byte first or low byte first?) for host and target?
In general, the structs/unions need not have the same layout for host and target. Even if the same compiler is used, their ABIs may differ, etc. Even if it is the same CPU, there might be other system constraints. Also, for some CPUs, different ABIs (application binary interface) exist.
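A minimal sketch of what serializing field by field can look like, assuming a little-endian wire format and a cut-down packet with a single 32-bit value. The names demo_packet, put_u16_le, put_u32_le and demo_packet_serialize, and the 10-byte wire size, are hypothetical and not taken from the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIRE_PACKET_SIZE 10u  /* hypothetical: 3 header + 4 data + 3 footer bytes */

/* Hypothetical in-memory form; its layout never reaches the wire. */
struct demo_packet {
    uint8_t start;
    uint8_t device_id;
    uint8_t type;
    uint32_t value;
    uint16_t crc;
    uint8_t end;
};

/* Write v as two bytes, low byte first; returns the number of bytes written. */
static size_t put_u16_le(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)(v >> 8);
    return 2;
}

/* Write v as four bytes, low byte first; returns the number of bytes written. */
static size_t put_u32_le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)(v >> 24);
    return 4;
}

/* Emits every field explicitly, so padding and host byte order play no role. */
static size_t demo_packet_serialize(const struct demo_packet *p,
                                    uint8_t out[WIRE_PACKET_SIZE])
{
    size_t n = 0;
    out[n++] = p->start;
    out[n++] = p->device_id;
    out[n++] = p->type;
    n += put_u32_le(&out[n], p->value);
    n += put_u16_le(&out[n], p->crc);
    out[n++] = p->end;
    assert(n == WIRE_PACKET_SIZE);
    return n;
}
```

The receiver reverses the same steps, so both sides agree on the byte stream regardless of how each compiler lays out the struct.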
I'm currently working on a C project where I have to hide text in a .bmp image.
Therefore I open an image and write the file header and the info header in two structures:
typedef struct _BitmapFileHeader_
{
uint16_t bfType_;
uint32_t bfSize_;
uint32_t bfReserved_;
uint32_t bfOffBits_;
}
__attribute__((packed))
BitmapFileHeader;
typedef struct _BitmapInfoHeader_
{
uint32_t biSize_;
int32_t biWidth_;
int32_t biHeight_;
uint16_t biPlanes_;
uint16_t biBitCount_;
uint32_t biCompression_;
uint32_t biSizeImage_;
int32_t biXPelsPerMeter_;
int32_t biYPelsPerMeter_;
uint32_t biClrUsed_;
uint32_t biClrImportant_;
}BitmapInfoHeader;
BitmapFileHeader* bitmap_headerdata = NULL;
BitmapInfoHeader* bitmap_infodata = NULL;
filename is defined previously
int readBitmapFile (char* filename, BitmapFileHeader** bitmap_headerdata,
BitmapInfoHeader** bitmap_infodata, unsigned char** bitmap_colordata)
{
FILE* bmp_file;
bmp_file = fopen(filename, "rb");
fseek(bmp_file, 0, SEEK_SET); // Set File Cursor to beginning
fread((*bitmap_headerdata), sizeof(**bitmap_headerdata), 1, bmp_file);
fseek(bmp_file, sizeof(**bitmap_headerdata), SEEK_SET);
fread((*bitmap_infodata), sizeof(**bitmap_infodata), 1, bmp_file);
int checkinfo = sizeof(**bitmap_infodata);
int checkheader = sizeof(**bitmap_headerdata);
printf("Size of Infodata: %d\nSize of Headerdata: %d\n", checkinfo, checkheader);
....
}
When I open a valid Bitmap (24bit, not compressed) and I compare the values bfType_, biBitCount and biCompression to 19778,24,0 on Linux it works just fine but when I try to run it on Windows the Program stops when it compares biBitCount to 24.
When I debugged the program I noticed that all the Values from "bitmap_infodata" are one line above from where they should be (when I look at it like a table).
Then I compared sizeof(**bitmap_headerdata) on Linux and on Windows and noticed that it's 14 on Linux and 16 on Windows?
Shouldn't that be the same? And why does the structure bitmap_headerdata have same values on both OS but bitmap_infodata is different?
Bernhard
The problem is that structs are padded differently in different environments.
There are 2 solutions to the problem.
1: Read the header field by field.
2: Remove struct padding. The syntax to do that varies. Some compilers use #pragma pack. You are using __attribute__((packed)), which apparently doesn't work on both platforms.
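Solution 1 could look roughly like the sketch below, which reads the 14 header bytes into a plain byte array and assembles each field by hand. BMP stores multi-byte fields in little-endian order. The struct and function names (bmp_file_header, get_u16_le, get_u32_le, parse_bmp_file_header, read_bmp_file_header) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BMP_FILE_HEADER_SIZE 14  /* on-disk size, independent of struct padding */

/* In-memory form; the compiler may pad it however it likes. */
struct bmp_file_header {
    uint16_t type;
    uint32_t size;
    uint32_t reserved;
    uint32_t off_bits;
};

static uint16_t get_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header from its raw on-disk bytes. */
static void parse_bmp_file_header(const uint8_t raw[BMP_FILE_HEADER_SIZE],
                                  struct bmp_file_header *h)
{
    assert(raw != NULL && h != NULL);
    h->type     = get_u16_le(&raw[0]);
    h->size     = get_u32_le(&raw[2]);
    h->reserved = get_u32_le(&raw[6]);
    h->off_bits = get_u32_le(&raw[10]);
}

/* Returns 0 on success, -1 if the file is shorter than a header. */
static int read_bmp_file_header(FILE *f, struct bmp_file_header *h)
{
    uint8_t raw[BMP_FILE_HEADER_SIZE];
    if (fread(raw, 1, sizeof raw, f) != sizeof raw) {
        return -1;
    }
    parse_bmp_file_header(raw, h);
    return 0;
}
```

Because the struct is never read from or written to the file directly, sizeof(struct bmp_file_header) can be 14 or 16 without affecting the result.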
The packed attribute is broken in mingw gcc:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52991
You might want to consider using:
#pragma pack(1)
struct BitmapFileHeader {
...
};
#pragma pack()
And the -mno-ms-bitfields flag.
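Whichever packing syntax ends up being used, a compile-time check makes the 14-versus-16 mismatch from the question show up as a build error instead of as shifted field values. A small sketch, using the push/pop form of the pragma and C11 static_assert; the struct name packed_bmp_file_header is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct packed_bmp_file_header {
    uint16_t bfType_;
    uint32_t bfSize_;
    uint32_t bfReserved_;
    uint32_t bfOffBits_;
};
#pragma pack(pop)

/* Fails to compile if the compiler inserted padding after all. */
static_assert(sizeof(struct packed_bmp_file_header) == 14,
              "BMP file header must be exactly 14 bytes");
static_assert(offsetof(struct packed_bmp_file_header, bfSize_) == 2,
              "bfSize_ must directly follow bfType_");
```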