Algorithm for writing to EEPROM? - c

I have a memory which is a column of 4-byte rows. I can only write to it in 16-byte chunks, and reads are done 4 bytes at a time (line by line, that is) over I2C.
I am interested in how to write data into the EEPROM: the data being written consists of a few different parts, of which two can be of variable length. For example, I can have XYYZ or XYYYYZZZZZZZ, where each letter is 4 bytes.
My question is: how should I go about this so as to have a general way of writing the message to the memory using 16-byte writes, one that accommodates the variable length of the two parts?

Rather than trying to work in 4- or 16-byte units, you could consider using a small (21-byte) static cache for the EEPROM. Let's assume you have
void eeprom_read16(uint32_t page, uint8_t *data);
void eeprom_write16(uint32_t page, const uint8_t *data);
where page is the byte address divided by 16, and both functions always operate on 16-byte chunks. The cache itself and its initialization function (which you'd call once at power-on) would be
static uint32_t eeprom_page;      /* uint16_t suffices for a 1 MiB EEPROM */
static uint8_t  eeprom_cache[16]; /* one cached 16-byte row */
static uint8_t  eeprom_dirty;     /* nonzero if the cache has unwritten changes */

static void eeprom_init(void)
{
    eeprom_page = 0x80000000U; /* "None", at 32 GiB */
    eeprom_dirty = 0;
}

static void eeprom_flush(void)
{
    if (eeprom_dirty) {
        eeprom_write16(eeprom_page, eeprom_cache);
        eeprom_dirty = 0;
    }
}
The eeprom_flush() function is only needed if you wish to ensure some data is stored in the EEPROM -- basically, after each complete transaction. You can safely call it at any time.
To access any memory in the EEPROM, you use the accessor functions
static inline uint8_t eeprom_get(const uint32_t address)
{
    const uint32_t page = address >> 4;
    if (page != eeprom_page) {
        if (eeprom_dirty) {
            eeprom_write16(eeprom_page, eeprom_cache);
            eeprom_dirty = 0;
        }
        eeprom_read16(page, eeprom_cache);
        eeprom_page = page;
    }
    return eeprom_cache[address & 0x0FU];
}

static inline void eeprom_set(const uint32_t address, const uint8_t value)
{
    const uint32_t page = address >> 4;
    if (page != eeprom_page) {
        if (eeprom_dirty) {
            eeprom_write16(eeprom_page, eeprom_cache);
            eeprom_dirty = 0;
        }
        eeprom_read16(page, eeprom_cache);
        eeprom_page = page;
    }
    eeprom_dirty = 1;
    eeprom_cache[address & 0x0FU] = value;
}
Feel free to omit the inline if you like; it is just an optimization. static inline tells a C99 compiler to inline the functions if possible. It might increase your code size a bit, but it should produce faster code (the compiler can optimize better when such small functions are inlined at the call site).
Note that you should not use the above in interrupt handlers, because normal code is not prepared for the eeprom page to change mid-operation.
You can mix read and write operations, but that may lead to unnecessary wear on the EEPROM. You can, of course, split the read and write sides to separate caches, if you do mix reads and writes. That would also allow you to safely do EEPROM reads from an interrupt context (although the delay/latency of the I2C access might wreak havoc elsewhere).
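For example, one of your variable-length X/Y/Z records could then be written byte by byte through the cache, with one 16-byte EEPROM write per touched row; the helper and the record layout below are just for illustration:
/* Write 'len' bytes through the cache, starting at an arbitrary byte address. */
static void eeprom_write_block(uint32_t address, const uint8_t *data, size_t len)
{
    while (len--)
        eeprom_set(address++, *data++);
}

/* Store one message: X (1 word), Y (y_words words), Z (z_words words),
 * each word being 4 bytes, at the given EEPROM byte address. */
void store_message(uint32_t address, const uint8_t *x,
                   const uint8_t *y, size_t y_words,
                   const uint8_t *z, size_t z_words)
{
    eeprom_write_block(address, x, 4);
    eeprom_write_block(address + 4, y, 4 * y_words);
    eeprom_write_block(address + 4 + 4 * y_words, z, 4 * z_words);
    eeprom_flush(); /* push out the last, possibly partially updated row */
}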

This is not tailored specifically to your examples, is completely untested, and relies on having "read 4 bytes from EEPROM" and "write 16 bytes to EEPROM" encapsulated in suitable functions.
void write_to_eeprom(uint32_t start, size_t len, uint8_t *data) {
    uint32_t eeprom_dst = start & 0xfffffff0u;
    uint8_t buffer[16];
    ssize_t data_offset;

    /* buffer[0] corresponds to data[data_offset]; data_offset is negative
     * for the first row when 'start' is not 16-byte aligned. */
    for (data_offset = -(ssize_t)(start - eeprom_dst);
         data_offset < (ssize_t)len;
         data_offset += 16, eeprom_dst += 16) {
        if ((data_offset < 0) || ((len - (size_t)data_offset) < 16)) {
            // Partial row: we need to fill our buffer with EEPROM data first.
            read_from_eeprom(eeprom_dst, buffer); // read 4 bytes, place at ptr
            read_from_eeprom(eeprom_dst + 4, buffer + 4);
            read_from_eeprom(eeprom_dst + 8, buffer + 8);
            read_from_eeprom(eeprom_dst + 12, buffer + 12);
            ssize_t offset = data_offset;
            for (int buf_ix = 0; buf_ix < 16; buf_ix++, offset++) {
                if ((offset >= 0) && (offset < (ssize_t)len)) {
                    // We want to copy actual data
                    buffer[buf_ix] = data[offset];
                }
            }
        } else {
            // We don't need to cater for edge cases and can simply copy
            // 16 bytes into our tmp buffer.
            for (int ix = 0; ix < 16; ix++) {
                buffer[ix] = data[data_offset + ix];
            }
        }
        write16_to_eeprom(eeprom_dst, buffer); // the "write 16 bytes" primitive
    }
}
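For completeness, the primitives assumed above might be declared like this (the names are placeholders matching the code, not an existing driver API), with an example call:
/* Assumed EEPROM primitives (placeholder names):
 * read_from_eeprom() reads 4 bytes at byte address 'addr' into 'dst',
 * write16_to_eeprom() writes one 16-byte-aligned row from 'src'. */
void read_from_eeprom(uint32_t addr, uint8_t *dst);
void write16_to_eeprom(uint32_t addr, const uint8_t *src);

void example(void)
{
    /* A 48-byte XYYYYZZZZZZZ message (4 bytes per letter), written to an
     * unaligned EEPROM byte address; the partial first and last rows are
     * handled by the read-modify-write path above. */
    uint8_t message[48] = { 0 };
    write_to_eeprom(0x123, sizeof message, message);
}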

Related

Having a race condition in my MPSC ring buffer

I was trying to build an MPSC lock-free ring buffer for learning purposes, and am running into race conditions.
A description of the MPSC ring buffer:
It is guaranteed that poll() is never called when the buffer is empty.
Instead of mod'ing head and tail like a traditional ring buffer, it lets them proceed linearly, and AND's them before using them (since the buffer size is a power of 2, this works ok with overflow).
We keep MAX_PRODUCERS-1 slots open in the queue so that if multiple producers come and see one slot is available and proceed, they can all place their entries.
It uses 32-bit quantities for head and tail, so that it can snapshot them with a 64-bit atomic read without a lock.
My test involves a couple of threads writing some known set of values to the queue, and a consumer thread polling (when the buffer is not empty) and summing all, and verifying the correct result is obtained. With 2 or more producers, I get inconsistent sums (and with 1 producer, it works).
Any help would be much appreciated. Thank you!
Here is the code:
struct ring_buf_entry {
    uint32_t seqn;
};

struct __attribute__((packed, aligned(8))) ring_buf {
    union {
        struct {
            volatile uint32_t tail;
            volatile uint32_t head;
        };
        volatile uint64_t snapshot;
    };
    volatile struct ring_buf_entry buf[RING_BUF_SIZE];
};

#define RING_SUB(x,y) ((x)>=(y)?((x)-(y)):((x)+(1ULL<<32)-(y)))

static void ring_buf_push(struct ring_buf* rb, uint32_t seqn)
{
    size_t pos;
    while (1) {
        // rely on aligned, packed, and no member-reordering properties
        uint64_t snapshot = __atomic_load_n(&(rb->snapshot), __ATOMIC_SEQ_CST);
        // little endian.
        uint64_t snap_head = snapshot >> 32;
        uint64_t snap_tail = snapshot & 0xffffffffULL;
        if (RING_SUB(snap_tail, snap_head) < RING_BUF_SIZE - MAX_PRODUCERS + 1) {
            uint32_t exp = snap_tail;
            if (__atomic_compare_exchange_n(&(rb->tail), &exp, snap_tail+1, 0,
                                            __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
                pos = snap_tail;
                break;
            }
        }
        asm volatile("pause\n": : :"memory");
    }
    pos &= RING_BUF_SIZE-1;
    rb->buf[pos].seqn = seqn;
    asm volatile("sfence\n": : :"memory");
}

static struct ring_buf_entry ring_buf_poll(struct ring_buf* rb)
{
    struct ring_buf_entry ret =
        rb->buf[__atomic_load_n(&(rb->head), __ATOMIC_SEQ_CST) & (RING_BUF_SIZE-1)];
    __atomic_add_fetch(&(rb->head), 1, __ATOMIC_SEQ_CST);
    return ret;
}

About one line in an implementation of MD5

I'm confused by one line of code in an implementation of MD5,
void MD5_Update(MD5_CTX *ctx, const void *data, unsigned long size)
{
    MD5_u32plus saved_lo;
    unsigned long used, available;

    saved_lo = ctx->lo;
    if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo)
        ctx->hi++;
    ctx->hi += size >> 29;

    used = saved_lo & 0x3f;

    if (used)
    {
        available = 64 - used;

        if (size < available)
        {
            memcpy(&ctx->buffer[used], data, size);
            return;
        }

        memcpy(&ctx->buffer[used], data, available);
        data = (const unsigned char *)data + available;
        size -= available;
        body(ctx, ctx->buffer, 64);
    }

    if (size >= 64)
    {
        data = body(ctx, data, size & ~(unsigned long)0x3f);
        size &= 0x3f;
    }

    memcpy(ctx->buffer, data, size);
}
The line in question is if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo). It seems that size counts bytes, but ctx->lo and saved_lo count bits. Why add them together? There is also similar code on GitHub, and some projects use it. Can anyone give an explanation?
The remarks about "bit counters" are likely misleading - ctx->hi and ctx->lo count bytes, just like size does.
You correctly notice that you're just adding size (bytes) to ctx->lo (and then checking for overflow/propagating overflow into ctx->hi). The overflow check is pretty simple - lo is used as a 29-bit integer, and if the result after adding/masking is less than the original value, then overflow occurred.
The checks around used are also evidence for ctx->lo and ctx->hi being byte counters -- body processes data 64 bytes at a time, and the lo counter is ANDed with 0x3F (i.e. 63).
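To see that this really is a byte counter split into a 29-bit low part plus a carry into hi, here is a small standalone check (not part of the MD5 code) that feeds a few chunk sizes through the same update and compares the result against a plain 64-bit bit counter:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t lo = 0, hi = 0;
    uint64_t bits = 0;   /* reference: total message length in bits */
    const unsigned long chunks[] = { 1, 63, 64, 1000000, 0x7fffffffUL };

    for (size_t i = 0; i < sizeof chunks / sizeof chunks[0]; i++) {
        unsigned long size = chunks[i];
        uint32_t saved_lo = lo;
        /* Same update as in MD5_Update(): lo holds the low 29 bits of the
         * byte count, hi absorbs the carry plus the high bits of 'size'. */
        if ((lo = (saved_lo + size) & 0x1fffffff) < saved_lo)
            hi++;
        hi += size >> 29;
        bits += (uint64_t)size * 8;
        /* lo << 3 is the low 32 bits of the bit count, hi the high 32 bits. */
        printf("lo<<3 = %08lx, hi = %08lx, 64-bit bit count = %016llx\n",
               (unsigned long)(lo << 3), (unsigned long)hi,
               (unsigned long long)bits);
    }
    return 0;
}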

Pass array that is part of struct as uint8_t pointer to function

I am working with the Renesas RA2A1 using their Flexible software package, trying to send data over a uart.
I am sending ints and floats over the uart, so I created a union of a float and a 4 byte uint8_t array, same for ints.
I put a few of these in a struct, and then put that in a union with an array that is the size of all the data contained in the struct.
I can't get it to work by passing the array that is inside the struct to the function. If I create a plain array of uint8_t, that passes in and works OK. I'm not sure what's wrong with trying to pass the array as I am.
It is failing an assert in R_SCI_UART_Write that checks the transfer size, which fails because the size is 0.
typedef union{
float num_float;
uint32_t num_uint32;
int32_t num_int32;
uint8_t num_array[4];
} comms_data_t;
typedef struct{
comms_data_t a;
comms_data_t b;
comms_data_t c;
comms_data_t d;
comms_data_t e;
uint8_t lr[2];
} packet_data_t;
typedef union{
packet_data_t msg_packet_data;
uint8_t packet_array[22];
}msg_data_t;
/* Works */
uint8_t myData[10] = "Hi Dave!\r\n";
uart_print_main_processor_msg(myData);
/* Doesn't work */
msg_data_t msg_data;
/* code removed that puts data into msg_data,ex below */
msg_data.msg_packet_data.a.num_float = 1.2f;
uart_print_main_processor_msg(msg_data.packet_array);
// Functions below
/****************************************************************************************************************/
fsp_err_t uart_print_main_processor_msg(uint8_t *p_msg)
{
fsp_err_t err = FSP_SUCCESS;
uint8_t msg_len = RESET_VALUE;
uint32_t local_timeout = (DATA_LENGTH * UINT16_MAX);
char *p_temp_ptr = (char *)p_msg;
/* Calculate length of message received */
msg_len = ((uint8_t)(strlen(p_temp_ptr)));
/* Reset callback capture variable */
g_uart_event = RESET_VALUE;
/* Writing to terminal */
err = R_SCI_UART_Write (&g_uartMainProcessor_ctrl, p_msg, msg_len);
if (FSP_SUCCESS != err)
{
APP_ERR_PRINT ("\r\n** R_SCI_UART_Write API Failed **\r\n");
return err;
}
/* Check for event transfer complete */
while ((UART_EVENT_TX_COMPLETE != g_uart_event) && (--local_timeout))
{
/* Check if any error event occurred */
if (UART_ERROR_EVENTS == g_uart_event)
{
APP_ERR_PRINT ("\r\n** UART Error Event Received **\r\n");
return FSP_ERR_TRANSFER_ABORTED;
}
}
if(RESET_VALUE == local_timeout)
{
err = FSP_ERR_TIMEOUT;
}
return err;
}
fsp_err_t R_SCI_UART_Write (uart_ctrl_t * const p_api_ctrl, uint8_t const * const p_src, uint32_t const bytes)
{
#if (SCI_UART_CFG_TX_ENABLE)
sci_uart_instance_ctrl_t * p_ctrl = (sci_uart_instance_ctrl_t *) p_api_ctrl;
#if SCI_UART_CFG_PARAM_CHECKING_ENABLE || SCI_UART_CFG_DTC_SUPPORTED
fsp_err_t err = FSP_SUCCESS;
#endif
#if (SCI_UART_CFG_PARAM_CHECKING_ENABLE)
err = r_sci_read_write_param_check(p_ctrl, p_src, bytes);
FSP_ERROR_RETURN(FSP_SUCCESS == err, err);
FSP_ERROR_RETURN(0U == p_ctrl->tx_src_bytes, FSP_ERR_IN_USE);
#endif
/* Transmit interrupts must be disabled to start with. */
p_ctrl->p_reg->SCR &= (uint8_t) ~(SCI_SCR_TIE_MASK | SCI_SCR_TEIE_MASK);
/* If the fifo is not used the first write will be done from this function. Subsequent writes will be done
* from txi_isr. */
#if SCI_UART_CFG_FIFO_SUPPORT
if (p_ctrl->fifo_depth > 0U)
{
p_ctrl->tx_src_bytes = bytes;
p_ctrl->p_tx_src = p_src;
}
else
#endif
{
p_ctrl->tx_src_bytes = bytes - p_ctrl->data_bytes;
p_ctrl->p_tx_src = p_src + p_ctrl->data_bytes;
}
#if SCI_UART_CFG_DTC_SUPPORTED
/* If a transfer instance is used for transmission, reset the transfer instance to transmit the requested
* data. */
if ((NULL != p_ctrl->p_cfg->p_transfer_tx) && p_ctrl->tx_src_bytes)
{
uint32_t data_bytes = p_ctrl->data_bytes;
uint32_t num_transfers = p_ctrl->tx_src_bytes >> (data_bytes - 1);
p_ctrl->tx_src_bytes = 0U;
#if (SCI_UART_CFG_PARAM_CHECKING_ENABLE)
/* Check that the number of transfers is within the 16-bit limit. */
FSP_ASSERT(num_transfers <= SCI_UART_DTC_MAX_TRANSFER);
#endif
err = p_ctrl->p_cfg->p_transfer_tx->p_api->reset(p_ctrl->p_cfg->p_transfer_tx->p_ctrl,
(void const *) p_ctrl->p_tx_src,
NULL,
(uint16_t) num_transfers);
FSP_ERROR_RETURN(FSP_SUCCESS == err, err);
}
#endif
#if SCI_UART_CFG_FLOW_CONTROL_SUPPORT
if ((((sci_uart_extended_cfg_t *) p_ctrl->p_cfg->p_extend)->uart_mode == UART_MODE_RS485_HD) &&
(p_ctrl->flow_pin != SCI_UART_INVALID_16BIT_PARAM))
{
R_BSP_PinAccessEnable();
R_BSP_PinWrite(p_ctrl->flow_pin, BSP_IO_LEVEL_HIGH);
R_BSP_PinAccessDisable();
}
#endif
/* Trigger a TXI interrupt. This triggers the transfer instance or a TXI interrupt if the transfer instance is
* not used. */
p_ctrl->p_reg->SCR |= SCI_SCR_TIE_MASK;
#if SCI_UART_CFG_FIFO_SUPPORT
if (p_ctrl->fifo_depth == 0U)
#endif
{
/* On channels with no FIFO, the first byte is sent from this function to trigger the first TXI event. This
* method is used instead of setting TE and TIE at the same time as recommended in the hardware manual to avoid
* the one frame delay that occurs when the TE bit is set. */
if (2U == p_ctrl->data_bytes)
{
p_ctrl->p_reg->FTDRHL = *((uint16_t *) (p_src)) | (uint16_t) ~(SCI_UART_FIFO_DAT_MASK);
}
else
{
p_ctrl->p_reg->TDR = *(p_src);
}
}
return FSP_SUCCESS;
#else
FSP_PARAMETER_NOT_USED(p_api_ctrl);
FSP_PARAMETER_NOT_USED(p_src);
FSP_PARAMETER_NOT_USED(bytes);
return FSP_ERR_UNSUPPORTED;
#endif
}
There are several issues with this program. Reading a union member other than the one last written ("type punning") is a grey area: it is undefined behavior in C++ and was long debated for C, even though pretty much all C compilers allow it; if you do use a union for aliasing, I would still prefer a char[] as the member used to view the raw bytes. As mentioned in the comments, "Hi Dave!\r\n" actually takes up 11 bytes with the null character, so the terminator does not fit in uint8_t myData[10] and is silently dropped. It's safer to use uint8_t myData[] = "Hi Dave!\r\n"; or const char *myData = "Hi Dave!\r\n"; and spare yourself the trouble.
The second problem is that strlen cannot work correctly for binary data. strlen searches for the first occurrence of the null character, so it is not applicable to binary data: if you pass a floating-point value that has a zero byte somewhere in its IEEE 754 representation, that byte will mark the end of the "string".
Plain and simple, your function should be declared as fsp_err_t uart_write(const char * msg, size_t msg_len); and be called using uart_write(data_array, sizeof data_array);. If you want to transmit messages of variable size over the UART, you will also have to define a certain communication protocol, i.e. create a message that can be unambiguously parsed. This will likely mean: 1) some cookie at the beginning, 2) length of the transmitted data, 3) actual data, 4) crc -- but this is outside the scope of this question.
So, strlen won't tell you the length of the data, you will pass it to the function yourself, and you don't need unions at all. If you choose not to properly serialize the data (e.g. using protobuf or some other protocol), you can simply pass the pointer to the struct to the function, i.e. call the above mentioned uart_write((char*)&some_struct, sizeof some_struct); and it will work as if you passed an array.
Note that char in this case doesn't mean "ASCII character" or "character in a string". The point of using char* is that character pointers are legally allowed to alias any other object. So you take a pointer to your struct (&some_struct), cast it to char*, and pass it to a function which can then read its representation in memory. I am aware that R_SCI_UART_Write is likely generated by your IDE, and unfortunately these generated APIs often use uint8_t* instead of char*, so you will probably have to cast to uint8_t* at some point.
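For illustration, a minimal sketch of that approach (uart_write() and demo_packet_t are hypothetical names; on the RA2A1 the wrapper would simply forward the pointer and length to R_SCI_UART_Write()):
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper that forwards the pointer and an explicit length
 * to the UART driver -- no strlen() involved. */
int uart_write(const char *msg, size_t msg_len);

typedef struct {
    float   a;        /* stand-ins for the real packet fields */
    int32_t b;
    uint8_t lr[2];
} demo_packet_t;

void send_packet(const demo_packet_t *p)
{
    /* char* may legally alias any object, so this sends the raw in-memory
     * representation of the struct, padding bytes included. */
    uart_write((const char *)p, sizeof *p);
}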

Error while trying to update array element

I am working on an embedded platform which does not have debugging features, so it is hard to tell what the source of the error is.
I have defined in header file:
typedef struct cm_packet {
    CM_Header Header; //header of packet 3 bytes
    uint8_t  *Data;   //packet data 64 bytes
    CM_Footer Footer; //footer of packet 3 bytes
} CM_Packet;

typedef struct cm_inittypedef {
    uint8_t   DeviceId;
    CM_Packet Packet;
} CM_InitTypeDef;

extern CM_InitTypeDef cmHandler;

void CM_Init(CM_InitTypeDef *handler);
CM_AppendResult CM_AppendData(CM_InitTypeDef *handler, uint8_t identifier,
                              uint8_t *data, uint8_t length);
And somewhere in implementation I have:
uint8_t bufferIndex = 0;

void CM_Init(CM_InitTypeDef *cm_initer) { //init a handler
    cmHandler.DeviceId = cm_initer->DeviceId;
    CM_Packet cmPacket;
    cmPacket.Header.DeviceId    = cm_initer->DeviceId;
    cmPacket.Header.PacketStart = CM_START;
    cmPacket.Footer.PacketEnd   = CM_END;
    //initialize data array
    uint8_t emptyBuffer[CM_MAX_DATA_SIZE] = {0x00};
    cmPacket.Data = emptyBuffer;
    cm_initer->Packet = cmPacket;
}

CM_AppendResult CM_AppendData(CM_InitTypeDef *handler, uint8_t identifier,
                              uint8_t *data, uint8_t length) {
    //some check to see if new data does not make Data overflow
    uint8_t i;
    /*** ERROR HAPPENS HERE!!!! ***/
    handler->Packet.Data[bufferIndex++] = identifier;
    //now add the data itself
    for (i = 0; i < length; i++) {
        handler->Packet.Data[bufferIndex++] = data[i];
    }
    //reset indexer
    if (bufferIndex > 64) {
        PacketReady(); //mark packet as ready
        bufferIndex = 0;
    }
    //return result
}
The idea is to update Packet.Data from some other source files which have access to the handler. For example, other code can call that append function to change Packet.Data. But as you see in the code, I have marked the line which causes the micro-controller to go into hard fault mode. I am not sure what is happening here. All I know is that exactly at that line the micro goes into hard fault mode and never recovers!
This might be a race condition, but before anything else I want to know that I have written correct C code; then I will try to rule out other problems.
In function CM_Init, you are setting cmPacket.Data to point to a local array:
uint8_t emptyBuffer[CM_MAX_DATA_SIZE] = {0x00};
cmPacket.Data = emptyBuffer;
Accessing this memory address outside the scope of the function yields undefined behavior.
As @barak manos mentioned, the buffer supplied to Data is allocated on the stack.
When you get to CM_AppendData, you are writing over memory that is no longer dedicated to the buffer.
You may want to use malloc so that the buffer is allocated on the heap instead of on the stack. Just remember to call free so that you are not leaking memory.
If you can't use dynamic allocation, it's possible to dedicate some scratch memory for all the Data uses. It just needs to be static.
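For example, a minimal sketch of the static-buffer variant, reusing the definitions from your header (this is just one way to do it):
#include <string.h>   /* memset */

/* Storage with static duration: valid for the whole program lifetime. */
static uint8_t cm_data_buffer[CM_MAX_DATA_SIZE];

void CM_Init(CM_InitTypeDef *cm_initer)
{
    cmHandler.DeviceId = cm_initer->DeviceId;

    CM_Packet cmPacket;
    cmPacket.Header.DeviceId    = cm_initer->DeviceId;
    cmPacket.Header.PacketStart = CM_START;
    cmPacket.Footer.PacketEnd   = CM_END;

    /* Point the packet at the static buffer instead of a stack array. */
    memset(cm_data_buffer, 0x00, sizeof cm_data_buffer);
    cmPacket.Data = cm_data_buffer;

    cm_initer->Packet = cmPacket;
}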
Hope that helps :)

Distributed shared memory [duplicate]

This question already has answers here:
Distributed shared memory library for C++? [closed]
(3 answers)
Closed 8 years ago.
My current system runs on Linux, with the different tasks using shared memory to access the common data (which is defined as a C struct). The size of the shared data is about 100K.
Now, I want to run the main user interface on Windows, while keeping all the other tasks in Linux, and I’m looking for the best replacement for the shared memory. The refresh rate of the UI is about 10 times per second.
I am not sure whether it is better to do it myself or to use a third-party solution. If I do it on my own, my idea is to open a socket and then use some sort of data serialization between the client (Windows) and the server (Linux).
In the case of third-party solutions, I'm a bit overwhelmed by the number of options. After some searching, it seems to me that the right solution could be MPI, but I would like to consider other options before starting to work with it: XDR, OpenMP, JSON, DBus, RDMA, memcached, Boost libraries… Has anyone got any experience with any of them? What are the pros and cons of using such solutions for accessing the shared memory on Linux from Windows?
Maybe MPI or the other third-party solutions are too elaborate for such a simple use case as mine, and I should use a do-it-yourself approach? Any advice if I take that route? Am I missing something? Am I looking in the wrong direction? I wouldn't like to reinvent the wheel.
For the file sharing, I’m considering Samba.
My development is done with C and C++, and, of course, the solution needs to be compiled in Linux and Windows.
AFAIK (but I don't know Windows) you cannot share memory (à la shm_overview(7)) between a Linux and a Windows machine (you could share data using some network protocol; that is what memcached does).
However, you can make a Linux process answering to network messages from a Windows machine (but that is not shared memory!).
Did you consider using a web interface? You could add a web interface (using Ajax techniques) to your Linux software, e.g. using FastCGI or an HTTP server library like libonion or Wt. Then your Windows users could use their browser to interact with your program (which would still run on some Linux compute server).
PS. Read the wikipage on distributed shared memory. It means something different than what you are asking! Read also about message passing!
I'd recommend using a plain TCP socket between the user interface and the service, with simple tagged message frames passed between the two in preferably architecture-independent manner (i.e. with specific byte order and integer and floating-point types).
On the server end, I'd use a helper process or thread, that maintains two local copies of the shared state struct per client. First one reflects the state the client knows about, and the second one is used to snapshot the shared state at regular intervals. When the two differ, the helper sends the differing fields to the client. Conversely, the client can send modification requests to the helper.
I would not send the 100k shared data structure as a single chunk; that would be very inefficient. Also, assuming the state contains multiple fields, having the client send back the entire new 100k state would overwrite fields the client was not meant to change.
I'd use one message for each field in the state, or atomically manipulated set of fields. The message structure itself should be very simple. At minimum, each message should start with a fixed-size length and type fields, so that it is trivial to receive the messages. It is always a good idea to leave the possibility of later extending the capabilities/fields, without breaking backwards compatibility, too.
You don't need a lot of code to implement the above. You do need some accessor/manipulator code for each different type of field in the state (char, short, int, long, double, float, and any other type or struct you might use), and that (and the helper that compares the active state with the simulated user state) will probably be the bulk of the code.
Although I don't normally recommend reinventing the wheel, in this case I do recommend writing your own message passing code, directly on top of TCP, simply because all the libraries I know of that could be used for this, are rather unwieldy or would impose some design choices I'd rather leave to the application.
If you want, I can provide some example code that would hopefully illustrate this better. I don't have Windows, but Qt does have a QTcpSocket class you can use that works the same in both Linux and Windows, so there shouldn't be (m)any differences. In fact, if you design the message structure well, you could write the UI in some other language like Python, without any differences to the server itself.
Promised examples follow; apologies for maximum-length post.
Paired counters are useful when there is a single writer for a specific field, possibly many readers, and you wish to add minimal overhead to the write path. Reading never blocks, and a reader will see whether they got a valid copy or if an update was in progress and they should not rely on the value they saw. (This also allows reliable write semantics for a field in an async-signal-safe context, where pthread locking is not guaranteed to work.)
This implementation works best on GCC-4.7 and later, but should work with earlier C compilers, too. Legacy __sync built-ins also work with Intel, Pathscale, and PGI C compilers. counter.h:
#ifndef COUNTER_H
#define COUNTER_H
/*
* Atomic generation counter support.
*
* A generation counter allows a single writer thread to
* locklessly modify a value non-atomically, while letting
* any number of reader threads detect the change. Reader
* threads can also check if the value they observed is
* "atomic", or taken during a modification in progress.
*
* There is no protection against multiple concurrent writers.
*
* If the writer gets canceled or dies during an update,
* the counter will get garbled. Reinitialize before relying
* on such a counter.
*
* There is no guarantee that a reader will observe an
* "atomic" value, if the writer thread modifies the value
* more often than the reader thread can read it.
*
* Two typical use cases:
*
* A) A single writer requires minimum overhead/latencies,
* whereas readers can poll and retry if necessary
*
* B) Non-atomic value or structure modified in
* an interrupt context (async-signal-safe)
*
* Initialization:
*
* Counter counter = COUNTER_INITIALIZER;
*
* or
*
* Counter counter;
* counter_init(&counter);
*
* Write sequence:
*
* counter_acquire(&counter);
* (modify or write value)
* counter_release(&counter);
*
* Read sequence:
*
* unsigned int check;
* check = counter_before(&counter);
* (obtain a copy of the value)
* if (check == counter_after(&counter))
* (a copy of the value is atomic)
*
* Read sequence with spin-waiting,
* will spin forever if counter garbled:
*
* unsigned int check;
* do {
* check = counter_before(&counter);
* (obtain a copy of the value)
* } while (check != counter_after(&counter));
*
* All these are async-signal-safe, and will never block
* (except for the spinning loop just above, obviously).
*/
typedef struct {
volatile unsigned int value[2];
} Counter;
#define COUNTER_INITIALIZER {{0U,0U}}
#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
/*
* GCC 4.7 and later provide __atomic built-ins.
* These are very efficient on x86 and x86-64.
*/
static inline void counter_init(Counter *const counter)
{
/* This is silly; assignments should suffice. */
do {
__atomic_store_n(&(counter->value[1]), 0U, __ATOMIC_SEQ_CST);
__atomic_store_n(&(counter->value[0]), 0U, __ATOMIC_SEQ_CST);
} while (__atomic_load_n(&(counter->value[0]), __ATOMIC_SEQ_CST) |
__atomic_load_n(&(counter->value[1]), __ATOMIC_SEQ_CST));
}
static inline unsigned int counter_before(Counter *const counter)
{
return __atomic_load_n(&(counter->value[1]), __ATOMIC_SEQ_CST);
}
static inline unsigned int counter_after(Counter *const counter)
{
return __atomic_load_n(&(counter->value[0]), __ATOMIC_SEQ_CST);
}
static inline unsigned int counter_acquire(Counter *const counter)
{
return __atomic_fetch_add(&(counter->value[0]), 1U, __ATOMIC_SEQ_CST);
}
static inline unsigned int counter_release(Counter *const counter)
{
return __atomic_fetch_add(&(counter->value[1]), 1U, __ATOMIC_SEQ_CST);
}
#else
/*
* Rely on legacy __sync built-ins.
*
* Because there is no __sync_fetch(),
* counter_before() and counter_after() are safe,
* but not optimal (especially on x86 and x86-64).
*/
static inline void counter_init(Counter *const counter)
{
/* This is silly; assignments should suffice. */
do {
counter->value[0] = 0U;
counter->value[1] = 0U;
__sync_synchronize();
} while (__sync_fetch_and_add(&(counter->value[0]), 0U) |
__sync_fetch_and_add(&(counter->value[1]), 0U));
}
static inline unsigned int counter_before(Counter *const counter)
{
return __sync_fetch_and_add(&(counter->value[1]), 0U);
}
static inline unsigned int counter_after(Counter *const counter)
{
return __sync_fetch_and_add(&(counter->value[0]), 0U);
}
static inline unsigned int counter_acquire(Counter *const counter)
{
return __sync_fetch_and_add(&(counter->value[0]), 1U);
}
static inline unsigned int counter_release(Counter *const counter)
{
return __sync_fetch_and_add(&(counter->value[1]), 1U);
}
#endif
#endif /* COUNTER_H */
Each shared state field type needs its own structure defined. For our purposes, I'll just use a Value structure to describe one of them. It is up to you to define all the needed Field and Value types to match your needs. For example, a 3D vector of double-precision floating-point components would be
typedef struct {
    double x;
    double y;
    double z;
} Value;

typedef struct {
    Counter counter;
    Value   value;
} Field;
where the modifying thread uses e.g.
void set_field(Field *const field, const Value *const value)
{
    counter_acquire(&field->counter);
    field->value = *value;
    counter_release(&field->counter);
}
and each reader maintains its own local snapshot using e.g.
typedef enum {
    UPDATED = 0,
    UNCHANGED = 1,
    BUSY = 2
} Updated;

Updated check_field(Field *const local, Field *const shared)
{
    Field cache;
    cache.counter.value[0] = counter_before(&shared->counter);
    /* Local counter check allows forcing an update by
     * simply changing one half of the local counter.
     * If you don't need that, omit the local-only part. */
    if (local->counter.value[0] == local->counter.value[1] &&
        cache.counter.value[0] == local->counter.value[0])
        return UNCHANGED;
    cache.value = shared->value;
    cache.counter.value[1] = counter_after(&shared->counter);
    if (cache.counter.value[0] != cache.counter.value[1])
        return BUSY;
    *local = cache;
    return UPDATED;
}
Note that the cache above constitutes one local copy of the value, so each reader only needs to maintain one copy of the fields it is interested in. Also, the above accessor function only modifies local with a successful snapshot.
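For instance, a reader could poll like this (shared_field and handle_new_vector are hypothetical names; Field, Value, and check_field are the definitions above):
void handle_new_vector(const Value *v);   /* hypothetical consumer */

static Field shared_field;                /* written by the single writer */

void reader_poll(void)
{
    static Field local = { COUNTER_INITIALIZER, { 0.0, 0.0, 0.0 } };
    switch (check_field(&local, &shared_field)) {
    case UPDATED:
        /* local.value now holds a new, consistent snapshot. */
        handle_new_vector(&local.value);
        break;
    case BUSY:
        /* Writer was mid-update; just try again on the next poll. */
        break;
    case UNCHANGED:
    default:
        break;
    }
}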
When using pthread locking, pthread_rwlock_t is a good option, as long as the programmer understands the priority issues with multiple readers and writers. (see man pthread_rwlock_rdlock and man pthread_rwlock_wrlock for details.)
Personally, I would just stick to mutexes with as short holding times as possible, to reduce overall complexity. The Field structure for a mutex-protected change-counted value I'd use would probably be something like
typedef struct {
    pthread_mutex_t mutex;
    Value value;
    volatile unsigned int change;
} Field;

void set_field(Field *const field, const Value *const value)
{
    pthread_mutex_lock(&field->mutex);
    field->value = *value;
#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
    __atomic_fetch_add(&field->change, 1U, __ATOMIC_SEQ_CST);
#else
    __sync_fetch_and_add(&field->change, 1U);
#endif
    pthread_mutex_unlock(&field->mutex);
}

void get_field(Field *const local, Field *const field)
{
    pthread_mutex_lock(&field->mutex);
    local->value = field->value;
#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
    local->change = __atomic_load_n(&field->change, __ATOMIC_SEQ_CST);
#else
    local->change = __sync_fetch_and_add(&field->change, 0U);
#endif
    pthread_mutex_unlock(&field->mutex);
}

Updated check_field(Field *const local, Field *const shared)
{
#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
    if (local->change == __atomic_load_n(&shared->change, __ATOMIC_SEQ_CST))
        return UNCHANGED;
#else
    if (local->change == __sync_fetch_and_add(&shared->change, 0U))
        return UNCHANGED;
#endif
    pthread_mutex_lock(&shared->mutex);
    local->value = shared->value;
#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
    local->change = __atomic_load_n(&shared->change, __ATOMIC_SEQ_CST);
#else
    local->change = __sync_fetch_and_add(&shared->change, 0U);
#endif
    pthread_mutex_unlock(&shared->mutex);
    return UPDATED;
}
Note that the mutex field in the local copy is unused, and can be omitted from the local copy (defining a suitable Cache etc. structure type without it).
The semantics with respect to the change field are simple: It is always accessed atomically, and can therefore be read without holding the mutex. It is only modified while holding the mutex (but because readers do not hold the mutex, it must be modified atomically even then).
Note that if your server is restricted to x86 or x86-64 architectures, you can use ordinary accesses to change above (e.g. field->change++), as unsigned int access is atomic on these architectures; no need to use built-in functions. (Although increment is non-atomic, readers will only ever see the old or the new unsigned int, so that's not a problem either.)
On the server, you'll need some kind of "agent" process or thread, acting on behalf of the remote clients, sending field changes to each client, and optionally receiving field contents for update from each client.
Efficient implementation of such an agent is a big question. A properly thought-out recommendation would require more detailed information on the system. The simplest possible implementation, I guess, would be a low-priority thread simply scanning the shared state continuously, reporting any field changes not known to each client as it sees them. If change rate is high -- say, usually more than a dozen changes per second over the entire state -- then that may be quite acceptable. If change rate varies, adding a global change counter (modified whenever a single field is modified, i.e. in set_field()) and sleeping (or at least yielding) when no global change is observed, is better.
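For illustration, the simple scanning agent described above might look roughly like this (struct client, NUM_FIELDS, shared_state[], client->known[], client->connected and send_field() are hypothetical stand-ins for your own bookkeeping and I/O code; sched_yield() is from <sched.h>):
static void agent_loop(struct client *const client)
{
    while (client->connected) {
        unsigned int sent = 0;
        for (size_t i = 0; i < NUM_FIELDS; i++) {
            /* Compare the client's known copy against the shared state. */
            if (check_field(&client->known[i], &shared_state[i]) == UPDATED) {
                send_field(client, i, &client->known[i].value);
                sent++;
            }
        }
        if (!sent)
            sched_yield();  /* or nanosleep(); nothing changed on this pass */
    }
}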
If there are multiple remote clients, I would use a single agent, which collects changes into a queue, with a cached copy of the state as seen at the head of the queue (with previous changes being those that lead up to it). Initially connected clients get the entire state, then each queued change following that. Note that the queue entries do not need to represent fields as such, but the messages sent to each client to update that field, and that if a latter update replaces an already queued value, the already queued value can be removed, reducing the amount of data that needs to be sent to clients that are catching up. This is an interesting topic on its own, and would warrant a question of its own, in my opinion.
Like I mentioned above, I would personally use a plain TCP socket between the user interface and the service.
There are four main issues with respect to the TCP messages:
Byte order and data formats (unless both server and remote clients are guaranteed to use the same hardware architecture)
My preferred solution is to use native byte order (but well-defined standard types) when sending, with recipient doing any byte order conversions. This requires that each end sends an initial message with predetermined "prototype values" for each integer and floating-point type used.
I assume integers use two's complement format with no padding bits, and double and float are double- and single-precision IEEE-754 types, respectively. Most current hardware architectures have these natively (although byte ordering does vary). Weird architectures can and do emulate these in software, typically by supplying suitable command-line switches or using some libraries.
Message framing (as TCP is a stream of bytes, not a stream of packets, from the application perspective)
The most robust option is to prefix each message with a fixed-size type field and a fixed-size length field. (The length field can either count the additional payload bytes or, as in the example frames later in this answer, the whole frame; either way it should include any padding bytes the sender added.)
The recipient buffers TCP input until it has at least the fixed type and length fields, so it can determine the complete message length, and then keeps receiving until a full message is buffered. A minimal receive-buffer implementation might look like the sketch below.
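For instance (a sketch only; it reuses the input_data/input_size/input_head/input_tail buffer variables and the consume() function shown near the end of this answer, and assumes a connected TCP socket fd; needs <sys/socket.h> and <errno.h>):
static int receive_more(const int fd)
{
    ssize_t n;
    /* Append whatever arrived to the tail of the input buffer.
     * Growing a too-small buffer and other error handling are omitted. */
    do {
        n = recv(fd, input_tail,
                 (size_t)(input_data + input_size - input_tail), 0);
    } while (n == (ssize_t)-1 && errno == EINTR);
    if (n <= 0)
        return -1;          /* connection closed, or an error occurred */
    input_tail += n;
    consume();              /* parse any complete messages now buffered */
    return 0;
}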
Extensibility
Backwards compatibility is essential for making updates and upgrades easier. This means you'll probably need to include some kind of version number in the initial message from server to client, and have both server and client ignore messages they do not know. (This obviously requires a length field in each message, as recipients cannot infer the length of messages they do not recognize.)
In some cases you might need the ability to mark a message important, in the sense that if the recipient does not recognize it, further communications should be aborted. This is easiest to achieve by using some kind of identifier or bit in the type field. For example, PNG file format is very similar to the message format I've described here, with a four-byte (four ASCII character) type field, and a four-byte (32-bit) length field for each message ("chunk"). If the first type character is uppercase ASCII, it means it is a critical one.
Passing initial state
Passing the initial state as one single chunk would tie the protocol to the exact layout of the shared state structure. Instead, I recommend passing each field or record separately, just as if they were updates. Optionally, when all the initial fields have been sent, you can send a message that informs the client it now has the entire state; this lets the client first receive the complete state, then construct and render the user interface, without having to adjust dynamically to a varying number of fields.
Whether each client requires the full state, or just a smaller subset of the state, is another question. When a remote client connects to a server, it probably should include some kind of identifier the server can use to make that determination.
Here is messages.h, a header file with inline implementation of integer and floating-point type handling and byte order detection:
#ifndef MESSAGES_H
#define MESSAGES_H
#include <stdint.h>
#include <string.h>
#include <errno.h>
#if defined(__GNUC__)
static const struct __attribute__((packed)) {
#else
#pragma pack(push,1)
static const struct {
#endif
const uint64_t u64; /* Offset +0 */
const int64_t i64; /* +8 */
const double dbl; /* +16 */
const uint32_t u32; /* +24 */
const int32_t i32; /* +28 */
const float flt; /* +32 */
const uint16_t u16; /* +36 */
const int16_t i16; /* +38 */
} prototypes = {
18364758544493064720ULL, /* u64 */
-1311768465156298103LL, /* i64 */
0.71948481353325643983254167324048466980457305908203125,
3735928559UL, /* u32 */
-195951326L, /* i32 */
1.06622731685638427734375f, /* flt */
51966U, /* u16 */
-7658 /* i16 */
};
#if !defined(__GNUC__)
#pragma pack(pop)
#endif
/* Add prototype section to a buffer.
*/
size_t add_prototypes(unsigned char *const data, const size_t size)
{
if (size < sizeof prototypes) {
errno = ENOSPC;
return 0;
} else {
memcpy(data, &prototypes, sizeof prototypes);
return sizeof prototypes;
}
}
/*
* Byte order manipulation functions.
*/
static void reorder64(void *const dst, const void *const src, const int order)
{
if (order) {
uint64_t value;
memcpy(&value, src, 8);
if (order & 1)
value = ((value >> 8U) & 0x00FF00FF00FF00FFULL)
| ((value & 0x00FF00FF00FF00FFULL) << 8U);
if (order & 2)
value = ((value >> 16U) & 0x0000FFFF0000FFFFULL)
| ((value & 0x0000FFFF0000FFFFULL) << 16U);
if (order & 4)
value = ((value >> 32U) & 0x00000000FFFFFFFFULL)
| ((value & 0x00000000FFFFFFFFULL) << 32U);
memcpy(dst, &value, 8);
} else
if (dst != src)
memmove(dst, src, 8);
}
static void reorder32(void *const dst, const void *const src, const int order)
{
if (order) {
uint32_t value;
memcpy(&value, src, 4);
if (order & 1)
value = ((value >> 8U) & 0x00FF00FFUL)
| ((value & 0x00FF00FFUL) << 8U);
if (order & 2)
value = ((value >> 16U) & 0x0000FFFFUL)
| ((value & 0x0000FFFFUL) << 16U);
memcpy(dst, &value, 4);
} else
if (dst != src)
memmove(dst, src, 4);
}
static void reorder16(void *const dst, const void *const src, const int order)
{
if (order & 1) {
const unsigned char c[2] = { ((const unsigned char *)src)[0],
((const unsigned char *)src)[1] };
((unsigned char *)dst)[0] = c[1];
((unsigned char *)dst)[1] = c[0];
} else
if (dst != src)
memmove(dst, src, 2);
}
/* Detect byte order conversions needed.
*
* If the prototypes cannot be supported, returns -1.
*
* If prototype record uses native types, returns 0.
*
* Otherwise, bits 0..2 are the integer order conversion,
* and bits 3..5 are the floating-point order conversion.
* If 'order' is the return value, use
* reorderXX(local, remote, order)
* for integers, and
* reorderXX(local, remote, order/8)
* for floating-point types.
*
* For adjusting records sent to server, just do the same,
* but with order obtained by calling this function with
* parameters swapped.
*/
int detect_order(const void *const native, const void *const other)
{
const unsigned char *const source = other;
const unsigned char *const target = native;
unsigned char temp[8];
int iorder = 0;
int forder = 0;
/* Verify the size of the types.
* C89/C99/C11 says sizeof (char) == 1, but check that too. */
if (sizeof (double) != 8 ||
sizeof (int64_t) != 8 ||
sizeof (uint64_t) != 8 ||
sizeof (float) != 4 ||
sizeof (int32_t) != 4 ||
sizeof (uint32_t) != 4 ||
sizeof (int16_t) != 2 ||
sizeof (uint16_t) != 2 ||
sizeof (unsigned char) != 1 ||
sizeof prototypes != 40)
return -1;
/* Find byte order for the largest floating-point type. */
while (1) {
reorder64(temp, source + 16, forder);
if (!memcmp(temp, target + 16, 8))
break;
if (++forder >= 8)
return -1;
}
/* Verify forder works for all other floating-point types. */
reorder32(temp, source + 32, forder);
if (memcmp(temp, target + 32, 4))
return -1;
/* Find byte order for the largest integer type. */
while (1) {
reorder64(temp, source + 0, iorder);
if (!memcmp(temp, target + 0, 8))
break;
if (++iorder >= 8)
return -1;
}
/* Verify iorder works for all other integer types. */
reorder64(temp, source + 8, iorder);
if (memcmp(temp, target + 8, 8))
return -1;
reorder32(temp, source + 24, iorder);
if (memcmp(temp, target + 24, 4))
return -1;
reorder32(temp, source + 28, iorder);
if (memcmp(temp, target + 28, 4))
return -1;
reorder16(temp, source + 36, iorder);
if (memcmp(temp, target + 36, 2))
return -1;
reorder16(temp, source + 38, iorder);
if (memcmp(temp, target + 38, 2))
return -1;
/* Everything works. */
return iorder + 8 * forder;
}
/* Verify current architecture is supportable.
* This could be a compile-time check.
*
* (The buffer contains prototypes for network byte order,
* and actually returns the order needed to convert from
* native to network byte order.)
*
* Returns -1 if architecture is not supported,
* a nonnegative (0 or positive) value if successful.
*/
static int verify_architecture(void)
{
static const unsigned char network_endian[40] = {
254U, 220U, 186U, 152U, 118U, 84U, 50U, 16U, /* u64 */
237U, 203U, 169U, 135U, 238U, 204U, 170U, 137U, /* i64 */
63U, 231U, 6U, 5U, 4U, 3U, 2U, 1U, /* dbl */
222U, 173U, 190U, 239U, /* u32 */
244U, 82U, 5U, 34U, /* i32 */
63U, 136U, 122U, 35U, /* flt */
202U, 254U, /* u16 */
226U, 22U, /* i16 */
};
return detect_order(&prototypes, network_endian);
}
#endif /* MESSAGES_H */
Here is an example function that a sender can use to pack a 3-component vector identified by a 32-bit unsigned integer, into a 36-byte message. This uses a message frame starting with four bytes ("Vec3"), followed by the 32-bit frame length, followed by the 32-bit identifier, followed by three doubles:
size_t add_vector(unsigned char *const data, const size_t size,
                  const uint32_t id,
                  const double x, const double y, const double z)
{
    const uint32_t length = 4 + 4 + 4 + 8 + 8 + 8;
    if (size < (size_t)length) {
        errno = ENOSPC;
        return 0;
    }
    /* Message identifier, four bytes */
    data[0] = 'V';
    data[1] = 'e';
    data[2] = 'c';
    data[3] = '3';
    /* Length field, uint32_t */
    memcpy(data + 4, &length, 4);
    /* Identifier, uint32_t */
    memcpy(data + 8, &id, 4);
    /* Vector components */
    memcpy(data + 12, &x, 8);
    memcpy(data + 20, &y, 8);
    memcpy(data + 28, &z, 8);
    return length;
}
Personally, I prefer shorter, 16-bit identifiers and 16-bit length; that also limits maximum message length to 65536 bytes, making 65536 bytes a good read/write buffer size.
The recipient can handle the received TCP data stream for example thus:
static unsigned char *input_data;  /* Data buffer */
static size_t         input_size;  /* Data buffer size */
static unsigned char *input_head;  /* Next byte in buffer */
static unsigned char *input_tail;  /* After last buffered byte */
static int            input_order; /* From detect_order() */

/* Function to handle "Vec3" messages: */
static void handle_vec3(unsigned char *const msg)
{
    uint32_t id;
    double x, y, z;
    reorder32(&id, msg + 8, input_order);
    reorder64(&x, msg + 12, input_order / 8);
    reorder64(&y, msg + 20, input_order / 8);
    reorder64(&z, msg + 28, input_order / 8);
    /* Do something with vector id, x, y, z. */
}

/* Function that tries to consume thus far buffered
 * input data -- typically run once after each
 * successful TCP receive. */
static void consume(void)
{
    while (input_head + 8 <= input_tail) {
        uint32_t length;
        /* Get current packet length. */
        reorder32(&length, input_head + 4, input_order);
        if (input_head + length > input_tail)
            break; /* Not fully received, yet. */
        /* We have a full packet! */
        /* Handle "Vec3" packet: */
        if (input_head[0] == 'V' &&
            input_head[1] == 'e' &&
            input_head[2] == 'c' &&
            input_head[3] == '3')
            handle_vec3(input_head);
        /* Advance to next packet. */
        input_head += length;
    }
    if (input_head < input_tail) {
        /* Move partial message to start of buffer. */
        if (input_head > input_data) {
            const size_t have = input_tail - input_head;
            memmove(input_data, input_head, have);
            input_head = input_data;
            input_tail = input_data + have;
        }
    } else {
        /* Buffer is empty. */
        input_head = input_data;
        input_tail = input_data;
    }
}
Questions?
