I need a ring buffer (in C) which can hold objects of any type at run time (mostly the data will be different signals' values, such as current (100 ms and 10 ms) and temperature), and I am not sure whether it has to be a fixed size or not. It needs to be very high performance, and it runs in a multi-tasking embedded environment.
I need this buffer as a backup: the embedded software will work as normal and save the data into the ring buffer, so that if an error occurs, I have a reference for the measured values and can look at them to determine the problem. I also need to time-stamp the entries, i.e. every data item (signal value) stored in the ring buffer is stored together with its measurement time.
Any code or ideas would be greatly appreciated. Some of the operations required are:
create a ring buffer with a specific size.
link it into the rest of the software.
put at the tail.
get from the head.
on error, read the data and when it happened (time stamp).
return the count.
overwrite when the buffer is full.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memcpy, memset */
typedef struct ring_buffer
{
    void *buffer;      // data buffer
    void *buffer_end;  // end of data buffer
    void *data_start;  // write position ("head")
    void *data_end;    // read position ("tail"), i.e. the oldest item
    uint64_t capacity; // maximum number of items in the buffer
    uint64_t count;    // current number of items in the buffer
    uint64_t size;     // size of each item in the buffer
} ring_buffer;
void rb_init(ring_buffer *rb, uint64_t size, uint64_t capacity)
{
    rb->buffer = malloc(capacity * size);
    if (rb->buffer == NULL) {
        rb->capacity = 0; // handle the allocation failure as appropriate;
        return;           // without braces here, the next statement would
    }                     // silently become the body of the if
    rb->buffer_end = (char *)rb->buffer + capacity * size;
    rb->capacity = capacity;
    rb->count = 0;
    rb->size = size;
    rb->data_start = rb->buffer;
    rb->data_end = rb->buffer;
}
void rb_free(ring_buffer *rb)
{
    free(rb->buffer);
    // clear out the other fields too, just to be safe
    memset(rb, 0, sizeof(*rb));
}
void rb_push_back(ring_buffer *rb, const void *item)
{
    if (rb->count == rb->capacity) {
        // buffer full: overwrite the oldest item by advancing the read position
        rb->data_end = (char *)rb->data_end + rb->size;
        if (rb->data_end == rb->buffer_end)
            rb->data_end = rb->buffer;
        rb->count--;
    }
    memcpy(rb->data_start, item, rb->size);
    rb->data_start = (char *)rb->data_start + rb->size;
    if (rb->data_start == rb->buffer_end)
        rb->data_start = rb->buffer;
    rb->count++;
}
void rb_pop_front(ring_buffer *rb, void *item)
{
    if (rb->count == 0) {
        return; // handle error: buffer empty, nothing to pop
    }
    memcpy(item, rb->data_end, rb->size);
    rb->data_end = (char *)rb->data_end + rb->size;
    if (rb->data_end == rb->buffer_end)
        rb->data_end = rb->buffer;
    rb->count--;
}
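For illustration, here is a minimal usage sketch (the element type, the capacity of 16, and the values are arbitrary):
int main(void)
{
    ring_buffer rb;
    int sample = 42, out = 0;

    rb_init(&rb, sizeof(int), 16); // 16 int-sized slots
    rb_push_back(&rb, &sample);    // copy one item in
    rb_pop_front(&rb, &out);       // copy the oldest item out
    printf("%d (count=%llu)\n", out, (unsigned long long)rb.count);
    rb_free(&rb);
    return 0;
}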
Creating a ring buffer/FIFO that stores hard copies of a generic type is highly questionable design for embedded systems. You shouldn't need that level of abstraction for code so close to the hardware.
Either you make a ring buffer with a data type tag (such as an enum) plus a void* to data allocated elsewhere, or you make a ring buffer where all data is of the same type. Everything else is most likely confused program design (an "XY problem").
You need some means of locking access to the ring buffer internally, to make it thread-safe/interrupt-safe. This, as well as the time stamp, has to be handled internally by the ring buffer ADT.
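For instance, with the "all data of the same type" option, the question's requirements could be met with a fixed-size record along these lines (the field names and widths are only an illustration):
typedef enum {
    SIG_CURRENT_10MS,
    SIG_CURRENT_100MS,
    SIG_TEMPERATURE
} signal_id;

typedef struct {
    uint32_t  timestamp; /* measurement time, e.g. a millisecond tick counter */
    signal_id id;        /* which signal this sample belongs to */
    int32_t   value;     /* raw or scaled sample value */
} log_entry;
Every entry then has the same size, so the generic buffer above can store them with size = sizeof(log_entry), and the time stamp travels with each sample.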
I was trying to build an MPSC lock-free ring buffer for learning purposes, and I am running into race conditions.
A description of the MPSC ring buffer:
It is guaranteed that poll() is never called when the buffer is empty.
Instead of mod'ing head and tail like a traditional ring buffer, it lets them proceed linearly and ANDs them before using them (since the buffer size is a power of 2, this works fine even with overflow).
We keep MAX_PRODUCERS-1 slots open in the queue so that if multiple producers come and see one slot is available and proceed, they can all place their entries.
It uses 32-bit quantities for head and tail, so that it can snapshot them with a 64-bit atomic read without a lock.
My test involves a couple of threads writing some known set of values to the queue, and a consumer thread polling (when the buffer is not empty) and summing all, and verifying the correct result is obtained. With 2 or more producers, I get inconsistent sums (and with 1 producer, it works).
Any help would be much appreciated. Thank you!
Here is the code (RING_BUF_SIZE, a power of two, and MAX_PRODUCERS are defined elsewhere):
struct ring_buf_entry {
uint32_t seqn;
};
struct __attribute__((packed, aligned(8))) ring_buf {
union {
struct {
volatile uint32_t tail;
volatile uint32_t head;
};
volatile uint64_t snapshot;
};
volatile struct ring_buf_entry buf[RING_BUF_SIZE];
};
#define RING_SUB(x,y) ((x)>=(y)?((x)-(y)):((x)+(1ULL<<32)-(y)))
static void ring_buf_push(struct ring_buf* rb, uint32_t seqn)
{
size_t pos;
while (1) {
// rely on aligned, packed, and no member-reordering properties
uint64_t snapshot = __atomic_load_n(&(rb->snapshot), __ATOMIC_SEQ_CST);
// little endian.
uint64_t snap_head = snapshot >> 32;
uint64_t snap_tail = snapshot & 0xffffffffULL;
if (RING_SUB(snap_tail, snap_head) < RING_BUF_SIZE - MAX_PRODUCERS + 1) {
uint32_t exp = snap_tail;
if (__atomic_compare_exchange_n(&(rb->tail), &exp, snap_tail+1, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
pos = snap_tail;
break;
}
}
asm volatile("pause\n": : :"memory");
}
pos &= RING_BUF_SIZE-1;
rb->buf[pos].seqn = seqn;
asm volatile("sfence\n": : :"memory");
}
static struct ring_buf_entry ring_buf_poll(struct ring_buf* rb)
{
struct ring_buf_entry ret = rb->buf[__atomic_load_n(&(rb->head), __ATOMIC_SEQ_CST) & (RING_BUF_SIZE-1)];
__atomic_add_fetch(&(rb->head), 1, __ATOMIC_SEQ_CST);
return ret;
}
In device drivers, how can we tell which data is shared among processes and which is local to a process? The Linux Device Drivers book mentions:
Any time that a hardware or software resource is shared beyond a single thread of execution, and the possibility exists that one thread could encounter an inconsistent view of that resource, you must explicitly manage access to that resource.
But what kinds of software resources can be shared among threads, and what kinds of data cannot be? I know that global variables are generally considered shared memory, but what other kinds of things need to be protected?
For example, are the struct inode and struct file arguments passed to file operations like open, release, read, write, etc. considered shared?
In the open call inside main.c, why is dev (in the line dev = container_of(inode->i_cdev, struct scull_dev, cdev);) not protected with a lock, if it points to a struct scull_dev entry in the global array scull_devices?
In scull_write, why isn't the line int quantum = dev->quantum, qset = dev->qset; protected by the semaphore, since it accesses a global variable?
/* In scull.h */
struct scull_qset {
void **data; /* pointer to an array of pointers which each point to a quantum buffer */
struct scull_qset *next;
};
struct scull_dev {
struct scull_qset *data; /* Pointer to first quantum set */
int quantum; /* the current quantum size */
int qset; /* the current array size */
unsigned long size; /* amount of data stored here */
unsigned int access_key; /* used by sculluid and scullpriv */
struct semaphore sem; /* mutual exclusion semaphore */
struct cdev cdev; /* Char device structure */
};
/* In main.c */
struct scull_dev *scull_devices; /* allocated in scull_init_module */
int scull_major = SCULL_MAJOR;
int scull_minor = 0;
int scull_nr_devs = SCULL_NR_DEVS;
int scull_quantum = SCULL_QUANTUM;
int scull_qset = SCULL_QSET;
ssize_t scull_write(struct file *filp, const char __user *buf, size_t count,
loff_t *f_pos)
{
struct scull_dev *dev = filp->private_data; /* flip->private_data assigned in scull_open */
struct scull_qset *dptr;
int quantum = dev->quantum, qset = dev->qset;
int itemsize = quantum * qset;
int item; /* item in linked list */
int s_pos; /* position in qset data array */
int q_pos; /* position in quantum */
int rest;
ssize_t retval = -ENOMEM; /* value used in "goto out" statements */
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
/* find listitem, qset index and offset in the quantum */
item = (long)*f_pos / itemsize;
rest = (long)*f_pos % itemsize;
s_pos = rest / quantum;
q_pos = rest % quantum;
/* follow the list up to the right position */
dptr = scull_follow(dev, item);
if (dptr == NULL)
goto out;
if (!dptr->data) {
dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
if (!dptr->data)
goto out;
memset(dptr->data, 0, qset * sizeof(char *));
}
if (!dptr->data[s_pos]) {
dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
if (!dptr->data[s_pos])
goto out;
}
/* write only up to the end of this quantum */
if (count > quantum - q_pos)
count = quantum - q_pos;
if (copy_from_user(dptr->data[s_pos]+q_pos, buf, count)) {
retval = -EFAULT;
goto out;
}
*f_pos += count;
retval = count;
/* update the size */
if (dev->size < *f_pos)
dev->size = *f_pos;
out:
up(&dev->sem);
return retval;
}
int scull_open(struct inode *inode, struct file *filp)
{
struct scull_dev *dev; /* device information */
/* Question: Why was the lock not placed here? */
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
filp->private_data = dev; /* for other methods */
/* now trim to 0 the length of the device if open was write-only */
if ( (filp->f_flags & O_ACCMODE) == O_WRONLY) {
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
scull_trim(dev); /* ignore errors */
up(&dev->sem);
}
return 0; /* success */
}
int scull_init_module(void)
{
int result, i;
dev_t dev = 0;
/* assigns major and minor numbers (left out for brevity) */
/*
* allocate the devices -- we can't have them static, as the number
* can be specified at load time
*/
scull_devices = kmalloc(scull_nr_devs * sizeof(struct scull_dev), GFP_KERNEL);
if (!scull_devices) {
result = -ENOMEM;
goto fail; /* isn't this redundant? */
}
memset(scull_devices, 0, scull_nr_devs * sizeof(struct scull_dev));
/* Initialize each device. */
for (i = 0; i < scull_nr_devs; i++) {
scull_devices[i].quantum = scull_quantum;
scull_devices[i].qset = scull_qset;
init_MUTEX(&scull_devices[i].sem);
scull_setup_cdev(&scull_devices[i], i);
}
/* some other stuff (left out for brevity) */
return 0; /* succeed */
fail:
scull_cleanup_module(); /* left out for brevity */
return result;
}
/*
* Set up the char_dev structure for this device.
*/
static void scull_setup_cdev(struct scull_dev *dev, int index)
{
int err, devno = MKDEV(scull_major, scull_minor + index);
cdev_init(&dev->cdev, &scull_fops);
dev->cdev.owner = THIS_MODULE;
dev->cdev.ops = &scull_fops; /* isn't this redundant? */
err = cdev_add (&dev->cdev, devno, 1);
/* Fail gracefully if need be */
if (err)
printk(KERN_NOTICE "Error %d adding scull%d", err, index);
}
All data in memory can be considered a "shared resource" if both threads are able to access it*. The only storage that is not shared between threads is the data in registers, which is abstracted away in C.
There are two reasons why, in practice, you would not consider a resource to be shared (neither means that two threads could not theoretically access it; some nightmarish code can bypass either):
1. Only one thread accesses it. Clearly, if only one thread accesses a variable, there can be no race condition. This is why local variables and single-threaded programs do not need locking mechanisms.
2. The value is constant. You can't get different results based on the order of access if the value can never change.
The program you have shown here is incomplete, so it is hard to say, but every variable accessed without locking must meet one of these criteria for the program to be thread-safe.
There are some non-obvious ways to meet the criteria, such as a variable being constant or confined to a single thread only in a specific context.
You gave two examples of lines that were not locked. For the first:
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
This line does not actually access any shared data; it just computes where the struct containing cdev must be located. There can be no race condition on the pointers themselves, because nobody else has access to them (though other threads do have access to what they point to); they are only accessible within the function. This meets criterion (1).
The other example is:
int quantum = dev->quantum, qset = dev->qset;
This one is harder to judge without context, but my best guess is that dev->quantum and dev->qset are assumed never to change during the function call. This seems supported by the fact that they are only assigned in scull_init_module, which should run exactly once at the very beginning. I believe this fits criterion (2).
This brings up another way you might change a shared variable without locking: when you know that no other thread will try to access it until you are done, for some other reason (e.g. the other threads don't exist yet).
In short, all memory is shared, but sometimes you can get away with acting like it's not.
*There are embedded systems where each processor has some amount of RAM that only it can use, but this is not the typical case.
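As a toy illustration of those two criteria (this example is made up, not taken from scull):
#include <pthread.h>
#include <stdio.h>

static const int scale = 10; /* criterion (2): constant, so no lock is needed */
static int total = 0;        /* shared and mutable: needs the mutex */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int local = (int)(long)arg * scale; /* criterion (1): local, visible to one thread */
    pthread_mutex_lock(&lock);
    total += local;                     /* the only access that must be serialized */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%d\n", total); /* always 30, regardless of scheduling */
    return 0;
}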
I am working on an embedded platform which does not have debugging features, so it is hard to tell what the source of the error is.
I have defined in a header file:
typedef struct cm_packet {
CM_Header Header; //header of packet 3 bytes
uint8_t *Data; //packet data 64 bytes
CM_Footer Footer; //footer of packet 3 bytes
} CM_Packet;
typedef struct cm_inittypedef{
uint8_t DeviceId;
CM_Packet Packet;
} CM_InitTypeDef;
extern CM_InitTypeDef cmHandler;
void CM_Init(CM_InitTypeDef *handler);
CM_AppendResult CM_AppendData(CM_InitTypeDef *handler, uint8_t identifier
, uint8_t *data, uint8_t length);
And somewhere in the implementation I have:
uint8_t bufferIndex = 0;
void CM_Init(CM_InitTypeDef *cm_initer) { //init a handler
cmHandler.DeviceId = cm_initer->DeviceId;
CM_Packet cmPacket;
cmPacket.Header.DeviceId = cm_initer->DeviceId;
cmPacket.Header.PacketStart = CM_START;
cmPacket.Footer.PacketEnd = CM_END;
//initialize data array
uint8_t emptyBuffer[CM_MAX_DATA_SIZE] = {0x00};
cmPacket.Data = emptyBuffer;
cm_initer->Packet = cmPacket;
}
CM_AppendResult CM_AppendData(CM_InitTypeDef *handler, uint8_t identifier
, uint8_t *data, uint8_t length){
//some check to see if new data does not make Data overflow
uint8_t i;
/*** ERROR HAPPENS HERE!!!! ***/
handler->Packet.Data[bufferIndex++] = identifier;
//now add the data itself
for(i = 0; i < length; i++) {
handler->Packet.Data[bufferIndex++] = data[i];
}
//reset indexer
if(bufferIndex > 64) {
    PacketReady(); //mark packet as ready
    bufferIndex = 0;
}
//return result
}
The idea is to update Packet.Data from other source files which have access to the handler. For example, other code can call that append function to change Packet.Data. But as you see in the code, I have commented the line which causes the microcontroller to go into hard fault mode. I am not sure what is happening here. All I know is that exactly at that line the micro goes into hard fault mode and never recovers!
This might be a race condition, but before anything else I want to know that I have written correct C code; then I can try to rule out other problems.
In function CM_Init, you are setting cmPacket.Data to point to a local array:
uint8_t emptyBuffer[CM_MAX_DATA_SIZE] = {0x00};
cmPacket.Data = emptyBuffer;
Accessing this memory address outside the scope of the function yields undefined behavior.
As @barak manos mentioned, the buffer assigned to Data is allocated on the stack.
When you get to CM_AppendData, you are writing over memory that is no longer dedicated to the buffer.
You may want to use malloc so that the buffer is allocated on the heap instead of on the stack. Just remember to call free so that you are not leaking memory.
If you can't use dynamic allocation, it's possible to dedicate some scratch memory for all the Data uses; it just needs to be static.
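For instance, a minimal sketch of those two options (reusing CM_MAX_DATA_SIZE from the question; cmDataBuffer is an invented name), either of which replaces the two emptyBuffer lines inside CM_Init:
/* Option 1: heap allocation; remember to free() it when done */
cmPacket.Data = malloc(CM_MAX_DATA_SIZE);

/* Option 2: a static scratch buffer, if dynamic allocation is not allowed */
static uint8_t cmDataBuffer[CM_MAX_DATA_SIZE]; /* valid for the program's lifetime */
memset(cmDataBuffer, 0x00, sizeof(cmDataBuffer));
cmPacket.Data = cmDataBuffer; /* no longer points at a dead stack frame */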
Hope that helps :)
Scatter-gather I/O - readv()/writev()/preadv()/pwritev() - reads/writes a variable number of iovec structs in a single system call, basically reading/writing each buffer sequentially from the 0th iovec to the Nth. However, according to the documentation, the readv/writev calls can also return less than was requested. I was wondering if there is a standard/best-practice/elegant way to handle that situation.
If we are just handling a bunch of character buffers or similar, this isn't a big deal. But one of the niceties is using scatter-gather for structs and/or discrete variables as the individual iovec items. How do you handle the situation where readv/writev reads/writes only a portion of a struct, or half of a long, or something like that?
Below is some contrived code of what I am getting at:
#include <sys/uio.h> /* writev() and struct iovec */

int fd; /* assume this has been opened elsewhere */
struct iovec iov[3];
long aLong = 74775767;
int aInt = 949;
char aBuff[100]; //filled from where ever
ssize_t bytesWritten = 0;
ssize_t bytesToWrite = 0;
iov[0].iov_base = &aLong;
iov[0].iov_len = sizeof(aLong);
bytesToWrite += iov[0].iov_len;
iov[1].iov_base = &aInt;
iov[1].iov_len = sizeof(aInt);
bytesToWrite += iov[1].iov_len;
iov[2].iov_base = aBuff; /* the array decays to a pointer */
iov[2].iov_len = sizeof(aBuff);
bytesToWrite += iov[2].iov_len;
bytesWritten = writev(fd, iov, 3);
if (bytesWritten == -1)
{
//handle error
}
if (bytesWritten < bytesToWrite)
{
    //how to gracefully continue?.........
}
Use a loop like the following to advance the partially-processed iov:
int cur = 0; /* index of the first iovec not yet fully written */
ssize_t written;

for (;;) {
    written = writev(fd, iov+cur, count-cur);
    if (written < 0) goto error;
    /* skip the iovecs that this call completed */
    while (cur < count && (size_t)written >= iov[cur].iov_len)
        written -= iov[cur++].iov_len;
    if (cur == count) break; /* everything has been written */
    /* adjust the partially-written iovec */
    iov[cur].iov_base = (char *)iov[cur].iov_base + written;
    iov[cur].iov_len -= written;
}
Note that if you don't check cur < count, you can read past the end of iov once everything has been written: written drops to zero, and the inner loop keeps going whenever the out-of-bounds iov_len happens to be zero as well.
AFAICS the vectored read/write functions behave the same with respect to short reads/writes as the normal ones. That is, you get back the number of bytes read/written, but this may well point into the middle of a struct, just as with read()/write(). There is no guarantee that the possible "interruption points" (for lack of a better term) coincide with the vector boundaries. So unfortunately the vectored I/O functions offer no more help for dealing with short reads/writes than the normal I/O functions do. In fact, it's more complicated, since you need to map the byte count to an I/O vector element and an offset within that element.
Also note that the idea of using vectored I/O for individual structs or data items might not work that well; the maximum allowed value of the iovcnt argument (IOV_MAX) is usually quite small, something like 1024 or so. So if your data is contiguous in memory, just pass it as a single element rather than artificially splitting it up.
Vectored write will write all the data you have provided with one call to the writev function, so bytesWritten will always be equal to the total number of bytes provided as input. That is my understanding, at least.
Please correct me if I am wrong.
Imagine having a stack of protocols and some C/C++ code that neatly covers sending on each layer. Each send function uses the layer below to add another header, until the whole message is eventually placed into a contiguous global buffer on layer 0:
void SendLayer2(void * payload, unsigned int payload_length)
{
    Layer2Header header; /* eat some stack */
    const int msg_length = sizeof(header) + payload_length;
    char msg[msg_length]; /* and have some more */
    memset(msg, 0, sizeof(msg));
    header.whatever = 42;
    memcpy(msg, &header, sizeof(header));
    memcpy(&msg[sizeof(header)], payload, payload_length);
    SendLayer1(msg, msg_length);
}
void SendLayer1(void * payload, unsigned int payload_length)
{
    Layer1Header header; /* eat some stack */
    const int msg_length = sizeof(header) + payload_length;
    char msg[msg_length]; /* and have some more */
    memset(msg, 0, sizeof(msg));
    header.whatever = 42;
    memcpy(msg, &header, sizeof(header));
    memcpy(&msg[sizeof(header)], payload, payload_length);
    SendLayer0(msg, msg_length);
}
Now the data is moved to some global variable and actually transferred:
char globalSendBuffer[MAX_MSG_SIZE];
void SendLayer0(void * payload, unsigned int payload_length)
{
// Some clever locking for the global buffer goes here
memcpy(globalSendBuffer, payload, payload_length);
SendDataViaCopper(globalSendBuffer, payload_length);
}
I'd like to reduce both the stack usage and the number of memcpy()s in this code, so I imagine something like:
void SendLayer2(void * payload, unsigned int payload_length)
{
Layer2Header * header = GetHeader2Pointer();
header->whatever = 42;
void * buffer = GetPayload2Pointer();
memcpy(buffer, payload, payload_length);
...
}
My idea would be to have something at the bottom that calculates the proper offsets for each layer's header and for the actual payload by continuously subtracting from MAX_MSG_SIZE, and then letting the upper-layer code fill in the global buffer directly from the end / right side.
Does this sound sensible? Are there alternative, perhaps more elegant approaches?
You may be interested in this article: "Network Buffers and Memory Management" by Alan Cox. Basically, you have the buffer and several pointers to different interesting parts of that buffer: protocol headers, data, and so on. Initially you reserve some space for headers by setting the data pointer to (buffer_start + max_headers_size), and each layer then gets a pointer nearer to the start of the buffer.
I'm sure there must be a similar description somewhere for BSD's mbufs.
EDIT:
David Miller (Linux networking maintainer) has this article "How SKBs work"
This sounds like "zero copy". I'm no expert, so search for that term and you'll find all sorts of references.
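To make the reserve-headroom idea concrete, here is a minimal sketch; MAX_MSG_SIZE is from the question, while MAX_HEADER_ROOM, msg_t, and the function names are invented for illustration:
#include <string.h>

#define MAX_HEADER_ROOM 64 /* assumed worst-case total header size */

typedef struct {
    unsigned char buf[MAX_MSG_SIZE];
    unsigned char *data; /* current start of the message within buf */
    unsigned int len;    /* bytes currently in the message */
} msg_t;

void msg_init(msg_t *m, const void *payload, unsigned int n)
{
    m->data = m->buf + MAX_HEADER_ROOM; /* reserve room for all headers */
    memcpy(m->data, payload, n);        /* the only payload copy */
    m->len = n;
}

/* Each layer prepends its header by moving the data pointer left. */
void msg_push_header(msg_t *m, const void *hdr, unsigned int hlen)
{
    m->data -= hlen;
    memcpy(m->data, hdr, hlen);
    m->len += hlen;
}
SendLayer2 would then call msg_push_header with its header, hand the same msg_t down to SendLayer1, and so on; layer 0 transmits m->data directly, so each byte is copied exactly once.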