Circular buffer using pointers in C

I have a queue structure that I attempted to implement using a circular buffer, and which I am using in a networking application. I am looking for some guidance and feedback. First, let me present the relevant code.
typedef struct nwk_packet_type
{
    uint8_t dest_address[NRF24_ADDR_LEN];
    uint8_t data[32];
    uint8_t data_len;
} nwk_packet_t;

/* The circular FIFO on which outgoing packets are stored */
nwk_packet_t nwk_send_queue[NWK_QUEUE_SIZE];
nwk_packet_t* send_queue_in;  /* pointer to queue head */
nwk_packet_t* send_queue_out; /* pointer to queue tail */

static nwk_packet_t* nwk_tx_pkt_allocate(void)
{
    /* Make sure the send queue is not full */
    if(send_queue_in == (send_queue_out - 1 + NWK_QUEUE_SIZE) % NWK_QUEUE_SIZE)
        return 0;

    /* return pointer to the next free slot and increment the tracker */
    return send_queue_in++; //TODO: it's not just ++, it has to be modular by packet size
}
/* External-facing function for the application layer to send network data.
   Simply adds the packet to the network queue if there is space and
   returns an appropriate error code if anything goes wrong. */
uint8_t nwk_send(uint8_t* address, uint8_t* data, uint8_t len)
{
    /* First check all the parameters */
    if(!address)
        return NWK_BAD_ADDRESS;
    if(!data)
        return NWK_BAD_DATA_PTR;
    if(!len || len > 32)
        return NWK_BAD_DATA_LEN;

    //TODO: PROBABLY NEED TO START BLOCKING HERE

    /* Allocate the packet on the queue */
    nwk_packet_t* packet;
    if(!( packet = nwk_tx_pkt_allocate() ))
        return NWK_QUEUE_FULL;

    /* Build the packet */
    memcpy(packet->dest_address, address, NRF24_ADDR_LEN);
    memcpy(packet->data, data, len);
    packet->data_len = len;

    //TODO: PROBABLY SAFE TO STOP BLOCKING HERE
    return NWK_SUCCESS;
}
/* Only called during NWK_IDLE; pushes the next item on the send queue
   out to the chip's "MAC" layer over SPI */
void nwk_transmit_pkt(void)
{
    nwk_packet_t tx_pkt = *send_queue_out; /* copy the packet off the queue */
    nrf24_send(tx_pkt.data, tx_pkt.data_len);
}
/* Callback for the transceiver interrupt when a sent packet has either
   completed or run out of retries */
void nwk_tx_result_cb(bool completed)
{
    if( (completed) && (nwk_tx_state == NWK_SENDING) )
        send_queue_out++; //TODO: it's not just ++, it has to be modular by packet size within the buffer
}
OK, now for a quick explanation and then my questions. The basic idea is that I've got this queue for data which is being sent onto the network. The function nwk_send() can be called from anywhere in application code, which, by the way, will run under a small pre-emptive task-based operating system (FreeRTOS), so the call can happen from lots of places in the code and be interrupted by the OS tick interrupt.
Now, since that function is modifying the pointers into the global queue, I know it needs to block while doing so. Am I correct in my comments in the code about where I should be blocking (i.e. disabling interrupts)? Also, would it be smarter to make a mutex using a global boolean variable or something, rather than just disabling interrupts?
Also, I think there's a second place I should be blocking, when things are taken off the queue, but I'm not sure exactly where that is. Is it in nwk_transmit_pkt(), where I'm actually copying the data off the queue and into a local RAM variable?
Final question: how do I achieve the modulus operation on my pointers within the arrays? I feel like it should look something like:
send_queue_in = ((send_queue_in + 1) % (NWK_QUEUE_SIZE*sizeof(nwk_packet_t))) + nwk_send_queue;
Any feedback is greatly appreciated, thank you.

About locking: it will be best to use an existing mutex primitive from the OS you use. I am not familiar with FreeRTOS, but it should have built-in primitives for locking between interrupt and user context.
For the circular buffer you may use these (treating send_queue_in and send_queue_out as array indices rather than pointers):

Check for empty queue:
    send_queue_in == send_queue_out
Check for full queue:
    (send_queue_in + 1) % NWK_QUEUE_SIZE == send_queue_out
Push element [pseudocode]:
    if (queue is full)
        return error;
    queue[send_queue_in] = new element;
    send_queue_in = (send_queue_in + 1) % NWK_QUEUE_SIZE;
Pop element [pseudocode]:
    if (queue is empty)
        return error;
    element = queue[send_queue_out];
    send_queue_out = (send_queue_out + 1) % NWK_QUEUE_SIZE;
It looks like you copy, and do not just reference, the packet data before sending. This means you only need to hold the lock until the copy is done.
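To make that concrete in C, here is a minimal sketch of the push side using indices, with a FreeRTOS critical section for the locking (taskENTER_CRITICAL()/taskEXIT_CRITICAL() are standard FreeRTOS macros; the queue, packet type and error codes are assumed from the question, and this is a sketch rather than a drop-in replacement):

#include "FreeRTOS.h"
#include "task.h"
#include <string.h>

static volatile uint8_t send_queue_in  = 0; /* index of next free slot */
static volatile uint8_t send_queue_out = 0; /* index of oldest element */

uint8_t nwk_send(uint8_t* address, uint8_t* data, uint8_t len)
{
    if(!address)         return NWK_BAD_ADDRESS;
    if(!data)            return NWK_BAD_DATA_PTR;
    if(!len || len > 32) return NWK_BAD_DATA_LEN;

    taskENTER_CRITICAL(); /* no task switch or interrupt while we touch the queue */

    if((send_queue_in + 1) % NWK_QUEUE_SIZE == send_queue_out)
    {
        taskEXIT_CRITICAL();
        return NWK_QUEUE_FULL;
    }

    nwk_packet_t* packet = &nwk_send_queue[send_queue_in];
    memcpy(packet->dest_address, address, NRF24_ADDR_LEN);
    memcpy(packet->data, data, len);
    packet->data_len = len;

    send_queue_in = (send_queue_in + 1) % NWK_QUEUE_SIZE; /* publish last */

    taskEXIT_CRITICAL();
    return NWK_SUCCESS;
}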

Without an overall driver framework to develop with, and when communicating with interrupt state on a uC, you need to be very careful.
You cannot use OS synchronization primitives to communicate with interrupt state. Attempting to do so will certainly crash your OS, because interrupt handlers cannot block.
Copying the actual bulk data should be avoided.
On an 8-bit uC, I suggest queueing an index into a buffer array pool, where the number of buffers is <256. That means only one byte needs to be queued, so with an appropriate queue implementation that stores the value before updating its internal byte-sized indexes, it is possible to safely communicate buffers into a tx handler without excessive interrupt-disabling.
Access to the pool array should be thread-safe, and insertion/deletion should be quick - I have 'succ/pred' byte fields in each buffer struct, forming a doubly-linked list, with access protected by a mutex. As well as I/O, I use this pool of buffers for all inter-thread comms.
For tx: get a buffer struct from the pool, fill it with data, and push its index onto a tx queue, disabling interrupts only long enough to determine whether the tx interrupt needs 'priming'. If priming is required, shove in a FIFO-full of data before re-enabling interrupts.
When the tx interrupt handler has sent the buffer, it can push the 'used' index back onto a 'scavenge' queue and signal a semaphore to make a handler thread run. That thread then takes the entry from the scavenge queue and returns it to the pool.
This scheme only works if interrupt handlers do not re-enable higher-priority interrupts that use the same buffering scheme.
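A rough sketch of the single-byte index queue described above (all names invented for illustration; it assumes fewer than 256 buffers so an index fits in one byte, that NUM_BUFS divides 256 so the free-running counters wrap cleanly, and that single-byte loads/stores are atomic on the target uC):

#include <stdint.h>
#include <stdbool.h>

#define NUM_BUFS 16u /* < 256: an index fits in one byte */

typedef struct
{
    uint8_t data[32];
    uint8_t len;
    uint8_t succ, pred; /* pool links forming the doubly-linked free list */
} buf_t;

static buf_t pool[NUM_BUFS];

/* Tx queue of buffer indices. The element is stored *before* the write
 * index is advanced, so a reader that sees the new index also sees
 * valid data - this is the "store the value first" rule above. */
static volatile uint8_t txq[NUM_BUFS];
static volatile uint8_t txq_in  = 0;
static volatile uint8_t txq_out = 0;

static bool txq_push(uint8_t idx) /* user context */
{
    if((uint8_t)(txq_in - txq_out) == NUM_BUFS)
        return false;             /* full */
    txq[txq_in % NUM_BUFS] = idx; /* store the value first... */
    txq_in++;                     /* ...then publish it       */
    return true;
}

static bool txq_pop(uint8_t* idx) /* tx interrupt handler */
{
    if(txq_out == txq_in)
        return false;             /* empty */
    *idx = txq[txq_out % NUM_BUFS];
    txq_out++;
    return true;
}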

Related

Using pthread to have one thread do all disk writing that other threads send it?

I am trying to create a program where multiple threads query for data that needs to be processed and then written to disk. Currently I am using OpenMP's #pragma omp critical to ensure the data is written as intended.
This is quite costly, though, as threads have to wait for one another. I read that it should be possible to have a single thread handle all writes to disk while the others focus on getting the incoming data and parsing it. How would I go about doing this?
The program is an XDP-based packet parser that only stores particular information about each packet. The code is based on this project code: https://github.com/xdp-project/xdp-tutorial/blob/master/tracing04-xdp-tcpdump/xdp_sample_pkts_user.c
static int print_bpf_output(void *data, int size)
{
    struct {
        __u16 cookie;
        __u16 pkt_len;
        __u8  pkt_data[SAMPLE_SIZE];
    } __packed *e = data;
    struct pcap_pkthdr h = {
        .caplen = SAMPLE_SIZE,
        .len    = e->pkt_len,
    };
    struct timespec ts;
    int i, err;

    if (e->cookie != 0xdead) {
        printf("BUG cookie %x sized %d\n", e->cookie, size);
        return LIBBPF_PERF_EVENT_ERROR;
    }

    err = clock_gettime(CLOCK_MONOTONIC, &ts);
    if (err < 0) {
        printf("Error with gettimeofday! (%i)\n", err);
        return LIBBPF_PERF_EVENT_ERROR;
    }
    h.ts.tv_sec  = ts.tv_sec;
    h.ts.tv_usec = ts.tv_nsec / NANOSECS_PER_USEC;

    if (verbose) {
        printf("pkt len: %-5d bytes. hdr: ", e->pkt_len);
        for (i = 0; i < e->pkt_len; i++)
            printf("%02x ", e->pkt_data[i]);
        printf("\n");
    }

    pcap_dump((u_char *) pdumper, &h, e->pkt_data);
    pcap_pkts++;
    return LIBBPF_PERF_EVENT_CONT;
}
This function would be called by numerous threads, and I want the pcap_dump calls to be executed by a single, different thread.
Yes, that is a common way to avoid delays where the disk is fast enough to handle the average data rate, but where occasional data peaks, disk cache writes, directory updates and other such events cause intermittent data loss.
You need a producer-consumer queue. Such a class or code/struct, using condvars or semaphores, is easily found on SO or elsewhere on the net. The queue only needs to queue up pointers.
Don't use a wide queue to queue up the bulk data. As soon as it is read from [wherever], read it into a malloced buffer/struct that holds the data, path, command and anything else the write thread might need to perform the write. Queue up the struct pointer to the write thread. In the write thread, loop round the P-C queue pop: get the pointer, do the write (or whatever is commanded by the struct's command field), and, if no error, free the struct. If there is some problem, you could load an error message into some field of the struct and queue it off again to some error-logging thread, store it in a queue to try again later, whatever you want, really.
This way, you insulate the rest of your app from those unavoidable, occasional disk delays. That is very important with high-latency disks, e.g. those on a network. It also makes housekeeping operations much easier; for instance, an hourly timer could queue up a struct whose command field instructs the thread to open a new file with a date-time stamp in the filename, making it easier to track the data later without wading through one massive file:) Such operations, without the queue and write thread, would surely inflict a massive delay on your app:(
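A minimal sketch of such a pointer queue, using a pthread mutex and condvar (the struct layout and the fixed capacity are invented for illustration; production code would add a 'not full' condition or grow the queue):

#include <pthread.h>

#define QLEN 256 /* must comfortably exceed the expected write backlog */

struct pcq
{
    void *slot[QLEN];
    int in, out, count;
    pthread_mutex_t lock;    /* init with PTHREAD_MUTEX_INITIALIZER */
    pthread_cond_t nonempty; /* init with PTHREAD_COND_INITIALIZER  */
};

/* Producers: queue a pointer to a malloced struct holding the pcap
 * header, the packet bytes and a command field for the write thread. */
void pcq_push(struct pcq *q, void *p)
{
    pthread_mutex_lock(&q->lock);
    q->slot[q->in] = p;
    q->in = (q->in + 1) % QLEN;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: the single write thread blocks here when idle. */
void *pcq_pop(struct pcq *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    void *p = q->slot[q->out];
    q->out = (q->out + 1) % QLEN;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return p;
}

The write thread then loops: pop a struct, call pcap_dump() (or whatever its command field asks for), and free it.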

How to receive data with HAL_UART?

I'm learning about the STM32. I want to receive data over UART byte by byte, using interrupts.
HAL_UART_Receive_IT(&huart1, buffer, length)
Here &huart1 is my UART handle, buffer is the input storage and length is the number of bytes to receive. I use the following function to read data:
static void requestRead(void *buffer, uint16_t length)
{
    while (HAL_UART_Receive_IT(&huart1, buffer, length) != HAL_OK)
        osDelay(1);
    //HAL_UART_RxCpltCallback
}
I store my data in:
void StartDefaultTask(void const *argument)
{
    char sender[] = "Alaska Sending\n";
    uint8_t receive[10];
    uint8_t data[30];

    for (;;)
    {
        uint8_t i = 0;
        memset(data, 0, 30);
        requestRead(&receive, 1);
        data[i++] = receive;
        while (data != '\r')
        {
            requestRead(&receive, 1);
            data[i++] = receive;
        }
        //HAL_UART_Transmit(&huart1, data, i, HAL_MAX_DELAY);
    }
    /* USER CODE END StartDefaultTask */
}
My problem is with the value received and stored. When I send a string of characters such as "Welcome to Alaska\n" over serial, only 'W' is read and stored; then I have to send the buffer again, and again only 'W' is stored. How do I solve this?
Well, there are a few issues here.
Arrays and their contents
data[i++] = receive;
stores the address of the receive buffer, a memory pointer value, into the data array. That's certainly not what you want. As this is a very basic C programming paradigm, I'd recommend reviewing the chapter on arrays and pointers in a good C textbook.
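For instance, assuming a single byte is received into the first element of the array, the intended statement is presumably this (a hedged correction, not code from the question):

    data[i++] = receive[0]; /* copy the received byte, not the buffer's address */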
What you send and what you expect
while (data != '\r')
Even if you get the array address and its value right (see above), you are sending a string terminated with '\n' but checking for a '\r' character, so change one or the other to get a match.
Missing volatile
uint8_t receive[10];
The receive buffer should be declared volatile, as it would be accessed by an interrupt handler. Otherwise the main program would miss writes to the buffer even if it had checked whether the receiving is complete (see below).
Working with hardware in realtime
while (HAL_UART_Receive_IT(&huart1, buffer, length) != HAL_OK) osDelay(1);
This enables the UART receive (and error handling) interrupt in order to receive one byte. That's fine so far, but the function returns before the byte is received, and as it's called again immediately, it returns HAL_BUSY the second time and waits a millisecond before attempting it again. In that millisecond it misses most of the rest of the transmission, as bytes arrive faster than that, and your program does nothing about them.
Moreover, the main program does not check when the receive is complete, possibly accessing the buffer before the interrupt handler places a value in it.
If you receive data using interrupts, you'd have to do something about that data in the interrupt handler. (If you don't use interrupts, but polling for data, then be sure that you'd meet the deadline imposed by the speed of the interface).
HAL is not suited for this kind of task
HAL has no interface for receiving an unknown length of data terminated by a specific value. Of course the main program can poll the receiver one byte at a time, but then it must ensure that the polling occurs faster than the data comes in. In other words, the program can do very little else while expecting a transmission.
There are some workarounds, but I won't even hint at them now, because they would lead to deadlocks in an RTOS environment which tend to occur at the most inconvenient times, and are quite hard to investigate and to properly avoid.
Write your interrupt handler instead
(all of this is detailed in the Reference Manual for your controller)
set the UART interrupt priority NVIC_SetPriority()
enable the interrupt with NVIC_EnableIRQ()
set the USART_CR1_RXNEIE bit to 1
in the interrupt handler,
read the SR register
if the RXNE bit is set,
read the data from the data register
store it in the buffer
set a global flag if it matches the terminating character
switch to another buffer if more data is expected while the program processes the first line.
Don't forget to declare all variables touched by the interrupt handler as volatile. A minimal sketch of such a handler follows.
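Something along these lines, for an STM32F1/F4-class part (the SR/DR register names and RXNE/RXNEIE bits come from the CMSIS headers; newer families use ISR/RDR instead, and the buffer and flag names here are invented for the example):

#define RX_BUF_LEN 64

volatile uint8_t  rx_buf[RX_BUF_LEN];
volatile uint16_t rx_len  = 0;
volatile uint8_t  rx_done = 0; /* set when a complete line has arrived */

void uart_rx_init(void)
{
    NVIC_SetPriority(USART1_IRQn, 5);
    NVIC_EnableIRQ(USART1_IRQn);
    USART1->CR1 |= USART_CR1_RXNEIE; /* interrupt on every received byte */
}

void USART1_IRQHandler(void)
{
    if (USART1->SR & USART_SR_RXNE)      /* receive data register not empty */
    {
        uint8_t c = (uint8_t)USART1->DR; /* reading DR clears RXNE          */
        if (rx_len < RX_BUF_LEN)
            rx_buf[rx_len++] = c;
        if (c == '\n')
            rx_done = 1; /* main program copies the line and resets rx_len  */
    }
}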

Issue with TFT LCD screen speed

I am using a TFT LCD screen (ILI9163C, 160x128). It is connected to an Atheros AR9331 module over SPI. The AR9331 is running the OpenWrt Linux distribution, so I am driving the LCD through spidev0.1. While filling the screen or writing any string to the LCD, it takes too much time to draw. What can I do to get acceptable drawing speed?
Thanks.
This is the function I'm using to write data over SPI using spidev:
void spi_transactor(unsigned char *write_data, int mode, int size)
{
    int ret;
    struct spi_ioc_transfer xfer[4];
    unsigned char *init_reg;

    init_reg = (unsigned char*) malloc(size);
    memcpy(init_reg, write_data, size);

    if (mode)
        gpio_set_value(_rs, 1); // DATA
    else
        gpio_set_value(_rs, 0); // COMMAND

    memset(xfer, 0, sizeof xfer);
    xfer[0].bits_per_word = 8;
    xfer[0].tx_buf = (unsigned long)init_reg;
    xfer[0].rx_buf = 0; //( unsigned long ) &buf_rx[0];
    xfer[0].len = size; //wlength + rlength;
    xfer[0].delay_usecs = 0;
    xfer[0].speed_hz = speedx; // 8MHZ
    //xfer[0].speed_hz = 160000000; // 40MHZ

    ret = ioctl(spi_fd, SPI_IOC_MESSAGE(1), &xfer);
    gpio_set_value(_rs, 1);
}
The main performance issue here is that you make a hard copy of the data to send on the heap, every time the function is called. You also set up the communication parameters from scratch each time, even though they are always the same. To make things worse, the function has a massive bug: it leaks memory as if there's no tomorrow.
The hard copies aren't really necessary unless the SPI communication takes too much time for the program to sit and busy-wait on it to finish (rather likely). What you can do in that case is this:
Outsource the whole SPI business to a separate thread.
Create a work queue for the thread, using your favourite ADT for such. It should be a thread-safe FIFO.
Data is copied into the ADT as hard copies, by the caller.
The thread picks one chunk of work from the ADT and transmits it from there, without making yet another hard copy.
The thread waits for the SPI communication to finish, then makes sure that the ADT deletes the data, before grabbing the next one. For hard real-time requirements, you can have the thread prepare the next message in advance while waiting for the previous one.
The communication parameters "xfer" are set up once by the thread, it just changes the data destination address from case to case.
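To illustrate the first point alone - no per-call heap copy and no rebuilding of the constant parameters - here is a sketch of a synchronous variant (spi_fd, speedx, _rs and gpio_set_value() are assumed from the question's code):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

extern int spi_fd;
extern uint32_t speedx;
extern int _rs;
extern void gpio_set_value(int gpio, int value);

static struct spi_ioc_transfer xfer; /* constant fields set up only once */

void spi_init_xfer(void)
{
    memset(&xfer, 0, sizeof xfer);
    xfer.bits_per_word = 8;
    xfer.delay_usecs   = 0;
    xfer.speed_hz      = speedx;
}

int spi_transactor(unsigned char *write_data, int mode, int size)
{
    int ret;
    gpio_set_value(_rs, mode ? 1 : 0);       /* DATA or COMMAND    */
    xfer.tx_buf = (unsigned long)write_data; /* transmit in place: */
    xfer.len    = size;                      /* no malloc, no leak */
    ret = ioctl(spi_fd, SPI_IOC_MESSAGE(1), &xfer);
    gpio_set_value(_rs, 1);
    return ret;
}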

Racy behavior in low-latency interrupt-based transmit code

Suppose you have some data transmitting peripheral, like a UART, that signals an interrupt whenever it's ready to transmit more data. We're sending data from a circular buffer, where tail is where the data is removed from, head is where you add data, and tail == head means that there's no more data to transmit.
Let's also assume that the peripheral has no buffering whatsoever, and you can't pass it the next value to send while it's busy sending the current one. If you need a concrete, if made-up, example, think of a shift register attached directly to a CPU's parallel I/O port.
To keep the transmitter as busy as possible, you might wish to transmit as soon as the transmit interrupt handler is entered. When there's no data to transmit, the interrupt is masked out and the handler will not be invoked even though the interrupt has been armed. The system starts with the interrupt masked out.
I'll use C to illustrate things, although the issue is not C-specific. The interrupt handler, and the buffer, are set up as follows:
char buf[...];
char * head = buf;                      ///< write pointer
char * tail = buf;                      ///< read pointer
char * const first = buf;               ///< first byte of the buffer
char * const last = buf+sizeof(buf)-1;  ///< last byte of the buffer

/// Sends one byte out. The interrupt handler will be invoked as soon
/// as another byte can be sent.
void transmit(char);

void handler() {
    transmit(*tail);
    if (tail == last)
        tail = first;
    else
        tail++;
    if (tail == head)
        mask_interrupt();
}
So far, so good. Now let's see how one might implement putch(). We can invoke putch() in bursts much faster than the device is able to send the data out. Let's assume that the caller knows not to overflow the buffer.
void putch(char c) {
    *head = c;
    if (head == last)
        head = first;
    else
        head++;
    /***/
    unmask_interrupt();
}
Suppose now that these things happen:
The transmitter was busy, and when putch was called, a byte was being sent.
The transmission happens to finish when putch is in the spot marked /***/ above. The handler() happens to execute right there.
The handler() happens to send the last byte of the data in the buffer - the byte that we have just loaded in preceding lines in putch().
The handler masks the interrupt, as there's no more data to send, but putch incorrectly unmasks it right after handler() returns. Thus the handler will have another go through the buffer, and will send a buffer's worth of stale data until tail equals head again.
My question is: is the only fix to increase the latency and check for an empty buffer before sending in the handler? The fixed code looks as follows:
void fixed_handler() {
    if (head == tail) {
        mask_interrupt();
        arm_interrupt(); // so that next time we unmask it, we get invoked
        return;
    }
    transmit(*tail);
    if (tail == last)
        tail = first;
    else
        tail++;
}
This fix adds some latency, and also adds an extra operation (arm_interrupt) that's executed once when there's no more data to send.
For possible other approaches, feel free to assume the existence of at least the following operations:
/// Is the interrupt armed and will the handler fire once unmasked?
bool is_armed();
/// Is the interrupt unmasked?
bool is_unmasked();
I've always done this with double-buffering, so that at any point in time the program and the UART are "owning" different buffers.
When the UART finishes sending its buffer, a swap can happen, with interrupts masked.
That way, it doesn't have to mask interrupts on every character.
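A sketch of that scheme, reusing the question's transmit()/mask/unmask primitives (buffer size and variable names are invented; the swap is driven from putch(), and as in the question, the caller is assumed not to overflow the fill buffer):

#include <stdbool.h>

static char bufs[2][64];
static int fill = 0;                    /* buffer the program writes into */
static unsigned fill_len = 0;
static unsigned tx_len = 0, tx_pos = 0; /* progress through the other one */
static volatile bool tx_idle = true;

void handler() {                        /* transmit-ready interrupt       */
    if (tx_pos < tx_len) {
        transmit(bufs[fill ^ 1][tx_pos++]);
    } else {
        tx_idle = true;                 /* drained; wait for the next swap */
        mask_interrupt();
    }
}

void putch(char c) {
    bufs[fill][fill_len++] = c;         /* the ISR never touches this buffer */
    if (tx_idle) {                      /* idle implies interrupt masked     */
        fill ^= 1;                      /* swap: hand full buffer to the ISR */
        tx_len = fill_len;
        tx_pos = 0;
        fill_len = 0;
        tx_idle = false;
        transmit(bufs[fill ^ 1][tx_pos++]);
        unmask_interrupt();
    }
}

Because the handler masks the interrupt before user code resumes, tx_idle == true inside putch() implies the swap cannot race with the ISR. The usual double-buffering caveat applies: if no further putch() arrives, the last partial buffer needs an explicit flush.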
One fix would be to prevent the interrupt handler from running within putch:
void putch(char c) {
    *head = c;
    mask_interrupt();
    if (head == last)
        head = first;
    else
        head++;
    unmask_interrupt();
}
This lets us use the original transmit-first interrupt handler. The problem is that, overall, it increases the number of operations performed per byte sent. It also increases the peak latency, since there are now times when handler() simply won't run even though the hardware is ready for more data and there is data to be sent.
The average latency in getting the transmitter busy again is determined by the interrupt handler. The peak latency on top of that is determined by the code that delays the execution of the interrupt handler.

Looking for the right ring buffer implementation in C

I am looking for a ring buffer implementation (or pseudocode) in C with the following characteristics:
multiple producer single consumer pattern (MPSC)
consumer blocks on empty
producers block on full
lock-free (I expect high contention)
So far I've been working only with SPSC buffers - one per producer - but I would like to avoid the continuous spinning of the consumer to check for new data over all its input buffers (and maybe to get rid of some marshaling threads in my system).
I develop for Linux on Intel machines.
See liblfds, which has a lock-free MPMC ringbuffer. It won't block at all - lock-free data structures don't tend to, since the whole point of being lock-free is to avoid blocking - so you have to handle the case where the data structure comes back to you with NULL. It returns NULL if you try to read when empty. Writing when full doesn't match your requirement, though: the liblfds ringbuffer throws away the oldest element and gives you that one for your write.
However, it would only take a small modification to obtain the behaviour you want.
But there may be a better solution. The tricky part of a ringbuffer is, when full, getting hold of the oldest previous element and re-using it - and you don't need that. I think you could take the SPSC memory-barrier-only circular buffer and rewrite it using atomic operations. That would be a lot more performant than the MPMC ringbuffer in liblfds (which is a combination of a queue and a stack).
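As a starting point for such a rewrite, here is the SPSC ring expressed with C11 atomics in place of explicit barriers (a sketch under stated assumptions: one producer, one consumer, power-of-two capacity; making the push side multi-producer would need more work, e.g. a CAS loop on head):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024 /* power of two */

static void *slots[RING_SIZE];
static _Atomic size_t head = 0; /* advanced by the producer */
static _Atomic size_t tail = 0; /* advanced by the consumer */

bool ring_push(void *p) /* single producer */
{
    size_t h = atomic_load_explicit(&head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return false; /* full */
    slots[h % RING_SIZE] = p;
    /* release: the slot write becomes visible before the new head */
    atomic_store_explicit(&head, h + 1, memory_order_release);
    return true;
}

bool ring_pop(void **p) /* single consumer */
{
    size_t t = atomic_load_explicit(&tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&head, memory_order_acquire);
    if (h == t)
        return false; /* empty */
    *p = slots[t % RING_SIZE];
    /* release: the slot is read before the producer may recycle it */
    atomic_store_explicit(&tail, t + 1, memory_order_release);
    return true;
}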
I think I have what you are looking for. It is a lock-free ring buffer implementation that blocks the producer/consumer. You only need access to atomic primitives - in this example I will use GCC's __sync builtins.
It has a known bug: if you overflow the buffer by more than 100%, it is not guaranteed that the queue remains FIFO (it will still process every element eventually).
This implementation relies on reading/writing the buffer elements being an atomic operation (which is pretty much guaranteed for pointers):
#include <stdint.h>
#include <stdlib.h>
#include <semaphore.h>

struct ringBuffer
{
    void** buffer;
    uint64_t writePosition;
    size_t size;
    sem_t* semaphore;
};

//create the ring buffer
struct ringBuffer* createBuffer(size_t bufferSize)
{
    struct ringBuffer* buf = calloc(1, sizeof(struct ringBuffer));
    buf->buffer = calloc(bufferSize, sizeof(void*));
    buf->size = bufferSize;
    buf->semaphore = malloc(sizeof(sem_t));
    sem_init(buf->semaphore, 0, 0);
    return buf;
}

//producer
void addToBuffer(void* newValue, struct ringBuffer* buf)
{
    uint64_t writepos = __sync_fetch_and_add(&buf->writePosition, 1) % buf->size;

    //spin until the slot is free, then claim it (blocks the producer when full)
    while(!__sync_bool_compare_and_swap(&(buf->buffer[writepos]), NULL, newValue));

    sem_post(buf->semaphore);
}

//consumer
void processBuffer(struct ringBuffer* buf)
{
    uint64_t readPos = 0;
    while(1)
    {
        sem_wait(buf->semaphore); //blocks the consumer when empty

        //process buf->buffer[readPos % buf->size], then release the slot
        buf->buffer[readPos % buf->size] = NULL;
        readPos++;
    }
}
