Producer-consumer algorithm to use full buffer - c

I was reading the Galvin OS book about the producer-consumer problem and came across this piece of code.
Global definitions
#define BUFFER_SIZE 10
typedef struct {
. . .
} item;
int in = 0;
int out = 0;
Producer
while (((in + 1) % BUFFER_SIZE) == out)
    ; /* do nothing */
buffer[in] = next_produced;
in = (in + 1) % BUFFER_SIZE;
Consumer
while (in == out)
    ; /* do nothing */
next_consumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
Now this is what the Galvin book says:
This scheme allows at most BUFFER_SIZE − 1 items in the buffer at the
same time. We leave it as an exercise for you to provide a solution in which
BUFFER_SIZE items can be in the buffer at the same time.
This is what I came up with. Is this correct?
Producer
buffer[in] = next_produced; //JUST MOVED THIS LINE!
while (((in + 1) % BUFFER_SIZE) == out)
    ; /* do nothing */
in = (in + 1) % BUFFER_SIZE;
Consumer
while (in == out)
    ; /* do nothing */
next_consumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
I think this solves it, but is it correct? Is there any other, better solution?

In the original piece of code, in == out could mean either that the buffer is empty OR that it is full. To avoid that ambiguity, the original code never lets the buffer fill up completely, always leaving at least one slot empty.
I am not sure your change solves this problem: you will be able to put BUFFER_SIZE items into the array, but you will not be able to consume them all. So, taken literally, you solved the exercise, but it will not function properly.
Basically, to solve this problem you need an extra piece of information, so that you can distinguish between an empty buffer and a full one. There are a variety of solutions for that; the most obvious is to add an extra flag.
The most elegant, IMO, is to let the in and out counters run freely, wrapping them only when accessing the buffer, so:
when in == out -- the buffer is empty
when abs(in - out) == BUFFER_SIZE -- the buffer is full
to access the buffer we should use buffer[in % BUFFER_SIZE] or buffer[out % BUFFER_SIZE]
We leave it as an exercise for you to provide a complete solution ;)
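Not to spoil the exercise, but for concreteness a minimal single-producer/single-consumer sketch of that counter scheme could look like the following. It is my illustration, not the book's; it assumes BUFFER_SIZE is a power of two so the free-running unsigned counters stay consistent even when they wrap, and it ignores the volatile/memory-ordering issues a real implementation would have to handle:
#define BUFFER_SIZE 16            /* power of two so unsigned wraparound stays consistent */

item buffer[BUFFER_SIZE];
unsigned int in = 0;              /* total number of items ever produced */
unsigned int out = 0;             /* total number of items ever consumed */

/* Producer */
item next_produced;               /* filled in by the producer */
while (in - out == BUFFER_SIZE)
    ; /* buffer full -- do nothing */
buffer[in % BUFFER_SIZE] = next_produced;
in = in + 1;

/* Consumer */
item next_consumed;
while (in == out)
    ; /* buffer empty -- do nothing */
next_consumed = buffer[out % BUFFER_SIZE];
out = out + 1;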


Efficiently find a sequence within a buffer

So I have a buffer that I am filling with a frame that has a maximum of 1200 bytes and is variably sized. I know the frame is complete when I get a tail sequence that is always the same and doesn't occur otherwise. So I am trying to find how to most efficiently detect that tail sequence. This is embedded so ideally the fewer function calls and data structures I use the better.
Here is what I have thus far:
//I am reading off of a circular buffer so this is checking that I still have unread bytes
while (cbuf_last_written_index != cbuf_last_read_index) {
    buffer[frame_size] = circular_buffer[cbuf_last_read_index];
    //this function does exactly what it says and just maintains circular buffer correctness
    increment_cbuf_read_index_count();
    frame_size++;
    //TODO need to make this more efficient
    int i;
    uint8_t sync_test_array[TAIL_SYNC_LENGTH] = {0};
    //this just makes sure I have enough in the frame to even bother checking the tail seq
    if (frame_size > TAIL_SYNC_LENGTH) {
        for (i = 0; i < TAIL_SYNC_LENGTH; i++) {
            //sets the test array equal to the last TAIL_SYNC_LENGTH elements of the buffer
            sync_test_array[i] = buffer[(frame_size - TAIL_SYNC_LENGTH) + i];
        }
        //memcmp compares the contents; == on the arrays would only compare pointers
        if (memcmp(sync_test_array, tail_sequence_array, TAIL_SYNC_LENGTH) == 0) {
            //I will toggle a pin here to notify that the frame is complete
            //get out of the while loop because the following bytes are part of the next frame
            break;
        }
    }
    //end efficiency needed area
}
So basically for each new byte that is added to the frame I am checking the last x bytes (will probably actually be ~8) to see if they are the tail sequence. Can you think of a better way to do this?
Implement it as a state machine.
If your tail sequence is 1, 2, 5, the pseudo code would be:
switch (current_state) {
    IDLE:     next_state = ONE_SEEN  if new_byte == 1 else next_state = IDLE
    ONE_SEEN: next_state = TWO_SEEN  if new_byte == 2 else next_state = IDLE
    TWO_SEEN: next_state = TERMINATE if new_byte == 5 else next_state = IDLE
}
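As a simplified, untested C sketch of that idea for an arbitrary tail sequence: the "state" is just how many bytes of the tail have matched so far. TAIL_SYNC_LENGTH and tail_sequence_array are the names from your question; tail_seen and the {1, 2, 5} contents are only examples, and the restart logic below does not handle tails that contain repeats of their own prefix (which is fine here, since you say the tail never occurs otherwise):
#include <stdint.h>

#define TAIL_SYNC_LENGTH 3
static const uint8_t tail_sequence_array[TAIL_SYNC_LENGTH] = {1, 2, 5};

/* Feed one received byte at a time; returns 1 when the full tail has just been seen. */
static int tail_seen(uint8_t new_byte)
{
    static unsigned matched = 0;               /* bytes of the tail matched so far */

    if (new_byte == tail_sequence_array[matched]) {
        if (++matched == TAIL_SYNC_LENGTH) {   /* TERMINATE */
            matched = 0;
            return 1;
        }
    } else {
        /* back to IDLE, or to ONE_SEEN if this byte restarts the sequence */
        matched = (new_byte == tail_sequence_array[0]) ? 1 : 0;
    }
    return 0;
}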

c - Avoid if in loop

Context
Debian 64.
Core 2 duo.
Fiddling with a loop, I came up with different variations of it, but I would like to avoid conditional branching if possible, even though I think it will be difficult to beat.
I thought about SSE or bit shifting, but it would still require a jump (look at the computed goto below). Spoiler: a computed jump doesn't seem to be the way to go.
The code is compiled without PGO, because on this piece of code PGO makes it slower.
Flags:
gcc -march=native -O3 -std=c11 test_comp.c
Unrolling the loop didn't help here.
63 in ASCII is '?'.
The printf is here to force the code to execute. Nothing more.
My need:
Some logic to avoid the condition. I take this as a challenge for my holidays :)
The code:
Tested with this sentence. The character '?' is guaranteed to be there, but at a random position.
hjkjhqsjhdjshnbcvvyzayuazeioufdhkjbvcxmlkdqijebdvyxjgqddsyduge?iorfe
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv){

    /* This is quite slow. Average actually.
       Executes in 369,041 cycles here (cachegrind) */
    for (int x = 0; x < 100; ++x){
        if (argv[1][x] == 63){
            printf("%d\n", x);
            break;
        }
    }

    /* This is the slowest.
       Executes in 370,385 cycles here (cachegrind) */
    register unsigned int i = 0;
    static void * restrict table[] = {&&keep, &&end};
keep:
    ++i;
    goto *table[(argv[1][i-1] == 63)];
end:
    printf("i = %d", i-1);

    /* This is slower. Because of the calculation..
       Executes in 369,109 cycles here (cachegrind) */
    for (int x = 100; ; --x){
        if (argv[1][100 - x] == 63){ printf("%d\n", 100-x); break; }
    }

    return 0;
}
Question
Is there a way to make it faster, maybe by avoiding the branch?
The branch miss rate is huge at 11.3% (cachegrind with --branch-sim=yes).
I can't believe this is the best one can achieve.
If some of you are talented enough with assembly, please chime in.
Assuming you have a buffer of well-known size that can hold the maximum number of chars to test against, like
char buffer[100];
make it one byte larger
char buffer[100 + 1];
then fill it with the sequence to test against
read(fileno(stdin), buffer, 100);
and put your test-char '?' at the very end
buffer[100] = '?';
This allows a loop with only one test condition:
size_t i = 0;
while ('?' != buffer[i])
{
    ++i;
}
if (100 == i)
{
    /* test failed */
}
else
{
    /* test passed for i */
}
Leave all other optimisation to the compiler.
However, I couldn't resist, so here's a possible approach for micro-optimisation:
char buffer[100 + 1];
read(fileno(stdin), buffer, 100);
buffer[100] = '?';

char * p = buffer;
while ('?' != *p)
{
    ++p;
}
if ((p - buffer) == 100)
{
    /* test failed */
}
else
{
    /* test passed for (p - buffer) */
}

Optimal Memory Utilization in realloc (splitting?)

I'm having difficulty with coding my realloc function.
I have it working through the standard memcpy procedure, but I can't get it optimized. I know there are two other cases I need to accommodate: expanding the current block forward into the next free block, and checking whether the current block is already large enough (and, if it is too large, splitting it to free the excess).
However, I can't seem to get it right. I always get errors. To clarify, these are not compile errors... these are heap integrity checks that fail through a trace driver. If I do it without splitting, I run out of memory, and if I try to split, it says it "failed to preserve the original block/data."
Below is my normal memcpy code. The commented section in the middle is my attempt to expand, but I think I also need to split because it's causing a ton of fragmentation. This leads to me running out of memory and erroring out during one of the realloc tests. If I do it without the comment block, it works fine, but there is zero optimization.
My attempts to split always fail; the commented code at the bottom is my attempt. What am I doing wrong here?
I would very much appreciate any assistance, thank you. :)
#define PACK(size, alloc)  ((size) | (alloc))
#define GET_SIZE(p)        (GET(p) & ~0x7)
#define GET_ALLOC(p)       (GET(p) & 0x1)
#define HDRP(bp)           ((char *)(bp) - WSIZE)
#define FTRP(bp)           ((char *)(bp) + GET_SIZE(HDRP(bp)) - DSIZE)
#define NEXT_BLKP(bp)      ((char *)(bp) + GET_SIZE(((char *)(bp) - WSIZE)))

void *mm_realloc(void *oldptr, size_t size)
{
    void *newptr;
    size_t copySize;

    copySize = GET_SIZE(HDRP(oldptr));
    size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(oldptr)));

    // if (copySize > size) return oldptr;

    /*if (!next_alloc) {
        if ((GET_SIZE(HDRP(oldptr)) + GET_SIZE(HDRP(NEXT_BLKP(oldptr)))) > size) {
            copySize += GET_SIZE(HDRP(NEXT_BLKP(oldptr)));
            PUT(HDRP(oldptr), PACK(copySize,1));
            PUT(FTRP(oldptr), PACK(copySize,1));
            return oldptr;
        }
    }*/

    newptr = mm_malloc(size);
    if (newptr == NULL)
        return NULL;
    if (size < copySize)
        copySize = size;
    memcpy(newptr, oldptr, copySize);
    PUT(newptr, GET(oldptr));
    mm_free(oldptr);
    return newptr;
}

// int total_avail = (GET_SIZE(HDRP(oldptr)) + GET_SIZE(HDRP(NEXT_BLKP(oldptr))));
// copySize -= (total_avail - size);
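For what it's worth, a rough sketch of the in-place expand-and-split path, using the same CS:APP-style macros, is shown below. It assumes the usual double-word alignment and the same size adjustment mm_malloc is presumed to perform, so treat it as an outline of the logic rather than a verified fix:
/* Sketch only: asize mirrors the size adjustment mm_malloc is assumed to make. */
size_t asize = (size <= DSIZE) ? 2 * DSIZE
                               : DSIZE * ((size + DSIZE + (DSIZE - 1)) / DSIZE);
size_t oldsize = GET_SIZE(HDRP(oldptr));

if (!next_alloc) {
    size_t total = oldsize + GET_SIZE(HDRP(NEXT_BLKP(oldptr)));
    if (total >= asize) {
        if (total - asize >= 2 * DSIZE) {
            /* Enough left over to split: keep asize bytes, free the remainder. */
            PUT(HDRP(oldptr), PACK(asize, 1));
            PUT(FTRP(oldptr), PACK(asize, 1));
            void *rem = NEXT_BLKP(oldptr);
            PUT(HDRP(rem), PACK(total - asize, 0));
            PUT(FTRP(rem), PACK(total - asize, 0));
            /* if an explicit free list is kept, rem must be inserted/coalesced here */
        } else {
            /* Remainder too small to form a valid block: absorb it all. */
            PUT(HDRP(oldptr), PACK(total, 1));
            PUT(FTRP(oldptr), PACK(total, 1));
        }
        return oldptr;
    }
}
/* otherwise fall through to the mm_malloc/memcpy/mm_free path above */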

Circular buffer implementation in C

I have found pseudo code on how to implement a circular buffer.
// Producer.
while (true) {
    /* produce item v */
    while ((in + 1) % n == out)
        /* Wait. */;
    b[in] = v;
    in = (in + 1) % n;
}

// Consumer.
while (true) {
    while (in == out)
        /* Wait. */;
    w = b[out];
    out = (out + 1) % n;
    /* Consume item w. */
}
What I don't understand is the "Consume item w." comment, because I think that with w = b[out]; we are consuming w, aren't we?
With
w = b[out];
You only grab a copy of the item to be consumed. With
out = (out + 1) % n;
You advance the index of the item to be consumed, thereby preventing it from being referenced again.
In a manner of speaking, multiple calls to w = b[out]; don't actually consume the buffer's slot, they just access it; whereas out = (out + 1) % n; prevents further access to that item. Preventing further access to the buffer item is the strongest definition of the term "consume the item" that I can think of.
These two lines are both part of the consuming process:
w = b[out];
out = (out + 1) % n;
The first extracts the value and the second increments the out index.
The comment refers to the previous two lines.
Yes, because then the item is out of the buffer, and the following line marks that slot as empty.
Then we can process w.

Techniques for handling short reads/writes with scatter-gather?

Scatter-gather I/O - readv()/writev()/preadv()/pwritev() - reads/writes a variable number of iovec buffers in a single system call, handling each buffer sequentially from the 0th iovec to the Nth. However, according to the documentation, readv/writev can also return less than was requested. I was wondering if there is a standard/best-practice/elegant way to handle that situation.
If we are just handling a bunch of character buffers or similar, this isn't a big deal. But one of the niceties is using scatter-gather for structs and/or discrete variables as the individual iovec items. How do you handle the situation where readv/writev only reads/writes a portion of a struct, or half of a long, or something like that?
Below is some contrived code of what I am getting at:
int fd;
struct iovec iov[3];
long aLong = 74775767;
int aInt = 949;
char aBuff[100]; //filled from wherever
ssize_t bytesWritten = 0;
ssize_t bytesToWrite = 0;

iov[0].iov_base = &aLong;
iov[0].iov_len = sizeof(aLong);
bytesToWrite += iov[0].iov_len;

iov[1].iov_base = &aInt;
iov[1].iov_len = sizeof(aInt);
bytesToWrite += iov[1].iov_len;

iov[2].iov_base = &aBuff;
iov[2].iov_len = sizeof(aBuff);
bytesToWrite += iov[2].iov_len;

bytesWritten = writev(fd, iov, 3);
if (bytesWritten == -1)
{
    //handle error
}
if (bytesWritten < bytesToWrite)
    //how to gracefully continue?.........
Use a loop like the following to advance the partially-processed iov:
for (;;) {
    written = writev(fd, iov+cur, count-cur);
    if (written < 0) goto error;
    while (cur < count && written >= iov[cur].iov_len)
        written -= iov[cur++].iov_len;
    if (cur == count) break;
    iov[cur].iov_base = (char *)iov[cur].iov_base + written;
    iov[cur].iov_len -= written;
}
Note that if you don't check for cur < count you can read past the end of iov, since some iov_len values might be zero and the inner loop condition would keep succeeding.
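If it helps, here is that same loop wrapped up as a self-contained helper (writev_all is a made-up name; note it consumes/modifies the caller's iovec array in place, which is usually fine for a one-shot write):
#include <sys/uio.h>
#include <errno.h>

/* Keep calling writev() until every iovec has been written or an error occurs.
   Returns 0 on success, -1 on error (with errno set). */
int writev_all(int fd, struct iovec *iov, int count)
{
    int cur = 0;

    while (cur < count) {
        ssize_t written = writev(fd, iov + cur, count - cur);
        if (written < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before anything was written */
            return -1;
        }
        /* Skip the iovecs that were written completely... */
        while (cur < count && (size_t)written >= iov[cur].iov_len)
            written -= iov[cur++].iov_len;
        /* ...and advance into the partially written one, if any. */
        if (cur < count) {
            iov[cur].iov_base = (char *)iov[cur].iov_base + written;
            iov[cur].iov_len -= (size_t)written;
        }
    }
    return 0;
}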
AFAICS the vectored read/write functions work the same wrt short reads/writes as the normal ones. That is, you get back the number of bytes read/written, but this might well point into the middle of a struct, just like with read()/write(). There is no guarantee that the possible "interruption points" (for lack of a better term) coincide with the vector boundaries. So unfortunately the vectored IO functions offer no more help for dealing with short reads/writes than the normal IO functions. In fact, it's more complicated since you need to map the byte count into an IO vector element and offset within the element.
Also note that the idea of using vectored IO for individual structs or data items might not work that well; the maximum allowed value for the iovcnt argument (IOV_MAX) is usually quite small, something like 1024 or so. So if your data is contiguous in memory, just pass it as a single element rather than artificially splitting it up.
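If you want to see the limit on a given system rather than relying on the compile-time IOV_MAX constant, sysconf() can report it at runtime, e.g.:
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* Per-call limit on the number of iovec entries; -1 if indeterminate. */
    long iov_max = sysconf(_SC_IOV_MAX);
    printf("IOV_MAX on this system: %ld\n", iov_max);
    return 0;
}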
Vectored write will write all the data you have provided in one call to the writev function, so bytesWritten will always be equal to the total number of bytes provided as input. That is my understanding.
Please correct me if I am wrong.
