Optimizing a read statement - arrays

I have built an LED cube using transistors, shift registers and an Arduino Nano (and it kinda works). I know shift registers may be a poor design choice, but I have to work with what I've got, so please don't get stuck on that in your answers.
There is this piece of code:
bool input[32];
void resetLeds()
{
input[R1] = 0;
input[G1] = 0;
input[B1] = 0;
input[R2] = 0;
input[G2] = 0;
input[B2] = 0;
input[R3] = 0;
input[G3] = 0;
input[B3] = 0;
input[R4] = 0;
input[G4] = 0;
input[B4] = 0;
input[X1Z1] = 1;
input[X1Z2] = 1;
input[X1Z3] = 1;
input[X1Z4] = 1;
input[X2Z1] = 1;
input[X2Z2] = 1;
input[X2Z3] = 1;
input[X2Z4] = 1;
input[X3Z1] = 1;
input[X3Z2] = 1;
input[X3Z3] = 1;
input[X3Z4] = 1;
input[X4Z1] = 1;
input[X4Z2] = 1;
input[X4Z3] = 1;
input[X4Z4] = 1;
}
void loop()
{
T = micros();
for(int I = 0; I < 100; I++)
{
counter++;
if(counter >= 256 / DIVIDER) counter = 0;
for(int i = 0; i < 64; i += 4)
{
x = i / 16;
z = (i % 16) / 4;
resetLeds();
input[XZ[x][z]] = 0;
for(y = 0; y < 4; y++)
{
index = i + y;
if(counter < xyz[index][0]) input[Y[y][RED]] = 1;
if(counter < xyz[index][1]) input[Y[y][GREEN]] = 1;
if(counter < xyz[index][2]) input[Y[y][BLUE]] = 1;
}
PORTB = 0;
for(int j = 0; j < 32; j++)
{
bitWrite(OUT_PORT, 4, 0);
bitWrite(OUT_PORT, 3, input[j]);
PORTB = OUT_PORT;
bitWrite(OUT_PORT, 4, 1);
PORTB = OUT_PORT;
}
bitWrite(OUT_PORT, 0, 1);
PORTB = OUT_PORT;
}
}
T = micros() - T;
Serial.println(T / 100);
}
The runtime of a single iteration is reported to be 1274 microseconds, but I need it to be even lower to build a PWM function of sorts (manually turning a transistor on and off through a shift register). While optimizing, I found this strange behavior that I cannot explain. There is this line in the code:
bitWrite(OUT_PORT, 3, input[j]);
When I remove this line or change input[j] to 0, the runtime is halved. Apparently, an array lookup takes about 20 microseconds. But I find this very weird, since I index this array in other places in the code (when writing), and there it takes 23 microseconds for 28 writes.
Can somebody please explain to me what is going on and/or how to make this piece of code run faster? I guess writes can be done in a pipelined manner, whereas a read stalls the code completely, since you cannot continue before you receive the value from cache. But then again, I highly doubt a read from cache should take 23 whole microseconds.
[EDIT 27/10/2020 16:55]
The part of the code that writes to the shift registers has a lot of bitWrite calls, which are time-consuming. Instead of writing the bits every time, I implemented preconfigured options for the byte to write to PORTB:
PORTB = LATCH_LOW;
for(int j = 0; j < 32; j++)
{
if(input[j] == 0)
{
PORTB = CLOCK_OFF_DATA_0;
PORTB = CLOCK_ON_DATA_0;
}
else
{
PORTB = CLOCK_OFF_DATA_1;
PORTB = CLOCK_ON_DATA_1;
}
}
PORTB = LATCH_HIGH;
This cuts my running time roughly in half, which is kind of fast enough, but I wonder if I could get it to run even faster. When I remove everything from the loop except for the writing to the shift registers and I remove the input[j] read, I get a runtime of 200 microseconds. This means that if I could remove the dependence on the input[j] read and compute its value inline, I should be able to get at least another 2x speed-up. To achieve 8-bit PWM, I calculated that I need the running time to be 40 microseconds or less, so I am going to stick with 16 (instead of 256) brightness levels for now to prevent flicker.
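One way to avoid the per-bit input[j] read is to keep the 32 flags in a single 32-bit word and shift them out. The following is only a sketch, not a measured improvement on the Nano: the bit positions (data = 3, clock = 4, latch = 0) are taken from the bitWrite calls above, while port_write, port_log, pack_bits and shift_out32 are hypothetical names, and the port is replaced by a log array so the code can run on a PC.

```c
#include <stdint.h>

/* Bit positions taken from the bitWrite() calls in the question. */
#define DATA_BIT  3
#define CLOCK_BIT 4
#define LATCH_BIT 0

/* Stand-in for PORTB so the sketch runs on a PC (assumption).
   On the Nano, port_write(v) would simply be `PORTB = v;`. */
static uint8_t port_log[80];
static uint8_t port_count;

static void port_write(uint8_t v) { port_log[port_count++] = v; }

/* Pack the 32 flags into one word: bit j of the result holds input[j]. */
static uint32_t pack_bits(const uint8_t input[32])
{
    uint32_t word = 0;
    for (uint8_t j = 0; j < 32; j++)
        if (input[j]) word |= (uint32_t)1 << j;
    return word;
}

/* Shift out bit 0 first, matching the j = 0..31 order of the original loop. */
static void shift_out32(uint32_t word)
{
    port_count = 0;
    port_write(0);                                /* latch low */
    for (uint8_t j = 0; j < 32; j++) {
        uint8_t data = (uint8_t)((word & 1u) << DATA_BIT);
        port_write(data);                         /* clock low, data valid */
        port_write(data | (1u << CLOCK_BIT));     /* rising clock edge */
        word >>= 1;
    }
    port_write(1u << LATCH_BIT);                  /* latch high */
}
```

pack_bits is shown only to relate the word to the existing input[] array; in the real loop the word could be built directly (set and clear bits with masks) so that input[] is never read inside the shifting loop. Whether this beats the if/else version on an 8-bit AVR would need to be timed.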
[EDIT 28/10/2020 19:38]
I went into platform.txt and changed the optimization flag to -Ofast. This got my iteration time down by another 200 microseconds!

Related

Losing data bit when reading from GPIO by libgpiod on Linux

I am using Debian (8.3.0-6) on a custom embedded board and working with the DHT11 sensor.
Briefly,
I need to read 40 bits from a GPIO pin, and each bit can take at most 70 microseconds. When the line stays high for at most 28 us or 70 us, it means logic 0 or 1, respectively. (So I have a timeout check for each bit, and if a bit takes more than 80 us, I need to stop the process.)
In my situation, sometimes I can read all 40 bits correctly, but sometimes I can't, and the libgpiod function gpiod_line_get_value(line); misses a bit (my code is below). I am trying to figure out why I lose a bit, but I haven't found a sensible answer yet. So I was wondering: what am I missing, and what is the proper way to do GPIO programming?
Here is how I determine which bit I am missing: whenever I catch a bit, I set and reset another GPIO pin at the rising and falling edges of that bit, so I can see which bit is missing. Moreover, as far as I can see, I always miss either both edges of one bit or one edge on each of two consecutive bits (rising and falling, or falling and rising). In the first picture, you can see which bit I missed; the second shows a case where I read all bits correctly.
Here is my code:
//********************************************************* Start reading data bit by low level (50us) ***************************
for (int i = 0; i < DHT_DATA_BYTE_COUNT ; i++) //DHT_DATA_BYTE_COUNT = 5
{
for (int J = 7; J > -1; J--)
{
GPIO_SetOutPutPin(testPin); //gpiod_line_set_value(testPin, 1);
int ret;
start = micros();
do
{
ret = GPIO_IsInputPinSet(dht11pin);//gpiod_line_get_value(dht11pin);
delta = micros() - start;
if(ret == -1)
{
err_step.step = 9;
err_step.ret_val = -1;
return -1;
}
if(delta > DHT_START_BIT_TIMEOUT_US) //80us
{
err_step.step = 10;
err_step.ret_val = -2;
err_step.timestamp[is] = delta;
err_step.indx[is].i = i;
err_step.indx[is++].j = J;
GPIO_ResetOutPutPin(testPin);
return -2;
}
}while(ret == 0);
GPIO_ResetOutPutPin(testPin);
err_step.ret_val = 10;
GPIO_SetOutPutPin(testPin);
start = micros();
do
{
ret = GPIO_IsInputPinSet(dht11pin);
delta = micros() - start;
if(ret == -1)
{
err_step.step = 11;
err_step.ret_val = -1;
return -1;
}
if(delta > DHT_BEGIN_RESPONSE_TIMEOUT_US) //80us
{
err_step.step = 12;
err_step.ret_val = -2;
err_step.timestamp[is] = delta;
err_step.indx[is].i = i;
err_step.indx[is++].j = J;
return -2;
}
}while(ret == 1);
err_step.timestamp[is] = delta;
err_step.indx[is].i = i;
err_step.indx[is++].j = J;
GPIO_ResetOutPutPin(testPin);
err_step.ret_val = 10;
(delta > DHT_BIT_SET_DATA_DETECT_TIME_US) ? bitWrite(dht11_byte[i],J,1) : bitWrite(dht11_byte[i],J,0);
}
}
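The decoding rule described above (a short high pulse is a 0, a long one is a 1, most significant bit first) can be separated from the GPIO polling. Below is a minimal sketch under these assumptions: the 40 high-pulse widths have already been captured in microseconds, threshold_us is a caller-chosen midpoint between the two pulse lengths, the fifth byte is the sum of the first four (the DHT11 checksum), and dht11_decode is a hypothetical name. It does not address the missed edges themselves.

```c
#include <stdint.h>

#define DHT_BITS  40
#define DHT_BYTES 5

/* Decode 40 measured high-pulse widths (microseconds) into 5 bytes.
   A width above threshold_us is treated as logic 1, otherwise logic 0.
   Returns 0 on success, -1 if the checksum byte does not match. */
static int dht11_decode(const uint16_t high_us[DHT_BITS],
                        uint16_t threshold_us,
                        uint8_t out[DHT_BYTES])
{
    for (int i = 0; i < DHT_BYTES; i++) {
        uint8_t byte = 0;
        for (int bit = 7; bit >= 0; bit--) {      /* MSB arrives first */
            uint16_t width = high_us[i * 8 + (7 - bit)];
            if (width > threshold_us)
                byte |= (uint8_t)(1u << bit);
        }
        out[i] = byte;
    }
    /* Checksum: low 8 bits of the sum of the first four bytes. */
    uint8_t sum = (uint8_t)(out[0] + out[1] + out[2] + out[3]);
    return (sum == out[4]) ? 0 : -1;
}
```

Keeping the timing loop free of everything except the pin read and the timestamp, and decoding afterwards, at least keeps the work done between two edges as small as possible.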

Using thread-private variables in OpenMP, for __m128i SSE2 variables?

I need help multi-threading one supersimple yet supernifty etude!
It is given below: the 9 commented-out lines are the generic Longest Common SubString loop-in-loop implementation, while the fragment above them is the branchless SSE2 counterpart. The etude works just fine as it is, but when I try to multi-thread it (I tried several ways), it randomly reports correct or incorrect results.
#ifdef KamXMM
printf("Branchless 128bit Assembly struggling ...\n");
for(i=0; i < size_inLINESIXFOUR2; i++){
XMMclone = _mm_set1_epi8(workK2[i]);
//omp_set_num_threads(4);
#ifdef Commence_OpenMP
//#pragma omp parallel for shared(workK,PADDED32,Matrix_vectorCurr,Matrix_vectorPrev) private(j,ThreadID) // Sometimes reports correctly sometimes NOT?!
#endif
for(j=0; j < PADDED32; j+=(32/2)){
XMMprev = _mm_loadu_si128((__m128i*)(Matrix_vectorPrev+(j-1)));
XMMcurr = _mm_loadu_si128((__m128i*)&workK[j]);
XMMcmp = _mm_cmpeq_epi8(XMMcurr, XMMclone);
XMMand = _mm_and_si128(XMMprev, XMMcmp);
XMMsub = _mm_sub_epi8(XMMzero, XMMcmp);
XMMadd = _mm_add_epi8(XMMand, XMMsub);
_mm_storeu_si128((__m128i*)(Matrix_vectorCurr+j), XMMadd);
// This doesn't work, sometimes reports 24 sometimes 23, (for Carlos vs Japan):
//ThreadID=omp_get_thread_num();
//if (ThreadID==0) XMMmax0 = _mm_max_epu8(XMMmax0, XMMadd);
//if (ThreadID==1) XMMmax1 = _mm_max_epu8(XMMmax1, XMMadd);
//if (ThreadID==2) XMMmax2 = _mm_max_epu8(XMMmax2, XMMadd);
//if (ThreadID==3) XMMmax3 = _mm_max_epu8(XMMmax3, XMMadd);
{
XMMmax = _mm_max_epu8(XMMmax, XMMadd);
}
// if(workK[j] == workK2[i]){
// if (i==0 || j==0)
// *(Matrix_vectorCurr+j) = 1;
// else
// *(Matrix_vectorCurr+j) = *(Matrix_vectorPrev+(j-1)) + 1;
// if(max < *(Matrix_vectorCurr+j)) max = *(Matrix_vectorCurr+j);
// }
// else
// *(Matrix_vectorCurr+j) = 0;
}
// XMMmax = _mm_max_epu8(XMMmax, XMMmax0);
// XMMmax = _mm_max_epu8(XMMmax, XMMmax1);
// XMMmax = _mm_max_epu8(XMMmax, XMMmax2);
// XMMmax = _mm_max_epu8(XMMmax, XMMmax3);
_mm_storeu_si128((__m128i*)vector, XMMmax); // No need since it was last, yet...
for(k=0; k < 32/2; k++)
if ( max < vector[k] ) max = vector[k];
if (max >= 255) {printf("\nWARNING! LCSS >= 255 found, cannot house it within BYTE long cell! Exit.\n"); exit(13);}
printf("%s; Done %d%% \r", Auberge[Melnitchka++], (int)(((double)i*100/size_inLINESIXFOUR2)));
Melnitchka = Melnitchka & 3; // 0 1 2 3: 00 01 10 11
Matrix_vectorSWAP=Matrix_vectorCurr;
Matrix_vectorCurr=Matrix_vectorPrev;
Matrix_vectorPrev=Matrix_vectorSWAP;
}
#endif
My wish is to boost it until it approaches the memory bandwidth: on my laptop with an i5-7200U it traverses the rows at 5 GB/s, whereas memcpy() is somewhere around 12 GB/s.
My understanding of OpenMP is superficial. I managed to multi-thread non-vector code (with #pragma omp sections nowait), but vectors are problematic: how do I tell the compiler that XMMmax has to be private?
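In OpenMP, a variable declared inside the parallel region is private to each thread, which also works for __m128i. The sketch below is a reduced example, not the full LCSS loop: it only computes a byte-wise maximum over a buffer, the name max_epu8_parallel is hypothetical, and it assumes an x86 target with SSE2 and a length that is a multiple of 16. Each thread keeps its own local maximum and merges it into the shared one inside a critical section.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Byte-wise maximum over n bytes (n must be a multiple of 16).
   Every __m128i temporary is declared inside the parallel region, so each
   thread has its own copy; only the final merge touches shared state. */
static uint8_t max_epu8_parallel(const uint8_t *data, size_t n)
{
    __m128i global_max = _mm_setzero_si128();

    #pragma omp parallel
    {
        __m128i local_max = _mm_setzero_si128();  /* private per thread */

        #pragma omp for nowait
        for (long j = 0; j < (long)n; j += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(data + j));
            local_max = _mm_max_epu8(local_max, v);
        }

        /* One thread at a time merges its result into the shared maximum. */
        #pragma omp critical
        global_max = _mm_max_epu8(global_max, local_max);
    }

    /* Horizontal maximum over the 16 byte lanes. */
    uint8_t lanes[16];
    _mm_storeu_si128((__m128i *)lanes, global_max);
    uint8_t max = 0;
    for (int k = 0; k < 16; k++)
        if (lanes[k] > max) max = lanes[k];
    return max;
}
```

Note that in the question's pragma only j and ThreadID are listed as private, so XMMprev, XMMcurr, XMMcmp, XMMand, XMMsub and XMMadd would be shared between threads as well if they are declared outside the loop; declaring them inside the loop body, as v is here, avoids that. Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.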

C, can't compare two buffers

I am working with some C code and I'm totally stuck on this function. It should compare two buffers with some tolerance. For example, if EEPROM_buffer[1] = 80, then TxBuffer values from 78 to 82 should be accepted.
The problem is that it always returns -1. I checked both buffers: the data is correct and they should match, but they don't. The program just runs the while loop until i reaches 3 and then returns -1.
I compile with Atmel Studio 6.1 for an atmel32A4U microcontroller.
int8_t CheckMatching(t_IrBuff * tx_buffer, t_IrBuff * tpool)
{
uint8_t i = 0;
uint16_t * TxBuffer = (uint16_t*) tx_buffer->data;
while((TxBuffer->state != Data_match) || (i != (SavedBuff_count))) // Data_match = 7;
{
uint16_t * EEPROM_buffer = (uint16_t*) tpool[i].data;
for(uint16_t j = 0; j < tpool[i].usedSize; j++) // tpool[i].usedSize = 67;
{
if(abs(TxBuffer[j] - EEPROM_buffer[j]) > 3)
{
i++;
continue;
}
}
i++;
TxBuffer->state = Data_match; // state value before Data_match equal 6!
}
tx_buffer->state = Buffer_empty;
if(i == (SavedBuff_count)) // SavedBuff_count = 3;
{
return -1;
}
return i;
}
Both your TxBuffer elements and your EEPROM_buffer elements are uint16_t. Subtracting 81 from 80 as uint16_t gives 0xffff, and abs cannot help you with that. Cast to int32_t before subtracting and you will be better off.
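The suggested cast can be sketched as a small helper. This is a minimal illustration rather than a drop-in replacement: within_tolerance and find_match are hypothetical names, and the buffers are passed as plain uint16_t arrays instead of t_IrBuff. Returning from the helper on the first mismatch also avoids relying on continue, which in the original code only advances the inner for loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every a[j] is within +/-tol of b[j], otherwise 0.
   The difference is computed in int32_t so it cannot wrap around. */
static int within_tolerance(const uint16_t *a, const uint16_t *b,
                            size_t n, uint16_t tol)
{
    for (size_t j = 0; j < n; j++) {
        int32_t diff = (int32_t)a[j] - (int32_t)b[j];
        if (diff < 0) diff = -diff;
        if (diff > (int32_t)tol) return 0;   /* first mismatch ends the check */
    }
    return 1;
}

/* Returns the index of the first stored buffer that matches tx, or -1. */
static int find_match(const uint16_t *tx, const uint16_t *const pool[],
                      size_t pool_count, size_t n, uint16_t tol)
{
    for (size_t i = 0; i < pool_count; i++)
        if (within_tolerance(tx, pool[i], n, tol)) return (int)i;
    return -1;
}
```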

Kiss FFT on a dsPIC33

I have been trying to get KissFFT to work on a dsPIC. However, after trying various approaches, the output is still not what it should be. I was hoping to get some help to see whether there are any configuration settings that I may be overlooking, or whether it's just something I haven't thought of.
I am using a dsPIC33EP256MC202 with the XC16 compiler within MPLABX.
Declarations and memory assignment.
int readings[3] = {0, 0, 0};
kiss_fft_scalar zero;
memset(&zero,0,sizeof(zero));
int size = 128 * 2;
float fin[256];
kiss_fft_cpx in[size];
kiss_fft_cpx out[size];
for (i = 0; i < size; i++) {
in[i].r = zero;
in[i].i = zero;
out[i].r = zero;
out[i].i = zero;
}
kiss_fft_cfg mycfg = kiss_fft_alloc(size*2 ,0 ,NULL,NULL);
Get readings from an accelerometer on the breadboard and populate the float array (using Pythagoras to consolidate the 3 axes into one signal). The input XYZ values are scaled down, as they come in anywhere between -2400 and 2400 on average.
while(1)
{
if(iii <= 1){
UART_Write_Text("Collecting...");
}
getOutput(readings);
X = (double)readings[0];
Y = (double)readings[1];
Z = (double)readings[2];
X = X / 50;
Y = Y / 50;
Z = Z / 50;
if(ii <= 256){
fin[ii] = sqrt(X*X + Y*Y + Z*Z);
ii++;
}
else{
i=0;
while(i<255){
fin[i] = fin[i+1];
i++;
}
fin[255] = sqrt(X*X + Y*Y + Z*Z);
}
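The fill-then-slide logic above can be written so that it never writes past the end of the array. Note that with a 256-element fin, the test ii <= 256 allows a write to fin[256], which is one element beyond the array. The following is only a sketch with hypothetical names (WINDOW, window, filled, push_sample), and it is not claimed to be the cause of the all-zero output.

```c
#include <string.h>

#define WINDOW 256

static float window[WINDOW];   /* plays the role of fin[] */
static int filled;             /* number of valid samples, 0..WINDOW */

/* Append a sample; once the window is full, drop the oldest one. */
static void push_sample(float v)
{
    if (filled < WINDOW) {
        window[filled++] = v;              /* last write goes to index 255 */
    } else {
        /* Slide everything down by one element, then store the newest. */
        memmove(window, window + 1, (WINDOW - 1) * sizeof window[0]);
        window[WINDOW - 1] = v;
    }
}
```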
Once the float array is full of values, populate the real component of the input complex array with the values in the float array. Then perform the Kiss FFT and populate a float array (arrayDFTOUT) with the absolute value of each real and imaginary value of the out array of Kiss FFT, the final loop makes any negative value positive.
if(iii == 255){
iii = 0;
UART_Write_Text("Processing...");
for (i = 0; i < size; i++) {
// samples are type of short
in[i].r = fin[i];
in[i].i = zero;
out[i].r = zero;
out[i].i = zero;
}
kiss_fft(mycfg, in, out);
for(i=0;i<128;i++){
arrayDFTOUT[i] = sqrt((out[i].r*out[i].r) + (out[i].i*out[i].i));
}
arrayDFTOUT[0] = 1;
for(i = 0; i<128; i++){
if(arrayDFTOUT[i] < 0){
arrayDFTOUT[i] = arrayDFTOUT[i] - (arrayDFTOUT[i]*2);
}
}
Finally display the output values through serial using the UART on the breadboard.
for(i = 0; i < 128; i++){
sprintf(temp, "%f,", arrayDFTOUT[i]);
UART_Write_Text(temp);
}
And here are the results: all zeros apart from the first value, which was set to 1 after KissFFT had been performed. Any ideas?

How Can I change objects name by a loop in C lang?

How can I change an object's name in a loop?
I want to create a light effect like Knight Rider's, with a PIC.
Instead of turning the pins on and off manually, I thought of using a loop to change the RB line number.
I want to change the last number of the port line name, like RB01, RB02 and so on.
My code looks like this:
for(int i = 0; i>6 ; i++ ){
PORTB = 0X00;
RB+i = 1;
}
Is there any method to do something like this? Thanks.
Assuming RB01, RB02, etc are just convenient #defines for accessing the bits in PORTB, you can write the loop with bitwise arithmetic and not use RB0* at all.
for ( int i = 0; i != 6; ++ i ) {
PORTB = 1 << i; /* one light at a time */
/* or */
PORTB = ( 1 << i + 1 ) - 1; /* light all in sequence */
}
It's not very elegant, but one way is to do it like this:
PORTB = 0x00;
for (i = 0; i < 6; ++i)
{
RB00 = (i == 0);
RB01 = (i == 1);
RB02 = (i == 2);
RB03 = (i == 3);
RB04 = (i == 4);
RB05 = (i == 5);
// note: you probably want to put a delay in here, e.g. 200 ms
}
If you want to keep the previous LEDs on each time you turn on a new one then you can do that like this:
PORTB = 0x00;
for (i = 0; i < 6; ++i)
{
RB00 = (i >= 0);
RB01 = (i >= 1);
RB02 = (i >= 2);
RB03 = (i >= 3);
RB04 = (i >= 4);
RB05 = (i >= 5);
// note: you probably want to put a delay in here, e.g. 200 ms
}
No, there is no way to "generate" symbol names that way. You can use bit masks for manipulating the latch register of the port in question.
I would probably use a table:
struct portbits
{
sometype bit; // Not quite sure what "RB0..RB5" actually translate to.
};
struct portbits bits[] =
{
RB00,
RB01,
RB02,
RB03,
RB04,
RB05,
RB06,
RB07,
};
for(i = 0; i < 7; i++)
{
bits[i] = 1;
}
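The bit-mask approach mentioned in the answers can be extended to the back-and-forth sweep the question asks for. This is a minimal sketch assuming six LEDs on bits 0 to 5 of PORTB; knight_rider_mask and LED_COUNT are hypothetical names, and the function only computes the pattern so it can be checked without hardware.

```c
#include <stdint.h>

#define LED_COUNT 6   /* RB0..RB5, as in the question */

/* Returns the PORTB pattern for a given step of a back-and-forth sweep:
   positions 0,1,2,3,4,5,4,3,2,1 and then the sequence repeats. */
static uint8_t knight_rider_mask(uint8_t step)
{
    uint8_t period = 2 * (LED_COUNT - 1);       /* 10 steps per full sweep */
    uint8_t pos = step % period;
    if (pos >= LED_COUNT) pos = period - pos;   /* mirror the return trip */
    return (uint8_t)(1u << pos);
}
```

On the PIC, the main loop would assign the result to PORTB, increment the step, and wait for a delay between steps (for example the 200 ms suggested above).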
