This is actually part of a project I'm working on using an AVR. I'm interfacing via TWI with a DS1307 real-time clock IC. It reports information back as a series of 8 chars. It returns in the format:
// Second : ds1307[0]
// Minute : ds1307[1]
// Hour : ds1307[2]
// Day : ds1307[3]
// Date : ds1307[4]
// Month : ds1307[5]
// Year : ds1307[6]
What I would like to do is take each part of the time and read it bit by bit, but I can't think of a way to do this. Basically, I want to light up an LED if the bit is a 1, but not if it's a 0.
I'd imagine that there is a rather simple way to do it by bitshifting, but I can't put my finger on the logic to do it.
Checking whether bit N is set can be done with a simple expression like:
(bitmap & ((uint64_t)1 << N)) != 0
where bitmap is the integer value (e.g. 64 bit in your case) containing the bits. The cast keeps the shift well defined when N is larger than the width of int.
Finding the seconds:
(bitmap & 0xFF)
Finding the minute:
(bitmap & 0xFF00) >> 8
Finding the hour:
(bitmap & 0xFF0000) >> 16
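The expressions above assume the register bytes have been packed into a single 64-bit integer, which the question's array does not do by itself. A minimal sketch of that packing, where the helper names pack_ds1307, bit_is_set and field are assumptions made for illustration and not part of the question:

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the register bytes into one 64-bit value, byte 0 (seconds) lowest. */
static uint64_t pack_ds1307(const unsigned char *regs, size_t count)
{
    uint64_t bitmap = 0;
    for (size_t i = 0; i < count && i < 8; i++)
        bitmap |= (uint64_t)regs[i] << (8 * i);
    return bitmap;
}

/* Test bit n (0 = least significant). Shifting the 64-bit value right
   avoids shifting a plain int constant by more than its width. */
static int bit_is_set(uint64_t bitmap, unsigned n)
{
    return (int)((bitmap >> n) & 1u);
}

/* Extract byte number `index` (0 = seconds, 1 = minutes, 2 = hours, ...). */
static unsigned char field(uint64_t bitmap, unsigned index)
{
    return (unsigned char)((bitmap >> (8 * index)) & 0xFFu);
}
```

With this layout, bit_is_set(bitmap, 8) reports the lowest bit of the minutes byte, and field(bitmap, 2) returns the hours byte.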
If I'm interpreting you correctly, the following iterates over all the bits from lowest to highest. That is, the 8 bits of Seconds, followed by the 8 bits of Minutes, etc.
unsigned char i, j;
for (i = 0; i < sizeof(ds1307); i++)
{
unsigned char value = ds1307[i]; // seconds, minutes, hours etc
for (j = 0; j < 8; j++)
{
if (value & 0x01)
{
// bit is 1
}
else
{
// bit is 0
}
value >>= 1;
}
}
Yes - you can use >> to shift the bits right by one, and & 1 to obtain the value of the least significant bit:
unsigned char ds1307[7];
int i, j;
for (i = 0; i < 7; i++)
for (j = 0; j < 8; j++)
printf("byte %d, bit %d = %u\n", i, j, (ds1307[i] >> j) & 1U);
(This will examine the bits from least to most significant. By the way, your example array only has 7 bytes, not 8...)
Essentially, if the 6 LEDs that show the seconds in binary format are connected to PORTA2-PORTA7, you can assign PORTA = ds1307[0] to have the seconds automatically lit up correctly.
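A direct assignment puts bit k of the byte on pin k of the port. If the LEDs do not start at pin 0, the value has to be shifted into place first. A minimal sketch, where port_pattern is a hypothetical helper and the pin numbers are assumptions about the wiring:

```c
#include <stdint.h>

/* Place the lowest `width` bits of `value` on consecutive port pins
   starting at `first_pin`. Returns the byte to write to the port register. */
static uint8_t port_pattern(uint8_t value, unsigned first_pin, unsigned width)
{
    uint8_t field_mask = (uint8_t)((1u << width) - 1u);  /* `width` low bits */
    return (uint8_t)((value & field_mask) << first_pin);
}
```

On an AVR with six LEDs on pins 2 to 7 this would be used as PORTA = port_pattern(ds1307[0], 2, 6); with eight LEDs on pins 0 to 7 it reduces to the plain assignment.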
I am working on a project in which I want to convert a given video input stream into block sections (so it can be used by a hardware codec). This project runs on an STM32 microcontroller with a 200 MHz clock.
The received input is a YCbCr 4:2:2 progressive stream, which basically means the input stream looks like this for every row:
Size:       32 bit word    32 bit word    32 bit word   ...
Component:  Cr Y1 Cb Y0    Cr Y1 Cb Y0    Cr Y1 Cb Y0   ...
Bits:       8  8  8  8     8  8  8  8     8  8  8  8    ...
This stream needs to be converted into a block format used by a hardware codec. The codec accepts a byte array in a specific order. Currently I am doing this using a nested loop for every 1/8 of an image frame, using lookup tables and writing into an empty array:
Defines:
#define ROWS_PER_MCU 8
#define WORDS_PER_MCU 8
#define HORIZONTAL_MCU_PER_INPUTBUFFER 40
#define VERTICAL_MCU_PER_INPUTBUFFER 8
Global variables are declared like this:
typedef struct jpegInputbufferLUT
{
uint8_t JPEG_Y_MCU_LUT[256];
uint8_t JPEG_Cb_MCU_422_LUT[256];
uint8_t JPEG_Cr_MCU_422_LUT[256];
}jpegIndexLUT;
jpegIndexLUT jpegInputLUT;
uint8_t jpegInBuffer[81920];
uint32_t rawBuffer[20480];
Look up tables are created like this:
void JPEG_Init_MCU_LUT(void)
{
uint32_t offset;
/*Y LUT */
for(uint32_t i = 0; i < 16; i++)
{
for(uint32_t j = 0; j < 16; j++)
{
offset = j + (i*8);
if((j>=8) && (i>=8)) offset+= 120;
else if((j>=8) && (i<8)) offset+= 56;
else if((j<8) && (i>=8)) offset+= 64;
jpegInputLUT.JPEG_Y_MCU_LUT[i*16 + j] = offset;
}
}
/*Cb Cr LUT*/
for(uint32_t i = 0; i < 16; i++)
{
for(uint32_t j = 0; j < 16; j++)
{
offset = i*16 + j;
jpegInputLUT.JPEG_Cb_MCU_422_LUT[offset] = (j/2) + (i*8) + 128;
jpegInputLUT.JPEG_Cr_MCU_422_LUT[offset] = (j/2) + (i*8) + 192;
}
}
}
Conversion code:
/* Initialize variables for array conversion */
uint32_t currentMCU = 0;
uint32_t lutOffset = 0;
uint32_t inputOffset = 0;
uint32_t verticalOffset = 0;
/* Convert X rows into MCU blocks for JPEG encoding */
for(uint8_t k = 0; k < VERTICAL_MCU_PER_INPUTBUFFER; k++)
{
for(uint8_t n = 0; n < HORIZONTAL_MCU_PER_INPUTBUFFER; n++)
{
inputOffset = verticalOffset + (n * 8);
lutOffset = 0;
for(uint8_t i = 0; i < ROWS_PER_MCU; i++)
{
for(uint8_t j = 0; j < WORDS_PER_MCU; j++)
{
/* Mask 32 bit according to DCMI input format */
uint32_t rawBufferAddress = inputOffset+j; // Calculate rawBuffer address here so it only has to be calculated once
jpegInBuffer[jpegInputLUT.JPEG_Y_MCU_LUT[lutOffset] + currentMCU] = (rawBuffer[rawBufferAddress] & 0x7F);
jpegInBuffer[jpegInputLUT.JPEG_Cb_MCU_422_LUT[lutOffset] + currentMCU] = ((rawBuffer[rawBufferAddress] >> 7) & 0x7F);
jpegInBuffer[jpegInputLUT.JPEG_Cr_MCU_422_LUT[lutOffset] + currentMCU] = ((rawBuffer[rawBufferAddress] >> 23) & 0x7F);
jpegInBuffer[jpegInputLUT.JPEG_Y_MCU_LUT[lutOffset+1] + currentMCU] = ((rawBuffer[rawBufferAddress] >> 16) & 0x7F);
lutOffset+=2;
}
inputOffset += 320;
}
currentMCU += 256;
}
verticalOffset += 2240;
}
This conversion is currently taking me about 8 ms, and it needs to be done 8 times. This takes up almost all of my available execution time, since I am trying to get 15 fps out of my system.
Is it in any way possible to speed this up? I was thinking maybe sorting the input array instead of just writing into a new buffer, but would swapping 2 elements in an array have a faster execution time than copying values into another array?
Would love to hear your ideas/thoughts on this,
Thanks in advance!
Your program seems to run slower than expected from an STM32. You may need to look into what assembly is produced, compiler optimization settings, if MCU frequency is correct, if memory is too slow, etc. We don't have enough information to give a definite answer why. Your code seems to spend 8 ms * 200M / (8*8*8*40) = 78 cycles for each inner loop iteration. For reference, an stm32f723 only needs about 15 cycles, and an stm32f103 about 28 cycles (the code was adjusted to access smaller arrays in the latter case).
The LUT table is not needed as its content is very regular. Reading LUT values adds more memory reads, which may be a significant contribution. If I got your LUT generation code correctly, it produces the following numbers in the inner loop:
Y1 Cb Cr Y2
0 128 192 1
2 129 193 3
4 130 194 5
6 131 195 7
64 132 196 65
66 133 197 67
68 134 198 69
70 135 199 71
8 136 200 9
etc
The second and third columns are just consecutive numbers. The fourth column equals the first one plus one. And the first number needs a bit flip. You can try the following code (please check that it is correct):
uint32_t lutOffset = 0;
for(uint8_t i = 0; i < ROWS_PER_MCU; i++)
{
for(uint8_t j = 0; j < WORDS_PER_MCU; j++)
{
uint32_t rawBufferAddress = (inputOffset+j) /* % 2048 */;
#if 0
unsigned y_lut1 = jpegInputLUT.JPEG_Y_MCU_LUT[lutOffset];
unsigned Cb_lut = jpegInputLUT.JPEG_Cb_MCU_422_LUT[lutOffset];
unsigned Cr_lut = jpegInputLUT.JPEG_Cr_MCU_422_LUT[lutOffset];
unsigned y_lut2 = jpegInputLUT.JPEG_Y_MCU_LUT[lutOffset+1];
#else
unsigned y_lut1 = lutOffset | (j / 4) << 6 | (j % 4) << 1;
unsigned Cb_lut = 128 + lutOffset + j;
unsigned Cr_lut = 192 + lutOffset + j;
unsigned y_lut2 = y_lut1 + 1;
#endif
jpegInBuffer[y_lut1 + currentMCU] = (rawBuffer[rawBufferAddress] & 0x7F);
jpegInBuffer[Cb_lut + currentMCU] = ((rawBuffer[rawBufferAddress] >> 7) & 0x7F);
jpegInBuffer[Cr_lut + currentMCU] = ((rawBuffer[rawBufferAddress] >> 23) & 0x7F);
jpegInBuffer[y_lut2 + currentMCU] = ((rawBuffer[rawBufferAddress] >> 16) & 0x7F);
}
lutOffset += 8;
inputOffset += 320;
}
This version takes about 20 cycles per iteration on my stm32f103, which is less than 6 ms even at its 72 MHz.
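Since the answer asks for the index arithmetic to be checked, here is a host-side sketch that rebuilds the question's lookup tables and compares them with the closed-form expressions for one MCU. The function name count_index_mismatches and the local table names are assumptions made for this check, and the 8 rows by 8 words are hard-coded from ROWS_PER_MCU and WORDS_PER_MCU:

```c
#include <stdint.h>

/* Rebuild the LUTs exactly as in the question and count how often the
   closed-form indices disagree with them for one MCU. */
static int count_index_mismatches(void)
{
    static uint8_t y_lut[256], cb_lut[256], cr_lut[256];

    for (uint32_t i = 0; i < 16; i++) {
        for (uint32_t j = 0; j < 16; j++) {
            uint32_t offset = j + i * 8;
            if (j >= 8 && i >= 8)      offset += 120;
            else if (j >= 8 && i < 8)  offset += 56;
            else if (j < 8 && i >= 8)  offset += 64;
            y_lut[i * 16 + j]  = (uint8_t)offset;
            cb_lut[i * 16 + j] = (uint8_t)(j / 2 + i * 8 + 128);
            cr_lut[i * 16 + j] = (uint8_t)(j / 2 + i * 8 + 192);
        }
    }

    int mismatches = 0;
    uint32_t lut_index = 0;  /* question's lutOffset: advances by 2 per word */
    uint32_t row_base = 0;   /* answer's lutOffset: advances by 8 per row */
    for (uint32_t i = 0; i < 8; i++) {
        for (uint32_t j = 0; j < 8; j++) {
            uint32_t y1 = row_base | (j / 4) << 6 | (j % 4) << 1;
            uint32_t cb = 128 + row_base + j;
            uint32_t cr = 192 + row_base + j;
            uint32_t y2 = y1 + 1;
            mismatches += (y1 != y_lut[lut_index]);
            mismatches += (cb != cb_lut[lut_index]);
            mismatches += (cr != cr_lut[lut_index]);
            mismatches += (y2 != y_lut[lut_index + 1]);
            lut_index += 2;
        }
        row_base += 8;
    }
    return mismatches;
}
```

A return value of 0 means the computed indices and the table entries agree for every word of the block.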
UPD. Another option is using one small lookup table instead of bit computations:
static const unsigned x[8] = { 0, 2, 4, 6, 64, 66, 68, 70 };
// unsigned y_lut1 = lutOffset | (j / 4) << 6 | (j % 4) << 1;
unsigned y_lut1 = lutOffset + x[j];
This improves the inner loop timing to 18 (f103) / 7.5 (f723) cycles. For some reason, optimizing this expression for F723 does not work well. I would expect these options to give identical result since the inner loop is unrolled, but who knows.
As an additional optimization, probably not necessary, the output values can be combined into 32-bit words and written one word at a time. This seems possible because LUT values come in blocks of four consecutive ones. For this, the inner loop can be converted to a nested loop of 2 by 4 iterations. Each 4 iterations of the innermost loop will produce one uint32_t for Cb, one uint32_t for Cr and two uint32_t for Y. But it is not worth doing.
I measure run time with SysTick:
SysTick->LOAD = SysTick_LOAD_RELOAD_Msk;
SysTick->VAL = 0;
SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk;
volatile unsigned t0 = SysTick->VAL;
f();
volatile unsigned t1 = t0 - SysTick->VAL;
I sometimes used output pins too, when connecting a debugger was not practical. Strictly speaking, neither method is guaranteed to work because the compiler may move code across measurement points, but it has worked as intended for me (with gcc). Assembly inspection is needed to make sure that nothing fishy is going on.
There are any number of micro-optimisations that could be performed here that could provide an improvement. Some may exhibit an improvement in a debug build without compiler optimisation, only to have no advantage with optimisation. It is even possible that some "clever" but non-idiomatic trick that is faster in debug could cause the optimiser to generate worse code than it might have had you favoured clarity over performance.
All the obvious micro-optimisations such as loop unrolling the compiler optimiser will likely be able to perform for you without complicating the code or risking introducing errors.
One rather obvious improvement (regardless of whether or not it is faster), would be to change:
for( uint8_t j = 0; j < WORDS_PER_MCU; j++ )
{
/* Mask 32 bit according to DCMI input format */
uint32_t rawBufferAddress = inputOffset+j; // Calculate rawBuffer address here so it only has to be calculated once
...
to:
uint32_t rawBufferAddress = inputOffset ;
for( uint8_t j = 0; j < WORDS_PER_MCU; rawBufferAddress++, j++)
{
/* Mask 32 bit according to DCMI input format */
...
Your "only has to be calculated once" is actually WORDS_PER_MCU calculations, and an increment is likely to be faster than an addition and assignment. At worst it will be no different.
I would similarly suggest moving all the other "end of loop" increments, such as lutOffset+=2, into the respective for third expression also. Not for performance, but for clarity.
I'm wondering if someone knows an effective approach to count the bits set at a specified position across an array?
Assuming that the OP wants to count the set bits:
size_t countbits(uint8_t *array, int pos, size_t size)
{
uint8_t mask = 1 << pos;
uint32_t result = 0;
while(size--)
{
result += *array++ & mask;
}
return result >> pos;
}
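The function above accumulates the masked values (up to 128 per element for pos 7) in a 32-bit counter and shifts only at the end, so the intermediate sum could in principle wrap for very large arrays (2^25 or more elements with bit 7 set). A minimal alternative sketch that shifts first, so each element adds exactly 0 or 1; the name count_bit_at is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many bytes in `array` have bit `pos` (0 = least significant) set.
   Each element contributes exactly 0 or 1 to the running total. */
static size_t count_bit_at(const uint8_t *array, unsigned pos, size_t size)
{
    size_t total = 0;
    for (size_t i = 0; i < size; i++)
        total += (array[i] >> pos) & 1u;
    return total;
}
```

Whether this is faster or slower than shifting once at the end depends on the target and compiler, so it is worth measuring if the loop is hot.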
You can just loop the array values and test for the bits with a bitwise and operator, like so:
int arr[] = {1,2,3,4,5};
// 1 - 001
// 2 - 010
// 3 - 011
// 4 - 100
// 5 - 101
int i, bitcount = 0;
for (i = 0; i < 5; ++i){
if (arr[i] & (1 << 2)){ //testing and counting the 3rd bit
bitcount++;
}
}
printf("%d", bitcount); //2
Note that I opted for 1 << 2, which tests the 3rd bit from the right (the third least significant bit), just to make it easier to show. bitcount now holds 2, which is the number of elements whose 3rd bit is set to 1.
Take a look at the result in Ideone
In your case you would need to check for the 5th bit, which can be represented as:
1 << 4
0x10 (binary 10000)
16
And the 8th bit:
1 << 7
0x80 (binary 10000000)
128
So adjusting this to your bits would give you:
int i, bitcount8 = 0, bitcount5 = 0;
for (i = 0; i < your_array_size_here; ++i){
if (arr[i] & 0x80){
bitcount8++;
}
if (arr[i] & 0x10){
bitcount5++;
}
}
If you need to count many of them, then this solution isn't great and you'd be better off creating an array of bit counts, and calculating them with another for loop:
int i, j, bitcounts[8] = {0};
for (i = 0; i < your_array_size_here; ++i){
for (j = 0; j < 8; ++j){
//j will be catching each bit with the increasing shift lefts
if (arr[i] & (1 << j)){
bitcounts[j]++;
}
}
}
And in this case you would access the bit counts by their index:
printf("%d", bitcounts[2]); //2
Check this solution in Ideone as well
Let the bit position difference (e.g. 7 - 4 in this case) be diff.
If 2^diff > n, then code can add both bits at the same time.
void count(const uint8_t *Array, size_t n, int *bit7sum, int *bit4sum) {
unsigned sum = 0;
unsigned mask = 0x90;
while (n > 0) {
n--;
sum += Array[n] & mask;
}
*bit7sum = sum >> 7;
*bit4sum = (sum >> 4) & 0x07;
}
If the processor has a fast multiply and n is still not too large (n < pow(2,14) in this case, or n < pow(2,8) in the general case), the upper bit can first be moved further up to make room:
void count2(const uint8_t *Array, size_t n, int *bit7sum, int *bit4sum) {
// assume 32 bit or wider unsigned
unsigned sum = 0;
unsigned mask1 = 0x90;
unsigned m = 1 + (1u << 11); // to move bit 7 to the bit 18 place
unsigned mask2 = (1u << 18) | (1u << 4);
while (n > 0) {
n--;
sum += ((Array[n] & mask1)*m) & mask2;
}
*bit7sum = sum >> 18;
*bit4sum = (((1u << 18) - 1) & sum) >> 4;
}
Algorithm: the code uses a mask, a multiply and a second mask to separate the 2 bits. The lower bit remains in its low position while the upper bit is shifted to the upper bits. Then a parallel add occurs.
The loop avoids any branching aside from the loop itself. This can make for fast code. YMMV.
With even larger n, break it down into multiple calls to count2()
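The chunking mentioned in the last sentence could look like the sketch below. It repeats the idea of count2() above in a helper named count2_chunk so that the example compiles on its own; count_large, COUNT2_MAX_CHUNK and the use of size_t totals are assumptions, not part of the original answer:

```c
#include <stddef.h>
#include <stdint.h>

/* Largest chunk for which the two 14-bit partial sums cannot overflow. */
#define COUNT2_MAX_CHUNK ((size_t)(1u << 14) - 1)

/* Bit 4 is summed in place (bits 4..17); a copy of bit 7 is moved to
   bit 18 by the multiply and summed there (bits 18..31). */
static void count2_chunk(const uint8_t *array, size_t n,
                         unsigned *bit7sum, unsigned *bit4sum)
{
    uint32_t sum = 0;
    const uint32_t mask1 = 0x90;
    const uint32_t m = 1u + (1u << 11);
    const uint32_t mask2 = (1u << 18) | (1u << 4);
    while (n > 0) {
        n--;
        sum += ((array[n] & mask1) * m) & mask2;
    }
    *bit7sum = sum >> 18;
    *bit4sum = (sum & ((1u << 18) - 1)) >> 4;
}

/* Hypothetical wrapper: split a large array into chunks that are safe
   for count2_chunk() and add up the partial results. */
static void count_large(const uint8_t *array, size_t n,
                        size_t *bit7sum, size_t *bit4sum)
{
    *bit7sum = 0;
    *bit4sum = 0;
    while (n > 0) {
        size_t chunk = n < COUNT2_MAX_CHUNK ? n : COUNT2_MAX_CHUNK;
        unsigned s7, s4;
        count2_chunk(array, chunk, &s7, &s4);
        *bit7sum += s7;
        *bit4sum += s4;
        array += chunk;
        n -= chunk;
    }
}
```

The chunk limit of 2^14 - 1 follows from the 14 bits available to each partial sum in a 32-bit accumulator.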
I am writing a program in C where I am comparing two bytes of data, and then seeing if the bytes are different, and if so, at which bits.
This is what I have so far:
int var1 = 81; //Binary: 0101 0001
int var2 = 193; //Binary: 1100 0001
int diff = var1 ^ var2; //diff = 1001 0000 / 144
Basically I know how to use the XOR bitwise operator to see which bits are different between the two variables, but from here I don't know how to use diff to figure out which bits are the differences. For example, in my above code I'd want to use diff to output "Bit 5 and Bit 8 are different".
You can use a for loop and bitwise AND diff with 1 shifted left by the appropriate amount to get the positions of the set bits:
for(size_t i = 0; i < sizeof(int)*8; i++){
if( diff & (1U << i))
printf("%zu is different\n",i+1);
}
Far easier to start with unsigned types when doing bit manipulations.
As @coderredoc inquired about solutions across various platforms, even uncommon ones:
Using int:
When int diff is negative, conversion to an unsigned (via masking with an unsigned) may change its bit pattern.
An int may have more than 8 bits per "byte". Diminishes correctness of sizeof(int)*8.
Various integer types may have padding (rare). Diminishes correctness of sizeof(int)*CHAR_BIT.
// OP wants to report first bit index as 1. 0 is more common.
#define BIT_REPORT_OFFSET 0
int bit_position = 0;
int mask;
do {
mask = 1 << bit_position;
if (diff & mask) {
printf("Bit %d\n", bit_position + BIT_REPORT_OFFSET);
}
bit_position++;
} while (mask < INT_MAX/2);
if (diff < 0) {
printf("Bit %d\n", bit_position + BIT_REPORT_OFFSET);
}
For maximum portability, avoid changing types, changing the value of diff and use constants from <limits.h> rather than compute them.
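Starting from unsigned types, as suggested at the top of this answer, the byte comparison from the question can be sketched as follows. The function name differing_bits and the output array are assumptions made for illustration; positions are reported 1-based, as the question asks:

```c
#include <limits.h>
#include <stddef.h>

/* Store the 1-based positions of the bits that differ between a and b
   in `positions` (room for CHAR_BIT entries) and return how many differ. */
static size_t differing_bits(unsigned char a, unsigned char b,
                             unsigned positions[CHAR_BIT])
{
    unsigned diff = (unsigned)(a ^ b);  /* set bits mark the differences */
    size_t count = 0;
    for (unsigned bit = 0; bit < CHAR_BIT; bit++) {
        if (diff & (1u << bit))
            positions[count++] = bit + 1;  /* report bit 0 as "Bit 1" */
    }
    return count;
}
```

For the question's values 81 and 193 this yields two entries, 5 and 8.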
use unsigned int instead of int; then you can use
for (unsigned int pos = 0; diff; ++pos) {
if (diff & 1)
printf("difference in pos %u\n", pos);
diff >>= 1;
}
or
while (diff) {
int pos = ffs(diff) - 1; /* ffs() from <strings.h> returns a 1-based index */
printf("difference in pos %d\n", pos);
diff &= ~(1u << pos);
}
To get the positions of the differing bits, say for a 4-byte integer (CHAR_BIT comes from <limits.h>):
for(int bit_index = (int)(sizeof(diff) * CHAR_BIT) - 1; bit_index >= 0; bit_index--) {
if((diff >> bit_index & 1) == 1 ){ /* if particular bit is 1, that bit_index value you can use */
printf("[%d] bit is different or 1 \n",bit_index);
}
}
As part of a larger problem, I have to take some binary value: 00000000 11011110 (8)
Then, I have to:
Derive the bit count in this function - so I've done that by finding the place of the most sig fig.
Then store the first 6 numbers of this value into the value 128, such that it equals: 10011110
Then store the last 5 numbers of this value into the value 192, such that it equals: 11000011 10011110
The two bytes should be stored in some array, buffer[]
I have written this function however, position does not appear to initialise properly in gdb and the values are not outputting correctly. This is my attempt:
void create_value(unsigned short init_val, unsigned char buffer[])
{
// get the count
int position = 0;
while (init_val >>= 1)
position++;
// get total
int count = position++;
int start = 128;
for (int i = 0; i < 7; i++)
if (((1 << i) & init_val) != 0) start = start | 1 << i;
buffer[0] = start;
start = 192;
for (int i = 7; i < 11; i++) {
if (((1 << i) & init_val) !=0) start = start | 1 << i;
}
buf[1] = start;
}
After
while (init_val >>= 1)
position++;
init_val will be 0. When you later use
if (((1 << i) & init_val) != 0) start = start | 1 << i;
you will never change start.
So, after reading through what you're trying to do (which is pretty confusingly described), why don't you:
void create_value(unsigned short init_value, unsigned char buffer[])
{
buffer[0] = (init_value & 63) | 128;
buffer[1] = ((init_value >> 6) & 31) | 192;
return;
}
What this does: init_value & 63 masks off all but the lowest 6 bits in init_value, as you wanted. The | 128 then sets the most significant bit of the byte (IFF CHAR_BIT == 8, mind you).
(init_value >> 6) shifts init_value down by 6 bits, so now the original bits 6-10 are bits 0-4. & 31 masks off all but the lowest 5 bits in this value, | 192 sets the two most significant bits.
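As a worked check against the value from the question (00000000 11011110, i.e. 0x00DE), the sketch below repeats the function above so that it compiles on its own and adds a hypothetical inverse, decode_value, which is not part of the question and is only there to confirm the round trip:

```c
/* Copy of the function above so the sketch compiles on its own. */
static void create_value(unsigned short init_value, unsigned char buffer[])
{
    buffer[0] = (unsigned char)((init_value & 63) | 128);         /* 10xxxxxx */
    buffer[1] = (unsigned char)(((init_value >> 6) & 31) | 192);  /* 110xxxxx */
}

/* Hypothetical inverse: strip the marker bits and reassemble the
   11-bit value from the two bytes. */
static unsigned short decode_value(const unsigned char buffer[])
{
    return (unsigned short)((buffer[0] & 63) | ((buffer[1] & 31) << 6));
}
```

For 0x00DE this gives buffer[0] = 0x9E (10011110) and buffer[1] = 0xC3 (11000011), which matches the two bytes listed in the question.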
I've been researching for an Arduino project I will be starting soon and I came across some relevant code (Un-commented...) but I can't seem to decipher how the most important part works! The way the code should work is there are 4 bytes in question (yaw, pitch, throttle, and trim (all dec values)) and each bit in each byte corresponds to a sequence of LED flashes encoded in the sendZero() and sendOne() commands. Here is the code in question:
void sendCommand() {
byte b;
sendHeader();
for (int i=0; i<=7; i++) {
b = (yaw & (1 << i)) >> i;
if (b > 0) sendOne(); else sendZero();
}
for (int i=0; i<=7; i++) {
b = (pitch & (1 << i)) >> i;
if (b > 0) sendOne(); else sendZero();
}
for (int i=0; i<=7; i++) {
b = (throttle & (1 << i)) >> i;
if (b > 0) sendOne(); else sendZero();
}
for (int i=0; i<=7; i++) {
b = (trim & (1 << i)) >> i;
if (b > 0) sendOne(); else sendZero();
}
}
The part that gets me is the inside of each for-loop as I have no clue what's going on with those bitwise operations. My guess is they are somehow converting the dec value into binary and then iterating through it, registering zeros or ones accordingly? It's this:
b = (x & (1 << i)) >> i;
where I can't seem to see what's going on or why. Any help would be appreciated.
You are checking each bit of x (0 to 7) in the code by doing
b = (x & (1 << i)) >> i;
Say, for example, you want to display 0 on the LED. The seven-segment code for displaying 0 is 0x3F (the variable x has the value 0x3F).
You are checking each bit of 0x3F, and the variable b holds whether bit i is 0 or 1.
0x3F is
0 0 1 1 1 1 1 1
   3       F
For example:
#include <stdio.h>
int main(void)
{
int i, x, b;
x = 0x3F;
for(i = 0; i< 7; i++)
{
b = (x & (1 << i)) >> i;
printf("%d ", b); // I am printing the value of b here
}
getchar();
return 0;
}
Will print
1 1 1 1 1 1 0
when you say
b = (yaw & (1 << i)) >> i;
if (b > 0) sendOne(); else sendZero();
The variable yaw holds the seven-segment code for the number you want to display on the LED, and you are checking each bit of yaw. If the bit is 1 you call sendOne(), which might send a high voltage to the LED and light up the corresponding segment of the 7-segment display. If the bit is 0, you call sendZero() to send a low voltage.
Here you can notice that you are trying to light up the individual LEDs of the 7-segment display. The above program runs so fast that you will see all the LEDs (whose bits are 1 as per the seven-segment code) lit up.
Those are left shifts << and right shifts >> respectively.
A left shift is equivalent to multiplication by some power of two.
a << 1; // a * 2.
A right shift is division,
a >> 1; // a / 2.
The constant one indicates 2^1 (or the value 2).
To expand: each x & (1 << i) is creating a mask of the desired bit and testing it against x
x & 00000001
x & 00000010
etc..
x & 01000000
x & 10000000
and then shifting the bit of interest down to the LSB so that it can be tested with b = (result of mask and input) >> i; so as to transmit either a one or a zero. The for loop walks the desired bit across the byte.
Note: this latter part is not really needed, as the masked value will be greater than zero regardless of whether it is shifted down to the 1's bit.
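Following the note above, the four identical loops can be folded into one helper that tests the masked value directly, without shifting it back down. A minimal sketch: sendByte is a hypothetical name, and sendOne()/sendZero() are replaced here by stand-ins that write to a log buffer (an assumption made so the bit order can be inspected without hardware):

```c
#include <stddef.h>

/* Stand-ins for the sketch's sendOne()/sendZero(): instead of flashing an
   LED they append '1' or '0' to a log. The log is for illustration only. */
static char bit_log[64];
static size_t bit_log_len = 0;

static void log_bit(char c)
{
    if (bit_log_len + 1 < sizeof bit_log) {
        bit_log[bit_log_len++] = c;
        bit_log[bit_log_len] = '\0';
    }
}

static void sendOne(void)  { log_bit('1'); }
static void sendZero(void) { log_bit('0'); }

/* Send the 8 bits of `value`, least significant bit first. */
static void sendByte(unsigned char value)
{
    for (int i = 0; i <= 7; i++) {
        if (value & (1 << i))  /* non-zero exactly when bit i is set */
            sendOne();
        else
            sendZero();
    }
}
```

With such a helper, the body of sendCommand() would reduce to sendHeader() followed by sendByte(yaw), sendByte(pitch), sendByte(throttle) and sendByte(trim).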
Since you are looking at helicopter code, I would like to point out mine, as I have decoded several common 3.5ch.
Library
and Demo INO
It is a bit cleaner in that bit structures assemble the message and a union shifts it all out, regardless of the different formats.