Detect ASCII char in one byte in 32 or 64 bits - C

When I was looking to write a faster strlen in C (faster than one that checks byte by byte), I found this macro:
#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
This macro reads 4 bytes at a time and evaluates to a nonzero value when at least one of them is a NUL byte.
Otherwise it evaluates to 0.
I wonder whether the same technique can be used to find any character of the ASCII table (I'd rather not use a byte-by-byte loop).
I tried a lot of combinations and the best I could do is this:
// in this example I wanted to find a '#'
int32_t detectsharp(int32_t c) {
    c = ~(c - 0x24242424) & ~c;
    return ((c - 0x01010101) & ~c & 0x80808080);
}
But it doesn't work with 0x22222222 ("""") or things like 0x24212121 ($!!!).

You can detect any char if you first XOR it with your int:
#define DETECTCHAR(x,c) (DETECTNULL((x) ^ ((c)*0x01010101l)))
The multiplication replicates the char into all 4 bytes of the int, and the XOR zeroes any byte where that char is present, reducing the problem to null detection.
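A minimal check of the macro (this test program is illustrative, not from the original answer; it exercises the 0x22222222 case that broke the earlier attempt):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
#define DETECTCHAR(x,c) (DETECTNULL((x) ^ ((c)*0x01010101l)))

int main(void) {
    uint32_t w;

    memcpy(&w, "\"\"\"\"", 4);                 /* the 0x22222222 case */
    printf("%d\n", DETECTCHAR(w, '#') != 0);   /* 0: no '#' in the word */

    memcpy(&w, "ab#d", 4);
    printf("%d\n", DETECTCHAR(w, '#') != 0);   /* 1: '#' detected */
    return 0;
}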

Related

C - Writing integers to binary file using only 3 bytes

I have a program in C that writes a frequency table to a binary file.
The frequency table is an array of structs, each containing an int and a char.
So I have to write an unsigned int counter and an unsigned char character to the file (multiple times).
I know that an integer normally uses 4 bytes, but I also know that the counter can never be bigger than 2^24 - 1.
So I could pack the counter and the character into 4 bytes: 3 bytes for the counter and 1 byte for the character. This would also be easy to read back.
Is there an easy way to do this in C without using special libraries?
Yes, there is a very easy way of doing it in C. You can combine a char, which is one byte on all platforms, with an int of up to 24 bits by shifting the char 24 bits to the left (the cast to uint32_t avoids shifting into the sign bit of a promoted int):
uint32_t toWrite = ((uint32_t)myChar << 24) | myCount;
When you read the data back, perform the opposite operation:
uint32_t fromFile;   /* filled in from the file */
uint32_t myCount = fromFile & 0xFFFFFF;
char myChar = (fromFile >> 24) & 0xFF;
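A sketch of the full round trip (the function names and file handling are illustrative; this assumes count < 2^24 and writes the combined word in host byte order, so it should be read back on a machine with the same endianness):

#include <stdint.h>
#include <stdio.h>

/* Pack one (character, counter) pair into 4 bytes and append it to f. */
static int write_entry(FILE *f, unsigned char c, uint32_t count) {
    uint32_t word = ((uint32_t)c << 24) | (count & 0xFFFFFF);
    return fwrite(&word, sizeof word, 1, f) == 1;
}

/* Read one 4-byte entry back and unpack it. */
static int read_entry(FILE *f, unsigned char *c, uint32_t *count) {
    uint32_t word;
    if (fread(&word, sizeof word, 1, f) != 1)
        return 0;
    *count = word & 0xFFFFFF;
    *c = (word >> 24) & 0xFF;
    return 1;
}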

Read a single bit from a buffer of char

I would like to implement a function like this:
int read_single_bit(unsigned char* buffer, unsigned int index)
where index is the offset of the bit that I would want to read.
How do I use bit shifting or masking to achieve this?
You might want to split this into three separate tasks:
Determining which char contains the bit that you're looking for.
Determining the bit offset into that char that you need to read.
Actually selecting that bit out of that char.
I'll leave parts (1) and (2) as exercises, since they're not too bad. For part (3), one trick you might find useful is a bitwise AND between the byte in question and a byte with a single 1 bit at the index you want. For example, suppose you want the fourth bit of a byte. You could do something like this:
Byte:  11011100
Mask:  00001000
       --------
AND:   00001000
So think about the following: how would you generate the mask that you need given that you know the bit index? And how would you convert the AND result back to a single bit?
Good luck!
buffer[index/8] & (1u<<(index%8))
should do it (that is, view buffer as a bit array and test the bit at index).
Similarly:
buffer[index/8] |= (1u<<(index%8))
should set the index-th bit.
Or you could store a table of the eight shifted values of 1 and AND against that:
unsigned char bits[] = { 1u<<0, 1u<<1, 1u<<2, 1u<<3, 1u<<4, 1u<<5, 1u<<6, 1u<<7 };
If your compiler doesn't turn those / and % into bit operations (it usually will for unsigned operands), the equivalents are:
unsigned_int / 8 == unsigned_int >> 3
unsigned_int % 8 == unsigned_int & 0x07 //0x07 == 0000 0111
so
buffer[index>>3] & (1u<<(index&0x07u)) //test
buffer[index>>3] |= (1u<<(index&0x07u)) //set
One possible implementation of your function might look like this (note that it numbers bits from the most significant end of each byte, the opposite convention from the index%8 snippets above):
int read_single_bit(unsigned char* buffer, unsigned int index)
{
    unsigned char c = buffer[index / 8];    // the byte which contains the bit
    unsigned int bit_position = index % 8;  // the position of that bit within the byte
    // Shifting the byte right by (7 - bit_position) moves the bit you want
    // to the least significant position; ANDing with 1 (binary 00000001)
    // then yields 1 or 0, depending on the value of that bit.
    return (c >> (7 - bit_position)) & 1;
}
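A quick check of that implementation (a sketch; the buffer value is arbitrary):

#include <stdio.h>

int read_single_bit(unsigned char* buffer, unsigned int index); /* as defined above */

int main(void) {
    unsigned char buf[] = { 0xDC };   /* 1101 1100 */
    for (unsigned int i = 0; i < 8; i++)
        printf("%d", read_single_bit(buf, i));
    printf("\n");                     /* prints 11011100, MSB first */
    return 0;
}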

What do these macros do?

I have inherited some heavily obfuscated and poorly written PIC code to modify. There are two macros here:
#define TopByteInt(v) (*(((unsigned char *)(&v)+1)))
#define BottomByteInt(v) (*((unsigned char *)(&v)))
Is anyone able to explain what on earth they do and what that means please?
Thanks :)
They access a 16-bit integer variable one byte at a time, giving you its most significant and least significant bytes separately. Little-endian byte order is assumed.
Usage would be like this:
uint16_t v = 0xcafe;
const uint8_t v_high = TopByteInt(v);   /* the macro takes v's address itself, so pass v, not &v */
const uint8_t v_low = BottomByteInt(v);
The above would result in v_high being 0xca and v_low being 0xfe.
It's rather scary code; it would be cleaner to just do this arithmetically:
#define TopByteInt(v) (((v) >> 8) & 0xff)
#define BottomByteInt(v) ((v) & 0xff)
(*((unsigned char *)(&v)))
This takes the address of v (a 16-bit integer), casts it to a pointer to unsigned char (8 bits), and dereferences it; that way you get only the bottom byte.
(*(((unsigned char *)(&v)+1)))
This is the same, but it adds 1 to the address, so it gets the next byte: the top byte on a little-endian machine.
It'll only work as expected if v is a 16-bit integer.
Ugg.
Assuming you are on a little-endian platform, that looks like it might meaningfully be rewritten as
#define TopByteInt(v) (((v) >> 8) & 0xff)
#define BottomByteInt(v) ((v) & 0xff)
It is basically taking the variable v and extracting the least significant byte (BottomByteInt) and the next more significant byte (TopByteInt) from it. 'TopByte' is a bit of a misnomer if v isn't a 16-bit value.
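A quick check of the arithmetic versions (a sketch; the test value is arbitrary):

#include <stdint.h>
#include <stdio.h>

#define TopByteInt(v)    (((v) >> 8) & 0xff)
#define BottomByteInt(v) ((v) & 0xff)

int main(void) {
    uint16_t v = 0xcafe;
    /* value-based, so it works regardless of the platform's byte order */
    printf("top: %x, bottom: %x\n", TopByteInt(v), BottomByteInt(v));
    return 0;   /* prints "top: ca, bottom: fe" */
}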

Creating bitflag variables with large amounts of flags or how to create large bit-width numbers

Let's say I have an enum with bitflag options larger than the number of bits in any standard data type:
enum flag_t {
    FLAG_1 = 0x1,
    FLAG_2 = 0x2,
    ...
    FLAG_130 = 0x400000000000000000000000000000000,
};
This is impossible for several reasons: enums have a maximum size of 128 bits (in C with gcc on my system, determined by experimentation), single variables are likewise at most 128 bits, and so on.
In C you can't perform bitwise operations on arrays, though in C++ I suppose you could overload the bitwise operators to do the job with a loop.
Is there any way in C, short of manually remembering which flags go where, to make this work for large numbers of flags?
This is exactly what bit-fields are for.
In C, it's possible to define the following data layout:
struct flag_t
{
    unsigned int flag1 : 1;
    unsigned int flag2 : 1;
    unsigned int flag3 : 1;
    (...)
    unsigned int flag130 : 1;
    (...)
    unsigned int flag1204 : 1; // for fun
};
In this example, every flag occupies just one bit. An obvious advantage is the unlimited number of flags. Another great advantage is that you are not limited to single-bit flags; you could merge some multi-value fields in the middle.
But most importantly, testing and assignment are different, and probably simpler, as far as single-flag operations are concerned: you no longer need any masking, you just access the flag directly by name (see the sketch below). And by the way, use the opportunity to give these flags more descriptive names :)
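A minimal sketch of that usage (the member names follow the struct above; the printf is just for demonstration):

#include <stdio.h>

struct flag_t {
    unsigned int flag1 : 1;
    unsigned int flag2 : 1;
    unsigned int flag3 : 1;
    /* ... as many single-bit members as you need ... */
};

int main(void) {
    struct flag_t flags = {0};
    flags.flag3 = 1;                    /* set: no mask or shift needed */
    flags.flag1 = !flags.flag1;         /* toggle */
    printf("%u %u\n", flags.flag1, flags.flag3);   /* prints: 1 1 */
    return 0;
}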
Instead of trying to assign absurdly large numbers to an enum so you can have a hundreds-of-bits-wide bitfield, let the compiler assign a normal zero-based sequence of numbers to your flag names, and simulate a wide bitfield using an array of unsigned char. You can have a 1024-bit bitfield using unsigned char bits[128], and write get_flag() and set_flag() accessor functions to mask the minor amount of extra work involved.
However, a far better piece of advice would be to look at your design again, and ask yourself "Why do I need over a hundred different flags?". It seems to me that what you really need is a redesign.
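A sketch of those get_flag()/set_flag() accessors (the names come from the paragraph above; the bodies are illustrative, assuming the 1024-bit case):

#include <stddef.h>

/* let the compiler assign a normal zero-based sequence to the flag names */
enum flag_t { FLAG_1, FLAG_2, FLAG_3 /* ... */ };

#define NUM_FLAGS 1024

static unsigned char bits[(NUM_FLAGS + 7) / 8];   /* 128 bytes = 1024 flags */

static int get_flag(size_t n) {
    return (bits[n / 8] >> (n % 8)) & 1;
}

static void set_flag(size_t n, int value) {
    if (value)
        bits[n / 8] |= (unsigned char)(1u << (n % 8));
    else
        bits[n / 8] &= (unsigned char)~(1u << (n % 8));
}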
In an answer to a related question, Bit Manipulation and Flags, I provided an example of using an unsigned char array for very large sets of bitflags; I am moving that example to this posting.
This source example provides the following:
a set of Preprocessor defines for the bitflag values
a set of Preprocessor macros to manipulate bits
a couple of functions to implement bitwise operations on the arrays
The general approach for this is as follows:
create a set of defines for the flags which specify an array offset and a bit pattern
create a typedef for an unsigned char array of the proper size
create a set of functions that implement the bitwise logical operations
The Specifics from the Answer with a Few Improvements and More Exposition
Use a set of C Preprocessor defines to create a set of bitflags to be used with the array. These bitflag defines specify an offset within the unsigned char array along with the bit to manipulate.
The defines in this example are 16-bit values in which the upper byte contains the array offset and the lower byte contains the bit flag(s) for the byte of the unsigned char array at that offset. Using this technique you can have arrays of up to 256 elements, 256 * 8 = 2,048 bitflags, or, by going from a 16-bit define to a 32-bit long, much more. (In the comments below, bit 0 means the least significant bit of a byte and bit 7 the most significant bit.)
#define ITEM_FLG_01 0x0001 // array offset 0, bit 0
#define ITEM_FLG_02 0x0002 // array offset 0, bit 1
#define ITEM_FLG_03 0x0101 // array offset 1, bit 0
#define ITEM_FLG_04 0x0102 // array offset 1, bit 1
#define ITEM_FLG_05 0x0201 // array offset 2, bit 0
#define ITEM_FLG_06 0x0202 // array offset 2, bit 1
#define ITEM_FLG_07 0x0301 // array offset 3, bit 0
#define ITEM_FLG_08 0x0302 // array offset 3, bit 1
#define ITEM_FLG_10 0x0980 // array offset 9, bit 7
Next you have a set of macros to set and unset the bits, along with a typedef to make them a bit easier to use. Unfortunately, a typedef in C does not give you any extra type checking from the compiler, but it does make the code easier to read. These macros do no checking of their arguments, so you might feel safer using regular functions instead.
#define SET_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) |= (b) & 0xff)
#define TOG_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) ^= (b) & 0xff)
#define CLR_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) &= ~((b) & 0xff))
#define TST_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) & ((b) & 0xff))
typedef unsigned char BitSet[10];
An example of using this basic framework is as follows.
BitSet uchR = { 0 };
int bValue;
SET_BIT(uchR, ITEM_FLG_01);
bValue = TST_BIT(uchR, ITEM_FLG_01);
SET_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_04);
CLR_BIT(uchR, ITEM_FLG_05);
CLR_BIT(uchR, ITEM_FLG_01);
Next you can introduce a set of utility functions for the bitwise operations we want to support. These operations are analogous to the built-in C operators such as bitwise OR (|) and bitwise AND (&); the functions simply apply the built-in operator to every element of the array.
These particular utility functions modify one of the two sets of bitflags provided. If that is a problem, you can change the functions to accept three arguments: one for the result of the operation and the other two for the two sets of bitflags to use in the operation.
void AndBits(BitSet s1, const BitSet s2)
{
    size_t nLen = sizeof(BitSet);
    for (; nLen > 0; nLen--) {
        *s1++ &= *s2++;
    }
}

void OrBits(BitSet s1, const BitSet s2)
{
    size_t nLen = sizeof(BitSet);
    for (; nLen > 0; nLen--) {
        *s1++ |= *s2++;
    }
}

void XorBits(BitSet s1, const BitSet s2)
{
    size_t nLen = sizeof(BitSet);
    for (; nLen > 0; nLen--) {
        *s1++ ^= *s2++;
    }
}
If you need more than one size of bitflag set with this approach, the most flexible option is to eliminate the typedef and use plain unsigned char arrays of various sizes. That change means replacing BitSet with an unsigned char pointer in the utility functions' interfaces and with unsigned char arrays where bitflag variables are defined; along with the pointers, you would also need to pass a length for the arrays, as shown below.
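Such a variant might look like this (a sketch; the name AndBitsN is illustrative, and OrBits/XorBits change the same way):

#include <stddef.h>

void AndBitsN(unsigned char *s1, const unsigned char *s2, size_t nLen)
{
    for (; nLen > 0; nLen--) {
        *s1++ &= *s2++;   /* same loop as AndBits, length now a parameter */
    }
}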
You may also consider an approach similar to what is being done for text strings in Is concatenating arbitrary number of strings with nested function calls in C undefined behavior?.

Look at the bytes/bits of a variable in C

How can I see the bytes/bits of a variable in C? In terms of binary, just zeros and ones.
My problem is that I want to test to see if any zeros exist in the most significant byte of variable x. Any help would be appreciated.
Use the bitwise AND operator &. For example:
char c = ...;
if ((c & 0xFF) == 0xFF) ...   // true when c contains no zero bits
You may want to use shifts and macros to automate this instead of writing numeric constants, because different types need different values to test the most significant byte. You can compute the shift amount using sizeof:
// test the most significant byte of an int for zero bits
int i = ...;
if ((i & (0xFFu << 8*(sizeof(int)-1))) == (0xFFu << 8*(sizeof(int)-1))) ...
(The u suffix matters: shifting a plain int's 0xFF into the sign bit is undefined behavior.)
You can use the following test:
var & (1 << N)
to check whether bit N is set in var. Which bit is the most significant depends on the datatype of var.
Print the memory byte by byte, from offset 0 up to sizeof(x) - 1 (if x happens to be your variable). Then, for each byte, print all eight bits individually, as sketched below.
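A sketch of that idea (print_bits is a hypothetical helper; it walks the object's bytes from lowest to highest address and prints each byte MSB-first):

#include <stdio.h>

static void print_bits(const void *p, size_t size) {
    const unsigned char *bytes = p;
    for (size_t i = 0; i < size; i++) {
        for (int bit = 7; bit >= 0; bit--)
            putchar(((bytes[i] >> bit) & 1) ? '1' : '0');
        putchar(' ');
    }
    putchar('\n');
}

int main(void) {
    int x = 1000;
    print_bits(&x, sizeof x);   /* byte order follows the platform's endianness */
    return 0;
}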
if (x & 0x80)   // assuming x is a byte (char type)
{
    // msb is set
}
