Circular buffer increment using alternate method - c

I am not able to understand how the last statement increments the pointer. Can somebody explain it to me with a few examples?

The code, as shown:
aptr = (aptr + 1) & (void *)(BUFFERSIZE - 1);
// |________| incremented here
Since it is a circular buffer AND the buffer size is a power of 2, then the & is an easy and fast way to roll over by simply masking. Assuming that the BUFFERSIZE is 256, then:
num & (256 - 1) == num % 256
num & (0x100 - 1) == num % 0x100
num & (0x0ff) == num % 0x100
When the number is not a power of 2, then you can't use the masking technique:
num & (257 - 1) != num % 257
num & (0x101 - 1) != num % 0x101
num & 0x100 != num % 0x101
The (void *) allows the compiler to choose an appropriate width for the BUFFERSIZE constant based on your pointer width... although it is generally best to know - and use! - the width before a statement like this.
I added the hex notation to make it clearer why the & results in an emulated rollover event. Note that 0xff is binary 11111111, so the AND operation is simply masking off the upper bits.

2 problems with this approach.
A) Using a pointer with a bit-wise operation is not portable code. @Ilja Everilä
char *aptr;
// error: invalid operands to binary & (have 'char *' and 'void *')
// The following increments the index: (not really)
// aptr = (aptr + 1) & (void *)(BUFFERSIZE-1);
B) With compilers that support the non-standard math on a void * akin to a char *, the math is wrong if aptr points to an object wider than char and BUFFERSIZE is the number of elements in the buffer rather than the byte size. Of course this depends on how the non-standard compiler implements some_type * & void *. Why bother coding to some implementation-specific behavior unnecessarily?
Instead use i % BUFFERSIZE. This portable approach works when BUFFERSIZE is a power of 2 as well as when it is not. When a compiler sees i % power-of-2 and i is some unsigned type, the same code is certainly emitted as for i & (power-of-2 - 1).
For compilers that do not recognize this optimization, then one should consider a better compiler.
#define BUFFERSIZE 256
int main(void) {
char buf[BUFFERSIZE];
// pointer solution
char *aptr = buf;
aptr = &buf[(aptr - buf + 1) % BUFFERSIZE];
// index solution
size_t index = 0;
index = (index + 1) % BUFFERSIZE;
}

Related

How to check if pointer address has the value 0xffffffffffffffff (i.e., is pointing to the maximum address)?

I am trying to dereference an unsigned char* pointer, but the program crashes with a segfault because its value is always 0xffffffffffffffff. Before dereferencing the pointer, I would like to check whether the address it holds is different from 0xffffffffffffffff (to avoid segfaults). How could I check that without hardcoding the check against 0xffffffffffffffff? i.e.,
unsigned char* pointer;
...
// I would rather use something which works for x86 or x64 machine
if( pointer & 0xffffffffffffffff >= 0xffffffffffffffff - 1 ) {
// exit
}
I was researching and I found this https://www.qnx.com/developers/docs/6.4.1/dinkum_en/c99/stdint.html#UINTPTR_MAX, but that UINTPTR_MAX seems to be the maximum value of the integer the pointer points to, rather than the maximum value of the pointer address.
The problem is consistent, every segmentation fault, the pointer has the value 0xffffffffffffffff.
I know I should fix my application so the pointer is not set to the invalid address 0xffffffffffffffff, but the code base is quite big and complex and I am not sure how to reproduce the problem yet. So, until I figure out how this pointer is set to an invalid address, I would like to put a protection in place.
Just out of idle curiosity, how can I purposefully make an unsigned char* pointer point to the address 0xffffffffffffffff (the maximum address on the target machine)? My program is doing this somewhere, but I have no idea how, as I am still searching for the culprit.
Casting -1 to an unsigned char * will, on a typical compiler, produce a pointer value of either 0xffffffff or 0xffffffffffffffff on a 32-bit or 64-bit system respectively.
if (pointer == (unsigned char *)-1) {
log_msg("The notorious 0xffffffff bug has surfaced! Details follow...");
log_msg(/* more info about the program state */);
abort();
}
Other possibilities would include if (pointer == (unsigned char *)UINTPTR_MAX) or if ((uintptr_t)pointer == UINTPTR_MAX).
UINTPTR_MAX seems to be the maximum value of the integer the pointer is pointing to
That's not right. It's the maximum value of the unsigned integer type uintptr_t, which will normally be a 32- or 64-bit integer according to the size of a pointer on this system. So you can get the desired pointer by casting this value to a pointer, or cast the pointer to uintptr_t and compare with UINTPTR_MAX.
Caveats for future readers: This is only appropriate for the specific case OP mentions: where you have already identified 0xffffffff etc as the values produced by some specific bug that you are trying to track down. In general, it is not possible to test at runtime whether a given pointer points to a valid object or not; you have to design your program so that only valid pointers are used in the first place.
Also, any time you talk about a pointer that doesn't point to a specific object (and is not NULL), you are outside the realm of standard C, so all of this is undefined behavior per the standard (or at best implementation-defined behavior). You are reliant on the behavior of your particular compiler, but it's reasonable to expect them to handle this case in the "natural" way.
Please do not use this, see comments for reference:
sizeof(char *) * 8 is likely 32 or 64. Then 1 << sizeof(char*) * 8 is 1 << 32 or 1 << 64. 1 is an int constant, which is likely 32 bits. Then 1 << 32 and 1 << 64 are not defined by the C standard; shifts with widths greater than or equal to the width of the left operand (after promotion) are not defined. And there is no need for any of this; UINTPTR_MAX is the value OP asks for (even if their approach is bad), and, if that were unavailable, (uintptr_t) -1 is also the value.
I did this test program and it seems to be working: https://ideone.com/bqgUrH
On if 1
On if 2
1 0xffffffffffffffff
2 0xfffffffffffffffd
It takes the size of the char * (4 bytes on 32-bit x86, 8 on x64) and multiplies by 8 to convert it to bits; then it shifts 1 left by that amount and subtracts 1, making it 0xffffff...
#include <stdio.h>
int main(void) {
unsigned char *p = (1 << sizeof(char*) * 8) - 1;
unsigned char *p2 = (1 << sizeof(char*) * 8) - 3;
if ( ((unsigned int)p & ((1 << sizeof(char*) * 8) - 1)) >= ((1 << sizeof(char*) * 8) - 2) )
{
printf("On if 1\n");
}
if (!(((unsigned int)p2 & ((1 << sizeof(char*) * 8) - 1)) >= ((1 << sizeof(char*) * 8) - 2)))
{
printf("On if 2\n");
}
printf("1 %p\n", p);
printf("2 %p\n", p2);
return 0;
}

C math function with _BYTE and

I used IDA to decompile a function in a program, and I don't know exactly how this code works.
flag[i] ^= *((_BYTE *)&v2 + (signed int)i % 4);
How does this work?
This could be used for xor-"decrypting" (or encrypting; the operation is symmetric) a buffer with a 4-byte key. See the following code, which might be a bit more readable than the decompiler output:
char flag[SIZE];
char key[4];
for (int i = 0; i < SIZE; i++) {
flag[i] = flag[i] ^ key[i%4];
}
So if your data is "ZGUIHUOHJIOJOPIJMXAR" and your key is "akey", then the snippet basically does
ZGUIHUOHJIOJOPIJMXA
^ akeyakeyakeyakeyake
=====================
yourplaintextresult (<- not really the result here, but you get the idea)
(_BYTE *)&v2
This treats the address of v2 as a pointer to a byte-sized type.
(signed int)i % 4
This says the remainder of integer i divided by 4 (i is probably a loop counter)
(_BYTE *)&v2 + (signed int)i % 4
This takes the address of v2 and increments it by (i % 4) bytes.
*((_BYTE *)&v2 + (signed int)i % 4)
This is to dereference the content in memory at position (v2 + i%4)
flag[i] ^= *((_BYTE *)&v2 + (signed int)i % 4);
This says the i-th element of the flag array should be XOR-ed with the byte in memory at position (v2 + i%4).

Fast strlen with bit operations

I found this code
int strlen_my(const char *s)
{
int len = 0;
for(;;)
{
unsigned x = *(unsigned*)s;
if((x & 0xFF) == 0) return len;
if((x & 0xFF00) == 0) return len + 1;
if((x & 0xFF0000) == 0) return len + 2;
if((x & 0xFF000000) == 0) return len + 3;
s += 4, len += 4;
}
}
I'm very interested in knowing how it works. Can anyone explain?
A bitwise AND with all ones retrieves the bit pattern of the other operand. Meaning, 10101 & 11111 = 10101. If the result of that bitwise AND is 0, then we know the masked part of the other operand was 0. A result of 0 when ANDing a single byte with 0xFF (all ones) therefore indicates a NUL byte.
The code itself checks each byte of the char array in four-byte partitions. NOTE: This code isn't portable; on another machine or compiler, an unsigned int could be more than 4 bytes. It would probably be better to use the uint32_t data type to ensure 32-bit unsigned integers.
The first thing to note is that on a little-endian machine, the bytes making up the character array will be read into an unsigned data type in reverse order; that is, if the four bytes at the current address are the bit pattern corresponding to abcd, then the unsigned variable will contain the bit pattern corresponding to dcba.
The second is that a hexadecimal number constant in C results in an int-sized number with the specified bytes at the little-end of the bit pattern. Meaning, 0xFF is actually 0x000000FF when compiling with 4-byte ints. 0xFF00 is 0x0000FF00. And so on.
So the program is basically looking for the NULL character in the four possible positions. If there is no NULL character in the current partition, it advances to the next four-byte slot.
Take the char array abcdef as an example. In C, string constants always have null terminators at the end, so there's a 0x00 byte at the end of that string.
It'll work as follows:
Read "abcd" into unsigned int x:
x: 0x64636261 [ASCII representations for "dcba"]
Check each byte for a null terminator:
0x64636261
& 0x000000FF
0x00000061 != 0,
0x64636261
& 0x0000FF00
0x00006200 != 0,
And check the other two positions; there are no null terminators in this 4-byte partition, so advance to the next partition.
Read "ef" into unsigned int x:
x: 0xBF006665 [ASCII representations for "fe"]
Note the 0xBF byte; this is past the string's length, so we're reading garbage from the runtime stack. It could be anything. This is also where alignment and bounds matter: the 4-byte load can touch memory beyond the string, and on a machine that doesn't allow unaligned accesses it may crash outright. If there were just one character left in the string, we'd be reading two extra bytes past the terminator.
Check each byte for a null terminator:
0xBF006665
& 0x000000FF
0x00000065 != 0,
0xBF006665
& 0x0000FF00
0x00006600 != 0,
0xBF006665
& 0x00FF0000
0x00000000 == 0 !!!
So we return len + 2; len was 4 since we incremented it once by 4, so we return 6, which is indeed the length of the string.
The code "works" by attempting to read 4 bytes at a time, assuming the string is laid out and accessible like an array of int. It reads an int and then tests each of its bytes in turn for the null character. In theory, code working with an int will run faster than 4 individual char operations.
But there are problems:
Alignment is an issue: e.g. *(unsigned*)s may seg-fault.
Endianness is an issue: if((x & 0xFF) == 0) might not test the byte at address s.
s += 4 is a problem, as sizeof(int) may differ from 4.
Array sizes may exceed int range; better to use size_t.
An attempt to remedy these difficulties:
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
static inline int aligned_as_int(const char *s) {
max_align_t mat; // C11
uintptr_t i = (uintptr_t) s;
return i % sizeof mat == 0;
}
size_t strlen_my(const char *s) {
size_t len = 0;
// align
while (!aligned_as_int(s)) {
if (*s == 0) return len;
s++;
len++;
}
for (;;) {
unsigned x = *(unsigned*) s;
#if UINT_MAX >> CHAR_BIT == UCHAR_MAX
if(!(x & 0xFF) || !(x & 0xFF00)) break;
s += 2, len += 2;
#elif UINT_MAX >> CHAR_BIT*3 == UCHAR_MAX
if (!(x & 0xFF) || !(x & 0xFF00) || !(x & 0xFF0000) || !(x & 0xFF000000)) break;
s += 4, len += 4;
#elif UINT_MAX >> CHAR_BIT*7 == UCHAR_MAX
if ( !(x & 0xFF) || !(x & 0xFF00)
|| !(x & 0xFF0000) || !(x & 0xFF000000)
|| !(x & 0xFF00000000) || !(x & 0xFF0000000000)
|| !(x & 0xFF000000000000) || !(x & 0xFF00000000000000)) break;
s += 8, len += 8;
#else
#error TBD code
#endif
}
while (*s++) {
len++;
}
return len;
}
It trades undefined behaviour (unaligned accesses, 75% probability to access beyond the end of the array) for a very questionable speedup (it is very possibly even slower). And is not standard-compliant, because it returns int instead of size_t. Even if unaligned accesses are allowed on the platform, they can be much slower than aligned accesses.
It also does not work on big-endian systems, or if unsigned is not 32 bits. Not to mention the multiple mask and conditional operations.
That said:
It tests 4 8-bit bytes at a time by loading an unsigned (which is not even guaranteed to have more than 16 bits). Once any of the bytes contains the '\0' terminator, it returns the sum of the current length plus the position of that byte. Otherwise it increments the current length by the number of bytes tested in parallel (4) and reads the next unsigned.
My advice: bad example of optimization plus too many uncertainties/pitfalls. It's likely not even faster; just profile it against the standard version:
size_t strlen(const char *restrict s)
{
size_t l = 0;
while ( *s++ )
l++;
return l;
}
There might be a way to use special vector instructions, but unless you can prove this is a critical function, you should leave this to the compiler; some compilers can unroll and speed up such loops much better.
All these proposals are slower than a simple strlen().
The reason is that they do not reduce the number of comparisons and only one deals with alignment.
Look for the strlen() proposal from Torbjorn Granlund (tege#sics.se) and Dan Sahlin (dan#sics.se) on the net. If you are on a 64-bit platform, this really helps to speed things up.
It detects whether any bits are set in a specific byte on a little-endian machine. Since we're only checking a single byte at a time (all the nibbles in each mask are 0 or 0xF, doubled up) and the masks walk from the lowest byte position upward (the machine is little-endian, so the byte order of the loaded number is reversed), we immediately know which byte contains the NUL.
The loop takes 4 bytes of the char array on each iteration. The four if statements determine whether the string has ended, using a bitmask with the AND operator to test each byte of the selected 4-byte chunk.

C variable smaller than 8 bits

I'm writing a C implementation of Conway's Game of Life and am pretty much done with the code, but I'm wondering what the most efficient way is to store the net in the program.
The net is two dimensional and stores whether cell (x, y) is alive (1) or dead (0). Currently I'm doing it with unsigned char like that:
struct:
typedef struct {
int rows;
int cols;
unsigned char *vec;
} net_t;
allocation:
n->vec = calloc( n->rows * n->cols, sizeof(unsigned char) );
filling:
i = ( n->cols * (x - 1) ) + (y - 1);
n->vec[i] = 1;
searching:
if( n->vec[i] == 1 )
but I don't really need 0-255 values - I only need 0-1, so I feel that doing it this way is a waste of space, but as far as I know the 8-bit char is the smallest type in C.
Is there any way to do it better?
Thanks!
The smallest declarable / addressable unit of memory you can address/use is a single byte, implemented as unsigned char in your case.
If you want to really save on space, you could make use of masking off individual bits in a character, or using bit fields via a union. The trade-off will be that your code will execute a bit slower, and will certainly be more complicated.
#include <stdio.h>
union both {
struct {
unsigned char b0: 1;
unsigned char b1: 1;
unsigned char b2: 1;
unsigned char b3: 1;
unsigned char b4: 1;
unsigned char b5: 1;
unsigned char b6: 1;
unsigned char b7: 1;
} bits;
unsigned char byte;
};
int main ( ) {
union both var;
var.byte = 0xAA;
if ( var.bits.b0 ) {
printf("Yes\n");
} else {
printf("No\n");
}
return 0;
}
References
Union and Bit Fields, Accessed 2014-04-07, <http://www.rightcorner.com/code/CPP/Basic/union/sample.php>
Access Bits in a Char in C, Accessed 2014-04-07, <https://stackoverflow.com/questions/8584577/access-bits-in-a-char-in-c>
Struct - Bit Field, Accessed 2014-04-07, <http://cboard.cprogramming.com/c-programming/10029-struct-bit-fields.html>
Unless you're working on an embedded platform, I wouldn't be too concerned about the size your net takes up by using an unsigned char to store only a 1 or 0.
To address your specific question: char is the smallest of the C data types. char, signed char, and unsigned char are all only going to take up 1 byte each.
If you want to make your code smaller you can use bitfields to decrease the amount of space you take up, but that will increase the complexity of your code.
For a simple exercise like this, I'd be more concerned about readability than size. One way you can make it more obvious what you're doing is switch to a bool instead of a char.
#include <stdbool.h>
typedef struct {
int rows;
int cols;
bool *vec;
} net_t;
You can then use true and false which, IMO, will make your code much easier to read and understand when all you need is 1 and 0.
It will take up at least as much space as the way you're doing it now, but like I said, consider what's really important in the program you're writing for the platform you're writing it for... it's probably not the size.
The smallest types in C, as far as I know, are char, signed char (-128 to 127), and unsigned char (0 to 255). All of them take a whole byte, so if you are storing multiple single-bit values in different variables, you can instead use one unsigned char as a group of bits.
unsigned char lives = 128;
At this moment, lives has the decimal value 128, which is 10000000 in binary, so now you can use a bitwise operator to get a single value from this variable (like an array of bits):
if((lives >> 7) == 1) {
//This code will run if the 8 bit from right to left (decimal 128) it's true
}
It's a little complex, but you'll end up with a bit array: instead of using multiple variables to store single TRUE/FALSE values, you can use a single unsigned char variable to store 8 TRUE/FALSE values.
Note: As I have been out of the C/C++ world for some time, I'm not 100% sure that it's "lives >> 7", but it is with the '>' symbol; a little research on it and you'll be ready to go.
You're correct that a char is the smallest type - it is typically 8 bits, though that is only the minimum the standard requires. And sizeof(char), like sizeof(unsigned char), is 1. So consider using an unsigned char to represent 8 columns.
How many chars are required per row? It's (cols / 8), but we have to round up to a whole number:
int byte_cols = (cols + 7) / 8;
or:
int byte_cols = (cols + 7) >> 3;
which you may wish to store in the net_t data structure. Then:
calloc(n->rows * n->byte_cols, 1) is sufficient for a contiguous bit vector.
Address columns and rows by x and y respectively. Setting (x, y) (relative to 0) :
n->vec[y * byte_cols + (x >> 3)] |= (1 << (x & 0x7));
Clearing:
n->vec[y * byte_cols + (x >> 3)] &= ~(1 << (x & 0x7));
Searching:
if (n->vec[y * byte_cols + (x >> 3)] & (1 << (x & 0x7)))
/* ... (x, y) is set... */
else
/* ... (x, y) is clear... */
These are bit manipulation operations. And it's fundamentally important to learn how (and why) this works. Google the term for more resources. This uses an eighth of the memory of a char per cell, so I certainly wouldn't consider it premature optimization.

How to convert from integer to unsigned char in C, given integers larger than 256?

As part of my CS course I've been given some functions to use. One of these functions takes a pointer to unsigned chars to write some data to a file (I have to use this function, so I can't just make my own purpose built function that works differently BTW). I need to write an array of integers whose values can be up to 4095 using this function (that only takes unsigned chars).
However, am I right in thinking that an unsigned char can only have a max value of 255 because it is 1 byte long? I would therefore need to use 4 unsigned chars for every integer? But casting doesn't seem to work with larger values of the integer. Does anyone have any idea how best to convert an array of integers to unsigned chars?
Usually an unsigned char holds 8 bits, with a max value of 255. If you want to know this for your particular compiler, print out CHAR_BIT and UCHAR_MAX from <limits.h>. You could extract the individual bytes of a 32-bit int:
#include <stdint.h>
void
pack32(uint32_t val,uint8_t *dest)
{
dest[0] = (val & 0xff000000) >> 24;
dest[1] = (val & 0x00ff0000) >> 16;
dest[2] = (val & 0x0000ff00) >> 8;
dest[3] = (val & 0x000000ff) ;
}
uint32_t
unpack32(uint8_t *src)
{
uint32_t val;
val = (uint32_t)src[0] << 24;
val |= src[1] << 16;
val |= src[2] << 8;
val |= src[3] ;
return val;
}
An unsigned char generally has a size of 1 byte; therefore you can decompose any other type into an array of unsigned chars (e.g. for a 4-byte int you can use an array of 4 unsigned chars). Your exercise is probably about generics. You should write the file as a binary file using the fwrite() function, and just write byte after byte to the file.
The following example should write a number (of any data type) to the file. I am not sure if it works as-is, since it casts to unsigned char * instead of void *.
int homework(unsigned char *foo, size_t size)
{
// open file for binary writing
FILE *f = fopen("work.txt", "wb");
if(f == NULL)
return 1;
// write the data, byte by byte, to the file
fwrite(foo, 1, size, f);
fclose(f);
return 0;
}
I hope the given example at least gives you a starting point.
Yes, you're right; a char/byte holds only 8 bits, which gives 2^8 distinct numbers: zero to 2^8 - 1, or zero to 255. Do something like this to get the bytes:
int x = 0;
char* p = (char*)&x;
for (int i = 0; i < sizeof(x); i++)
{
//Do something with p[i]
}
(Declaring i inside the for loop requires C99 or later, but whatever... it's more readable. :) )
Do note that this code may not be portable, since it depends on the processor's internal storage (byte order) of an int.
If you have to write an array of integers, then just convert the array into a pointer to char and run through the array.
int main()
{
int data[] = { 1, 2, 3, 4 ,5 };
size_t size = sizeof(data)/sizeof(data[0]); // Number of integers.
unsigned char* out = (unsigned char*)data;
for(size_t loop =0; loop < (size * sizeof(int)); ++loop)
{
MyProfSuperWrite(out + loop); // Write 1 unsigned char
}
}
Now people have mentioned that values up to 4095 will fit in fewer bits than a normal integer. Probably true. Thus you can save space by not writing out the top bits of each integer. Personally I think this is not worth the effort. The extra code to write the values and process the incoming data is not worth the savings you would get (maybe if the data were the size of the Library of Congress). Rule one: do as little work as possible (it's easier to maintain). Rule two: optimize if asked (but ask why first). You may save space, but it will cost processing time and maintenance effort.
The part of the assignment that says integers whose values can be up to 4095 using this function (that only takes unsigned chars) should be giving you a huge hint. 4095 unsigned is 12 bits.
You can store the 12 bits in a 16-bit short, but that is somewhat wasteful of space: you are only using 12 of the short's 16 bits. Since you are dealing with more than 1 byte per value in the conversion, you may also need to deal with the endianness of the result. This is the easiest option.
You could also do a bit field or some packed binary structure if you are concerned about space. More work.
It sounds like what you really want to do is call sprintf to get a string representation of your integers. This is a standard way to convert from a numeric type to its string representation. Something like the following might get you started:
char num[5]; // Room for 4095
// Array is the array of integers, and arrayLen is its length
for (i = 0; i < arrayLen; i++)
{
sprintf (num, "%d", array[i]);
// Call your function that expects a pointer to chars
printfunc (num);
}
Without information on the function you are directed to use regarding its arguments, return value and semantics (i.e. the definition of its behaviour) it is hard to answer. One possibility is:
Given:
void theFunction(unsigned char* data, int size);
then
int array[SIZE_OF_ARRAY];
theFunction((unsigned char*)array, sizeof(array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(*array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(int));
All of which will pass all of the data to theFunction(), but whether that makes any sense will depend on what theFunction() does.