C math function with _BYTE and - c

I used IDA to decompile a function in a program and I don't know exactly what this code does.
flag[i] ^= *((_BYTE *)&v2 + (signed int)i % 4);
How does this work?

This could be used for XOR-"decrypting" (or encrypting, the operation is symmetric) a buffer with a 4-byte key. See the following code, which might be a bit more readable than the decompiler output:
char flag[SIZE];
char key[4];
for (int i = 0; i < SIZE; i++) {
    flag[i] = flag[i] ^ key[i % 4];
}
So if your data is "ZGUIHUOHJIOJOPIJMXAR" and your key is "akey", then the snippet basically does
  ZGUIHUOHJIOJOPIJMXAR
^ akeyakeyakeyakeyakey
======================
  yourplaintextresult (<- not really the result here, but you get the idea)

(_BYTE *)&v2
This takes the address of v2 and casts it to a pointer to bytes, so the individual bytes of v2 can be addressed.
(signed int)i % 4
This is the remainder of the integer i divided by 4, so it cycles through 0, 1, 2, 3 (i is probably a loop counter).
(_BYTE *)&v2 + (signed int)i % 4
This advances that byte pointer by (i % 4) bytes, i.e. it points at byte number i % 4 of v2.
*((_BYTE *)&v2 + (signed int)i % 4)
This dereferences the pointer, i.e. it reads the byte in memory at (address of v2) + i % 4.
flag[i] ^= *((_BYTE *)&v2 + (signed int)i % 4);
This XORs the i-th element of the flag array with the byte at offset i % 4 inside v2; in other words, v2 acts as a repeating 4-byte key.
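A minimal self-contained sketch of that idea (the key "akey" and the buffer contents are invented for illustration; v2 plays the role of the 4-byte key variable from the decompiled code):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t v2;
    memcpy(&v2, "akey", 4);            /* the 4-byte key lives inside v2 */
    char flag[] = "ZGUIHUOHJIOJOPIJMXAR";
    size_t n = strlen(flag);

    for (size_t i = 0; i < n; i++)     /* "encrypt" */
        flag[i] ^= *((unsigned char *)&v2 + i % 4);   /* same byte as key[i % 4] */

    for (size_t i = 0; i < n; i++)     /* XOR again with the same key ... */
        flag[i] ^= *((unsigned char *)&v2 + i % 4);

    printf("%s\n", flag);              /* ... and the original text is back */
    return 0;
}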

Related

Using bit shifting with rand() to allow for a larger random range

I am reviewing a function to generate keys for a Radix Map and found the implementation of rand() to be novel to me.
Here is the function:
static int make_random(RadixMap *map)
{
    size_t i = 0;
    for (i = 0; i < map->max - 1; i++) {
        uint32_t key = (uint32_t) (rand() | (rand() << 16)); // <-- This was interesting
        check(RadixMap_add(map, key, i) == 0, "Failed to add key %u", key);
    }
    return i;
error:
    return 0;
}
----- Type definitions --------
typedef union RMElement {
    uint64_t raw;
    struct {
        uint32_t key;
        uint32_t value;
    } data;
} RMElement;

typedef struct RadixMap {
    size_t max;
    size_t end;
    uint32_t counter;
    RMElement *contents;
    RMElement *temp;
} RadixMap;
from ex35 Learn C the Hard Way by Zed Shaw
The specific part I found interesting was
uint32_t key = (uint32_t) (rand() | (rand() << 16)); <-- This was interesting
It is interesting to me because it would have been possible to simply do ..
uint32_t key = rand();
As RAND_MAX (0x7FFFFFFF) is less than uint32_t MAX (0xFFFFFFFF)
The bit shifting implementation looks to have the following advantages.
Allows for a larger random value range, 0xFFFFFFFF vs 0x7FFFFFFF
Values (other than initial 0) are at least 5 digits decimal (65537) (0x10001)
Reduced probability of seeing "0".
And the following disadvantage
Increased code complexity?
Are there other reasons for using this bit shift implementation of rand()?
I've been trying to hash out the reason for using this implementation in my code review and wanted to make sure I was on the right track with my thinking.
The C standard only guarantees that RAND_MAX is at least 32767. This code accounts for that by calling rand twice and shifting to ensure it gets at least 30 bits of randomness.
However, this does not properly account for the case where RAND_MAX is larger.
The rand function returns an int which is signed. If RAND_MAX was the same as INT_MAX, rand() << 16 would most likely shift a "1" bit into the sign bit, triggering undefined behavior.
The proper way to implement this to handle both cases is:
uint32_t key = rand() | ((uint32_t)rand() << 16);
since left-shifting an unsigned number is well defined as long as the shift amount is less than the width of the type.
Or better yet:
uint32_t key = (((uint32_t)rand() & 0x7FFF) << 17) |
               (((uint32_t)rand() & 0x7FFF) << 2) |
               ((uint32_t)rand() & 0x3);
To get a full 32 bits of randomness.
uint32_t key = (uint32_t) (rand() | (rand() << 16)); has shortcomings.
Not uniform when RAND_MAX != 65535, which is the usual case.
Undefined behavior when int is 16-bit. Also UB in other cases due to signed integer overflow possibilities with rand() << 16.
The cast is too late to protect against a narrow int; it is effectively the same as uint32_t key = rand() | (rand() << 16);. uint32_t key = rand() + (rand() * (RAND_MAX + (uint32_t)1)); would make a bit more sense.
A key failing is using | to append the bits when the number of bits zeroed on the right by the shift (16) is not the same as the bit width of RAND_MAX.
A 2nd weakness is assuming that shifting is better than multiplying by a power of 2. A good compiler emits efficient code either way.
Instead, call your random function (1, 2 or 3 times) as needed based on its RAND_MAX. The code below works well when RAND_MAX is a Mersenne number.
See Is there any way to compute the width of an integer type at compile-time?.
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
// Bit width of RAND_MAX, which is at least 15
#define RAND_MAX_BITS IMAX_BITS(RAND_MAX)
_Static_assert(((RAND_MAX + 1u) & RAND_MAX) == 0, "RAND_MAX is not a Mersenne number");
uint32_t rand32(void) {
    uint32_t r = rand();
#if RAND_MAX_BITS < 32
    r = (r << RAND_MAX_BITS) | rand();
#endif
#if RAND_MAX_BITS * 2 < 32
    r = (r << RAND_MAX_BITS) | rand();
#endif
    return r;
}
(Bit shifting) Increased code complexity?
No.
Are there other reasons for using this bit shift implementation of rand()?
OP's code is not uniform, as it generally favors 1 bits due to the potential OR-ing of bits past the 15th.
I've been trying to hash out the reason for using this implementation ...
Do not use it.
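For completeness, a small usage sketch of the rand32() helper above (it assumes the macros and function above are in scope; seeding with srand is my addition, not part of the answer):
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

uint32_t rand32(void);   /* as defined above */

int main(void) {
    srand((unsigned)time(NULL));              /* seed the underlying rand() once */
    for (int i = 0; i < 4; i++)
        printf("%08" PRIX32 "\n", rand32());  /* prints full 32-bit values */
    return 0;
}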
Or, you could just use a really fast random number generator.
Be careful to seed it with values that don't have too many zero bytes.
uint64_t
xorshift128plus(uint64_t seed[2])
{
    uint64_t x = seed[0];
    uint64_t y = seed[1];
    seed[0] = y;
    x ^= x << 23;
    seed[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
    return seed[1] + y;
}
Convert the result to float or just take it modulo your max int value...
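For instance, a minimal usage sketch (the seed constants are arbitrary nonzero values chosen only for illustration, and the modulo reduction introduces a tiny bias for ranges that do not divide 2^64):
#include <stdint.h>
#include <stdio.h>

uint64_t xorshift128plus(uint64_t seed[2]);   /* as defined above */

int main(void) {
    /* arbitrary nonzero seed values, for illustration only */
    uint64_t state[2] = { 0x853c49e6748fea9bULL, 0xda3e39cb94b95bdbULL };

    for (int i = 0; i < 5; i++) {
        uint64_t r = xorshift128plus(state);
        printf("%u\n", (unsigned)(r % 1000u));   /* reduce to the range 0..999 */
    }
    return 0;
}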

Can someone explain to me how this HASH FUNCTION works (also if they have another better option)?

I'm working on CS50's pset 5, speller. I need a hash function for a hash table that will efficiently store all of the words in the dictionary (~140,000). I found this one online, but I don't understand how it works. I don't know what << or ^ mean. Here is the hash function, thank you! (I would really appreciate it if you could help me :))
int hash_it(char* needs_hashing)
{
    unsigned int hash = 0;
    for (int i = 0, n = strlen(needs_hashing); i < n; i++)
        hash = (hash << 2) ^ needs_hashing[i];
    return hash % HASHTABLE_SIZE;
}
Those two are bitwise operators. They are easy to learn and a must-learn for a programmer.
<< - is a binary left shift operator.
Suppose variable "hash" binary is "0011".
hash << 2 becomes "1100".
And ^ is the XOR operator (a result bit is set if it is set in exactly one operand, not in both).
Suppose in your code
hash << 2 gives "1100"
needs_hashing[i] gives "1111"
then
(hash << 2) ^ needs_hashing[i] gives "0011"
For a quick introduction to bitwise operators, see
https://www.tutorialspoint.com/cprogramming/c_bitwise_operators.htm
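As a tiny illustration of those two operators inside the loop from the question (HASHTABLE_SIZE is picked arbitrarily here, and the word "cat" is just an example input):
#include <stdio.h>
#include <string.h>

#define HASHTABLE_SIZE 65536   /* arbitrary size for this demo */

int main(void) {
    const char *word = "cat";
    unsigned int hash = 0;
    for (size_t i = 0, n = strlen(word); i < n; i++) {
        hash = (hash << 2) ^ (unsigned char)word[i];
        /* print the running value so you can watch the shift + XOR at work */
        printf("after '%c': hash = %u\n", word[i], hash);
    }
    printf("bucket = %u\n", hash % HASHTABLE_SIZE);
    return 0;
}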
The original topic demonstrates a rather weak hash function: the two lowest bits of the computed hash equal the two lowest bits of the last character of the input string needs_hashing. As a result, if for example all strings end in a character with an even ASCII code, then all your hashes will also be even whenever HASHTABLE_SIZE is even (a power of 2, say).
A more effective hash, based on a cyclic shift:
uint32_t hash_it(const char *p) {
    uint32_t h = 0xDeadBeef;
    unsigned char c;
    while ((c = (unsigned char)*p++) != '\0')
        h = ((h << 5) | (h >> (32 - 5))) + c;   /* rotate h left by 5, then add the character */
    h ^= h >> 16;
    return h % HASHTABLE_SIZE;
}

Circular buffer increment using alternate method

I am not able to understand how the last statement increments the pointer. Can somebody explain it to me with a few examples?
The code, as shown:
aptr = (aptr + 1) & (void *)(BUFFERSIZE - 1);
// |________| incremented here
Since it is a circular buffer AND the buffer size is a power of 2, then the & is an easy and fast way to roll over by simply masking. Assuming that the BUFFERSIZE is 256, then:
num & (256 - 1) == num % 256
num & (0x100 - 1) == num % 0x100
num & (0x0ff) == num % 0x100
When the number is not a power of 2, then you can't use the masking technique:
num & (257 - 1) != num % 257
num & (0x101 - 1) != num % 0x101
num & 0x100 != num % 0x101
The (void *) allows the compiler to choose an appropriate width for the BUFFERSIZE constant based on your pointer width... although it is generally best to know - and use! - the width before a statement like this.
I added the hex notation to make it clearer why the & results in an emulated rollover event. Note that 0xff is binary 11111111, so the AND operation simply masks off the upper bits.
There are 2 problems with this approach.
A) Using a pointer with a bit-wise operation is not portable code (per @Ilja Everilä).
char *aptr;
// error: invalid operands to binary & (have 'char *' and 'void *')
// The following increments the index: (not really)
// aptr = (aptr + 1) & (void *)(BUFFERSIZE-1);
B) With compilers that support the non-standard math on a void * akin to a char *, the math is wrong if aptr points to an object wider than char and BUFFERSIZE is the number of elements in the buffer rather than its byte size. Of course this depends on how the non-standard compiler implements some_type * & void *. Why bother to unnecessarily code to some implementation-specific behavior?
Instead use i % BUFFERSIZE. This portable approach works when BUFFERSIZE is a power of 2 as well as when it is not. When a compiler sees i % power-of-2 and i is some unsigned type, the same code is certainly emitted as for i & (power-of-2 - 1).
For compilers that do not recognize this optimization, then one should consider a better compiler.
#define BUFFERSIZE 256

int main(void) {
    char buf[BUFFERSIZE];

    // pointer solution
    char *aptr = buf;
    aptr = &buf[(aptr - buf + 1) % BUFFERSIZE];

    // index solution
    size_t index = 0;
    index = (index + 1) % BUFFERSIZE;
}
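For example, here is a small self-contained sketch built on the index approach; the struct ring and ring_put names are invented for illustration:
#include <stddef.h>
#include <stdio.h>

#define BUFFERSIZE 256   /* works for any size, power of 2 or not */

struct ring {
    char data[BUFFERSIZE];
    size_t head;
};

/* store one byte and advance, wrapping around at the end of the buffer */
static void ring_put(struct ring *r, char c) {
    r->data[r->head] = c;
    r->head = (r->head + 1) % BUFFERSIZE;
}

int main(void) {
    struct ring r = { .head = 0 };
    for (int i = 0; i < 1000; i++)      /* more writes than slots: the index keeps wrapping */
        ring_put(&r, (char)('A' + i % 26));
    printf("head ended up at %zu\n", r.head);   /* 1000 % 256 == 232 */
    return 0;
}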

Can someone explain this bitwise C code?

I don't know what's going on with this:
#define _PACK32(str, x) \
{ \
    *(x) = ((int) *((str) + 3)) \
         | ((int) *((str) + 2) << 8) \
         | ((int) *((str) + 1) << 16) \
         | ((int) *((str) + 0) << 24); \
}
str is an integer and x is an integer pointer.
Well, as mentioned, str is not an integer. It's a pointer, as it is being dereferenced with the * operator.
*((str) + 3) is equivalent to (str)[3]; the address is advanced by 3 * sizeof *(str) bytes, so the result depends on the type of str. The same goes for the other dereferences.
So what's going on? Well, it takes the least significant 8 bits of str[0], str[1], str[2] and str[3], and assembles them into one 32-bit integer.
For instance, let W, X, Y, Z, A be arbitrary bits. Then,
*(str + 3) = WWWWWWWWWWWWWWWWWWWWWWWWXXXXXXXX
*(str + 2) = WWWWWWWWWWWWWWWWWWWWWWWWYYYYYYYY
*(str + 1) = WWWWWWWWWWWWWWWWWWWWWWWWZZZZZZZZ
*(str + 0) = WWWWWWWWWWWWWWWWWWWWWWWWAAAAAAAA
The last three are shifted left by 8, 16, and 24 bits, respectively, thus,
*(str + 3) = WWWWWWWWWWWWWWWWWWWWWWWWXXXXXXXX
*(str + 2) = WWWWWWWWWWWWWWWWYYYYYYYY00000000
*(str + 1) = WWWWWWWWZZZZZZZZ0000000000000000
*(str + 0) = AAAAAAAA000000000000000000000000
Note that the least significant bits of the last three are replaced with 0 during the shift.
Last, they are OR'ed together, and the result is assigned to *x:
X = AAAAAAAAZZZZZZZZYYYYYYYYXXXXXXXX
Edit: the OR'ing is not as straightforward as it might seem, since the W's could be anything.
Looks like str is a pointer to an array of 4 bytes, and x is a pointer to a 32-bit value. str actually points to the first (most significant) byte of a big-endian 32-bit number, and this macro reads it and stores the result in the variable pointed to by x.
Correctly written as an inline function this should look something like:
void pack32(void const *p, unsigned *x) {
    unsigned char const *str = p;
    *x = str[0];
    *x = *x << 8 | str[1];
    *x = *x << 8 | str[2];
    *x = *x << 8 | str[3];
}
You should use unsigned types when you do bit shifting, otherwise your result can overflow. And perhaps it also makes the idea clearer: the (supposedly) 8 bits of each byte are placed at different bit positions of the target x.
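For instance, a small usage sketch (the byte values are made up; pack32 is the corrected function from above, repeated here so the example compiles on its own):
#include <stdio.h>

/* same as the corrected inline version shown above */
static void pack32(void const *p, unsigned *x) {
    unsigned char const *str = p;
    *x = str[0];
    *x = *x << 8 | str[1];
    *x = *x << 8 | str[2];
    *x = *x << 8 | str[3];
}

int main(void) {
    unsigned char bytes[4] = { 0x12, 0x34, 0x56, 0x78 };
    unsigned value;
    pack32(bytes, &value);
    printf("0x%08X\n", value);   /* prints 0x12345678 regardless of host byte order */
    return 0;
}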

Placing '0' and '1' character in C string

I've recently begun working with C again (haven't touched it since my first couple semesters of school) and have been dealing with strings.
Currently, I'm writing a function that will convert an integer into its binary representation in the form of a string.
Here is the code I have to do this:
#include <stdio.h>

float power(int, int); // My version of the pow() function from math.h
char* itob(int);

int main()
{
    int x = 63;
    const char *bin_x = itob(x);
    printf("x = %d (%s)", x, bin_x);
    return 0;
}

char* itob(int a) {
    static char bin[100];
    int i;
    for (i = 0; ((int) power(2, i) < a); i++) {
        bin[i] = (int)((a & ((int) power(2, i))) >> i);
    }
    bin[i] = '\0';
    return (char*)&bin;
}
The issue I'm having is that the result of the binary operations on a that I'm storing into an element of bin[] seems to be '\001' or '\000' as opposed to simply '1' or '0'. Because of this, when I printf() the result in main(), the output looks like a missing-character box, or if there is a 0 (if x = 62 instead of 63) it is interpreted as the end-of-string character '\0'.
Am I storing these elements into my string incorrectly?
Yes, you are storing it incorrectly. You are storing either a 0 or a 1, but you need to store its character encoding ('0' or '1') instead.
Try to replace this line:
bin[i] = (int)((a & ((int) power(2,i))) >> i);
With:
bin[i] = (int)((a & ((int) power(2,i))) >> i) ? '1' : '0';
And you can simply write return bin; there is no need for either the cast or the address-of operator.
Do not use pow for powers of 2, that's very inefficient; a shift is enough. Your program wastes lots of time converting between int and float, and tens to hundreds more cycles computing the power, while a shift generally takes only 1 cycle (or a little more, depending on the architecture). And there is no need to cast to int: any type narrower than or equal to int will be automatically promoted to int.
bin[i] = a & (1 << i) ? '1' : '0'; // or:
bin[i] = (a >> i) & 1 ? '1' : '0';
Much cleaner, shorter and faster. Another way if branching results in bad performance
bin[i] = ((a >> i) & 1) + '0';
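Putting those suggestions together, a minimal corrected itob might look like the sketch below (one possible version, not the only one; like the original loop it emits the least significant bit first, and it assumes a non-negative input):
#include <stdio.h>

/* Convert a non-negative int to a binary string, least significant bit first. */
char *itob(int a) {
    static char bin[100];
    int i = 0;
    do {
        bin[i] = ((a >> i) & 1) + '0';   /* store the character '0' or '1' */
        i++;
    } while ((a >> i) != 0);
    bin[i] = '\0';
    return bin;
}

int main(void) {
    int x = 63;
    printf("x = %d (%s)\n", x, itob(x));   /* prints: x = 63 (111111) */
    return 0;
}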

Resources