How to understand the following code in C?

I'm using the PCG Random Number Generation package, and cannot understand the following code:
time(NULL) ^ (intptr_t)&printf
which is an argument to the function that generates the seed for randomization:
void pcg32_srandom(uint64_t seed, uint64_t seq)
In the main function, it is used as follows:
pcg32_srandom(time(NULL) ^ (intptr_t)&printf, 54u);
By the way, I also want to ask why "54u" should be written that way.

I am not sure how much you know about random number generators, but often a random number generator is initialized by passing a number to it called a "seed". In this case, the seed is chosen to be the time returned by the time function XORed with the address of the printf function. I think this is not a very good random seed, and I wouldn't trust it with any important cryptographic tasks.
In C, when you write 54u it tells the compiler that you are writing an unsigned number. The u is not actually needed in the code you posted.

Let's break it down...
/* original: pcg32_srandom(time(NULL) ^ (intptr_t)&printf, 54u); */
uint64_t a = time(NULL);
uint64_t b = (intptr_t)&printf;
uint64_t c = a ^ b;
uint64_t d = 54u;
printf("a=%llx\n", (unsigned long long) a); /* no guessing about length */
printf("b=%llx\n", (unsigned long long) b); /* (thank you #chux) */
printf("c=%llx\n", (unsigned long long) c);
printf("d=%llx\n", (unsigned long long) d);
pcg32_srandom( a ^ b, d);
So... the ^ is the bitwise xor operator (edit: I originally wrote or).
Adding the printf should help you trace what the code is doing.
Apparently the code is xor-ing together something from time with
the address of the printf function (which is clever, I haven't seen
that before).
The u on the 54u is probably the original author being cautious.
When doing bit manipulations usually you don't want signed numbers.
This has some background: http://soundsoftware.ac.uk/c-pitfall-unsigned
We can see the API for pcg32_srandom() here:
http://www.pcg-random.org/using-pcg-c-basic.html
and these variants for the global RNG:
void pcg32_srandom(uint64_t initstate, uint64_t initseq)
uint32_t pcg32_random()
uint32_t pcg32_boundedrand(uint32_t bound)
So it looks like they're trying to come up with a seed for
the random number generator "initstate" and, for some reason,
want to use 54u as the "initseq".

Related

C - Populate (get) data with a function: best via reference or copy (in a microcontroller environment)?

I am currently programming an 8051 µC in C (compiler: Wickenhäuser µC/51), and I am wondering which way is best to populate a structure. In my current case I have a time/date structure which should be populated with the current time/date from an RTC via SFRs.
So I am thinking of the best method to do this:
Get the data via return value by creating the variable inside the function (get_foo_create)
Get data via call by reference (get_foo_by_reference)
Get via call by reference plus returning the pointer (I think this is stupid, but I am also considering it) (get_foo_pointer_return)
The following code is just an example (note: there is currently a failure in the last print, which does not print out the value at the moment).
Which is the best method?
typedef struct {
    unsigned char foo;
    unsigned char bar;
    unsigned char baz;
} data_foo;

data_foo get_foo_create(void) {
    data_foo foo;
    foo.bar = 2;
    return foo;
}

void get_foo_by_reference(data_foo *foo) {
    // Read values e.g. from SFR
    foo->bar = 42; // Just simulate SFR
}

data_foo *get_foo_pointer_return(data_foo *foo) {
    // Read values e.g. from SFR
    (*foo).bar = 11; // Just simulate SFR
    return foo;
}

/**
 * Main program
 */
void main(void) {
    data_foo struct_foo;
    data_foo *ptr_foo;
    seri_init(); // Serial Com init
    clear_screen();
    struct_foo = get_foo_create();
    printf("%d\n", struct_foo.bar);
    get_foo_by_reference(&struct_foo);
    printf("%d\n", struct_foo.bar);
    // Likely cause of the failure below: this passes &ptr_foo (a data_foo **)
    // where the function expects a data_foo *; &struct_foo was probably meant.
    ptr_foo = get_foo_pointer_return(&ptr_foo);
    //Temp problem also here, got 39 instead 11, tried also
    //printf("%d\n",(void*)(*ptr_foo).bar);
    printf("%d\n",(*ptr_foo).bar);
    SYSTEM_HALT; //Programm end
}
On the 8051, you should avoid using pointers to the extent possible. Instead, it's generally best--if you can afford it--to have some global structures which will be operated upon by various functions. Having functions for "load thing from address" and "store thing to address", along with various functions that manipulate thing, can be much more efficient than trying to have functions that can operate on objects of that type "in place".
For your particular situation, I'd suggest having a global structure called "time", as well as a global union called "ldiv_acc" which combines a uint_32, two uint16_t, and four uint8_t. I'd also suggest having an "ldivmod" function which divides the 32-bit value in ldiv_acc by an 8-bit argument, leaving the quotient in ldiv_acc and returning the remainder, as well as an "lmul" function which multiplies the 32-bit value in ldiv_acc by an 8-bit value. It's been a long time since I've programmed the 8051, so I'm not sure what help compilers need to generate good code, but 32x32 divisions and multiplies are going to be expensive compared with using a combination of 8x8 multiplies and divides.
On the 8051, code like:
uint32_t time;
uint32_t sec,min,hr;
sec = time % 60;
time /= 60;
min = time % 60;
time /= 60;
hr = time % 24;
time /= 24;
is likely to be big and slow. Using something like:
ldiv_acc.l = time;
sec = ldivmod(60);
min = ldivmod(60);
hr = ldivmod(24);
is apt to be much more compact and, if you're clever, faster. If speed is really important, you could use functions to perform divmod6, divmod10, divmod24, and divmod60, taking advantage of the fact that divmod60(h+256*l) is equal to h*4+divmod60(h*16+l). The second addition might yield a value greater than 256, but if it does, applying the same technique will get the operand below 256. Dividing an unsigned char by another unsigned char is faster than divisions involving unsigned int.

Issue working with uint2 and CUDA

Recently I started working with CUDA and Ethereum, and I found a little code snippet in a function that produces some errors when I try to port it to a CUDA file.
Here is the code snippet:
void keccak_f1600_round(uint2* a, uint r, uint out_size)
{
#if !__ENDIAN_LITTLE__
    for (uint i = 0; i != 25; ++i)
        a[i] = make_uint2(a[i].y, a[i].x);
#endif
    uint2 b[25];
    uint2 t;
    // Theta
    b[0] = a[0] ^ a[5] ^ a[10] ^ a[15] ^ a[20];
#if !__ENDIAN_LITTLE__
    for (uint i = 0; i != 25; ++i)
        a[i] = make_uint2(a[i].y, a[i].x);
#endif
}
The error I am getting concerns the b[0] line and is:
error: no operator "^=" matches these operands operand types are: uint2 ^= uint2
To be honest, I don't have a lot of experience with uint2 and CUDA, and that is why I am asking what I should do to correct this issue.
The exclusive-or operator works with unsigned long long, but not with uint2 (which for CUDA is a built-in struct containing two unsigned ints).
To make the code work, there are several options. Some that come to my mind:
you can use reinterpret_cast<unsigned long long &> on each uint2 in the line that does the exclusive-or (see How to use reinterpret_cast in C++?)
you can rewrite the code to use unsigned long long types everywhere you use uint2 now. This probably produces the most maintainable code.
you can rewrite the line for the exclusive-or among uint2 types, as a pair of exclusive-or lines using the .x and .y members of the uint2, as each is an unsigned int type.
you can define a union type to allow access to the data that is currently of type uint2, as either a uint2 or an unsigned long long.
you can overload the ^ exclusive-or operator to work with uint2 types.
you can replace the line that produces the error with asm statements to generate the PTX code to perform the exclusive-or for you. See http://docs.nvidia.com/cuda/inline-ptx-assembly/index.html#using-inline-ptx-assembly-in-cuda
uint2 is simply a struct; you'll need to implement ^ using a[].x and a[].y. I couldn't find where the built-in declarations are, but "Are there advantages to using the CUDA vector types?" has a good description of their use.

Blowfish encryption function(BF_encrypt) in C

I am trying to understand the BlowFish Encryption algorithm. I am having some problems which are described below in full details:
Here are functions:
void BF_encrypt (data, encrypt)
BF_LONG *data;   /* data is an array of two members, i.e. ti[0] and ti[1];
                    BF_LONG is defined as unsigned long in the header file */
int encrypt;     /* encrypt is defined as 1 */
{
    register BF_LONG l, r, *p, *s;
    p = key_P;        /* key_P is declared as BF_LONG key_P[16 + 2]; */
    s = &(key_S[0]);  /* key_S is declared as BF_LONG key_S[4 * 256]; */
    l = data[0];      /* data[0] = ti[0] */
    r = data[1];      /* data[1] = ti[1] */
    l ^= p[0];        /* bitwise XOR of l with p[0] */
    BF_ENC (r, l, s, p[1]);
}
And here is BF_ENC function:
#define BF_ENC(LL,R,S,P) \
LL^=P; \
LL^=((( S[ (R>>24L) ] + \
S[0x0100+((R>>16L)&0xff)])^ \
S[0x0200+((R>> 8L)&0xff)])+ \
S[0x0300+((R )&0xff)])&0xffffffff;
Here are my questions:
What does "\" mean at the end of every line in BF_ENC?
R>>24L: I get that it is shifting to the right. L means long here, but I don't understand its presence. What is the effect of L on the data?
Lastly, what is the BF_ENC function doing overall? Just an overview, not in depth.
If you need any more information, I am here.
Thanks!
BF_ENC is a C-style macro. Macros are usually delimited by the end-of-line character; the \ escape character indicates that the next line should be considered part of the same line. So everything up to 0xffffffff; is part of the macro.
The effect of L is that in many languages the primitive type of the result may depend on both operands. The L makes 24 (and 16 and 8) literals of type long. This makes sure that the result is of the correct primitive type (of at least 32 bits); in particular, it makes sure that the result is large enough to store the 32 bits of information.
For shifts this doesn't make much sense (you wouldn't expect a shift of more than 64 bits, so the result could just have the type of the left operand), but language designers like symmetry with operators such as + and *, and dislike special cases.
Looking at a previous question I found:
yes, this was my thought as well. Afair, this really worked in K&R (1 << 1L was of type long like 1 + 1L is in ISO C), which is why I asked where the code is from...
so make sure you verify that this macro works correctly in your environment. To understand Blowfish, you may be better off looking at more modern code that doesn't contain macros at all.
BF_ENC shows Blowfish's F function, which is the round function (Feistel function) of Blowfish.
It's on the upper right of the Wikipedia page about Blowfish and is described as:
The diagram to the upper right shows Blowfish's F-function. The function splits the 32-bit input into four eight-bit quarters, and uses the quarters as input to the S-boxes. The outputs are added modulo 2^32 and XORed to produce the final 32-bit output.

CUDA __float_as_int in acosf implementation

CUDA C's math function implementation (cuda/math_functions.h) of acosf contains the passage:
if (__float_as_int(a) < 0) {
t1 = CUDART_PI_F - t1;
}
where a and t1 are floats and CUDART_PI_F is a float previously set to a numerical value close to the mathematical constant Pi.
I am trying to understand what the conditional (if-clause) is testing for, and what the C equivalent of it or of the function/macro __float_as_int(a) would be. I searched for the implementation of __float_as_int() but without success. It seems that __float_as_int() is a built-in macro or function of NVIDIA's NVCC. Looking at the PTX that NVCC produces from the above passage:
.reg .u32 %r<4>;
.reg .f32 %f<46>;
.reg .pred %p<4>;
// ...
mov.b32 %r1, %f1;
mov.s32 %r2, 0;
setp.lt.s32 %p2, %r1, %r2;
selp.f32 %f44, %f43, %f41, %p2;
it becomes clear that __float_as_int() is not a float-to-int rounding. (That would have yielded a cvt.s32.f32.) Instead it assigns the float %f1 as a bit-copy (b32) to %r1 (notice: %r1 is of type u32 (unsigned int)!!) and then compares %r1 as if it were an s32 (signed int, confusing!!) with %r2 (whose value is 0).
To me this looks a little odd. But obviously it is correct.
Can someone explain what's going on, and especially what __float_as_int() is doing in the context of the if-clause testing for negativity (< 0)? And provide a C equivalent of the if-clause and/or the __float_as_int() macro?
__float_as_int reinterprets a float as an int. An int is < 0 when its most significant bit is set. For a float this also means that the sign bit is set, but it does not necessarily mean the number is negative (e.g. it can be 'negative zero'). It can be faster than checking whether the float is < 0.0.
C function could look like:
int __float_as_int(float in) {
    union fi { int i; float f; } conv;
    conv.f = in;
    return conv.i;
}
In some other versions of this header, __cuda___signbitf is used instead.

Partial, or wrapped multiplication - can anyone identify this function?

I am hoping for insight into what looks like a partial multiplication.
#define LOW(x) ((x)&0xffffffff)
#define HIGH(x) ((x)>>32)
unsigned long long NotMultiply(unsigned long long x, unsigned long long y)
{
    return HIGH(x)*HIGH(y) + LOW(x)*LOW(y);
}
This function is iterated multiple times, as follows:
unsigned long long DoBusyWork(unsigned long long x, unsigned long long y, int n)
{
    while (n--)
        x = NotMultiply(x, y);
    return x;
}
Are there any shortcuts for calculating this result ?
What about for the case where x == y?
Any links to more information would help..
Looks like a strange hash calculation. It takes the lower 32 bits of the two numbers and multiplies them, takes the higher 32 bits (moved down to the lower places) and multiplies them, and returns the sum of the two products.
I don't think you can make it simpler, but you can probably make it faster: you can break out of the while loop if it returns the same value again.
unsigned long long DoBusyWork(unsigned long long x, unsigned long long y, int n)
{
    unsigned long long previousX = x;
    while (n--)
    {
        x = NotMultiply(x, y);
        if (x == previousX) break;
        previousX = x;
    }
    return x;
}
I don't know if there is a high chance to finish the loop early.
DoBusyWork (as @RBerteig has suggested, the name is a red flag) may be a way to force the compiler not to optimize away a busy-loop.
These darned compilers are getting so smart that sometimes they decide, "Oh! You don't need a loop there! I see what you're trying to calculate!" despite your real interest as a programmer.
This is an early form of random number generator, often used on minicomputers and small mainframes. The call might look something like this:
unsigned long long seedold = 0xA5A5A5A5A5A5A5A5;
unsigned long long seednew = 0X5A5A5A5A5A5A5A5A;
unsigned long long lltmp;
int finetune;
Randomize finetune by timing the keyboard or some similar truly random but slow method,
then call once like this:
lltmp = DoBusyWork( seedold, seednew, finetune );
seedold = seednew;
seednew = lltmp;
Subsequently use it as a PRNG by calling it like this:
lltmp = DoBusyWork( seedold, seednew, 1 );
seedold = seednew;
seednew = lltmp;
Use seednew as the PRN.
Von Neumann once advocated this kind of calculation for "Monte-Carlo" testing applications but later changed his mind when he learned more about analyzing the output of PRNGs.
-Al.
LOW takes only the bottom 32 bits. HIGH moves the next 32 bits down by 32. The whole NotMultiply routine multiplies the bottom 32 bits of x and y together and adds the product of the top 32 bits. DoBusyWork does it n times.
If x == y, then you get HIGH(x)² + LOW(x)².
I have no idea why they'd want to do that, though. It's sort of mashing together the top and bottom halves of x and y.