How to optimize the KASUMI cipher S-boxes? - c

I am trying to optimize an implementation of the KASUMI cipher written in C.
The cipher uses S-boxes to encrypt the data, and I represent them as two large arrays:
int S7[128] = {
54, 50, 62, 56, 22, 34, 94, 96, 38, 6, 63, 93, 2, 18,123, 33,
55,113, 39,114, 21, 67, 65, 12, 47, 73, 46, 27, 25,111,124, 81,
53, 9,121, 79, 52, 60, 58, 48,101,127, 40,120,104, 70, 71, 43,
20,122, 72, 61, 23,109, 13,100, 77, 1, 16, 7, 82, 10,105, 98,
117,116, 76, 11, 89,106, 0,125,118, 99, 86, 69, 30, 57,126, 87,
112, 51, 17, 5, 95, 14, 90, 84, 91, 8, 35,103, 32, 97, 28, 66,
102, 31, 26, 45, 75, 4, 85, 92, 37, 74, 80, 49, 68, 29,115, 44,
64,107,108, 24,110, 83, 36, 78, 42, 19, 15, 41, 88,119, 59, 3
};
int S9[512] = {
167,239,161,379,391,334, 9,338, 38,226, 48,358,452,385, 90,397,
183,253,147,331,415,340, 51,362,306,500,262, 82,216,159,356,177,
175,241,489, 37,206, 17, 0,333, 44,254,378, 58,143,220, 81,400,
95, 3,315,245, 54,235,218,405,472,264,172,494,371,290,399, 76,
165,197,395,121,257,480,423,212,240, 28,462,176,406,507,288,223,
501,407,249,265, 89,186,221,428,164, 74,440,196,458,421,350,163,
232,158,134,354, 13,250,491,142,191, 69,193,425,152,227,366,135,
344,300,276,242,437,320,113,278, 11,243, 87,317, 36, 93,496, 27,
487,446,482, 41, 68,156,457,131,326,403,339, 20, 39,115,442,124,
475,384,508, 53,112,170,479,151,126,169, 73,268,279,321,168,364,
363,292, 46,499,393,327,324, 24,456,267,157,460,488,426,309,229,
439,506,208,271,349,401,434,236, 16,209,359, 52, 56,120,199,277,
465,416,252,287,246, 6, 83,305,420,345,153,502, 65, 61,244,282,
173,222,418, 67,386,368,261,101,476,291,195,430, 49, 79,166,330,
280,383,373,128,382,408,155,495,367,388,274,107,459,417, 62,454,
132,225,203,316,234, 14,301, 91,503,286,424,211,347,307,140,374,
35,103,125,427, 19,214,453,146,498,314,444,230,256,329,198,285,
50,116, 78,410, 10,205,510,171,231, 45,139,467, 29, 86,505, 32,
72, 26,342,150,313,490,431,238,411,325,149,473, 40,119,174,355,
185,233,389, 71,448,273,372, 55,110,178,322, 12,469,392,369,190,
1,109,375,137,181, 88, 75,308,260,484, 98,272,370,275,412,111,
336,318, 4,504,492,259,304, 77,337,435, 21,357,303,332,483, 18,
47, 85, 25,497,474,289,100,269,296,478,270,106, 31,104,433, 84,
414,486,394, 96, 99,154,511,148,413,361,409,255,162,215,302,201,
266,351,343,144,441,365,108,298,251, 34,182,509,138,210,335,133,
311,352,328,141,396,346,123,319,450,281,429,228,443,481, 92,404,
485,422,248,297, 23,213,130,466, 22,217,283, 70,294,360,419,127,
312,377, 7,468,194, 2,117,295,463,258,224,447,247,187, 80,398,
284,353,105,390,299,471,470,184, 57,200,348, 63,204,188, 33,451,
97, 30,310,219, 94,160,129,493, 64,179,263,102,189,207,114,402,
438,477,387,122,192, 42,381, 5,145,118,180,449,293,323,136,380,
43, 66, 60,455,341,445,202,432, 8,237, 15,376,436,464, 59,461
};
During encryption these arrays are accessed very frequently.
One optimization I have already made is moving the arrays from a header file into the function that uses them, in the hope of avoiding some cache misses.
Is there any way to optimize this further, perhaps by replacing the arrays with some other data structure?

Those arrays are not huge. A typical L1 cache is at least tens of kB (that's the total memory on, say, an Apple II), and moving the arrays from a header into a function is not going to change cache locality.
Storing the values in a narrower type (as suggested in the comments) may make sense, since no value needs more than 2 bytes. The tables fit in L1 cache either way, but if you have other data, perhaps used by another thread, smaller tables have a better chance of staying there. I have no idea whether narrower types introduce extra cost compared with native-size ints.
If this is really critical, you should look at the generated code and optimize that.

First of all, make sure you declare those arrays as const, so that the compiler knows they'll never change.
Second, as Oli Charlesworth suggests in the comments, you don't really need a full int to store each value. The elements of the S7 and S9 arrays are 7-bit and 9-bit unsigned integers, so either of int8_t or uint8_t should be enough for S7, and either of int16_t or uint16_t for S9. (You may want to benchmark whether there's any difference between using signed or unsigned types, although I wouldn't really expect any.)
Finally, if you really want to get rid of the arrays entirely, it's also possible to implement the KASUMI S-boxes directly without any lookup tables, using bit operations (specifically, AND and XOR). For details, see pages 13–16 of the KASUMI specification. However, I strongly suspect that this will not be useful for a software implementation, unless you're using bit-slicing to encrypt many blocks in parallel.
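To put a number on the type change suggested above, here is a minimal sketch that compares the footprint of the two tables stored as int with the footprint of a const uint8_t S7[128] and a const uint16_t S9[512]. The function names are made up for illustration, and the size of the int version depends on the platform's sizeof(int).

```c
#include <stddef.h>
#include <stdint.h>

/* Entry counts of the two KASUMI S-boxes shown in the question. */
enum { S7_ENTRIES = 128, S9_ENTRIES = 512 };

/* Footprint of both tables when every entry is a plain int. */
static size_t footprint_int(void)
{
    return S7_ENTRIES * sizeof(int) + S9_ENTRIES * sizeof(int);
}

/* Footprint when S7 uses uint8_t (7-bit values) and S9 uses uint16_t (9-bit values). */
static size_t footprint_narrow(void)
{
    return S7_ENTRIES * sizeof(uint8_t) + S9_ENTRIES * sizeof(uint16_t);
}
```

With a 4-byte int the tables shrink from 2560 bytes to 1152 bytes. Whether that changes the running time is something only a benchmark on the target machine can tell.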

Related

Viability of using the "tv_nsec" nanosecond component returned by "timespec_get()" on Linux as random number generator in C?

With the following simple code snippet:
struct timespec ts;
for (int i = 0; i < 100; i++) {
timespec_get(&ts, TIME_UTC);
printf("%ld, ", ts.tv_nsec % 100);
}
I get output like this:
58, 1, 74, 49, 5, 59, 89, 20, 52, 86, 17, 48, 79, 10, 41, 73, 3, 40, 72, 3, 36, 67, 98, 30, 61, 92, 24, 55, 86, 17, 49, 82, 14, 45, 76, 7, 40, 72, 3, 36, 71, 2, 35, 66, 97, 28, 66, 97, 28, 60, 90, 22, 52, 83, 15, 46, 77, 7, 41, 72, 3, 36, 67, 0, 44, 17, 82, 13, 45, 77, 8, 59, 90, 22, 54, 85, 17, 48, 80, 12, 43, 75, 6, 57, 89, 20, 52, 84, 15, 47, 79, 14, 50, 82, 16, 47, 79, 11, 43, 74,
I haven't studied the statistical distribution of the numbers and my searches have turned up blank, but the output does at first glance look similar to output of rand() or random(). Has anyone studied this or is able to express an opinion - could timespec_get() be used as random number generator? Would it be pseudo random or not? Why?
could timespec_get() be used as random number generator?
Of course. But that doesn't mean the output of such a RNG would have desirable or even acceptable statistical properties.
In particular, successive outputs are strongly correlated with each other. Your example hides that, somewhat, by discarding all the most-significant decimal digits. Additionally, the system clock is not required to have single-nanosecond resolution, though yours seems to have. In a system that didn't have such resolution, the least-significant digits of all results would likely be correlated, and their distribution non-uniform.
Would it be pseudo random or not? Why?
No, actually. The output of a PRNG is deterministic with respect to the runtime state of the calling program at the time of the call. timespec_get(), on the other hand, depends on the program's execution context, not its own state.
The code you have provided almost certainly does not produce (pseudo-)random numbers!
Why?
Consider running this on an efficient CPU that can dedicate 100% of its time to your code (and with nothing else of 'significant impact') going on in the OS background: each run of the for loop executes an identical instruction sequence, so the intervals between successive calls to timespec_get will all be very similar - and a list of numbers with continuously similar intervals is certainly not random.
Even a fairly cursory glance through your generated number list shows that the only time a number is less than its precursor is when the value "rolls over" the 100 mark (this effect will be more noticeable if you increase your modulus from 100 to, say, 500 and run the test again).
could timespec_get() be used as random number generator?
I tried calling timespec_get(&ts, TIME_UTC); multiple times and received delta values of about 14 +/- 1 ns. To me this implies at best a non-predictable-ness (random-ness) of 1 bit per call (given the variability in the delta), not the 7 to 8 bits hoped for with timespec_get(&ts, TIME_UTC); ts.tv_nsec % 100. At worst, there is nearly zero bits of randomness.
.tv_nsec and .tv_sec could be used to initialize a random engine, but as a source, it is very weak.
Would it be pseudo random or not? Why?
No. A PRNG is deterministic. Reading time is not deterministic enough.
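The correlation the answers describe can be checked directly on the numbers posted in the question. The sketch below takes ten consecutive values from that output and measures the forward step between neighbours on a 0..99 ring; the helper names are made up for illustration.

```c
#include <stddef.h>

/* Forward distance from prev to cur on a ring of 0..99. */
static int step_mod100(int prev, int cur)
{
    return ((cur - prev) % 100 + 100) % 100;
}

/* Ten consecutive values copied from the output in the question. */
static const int sample[] = { 89, 20, 52, 86, 17, 48, 79, 10, 41, 73 };

/* Finds the smallest and largest step between neighbours in sample[]. */
static void step_range(int *min_out, int *max_out)
{
    size_t n = sizeof sample / sizeof sample[0];
    int lo = 99, hi = 0;
    for (size_t i = 1; i < n; i++) {
        int d = step_mod100(sample[i - 1], sample[i]);
        if (d < lo) lo = d;
        if (d > hi) hi = d;
    }
    *min_out = lo;
    *max_out = hi;
}
```

Every step in this stretch falls between 31 and 34, so each value is close to the previous one plus a constant. That is consistent with a loop whose iterations take nearly the same time, and it is the opposite of what an unpredictable source would show.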

When trying to remove just one element in a nested numpy array the whole subarray gets deleted

I have a 3 dimensional numpy array (temp_X) like:
[ [[23,34,45,56],[34,45,67,78],[23,45,67,78]],
[[12,43,65,43],[23,54,67,87],[12,32,34,43]],
[[43,45,86,23],[23,45,56,23],[12,23,65,34]] ]
I want to remove the 1st element of each 3rd sub-array (highlighted values).
shown below is the code that i tried:
for i in range(len(temp_X)):
    temp_X = np.delete(temp_X[i][(len(temp_X[i]) - 1)], [0])
Somehow when I run the code the whole array gets deleted except for 3 values. Any help is much appreciated. Thank you in advance.
With a as the 3D input array, here's one way -
m = np.prod(a.shape[1:])
n = m-a.shape[-1]
out = a.reshape(a.shape[0],-1)[:,np.r_[:n,n+1:m]]
Alternative to last step with boolean-indexing -
out = a.reshape(a.shape[0],-1)[:,np.arange(m)!=n]
Sample input, output -
In [285]: a
Out[285]:
array([[[23, 34, 45, 56],
        [34, 45, 67, 78],
        [23, 45, 67, 78]],

       [[12, 43, 65, 43],
        [23, 54, 67, 87],
        [12, 32, 34, 43]],

       [[43, 45, 86, 23],
        [23, 45, 56, 23],
        [12, 23, 65, 34]]])
In [286]: out
Out[286]:
array([[23, 34, 45, 56, 34, 45, 67, 78, 45, 67, 78],
       [12, 43, 65, 43, 23, 54, 67, 87, 32, 34, 43],
       [43, 45, 86, 23, 23, 45, 56, 23, 23, 65, 34]])
Here's another with mask creation to mask along the last two axes -
mask = np.ones(a.shape[-2:],dtype=bool)
mask[-1,0] = 0
out = np.moveaxis(a,0,-1)[mask].T

Logarithmic scale step

I'm building a keyboard light with AVR micro controller.
There are two buttons, BRIGHT and DIM, and a white LED.
The LED isn't really linear, so I need to use a logarithmic scale (increase brightness faster in higher values, and use tiny steps in lower).
To do that, I vary the delay between successive increments or decrements of the PWM compare match register.
while (1) {
    if (btn_high() && OCR0A < 255) OCR0A += 1;
    if (btn_low() && OCR0A > 0) OCR0A -= 1;
    if (OCR0A < 25)
        _delay_ms(30);
    else if (OCR0A < 50)
        _delay_ms(25);
    else if (OCR0A < 128)
        _delay_ms(17);
    else
        _delay_ms(5);
}
It works nice, but there's a visible step when it goes from one speed to another. It'd be much better if the delay adjusted smoothly.
Is there some simple formula I can use?
It must not contain division, modulo, sqrt, log or any other advanced math. I can use multiplication, add, sub, and bit operations. Also, I can't use float in it.
Or perhaps just some kind of lookup table? I'm not really happy with adding more branches to this if-else mess.
The posted transfer function is quite linear. Suggest a linear delay calculation.
delay = 32 - OCR0A/8;
Edit (after this answer was accepted):
Various look-up tables lend themselves to a close fit by a simple equation (constructed to avoid intermediate values > 65535), such as
BRIGHTNESS_60 = ((((index*index)>>2) + 128)*index)>>8;
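In C the + operator binds tighter than >>, so the shift by 2 needs its own parentheses for the formula to add 128 after shifting. Here is a minimal sketch of that fit as a function, assuming the index runs from 0 to 60; the function name is made up for illustration.

```c
#include <stdint.h>

/* Cubic fit for index 0..60. The largest intermediate value is
   1028 * 60 = 61680, which still fits in 16 bits. Larger indexes
   would overflow the 16-bit product. */
static uint8_t brightness_approx(uint8_t index)
{
    uint16_t sq = (uint16_t)index * index;           /* at most 3600 */
    uint16_t scaled = (uint16_t)((sq >> 2) + 128);   /* at most 1028 */
    return (uint8_t)((uint16_t)(scaled * index) >> 8);
}
```

The fit tops out at 240 for index 60 rather than 255, so it is an approximation of a table, not a replacement for one.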
The scaling isn't quite logarithmic so simply using log() isn't enough.
I have tackled this problem in the past by using a LUT with 18 entries and going an entire step at a time (i.e. the control variable varies from 0 to 17 and then is shoved through the LUT), but if finer control is required then having 52 or more is certainly doable. Make sure to put it in flash so that it doesn't consume any SRAM though.
Edit by MightyPork
Here are the arrays I used in the end, obtained from the original array by linear interpolation.
Basic
#define BRIGHTNESS_LEN 60
const uint8_t BRIGHTNESS[] PROGMEM = {
0, 1, 1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9,
10, 11, 13, 14, 16, 18, 21, 24, 27, 30, 32,
35, 38, 40, 42, 45, 48, 50, 54, 58, 61, 65,
69, 72, 76, 80, 85, 90, 95, 100, 106, 112,
119, 125, 134, 142, 151, 160, 170, 180, 190,
200, 214, 228, 241, 255
};
Smoother
#define BRIGHTNESS_LEN 121
const uint8_t BRIGHTNESS[] PROGMEM = {
0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5,
6, 6, 6, 7, 7, 8, 8, 8, 9, 10, 10, 10, 11, 12, 13, 14, 14,
15, 16, 17, 18, 20, 21, 22, 24, 26, 27, 28, 30, 31, 32, 34,
35, 36, 38, 39, 40, 41, 42, 44, 45, 46, 48, 49, 50, 52, 54,
56, 58, 59, 61, 63, 65, 67, 69, 71, 72, 74, 76, 78, 80, 82,
85, 88, 90, 92, 95, 98, 100, 103, 106, 109, 112, 116, 119,
122, 125, 129, 134, 138, 142, 147, 151, 156, 160, 165, 170,
175, 180, 185, 190, 195, 200, 207, 214, 221, 228, 234, 241,
248, 255
};
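One way to drive the LED from such a table is to let the buttons move a step index and look the duty cycle up from it. The sketch below uses the "Basic" table and keeps it in ordinary memory so that it also builds on a PC. On the AVR the array would carry PROGMEM and be read with pgm_read_byte(). The helper names are made up for illustration.

```c
#include <stdint.h>

/* The "Basic" table from above, without PROGMEM so the sketch is portable. */
static const uint8_t brightness[] = {
    0, 1, 1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9,
    10, 11, 13, 14, 16, 18, 21, 24, 27, 30, 32,
    35, 38, 40, 42, 45, 48, 50, 54, 58, 61, 65,
    69, 72, 76, 80, 85, 90, 95, 100, 106, 112,
    119, 125, 134, 142, 151, 160, 170, 180, 190,
    200, 214, 228, 241, 255
};

/* Highest valid step, derived from the table so the two cannot drift apart. */
enum { LAST_STEP = sizeof brightness / sizeof brightness[0] - 1 };

/* One press of BRIGHT: move up a step, stopping at the top. */
static uint8_t step_up(uint8_t step)
{
    return step < LAST_STEP ? step + 1 : step;
}

/* One press of DIM: move down a step, stopping at zero. */
static uint8_t step_down(uint8_t step)
{
    return step > 0 ? step - 1 : step;
}

/* PWM compare value for a step; the caller would write this to OCR0A. */
static uint8_t duty_for(uint8_t step)
{
    return brightness[step];
}
```

With this arrangement the delay between steps can stay constant, because the table itself supplies the non-linear spacing.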
It sounds like you really want to use some linear function of a logarithm, but without the overhead of the floating point math library. A crude fixed point logarithm can be coded as
#include <stdint.h>

/* Integer part of log2(in); returns 0 for in == 0. */
uint8_t log2fix(uint8_t in)
{
    if (in == 0)
        return 0;
    uint8_t out = 0;
    while (in > 0)
    {
        in = in >> 1;
        out++;
    }
    return out - 1;
}
This will give you a rough approximation. If you want more precision there is a fast fixed point algorithm that you should be able to modify for Q8.0 to Q3.5.
You have over-complicated the issue. You have already turned the logarithmic problem into a linear one by defining a variable update rate rather than a variable PWM step - so you have essentially solved the problem, but not seen the simple arithmetic relationship.
If you take the OCR0A vs delay points you have selected, (25,30), (50,25), (128,17), you can see an approximately linear relationship, roughly y = -0.125x + 32, which can be rewritten as y = 32 - x / 8
So what you need is:
while (1)
{
    if (btn_high() && OCR0A < 255) OCR0A += 1;
    if (btn_low() && OCR0A > 0) OCR0A -= 1;
    _delay_ms( 32 - OCR0A / 8 ) ;
}
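The linear rule can be written with a shift instead of a division. One caveat: avr-libc documents _delay_ms() as intended for arguments that are known at compile time, so a variable delay is usually built by repeating a fixed 1 ms delay. The sketch below shows both pieces. The names are made up for illustration, and wait_1ms stands for a hypothetical wrapper around _delay_ms(1).

```c
#include <stdint.h>

/* Delay in ms for a compare value: 32 at OCR0A = 0, falling to 1 at 255. */
static uint8_t delay_for(uint8_t ocr)
{
    return (uint8_t)(32 - (ocr >> 3));
}

/* Waits ms milliseconds by calling a fixed 1 ms delay routine repeatedly.
   On the AVR, wait_1ms would be a small function that calls _delay_ms(1). */
static void delay_variable_ms(uint8_t ms, void (*wait_1ms)(void))
{
    while (ms--) {
        wait_1ms();
    }
}
```

At the three points from the question this gives 29, 26 and 16 ms, against the original 30, 25 and 17 ms.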

Defining long arrays in C

I want to define a very long array in C and, at the same time, write the elements across several lines rather than all on one line. The code block below illustrates the situation. Which character should I use at the end of a line to continue the array definition? "/" works, but the elements just before and after it are not printed; a zero or a one is printed instead. How can I accomplish this?
int i, grades[40] = {49, 80, 84, 73, 89, 78, 78, 92, 56, 85, 10, 84, 59, 56
62, 53, 83, 81, 65, 81, 69, 69, 53, 55, 77, 82, 81, 76, 79, 83, 74, 86
78, 55, 66, 60, 68, 92, 87, 86};
Thanks for your contribution. I am on Ubuntu 12.04 by the way (thought the end line character may be different for Ubuntu and Windows).
Edit: Comma was the culprit. Sorry to take your time.
Escaping new-lines is only necessary in pre-processor directives and inside strings. Just enter it with all the white-space you like. As long as the compiler doesn't complain you should be good.
As discussed in the other answer the compiler is smart enough to recognize if a statement extends over several lines.
However, there actually is an explicit way to tell the compiler that the statement continues on the next line: a backslash (\) as the last character of the line.
In your example you used a slash (/) instead of a backslash and forgot the comma. This led to an integer division, resulting in the 0 and 1 you observed.
If you want to use the '\' you could write the code like this:
int i, grades[40] = {49, 80, 84, 73, 89, 78, 78, 92, 56, 85, 10, 84, 59, 56, \
62, 53, 83, 81, 65, 81, 69, 69, 53, 55, 77, 82, 81, 76, 79, 83, 74, 86, \
78, 55, 66, 60, 68, 92, 87, 86};
In most cases, C compilers don't care about whitespace (except inside strings, etc.). That means that as long as the syntax is still valid, you can have as much whitespace as you'd like. The only constraint is that whitespace is required where two adjacent tokens would otherwise run together, such as between int and i.
So, in short, you can write out the array with as many spaces, tabs, and newlines as you'd like, provided there is a comma after every element before the last one.
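Here is a small sketch of both points: the same forty grades split over three lines with a comma at every line end, and the two divisions that the "/" version produced instead. The names grade_count and merged are made up for illustration.

```c
#include <stddef.h>

/* Forty grades over three lines; only the commas matter, not the line breaks. */
static const int grades[] = {
    49, 80, 84, 73, 89, 78, 78, 92, 56, 85, 10, 84, 59, 56,
    62, 53, 83, 81, 65, 81, 69, 69, 53, 55, 77, 82, 81, 76, 79, 83, 74, 86,
    78, 55, 66, 60, 68, 92, 87, 86
};

/* Number of elements, computed by the compiler from the initializer. */
static const size_t grade_count = sizeof grades / sizeof grades[0];

/* With '/' in place of ',' at the line ends, each pair of neighbours
   collapses into a single integer division. */
static const int merged[] = { 56 / 62, 86 / 78 };
```

Leaving the array size out of the declaration, as done here, lets a missing comma show up as a wrong element count instead of silently zero-filling the tail.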

Rand() seems to not work properly [duplicate]

Possible Duplicate:
Why do I always get the same sequence of random numbers with rand()?
I've been experimenting with generating random numbers in C, and I've come across something weird. I don't know if it's only on my compiler but whenever I try to generate a pseudo-random number with the rand() function, it returns a very predictable number — the number generated with the parameter before plus 3.125 to be exact. It's hard to explain but here's an example.
srand(71);
int number = rand();
printf("%d", number);
This returns 270.
srand(72);
int number = rand();
printf("%d", number);
This returns 273.
srand(73);
int number = rand();
printf("%d", number);
This returns 277.
srand(74);
int number = rand();
printf("%d", number);
This returns 280.
Every eighth number is 4 higher. Otherwise it's 3.
This can't possibly be right. Is there something wrong with my compiler?
Edit: I figured it out — I created a function where I seed only once, then I loop the rand() and it generates random numbers. Thank you all!
The confusion here is about how pseudorandom number generators work.
Pseudorandom number generators like C's rand work by having a number representing the current 'state'. Every time the rand function is called, some deterministic computations are done on the 'state' number to produce the next 'state' number. Thus, if the generator is given the same input (the same 'state'), it will produce the same output.
So, when you seed the generator with srand(74), it will always generate the same string of numbers, every time. When you seed the generator with srand(75), it will generate a different string of numbers, etc.
The common way to ensure different output each time is to always provide a different seed, usually done by seeding the generator with the current time in seconds/milliseconds, e.g. srand(time(NULL)).
EDIT: Here is a Python session demonstrating this behavior. It is entirely expected.
>>> import random
If we seed the generator with the same number, it will always output the same sequence:
>>> random.seed(500)
>>> [random.randint(0, 100) for _ in xrange(20)]
[80, 95, 58, 25, 76, 37, 80, 34, 57, 79, 1, 33, 40, 29, 92, 6, 45, 31, 13, 11]
>>> random.seed(500)
>>> [random.randint(0, 100) for _ in xrange(20)]
[80, 95, 58, 25, 76, 37, 80, 34, 57, 79, 1, 33, 40, 29, 92, 6, 45, 31, 13, 11]
>>> random.seed(500)
>>> [random.randint(0, 100) for _ in xrange(20)]
[80, 95, 58, 25, 76, 37, 80, 34, 57, 79, 1, 33, 40, 29, 92, 6, 45, 31, 13, 11]
If we give it a different seed, even a slightly different one, the numbers will be totally different from the old seed, yet still the same if the same (new) seed is used:
>>> random.seed(501)
>>> [random.randint(0, 100) for _ in xrange(20)]
[64, 63, 24, 81, 33, 36, 72, 35, 95, 46, 37, 2, 76, 21, 46, 68, 47, 96, 39, 36]
>>> random.seed(501)
>>> [random.randint(0, 100) for _ in xrange(20)]
[64, 63, 24, 81, 33, 36, 72, 35, 95, 46, 37, 2, 76, 21, 46, 68, 47, 96, 39, 36]
>>> random.seed(501)
>>> [random.randint(0, 100) for _ in xrange(20)]
[64, 63, 24, 81, 33, 36, 72, 35, 95, 46, 37, 2, 76, 21, 46, 68, 47, 96, 39, 36]
How do we make our program have different behavior each time? If we supply the same seed, it will always behave the same. We can use the time.time() function, which will yield a different number each time we call it:
>>> import time
>>> time.time()
1347917648.783
>>> time.time()
1347917649.734
>>> time.time()
1347917650.835
So if we keep re-seeding it with a call to time.time(), we will get a different sequence of numbers each time, because the seed is different each time:
>>> random.seed(time.time())
>>> [random.randint(0, 100) for _ in xrange(20)]
[60, 75, 60, 26, 19, 70, 12, 87, 58, 2, 79, 74, 1, 79, 4, 39, 62, 20, 28, 19]
>>> random.seed(time.time())
>>> [random.randint(0, 100) for _ in xrange(20)]
[98, 45, 85, 1, 67, 25, 30, 88, 17, 93, 44, 17, 94, 23, 98, 32, 35, 90, 56, 35]
>>> random.seed(time.time())
>>> [random.randint(0, 100) for _ in xrange(20)]
[44, 17, 10, 98, 18, 6, 17, 15, 60, 83, 73, 67, 18, 2, 40, 76, 71, 63, 92, 5]
Of course, even better than constantly re-seeding it is to seed it once and keep going from there:
>>> random.seed(time.time())
>>> [random.randint(0, 100) for _ in xrange(20)]
[94, 80, 63, 66, 31, 94, 74, 15, 20, 29, 76, 90, 50, 84, 43, 79, 50, 18, 58, 15]
>>> [random.randint(0, 100) for _ in xrange(20)]
[30, 53, 75, 19, 35, 11, 73, 88, 3, 67, 55, 43, 37, 91, 66, 0, 9, 4, 41, 49]
>>> [random.randint(0, 100) for _ in xrange(20)]
[69, 7, 25, 68, 39, 57, 72, 51, 33, 93, 81, 89, 44, 61, 78, 77, 43, 10, 33, 8]
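The same behavior can be seen in C. The sketch below seeds once, draws several values, and compares two runs; the helper names are made up for illustration. The C standard guarantees that calling srand() with the same seed reproduces the same sequence from rand(). It does not promise that different seeds give very different first values, which matches what the question observed.

```c
#include <stdlib.h>

/* Seeds the generator once, then draws n values into out[]. */
static void draw_sequence(unsigned seed, int *out, int n)
{
    srand(seed);
    for (int i = 0; i < n; i++) {
        out[i] = rand();
    }
}

/* Returns 1 when runs started from the two seeds give identical sequences. */
static int same_sequence(unsigned seed_a, unsigned seed_b)
{
    int a[8], b[8];
    draw_sequence(seed_a, a, 8);
    draw_sequence(seed_b, b, 8);
    for (int i = 0; i < 8; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

In a real program srand() would be called a single time at startup, for example with time(NULL), and every later value would come from rand() alone.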
Every invocation of rand() returns the next number in a predefined sequence whose starting point is determined by the seed supplied to srand(). That's why it's called a pseudo-random number generator, and not a random number generator.
rand() is implemented by a pseudo random number generator.
The distribution of numbers generated by consecutive calls to rand() has the properties of random numbers, but the order is pre-determined.
The 'start' number is determined by the seed that you provide.
You should give a PRNG a single seed only. Providing it with multiple seeds can radically alter the randomness of the generator. In addition, providing it the same seed over and over removes all randomness.
Generating a "random" number regardless of the implementation is dependent on a divergent infinite sequence. The infinite sequence is generated using the seed of the random function and it is actually pseudo random because of its nature. This would explain to you why your number is actually very dependent on the seed that you give the function.
In some implementations there is a single sequence and the seed selects the starting position in it. In others, different seeds give different sequences. In C, if srand() is never called, rand() behaves as if srand(1) had been called; some other libraries seed from the system clock instead.
A raw value is usually reduced to a range with a lower and an upper bound by computing randValue % upperBound and then adding lowerBound. PRNG implementations are very similar to hash functions. Unless you reduce it yourself, the upper bound of the raw value is fixed by the implementation (RAND_MAX for C's rand()) and is typically tied to the largest integer the generator's arithmetic can hold.
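Here is a minimal sketch of the range reduction described above, for an inclusive range. The function name is made up for illustration, and it assumes lo <= hi and a span no larger than RAND_MAX. Plain modulo favors some values slightly when the span does not divide RAND_MAX + 1 evenly, which is usually acceptable for casual use.

```c
#include <stdlib.h>

/* Maps rand() into the inclusive range [lo, hi].
   Assumes lo <= hi and hi - lo < RAND_MAX. */
static int rand_between(int lo, int hi)
{
    int span = hi - lo + 1;        /* number of distinct results */
    return lo + rand() % span;     /* rand() % span is in 0..span-1 */
}
```

For a six-sided die this would be called as rand_between(1, 6) after a single srand() at program start.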

Resources