I am working with cryptography and need to use some really large numbers. I am also using the new Intel instruction for carryless multiplication, which requires the m128i data type; that type is loaded with a function that takes floating point data as its arguments.
I need to store an integer as large as 2^1223, then square it and store that value as well.
I know I can use the GMP library, but I think it would be faster to create two data types that store values like 2^1224 and 2^2448; that will have less overhead. I am going to use Karatsuba to multiply the numbers, so the only operation I need to perform on the data type is addition, as I will be breaking the number down to fit m128i.
Can someone point me towards material that can help me create an integer type of the size I need?
If you need your own datatypes (regardless of whether it's for math, etc), you'll need to fall back to structures and functions. For example:
struct bignum_s {
    char bignum_data[1024];
};
(obviously you want to get the sizing right, this is just an example)
Most people end up typedefing it as well:
typedef struct bignum_s bignum;
And then create functions that take two (or whatever) pointers to the numbers to do what you want:
/* takes two bignums and ORs them together, putting the result back into a */
void
bignum_or(bignum *a, const bignum *b) {
    size_t i; /* size_t matches the type of sizeof and avoids a signed/unsigned comparison */
    for(i = 0; i < sizeof(a->bignum_data); i++) {
        a->bignum_data[i] |= b->bignum_data[i];
    }
}
You really want to end up defining nearly every function you might need, and this frequently includes memory allocation functions (bignum_new), memory freeing functions (bignum_free) and init routines (bignum_init). Even if you don't need them now, doing it in advance will set you up for when the code needs to grow and develop later.
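Since the question says addition is the only operation needed, here is a minimal sketch of what an addition routine could look like. It assumes a layout that is not in the answer above: 64-bit limbs stored least significant first, in a hypothetical type called bignum1280 (20 limbs, which is enough room for a 1224-bit value). Treat it as an illustration of carry propagation rather than a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define BIGNUM_LIMBS 20  /* assumption: 20 * 64 = 1280 bits, room for 1224 bits */

/* Hypothetical fixed-size big integer, least significant limb first. */
typedef struct {
    uint64_t limb[BIGNUM_LIMBS];
} bignum1280;

/* Computes a += b and returns the carry out of the top limb (0 or 1). */
unsigned bignum_add(bignum1280 *a, const bignum1280 *b)
{
    unsigned carry = 0;
    for (size_t i = 0; i < BIGNUM_LIMBS; i++) {
        uint64_t sum = a->limb[i] + b->limb[i];
        unsigned c1 = sum < a->limb[i];  /* wrapped while adding the limbs */
        sum += carry;
        unsigned c2 = sum < carry;       /* wrapped while adding the carry */
        a->limb[i] = sum;
        carry = c1 | c2;                 /* at most one of the two can be set */
    }
    return carry;
}
```

A second type with twice as many limbs would hold the squared value; the same loop works for it with a different limb count.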
Is it possible to create arrays based on their index as in
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;
dynamically/on the run, without creating foo[0...3][0...4]?
If not, is there a data structure that allow me to do something similar to this in C?
No.
As written, your code makes no sense at all. You need foo to be declared somewhere, and then you can index into it with foo[x][y] = someNr;. But you can't just make foo spring into existence, which is what it looks like you are trying to do.
Either create foo with the correct sizes (only you can say what they are), int foo[16][16]; for example, or use a different data structure.
In C++ you could do a map<pair<int, int>, int>
Variable Length Arrays
Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:
int fixed[3][4] = { someNr };
or similar (extra braces, perhaps; more values perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them at all. So, you could write:
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];
for (int i = 0; i < x; i++)
{
for (int j = 0; j < y; j++)
foo[i][j] = someNr + i * (x + 1) + j;
}
Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.
VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.
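The dynamic allocation of VLAs mentioned above can be sketched as follows. The function name matrix_create is made up for this example, and it assumes a compiler with VLA support and a cols value greater than zero. The matrix lives in one heap block, so the stack size no longer limits it.

```c
#include <stdlib.h>

/* Allocates a rows x cols matrix of int in a single heap block and sets
   every cell to value. Returns NULL on failure; release it with free(). */
void *matrix_create(size_t rows, size_t cols, int value)
{
    /* m points to rows of cols ints each; only the pointer is on the stack */
    int (*m)[cols] = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            m[i][j] = value;
    return m;
}
```

The caller assigns the result to a pointer of matching shape, for example int (*foo)[y] = matrix_create(x, y, someNr);, and can then write foo[i][j] as with an ordinary two-dimensional array.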
Sparse arrays
If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instructions per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).
You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.
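One very small way to realise the structure and functions described above is sketched here. All names (sparse_array, sparse_new, sparse_set, sparse_get, sparse_free) are invented for the example, and it makes simplifying assumptions: cells hold int, only explicitly assigned cells are stored, and lookup is a linear scan, which is only reasonable for a modest number of entries.

```c
#include <stdlib.h>

typedef struct { size_t row, col; int value; } sparse_cell;

typedef struct {
    sparse_cell *cells;    /* only the cells that were explicitly set */
    size_t count, capacity;
    int default_value;     /* reported for cells with no explicit entry */
} sparse_array;

sparse_array *sparse_new(int default_value)
{
    sparse_array *s = malloc(sizeof *s);
    if (s != NULL) {
        s->cells = NULL;
        s->count = s->capacity = 0;
        s->default_value = default_value;
    }
    return s;
}

void sparse_free(sparse_array *s)
{
    if (s != NULL) {
        free(s->cells);
        free(s);
    }
}

int sparse_get(const sparse_array *s, size_t row, size_t col)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->cells[i].row == row && s->cells[i].col == col)
            return s->cells[i].value;
    return s->default_value;
}

/* Returns 0 on success, -1 if memory could not be obtained. */
int sparse_set(sparse_array *s, size_t row, size_t col, int value)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->cells[i].row == row && s->cells[i].col == col) {
            s->cells[i].value = value;  /* overwrite an existing entry */
            return 0;
        }
    }
    if (s->count == s->capacity) {      /* grow the storage geometrically */
        size_t cap = s->capacity ? s->capacity * 2 : 16;
        sparse_cell *p = realloc(s->cells, cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->cells = p;
        s->capacity = cap;
    }
    s->cells[s->count++] = (sparse_cell){ row, col, value };
    return 0;
}
```

With this, sparse_set(s, 4, 5, 123) stores a single cell without ever creating the other rows and columns. A hash table or a per-row sorted list would scale better than the linear scan.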
The most efficient way to create a dynamic index of an array is to create an empty array of the same data type that the array to index is holding.
Let's imagine we are using integers for the sake of simplicity. You can then stretch the concept to any other data type.
The ideal index depth will depend on the length of the data to index and will be somewhere close to the length of the data.
Let's say you have 1 million 64 bit integers in the array to index.
First of all you should sort the data and eliminate duplicates. That's easy to achieve by using qsort() (the C standard library sorting function, which needs a comparison function for your element type) and some duplicate-removal function such as
/* Copies the already sorted array unord_arr into ord_arr, skipping duplicates.
   Returns the number of unique values written to ord_arr. */
uint64_t remove_dupes(const uint64_t *unord_arr, uint64_t *ord_arr, uint64_t arr_size)
{
    uint64_t i, j = 0;
    for (i = 0; i < arr_size; i++)
    {
        /* keep a value only if it differs from the one before it */
        if (i == 0 || unord_arr[i] != unord_arr[i-1]) {
            ord_arr[j] = unord_arr[i];
            j++;
        }
    }
    return j;
}
Adapt the code above to your needs. Despite its name, unord_arr must already be sorted (it is the array that still contains duplicates); you should free() it once the function has finished copying the unique values to the ordered array. The function makes a single pass over the data, so it is very fast.
Once the data is ordered and unique, create an index with a length close to that of the data. It does not need to be of an exact length, although sticking to powers of 10 will make everything easier in the case of integers.
uint64_t* index = calloc((size_t)pow(10, indexdepth), sizeof(uint64_t));
This will create an empty index array.
Then populate the index. Traverse the sorted array just once and, every time the leading significant digits (as many as the index depth) change, store in the index the position at which the new leading digits were first seen.
If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.
When you detect that some number starts with 10 (103456), you add an entry to the index. Let's say that 103456 was detected at position 733; your index entry would be:
index[10] = 733;
The next entry, beginning with 11, should be added in the next index slot. Let's say that the first number beginning with 11 is found at position 2023:
index[11] = 2023;
And so on.
When you later need to find some number in your original array storing 1 million entries, you don't have to iterate over the whole array; you just need to check where in your index the first number starting with the same two significant digits is stored. Entry index[10] tells you where the first number starting with 10 is stored. You can then iterate forward until you find your match.
In my example I employed a small index, thus the average number of iterations that you will need to perform will be 1000000/100 = 10000.
If you enlarge your index to somewhere close to the length of the data, the number of iterations will tend to 1, making any search blazing fast.
What I like to do is to create some simple algorithm that tells me what's the ideal depth of the index after knowing the type and length of the data to index.
Please note that in the example I have given, 64-bit numbers are indexed by their first index-depth significant figures, thus 10 and 100001 will be stored in the same index segment. That's not a problem on its own; nonetheless, each master has his small book of secrets. Treating numbers as fixed-length hexadecimal strings can help keep a strict numerical order.
You don't have to change the base, though: you could consider 10 to be 0000010 to keep it in the 00 index segment and keep base 10 numbers ordered. Using different numerical bases is nonetheless trivial in C, which is of great help for this task.
As you make your index depth larger, the number of entries per index segment is reduced.
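The scheme described above can be sketched as follows, under assumptions that are mine rather than the answer's: the data is sorted and unique, every value is below 1,000,000, values are treated as zero-padded 6-digit numbers so that numerical order and prefix order agree, and the index depth is 2. The names prefix_of, build_index, index_find and NOT_FOUND are hypothetical. A sentinel marks unused slots, because a zero-filled index cannot distinguish "no entry" from "found at position 0".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define INDEX_SLOTS 100       /* 10^2: index depth of 2 */
#define PREFIX_DIVISOR 10000  /* 10^(6 - 2): strips the 4 trailing digits */
#define NOT_FOUND SIZE_MAX    /* marks an unused slot or a failed search */

/* Two leading digits of v when written as a zero-padded 6-digit number. */
static size_t prefix_of(uint64_t v)
{
    return (size_t)(v / PREFIX_DIVISOR);
}

/* index[p] = position of the first element whose prefix is p. */
size_t *build_index(const uint64_t *data, size_t n)
{
    size_t *index = malloc(INDEX_SLOTS * sizeof *index);
    if (index == NULL)
        return NULL;
    for (size_t p = 0; p < INDEX_SLOTS; p++)
        index[p] = NOT_FOUND;
    for (size_t i = 0; i < n; i++) {
        size_t p = prefix_of(data[i]);
        if (p < INDEX_SLOTS && index[p] == NOT_FOUND)
            index[p] = i;  /* first time this prefix is seen */
    }
    return index;
}

/* Jumps to the start of the key's segment, then scans forward. */
size_t index_find(const uint64_t *data, size_t n, const size_t *index, uint64_t key)
{
    size_t p = prefix_of(key);
    if (p >= INDEX_SLOTS || index[p] == NOT_FOUND)
        return NOT_FOUND;
    for (size_t i = index[p]; i < n && data[i] <= key; i++)
        if (data[i] == key)
            return i;
    return NOT_FOUND;
}
```

Because the data is sorted, the forward scan can stop as soon as it passes the key, so a lookup touches at most one segment.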
Please do note that programming, especially at a lower level as in C, consists in great part of understanding the tradeoff between CPU cycles and memory use.
Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as massive amounts of memory are cheap.
As SSD speeds get closer to that of RAM, using files to store indexes is worth taking into account. Nevertheless, modern OSs tend to load as much as they can into RAM, so using files would end up performing similarly.
This is an extension of the previously asked question: link. In short, I am trying to convert a C program into Matlab and am looking for suggestions to improve the code, as it is not giving the correct output. Did I convert the xor in the best way possible?
C Code:
void rc4(char *key, char *data){
://Other parts of the program
:
:
i = j = 0;
int k;
for (k=0;k<strlen(data);k++){
:
:
has[k] = data[k]^S[(S[i]+S[j]) %256];
}
int main()
{
char key[] = "Key";
char sdata[] = "Data";
rc4(key,sdata);
}
Matlab code:
function has = rc4(key, data)
://Other parts of the program
:
:
i=0; j=0;
for k=0:length(data)-1
:
:
out(k+1) = S(mod(S(i+1)+S(j+1), 256)+1);
v(k+1)=double(data(k+1))-48;
C = bitxor(v,out);
data_show =dec2hex(C);
has = data_show;
end
It looks like you're doing bitwise XOR on 64-bit doubles. [Edit: or not, seems I forgot bitxor() will do an implicit conversion to integer - still, an implicit conversion may not always do what you expect, so my point remains, plus it's far more efficient to store 8-bit integer data in the appropriate type rather than double]
To replicate the C code, if key, data, out and S are not already the correct type you can either convert them explicitly - with e.g. key = int8(key) - or if they're being read from a file even better to use the precision argument to fread() to create them as the correct type in the first place. If this is in fact already happening in the not-shown code then you simply need to remove the conversion to double and let v be int8 as well.
Second, k is being used incorrectly - Matlab arrays are 1-indexed so either k needs to loop over 1:length(data) or (if the zero-based value of k is used as i and j are) then you need to index data by k+1.
(side note: who is x and where did he come from?)
Third, you appear to be constructing v as an array the same size as data - if this is correct then you should take the bitxor() and following lines outside the loop. Since they work on entire arrays you're needlessly repeating this every iteration instead of doing it just once at the end when the arrays are full.
As a general aside, since converting C code to Matlab code can sometimes be tricky (and converting C code to efficient Matlab code very much more so), if it's purely a case of wanting to use some existing non-trivial C code from within Matlab then it's often far easier to just wrap it in a MEX function. Of course if it's more of a programming exercise or way to explore the algorithm, then the pain of converting it, trying to vectorise it well, etc. is worthwhile and, dare I say it, (eventually) fun.
Note: There are posts similar to this for C++ only, I didn't find any useful post in regards to C.
I want to set all the elements of an array to the same value. Of course, this can be achieved simply using a for loop.
But that consumes a lot of time, because in my algorithm this setting of an array to a single value takes place many times. Is there any simple way to achieve this in C?
Use a for loop. Any decent compiler will optimize this as much as possible.
It is a near certainty that you wouldn't be able to improve substantially on the speed of your for loop. There is no magic way to set a value into multiple memory locations faster than it takes to store that value into these multiple memory locations. Regardless of whether you use the for loop or not, all the locations must be written to, which takes most of the time.
There is of course the void * memset ( void * ptr, int value, size_t num ); for values composed of identical bytes [1], but under the hood it has a loop. Perhaps the implementation could be very smart about using that loop, but so can the optimizing compiler.
[1] Although memset takes an int, it converts it to unsigned char before setting it into the memory region.
As suggested by other users, use memset if you want to initialize your array with 0 values, but don't do it if the values are not that simple.
For more complicated values, you can have a constant copy of your initial values and copy them later with memcpy:
float original_values[100]; // don't modify these values
original_values[0] = 1.2f;
original_values[1] = 10.9f;
...
float working_values[100]; // work with these values
memcpy(working_values, original_values, 100 * sizeof(float));
// do your task
working_values[0] *= working_values[1];
...
You can use memset(). It fills the number of bytes you specify with the same byte value; you can read its man page for details.
You can use memset() function
Example:
memset(<array-name>,<initialization-value>,<len>);
You can easily memset an array to 0.
If you want a different value, it all depends on the type used.
For char arrays you can memset them to any value, since a char is by definition exactly one byte long.
For an array of structures, if all fields of a structure are to be initialized with 0 or NULL, you can memset it with 0.
You cannot memset an array of multi-byte elements (or an array of structures) to an arbitrary value, because memset operates on single bytes: it only produces the value you expect when every byte of that value is the same, as with 0. So if you memset an int[] with 1, you will not have an array of 1's; with a 4-byte int, each element becomes 0x01010101.
To initialize an array of structures with a custom value, just fill one structure with the desired data and assign it to each element in a for loop. The compiler should do it relatively efficiently for you.
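A minimal sketch of the two loop-based fills described above follows. The item type and the function names fill_int and fill_items are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical element type used only for this example. */
typedef struct {
    int id;
    float weight;
} item;

/* Sets every int in arr to value; works for any value, unlike memset. */
void fill_int(int *arr, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        arr[i] = value;
}

/* Copies one prepared structure into every element of arr. */
void fill_items(item *arr, size_t n, item proto)
{
    for (size_t i = 0; i < n; i++)
        arr[i] = proto;  /* plain struct assignment copies all fields */
}
```

Calling fill_int(a, 8, 1) really does leave eight 1's in a, whereas memset(a, 1, sizeof a) would write the byte 0x01 into every byte of every element.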
If you are talking about initialization see this question. If you want to set the values at a later time then use memset
Well, with an initializer like this you can only set all the values of a particular array to zero. Here is an example
int arr[5]={0};
Given following data, what is the best way to organize an array of elements so that the fastest random access will be possible?
Each element has some int number, a name of 3 characters with '\0' at the end, and a floating point value.
I see two possible methods to organize and access such array:
First:
typedef struct { int num; char name[4]; float val; } t_Element;
t_Element array[900000000];
//random access:
num = array[i].num;
name = array[i].name;
val = array[i].val;
//sequential access:
some_cycle:
num = array[i].num;
i++;
Second:
#define NUMS 0
#define NAMES 1
#define VALS 2
#define SIZE (VALS+1)
int array[SIZE][900000000];
//random access:
num = array[NUMS][i];
name = (char*) array[NAMES][i];
val = (float) array[VALS][i];
//sequential access:
p_array_nums = &array[NUMS][i];
some_cycle:
num = *p_array_nums;
p_array_nums++;
My question is: which method is faster, and why? My first thought was that the second method makes the fastest code and allows the fastest block copy, but I doubt whether it saves any significant number of CPU instructions in comparison to the first method.
It depends on the common access patterns. If you plan to iterate over the data, accessing every field of each element as you go, the struct approach is better. If you plan to iterate independently over each component, then parallel arrays are better.
This is not a subtle distinction, either. With main memory typically being around two orders of magnitude slower than L1 cache, using the data structure that is appropriate for the usage pattern can possibly triple performance.
I must say, though, that your approach to implementing parallel arrays leaves much to be desired. You should simply declare three arrays instead of getting "clever" with two-dimensional arrays and casting:
int nums[900000000];
char names[900000000][4];
float vals[900000000];
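To make the difference in access pattern concrete, here is a small sketch that sums only the val component under each layout. The function names are made up, and it is not a benchmark. In the first loop each float shares its cache line with a num and a name that are never read. In the second loop every byte fetched is payload.

```c
#include <stddef.h>

typedef struct { int num; char name[4]; float val; } t_Element;

/* Array of structures: consecutive vals are sizeof(t_Element) bytes apart. */
double sum_vals_aos(const t_Element *elems, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += elems[i].val;
    return total;
}

/* Parallel arrays: consecutive vals are adjacent in memory. */
double sum_vals_soa(const float *vals, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += vals[i];
    return total;
}
```

If the loop needed num, name and val of the same element together, the comparison would reverse: the struct keeps the three fields in one cache line, while the parallel arrays would touch three separate ones.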
Impossible to say. As with any performance-related test, the answer may vary with any one or more of your OS, your CPU, your memory, your compiler etc.
So you need to test for yourself. Set your performance targets, measure, optimise, repeat.
The first one is probably faster, since memory access latency will be the dominant factor in performance. Ideally you should access memory sequentially and contiguously, to make best use of loaded cache lines and reduce cache misses.
Of course the access pattern is critical in any such discussion, which is why sometimes it's better to use SoA (structure of arrays) and other times AoS (array of structures), at least when performance is critical.
Most of the time of course you shouldn't worry about such things (premature optimisation, and all that).
I won't go into details, but I'm attempting to implement an algorithm similar to the Boyer-Moore-Horspool algorithm, only using hex color values instead of characters (i.e., there is a much greater range).
Following the example on Wikipedia, I originally had this:
size_t jump_table[0xFFFFFF + 1];
memset(jump_table, default_value, sizeof(jump_table));
However, 0xFFFFFF is obviously a huge number and this quickly causes C to seg-fault (but not stack-overflow, disappointingly).
Basically, what I need is an efficient associative array mapping integers to integers. I was considering using a hash table, but having a malloc'd struct for each entry just seems overkill to me (I also do not need hashes generated, as each key is a unique integer and there can be no duplicate entries).
Does anyone have any alternatives to suggest? Am I being overly pragmatic about this?
Update
For those interested, I ended up using a hash table via the uthash library.
0xffffff is rather too large to put on the stack on most systems, but you absolutely can malloc a buffer of that size (at least on current computers; not so much on a smartphone). Whether or not you should do it for this task is a separate issue.
Edit: Based on the comment, if you expect the common case to have a relatively small number of entries other than the "this color doesn't appear in the input" skip value, you should probably just go ahead and use a hash map (obviously only storing values that actually appear in the input).
(ignore earlier discussion of other data structures, which was based on an incorrect recollection of the algorithm under discussion -- you want to use a hash table)
If the array you were going to make (of size 0xFFFFFF) was going to be sparse you could try making a smaller array to act as a simple hash table, with the size being 0xFFFFFF / N and the hash function being hexValue / N (or hexValue % (0xFFFFFF / N)). You'll have to be creative to handle collisions though.
This is the only way I can foresee getting out of mallocing structs.
You can malloc(3) an array of 0xFFFFFF + 1 size_t elements on the heap (for simplicity), and address it as you do with an array.
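A minimal sketch of that heap allocation is shown below. The helper name make_jump_table is hypothetical. It fills the table with a loop rather than memset, because memset writes one byte value into every byte and so cannot store an arbitrary size_t default.

```c
#include <stdlib.h>

/* Allocates a table of `entries` size_t values on the heap and sets each
   one to default_value. Returns NULL on failure; release with free(). */
size_t *make_jump_table(size_t entries, size_t default_value)
{
    size_t *table = malloc(entries * sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < entries; i++)
        table[i] = default_value;
    return table;
}
```

For the case in the question the call would be make_jump_table(0xFFFFFF + 1, default_value), which asks for about 128 MiB when size_t is 8 bytes.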
As for the stack overflow: the program receives a SIGSEGV, which can be the result of a stack overflow, of accessing illegal memory, of writing to a read-only segment, etc. They are all abstracted under the same error message, "Segmentation fault".
But why don't you use a higher level language like Python that supports associative arrays?
At possibly the cost of some speed, you could try modifying the algorithm to find only matches that are aligned to some boundary (every three or four symbols), then perform the search at byte level.
You could create a sparse array of sorts which has "pages" like this (this example uses 256 "pages", so the upper most byte is the page number):
#include <stdlib.h>

int *pages[256];
/* call this first to make sure all of the pages start out NULL! */
void init_pages(void) {
    int i;
    for(i = 0; i < 256; ++i) {
        pages[i] = NULL;
    }
}
/* note: for brevity, neither function checks whether calloc failed */
int get_value(int index) {
    if(pages[index / 0x10000] == NULL) {
        /* calloc so it will zero it out; each page holds 0x10000 ints */
        pages[index / 0x10000] = calloc(0x10000, sizeof(int));
    }
    return pages[index / 0x10000][index % 0x10000];
}
void set_value(int index, int value) {
    if(pages[index / 0x10000] == NULL) {
        /* calloc so it will zero it out; each page holds 0x10000 ints */
        pages[index / 0x10000] = calloc(0x10000, sizeof(int));
    }
    pages[index / 0x10000][index % 0x10000] = value;
}
This will allocate a page the first time it is touched, whether by a read or a write.
To avoid the overhead of malloc you can use a hashtable where the entries in the table are your structs, assuming they are small. In your case a pair of integers should suffice, with a special value to indicate emptiness of the slot in the table.
How many values are there in your output space, i.e. how many different values do you map to in the range 0-0xFFFFFF?
Using randomized universal hashing you can come up with a collision-free hash function with a table no bigger than 2 times the number of values in your output space (for a static table)
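The table with in-place entries and a special "empty" value described above could look roughly like this. Everything specific here is an assumption for the sketch: the capacity of 1024 slots, the names int_table, table_init, table_put and table_get, linear probing for collisions, and a fixed multiplicative hash rather than the randomized universal hashing mentioned above. The key 0xFFFFFFFF is used as the empty marker because it is not a valid 24-bit colour.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u        /* assumed capacity; must stay 2^10 for slot_for */
#define EMPTY_KEY  0xFFFFFFFFu  /* outside the 24-bit key range, marks a free slot */

typedef struct { uint32_t key; uint32_t value; } entry;
typedef struct { entry slots[TABLE_SIZE]; } int_table;  /* no per-entry malloc */

void table_init(int_table *t)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        t->slots[i].key = EMPTY_KEY;
}

/* Multiplicative hash; the top 10 bits of the product select the slot. */
static size_t slot_for(uint32_t key)
{
    return (size_t)((uint32_t)(key * 2654435761u) >> 22);
}

/* Inserts or overwrites. Returns 0 on success, -1 if the table is full. */
int table_put(int_table *t, uint32_t key, uint32_t value)
{
    size_t i = slot_for(key);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        entry *e = &t->slots[i];
        if (e->key == EMPTY_KEY || e->key == key) {
            e->key = key;
            e->value = value;
            return 0;
        }
        i = (i + 1) & (TABLE_SIZE - 1);  /* linear probing with wrap-around */
    }
    return -1;
}

/* Returns the stored value, or fallback when the key is absent. */
uint32_t table_get(const int_table *t, uint32_t key, uint32_t fallback)
{
    size_t i = slot_for(key);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        const entry *e = &t->slots[i];
        if (e->key == key)
            return e->value;
        if (e->key == EMPTY_KEY)
            break;  /* an empty slot ends the probe sequence */
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return fallback;
}
```

For a skip table, the fallback argument plays the role of the default shift for colours that do not appear in the pattern, so only the colours that do appear need a slot.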