I have 10163 equations in 9000 unknowns, all over the finite field GF(2), of the following form:
Of course, my real system is much larger than this: it has 10163 rows and 9000 distinct unknowns x.
In matrix form this is AX = B, where A is a 10163x9000 coefficient matrix (which may be sparse), X is a 9000x1 vector of unknowns, and B is their product reduced mod 2.
Because so many unknowns have to be solved for, this can be time-consuming. I'm looking for a faster way to solve this system of equations in C.
I tried Gaussian elimination. To make the elimination between rows more efficient, I store the matrix A in a two-dimensional array of 64-bit words and let the last column of the array hold the value of B, so that each row operation becomes a series of word-wise XORs, which should reduce the computation time.
The code I am using is as follows:
uint8_t guss_x_main[R_BITS] = {0};
uint64_t tmp_guss[guss_j_num];
for(uint16_t guss_j = 0; guss_j < x_weight; guss_j++)
{
    uint64_t mask_1 = 1;
    uint64_t mask_guss = (mask_1 << (guss_j % GUSS_BLOCK));
    uint16_t eq_j = guss_j / GUSS_BLOCK;
    for(uint16_t guss_i = guss_j; guss_i < R_BITS; guss_i++)
    {
        if((mask_guss & equations_guss_byte[guss_i][eq_j]) != 0)
        {
            if(guss_x_main[guss_j] == 0)
            {
                guss_x_main[guss_j] = 1;
                for(uint16_t change_i = 0; change_i < guss_j_num; change_i++)
                {
                    tmp_guss[change_i] = equations_guss_byte[guss_j][change_i];
                    equations_guss_byte[guss_j][change_i] =
                        equations_guss_byte[guss_i][change_i];
                    equations_guss_byte[guss_i][change_i] = tmp_guss[change_i];
                }
            }
            else
            {
                GUARD(xor_64(equations_guss_byte[guss_i], equations_guss_byte[guss_i],
                             equations_guss_byte[guss_j], guss_j_num));
            }
        }
    }
    for(uint16_t guss_i = 0; guss_i < guss_j; guss_i++)
    {
        if((mask_guss & equations_guss_byte[guss_i][eq_j]) != 0)
        {
            GUARD(xor_64(equations_guss_byte[guss_i], equations_guss_byte[guss_i],
                         equations_guss_byte[guss_j], guss_j_num));
        }
    }
}
Here R_BITS = 10163, x_weight = 9000, GUSS_BLOCK = 64, and guss_j_num = x_weight / GUSS_BLOCK + 1. equations_guss_byte is a two-dimensional array of uint64_t in which the first x_weight / GUSS_BLOCK columns store the matrix A and the last column stores the vector B. xor_64() XORs two arrays, and GUARD() checks that the called function succeeded.
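To make the storage scheme concrete, here is a minimal sketch of what the word-wise row XOR and the coefficient test could look like. The question does not show the body of xor_64(), so xor_words() and get_bit() below are hypothetical stand-ins, not the asker's actual helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the question's xor_64(): dst = a XOR b,
 * word by word. Each 64-bit word carries 64 coefficients over GF(2),
 * so one XOR performs 64 row-elimination steps at once. */
static void xor_words(uint64_t *dst, const uint64_t *a,
                      const uint64_t *b, size_t n_words)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n_words; i++) {
        dst[i] = a[i] ^ b[i];
    }
}

/* Hypothetical helper: returns 1 if coefficient `col` of a packed row
 * is set, 0 otherwise. Bit (col % 64) of word (col / 64) holds it. */
static int get_bit(const uint64_t *row, size_t col)
{
    return (int)((row[col / 64] >> (col % 64)) & 1u);
}
```

With this layout, eliminating column j from row i amounts to checking get_bit(row_i, j) and, if it is set, calling xor_words(row_i, row_i, pivot_row, n_words).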
Using this method takes about 8 seconds to run on my machine. Is there a better way to speed up the calculation?
I'm trying to do calculations on a big.Int and then convert the result to a byte array, but I can't figure out how to do so. This is where I am so far. Does anyone have any ideas?
sum := big.NewInt(0)
for _, num := range balances {
sum = sum.Add(sum, num)
}
fmt.Println("total: ", sum)
phrase := []byte(sum)
phraseLen := len(phrase)
padNumber := 65 - phraseLen
Try using Int.Bytes() to get the byte slice representation (the absolute value, in big-endian order) and Int.SetBytes([]byte) to set the value from a byte slice. For example:
x := new(big.Int).SetInt64(123456)
fmt.Printf("OK: x=%s (bytes=%#v)\n", x, x.Bytes())
// OK: x=123456 (bytes=[]byte{0x1, 0xe2, 0x40})
y := new(big.Int).SetBytes(x.Bytes())
fmt.Printf("OK: y=%s (bytes=%#v)\n", y, y.Bytes())
// OK: y=123456 (bytes=[]byte{0x1, 0xe2, 0x40})
Note that the byte array value of big numbers is a compact machine representation and should not be mistaken for the string value, which can be retrieved by the usual String() method (or Text(int) for different bases) and set from a string value by the SetString(...) method:
a := new(big.Int).SetInt64(42)
a.String() // => "42"
b, _ := new(big.Int).SetString("cafebabe", 16)
b.String() // => "3405691582"
b.Text(16) // => "cafebabe"
b.Bytes() // => []byte{0xca, 0xfe, 0xba, 0xbe}
The cgo code below has a function to put a Go value in a C buffer, and two alternative functions to get it back; getViaGoBytes and getDirect.
Is getViaGoBytes any better than getDirect?
I assume not, and the intermediary slice created in getViaGoBytes is unnecessary.
Am I correct in thinking Go allocates enough memory when the uint64 y variable is declared, and the assignment to y copies the memory from C to Go?
package main

/*
char buf[8];

void put(char * input, int size) {
	while (size--) {
		buf[size] = input[size];
	}
}
*/
import "C"

import "unsafe"

func put(input uint64) {
	C.put((*C.char)(unsafe.Pointer(&input)), C.int(unsafe.Sizeof(input)))
}

func getViaGoBytes() uint64 {
	var out uint64
	data := C.GoBytes(unsafe.Pointer(&(C.buf[0])), C.int(unsafe.Sizeof(out)))
	out = *(*uint64)(unsafe.Pointer(&data[0]))
	return out
}

func getDirect() uint64 {
	return *(*uint64)(unsafe.Pointer(&(C.buf[0])))
}

func main() {
	var input uint64 = 1<<64 - 1
	println(input)
	put(input)
	var x uint64 = getViaGoBytes()
	println(x)
	var y uint64 = getDirect()
	println(y)
}
Marking question answered by copying JimB's answer from comment:
GoBytes copies a C allocated buffer into a slice with Go allocated
memory. If that's what you want, then use GoBytes. Here you're not
even keeping that copy, so there's no reason to do it.
Also, benchmark is interesting:
$ go test -bench . -benchmem
BenchmarkGoBytes-8 20000000 97.8 ns/op 32 B/op 3 allocs/op
BenchmarkDirect-8 2000000000 0.84 ns/op 0 B/op 0 allocs/op
PASS
I don't know Rust but I wanted to investigate the performance in scientific computing to compare it to Julia and Fortran. I managed to write the following program but the problem is that I get a runtime segmentation fault when MAX is larger than 1022. Any advice?
fn main() {
    const MAX: usize = 1023;
    let mut arr2: [[f64; MAX]; MAX] = [[0.0; MAX]; MAX];
    let pi: f64 = 3.1415926535;
    // compute something useless and put in matrix
    for ii in 0..MAX {
        for jj in 0..MAX {
            let i = ii as f64;
            let j = jj as f64;
            arr2[ii][jj] = ((i + j) * pi * 41.0).sqrt().sin();
        }
    }
    let mut sum0: f64 = 0.0;
    // collapse to scalar like sum(sum(array,1),2) in other langs
    for iii in 0..MAX {
        let vec1: &[f64] = &arr2[iii][..];
        sum0 += vec1.iter().sum();
    }
    println!("this {}", sum0);
}
So there is no error message, just 'Segmentation fault' in the terminal. I'm using Ubuntu 16 and installed Rust with the command from www.rustup.rs. It is the stable version, rustc 1.12.1 (d4f39402a 2016-10-19).
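You have a stack overflow (how ironic, hey?). A [[f64; 1023]; 1023] array takes 1023 * 1023 * 8 bytes, about 8.37 MB, which leaves almost nothing of the default main-thread stack on Linux (typically 8 MiB).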
There are two solutions to the issue:
Do not allocate a large array on the stack, use the heap instead (Vec)
Only do so on a large stack.
Needless to say, using a Vec is just much easier; and you can use a Vec<[f64; MAX]> if you wish.
If you insist on using the stack, then I will redirect you to this question.
Assume that we have an array of integers (3x3) depicted as follows:
+-+-+-+
| |1| |
+-+-+-+
|0|x|1|
+-+-+-+
| |0| |
+-+-+-+
(0,1) above is set to 1 and (1,0) is 0 etc.
Now assume that I find myself at (1,1) (at x), what would be the easiest method for me to come up with all the directions I can take (say all that have the value 0) and then among those choose one?
What I'm having trouble with is the step between finding all valid directions and then choosing among them. I can do the two steps separately fairly easily, but I don't have a solution that feels elegant and combines the two.
E.g. I can multiply the value of each cell by a value representing 1, 2, 4 and 8 and OR them all together. This would tell me which directions I can take, but how do I choose between them? I can also easily pick a random number between 1 and 4 to choose a direction, but if that direction is "taken" then I have to randomize again, excluding the direction that failed.
Any ideas?
The fastest solution is likely the last one you posted: choose directions randomly, excluding any that have already failed, and repeat until you get a valid one. That will take at most four tries (the worst case is when there is only one valid neighbor). Something more elegant is to iterate through all possible directions, randomly replacing the current choice at each valid neighbor, as in this pseudocode:
c = 1
r = invalid
for i in neighbors:
    if (valid[i]):
        if (rand() <= 1. / c): r = i
        ++c
and then r is the answer (c - 1 is the number of valid neighbors found so far, and rand() is assumed to return a uniform value between 0 and 1).
Here's a very neat trick in pseudocode
Initialise your "current result" to nil
Initialise a "number found" to 0
Loop through all the possible directions. If it is valid then:
increment "number found"
set "current result" to the direction with probability 1/"number found"
At the end of this, you will have a valid direction (or nil if not found). If there are multiple valid directions, they will all be chosen with equal probability.
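The steps above can be sketched in C roughly as follows. This is only an illustration: pick_direction() and the valid[] array are names made up for this sketch, and rand() % found stands in for "with probability 1/found" (it has a negligible modulo bias for such small counts).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the index of one valid direction chosen
 * uniformly at random, or -1 if no direction is valid.
 * valid[i] != 0 means direction i is open. */
static int pick_direction(const int *valid, int n)
{
    int found = 0;   /* "number found" */
    int result = -1; /* "current result", -1 plays the role of nil */

    assert(valid != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (!valid[i]) {
            continue;
        }
        found++;
        /* Replace the current result with probability 1/found. */
        if (rand() % found == 0) {
            result = i;
        }
    }
    return result;
}
```

The choice is uniform because the k-th valid direction is selected with probability 1/k and then survives each later valid direction j with probability (j - 1)/j; the product telescopes to 1/n for n valid directions.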
Assumed:
Each location has a set number of valid target locations (some location may have fewer valid targets, a chess-knight has fewer valid targets when placed in a corner than when in the middle of the board.)
You want to pick a random target from all available, valid moves.
Algorithm:
Create a bit-array with one bit representing each valid target. (In the original example you would create a four-bit array.)
For each valid target determine if the location is empty; set the corresponding bit in the bit-array to 1 if empty.
If bit-array > 0 then number_of_targets = SUM(bit-array), else return(No Valid Moves).
Pick random number between 1 and number_of_targets.
return(the location associated with the nth set bit in the bit-array)
Example using the original question:
X has four valid moves. We create a 4-bit array and fill it in with '1' for each empty location; starting with the cell directly above and moving in a clockwise direction we end up with :0:0:1:1:
The sum of bits tells us we have two places we can move. Our random selection will choose either '1' or '2'. We move through the bit-array until we find the nth set bit and move to that location.
This algorithm will work for any system with any number of valid targets (not limited to 2-D). You can replace the random number selector with a function that recursively returns the best move (the minimax algorithm).
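A minimal C sketch of this algorithm might look like the following. The function names are invented for illustration, and the bit layout is an assumption: bit 0 is the cell above, then clockwise, so the example's :0:0:1:1: becomes the mask 0xC.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the set bits in mask (number_of_targets in the text). */
static unsigned count_bits(unsigned mask)
{
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Return the 0-based position of the nth set bit (n is 1-based),
 * or -1 if mask has fewer than n set bits. */
static int nth_set_bit(unsigned mask, unsigned n)
{
    assert(n >= 1);
    for (int pos = 0; mask != 0; pos++, mask >>= 1) {
        if ((mask & 1u) != 0 && --n == 0) {
            return pos;
        }
    }
    return -1;
}

/* Pick one empty target at random; -1 means "No Valid Moves". */
static int pick_target(unsigned mask)
{
    unsigned count = count_bits(mask);
    if (count == 0) {
        return -1;
    }
    return nth_set_bit(mask, 1u + (unsigned)rand() % count);
}
```

For the example grid, pick_target(0xC) returns either 2 (below) or 3 (left) under the assumed bit layout.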
A slightly contrived way might be this (pseudo-code):
Build the bit-mask as you describe, based on which neighbors are open.
Use that bit-mask as the index into an array of:
struct RandomData
{
size_t num_directions;
struct { signed int dx, dy; } deltas[4];
} random_data[16];
where num_directions is the number of open neighbors, and deltas[] tells you how to get to each neighbor.
This has a lot of fiddly data, but it does away with the looping and branching.
UPDATE: Okay, for some reason I had problems letting this idea go. I blame a certain amount of indoctrination about "data-driven programming" at work, since this very simple problem made me "get" the thought of data-driven-ness a bit better. Which is always nice.
Anyway, here's a complete, tested and working implementation of the random-stepping function using the above ideas:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

/* Directions are ordered from north and going clockwise, and assigned to bits:
 *
 *   3       2       1      0
 * WEST | SOUTH | EAST | NORTH
 *   8       4       2      1
 */
static void random_walk(unsigned int *px, unsigned int *py, unsigned int max_x, unsigned int max_y)
{
    const unsigned int x = *px, y = *py;
    /* Bit-mask of the directions that stay inside the grid. */
    const unsigned int dirs = ((x > 0) << 3) | ((y < max_y) << 2) |
                              ((x < max_x) << 1) | (y > 0);
    /* One entry per possible value of dirs (0..15). */
    static const struct
    {
        size_t num_dirs;
        struct { int dx, dy; } deltas[4];
    } step_info[] = {
#define STEP_NORTH { 0, -1 }
#define STEP_EAST  { 1, 0 }
#define STEP_SOUTH { 0, 1 }
#define STEP_WEST  { -1, 0 }
        { 0 },
        { 1, { STEP_NORTH } },
        { 1, { STEP_EAST } },
        { 2, { STEP_NORTH, STEP_EAST } },
        { 1, { STEP_SOUTH } },
        { 2, { STEP_NORTH, STEP_SOUTH } },
        { 2, { STEP_EAST, STEP_SOUTH } },
        { 3, { STEP_NORTH, STEP_EAST, STEP_SOUTH } },
        { 1, { STEP_WEST } },
        { 2, { STEP_NORTH, STEP_WEST } },
        { 2, { STEP_EAST, STEP_WEST } },
        { 3, { STEP_NORTH, STEP_EAST, STEP_WEST } },
        { 2, { STEP_SOUTH, STEP_WEST } },
        { 3, { STEP_NORTH, STEP_SOUTH, STEP_WEST } },
        { 3, { STEP_EAST, STEP_SOUTH, STEP_WEST } },
        { 4, { STEP_NORTH, STEP_EAST, STEP_SOUTH, STEP_WEST } }
    };
    /* Pick one of the open directions; assumes at least one is open. */
    const unsigned int step = rand() % step_info[dirs].num_dirs;
    *px = x + step_info[dirs].deltas[step].dx;
    *py = y + step_info[dirs].deltas[step].dy;
}

int main(void)
{
    unsigned int w = 16, h = 16, x = w / 2, y = h / 2, i;
    struct timeval t1, t2;
    double seconds;

    srand(time(NULL));
    gettimeofday(&t1, NULL);
    for(i = 0; i < 100000000; i++)
    {
        random_walk(&x, &y, w - 1, h - 1);
    }
    gettimeofday(&t2, NULL);
    seconds = (t2.tv_sec - t1.tv_sec) + 1e-6 * (t2.tv_usec - t1.tv_usec);
    printf("Took %u steps, final position is (%u,%u) after %.2g seconds => %.1f Msteps/second\n", i, x, y, seconds, (i / 1e6) / seconds);
    return EXIT_SUCCESS;
}
Some explanations might be in order, since the above is pretty opaque until you "get" it, I guess:
The interface to the function itself should be clear. Note that width and height of the grid are represented as "max_x" and "max_y", to save on some constant-subtractions when checking if the current position is on the border or not.
The variable dirs is set to a bit-mask of the "open" directions to walk in. For an empty grid, this is always 0x0f unless you're on a border. This could be made to handle walls by testing the map, of course.
The step_info array collects information about which steps are available to take from each of the 16 possible combinations of open directions. When reading the initializations (one per line) of each struct, think of that struct's index in binary, and convert that to bits in dirs.
The STEP_NORTH macro (and friends) cut down on the typing, and make it way clearer what's going on.
I like how the "meat" of random_walk() is just four almost-clear expressions, it's refreshing to not see a single if in there.
When compiled with gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 on my 2.4 GHz x86_64 system, using optimization level -O3, the performance seems to be just short of 36 million steps per second. Reading the assembly, the core logic is branch-free. Of course there's a call to rand(); I didn't feel like going all the way and implementing a local random number generator that could be inlined.
NOTE: This doesn't solve the exact question asked, but I felt the technique was worth expanding on.