Lower memory usage when searching for optimal route from one point of matrix to another - c

I have following problem:
Tourist wants to create an optimal route for the hike. He has a
terrain elevation map at a certain scale - an NxM-sized matrix
containing the elevation values ​​at the corresponding terrain points.
Tourist wants to make a route from the start point to the end point in
such a way that the total change in altitude while passing the route
is minimal. The total change in altitude on the route is the sum of
the changes in altitude modulo on each segment of the route. For example, if there is a continuous ascent or descent from the starting
point of the route to the end point, then such a route will be
optimal.
For simplicity, let's assume that you can only walk along the lines of
an imaginary grid, i.e. from position (i, j), which is not on the edge
of the card, you can go to position (i-1, j), (i + 1, j), (i, j-1),
(i, j + 1). You cannot go beyond the edge of the map.
On standard input: non-negative integers N, M, N0, M0, N1, M1 are entered. N is the number of rows in the heightmap, M is the number of columns. Point (N0, M0) is the starting point of route, point (N1, M1) is the end point of the route. Point coordinates are numbered starting
from zero. The points can be the same. It is known that the total
number of matrix elements is limited to 1,100,000.
After numbers height map is entered in rows - at first the first line, then
the second, and so on. Height is a non-negative integer not exceeding
10000.
Print to standard output the total change in elevation while
traversing the optimal route.
I came to conclusion that it's about shortest path in graph and wrote this
But for m=n=1000 program eats too much (~169MiB, mostly heap) memory.
Limits are as following:
Time limit: 2 s
Memory limit: 50M
Stack limit: 64M
I also wrote C++ program doing same thing with priority_queue(just to check, problem must be solved in C), but it still needs ~78MiB (mostly heap)
How should I solve this problem (use another algorithm, optimize existing C code or something else)?

You can fit a height value into a 16-bit unsigned int (uint_16_t, if using C++11). To store 1.1M of those 2-byte values requires 2.2M of memory. So far so good.
Your code builds an in-memory graph with linked lists and lots of pointers. Your heap-based queue also has a lot of pointers. You can greatly decrease memory usage by relying on a more-efficient representation - for example, you could build an NxM array of elements with
struct Cell {
uint_32_t distance; // shortest distance from start, initially INF
uint_16_t height;
int_16_t parent; // add this to cell index to find parent; 0 for `none`
};
A cell at row, col will be at index row + N*col in this array. I strongly recommend building utility methods to find each neighbor, which would return -1 to indicate "out of bounds" and a direct index otherwise. The difference between two indices would be usable in parent.
You can implement a (not very efficient) priority queue in C by calling stdlib's "sort" on an array of node indices, and sorting them by distance. This would cut a lot of additional overhead, as each pointer in your program probably takes 8 bytes (64-bit pointers).
By avoiding a lot of pointers, and using an in-memory description instead of a graph, you should be able to cut down memory consumption to
1.1M Cells x 8 bytes/cell = 8.8M
1.1M indices in the pq-array x 4 bytes/index = 4.4M
For a total of around 16 Mb w/overheads - well under stated limits. If it takes too long, you should use a better queue.

Related

Assemble eigen3 sparsematrix from smaller sparsematrices

I am assembling the jacobian of a coupled multi physics system. The jacobian consists of a blockmatrix on the diagonal for each system and off diagonal blocks for the coupling.
I find it the best to assemble to block separatly and then sum over them with projection matrices to get the complete jacobian.
pseudo-code (where J[i] are the diagonal elements and C[ij] the couplings, P are the projections to the complete matrix).
// diagonal blocks
J.setZero();
for(int i=0;i<N;++i){
J+=P[i]J[i]P[i].transpose()
}
// off diagonal elements
for(int i=0;i<N;++i){
for(int j=i+1;j<N;++j){
J+=P[i]C[ij]P[j].transpose()
J+=P[j]C[ji]P[i].transpose()
}
}
This takes a lot performance, around 20% of the whole programm, which is too much for some assembling. I have to recalculate the jacobian every time step since the system is highly nonlinear.
Valgrind indicates that the ressource consuming method is Eigen::internal::assign_sparse_to_sparse and in this method the call to Eigen::SparseMatrix<>::InsertBackByOuterInner.
Is there a more efficient way to assemble such a matrix?
(I also had to use P*(JP.transpose()) instead of PJ*J.transpose() to make the programm compile, may be there is already something wrong)
P.S: NDEBUG and optimizations are turned on
Edit: by storing P.transpose in a extra matrix ,I get a bit better performance, but the summation accounts still for 15% of the programm
Your code will be much faster by working inplace. First, estimate the number of non-zeros per column in the final matrix and reserve space (if not already done):
int nnz_per_col = ...;
J.reserve(VectorXi::Constant(n_cols, nnz_per_col));
If the number of nnz per column is highly non-uniform, then you can also compute it per column:
VectorXi nnz_per_col(n_cols);
for each j
nnz_per_col(j) = ...;
J.reserve(nnz_per_col);
Then manually insert elements:
for each block B[k]
for each elements i,j
J.coeffRef(foo(i),foo(j)) += B[k](i,j)
where foo implement the appropriate mapping of indices.
And for the next iteration, no need to reserve, but you need to set coefficient values to zero while preserving the structure:
J.coeffs().setZero();

Randomly permuting an array [duplicate]

The famous Fisher-Yates shuffle algorithm can be used to randomly permute an array A of length N:
For k = 1 to N
Pick a random integer j from k to N
Swap A[k] and A[j]
A common mistake that I've been told over and over again not to make is this:
For k = 1 to N
Pick a random integer j from 1 to N
Swap A[k] and A[j]
That is, instead of picking a random integer from k to N, you pick a random integer from 1 to N.
What happens if you make this mistake? I know that the resulting permutation isn't uniformly distributed, but I don't know what guarantees there are on what the resulting distribution will be. In particular, does anyone have an expression for the probability distributions over the final positions of the elements?
An Empirical Approach.
Let's implement the erroneous algorithm in Mathematica:
p = 10; (* Range *)
s = {}
For[l = 1, l <= 30000, l++, (*Iterations*)
a = Range[p];
For[k = 1, k <= p, k++,
i = RandomInteger[{1, p}];
temp = a[[k]];
a[[k]] = a[[i]];
a[[i]] = temp
];
AppendTo[s, a];
]
Now get the number of times each integer is in each position:
r = SortBy[#, #[[1]] &] & /# Tally /# Transpose[s]
Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:
For position 1 the freq distribution is:
For position 5 (middle)
And for position 10 (last):
and here you have the distribution for all positions plotted together:
Here you have a better statistics over 8 positions:
Some observations:
For all positions the probability of
"1" is the same (1/n).
The probability matrix is symmetrical
with respect to the big anti-diagonal
So, the probability for any number in the last
position is also uniform (1/n)
You may visualize those properties looking at the starting of all lines from the same point (first property) and the last horizontal line (third property).
The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:
For a 100x100 matrix:
Edit
Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.
h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)
Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )
Edit
Working out a little the general explicit calculation in #wnoise answer, we can get a little more info.
Replacing 1/n by p[n], so the calculations are hold unevaluated, we get for example for the first part of the matrix with n=7 (click to see a bigger image):
Which, after comparing with results for other values of n, let us identify some known integer sequences in the matrix:
{{ 1/n, 1/n , ...},
{... .., A007318, ....},
{... .., ... ..., ..},
... ....,
{A129687, ... ... ... ... ... ... ..},
{A131084, A028326 ... ... ... ... ..},
{A028326, A131084 , A129687 ... ....}}
You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/
Solving the general problem is more difficult, but I hope this is a start
The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).
If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).
Let's say
a = 1/N
b = 1-a
Bi(k) is the probability matrix after i swaps for the kth element. i.e the answer to the question "where is k after i swaps?". For example B0(3) = (0 0 1 0 ... 0) and B1(3) = (a 0 b 0 ... 0). What you want is BN(k) for every k.
Ki is an NxN matrix with 1s in the i-th column and i-th row, zeroes everywhere else, e.g:
Ii is the identity matrix but with the element x=y=i zeroed. E.g for i=2:
Ai is
Then,
But because BN(k=1..N) forms the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the matrix:
For example, for N=4:
As a diagram for N = 500 (color levels are 100*probability):
The pattern is the same for all N>2:
The most probable ending position for k-th element is k-1.
The least probable ending position is k for k < N*ln(2), position 1 otherwise
I knew I had seen this question before...
" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.
As you may have already guessed, based on the answer by #belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:
What a lovely question! I wish I had a full answer.
Fisher-Yates is nice to analyze because once it decides on the first element, it leaves it alone. The biased one can repeatedly swap an element in and out of any place.
We can analyze this the same way we would a Markov chain, by describing the actions as stochastic transition matrices acting linearly on probability distributions. Most elements get left alone, the diagonal is usually (n-1)/n. On pass k, when they don't get left alone, they get swapped with element k, (or a random element if they are element k). This is 1/(n-1) in either row or column k. The element in both row and column k is also 1/(n-1). It's easy enough to multiply these matrices together for k going from 1 to n.
We do know that the element in last place will be equally likely to have originally been anywhere because the last pass swaps the last place equally likely with any other. Similarly, the first element will be equally likely to be placed anywhere. This symmetry is because the transpose reverses the order of matrix multiplication. In fact, the matrix is symmetric in the sense that row i is the same as column (n+1 - i). Beyond that, the numbers don't show much apparent pattern. These exact solutions do show agreement with the simulations run by belisarius: In slot i, The probability of getting j decreases as j raises to i, reaching its lowest value at i-1, and then jumping up to its highest value at i, and decreasing until j reaches n.
In Mathematica I generated each step with
step[k_, n_] := Normal[SparseArray[{{k, i_} -> 1/n,
{j_, k} -> 1/n, {i_, i_} -> (n - 1)/n} , {n, n}]]
(I haven't found it documented anywhere, but the first matching rule is used.)
The final transition matrix can be calculated with:
Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]]
ListDensityPlot is a useful visualization tool.
Edit (by belisarius)
Just a confirmation. The following code gives the same matrix as in #Eelvex's answer:
step[k_, n_] := Normal[SparseArray[{{k, i_} -> (1/n),
{j_, k} -> (1/n), {i_, i_} -> ((n - 1)/n)}, {n, n}]];
r[n_, s_] := Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]];
Last#Table[r[4, i], {i, 1, 4}] // MatrixForm
Wikipedia's page on the Fisher-Yates shuffle has a description and example of exactly what will happen in that case.
You can compute the distribution using stochastic matrices. Let the matrix A(i,j) describe the probability of the card originally at position i ending up in position j. Then the kth swap has a matrix Ak given by Ak(i,j) = 1/N if i == k or j == k, (the card in position k can end up anywhere and any card can end up at position k with equal probability), Ak(i,i) = (N - 1)/N for all i != k (every other card will stay in the same place with probability (N-1)/N) and all other elements zero.
The result of the complete shuffle is then given by the product of the matrices AN ... A1.
I expect you're looking for an algebraic description of the probabilities; you can get one by expanding out the above matrix product, but it I imagine it will be fairly complex!
UPDATE: I just spotted wnoise's equivalent answer above! oops...
I've looked into this further, and it turns out that this distribution has been studied at length. The reason it's of interest is because this "broken" algorithm is (or was) used in the RSA chip system.
In Shuffling by semi-random transpositions, Elchanan Mossel, Yuval Peres, and Alistair Sinclair study this and a more general class of shuffles. The upshot of that paper appears to be that it takes log(n) broken shuffles to achieve near random distribution.
In The bias of three pseudorandom shuffles (Aequationes Mathematicae, 22, 1981, 268-292), Ethan Bolker and David Robbins analyze this shuffle and determine that the total variation distance to uniformity after a single pass is 1, indicating that it is not very random at all. They give asympotic analyses as well.
Finally, Laurent Saloff-Coste and Jessica Zuniga found a nice upper bound in their study of inhomogeneous Markov chains.
This question is begging for an interactive visual matrix diagram analysis of the broken shuffle mentioned. Such a tool is on the page Will It Shuffle? - Why random comparators are bad by Mike Bostock.
Bostock has put together an excellent tool that analyzes random comparators. In the dropdown on that page, choose naïve swap (random ↦ random) to see the broken algorithm and the pattern it produces.
His page is informative as it allows one to see the immediate effects a change in logic has on the shuffled data. For example:
This matrix diagram using a non-uniform and very-biased shuffle is produced using a naïve swap (we pick from "1 to N") with code like this:
function shuffle(array) {
var n = array.length, i = -1, j;
while (++i < n) {
j = Math.floor(Math.random() * n);
t = array[j];
array[j] = array[i];
array[i] = t;
}
}
But if we implement a non-biased shuffle, where we pick from "k to N" we should see a diagram like this:
where the distribution is uniform, and is produced from code such as:
function FisherYatesDurstenfeldKnuthshuffle( array ) {
var pickIndex, arrayPosition = array.length;
while( --arrayPosition ) {
pickIndex = Math.floor( Math.random() * ( arrayPosition + 1 ) );
array[ pickIndex ] = [ array[ arrayPosition ], array[ arrayPosition ] = array[ pickIndex ] ][ 0 ];
}
}
The excellent answers given so far are concentrating on the distribution, but you have asked also "What happens if you make this mistake?" - which is what I haven't seen answered yet, so I'll give an explanation on this:
The Knuth-Fisher-Yates shuffle algorithm picks 1 out of n elements, then 1 out of n-1 remaining elements and so forth.
You can implement it with two arrays a1 and a2 where you remove one element from a1 and insert it into a2, but the algorithm does it in place (which means, that it needs only one array), as is explained here (Google: "Shuffling Algorithms Fisher-Yates DataGenetics") very well.
If you don't remove the elements, they can be randomly chosen again which produces the biased randomness. This is exactly what the 2nd example your are describing does. The first example, the Knuth-Fisher-Yates algorithm, uses a cursor variable running from k to N, which remembers which elements have already been taken, hence avoiding to pick elements more than once.

Why does linear probing work with a relatively prime step?

I was reading about linear probing in a hash table tutorial and came upon this:
The step size is almost always 1 with linear probing, but it is acceptable to use other step sizes as long as the step size is relatively prime to the table size so that every index is eventually visited. If this restriction isn't met, all of the indices may not be visited...
(The basic problem is: You need to visit every index in an array starting at an arbitrary index and skipping ahead a fixed number of indices [the skip] to the next index, wrapping to the beginning of the array if necessary with modulo.)
I understand why not all indices could be visited if the step size isn't relatively prime to the table size, but I don't understand why the converse is true: that all the indices will be visited if the step size is relatively prime to the array size.
I've observed this relatively prime property working in several examples that I've worked out by hand, but I don't understand why it works in every case.
In short, my question is: Why is every index of an array visited with a step that is relatively prime to the array size? Is there a proof of this?
Thanks!
Wikipedia about Cyclic Groups
The units of the ring Z/nZ are the numbers coprime to n.
Also:
[If two numbers are co-prime] There exist integers x and y such that ax + by = 1
So, if "a" is your step length, and "b" the length of the array, you can reach any index "z" by
axz + byz = z
=>
axz = z (mod b)
i.e stepping "xz" times (and wrapping over the array "yz" times).
number of steps is lcm(A,P)/P or A/gcd(A,P) where A is array size and P is this magic coprime.
so if gcd(A,P) != 1 then number of steps will be less than A
On contrary if gcd(A,P) == 1 (coprimes) then number of steps will be A and all indexes will be visited

How to identify the duplicated number, in an array, with minimum compexity?

There is an array of size 10,000. It store the number 1 to 10,000 in randomly order.
Each number occurs one time only.
Now if any number is removed from that array and any other number is duplicated into array.
How can we identify the which number is duplicated, with minimum complexity?
NOTE : We can not use another array.
The fastest way is an O(N) in-place pigeonhole sort.
Start at the first location of the array, a[0]. Say it has the value 5. You know that 5 belongs at a[4], so swap locations 0 and 4. Now a new value is in a[0]. Swap it to where it needs to go.
Repeat until a[0] == 1, then move on to a[1] and swap until a[1] == 2, etc.
If at any point you end up attempting to swap two identical values, then you have found the duplicate!
Runtime: O(N) with a very low coefficient and early exit. Storage required: zero.
Bonus optimization: count how many swaps have occurred and exit early if n_swaps == array_size. This resulted in a 15% improvement when I implemented a similar algorithm for permuting a sequence.
Compute the sum and the sum of the squares of the elements (you will need 64 bit values for the sum of the squares). From these you can recover which element was modified:
Subtract the expected values for the unmodified array.
If x was removed and y duplicated you get the difference y - x for the sum and y2 - x2 = (y + x) (y - x) for the sum of squares.
From that it is easy to recover x and y.
Edit: Note that this may be faster than pigeonhole sort, because it runs linearly over the array and is thus more cache friendly.
Why not simply using a second array or other data structure like hash table (hash table if you like, depending on the memory/performance tradeoff). This second array would simply store the count of a number in the original array. Now just add a +/- to the access function of the original array and you have your information immediately.
ps when you wrote "we can not use another array" - I assume, you can not change the ORIGINAL data structure. However the use of additional data structures is possible....
Sort the array, then iterate through until you hit two of the same number in a row.

Find the minimum number of elements required so that their sum equals or exceeds S

I know this can be done by sorting the array and taking the larger numbers until the required condition is met. That would take at least nlog(n) sorting time.
Is there any improvement over nlog(n).
We can assume all numbers are positive.
Here is an algorithm that is O(n + size(smallest subset) * log(n)). If the smallest subset is much smaller than the array, this will be O(n).
Read http://en.wikipedia.org/wiki/Heap_%28data_structure%29 if my description of the algorithm is unclear (it is light on details, but the details are all there).
Turn the array into a heap arranged such that the biggest element is available in time O(n).
Repeatedly extract the biggest element from the heap until their sum is large enough. This takes O(size(smallest subset) * log(n)).
This is almost certainly the answer they were hoping for, though not getting it shouldn't be a deal breaker.
Edit: Here is another variant that is often faster, but can be slower.
Walk through elements, until the sum of the first few exceeds S. Store current_sum.
Copy those elements into an array.
Heapify that array such that the minimum is easy to find, remember the minimum.
For each remaining element in the main array:
if min(in our heap) < element:
insert element into heap
increase current_sum by element
while S + min(in our heap) < current_sum:
current_sum -= min(in our heap)
remove min from heap
If we get to reject most of the array without manipulating our heap, this can be up to twice as fast as the previous solution. But it is also possible to be slower, such as when the last element in the array happens to be bigger than S.
Assuming the numbers are integers, you can improve upon the usual n lg(n) complexity of sorting because in this case we have the extra information that the values are between 0 and S (for our purposes, integers larger than S are the same as S).
Because the range of values is finite, you can use a non-comparative sorting algorithm such as Pigeonhole Sort or Radix Sort to go below n lg(n).
Note that these methods are dependent on some function of S, so if S gets large enough (and n stays small enough) you may be better off reverting to a comparative sort.
Here is an O(n) expected time solution to the problem. It's somewhat like Moron's idea but we don't throw out the work that our selection algorithm did in each step, and we start trying from an item potentially in the middle rather than using the repeated doubling approach.
Alternatively, It's really just quickselect with a little additional book keeping for the remaining sum.
First, it's clear that if you had the elements in sorted order, you could just pick the largest items first until you exceed the desired sum. Our solution is going to be like that, except we'll try as hard as we can to not to discover ordering information, because sorting is slow.
You want to be able to determine if a given value is the cut off. If we include that value and everything greater than it, we meet or exceed S, but when we remove it, then we are below S, then we are golden.
Here is the psuedo code, I didn't test it for edge cases, but this gets the idea across.
def Solve(arr, s):
# We could get rid of worse case O(n^2) behavior that basically never happens
# by selecting the median here deterministically, but in practice, the constant
# factor on the algorithm will be much worse.
p = random_element(arr)
left_arr, right_arr = partition(arr, p)
# assume p is in neither left_arr nor right_arr
right_sum = sum(right_arr)
if right_sum + p >= s:
if right_sum < s:
# solved it, p forms the cut off
return len(right_arr) + 1
# took too much, at least we eliminated left_arr and p
return Solve(right_arr, s)
else:
# didn't take enough yet, include all elements from and eliminate right_arr and p
return len(right_arr) + 1 + Solve(left_arr, s - right_sum - p)
One improvement (asymptotically) over Theta(nlogn) you can do is to get an O(n log K) time algorithm, where K is the required minimum number of elements.
Thus if K is constant, or say log n, this is better (asymptotically) than sorting. Of course if K is n^epsilon, then this is not better than Theta(n logn).
The way to do this is to use selection algorithms, which can tell you the ith largest element in O(n) time.
Now do a binary search for K, starting with i=1 (the largest) and doubling i etc at each turn.
You find the ith largest, and find the sum of the i largest elements and check if it is greater than S or not.
This way, you would run O(log K) runs of the selection algorithm (which is O(n)) for a total running time of O(n log K).
eliminate numbers < S, if you find some number ==S, then solved
pigeon-hole sort the numbers < S
Sum elements highest to lowest in the sorted order till you exceed S.

Resources