Eigenvector computation using OpenCV - c

I have this matrix A, representing similarities of pixel intensities of an image. For example: Consider a 10 x 10 image. Matrix A in this case would be of dimension 100 x 100, and element A(i,j) would have a value in the range 0 to 1, representing the similarity of pixel i to j in terms of intensity.
I am using OpenCV for image processing and the development environment is C on Linux.
The objective is to compute the eigenvectors of matrix A, and I have used the following approach:
static CvMat mat, *eigenVec, *eigenVal;
static double A[100][100]={}, Ain1D[10000]={};
int cnt=0;

//Converting matrix A into a one dimensional array
//Reason: That is how cvMat requires it
for(i = 0; i < affnDim; i++){
    for(j = 0; j < affnDim; j++){
        Ain1D[cnt++] = A[i][j];
    }
}

mat = cvMat(100, 100, CV_32FC1, Ain1D);

cvEigenVV(&mat, eigenVec, eigenVal, 1e-300);

for(i = 0; i < 100; i++){
    val1 = cvmGet(eigenVal, i, 0);           //Fetching Eigen Value
    for(j = 0; j < 100; j++){
        matX[i][j] = cvmGet(eigenVec, i, j); //Fetching each component of Eigenvector i
    }
}
Problem: After execution I get nearly all components of all the eigenvectors to be zero. I tried different images, and also tried populating A with random values between 0 and 1, but I get the same result.
A few of the top eigenvalues returned look like the following:
9805401476911479666115491135488.000000
-9805401476911479666115491135488.000000
-89222871725331592641813413888.000000
89222862280598626902522986496.000000
5255391142666987110400.000000
I am now thinking along the lines of using cvSVD(), which performs singular value decomposition of a real floating-point matrix and might yield the eigenvectors. But before that I thought of asking here. Is there anything absurd in my current approach? Am I using the right API, i.e. cvEigenVV(), for the right input matrix (my matrix A is a floating-point matrix)?
cheers

Note to readers: This post at first may seem unrelated to the topic, but please refer to the discussion in the comments above.
The following is my attempt at implementing the Spectral Clustering algorithm applied to image pixels in MATLAB. I followed the paper mentioned by @Andriyev exactly:
Andrew Ng, Michael Jordan, and Yair Weiss (2002).
On spectral clustering: analysis and an algorithm.
In T. Dietterich, S. Becker, and Z. Ghahramani (Eds.),
Advances in Neural Information Processing Systems 14. MIT Press
The code:
%# parameters to tune
SIGMA = 2e-3; %# controls Gaussian kernel width
NUM_CLUSTERS = 4; %# specify number of clusters
%% Loading and preparing a sample image
%# read RGB image, and make it smaller for fast processing
I0 = im2double(imread('house.png'));
I0 = imresize(I0, 0.1);
[r,c,~] = size(I0);
%# reshape into one row per-pixel: r*c-by-3
%# (with pixels traversed in column-wise order)
I = reshape(I0, [r*c 3]);
%% 1) Compute affinity matrix
%# for each pair of pixels, apply a Gaussian kernel
%# to obtain a measure of similarity
A = exp(-SIGMA * squareform(pdist(I,'euclidean')).^2);
%# and we plot the matrix obtained
imagesc(A)
axis xy; colorbar; colormap(hot)
%% 2) Compute the Laplacian matrix L
D = diag( 1 ./ sqrt(sum(A,2)) );
L = D*A*D;
%% 3) perform an eigendecomposition of the Laplacian matrix L
[V,d] = eig(L);
%# Sort the eigenvalues and the eigenvectors in descending order.
[d,order] = sort(real(diag(d)), 'descend');
V = V(:,order);
%# keep only the largest k eigenvectors
%# In this case 4 vectors are enough to explain 99.999% of the variance
NUM_VECTORS = sum(cumsum(d)./sum(d) < 0.99999) + 1;
V = V(:, 1:NUM_VECTORS);
%% 4) renormalize rows of V to unit length
VV = bsxfun(@rdivide, V, sqrt(sum(V.^2,2)));
%% 5) cluster rows of VV using K-Means
opts = statset('MaxIter',100, 'Display','iter');
[clustIDX,clusters] = kmeans(VV, NUM_CLUSTERS, 'options',opts, ...
'distance','sqEuclidean', 'EmptyAction','singleton');
%% 6) assign pixels to cluster and show the results
%# assign for each pixel the color of the cluster it belongs to
clr = lines(NUM_CLUSTERS);
J = reshape(clr(clustIDX,:), [r c 3]);
%# show results
figure('Name',sprintf('Clustering into K=%d clusters',NUM_CLUSTERS))
subplot(121), imshow(I0), title('original image')
subplot(122), imshow(J), title({'clustered pixels' '(color-coded classes)'})
... and using a simple house image I drew in Paint, the results were:
and by the way, the first 4 eigenvalues used were:
1.0000
0.0014
0.0004
0.0002
and the corresponding eigenvectors [columns of length r*c=400]:
-0.0500 0.0572 -0.0112 -0.0200
-0.0500 0.0553 0.0275 0.0135
-0.0500 0.0560 0.0130 0.0009
-0.0500 0.0572 -0.0122 -0.0209
-0.0500 0.0570 -0.0101 -0.0191
-0.0500 0.0562 -0.0094 -0.0184
......
Note that there are steps performed above which you didn't mention in your question (computing the Laplacian matrix, and normalizing the rows of the eigenvector matrix).

I would recommend this article. The author implements Eigenfaces for face recognition. On page 4 you can see that he uses cvCalcEigenObjects to generate the eigenvectors from an image. The article also shows the whole preprocessing step necessary for these computations.

Here's a not very helpful answer:
What does theory (or maths scribbled on a piece of paper) tell you the eigenvectors ought to be? Approximately.
What does another library tell you the eigenvectors ought to be? Ideally, what does a system such as Mathematica or Maple (which can be persuaded to compute to arbitrary precision) tell you the eigenvectors ought to be? If not for a production-sized problem, then at least for a test-sized problem.
I'm not an expert in image processing so I can't be much more helpful, but I spend a lot of time with scientists, and experience has taught me that a lot of tears and anger can be avoided by doing some maths first and forming an expectation of what results you ought to get before wondering why you got 0s all over the place. Sure, it might be an error in the implementation of an algorithm, or it might be loss of precision or some other numerical problem. But you don't know yet, and you shouldn't follow up those lines of inquiry yet.
Regards
Mark

Related

MATLAB: Improving for-loop

I need to multiply parts of a column vector with a fixed row vector. I solved this problem using a for-loop. However, I am wondering if the performance can be improved as I have to perform this kind of computation around 50 million times. Here's my code so far:
multMat = 1:5;
mat = randi(5,10,1);
windowSize = 5;
vout = nan(10,1);
for r = windowSize : 10
    vout(r) = multMat * mat( (r - windowSize + 1) : r);
end
I was thinking about using arrayfun. However, first, I don't know how to address the cell range (i.e. the previous five cells including the current cell), and second, I am not sure whether arrayfun will be any faster than using the loop.
This sliding vector multiplication you're describing is an example of what is known as convolution. The following produces the same result as the loop in your example:
vout = [nan(windowSize-1,1);
conv(mat,flip(multMat),'valid')];
If your output doesn't really need the leading NaN values (which aren't overwritten in your loop), then the conv expression alone is sufficient, without concatenating the NaN elements to it.
For sufficiently large vectors this is of course not guaranteed to be as fast as you'd like it to be, but MATLAB's built-in convolution implementation is likely to be pretty close to an optimal tool for the job.

Network Formation and Large Arrays in Matlab Optimization

I am getting an error using repmat. My Matlab version is 2017a. "Requested 3711450x2726 (75.4GB) array exceeds maximum array size..." First, some context.
I have an adjacency matrix of social network data, call it D. D is 2725x2725, with 1s denoting a link between agents i and j and 0s otherwise. I have been provided a function and sub-functions for a network formation model. There are K regressors (x variables). The model requires forming a dyad-specific regressor matrix W of size 0.5*N*(N-1) x K. In my data, this is 3711450 x K. For a start, I select only one x variable, so K=1.
In the main function, there are two steps. The first step calculates the joint MLE from a logit. I have a problem in the second step computation of the variance covariance matrix with array size. Inside this step, there is a calculation that creates a 3711450 x n (2725) matrix using repmat.
INFO = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
exp_Xbeta is 3711450 x K and X is a sparse 3711450 x 2725 matrix with Bytes = 178171416 of class double. The error occurs at INFO.
I've tried converting X to a tall matrix but thus far no joy. I've tried adding sparse to the INFO line but again no joy. Anyone have any ideas short of going to a cluster or getting more ram? Could I somehow convert X from a sparse matrix to a full matrix inside a datastore and then call the datastore using tall? I have not been able to figure out how to do that if it is possible.
Once INFO is constructed as an array it will be used later in one of the sub-functions. So, it needs to be callable. In case you're curious, INFO is the second derivative matrix.
I have found that producing the INFO matrix all at once was too much for my memory constraints. I split up the steps, but still, repmat and subsequent steps were a problem. Now, I've turned to building up the INFO matrix one step at a time, while never holding more than exp_Xbeta, X, and two vectors in memory. Replacing the construction of INFO with
for i = 1:d
    s1_i = step1(:,1).*X(:,i);
    s1_i = s1_i';
    for j = 1:d
        INFO(i,j) = s1_i*X(:,j);
    end
    clear s1_i;
end
has dropped the memory requirement, though it's slow, and things seem to be working. For anyone interested, below is a little example illustrating the point.
clear all
N = 20
n = 0.5*N*(N-1)
exp_Xbeta = rand(n,1);
X = rand(n,N);
step1 = (exp_Xbeta ./ (1+exp_Xbeta).^2);
[c,d] = size(X);
INFO = zeros(d,d);
for i = 1:d
    s1_i = step1(:,1).*X(:,i)
    s1_i = s1_i'
    for j = 1:d
        INFO(i,j) = s1_i*X(:,j)
    end
    clear s1_i
end
K = 1
INFO2 = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
% Methods produce equivalent matrices
INFO
INFO2
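For what it's worth, the same quantity can usually be formed in one shot, without repmat and without the double loop, by putting the weights on a sparse diagonal. This is only a sketch (I have not tested it at the 3711450-row scale), reusing step1 and X from the example above; INFO3 is just my name for the result:
% Same quantity as INFO/INFO2 above: X'*diag(step1)*X, computed with a
% sparse diagonal so no huge dense intermediate is ever formed.
n = size(X, 1);
INFO3 = X' * spdiags(step1, 0, n, n) * X;
% max(abs(INFO3(:) - INFO(:)))   % should be ~0 (up to rounding)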

Matrix calculations without loops in MATLAB

I have an issue with a code performing some array operations. It is too slow, because I use loops and the input data are quite big. Loops were the easiest way for me, but now I am looking for something faster. I tried to optimize or rewrite the code, but without success. I would really appreciate your help.
In my code I have three arrays x1, y1 (coordinates of points in a grid) and g1 (values at those points), and for example their size is 300 x 300. I treat each matrix as a composition of 9 blocks, and I make the calculation for points in the middle block. For example, I start with g1(101,101), but I use data from g1(1:201,1:201)=g2. I need to calculate the distance from each point of g1(1:201,1:201) to g1(101,101) (the ll matrix), then I calculate nn as in the code, and then I find the value for g1(101,101) from nn and put it in the N array. Then I go to g1(101,102) and so on, until g1(200,200), where in this last case g2=g1(99:300,99:300).
As I said, this code is not very efficient, and since I have to use even larger arrays than in the example, it takes too much time. I hope I have explained clearly enough what I expect from the code. I was thinking of using arrayfun, but I have never worked with that function, so I don't know how I should use it, and it seems to me it won't help here. Maybe there are other solutions, but I couldn't find anything appropriate.
tic
x1=randn(300,300);
y1=randn(300,300);
g1=randn(300,300);
m=size(g1,1);
n=size(g1,2);
w=1/3*m;
k=1/3*n;
N=zeros(w,k);
for i=w+1:2*w
    for j=k+1:2*k
        x=x1(i,j);
        y=y1(i,j);
        x2=y1(i-k:i+k,j-w:j+w);
        y2=y1(i-k:i+k,j-w:j+w);
        g2=g1(i-k:i+k,j-w:j+w);
        ll=1./sqrt((x2-x).^2+(y2-y).^2);
        ll(isinf(ll))=0;
        nn=ifft2(fft2(g2).*fft2(ll));
        N(i-w,j-k)=nn(w+1,k+1);
    end
end
czas=toc;
For what it's worth, arrayfun() is just a wrapper for a for loop, so it wouldn't lead to any performance improvements. Also, you probably have a typo in the definition of x2, I'll assume that it depends on x1. Otherwise it would be a superfluous variable. Also, your i<->w/k, j<->k/w pairing seems inconsistent, you should check that as well. Also also, just timing with tic/toc is rarely accurate. When profiling your code, put it in a function and run the timing multiple times, and exclude the variable generation from the timing. Even better: use the built-in profiler.
Disclaimer: this solution will likely not help for your actual problem due to its huge memory need. For your input of 300x300 matrices this works with arrays of size 300x300x100x100, which is usually a no-go. Still, it's here for reference with a smaller input size. I wanted to add a solution based on nlfilter(), but your problem seems to be too convoluted to be able to use that.
As always with vectorization, you can do it faster if you can spare the memory for it. You are trying to work with matrices of size [2*k+1,2*w+1] for each [i,j] index. This calls for 4d arrays, of shape [2*k+1,2*w+1,w,k]. For each element [i,j] you have a matrix with indices [:,:,i,j] to treat together with the corresponding elements of x1 and y1. It also helps that fft2 accepts multidimensional arrays.
Here's what I mean:
tic
x1 = randn(30,30); %// smaller input for tractability
y1 = randn(30,30);
g1 = randn(30,30);
m = size(g1,1);
n = size(g1,2);
w = 1/3*m;
k = 1/3*n;
%// these will be indexed on the fly:
%//x = x1(w+1:2*w,k+1:2*k); %// size [w,k]
%//y = x1(w+1:2*w,k+1:2*k); %// size [w,k]
x2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
y2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
g2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
%// manual definition for now, maybe could be done smarter:
for ii=w+1:2*w %// don't use i and j as variables
for jj=k+1:2*k %// don't use i and j as variables
x2(:,:,ii-w,jj-k) = x1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
y2(:,:,ii-w,jj-k) = y1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
g2(:,:,ii-w,jj-k) = g1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
end
end
%// use bsxfun to operate on [2*k+1,2*w+1,w,k] vs [w,k]-sized arrays
%// need to introduce leading singletons with permute() in the latter
%// in order to have shape [1,1,w,k] compatible with the first array
ll = 1./sqrt(bsxfun(@minus,x2,permute(x1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2 ...
    + bsxfun(@minus,y2,permute(y1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2);
ll(isinf(ll)) = 0;
%// compute fft2, operating on [2*k+1,2*w+1,w,k]
%// will return fft2 for each index in the [w,k] subspace
nn = ifft2(fft2(g2).*fft2(ll));
%// we need nn(w+1,k+1,:,:) which is exactly of size [w,k] as needed
N = reshape(nn(w+1,k+1,:,:),[w,k]); %// quicker than squeeze()
N = real(N); %// this solution leaves an imaginary part of around 1e-12
czas=toc;

Neural network for linear regression

I found this great source that matched the exact model I needed: http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
The important bits go like this.
You have a plot x->y. Each x-value is the sum of "features", or as I'll denote them, z.
So a regression line for the x->y plot would be h(SUM(z_i)), where h(x) is the regression line (function).
In this NN the idea is that each z-value gets assigned a weight in a way that minimizes the least squared error.
The gradient function is used to update weights to minimize error. I believe I may be back propagating incorrectly -- where I update the weights.
So I wrote some code, but my weights aren't being correctly updated.
I may have simply misunderstood a spec from that Stanford post, so that's where I need your help. Can anyone verify I have correctly implemented this NN?
My h(x) function was a simple linear regression on the initial data. In other words, the idea is that the NN will adjust weights so that all data points shift closer to this linear regression.
for (epoch = 0; epoch < 10000; epoch++){
    //loop number of games
    for (game = 1; game < 39; game++){
        sum = 0;
        int temp1 = 0;
        int temp2 = 0;
        //loop number of inputs
        for (i = 0; i < 10; i++){
            //compute sum = x
            temp1 += inputs[game][i] * weights[i];
        }
        for (i = 10; i < 20; i++){
            temp2 += inputs[game][i] * weights[i];
        }
        sum = temp1 - temp2;
        //compute error
        error += .5 * (5.1136 * (sum) + 1.7238 - targets[game]) * (5.1136 * (sum) + 1.7238 - targets[game]);
        printf("error = %G\n", error);
        //backpropagate
        for (i = 0; i < 20; i++){
            weights[i] = sum * (5.1136 * (sum) + 1.7238 - targets[game]); //POSSIBLE ERROR HERE
        }
    }
    printf("Epoch = %d\n", epoch);
    printf("Error = %G\n", error);
}
Please check out Andrew Ng's Coursera course. He is a professor of Machine Learning at Stanford and can explain the concept of Linear Regression better than pretty much anyone else. You can learn the essentials of linear regression in the first lesson.
For linear regression, you are trying to minimize the cost function, which in this case is the sum of squared errors (predicted value - actual value)^2 and is achieved by gradient descent. Solving a problem like this does not require a Neural Network and using one would be rather inefficient.
For this problem, only two values are needed. If you think back to the equation for a line, y = mx + b, there are really only two aspects of a line that you need: The slope and the y-intercept. In linear regression you are looking for the slope and y-intercept that best fits the data.
In this problem, the two values can be represented by theta0 and theta1. theta0 is the y-intercept and theta1 is the slope.
This is the update function for Linear Regression:
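Written out, the standard batch update rule is

theta_j := theta_j - alpha * (1/m) * SUM_{i=1..m} ( h(x^(i)) - y^(i) ) * x_j^(i)

applied to theta0 and theta1 simultaneously, where m is the number of training examples, h(x) is the current prediction, and x_0^(i) = 1 so that theta0 acts as the intercept.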
Here, theta is a 2 x 1 vector containing theta0 and theta1. What you are doing is taking theta and subtracting the mean error (times the corresponding feature value), scaled by a learning rate alpha (usually small, like 0.1).
Let's say the real perfect fit for the line is y = 2x + 3, but our current slope and y-intercept are both 0. Then the errors will be negative, and when that negative term is subtracted from theta, theta will increase, moving your prediction closer to the correct value. And vice versa for positive errors. This is a basic example of gradient descent, where you are descending down a slope to minimize the cost (or error) of the model.
This is the type of model you should be trying to implement instead of a Neural Network, which is more complex. Try to gain an understanding of linear and logistic regression with gradient descent before moving on to Neural Networks.
Implementing a linear regression algorithm in C can be rather challenging, especially without vectorization. If you are looking to learn how a linear regression algorithm works and aren't specifically set on using C for it, I recommend using something like MATLAB or Octave (a free alternative) to implement it instead. After all, the examples from the post you found use the same format.
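As a starting point, here is a minimal Octave/MATLAB sketch of that update rule for a straight-line fit; the data, learning rate and iteration count below are made up purely for illustration:
% Minimal batch gradient descent for a straight-line fit y = theta(1) + theta(2)*x.
x = linspace(0, 1, 50)';             % example inputs
y = 3 + 2*x + 0.1*randn(50,1);       % noisy targets around y = 2x + 3
m = numel(y);
X = [ones(m,1) x];                   % prepend a column of ones for the intercept
theta = zeros(2,1);                  % [theta0; theta1], both start at 0
alpha = 0.5;                         % learning rate

for iter = 1:1000
    err = X*theta - y;               % predictions minus targets, m-by-1
    theta = theta - alpha*(1/m)*(X'*err);   % the update rule quoted above
end

theta   % should end up close to [3; 2]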

Randomly permuting an array [duplicate]

The famous Fisher-Yates shuffle algorithm can be used to randomly permute an array A of length N:
For k = 1 to N
Pick a random integer j from k to N
Swap A[k] and A[j]
A common mistake that I've been told over and over again not to make is this:
For k = 1 to N
Pick a random integer j from 1 to N
Swap A[k] and A[j]
That is, instead of picking a random integer from k to N, you pick a random integer from 1 to N.
What happens if you make this mistake? I know that the resulting permutation isn't uniformly distributed, but I don't know what guarantees there are on what the resulting distribution will be. In particular, does anyone have an expression for the probability distributions over the final positions of the elements?
An Empirical Approach.
Let's implement the erroneous algorithm in Mathematica:
p = 10; (* Range *)
s = {}
For[l = 1, l <= 30000, l++, (*Iterations*)
    a = Range[p];
    For[k = 1, k <= p, k++,
        i = RandomInteger[{1, p}];
        temp = a[[k]];
        a[[k]] = a[[i]];
        a[[i]] = temp
    ];
    AppendTo[s, a];
]
Now get the number of times each integer is in each position:
r = SortBy[#, #[[1]] &] & /@ Tally /@ Transpose[s]
Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:
For position 1 the freq distribution is:
For position 5 (middle)
And for position 10 (last):
and here you have the distribution for all positions plotted together:
Here you have better statistics, over 8 positions:
Some observations:
For all positions the probability of "1" is the same (1/n).
The probability matrix is symmetrical with respect to the big anti-diagonal.
So, the probability for any number in the last position is also uniform (1/n).
You may visualize those properties by looking at all lines starting from the same point (first property) and at the last horizontal line (third property).
The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:
For a 100x100 matrix:
Edit
Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.
h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)
Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )
Edit
Working out the general explicit calculation in @wnoise's answer a little, we can get a little more info.
Replacing 1/n by p[n], so the calculations are held unevaluated, we get, for example, for the first part of the matrix with n=7:
Which, after comparing with results for other values of n, let us identify some known integer sequences in the matrix:
{{ 1/n, 1/n , ...},
{... .., A007318, ....},
{... .., ... ..., ..},
... ....,
{A129687, ... ... ... ... ... ... ..},
{A131084, A028326 ... ... ... ... ..},
{A028326, A131084 , A129687 ... ....}}
You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/
Solving the general problem is more difficult, but I hope this is a start
The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).
If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).
Let's say
a = 1/N
b = 1-a
Bi(k) is the probability vector after i swaps for the kth element, i.e. the answer to the question "where is k after i swaps?". For example B0(3) = (0 0 1 0 ... 0) and B1(3) = (a 0 b 0 ... 0). What you want is BN(k) for every k.
Ki is an NxN matrix with 1s in the i-th column and i-th row, and zeroes everywhere else.
Ii is the identity matrix but with the element at x=y=i zeroed.
Ai is then a*Ki + b*Ii.
Then Bi(k) = B(i-1)(k) * Ai, so BN(k) = B0(k) * A1 * A2 * ... * AN.
But because the vectors B0(k), k = 1..N, stacked together form the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the product A1 * A2 * ... * AN.
For example, for N=4:
As a diagram for N = 500 (color levels are 100*probability):
The pattern is the same for all N>2:
The most probable ending position for the k-th element is k-1.
The least probable ending position is k for k < N*ln(2), and position 1 otherwise.
I knew I had seen this question before...
" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.
As you may have already guessed, based on the answer by #belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:
What a lovely question! I wish I had a full answer.
Fisher-Yates is nice to analyze because once it decides on the first element, it leaves it alone. The biased one can repeatedly swap an element in and out of any place.
We can analyze this the same way we would a Markov chain, by describing the actions as stochastic transition matrices acting linearly on probability distributions. Most elements get left alone; the diagonal is usually (n-1)/n. On pass k, when they don't get left alone, they get swapped with element k (or with a random element if they are element k). This is 1/n in either row or column k. The element in both row and column k is also 1/n. It's easy enough to multiply these matrices together for k going from 1 to n.
We do know that the element in last place will be equally likely to have originally been anywhere, because the last pass is equally likely to swap the last place with any other. Similarly, the first element will be equally likely to be placed anywhere. This symmetry is because the transpose reverses the order of matrix multiplication. In fact, the matrix is symmetric in the sense that row i is the same as column (n+1 - i). Beyond that, the numbers don't show much apparent pattern. These exact solutions do show agreement with the simulations run by belisarius: in slot i, the probability of getting j decreases as j rises toward i, reaching its lowest value at i-1, then jumping up to its highest value at i, and decreasing until j reaches n.
In Mathematica I generated each step with
step[k_, n_] := Normal[SparseArray[{{k, i_} -> 1/n,
{j_, k} -> 1/n, {i_, i_} -> (n - 1)/n} , {n, n}]]
(I haven't found it documented anywhere, but the first matching rule is used.)
The final transition matrix can be calculated with:
Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]]
ListDensityPlot is a useful visualization tool.
Edit (by belisarius)
Just a confirmation. The following code gives the same matrix as in @Eelvex's answer:
step[k_, n_] := Normal[SparseArray[{{k, i_} -> (1/n),
{j_, k} -> (1/n), {i_, i_} -> ((n - 1)/n)}, {n, n}]];
r[n_, s_] := Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]];
Last@Table[r[4, i], {i, 1, 4}] // MatrixForm
Wikipedia's page on the Fisher-Yates shuffle has a description and example of exactly what will happen in that case.
You can compute the distribution using stochastic matrices. Let the matrix A(i,j) describe the probability of the card originally at position i ending up in position j. Then the kth swap has a matrix Ak given by Ak(i,j) = 1/N if i == k or j == k, (the card in position k can end up anywhere and any card can end up at position k with equal probability), Ak(i,i) = (N - 1)/N for all i != k (every other card will stay in the same place with probability (N-1)/N) and all other elements zero.
The result of the complete shuffle is then given by the product of the matrices AN ... A1.
I expect you're looking for an algebraic description of the probabilities; you can get one by expanding out the above matrix product, but I imagine it will be fairly complex!
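For small N you can also just multiply the matrices out numerically. Here is a quick MATLAB sketch of the construction described above (the variable names are mine):
% Build the per-swap transition matrices Ak described above and multiply them.
% M(i,j) ends up as the probability that the card starting at position i
% finishes at position j after the N "broken" swaps.
N = 4;
M = eye(N);
for k = 1:N
    Ak = ((N-1)/N) * eye(N);   % every other card stays put with prob (N-1)/N
    Ak(k,:) = 1/N;             % the card at position k can end up anywhere
    Ak(:,k) = 1/N;             % any card can be swapped into position k
    M = M * Ak;                % accumulate swap k
end
disp(M)                        % each row sums to 1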
UPDATE: I just spotted wnoise's equivalent answer above! oops...
I've looked into this further, and it turns out that this distribution has been studied at length. The reason it's of interest is because this "broken" algorithm is (or was) used in the RSA chip system.
In Shuffling by semi-random transpositions, Elchanan Mossel, Yuval Peres, and Alistair Sinclair study this and a more general class of shuffles. The upshot of that paper appears to be that it takes log(n) broken shuffles to achieve near random distribution.
In The bias of three pseudorandom shuffles (Aequationes Mathematicae, 22, 1981, 268-292), Ethan Bolker and David Robbins analyze this shuffle and determine that the total variation distance to uniformity after a single pass is 1, indicating that it is not very random at all. They give asymptotic analyses as well.
Finally, Laurent Saloff-Coste and Jessica Zuniga found a nice upper bound in their study of inhomogeneous Markov chains.
This question is begging for an interactive visual matrix diagram analysis of the broken shuffle mentioned. Such a tool is on the page Will It Shuffle? - Why random comparators are bad by Mike Bostock.
Bostock has put together an excellent tool that analyzes random comparators. In the dropdown on that page, choose naïve swap (random ↦ random) to see the broken algorithm and the pattern it produces.
His page is informative as it allows one to see the immediate effects a change in logic has on the shuffled data. For example:
This matrix diagram using a non-uniform and very-biased shuffle is produced using a naïve swap (we pick from "1 to N") with code like this:
function shuffle(array) {
    var n = array.length, i = -1, j;
    while (++i < n) {
        j = Math.floor(Math.random() * n);
        t = array[j];
        array[j] = array[i];
        array[i] = t;
    }
}
But if we implement a non-biased shuffle, where we pick from "k to N" we should see a diagram like this:
where the distribution is uniform, and is produced from code such as:
function FisherYatesDurstenfeldKnuthshuffle( array ) {
    var pickIndex, arrayPosition = array.length;
    while( --arrayPosition ) {
        pickIndex = Math.floor( Math.random() * ( arrayPosition + 1 ) );
        array[ pickIndex ] = [ array[ arrayPosition ], array[ arrayPosition ] = array[ pickIndex ] ][ 0 ];
    }
}
The excellent answers given so far concentrate on the distribution, but you also asked "What happens if you make this mistake?" - which I haven't seen answered yet, so I'll give an explanation of that:
The Knuth-Fisher-Yates shuffle algorithm picks 1 out of n elements, then 1 out of n-1 remaining elements and so forth.
You can implement it with two arrays a1 and a2, where you remove one element from a1 and insert it into a2; but the algorithm does it in place (which means that it needs only one array), as is explained very well here (Google: "Shuffling Algorithms Fisher-Yates DataGenetics").
If you don't remove the elements, they can be randomly chosen again, which produces the biased randomness. This is exactly what the 2nd example you are describing does. The first example, the Knuth-Fisher-Yates algorithm, uses a cursor variable running from k to N, which remembers which elements have already been taken, hence avoiding picking elements more than once.
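To make that two-array idea concrete, here is a small MATLAB sketch (my own illustration, not from the linked page). Because each picked element is removed from a1, it can never be chosen twice, which is exactly what keeps the result unbiased:
% "Two-array" shuffle: every element of a1 is picked exactly once,
% because picked elements are removed from a1.
a1 = 1:10;
a2 = zeros(1, 10);
for k = 1:10
    idx = randi(numel(a1));   % pick one of the *remaining* elements
    a2(k) = a1(idx);
    a1(idx) = [];             % remove it, so it cannot be picked again
end
a2   % a uniformly random permutation of 1:10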
