Calculating the nth combination of a set of sequential numbers - arrays

I have a number n that my program uses to represent a list of natural numbers of size n:
[0,1,2] // n=3
[0,1,2,3,4,5] // n=6
The lists always begin with 0 and their elements appear in sequential order with no numbers skipped. The last element is always (n-1).
Now, I need to get the unique pairs of elements for these arrays. So I wrote an algorithm that takes n as an input, and returns an array of unique pairs of elements from its counterpart above.
[[0,1],[0,2],[1,2]] // n=3
[[0,1],[0,2],[0,3],[0,4],[0,5],[1,2],[1,3],[1,4],[1,5],[2,3],[2,4],[2,5],[3,4],[3,5],[4,5]] // n=6
In this implementation, elements cannot pair with themselves (e.g. [0,0]). The pair [1,2] is considered equivalent to [2,1], so only the former would appear.
However, since the pairs have a consistent ordering and follow a basic pattern, I suspect that there is some numeric formula I can use to calculate their values directly—without programmatically creating a list of them.
What I want is a function f(n,i) that would give me the values in the ith pair in the array of pairs for n, for example:
f(3,2) => [1,2]
f(6,8) => [1,5]
Alternatively, it'd be fine to have two functions: One, g(n,i), that returns the first pair-element and another, h(n,i), that returns the second. Like this:
g(3,2) => 1
h(3,2) => 2
g(6,8) => 1
h(6,8) => 5
Is there a formula that can calculate those numbers?
Note: I am not looking for an algorithm to generate the combinations arrays. I have that already. I want to avoid generating array combinations and simply calculate the combination values directly, numerically.

f(n, i):
    m = (n - 1) * n / 2                    # number of pairs; require 0 <= i < m
    i = m - i - 1                          # count backward from the end (i is zero-based)
    t = floor((sqrt(8 * i + 1) - 1) / 2)
    r = i - t * (t + 1) / 2
    [n - t - 2, n - r - 1]
The trick is to count backward from the end. Otherwise you're basically looking to find the triangular number preceding i and calculating relative to that.
Wikipedia has many properties on triangular roots including the formula used above to derive the triangular root.

Credit to @shawnt00 for the basic idea of inverting the triangular number; I used x = (sqrt(8*i + 1) - 1)//2 as the triangular root, which worked out.
from math import sqrt
def find(n, i):
    m = n * (n - 1) // 2                  # total number of pairs
    i = m - i - 1                         # count backward from the end
    t = int((sqrt(8 * i + 1) - 1) // 2)   # triangular root, converted to int
    return (n - t - 2, n - 1 - (i - t * (t + 1) // 2))
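A quick way to gain confidence in the closed form is to compare it against the pairs produced by itertools.combinations for a range of n. The sketch below assumes zero-based i, as in the question. The name pair_at is illustrative and not from either answer, and math.isqrt is swapped in for the floating-point square root to avoid rounding problems for large i.

```python
from itertools import combinations
from math import isqrt

def pair_at(n, i):
    """Return the i-th (zero-based) pair of range(n) in lexicographic order."""
    m = n * (n - 1) // 2
    if not 0 <= i < m:
        raise IndexError("pair index out of range")
    k = m - 1 - i                      # index counted backward from the last pair
    t = (isqrt(8 * k + 1) - 1) // 2    # largest t with t*(t+1)/2 <= k
    r = k - t * (t + 1) // 2
    return (n - t - 2, n - r - 1)

# Compare with the explicitly generated list for small n.
for n in range(2, 30):
    expected = list(combinations(range(n), 2))
    assert [pair_at(n, i) for i in range(len(expected))] == expected
```

The loop only checks n below 30; it is a sanity check, not a proof.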

Related

Maximum adjacent product sum (Interview question)

We have an array of integers, where the integer at each position is that position's value. Each time a position is selected, you earn its value multiplied by the values of its adjacent positions (left and right). After a position has been selected, it is removed from the array, and its left and right neighbours become adjacent to each other.
If an adjacent position does not exist, assume a value of 1 for it. For example, if only a single position is left and you select it, its value is multiplied by 1 for both the left and the right neighbour.
Find the maximum amount that can be earned after selecting all positions.
I have implemented a dynamic programming approach using the following recurrence relation. First, observe that if at some step of the process we multiply arr[position_p] and arr[position_q], then all positions between position_p and position_q (if any) must already have been chosen.
For simplicity let us assume array indices start from 1 and position 0 and position n+1 contain value 1 in accordance with the question, where n is the number of elements in array.
So we need to select positions p+1 to q-1 in such an order that maximizes the amount.
Using this, we obtain recurrence relation :
If f(p,q) is maximum amount obtained by choosing only from positions p+1 to q-1, then we have :
f(p, q) = max ( f(p,k) + f(k,q) + arr[p] * arr[k] * arr[q] ) for k between p and q (Excluding p and q)
where k is last position chosen from positions p+1 to q-1 before choosing either p or q
And here is the python implementation :
import numpy as np
n = int(input("Enter the no. of inputs : "))
arr = [1]
arr = arr + list( map( int, input("Enter the list : ").split() ) )
arr.append(1)
# matrix created to memoize values instead of recomputing
mat = np.zeros( (n+2, n+2), dtype = "i8" )
# Bottom-up dynamic programming approach
for row in range ( n + 1, -1, -1 ) :
    for column in range ( row + 2, n + 2 ) :
        # This initialization to zero may not work when there are negative integers in the list.
        max_sum = 0
        # Recurrence relation
        # mat[row][column] should have the maximum product sum from indices row+1 until column-1
        # And arr[row] and arr[column] are boundary values for sub_array
        # By above notation, if column <= row + 1, then there would be no elements between them and thus mat[row][column] should remain zero
        for k in range ( row + 1 , column ) :
            max_sum = max( max_sum, mat[row][k] + mat[k][column] + ( arr[row] * arr[k] * arr[column] ) )
        mat[row][column] = max_sum
print(mat[0][n+1])
I saw this question in the programming round of an interview some time ago. My solution seems to work, but it has O(n^3) time complexity and O(n^2) space complexity.
Can I do better? What about the case where all values in the array are positive (the original question assumes this)? Any help on reducing the space complexity is also appreciated.
Thank you.
Edit :
Though this is no proof, as suggested by @risingStark, I have seen the same question on LeetCode, where all correct solutions seem to use O(n^2) space and O(n^3) time for the general case.
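For experimenting with the recurrence f(p, q) without NumPy or console input, a plain-Python sketch along the following lines may help. The function names max_amount and brute_force are illustrative, not from the question. Taking the maximum over the candidates directly, instead of starting from 0, is an adaptation meant to cope with negative values as well.

```python
from itertools import permutations

def max_amount(values):
    """Interval DP: best[p][q] is the most that can be earned from positions p+1..q-1."""
    arr = [1] + list(values) + [1]          # sentinel 1s at both ends
    size = len(arr)
    best = [[0] * size for _ in range(size)]
    for width in range(2, size):            # distance between the boundaries p and q
        for p in range(size - width):
            q = p + width
            # k is the last position chosen strictly between p and q.
            best[p][q] = max(
                best[p][k] + best[k][q] + arr[p] * arr[k] * arr[q]
                for k in range(p + 1, q)
            )
    return best[0][size - 1]

def brute_force(values):
    """Try every selection order; only usable for very small inputs."""
    result = None
    for order in permutations(range(len(values))):
        alive = list(range(len(values)))    # indices not yet removed
        total = 0
        for idx in order:
            pos = alive.index(idx)
            left = values[alive[pos - 1]] if pos > 0 else 1
            right = values[alive[pos + 1]] if pos + 1 < len(alive) else 1
            total += left * values[idx] * right
            alive.pop(pos)
        result = total if result is None else max(result, total)
    return result

print(max_amount([3, 1, 5, 8]))             # 167
```

The brute-force version is there only to cross-check the DP on tiny arrays; it takes factorial time.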

How to find contiguous subarray of integers in an array from n arrays such that the sum of elements of such contiguous subarrays is minimum

Input: n arrays of integers of length p.
Output: An array of p integers built by copying contiguous subarrays of the input arrays into matching indices of the output, satisfying the following conditions.
At most one subarray is used from each input array.
Every index of the output array is filled from exactly one subarray.
The output array has the minimum possible sum.
Suppose I have 2 arrays:
[1,7,2]
[2,1,8]
Suppose I choose the subarray [1,7] from array 1 and the subarray [8] from array 2. These two subarrays are contiguous and do not overlap at any index, and no array contributes more than one subarray.
The number of elements in the chosen subarrays is 2 + 1 = 3, which equals the length of each individual array (len(array 1) = 3). So this collection is valid.
The sum here for [1,7] and [8] is 1 + 7 + 8 = 16
We have to find a collection of such subarrays such that the total sum of the elements of subarrays is minimum.
A solution for the two arrays above is the collection [2,1] from array 2 and [2] from array 1.
This is a valid collection and the sum is 2 + 1 + 2 = 5 which is the minimum sum for any such collection in this case.
I cannot think of any optimal or correct approach, so I need help.
Some Ideas:
I tried a greedy approach by choosing minimum elements from all array for a particular index since the index is always increasing (non-overlapping) after a valid choice, I don't have to bother about storing minimum value indices for every array. But this approach is clearly not correct since it will visit the same array twice.
Another method I considered was to start from index 0 in every array and maintain, for each array, the running sum of its first k elements. Taking the minimum across these sums, the array that achieves it (its first k elements) is a candidate subarray of size k. If we take it, we can add the (k+1)-th element of every array to the corresponding sum, and if the original minimum still holds we repeat the step. When the minimum changes, we take the subarray up to the last index for which it held as a valid starting subarray. However, this approach also clearly fails, because another subarray of size < k, combined with the remaining elements of our size-k subarray, could give a smaller sum.
Sorting is not possible either, since if we sort then we are breaking consecutive condition.
Of course, there is a brute force method too.
I suspect that refining a greedy approach might lead to some progress.
I have searched on other Stackoverflow posts, but couldn't find anything which could help my problem.
To get you started, here's a recursive branch-&-bound backtracking - and potentially exhaustive - search. Ordering heuristics can have a huge effect on how efficient these are, but without mounds of "real life" data to test against there's scant basis for picking one over another. This incorporates what may be the single most obvious ordering rule.
Because it's a work in progress, it prints stuff as it goes along: all solutions found, whenever they meet or beat the current best; and the index at which a search is cut off early, when that happens (because it becomes obvious that the partial solution at that point can't be extended to meet or beat the best full solution known so far).
For example,
>>> crunch([[5, 6, 7], [8, 0, 3], [2, 8, 7], [8, 2, 3]])
displays
new best
L2[0:1] = [2] 2
L1[1:2] = [0] 2
L3[2:3] = [3] 5
sum 5
cut at 2
L2[0:1] = [2] 2
L1[1:3] = [0, 3] 5
sum 5
cut at 2
cut at 2
cut at 2
cut at 1
cut at 1
cut at 2
cut at 2
cut at 2
cut at 1
cut at 1
cut at 1
cut at 0
cut at 0
So it found two ways to get a minimal sum 5, and the simple ordering heuristic was effective enough that all other paths to full solutions were cut off early.
def disp(lists, ixs):
    from itertools import groupby
    total = 0
    i = 0
    for k, g in groupby(ixs):
        j = i + len(list(g))
        chunk = lists[k][i:j]
        total += sum(chunk)
        print(f"L{k}[{i}:{j}] = {chunk} {total}")
        i = j
def crunch(lists):
    n = len(lists[0])
    assert all(len(L) == n for L in lists)
    # Start with a sum we know can be beat.
    smallest_sum = sum(lists[0]) + 1
    smallest_ixs = [None] * n
    ixsofar = [None] * n
    def inner(i, sumsofar, freelists):
        nonlocal smallest_sum
        assert sumsofar <= smallest_sum
        if i == n:
            print()
            if sumsofar < smallest_sum:
                smallest_sum = sumsofar
                smallest_ixs[:] = ixsofar
                print("new best")
            disp(lists, ixsofar)
            print("sum", sumsofar)
            return
        # Simple greedy heuristic: try available lists in the order
        # of smallest-to-largest at index i.
        for lix in sorted(freelists, key=lambda lix: lists[lix][i]):
            L = lists[lix]
            newsum = sumsofar
            freelists.remove(lix)
            # Try all slices in L starting at i.
            for j in range(i, n):
                newsum += L[j]
                # ">" to find all smallest answers;
                # ">=" to find just one (potentially faster)
                if newsum > smallest_sum:
                    print("cut at", j)
                    break
                ixsofar[j] = lix
                inner(j + 1, newsum, freelists)
            freelists.add(lix)
    inner(0, 0, set(range(len(lists))))
How bad is brute force?
Bad. A brute force way to compute it: say there are n lists each with p elements. The code's ixsofar vector contains p integers each in range(n). The only constraint is that all occurrences of any integer that appears in it must be consecutive. So a brute force way to compute the total number of such vectors is to generate all p-tuples and count the number that meet the constraints. This is woefully inefficient, taking O(n**p) time, but is really easy, so hard to get wrong:
def countb(n, p):
    from itertools import product, groupby
    result = 0
    seen = set()
    for t in product(range(n), repeat=p):
        seen.clear()
        for k, g in groupby(t):
            if k in seen:
                break
            seen.add(k)
        else:
            #print(t)
            result += 1
    return result
For small arguments, we can use that as a sanity check on the next function, which is efficient. This builds on common "stars and bars" combinatorial arguments to deduce the result:
def count(n, p):
    # n lists of length p
    # for r regions, r from 1 through min(p, n)
    # number of ways to split up: comb((p - r) + r - 1, r - 1)
    # for each, ff(n, r) ways to spray in list indices = comb(n, r) * r!
    from math import comb, prod
    total = 0
    for r in range(1, min(n, p) + 1):
        total += comb(p-1, r-1) * prod(range(n, n-r, -1))
    return total
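The sanity check mentioned above can be sketched in a self-contained form as follows. The names count_brute and count_formula are illustrative stand-ins for countb and count, and math.perm(n, r) is used for the falling factorial n * (n-1) * ... * (n-r+1).

```python
from itertools import groupby, product
from math import comb, perm

def count_brute(n, p):
    """Count p-tuples over range(n) in which equal entries are always consecutive."""
    total = 0
    for t in product(range(n), repeat=p):
        runs = [k for k, _ in groupby(t)]
        if len(runs) == len(set(runs)):     # no list index starts a second run
            total += 1
    return total

def count_formula(n, p):
    """Sum over r runs: comb(p-1, r-1) ways to cut, perm(n, r) ways to label the runs."""
    return sum(comb(p - 1, r - 1) * perm(n, r) for r in range(1, min(n, p) + 1))

# The two agree on every small case tried here.
for n in range(1, 5):
    for p in range(1, 6):
        assert count_brute(n, p) == count_formula(n, p)
```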
Faster
Following is the best code I have for this so far. It builds in more "smarts" to the code I posted before. In one sense, it's very effective. For example, for randomized p = n = 20 inputs it usually finishes within a second. That's nothing to sneeze at, since:
>>> count(20, 20)
1399496554158060983080
>>> _.bit_length()
71
That is, trying every possible way would effectively take forever. The number of cases to try doesn't even fit in a 64-bit int.
On the other hand, boost n (the number of lists) to 30, and it can take an hour. At 50, I haven't seen a non-contrived case finish yet, even if left to run overnight. The combinatorial explosion eventually becomes overwhelming.
OTOH, I'm looking for the smallest sum, period. If you needed to solve problems like this in real life, you'd either need a much smarter approach, or settle for iterative approximation algorithms.
Note: this is still a work in progress, so isn't polished, and prints some stuff as it goes along. Mostly that's been reduced to running a "watchdog" thread that wakes up every 10 minutes to show the current state of the ixsofar vector.
def crunch(lists):
    import datetime
    now = datetime.datetime.now
    start = now()
    n = len(lists[0])
    assert all(len(L) == n for L in lists)
    # Start with a sum we know can be beat.
    smallest_sum = min(map(sum, lists)) + 1
    smallest_ixs = [None] * n
    ixsofar = [None] * n
    import threading
    def watcher(stop):
        if stop.wait(60):
            return
        lix = ixsofar[:]
        while not stop.wait(timeout=600):
            print("watch", now() - start, smallest_sum)
            nlix = ixsofar[:]
            for i, (a, b) in enumerate(zip(lix, nlix)):
                if a != b:
                    nlix.insert(i,"--- " + str(i) + " -->")
                    print(nlix)
                    del nlix[i]
                    break
            lix = nlix
    stop = threading.Event()
    w = threading.Thread(target=watcher, args=[stop])
    w.start()
    def inner(i, sumsofar, freelists):
        nonlocal smallest_sum
        assert sumsofar <= smallest_sum
        if i == n:
            print()
            if sumsofar < smallest_sum:
                smallest_sum = sumsofar
                smallest_ixs[:] = ixsofar
                print("new best")
            disp(lists, ixsofar)
            print("sum", sumsofar, now() - start)
            return
        # If only one input list is still free, we have to take all
        # of its tail. This code block isn't necessary, but gives a
        # minor speedup (skips layers of do-nothing calls),
        # especially when the length of the lists is greater than
        # the number of lists.
        if len(freelists) == 1:
            lix = freelists.pop()
            L = lists[lix]
            for j in range(i, n):
                ixsofar[j] = lix
                sumsofar += L[j]
                if sumsofar >= smallest_sum:
                    break
            else:
                inner(n, sumsofar, freelists)
            freelists.add(lix)
            return
        # Peek ahead. The smallest completion we could possibly get
        # would come from picking the smallest element in each
        # remaining column (restricted to the lists - rows - still
        # available). This probably isn't achievable, but is an
        # absolute lower bound on what's possible, so can be used to
        # cut off searches early.
        newsum = sumsofar
        for j in range(i, n): # pick smallest from column j
            newsum += min(lists[lix][j] for lix in freelists)
            if newsum >= smallest_sum:
                return
        # Simple greedy heuristic: try available lists in the order
        # of smallest-to-largest at index i.
        sortedlix = sorted(freelists, key=lambda lix: lists[lix][i])
        # What's the next int in the previous slice? As soon as we
        # hit an int at least that large, we can do at least as well
        # by just returning, to let the caller extend the previous
        # slice instead.
        if i:
            prev = lists[ixsofar[i-1]][i]
        else:
            prev = lists[sortedlix[-1]][i] + 1
        for lix in sortedlix:
            L = lists[lix]
            if prev <= L[i]:
                return
            freelists.remove(lix)
            newsum = sumsofar
            # Try all non-empty slices in L starting at i.
            for j in range(i, n):
                newsum += L[j]
                if newsum >= smallest_sum:
                    break
                ixsofar[j] = lix
                inner(j + 1, newsum, freelists)
            freelists.add(lix)
    inner(0, 0, set(range(len(lists))))
    stop.set()
    w.join()
Bounded by DP
I've had a lot of fun with this :-) Here's the approach they were probably looking for, using dynamic programming (DP). I have several programs that run faster in "smallish" cases, but none that can really compete on a non-contrived 20x50 case. The runtime is O(2**n * n**2 * p). Yes, that's more than exponential in n! But it's still a minuscule fraction of what brute force can require (see above), and is a hard upper bound.
Note: this is just a loop nest slinging machine-size integers, and using no "fancy" Python features. It would be easy to recode in C, where it would run much faster. As is, this code runs over 10x faster under PyPy (as opposed to the standard CPython interpreter).
Key insight: suppose we're going left to right, have reached column j, the last list we picked from was D, and before that we picked columns from lists A, B, and C. How can we proceed? Well, we can pick the next column from D too, and the "used" set {A, B, C} doesn't change. Or we can pick some other list E, the "used" set changes to {A, B, C, D}, and E becomes the last list we picked from.
Now in all these cases, the details of how we reached state "used set {A, B, C} with last list D at column j" make no difference to the collection of possible completions. It doesn't matter how many columns we picked from each, or the order in which A, B, C were used: all that matters to future choices is that A, B, and C can't be used again, and D can be but - if so - must be used immediately.
Since all ways of reaching this state have the same possible completions, the cheapest full solution must have the cheapest way of reaching this state.
So we just go left to right, one column at a time, and remember for each state in the column the smallest sum reaching that state.
This isn't cheap, but it's finite ;-) Since states are subsets of row indices, combined with (the index of) the last list used, there are 2**n * n possible states to keep track of. In fact, there are only half that, since the way sketched above never includes the index of the last-used list in the used set, but catering to that would probably cost more than it saves.
As is, states here are not represented explicitly. Instead there's just a large list of sums-so-far, of length 2**n * n. The state is implied by the list index: index i represents the state where:
i >> n is the index of the last-used list.
The last n bits of i are a bitset, where bit 2**j is set if and only if list index j is in the set of used list indices.
You could, e.g., represent these by dicts mapping (frozenset, index) pairs to sums instead, but then memory use explodes, runtime zooms, and PyPy becomes much less effective at speeding it.
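To make the index layout concrete, here is a small sketch of packing and unpacking a state. The helper names encode and decode are illustrative only; the DP code does the same arithmetic inline with shifts and masks.

```python
def encode(last, used_mask, n):
    """Pack (last-used list index, bitset of used lists) into one list index."""
    return (last << n) | used_mask

def decode(key, n):
    """Inverse of encode: return (last, set of used list indices)."""
    last = key >> n
    used_mask = key & ((1 << n) - 1)
    used = {j for j in range(n) if used_mask >> j & 1}
    return last, used

n = 4
key = encode(3, 0b0101, n)      # last list 3; lists 0 and 2 already used
print(key, decode(key, n))      # 53 (3, {0, 2})
```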
Sad but true: like most DP algorithms, this finds "the best" answer but retains scant memory of how it was reached. Adding code to allow for that is harder than what's here, and typically explodes memory requirements. Probably easiest here: write new to disk at the end of each outer-loop iteration, one file per column. Then memory use isn't affected. When it's done, those files can be read back in again, in reverse order, and mildly tedious code can reconstruct the path it must have taken to reach the winning state, working backwards one column at a time from the end.
def dumbdp(lists):
    import datetime
    _min = min
    now = datetime.datetime.now
    start = now()
    n = len(lists)
    p = len(lists[0])
    assert all(len(L) == p for L in lists)
    rangen = range(n)
    USEDMASK = (1 << n) - 1
    HUGE = sum(sum(L) for L in lists) + 1
    new = [HUGE] * (2**n * n)
    for i in rangen:
        new[i << n] = lists[i][0]
    for j in range(1, p):
        print("working on", j, now() - start)
        old = new
        new = [HUGE] * (2**n * n)
        for key, g in enumerate(old):
            if g == HUGE:
                continue
            i = key >> n
            new[key] = _min(new[key], g + lists[i][j])
            newused = (key & USEDMASK) | (1 << i)
            for i in rangen:
                mask = 1 << i
                if newused & mask == 0:
                    newkey = newused | (i << n)
                    new[newkey] = _min(new[newkey],
                                       g + lists[i][j])
    result = min(new)
    print("DONE", result, now() - start)
    return result
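For small inputs, where memory is not a concern, the same state idea can be written with a dict that also carries the choice made for each column, so the winning assignment comes back with the sum. This is only a sketch under that small-input assumption. The name min_sum_with_path is illustrative, and storing a full path per state is exactly the memory-hungry shortcut the flat-list version avoids.

```python
def min_sum_with_path(lists):
    """DP over (used bitmask, last list) states; returns (best sum, list index per column)."""
    n, p = len(lists), len(lists[0])
    # layer[(used, last)] = (sum so far, tuple of list indices chosen so far)
    layer = {(0, i): (lists[i][0], (i,)) for i in range(n)}
    for j in range(1, p):
        nxt = {}
        for (used, last), (total, path) in layer.items():
            # Option 1: extend the current slice from the same list.
            moves = [((used, last), last)]
            # Option 2: close the slice and start a new one from an unused list.
            closed = used | (1 << last)
            moves += [((closed, i), i) for i in range(n) if not closed & (1 << i)]
            for key, i in moves:
                cand = (total + lists[i][j], path + (i,))
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand
        layer = nxt
    return min(layer.values())

print(min_sum_with_path([[1, 7, 2], [2, 1, 8]]))   # (5, (1, 1, 0))
```

The returned tuple lists, for each column, the zero-based index of the input list it was copied from.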

Maximize number of inversion count in array

We are given an unsorted array A of integers (duplicates allowed) with size N possibly large. We can count the number of pairs with indices i < j, for which A[i] < A[j], let's call this X.
We can change maximum one element from the array with a cost equal to the difference in absolute values (for instance, if we replace element on index k with the new number K, the cost Y is | A[k] - K |).
We can only replace this element with other elements found in the array.
We want to find the minimum possible value of X + Y.
Some examples:
[1,2,2] should return 1 (change the 1 to 2 such that the array becomes [2,2,2])
[2,2,3] should return 1 (change the 3 to 2)
[2,1,1] should return 0 (because no changes are necessary)
[1,2,3,4] should return 6 (this is already the minimum possible value)
[4,4,5,5] should return 3 (this can be accomplished by changing the first 4 into a 5 or the last 5 into a 4)
The number of pairs can be found with a naive O(n²) solution, here in Python:
def calc_x(arr):
    n = len(arr)
    cnt = 0
    for i in range(n):
        for j in range(i+1, n):
            if arr[j] > arr[i]:
                cnt += 1
    return cnt
A brute-force solution is easily written as for example:
def f(arr):
    best_val = calc_x(arr)
    used = set(arr)
    for i, v in enumerate(arr):
        for replacement in used:
            if replacement == v:
                continue
            # Replace the element at index i (wrap the value in a list, skip the old one).
            arr2 = arr[:i] + [replacement] + arr[i+1:]
            y = abs(replacement - v)
            x = calc_x(arr2)
            best_val = min(best_val, x + y)
    return best_val
We can count for each element the number of items right of it larger than itself in O(n*log(n)) using for instance an AVL-tree or some variation on merge sort.
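One way to get the O(n log n) count, sketched here with a Fenwick (binary indexed) tree over value ranks rather than an AVL tree or merge sort. The function name count_increasing_pairs is illustrative. It counts pairs i < j with A[i] < A[j], matching the definition of X above.

```python
def count_increasing_pairs(arr):
    """Count pairs i < j with arr[i] < arr[j] in O(n log n) using a Fenwick tree."""
    # Coordinate compression: map each distinct value to a rank 1..d.
    ranks = {v: r for r, v in enumerate(sorted(set(arr)), start=1)}
    tree = [0] * (len(ranks) + 1)
    total = 0
    for value in arr:
        # Query: how many earlier elements have a strictly smaller value?
        i = ranks[value] - 1
        while i > 0:
            total += tree[i]
            i -= i & -i
        # Update: record this element at its rank.
        i = ranks[value]
        while i < len(tree):
            tree[i] += 1
            i += i & -i
    return total

print(count_increasing_pairs([1, 2, 3, 4]))   # 6
```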
However, we still have to search which element to change and what improvement it can achieve.
This was given as an interview question and I would like some hints or insights as how to solve this problem efficiently (data structures or algorithm).
Definitely go for a O(n log n) complexity when counting inversions.
We can see that when you change a value at index k, you can either:
1) increase it, and then possibly reduce the number of inversions with elements bigger than k, but increase the number of inversions with elements smaller than k
2) decrease it (the opposite thing happens)
Let's try not to count x every time you change a value. What do you need to know?
In case 1):
You have to know how many elements on the left are smaller than your new value v and how many elements on the right are bigger than your value. You can pretty easily check that in O (n). So what is your x now? You can count it with the following formula:
prev_val - your previous value
prev_x - x that you've counted at the beginning of your program
prev_l - number of elements on the left smaller than prev_val
prev_r - number of elements on the right bigger than prev_val
v - new value
l - number of elements on the left smaller than v
r - number of elements on the right bigger than v
new_x = prev_x + r + l - prev_l - prev_r
In the second case you pretty much do the opposite thing.
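The update formula can be sketched as follows, with l counted to the left of position k and r to the right, and X taken as the number of pairs i < j with A[i] < A[j]. The function names are illustrative. Note that the same expression covers both increasing and decreasing the value, since the old contribution of position k is simply swapped for the new one.

```python
def count_pairs(arr):
    """X: number of pairs i < j with arr[i] < arr[j] (naive O(n^2))."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] < arr[j])

def x_after_replace(arr, k, v, prev_x):
    """X after setting arr[k] = v, computed in O(n) from the old X."""
    prev_val = arr[k]
    prev_l = sum(1 for a in arr[:k] if a < prev_val)      # left, smaller than old value
    prev_r = sum(1 for a in arr[k + 1:] if a > prev_val)  # right, bigger than old value
    l = sum(1 for a in arr[:k] if a < v)                  # left, smaller than new value
    r = sum(1 for a in arr[k + 1:] if a > v)              # right, bigger than new value
    return prev_x + l + r - prev_l - prev_r

arr = [4, 4, 5, 5]
print(x_after_replace(arr, 0, 5, count_pairs(arr)))       # 2, same as count_pairs([5, 4, 5, 5])
```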
Right now you get something like O(n^3) instead of O(n^3 log n), which is probably still bad. Unfortunately, that's all I've come up with for now. I'll definitely tell you if I come up with something better.
EDIT: What about memory limit? Is there any? If not, you can just for each element in the array make two sets with elements before and after the current one. Then you can find the amount of smaller/bigger in O (log n), making your time complexity O (n^2 log n).
EDIT 2: We can also try to check, what element would be the best to change to a value v, for every possible value v. You can make then two sets and add/erase elements from them while checking for every element, making the time complexity O(n^2 log n) without using too much space. So the algorithm would be:
1) determine every value v that you can change any element, calculate x
2) for each possible value v:
make two sets, push all elements into the second one
for each element e in array:
add previous element (if there's any) to the first set and erase element e from the second set, then count number of bigger/smaller elements in set 1 and 2 and calculate new x
EDIT 3: Instead of making two sets, you could go with prefix sum for a value. That's O (n^2) already, but I think we can go even better than this.
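A sketch of the prefix-count idea from EDIT 3: for every candidate value v, one left-to-right pass and one right-to-left pass give the number of smaller elements before each position and bigger elements after it, so each (position, value) pair is then evaluated in O(1). The function name min_cost is illustrative, and the initial count of X is done naively here for brevity (it could be replaced by an O(n log n) count).

```python
def min_cost(arr):
    """Minimum of X + Y over at most one replacement, in O(n * distinct values) plus setup."""
    n = len(arr)
    base_x = sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] < arr[j])
    # contrib[k]: number of counted pairs that involve position k.
    contrib = [
        sum(1 for a in arr[:k] if a < arr[k]) + sum(1 for a in arr[k + 1:] if a > arr[k])
        for k in range(n)
    ]
    best = base_x                                  # changing nothing is allowed
    for v in set(arr):
        smaller_left = [0] * n                     # elements before k that are < v
        seen = 0
        for k in range(n):
            smaller_left[k] = seen
            seen += arr[k] < v
        larger_right = [0] * n                     # elements after k that are > v
        seen = 0
        for k in range(n - 1, -1, -1):
            larger_right[k] = seen
            seen += arr[k] > v
        for k in range(n):
            new_x = base_x - contrib[k] + smaller_left[k] + larger_right[k]
            best = min(best, new_x + abs(arr[k] - v))
    return best

print(min_cost([4, 4, 5, 5]))                      # 3
```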

Optimize parameters of a pairwise distance function in Matlab

This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have a 1000 by 10000 matrix that contains the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1-1/(2p)sum(a/c+b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition and p=10000. c and d are probabilities such that c+d=1.
An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j; then I look at rows i and j.
If kth entry of row i and j has value 2 and 2, then a=2,b=0,c=1,d=0 (I guess I will assign 0/0=0 in this case).
If kth entry of row i and j has value 2 and 1 or vice versa, then a=1,b=0,c=3/4,d=1/4.
Similar assignments apply to the remaining cases: 2,0 (a=0,b=0,c=1/2,d=1/2); 1,1 (a=1,b=1,c=1/2,d=1/2); 1,0 (a=0,b=1,c=1/4,d=3/4); 0,0 (a=0,b=2,c=0,d=1).
The matlab code I have so far is using for loops for i and j, then find the cases above by using find, then create two arrays for a/c and b/d. This is extremely slow, is there a way that I can improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, then I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
% m is the data
[n, p] = size(m);
distance = zeros(n);
for ii=1:n
    for jj=ii+1:n
        a = min(m(ii,:), m(jj,:));
        b = 2 - max(m(ii,:), m(jj,:));
        c = 4 ./ (m(ii,:) + m(jj,:));      % reciprocal of the probability c = (x+y)/4
        c(c == Inf) = 0;                   % treat 0/0 as 0
        d = 4 ./ (4 - m(ii,:) - m(jj,:));  % reciprocal of the probability d = 1 - (x+y)/4
        d(d == Inf) = 0;                   % treat 0/0 as 0
        distance(ii,jj) = sum(a.*c + b.*d);
        % distance(jj,ii) = distance(ii,jj); % optional for the full matrix
    end
end
distance = 1 - (1 / (2 * p)) * distance;
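The closed forms behind the vectorisation can be checked against the case table from the question. The sketch below, in Python with exact fractions, assumes that for entries x and y the coefficients are a = min(x, y), b = 2 - max(x, y), c = (x + y)/4 and d = 1 - c; the function name coefficients is illustrative.

```python
from fractions import Fraction

# (x, y) -> (a, b, c, d) as listed in the question.
table = {
    (2, 2): (2, 0, Fraction(1), Fraction(0)),
    (2, 1): (1, 0, Fraction(3, 4), Fraction(1, 4)),
    (2, 0): (0, 0, Fraction(1, 2), Fraction(1, 2)),
    (1, 1): (1, 1, Fraction(1, 2), Fraction(1, 2)),
    (1, 0): (0, 1, Fraction(1, 4), Fraction(3, 4)),
    (0, 0): (0, 2, Fraction(0), Fraction(1)),
}

def coefficients(x, y):
    """Closed-form a, b, c, d for a pair of entries x, y in {0, 1, 2}."""
    a = min(x, y)
    b = 2 - max(x, y)
    c = Fraction(x + y, 4)
    d = 1 - c
    return a, b, c, d

# Every case matches, in either order of the two entries.
for (x, y), expected in table.items():
    assert coefficients(x, y) == expected
    assert coefficients(y, x) == expected
```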

Why is the average number of steps for finding an item in an array N/2?

Could somebody explain why the average number of steps for finding an item in an unsorted array data-structure is N/2?
This really depends what you know about the numbers in the array. If they're all drawn from a distribution where all the probability mass is on a single value, then on expectation it will take you exactly 1 step to find the value you're looking for, since every value is the same, for example.
Let's now make a pretty strong assumption, that the array is filled with a random permutation of distinct values. You can think of this as picking some arbitrary sorted list of distinct elements and then randomly permuting it. In this case, suppose you're searching for some element in the array that actually exists (this proof breaks down if the element is not present). Then the number of steps you need to take is given by X, where X is the position of the element in the array. The average number of steps is then E[X], which is given by
E[X] = 1 Pr[X = 1] + 2 Pr[X = 2] + ... + n Pr[X = n]
Since we're assuming all the elements are drawn from a random permutation,
Pr[X = 1] = Pr[X = 2] = ... = Pr[X = n] = 1/n
So this expression is given by
E[X] = sum (i = 1 to n) i / n = (1 / n) sum (i = 1 to n) i = (1 / n) (n)(n + 1) / 2
= (n + 1) / 2
Which, I think, is the answer you're looking for.
The question as stated is just wrong. Linear search may perform better.
Perhaps a simpler example that shows why the average is N/2 is this:
Assume you have an unsorted array of 10 items: [5, 0, 9, 8, 1, 2, 7, 3, 4, 6]. This is all the digits [0..9].
Since the array is unsorted (i.e. you know nothing about the order of the items), the only way you can find a particular item in the array is by doing a linear search: start at the first item and go until you find what you're looking for, or you reach the end.
So let's count how many operations it takes to find each item. Finding the first item (5) takes only one operation. Finding the second item (0) takes two. Finding the last item (6) takes 10 operations. The total number of operations required to find all 10 items is 1+2+3+4+5+6+7+8+9+10, or 55. The average is 55/10, or 5.5.
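The same count can be reproduced with a few lines of Python. This is a sketch that assumes the items are distinct and that every item is searched for equally often; the function name average_steps is illustrative.

```python
from fractions import Fraction

def average_steps(items):
    """Average number of comparisons a linear search needs, over all present targets."""
    total = 0
    for target in items:
        steps = items.index(target) + 1   # comparisons until the first match
        total += steps
    return Fraction(total, len(items))

digits = [5, 0, 9, 8, 1, 2, 7, 3, 4, 6]
print(average_steps(digits))              # 11/2, i.e. 5.5 = (10 + 1) / 2
```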
The "linear search takes, on average, N/2 steps" conventional wisdom makes a number of assumptions. The two biggest are:
The item you're looking for is in the array. If an item isn't in the array, then it takes N steps to determine that. So if you're often looking for items that aren't there, then your average number of steps per search is going to be much higher than N/2.
On average, each item is searched for approximately as often as any other item. That is, you search for "6" as often as you search for "0", etc. If some items are looked up significantly more often than others, then the average number of steps per search is going to be skewed in favor of the items that are searched for more frequently. The number will be higher or lower than N/2, depending on the positions of the most frequently looked-up items.
While I think templatetypedef has the most instructive answer, in this case there is a much simpler one.
Consider permutations of the set {x1, x2, ..., xn} where n = 2m. Now take some element xi you wish to locate. For each permutation where xi occurs at index m - k, there is a corresponding mirror image permutation where xi occurs at index m + k. The mean of these possible indices is just [(m - k) + (m + k)]/2 = m = n/2. Therefore the mean over all possible permutations of the set is n/2.
Consider a simple reformulation of the question:
What would be the limit of
lim (i->inf) of (sum(from 1 to i of random(n)) /i)
Or in C:
int sum = 0, i;
for (i = 0; i < LARGE_NUM; i++) sum += random(n);
sum /= LARGE_NUM;
If we assume that random has a uniform distribution of values (each value from 1 to n is equally likely to be produced), then the expected result would be (1+n)/2.
