LabVIEW Array Index Tracking - arrays

Currently working with a pretty simple XY plot (Y values from a random generator, and X values from the while loop count). These are both stored in arrays, and at certain X thresholds the Y array will be decimated by certain factors (10, 100, 1000...).
However my goal with this VI is to be able to decimate in "chunks." So in other words, every 1,000-point chunk, decimate the array with a factor of 10. And every 10,000-point chunk, decimate with a factor of 100. After each of these chunks, the arrays should continue to index at +1 until they reach another "chunk" and then be decimated appropriately.
For example:
Index: 998, 999, 1000, 1001... Decimate Factor 10
1998, 1999, 2000, 2001... Decimate Factor 10
...
9998, 9999, 10000, 10001... Decimate Factor 100
(My current setup permanently changes the decimation factor once it reaches a certain X value, and from then on will only record data points in increments of 10, 100, 1000...).
Thanks for any help! See code below

Answered as an edit on the original thread where this question was asked:
Labview - Increasing Array Index with Array Size Limiting
Copying info from there:
EDIT: @JonathanVahala was asking about using configurable decimation below. See this image, which shows a way to do this:
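For readers following along without LabVIEW, here is a rough Python sketch of the chunk-wise decimation the question asks for. It is only an interpretation of the question text, not the wiring shown in the image; the thresholds, factors, and names are all assumed:

import random

def decimate(values, factor):
    # Keep every `factor`-th point of a completed chunk.
    return values[::factor]

samples = [random.random() for _ in range(25_000)]  # stand-in for the random generator
stored_x, stored_y = [], []
chunk_start = 0  # index in stored_* where the current (undecimated) chunk began
for i, y in enumerate(samples):
    stored_x.append(i)   # X from the loop count
    stored_y.append(y)   # Y from the random generator
    if (i + 1) % 10_000 == 0:
        factor = 100     # every completed 10,000-point chunk: factor 100
    elif (i + 1) % 1_000 == 0:
        factor = 10      # every completed 1,000-point chunk: factor 10
    else:
        continue         # between boundaries, keep indexing at +1
    # Decimate only the chunk that just completed; earlier data stays as stored.
    stored_x = stored_x[:chunk_start] + decimate(stored_x[chunk_start:], factor)
    stored_y = stored_y[:chunk_start] + decimate(stored_y[chunk_start:], factor)
    chunk_start = len(stored_x)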

Related

How to scale very large numbers such that they could be represented as an array index?

I have a 2D array of size 30*70.
I have 70 columns. My values are very large, ranging from 8066220960081 to (some number with same power of 10 as lowerlimit), and I need to plot a scatter plot in an array. How do I index into the array given very large values?
Also, I need to do this in kernel space.
Let's take an array long long int A with large values.
A[0] = 393782040
A[1] = 2*393782040
... and so on
A[N] = 8066220960081; where N = 30*70 - 1
We can scale A with a factor, or we can shift A by a certain number and scale it again. That way you can deal with numbers ranging between 0 and 1, or -1 and 1, or x and y; you choose as per your need. Theoretically, this should not make a difference to the scatter plot other than the placement of the axis. However, if your scatter plot is also representative of the underlying values, i.e. the dots are proportional to the values, then it is a good idea to be nice to your plotting tool and not flood it with terribly large values that might lead to overflow, depending on how the code for plotting is written.
PS: I would assume you know how to flatten a 2d array.
I just ended up doing a regular interval calculation between max and min, and then starting from min + interval*index to get the number, where index is the index in the array.
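A small sketch of that interval mapping, written in Python for illustration (a kernel-space version would stick to integer arithmetic). The bounds are taken from the question; the function names are assumptions:

lo, hi, N = 393782040, 8066220960081, 30 * 70   # bounds from the question, 2100 cells
interval = (hi - lo) / (N - 1)                   # width of one step

def value_to_index(v):
    # Map a large value onto 0 .. N-1.
    return round((v - lo) / interval)

def index_to_value(i):
    # "start from min + interval*index to get the number"
    return lo + interval * i

print(value_to_index(lo), value_to_index(hi))    # 0 2099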

How to find the max values of reshape-able array1 in a rolling window of variable size given array1 and array2 are equal length before padding?

IGNORE EVERYTHING AND SKIP TO EDIT AT THE BOTTOM FOR CONDENSED EXPLANATION
FULL QUESTION:
How can I find the index of the max values of a reshape-able array of observed values in a rolling window of variable size given that the array of observed values corresponds to an array of observation times, and given that both arrays must be padded at identical indices?
SHORT PREFACE:
(I am having trouble fixing a code that is long and has a lot of moving parts. As such, I have only provided information I feel necessary to address my question and have left out other details that would make this post even longer than it is, though I can post a workable version of the code if requested.)
SETUP:
I have a text file containing times of observations and observed values at those times. I read the contents of the text file into the appropriate lists with the goal of performing a 'maximum self-similarity test', which entails finding the maximum value in a rolling window over an entire list of values; however, the first rolling window is 2 elements wide (check indices 0 and 1, then 2 and 3, ..., len(data)-2 and len(data)-1), then 4 elements wide, then 8 elements wide, etc. Assuming an array of 100 elements (I actually have around 11,000 data points), the last rolling window that is 8 elements wide will be disregarded because it is incomplete. To do this (with some help from SO), I first defined a function to reshape an array such that it can be called by a parent function.
import numpy as np

# `data` is the list of observed values read from the text file.
def shapeshifter(ncol, my_array=data):
    my_array = np.array(my_array)
    desired_size_factor = np.prod([n for n in ncol if n != -1])
    if -1 in ncol:  # implicit array size
        desired_size = my_array.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return my_array.flat[:desired_size].reshape(ncol)
The parent function that calls this will loop over each row to find the maximums.
def looper(ncol, my_array=data):
    my_array = shapeshifter((-1, ncol))
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
And looper is called by a grandparent function that will change the size of the window for which the maximum values are obtained.
def metalooper(window_size, my_array=data):
    outer = [looper(win) for win in window_size]
    return outer
The next line calls the grandparent function, which in turns calls the sub-functions. In the line below, window_size is a predefined list of window sizes (ex: [2,4,8,16,...]).
ans = metalooper(window_size)
PURPOSE (can remove if unnecessary):
The function metalooper should return a list of sublists, for which each sublist contains the maximum elements of the rolling window. I then "normalize" (for lack of a better word) each value in the sublists by taking the logarithmic value of each maximum, only then to sum the elements of each sublist (such that the number of sums equals the number of sublists). Each sum is then divided by its respective weight, which gives the y-values that will be plotted against the window sizes. This plot should be piecewise (linear or power-law).
PROBLEM:
My array of data points contains only the observed values and not the times (all of which I have converted into hours) that correspond to the observations. The times are not consecutive, so there may be an observation at 4 hrs, another at 7 hrs, another at 7.3 hrs, etc. My first mistake was not padding zeroes for non-consecutive times (ex: observation_1 at 4 hrs, observation_2 at 6 hrs ==> observed_value = 0 at 5 hrs), as I should have moved the rolling window over the hours of observation (ex: a window size of 2 means [0,2) hours, [2,4) hours, etc.) instead of over the observed values at those times.
The problem is compounded by the fact that there are also duplicate hours that fit within a window (ex: multiple observations made at 1 and 1.1 hours within a window of [0,2)). Regardless, I should find the maximum observed value in each rolling window, which entails knowing which observed values correspond to which times of observation without disregarding padded zeroes. How can I efficiently pad zeroes at identical indices in both lists? I am aware that I can floor the hours of observation to check which window an observed value should fall into, but I am unsure how to proceed after that point as well: if I can pad both lists and find the index of the maximum observed value for each window, I can then use that index to get the desired observed value and the corresponding time of observation. I do not know how to do this or where to begin, as my approach of for-looping over lists is extremely slow.
I would appreciate any help or advice on how to fix this. (Apologies for the length of this post; not sure how to condense it further.) I would prefer to adapt my existing approach, but am open to alternatives if my method is too ridiculous.
EDIT:
To see how these functions work, let's use an example list data.
>> data = np.array(np.linspace(1,20,20))
# data corresponds to the observed values and not the observation times,
# below is a proof of concept using the values in data
>> print(shapeshifter((-1,2))) # 2 columns, -1 is always there
[[ 1. 2.]
[ 3. 4.]
[ 5. 6.]
[ 7. 8.]
[ 9. 10.]
[ 11. 12.]
[ 13. 14.]
[ 15. 16.]
[ 17. 18.]
[ 19. 20.]]
>> print(looper(2)) # get maximum in window_size (AKA length of each row) of 2 for each row of reshaped array via shapeshifter
[2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
def window_params(dataset=data): # concerned with window_size
    numdata = len(dataset) ## N = 11764
    lim = np.floor(np.log2(numdata)) ## last term of j = 13
    time_sc_index = np.linspace(1, lim, num=lim) ## j = [1,2,3,...,floor(log_2(N))=13]
    window_size = [2**j for j in time_sc_index] ## scale = [2,4,8,...,8192]
    block_size = np.floor([numdata/sc for sc in window_size]) ## b_j (sc ~ scale)
    return numdata, time_sc_index, window_size, block_size

numdata, time_sc_index, window_size, block_size = window_params()
>> print(window_size)
[2.0, 4.0, 8.0, 16.0]
>> print(metalooper(window_size)) # call looper for each window_size
[[2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0], [4.0, 8.0, 12.0, 16.0, 20.0], [8.0, 16.0], [16.0]]
My issue is that these observations each correspond to different times. The list of times can be something like
times = [0, 4, 6, 6, 9, ...] # times are floored, duplicate times correspond to multiple observations at floored times
I need to have a list of consecutive times [0, 1, 2, 3, ...], each of which corresponds to an observed value from the list data (as each data point is observed at a specific time). My goal is to find the maximum observed value in each window of times. Using the times above, the observed value at time=0 is data[0], and the observed value at time=1 is 0 since there is no observation at that time. Similarly, I would use the maximum observed value at duplicate times; in other words, I have 2 observations at time=6, so I would want the maximum observed value at that time. While my windows currently roll over only the observed values, I actually need the windows to roll over all hours (including time=1 in this example) to find the maximum observed values at those times. In such a case, rolling a window over a time range that contains duplicate times should only count one of the duplicate times, specifically the one that corresponds to the maximum observed value at that time. My thinking is to pad zeroes into both lists (times and data) such that the index of times corresponds to the index of data. I'm trying to find an efficient way to proceed, though I'm having trouble figuring out a way to proceed at all.
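A minimal sketch of that padding-plus-duplicates step, assuming non-negative observed values and integer (floored) hours; the example values below are illustrative, not the real data:

import numpy as np

times = np.array([0, 4, 6, 6, 9])            # floored hours, with a duplicate at 6
data  = np.array([3.0, 1.5, 2.0, 5.0, 4.2])  # observed values at those hours

dense = np.zeros(times.max() + 1)            # one slot per consecutive hour, 0 = no observation
np.maximum.at(dense, times, data)            # keep the max observed value at each hour

print(dense)  # hour 6 keeps the larger of its two observations (5.0); hours with no data stay 0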

Sampling realizations in multidimensional array in R

Basically, I want to sample 9 independent realizations from the uniform distribution U(0,1) 2578 times, and this works fine, using either
replicate(2578,{runif(9,0,1)})
or
F = matrix(NA, nrow = 2578, ncol = 9)
for (i in 1:2578){
  F[i,] = runif(9,0,1)
}
Now I want this to be repeated, let's say, 10 times, i.e. creating 10 new 2578x9 samples. I want to create a multidimensional array or, to visualize it better, a rectangular parallelepiped with length 9, height 2578, and width whatever (10, 1000, 100000, ...). How can I achieve this?
I think your simulated data could benefit from being structured directly into an array: that would make them much easier to handle:
dims <- c(2578, 9, 100)
tmp <- runif(prod(dims))
A <- array(tmp, dims)

Find an element in an array, but the element can jump

There is an array where all but one of the cells are 0, and we want to find the index of that single non-zero cell. The problem is, every time that you check for a cell in this array, that non-zero element will do one of the following:
move forward by 1
move backward by 1
stay where it is.
For example, if that element is currently at position 10, and I check what is in arr[5], then the element may be at position 9, 10 or 11 after I checked arr[5].
We only need to find the position where the element is currently at, not where it started at (which is impossible).
The hard part is, if we write a for loop, there really is no way to know if the element is currently in front of you, or behind you.
Some more context if it helps:
The interviewer did give a hint which is maybe I should move my pointer back after checking x-number of cells. The problem is, when should I move back, and by how many slots?
While "thinking out loud", I started saying a bunch of common approaches hoping that something would hit. When I said recursion, the interviewer did say "recursion is a good start". I don't know recursion really is the right approach, because I don't see how I can do recursion and #1 at the same time.
The interviewer said this problem can't be solved in O(n^2). So we are looking at at least O(n^3), or maybe even exponential.
Tl;dr: Your best bet is to keep checking each even index in the array in turn, wrapping around as many times as necessary until you find your target. On average you will stumble upon your target in the middle of your second pass.
First off, as many have already said, it is indeed impossible to ensure you will find your target element in any given amount of time. If the element knows where your next sample will be, it can always place itself somewhere else just in time. The best you can do is to sample the array in a way that minimizes the expected number of accesses - and because after each sample you learn nothing except if you were successful or not and a success means you stop sampling, an optimal strategy can be described simply as a sequence of indexes that should be checked, dependent only on the size of the array you're looking through. We can test each strategy in turn via automated means to see how well they perform. The results will depend on the specifics of the problem, so let's make some assumptions:
The question doesn't specify the starting position of our target. Let us assume that the starting position is chosen uniformly from across the entire array.
The question doesn't specify the probability our target moves. For simplicity, let's say it's independent of parameters such as the current position in the array, the time passed and the history of samples. Using the probability 1/3 for each option gives us the least information, so let's use that.
Let us test our algorithms on an array of 101 elements. Also, let us test each algorithm one million times, just to be reasonably sure about its average case behavior.
The algorithms I've tested are:
Random sampling: after each attempt we forget where we were looking and choose an entirely new index at random. Each sample has an independent 1/n chance of succeeding, so we expect to take n samples on average. This is our control.
Sweep: try each position in sequence until our target is found. If our target wasn't moving, this would take n/2 samples on average. Our target is moving, however, so we may miss it on our first sweep.
Slow sweep: the same, except we test each position several times before moving on. Proposed by Patrick Trentin with a slowdown factor of 30x, tested with a slowdown factor of 2x.
Fast sweep: the opposite of slow sweep. After the first sample we skip (k-1) cells before testing the next one. The first pass starts at ary[0], the next at ary[1] and so on. Tested with each speed up factor (k) from 2 to 5.
Left-right sweep: First we check each index in turn from left to right, then each index from right to left. This algorithm would be guaranteed to find our target if it was always moving (which it isn't).
Smart greedy: Proposed by Aziuth. The idea behind this algorithm is that we track each cell's probability of holding our target, then always sample the cell with the highest probability. On one hand, this algorithm is relatively complex; on the other hand, it sounds like it should give us the optimal results.
Results:
The results are shown as [average] ± [standard deviation].
Random sampling: 100.889145 ± 100.318212
At this point I have realised a fencepost error in my code. Good thing we have a control sample. This also establishes that we have in the ballpark of two or three digits of useful precision (sqrt #samples), which is in line with other tests of this type.
Sweep: 100.327030 ± 91.210692
The chance of our target squeezing through the net well counteracts the effect of the target taking n/2 time on average to reach the net. The algorithm doesn't really fare any better than a random sample on average, but it's more consistent in its performance and it isn't hard to implement either.
slow sweep (x0.5): 128.272588 ± 99.003681
While the slow movement of our net means our target will probably get caught in the net during the first sweep and won't need a second sweep, it also means the first sweep takes twice as long. All in all, relying on the target moving onto us seems a little inefficient.
fast sweep x2: 75.981733 ± 72.620600
fast sweep x3: 84.576265 ± 83.117648
fast sweep x4: 88.811068 ± 87.676049
fast sweep x5: 91.264716 ± 90.337139
That's... a little surprising at first. While skipping every other step means we complete each lap in twice as many turns, each lap also has a reduced chance of actually encountering the target. A nicer view is to compare Sweep and FastSweep in broom-space: rotate each sample so that the index being sampled is always at 0 and the target drifts towards the left a bit faster. In Sweep, the target moves at 0, 1 or 2 speed each step. A quick parallel with the Fibonacci base tells us that the target should hit the broom/net around 62% of the time. If it misses, it takes another 100 turns to come back. In FastSweep, the target moves at 1, 2 or 3 speed each step meaning it misses more often, but it also takes half as much time to retry. Since the retry time drops more than the hit rate, it is advantageous to use FastSweep over Sweep.
Left-right sweep: 100.572156 ± 91.503060
Mostly acts like an ordinary sweep, and its score and standard deviation reflect that. Not too surprising a result.
Aziuth's smart greedy: 87.982552 ± 85.649941
At this point I have to admit a fault in my code: this algorithm is heavily dependent on its initial behavior (which is unspecified by Aziuth and was chosen to be randomised in my tests). But performance concerns meant that this algorithm will always choose the same randomized order each time. The results are then characteristic of that randomisation rather than of the algorithm as a whole.
Always picking the most likely spot should find our target as fast as possible, right? Unfortunately, this complex algorithm barely competes with Sweep 3x. Why? I realise this is just speculation, but let us peek at the sequence Smart Greedy actually generates: During the first pass, each cell has equal probability of containing the target, so the algorithm has to choose. If it chooses randomly, it could pick up in the ballpark of 20% of cells before the dips in probability reach all of them. Afterwards the landscape is mostly smooth where the array hasn't been sampled recently, so the algorithm eventually stops sweeping and starts jumping around randomly. The real problem is that the algorithm is too greedy and doesn't really care about herding the target so it could pick at the target more easily.
Nevertheless, this complex algorithm does fare better than both simple Sweep and a random sampler. It still can't, however, compete with the simplicity and surprising efficiency of FastSweep. Repeated tests have shown that the initial randomisation could swing the efficiency anywhere between 80% run time (20% speedup) and 90% run time (10% speedup).
Finally, here's the code that was used to generate the results:
class WalkSim
  attr_reader :limit, :current, :time, :p_stay

  def initialize limit, p_stay
    @p_stay = p_stay
    @limit = limit
    @current = rand(limit + 1)
    @time = 0
  end

  def poke n
    r = n == @current
    @current += (rand(2) == 1 ? 1 : -1) if rand > @p_stay
    @current = [0, @current, @limit].sort[1]
    @time += 1
    r
  end

  def WalkSim.bench limit, p_stay, runs
    histogram = Hash.new{0}
    runs.times do
      sim = WalkSim.new limit, p_stay
      gen = yield
      nil until sim.poke gen.next
      histogram[sim.time] += 1
    end
    histogram.to_a.sort
  end
end

class Array; def sum; reduce 0, :+; end; end

def stats histogram
  count = histogram.map{|k,v|v}.sum.to_f
  avg = histogram.map{|k,v|k*v}.sum / count
  variance = histogram.map{|k,v|(k-avg)**2*v}.sum / (count - 1)
  {avg: avg, stddev: variance ** 0.5}
end

RUNS = 1_000_000
PSTAY = 1.0/3
LIMIT = 100

puts "random sampling"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y|loop{y.yield rand(LIMIT + 1)}}
}

puts "sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y|loop{0.upto(LIMIT){|i|y.yield i}}}
}

puts "x0.5 speed sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y|loop{0.upto(LIMIT){|i|2.times{y.yield i}}}}
}

(2..5).each do |speed|
  puts "x#{speed} speed sweep"
  p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
    Enumerator.new {|y|loop{speed.times{|off|off.step(LIMIT, speed){|i|y.yield i}}}}
  }
end

puts "sweep LR"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y|loop{
    0.upto(LIMIT){|i|y.yield i}
    LIMIT.downto(0){|i|y.yield i}
  }}
}

$sg_gen = Enumerator.new do |y|
  probs = Array.new(LIMIT + 1){1.0 / (LIMIT + 1)}
  loop do
    ix = probs.each_with_index.map{|v,i|[v,rand,i]}.max.last
    probs[ix] = 0
    probs = [probs[0] * (1 + PSTAY)/2 + probs[1] * (1 - PSTAY)/2,
             *probs.each_cons(3).map{|a, b, c| (a + c) / 2 * (1 - PSTAY) + b * PSTAY},
             probs[-1] * (1 + PSTAY)/2 + probs[-2] * (1 - PSTAY)/2]
    y.yield ix
  end
end

$sg_cache = []
def sg_enum; Enumerator.new{|y| $sg_cache.each{|n| y.yield n}; $sg_gen.each{|n| $sg_cache.push n; y.yield n}}; end

puts "smart greedy"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {sg_enum}
No, forget everything about loops.
Copy this array to another array and then check which cells are now non-zero. For example, if your main array is mainArray[] you can use:
int temp[sizeOfArray];
int counter = 0;
while(counter < sizeOfArray)
{
    temp[counter] = mainArray[counter];
    counter++;
}
//then check what is non-zero in the copied array
counter = 0;
while(counter < sizeOfArray)
{
    if(temp[counter] != 0)
    {
        std::cout << "I Found It!!!";
    }
    counter++;
}//end of while
One approach, perhaps:
i - Have four index variables f, f1, l, l1. f points at 0, f1 at 1, l points at n-1 (the end of the array), and l1 at n-2 (the second-to-last element).
ii - Check the elements at f1 and l1 - are any of them non-zero? If so, stop. If not, check the elements at f and l (to see if the element has jumped back by 1).
iii - If f and l are still zero, increment the indexes and repeat step ii. Stop when f1 > l1.
If it is only an equality check against an array index that makes the non-zero element jump, why not think of a way where we don't really require an equality check with an array index?
int check = 0;
for(int i = 0 ; i < arr.length ; i++) {
    check |= arr[i];
    if(check != 0)
        break;
}
Orrr. Maybe you can keep reading arr[mid]. The non-zero element will end up there. Some day. Reasoning: Patrick Trentin seems to have put it in his answer (somewhat; it's not really that, but you'll get an idea).
If you have some information about the array, maybe we can come up with a niftier approach.
Ignoring the trivial case where the 1 is in the first cell of the array: if you iterate through the array testing each element in turn, you must eventually get to the position i where the 1 is in cell i+2. So when you read cell i+1, one of three things is going to happen.
The 1 stays where it is; you're going to find it next time you look.
The 1 moves away from you; you're back to the starting position with the 1 at i+2 next time.
The 1 moves to the cell you've just checked; it dodged your scan.
Re-reading the i+1 cell will find the 1 in case 3, but just gives it another chance to move in cases 1 and 2, so a strategy based on re-reading won't work.
My option would therefore be to adopt a brute force approach: if I keep scanning the array then I'm going to hit case 1 at some point and find the elusive 1.
Assumptions:
The array is not a true array. This is obvious given the problem. We have some class that behaves somewhat like an array.
The array is mostly hidden. The only public operations are [] and size().
The array is obfuscated. We cannot get any information by retrieving its address and then analyzing the memory at that position. Even if we iterate through the whole memory of our system, we can't do tricks due to some advanced cryptographic means.
Every field of the array has the same probability of being the first field that hosts the one.
We know the probabilities of how the one changes its position when triggered.
Probability controlled algorithm:
Introduce another array of same size, the probability array (over double).
This array is initialized with all fields to be 1/size.
Every time we use [] on the base array, the probability array changes in this way:
The accessed position is set to zero (did not contain the one)
An entry becomes the sum of its neighbors times the probability of each neighbor jumping to the entry's position. (prob_array_next_it[i] = prob_array_last_it[i-1]*prob_jump_to_right + prob_array_last_it[i+1]*prob_jump_to_left + prob_array_last_it[i]*prob_dont_jump, different for i=0 and i=size-1 of course)
The probability array is normalized (setting one entry to zero sets the sum of the probabilities to below one)
The algorithm accesses the field with the highest probability (choosing amongst those that share it)
One might be able to optimize this by controlling the flow of probabilities, but that would need to be based on the wandering event and might require some research.
No algorithm that tries to solve this problem is guaranteed to terminate after some time. For a complexity, we would analyze the average case.
Example:
Jump probabilities are 1/3, nothing happens if trying to jump out of bounds
Initialize:
Hidden array:               0 0 1 0 0 0 0 0
Probability array:          1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
First iteration: try [0] -> failure
Hidden array:               0 0 1 0 0 0 0 0 (no jump)
Probability array step 1:   0 1/8 1/8 1/8 1/8 1/8 1/8 1/8
Probability array step 2:   1/24 2/24 1/8 1/8 1/8 1/8 1/8 1/8
Probability array step 2, normalized (whole array * 8/7):
                            1/21 2/21 1/7 1/7 1/7 1/7 1/7 1/7
Second iteration: try [2], as 1/7 is the maximum and this is the first field with 1/7 -> success. (The example should be clear by now; of course it might not work this fast on another example. I had no interest in doing this for a lot of iterations since the probabilities would get cumbersome to compute by hand; one would need to implement it. Note that if the one had jumped to the left, we wouldn't have checked it so fast, even if it remained there for some time.)
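A minimal Python sketch of this probability-controlled algorithm, with a tiny simulation harness so it can run on its own; the poke interface and all names are illustrative assumptions, not part of the original answer:

import random

def smart_greedy_search(poke, size, p_stay=1.0/3):
    # Probability-controlled search: always poke the currently most likely cell.
    # `poke(i)` returns True when the hidden element is at index i (and may then move).
    p_move = (1.0 - p_stay) / 2          # chance of a one-step jump left or right
    probs = [1.0 / size] * size          # uniform prior over starting positions
    while True:
        i = max(range(size), key=lambda k: probs[k])   # most likely cell (first on ties)
        if poke(i):
            return i
        probs[i] = 0.0                   # the element was not there
        # Diffuse the probabilities one step; out-of-bounds jumps mean "stay put".
        nxt = [0.0] * size
        for k, p in enumerate(probs):
            nxt[k] += p * p_stay
            nxt[k - 1 if k > 0 else 0] += p * p_move
            nxt[k + 1 if k < size - 1 else size - 1] += p * p_move
        total = sum(nxt)
        probs = [p / total for p in nxt] # renormalize after zeroing one entry

# Tiny harness to exercise the search (purely illustrative):
class Hidden:
    def __init__(self, size, p_stay=1.0/3):
        self.size, self.p_stay = size, p_stay
        self.pos = random.randrange(size)
    def poke(self, i):
        hit = (i == self.pos)
        if random.random() > self.p_stay:
            self.pos = min(self.size - 1, max(0, self.pos + random.choice((-1, 1))))
        return hit

h = Hidden(101)
print(smart_greedy_search(h.poke, 101))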

Extracting 2 numbers n times and placing back the addition in O(n) instead of O(n*log(n))

I'm presenting a problem my professor showed in class, with my O(n*log(n)) solution:
Given a list of n numbers we'd like to perform the following n-1 times:
Extract the two minimal elements x,y from the list and present them
Create a new number z , where z = x+y
Put z back into the list
Suggest a data structure and algorithm for O(n*log(n)) , and O(n)
Solution:
We'll use a minimal heap:
Creating the heap one time only would take O(n). After that, extracting the two minimal elements would take O(log(n)). Placing z into the heap would take O(log(n)).
Performing the above n-1 times would take O(n*log(n)), since:
O(n) + O(n·(log n + log n)) = O(n) + O(n·log n) = O(n·log n)
But how can I do it in O(n)?
EDIT:
By saying: "extract the two minimal elements x,y from the list and present them ", I mean printf("%d,%d" , x,y), where x and y are the smallest elements in the current list.
This is not a full answer. But if the list was sorted, then your problem is easily doable in O(n). To do it, arrange all of the numbers in a linked list. Maintain a pointer to the head, and another pointer somewhere in the middle. At each step, take the top two elements off of the head, print them, advance the middle pointer until it is where the sum should go, and insert the sum.
The head pointer will move close to 2n times and the middle pointer will move about n times, with n inserts. All of those operations are O(1), so the sum total is O(n).
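An equivalent sketch of that idea in Python, using two queues instead of one linked list with two pointers; the input is assumed to be already sorted in ascending order, and the names are illustrative:

from collections import deque

def pair_and_sum_sorted(sorted_vals):
    # Two queues: the original sorted values, and the sums produced so far.
    # Each new sum is at least as large as the previous one, so both queues stay
    # sorted and the overall minimum is always at one of the two fronts (an O(1) check).
    vals = deque(sorted_vals)
    sums = deque()

    def pop_min():
        if not sums or (vals and vals[0] <= sums[0]):
            return vals.popleft()
        return sums.popleft()

    pairs = []
    for _ in range(len(sorted_vals) - 1):
        x, y = pop_min(), pop_min()
        pairs.append((x, y))   # "present" the two minimal elements
        sums.append(x + y)     # put z = x + y back
    return pairs

print(pair_and_sum_sorted([10, 11, 12, 13, 14, 15]))
# [(10, 11), (12, 13), (14, 15), (21, 25), (29, 46)]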
In general you cannot sort in time O(n), but there are a number of special cases in which you can. So in some cases it is doable.
The general case is, of course, not solvable in time O(n). Why not? Because given your output, in time O(n) you can run through the output of the program, build up the list of pairwise sums in order as you go, and filter them out of the output. What is left is the elements of the original list in sorted order. This would give a O(n) general sorting algorithm.
Update: I was asked to show how you could go from the output (10, 11), (12, 13), (14, 15), (21, 25), (29, 46) back to the input list. The trick is that if you always keep everything in order, then you know where to look. With positive integers, the next upcoming sum to use will always be at the start of that list.
Step 0: Start
input_list: (empty)
upcoming sums: (empty)
Step 1: Grab output (10, 11)
input_list: 10, 11
upcoming_sums: 21
Step 2: Grab output (12, 13)
input_list: 10, 11, 12, 13
upcoming_sums: 21, 25
Step 3: Grab output (14, 15)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 21, 25, 29
Step 4: Grab output (21, 25)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 29, 46
Step 5: Grab output (29, 46)
input_list: 10, 11, 12, 13, 14, 15
upcoming_sums: 75
This isn't possible in the general case.
Your problem statement reads that you must reduce your array to a single element, performing a total of n-1 reduction operations. Therefore, the number of reduction operations performed is on the order of O(n). To achieve an overall running time of O(n), each reduction operation must run in O(1).
You have clearly defined your reduction operation:
remove the 2 minimal elements in the array and print them, then
insert the sum of those elements into the array.
If your data structure were a sorted list, it is trivial to remove two minimal elements in O(1) time (pop them off the end of the list). However, reinserting an element in O(1) is not possible (in the general case). As SteveJessop pointed out, if you could insert into a sorted list in O(1) time, the resultant operations would constitute an O(n) sorting algorithm. But there is no such known algorithm.
There are some exceptions here. If your numbers are integers, you may be able to use "radix insert" to achieve O(1) inserts. If your array of numbers are sufficiently sparse in the number line, you may be able to deduce insert points in O(1). There are numerous other exceptions, but they are all exceptions.
This answer doesn't answer your question, per se, but I believe it's relevant enough to warrant an answer.
If the range of values is less than n, then this can be solved in O(n).
1) Create an array mk of size equal to the range of values and initialize it to all zeros.
2) Traverse through the input array and increment the value of mk at the position of each array element,
i.e. if the array element is a[i], then increment mk[a[i]].
3) For presenting the answers after each of the n-1 operations, follow these steps.
There are two cases:
Case 1: all of a[i] are positive
Traverse through the mk array from 0 to its size.
cnt = 0
Do this until cnt equals 2:
grab a nonzero element, decrease its value by 1, and increment cnt by 1.
You can get the two minimum values this way; present them.
Now do mk[sum of the two minimums]++.
Case 2: some of a[i] are negative
<still to update>
O(nlogn) is easy - just use a heap, treap or skiplist.
O(n) sounds tough.
https://en.wikipedia.org/wiki/Heap_%28data_structure%29
https://en.wikipedia.org/wiki/Treap
https://en.wikipedia.org/wiki/Skip_list
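For the O(n·log n) half, a minimal Python sketch of the heap procedure the question itself describes (heapify once, then pop two and push the sum, n-1 times):

import heapq

def pair_and_sum_heap(values):
    heap = list(values)
    heapq.heapify(heap)               # O(n), done once
    for _ in range(len(values) - 1):
        x = heapq.heappop(heap)       # O(log n)
        y = heapq.heappop(heap)       # O(log n)
        print(f"{x},{y}")             # "present" the two minimal elements
        heapq.heappush(heap, x + y)   # O(log n), put z = x + y back

pair_and_sum_heap([14, 10, 12, 15, 13, 11])
# prints 10,11 then 12,13 then 14,15 then 21,25 then 29,46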
