Finding arrays that satisfy the average case

I'm not sure whether to post this in Mathematics or here, but it's about algorithms, so I'm going to try here.
Essentially, I have an algorithm to find the median value in an array. It is, in essence, the quick select algorithm.
What I want to do is find arrays of numbers that satisfy the average case. I.e., when the array length is 5, the average number of basic operations is 6. I want to find the relationship between the arrays that produce 6, so that I can programmatically build an array of length x and know its number of operations.
I've been generating a ton of arrays to try to find a pattern by hand, and I can't see it. I have been using the permutations of {1,2,3,4,5}; going higher than that becomes too unwieldy to look at (a minimum of 720 arrays), and going lower there is not enough variation to find a pattern.
The way I found 6 basic operations was to run the algorithm over every permutation of {1,2,3,4,5} and output the results into a list, which I then piped into Python and ran sum(list)/len(list) on. I then manually went through and found the arrays that produce 6, and looked at them, first alone and then as a group, to see if I could find any characteristics.
The first pivot is always zero.
I'm either looking for some kind of formula to generate these arrays, or a way to analyse the data to obtain the pattern.
Edit:
I should clarify that I am looking for a way to programmatically generate arrays that meet the 'average case' criteria for quickselect.
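For what it's worth, here is a minimal sketch of the enumeration experiment described above. It assumes a quickselect that always takes the first element as its pivot (consistent with "the first pivot is always zero") and counts one basic operation per element compared during partitioning; what exactly counts as a "basic operation" is an assumption, so adjust the counting to match your algorithm:

from itertools import permutations

def quickselect(arr, k, counter):
    # counter is a single-element list used as a mutable operation counter
    if len(arr) == 1:
        return arr[0]
    pivot = arr[0]                     # first element as pivot
    lows, highs = [], []
    for x in arr[1:]:
        counter[0] += 1                # one comparison per partitioned element
        (lows if x < pivot else highs).append(x)
    if k < len(lows):
        return quickselect(lows, k, counter)
    if k == len(lows):
        return pivot
    return quickselect(highs, k - len(lows) - 1, counter)

n = 5
counts = {}
for p in permutations(range(1, n + 1)):
    c = [0]
    quickselect(list(p), n // 2, c)    # 0-based median index
    counts[p] = c[0]

average = sum(counts.values()) / len(counts)
average_case = [p for p, c in counts.items() if c == average]
print(average, len(average_case))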


Is there an algorithm to check how similar are two 2d data sets?

I need help.
First of all, I'm not asking whether the two data sets are equal (A == B), or whether they have similar features; I already know they are similar.
I have two 2D data sets (they are actually two vector fields), one 'fixed' and one 'experimental', and I want to know HOW similar they are. My idea is to get a number per point saying whether they are equal, on a scale from 0 to 1 (including decimals). The goal is an iterative algorithm that finds the experimental data set that best agrees with the fixed one, but first I need to measure "how similar they are".
It's like measuring the error in order to minimize it.
If one has |A| = |B| and the same (or close) sample points, one could simply use the standard deviation of the pairwise differences |a - b|, where a ∈ A and b ∈ B, paired point by point. You don't need a separate temporary array if you use a stable, on-line algorithm like Welford's (just take the square root at the end to get the standard deviation).
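A minimal sketch of that suggestion in Python, assuming the two sets correspond index-by-index (the function name is mine):

import math

def pairwise_diff_stddev(A, B):
    # Welford's on-line algorithm over the stream of |a - b| values,
    # so no temporary array of differences is needed.
    count, mean, m2 = 0, 0.0, 0.0
    for a, b in zip(A, B):
        d = abs(a - b)
        count += 1
        delta = d - mean
        mean += delta / count
        m2 += delta * (d - mean)       # second factor uses the updated mean
    return math.sqrt(m2 / count) if count else 0.0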

What is the best algorithm for detecting duplicate numbers in a small array?

What is the best algorithm for detecting duplicate numbers in an array: the best in speed, memory, and avoiding overhead?
A small array like [5,9,13,3,2,5,6,7,1]. Note that 5 is a duplicate.
After searching and reading about sorting algorithms, I realized that I would use one of these algorithms: Quick Sort, Insertion Sort, or Merge Sort.
But actually I am really confused about what to use in my case, which is a small array.
Thanks in advance.
To be honest, with that size of array, you may as well choose the O(n^2) solution (checking every element against every other element).
You'll generally only need to worry about performance if/when the array gets larger. For small data sets like this, you could well have found the duplicate with an 'inefficient' solution before the sort phase of an efficient solution has even finished :-)
In other words, you can use something like (pseudo-code):
for idx1 = 0 to nums.len - 2 inclusive:
    for idx2 = idx1 + 1 to nums.len - 1 inclusive:
        if nums[idx1] == nums[idx2]:
            return nums[idx1]
return no dups found
This finds the first value in the array which has a duplicate.
If you want an exhaustive list of duplicates, then just add the duplicate value to another (initially empty) array (once only per value) and keep going.
You can sort it using any half-decent algorithm; for a data set of the size you're discussing, even a bubble sort would probably be adequate. Then you just process the sorted items sequentially, looking for runs of equal values, though it's probably overkill in your case.
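A sketch of that sort-then-scan variant, under the same caveat that it is overkill here:

def find_duplicates_sorted(nums):
    # Sort a copy, then look for runs of equal neighbours.
    dups = []
    prev = None
    for x in sorted(nums):
        if x == prev and (not dups or dups[-1] != x):
            dups.append(x)             # record each duplicated value once
        prev = x
    return dups

print(find_duplicates_sorted([5, 9, 13, 3, 2, 5, 6, 7, 1]))   # [5]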
There are two good approaches, depending on whether or not you know the range from which the numbers are drawn.
Case 1: the range is known.
Suppose you know that all numbers are in the range [a, b), so the length of the range is l = b - a.
You can create an array A of length l and fill it with 0s, then iterate over the original array and, for each element e, increment the value of A[e-a] (here we are actually mapping the range onto [0, l)).
Once finished, you can iterate over A and find the duplicate numbers: if there exists an i such that A[i] is greater than 1, then i+a is a repeated number.
The same idea is behind counting sort, and it works fine for your problem too.
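A sketch of Case 1 in Python (the function name and example range are mine):

def duplicates_in_range(nums, a, b):
    # Counting array over the known range [a, b): O(n) time, O(b - a) space.
    counts = [0] * (b - a)
    for e in nums:
        counts[e - a] += 1
    return [i + a for i, c in enumerate(counts) if c > 1]

print(duplicates_in_range([5, 9, 13, 3, 2, 5, 6, 7, 1], 1, 14))   # [5]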
Case 2: the range is not known.
Quite simple. Slightly modify the approach mentioned above: instead of an array, use a map where the keys are the numbers from your original array and the values are the number of times you find them. At the end, iterate over the keys and pick out those that were found more than once.
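A sketch of Case 2, using Python's built-in Counter as the map:

from collections import Counter

def duplicates_unknown_range(nums):
    # Map each value to its occurrence count, keep values seen twice or more.
    return [v for v, c in Counter(nums).items() if c > 1]

print(duplicates_unknown_range([5, 9, 13, 3, 2, 5, 6, 7, 1]))   # [5]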
Note.
In both of the cases mentioned above, the complexity is O(N), and you cannot do better, since you have to visit all of the stored values at least once.
Look at the first example: we iterate over two arrays, of lengths N and l <= N, so the complexity is at most 2*N, which is O(N).
The second example is indeed a bit more complex and depends on the implementation of the map, but for the sake of simplicity we can safely assume that it is O(N).
In memory, you are constructing data structures whose sizes are proportional to the number of distinct values contained in the original array.
As usual, memory occupancy and performance drive the choice: the more memory you spend, the better the performance, and vice versa. As suggested in another answer, if you know that the array is small, you can safely rely on an O(N^2) algorithm that requires no extra memory at all.
Which is the best choice? Well, it depends on your problem; we cannot say.

How to find minimum element in an array repeatedly but between different indices input by user.

An array of n numbers is given, along with the number of times the minimum is to be found; let it be p. A pair of indices is given for each of the p cases. I traversed the array to find the minimum between the given indices and repeated this procedure p times using a for loop, but I want it to be more efficient. How can I do so?
What you need, is to use some efficient algorithm for Range Minimum Query problem. Please follow the provided link. There you will find a comprehensive explanation how to do this.
It depends on the number of queries, the kind of queries, etc.
Without any such information, taking an O(n) time pre-processing hit for Range Minimum Query setup will make each of the min queries O(1).
Depending on the expected query patterns etc, there are tradeoffs that you can make.
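For illustration, here is a sketch of one standard Range Minimum Query structure, a sparse table. Note that its preprocessing is O(n log n) rather than the O(n) mentioned above (true O(n) preprocessing with O(1) queries exists but is considerably more involved); each query is then O(1):

def build_sparse_table(arr):
    # table[j][i] holds the minimum of arr[i : i + 2**j]
    n = len(arr)
    table = [list(arr)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table

def range_min(table, lo, hi):
    # minimum of arr[lo..hi] inclusive, answered in O(1)
    j = (hi - lo + 1).bit_length() - 1
    return min(table[j][lo], table[j][hi - (1 << j) + 1])

table = build_sparse_table([4, 2, 7, 1, 9, 3])
print(range_min(table, 1, 4))          # 1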

determine if an array has the numbers a to b each once [duplicate]

This question already has answers here: How to tell if an array is a permutation in O(n)? (16 answers)
Closed 9 years ago.
Given an array A of size n, and two numbers a and b with b-a+1=n, I need to determine whether or not A contains each of the numbers between a and b (exactly once).
For example, if n=4 and a=1,b=4, then I'm looking to see if A is a rearrangement of [1,2,3,4].
In practice, I need to do this with O(1) space (no hash table).
My first idea was to sort A, but I have to do this without rearranging A, so that's out.
My second idea is to run through A once, adding up the entries and checking that they are in the correct range. At the end, I have to get the right sum (for a=1,b=n, this is n(n+1)/2), but this doesn't always catch everything, e.g. [1,1,4,4,5] passes the test for n=5,a=1,b=5, but shouldn't.
The only idea of mine that works is to pass through the array n times making sure to see each number once and only once. Is there a faster solution?
You can do this with a single pass through the array, using only a minor modification of the n(n+1)/2 method you already mentioned.
To do so, walk through the array, ignoring elements outside the a..b range. For numbers that are in the correct range, you want to track three values: the sum of the numbers, the sum of the squares of the numbers, and the count of the numbers.
You can pre-figure the correct values for both the sum of numbers and the sum of the squares (and, trivially, the count).
Then compare your result to the expected results. Consider, for example, if you're searching for 1, 2, 3, 4. If you used only the sums of the numbers, then [1, 1, 4, 4] would produce the correct result (1+2+3+4 = 10, 1+1+4+4 = 10), but if you also add the sums of the squares, the problem is obvious: 1+4+9+16 = 30 but 1+1+16+16 = 34.
This is essentially applying (something at least very similar to) a Bloom filter to the problem. Given a sufficiently large group and a fixed pair of functions, there's going to be some set of incorrect inputs that will produce the correct output. You can reduce that possibility to an arbitrarily low value by increasing the number of filters you apply. Alternatively, you can probably design an adaptive algorithm that can't be fooled: offhand, it seems that if your range of inputs is N, then raising each number to the power N+1 will probably ensure that you can only get the correct result with exactly the correct inputs (but I'll admit I'm not absolutely certain that's correct).
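A minimal sketch of the single-pass count/sum/sum-of-squares check (hedged as above: some non-permutations could in principle still collide on all three values; the function name is mine):

def looks_like_permutation(A, a, b):
    count = s = sq = 0
    for x in A:
        if a <= x <= b:                # ignore out-of-range elements
            count += 1
            s += x
            sq += x * x
    # expected values for the range a..b (still O(1) space, one extra pass)
    exp_s = sum(range(a, b + 1))
    exp_sq = sum(i * i for i in range(a, b + 1))
    return count == b - a + 1 and s == exp_s and sq == exp_sq

print(looks_like_permutation([1, 1, 4, 4], 1, 4))   # False: squares differ
print(looks_like_permutation([2, 4, 1, 3], 1, 4))   # True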
Here is an O(1)-space, O(n) solution that might help:
1. Compute the mean and standard deviation of the range (a, b).
2. Scan the array and compute its mean and standard deviation.
3. If any number is outside (a, b), return false.
4. If mean1 != mean2 or sd1 != sd2, return false; else return true.
Note: I might not be 100% accurate.
Here's a solution that fails with the probability of a hash collision.
Take an excellent (for example cryptographic) hash function H.
Compute: xor(H(x) for x in a...b)
Compute: xor(H(A[i]) for i in 1...n)
If the two are different, then you certainly don't have a permutation. If the two are the same, then you've almost certainly got a permutation. You can make this immune to input that's been picked to produce a hash collision by including a random seed in the hash.
This is obviously O(b-a) in running time, needs O(1) external storage, and is trivial to implement.
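A sketch of that idea in Python, with SHA-256 standing in for the "excellent" hash and a random per-run seed (the helper names are mine):

import hashlib
import os
from functools import reduce

SEED = os.urandom(16)                  # guards against adversarial inputs

def h(x):
    digest = hashlib.sha256(SEED + str(x).encode()).digest()
    return int.from_bytes(digest, 'big')

def is_permutation(A, a, b):
    expected = reduce(lambda acc, x: acc ^ h(x), range(a, b + 1), 0)
    actual = reduce(lambda acc, x: acc ^ h(x), A, 0)
    return expected == actual          # equal => almost certainly a permutation

print(is_permutation([3, 1, 4, 2], 1, 4))   # True
print(is_permutation([1, 1, 4, 4], 1, 4))   # False (with overwhelming probability)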

How to know if an array is sorted?

I already read this post, but the answer didn't satisfy me: Check if Array is sorted in Log(N).
Imagine I have a seriously big array of over 1,000,000 doubles (positive and/or negative) and I want to know if the array is "sorted", while trying to avoid the maximum number of comparisons, because comparing doubles and floats takes too much time. Is it possible to use statistics for it? And if so:
1. Is it well regarded by real programmers?
2. Should I take samples?
3. How many samples should I take?
4. Should they be random, or in a sequence?
5. How much error is permitted to say "the array is sorted"?
Thanks.
That depends on your requirements. If you can say that 100 random samples out of 1,000,000 are enough to assume it's sorted, then so it is. But to be absolutely sure, you will always have to go through every single entry. Only you can answer this question, since only you know how certain you need to be about it being sorted.
This is a classic probability problem taught in high school. Consider this question:
In a batch of 8,000 clocks, 7% are defective. A random sample of 10 (without replacement) from the 8,000 is selected and tested. If at least one is defective, the entire batch will be rejected. What is the probability that the batch will be rejected?
So you can take a number of random samples from your large array and check whether they are sorted, but you must note that you would need to know the probability that a sample is out of order. Since you don't have that information, a probabilistic approach won't work efficiently here.
(However, you can check 50% of the array and naively conclude that there is a 50% chance that it is sorted correctly.)
If you run a divide-and-conquer algorithm using multiprocessing (real parallelism, so only on multi-core CPUs), you can check whether an array is sorted in Log(N).
If you have GPU multiprocessing, you can achieve Log(N) very easily, since modern graphics cards are able to run a few thousand processes in parallel.
Your question 5 is the question that you need to answer to determine the other answers. To ensure the array is perfectly sorted you must go through every element, because any one of them could be the one out of place.
The maximum number of comparisons to decide whether the array is sorted is N-1, because there are N-1 adjacent number pairs to compare. But for simplicity, we'll say N as it does not matter if we look at N or N+1 numbers.
Furthermore, it is unimportant where you start, so let's just start at the beginning.
Comparison #1 (A[0] vs. A[1]). If it fails, the array is unsorted. If it succeeds, good.
As we only compare, we can reduce each pair of neighbours to whether the left one is less than or equal to the right one (1) or not (0). So we can treat the array as a sequence of 0s and 1s, indicating whether each pair of adjacent numbers is in order.
To calculate the error rate, or the probability, we have to look at all combinations of our 0/1 sequence.
I would look at it like this: we have 2^n combinations of such a sequence (i.e. of the order of the pairs), of which only one is sorted (all elements are 1, indicating that each A[i] is less than or equal to A[i+1]).
Now this seems to be simple:
initially only one of the 2^n combinations is the sorted one. Each comparison that succeeds eliminates half of the remaining combinations, so after k successful comparisons the fraction of combinations still in play is 1/2^k of what it was.
I'm not a mathematician, but it should be quite easy to calculate how many comparisons are needed to reach a given error rate (find the smallest x such that 1/2^x <= ERROR).
Sorry for the confusing English; I'm from Germany.
Since every single element can be the one element that is out-of-line, you have to run through all of them, hence your algorithm has runtime O(n).
If your understanding of "sorted" is less strict, you need to specify what exactly you mean by "sorted". Usually, "sorted" means that adjacent elements meet a less or less-or-equal condition.
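For reference, the exact adjacent-pairs check under that usual definition is a one-liner in Python:

def is_sorted(a):
    # True iff every adjacent pair satisfies the less-or-equal condition
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))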
Like everyone else says, the only way to be 100% sure that it is sorted is to run through every single element, which is O(N).
However, it seems to me that if you're so worried about it being sorted, then maybe having it sorted to begin with is more important than the array elements being stored in a contiguous portion in memory?
What I'm getting at is, you could use a map whose elements by definition follow a strict weak ordering. In other words, the elements in a map are always sorted. You could also use a set to achieve the same effect.
For example: std::map<int,double> collection; would allow you to almost use it like an array: collection[0]=3.0; std::cout<<collection[0]<<std::endl;. There are differences, of course, but if the sorting is so important then an array is the wrong choice for storing the data.
The old-fashioned way: print it out and see if it's in order. Really, if your sort is wrong, you would probably see it soon. It's unlikely that you would see only a few mis-orderings if you were sorting 100+ things. Whenever I deal with this, the whole thing is either completely off or it works.
As an example that you probably should not use, but that demonstrates sampling size:
A statistically valid sample size can give you a reasonable estimate of sortedness. If you want to be 95% certain everything is sorted, you can do that by creating a list of truly random points to sample, perhaps ~1500 (see the sketch after this answer).
Essentially, this is completely pointless if the values being out of order in any single place will break subsequent algorithms or data requirements.
If this is a problem, preprocess the list before your code runs, or use a really fast sort package in your code. Most sort packages also have a validation mode, where it simply tells you yes, the list meets your sort criteria - or not. Other suggestions like parallelization of your check with threads are great ideas.
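A sketch of that sampling idea (the ~1500 figure comes from the suggestion above; this can only ever report "probably sorted", never "sorted"):

import random

def probably_sorted(a, samples=1500):
    for _ in range(samples):
        i = random.randrange(len(a) - 1)   # pick a random adjacent pair
        if a[i] > a[i + 1]:
            return False                   # definitely not sorted
    return True                            # no counterexample in the sample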
