I need to design an algorithm that finds the k-th smallest element in an unsorted array using a function called "MED3":
This function returns the elements that would be at positions n/3 (floor) and 2n/3 (ceil) if the array were sorted (very similar to a median, but instead of the n/2 element it returns those two values).
I thought about partitioning around those two values and then continuing like QuickSelect, but the problem is that "MED3" doesn't return the indices of the two values, only the values themselves.
For example, if the array is 1, 2, 10, 1, 7, 6, 3, 4, 4, it returns 2 (the n/3 value) and 4 (the 2n/3 value).
I also thought about running over the array, copying all the values between 2 and 4 (for the array above) to a new array, and then using "MED3" again, but there can be duplicates (if the array is 2, 2, 2, 2, ..., 2, I would take all the elements every time).
Any ideas? I must use "MED3".
* MED3 is a black box; it runs in linear time.
Thank you.
I think you're on the right track, but instead of taking 2 to 4, I'd suggest removing the first n/3 values that are <= MED3.floor() and the first n/3 values that are >= MED3.ceil(). That avoids issues with too many duplicates. If two passes per cycle aren't too expensive, you can remove all values < MED3.floor() plus enough values == MED3.floor() to bring the total removed to n/3 (and do the same for ceil()),
then repeat until you reach the k-th smallest target.
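One way to make this concrete is to partition around the two MED3 values into five groups, which sidesteps both the missing indices and the duplicates. The sketch below is a variant of the idea above, not the exact removal scheme. `med3` here is a hypothetical stand-in that sorts (so it is not linear time) and exists only so the example runs; `kth_smallest` and the 1-based `k` are my own naming.

```python
def med3(values):
    """Hypothetical stand-in for the MED3 black box (sorts, so NOT linear time)."""
    ordered = sorted(values)
    n = len(ordered)
    lo_rank = max(1, n // 3)            # floor(n/3), clamped for tiny inputs
    hi_rank = max(1, (2 * n + 2) // 3)  # ceil(2n/3)
    return ordered[lo_rank - 1], ordered[hi_rank - 1]


def kth_smallest(values, k):
    """Return the k-th smallest element of values (k is 1-based)."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    while True:
        lo, hi = med3(values)
        # Split by value only, so no indices are needed from MED3.
        less = [v for v in values if v < lo]
        middle = [v for v in values if lo < v < hi]
        greater = [v for v in values if v > hi]
        n_lo = sum(1 for v in values if v == lo)
        n_hi = sum(1 for v in values if v == hi) if hi != lo else 0
        if k <= len(less):
            values = less
            continue
        k -= len(less)
        if k <= n_lo:
            return lo          # duplicates of lo are counted, never copied
        k -= n_lo
        if k <= len(middle):
            values = middle
            continue
        k -= len(middle)
        if k <= n_hi:
            return hi
        k -= n_hi
        values = greater
```

Each of the three groups that can survive a round holds roughly n/3 elements at most, because the copies of the two pivot values are only counted. With a truly linear MED3 the work per round therefore shrinks geometrically.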
I am trying to solve a complex problem on HackerRank.com that involves creating a solution that accepts both small and large arrays of data ranging from 10 integers to 99,000 integers in length.
Find the problem here -> https://www.hackerrank.com/challenges/array-and-simple-queries
The Problem
Put simply, I have to take an array, copy a range of numbers that the user specifies from it, and append that range to a new array.
i = 2
j = 4
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = []
for numbers in range(i, j + 1):
    b.append(a[numbers - 1])
The range of numbers is appended to the b[] array. This should be 2, 3, 4 in the above example. Now I want to remove() the 2, 3, 4 from the a[] array. This is where I run into problems.
for numbers in range(i, j + 1):
    a.remove(a[i-1])
This should remove numbers 2, 3, 4 and leave the a[] array as 1, 5, 6, 7, 8. This works in most cases as specified.
However, in larger arrays, such as one 500 elements long, I see that a.remove() seemingly at random removes numbers that are not in the range i through j.
Example
i = 239
j = 422
It removes a[47] and places it in another position, as well as removing i through j. I have NO IDEA why a[47] is being removed by the code above. Is remove() buggy?
What I Need Help On
I'm not trying to have the problem solved for me. I'm trying to understand why remove() is not working correctly. Logic says that it should not be removing anything outside i through j, yet it is. Any help is greatly appreciated.
The .remove method on lists doesn't remove elements by their index, but by their value: it deletes the first element equal to its argument. If you want to delete part of the list, use the del statement (e.g. del a[5] to delete the sixth element, and del a[1:4] to delete the second, third, and fourth elements).
(As for solving this problem efficiently: if you look at the operations in reverse order, I think you don't have to actually manipulate an array.)
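To illustrate the difference between remove and del described above, here is a small sketch with a list that contains a duplicate. The function names and sample data are my own; the loop in the first function mirrors the one from the question.

```python
def remove_range_by_value(a, i, j):
    """Mimics the question's loop: remove() deletes the FIRST equal value."""
    a = list(a)  # work on a copy
    for _ in range(i, j + 1):
        a.remove(a[i - 1])
    return a


def remove_range_by_index(a, i, j):
    """Deletes the 1-based positions i..j with a slice."""
    a = list(a)  # work on a copy
    del a[i - 1:j]
    return a


data = [3, 9, 3, 4, 5]
print(remove_range_by_value(data, 3, 4))  # [9, 3, 5]: the first 3 was removed
print(remove_range_by_index(data, 3, 4))  # [3, 9, 5]: positions 3 and 4 removed
```

With distinct values both versions agree, which is why the problem only shows up on larger inputs that contain repeats.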
Let's say I need to construct an algorithm that determines whether two arrays containing only integers contain the same numbers. Neither the order nor the number of occurrences of each value matters.
An example of what I'm after:
{ 1, 2, 3, 4, 100, 120} { 1, 4, 100, 2, 120, 3} -> true
{ 2, 5, 8, -2, -2, 100, 102} { 2, 5, -2, 100, 102} -> false
What is a good, methodical way to approach this problem and write it down in pseudocode?
For each array, make a sorted version and remove duplicates. Then, you can just compare these versions.
Alternatively, on a higher level, you may convert arrays into sets (unordered collections of unique elements), and then compare these sets.
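Both variants are short in Python. This is only a sketch with function names of my own choosing; the sample arrays are the ones from the question.

```python
def same_values_sorted(a, b):
    """Sort each array, drop duplicates, then compare the results."""
    return sorted(set(a)) == sorted(set(b))


def same_values(a, b):
    """Compare the arrays as sets: order and repeats are ignored."""
    return set(a) == set(b)


print(same_values([1, 2, 3, 4, 100, 120], [1, 4, 100, 2, 120, 3]))       # True
print(same_values([2, 5, 8, -2, -2, 100, 102], [2, 5, -2, 100, 102]))    # False
```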
Note I'm assuming that the cardinalities of matching elements must be the same in both arrays to call them a match. Your example does not make clear what to do in that case. @Gassa's solution is good, but there is a more general way to look at this. Convert each array to a multiset and then compare the multisets. @Gassa's sorted arrays are serving as easily comparable multisets. But other multiset representations are possible. A hash mapping elements to counts is one. Balanced trees are another.
In Python, you can use the hash of counts method like this:
def toCountHash(a):
    # map each element to the number of times it occurs
    h = {}
    for x in a:
        if x in h:
            h[x] += 1
        else:
            h[x] = 1
    return h

def arraysHaveSameElements(a, b):
    return toCountHash(a) == toCountHash(b)
A final optimization is to use one array to add elements to a single multiset and then use the other to remove them, returning "not equal" if a removal fails or if the final result is not the empty set.
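A minimal sketch of that single-multiset idea, using `collections.Counter` as the multiset. The function name is hypothetical, and the up-front length check is my own shortcut: with equal lengths and every removal succeeding, the multiset must end up empty.

```python
from collections import Counter


def same_multiset(a, b):
    """Add every element of a to one multiset, then remove the elements of b."""
    if len(a) != len(b):
        return False          # the multiset could not end up empty
    counts = Counter(a)
    for x in b:
        if counts[x] == 0:    # removal fails: b has an extra copy of x
            return False
        counts[x] -= 1
    return True
```

Reading a missing key from a `Counter` returns 0 without inserting it, so the failure check needs no special case.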
The simplest one (Python-ish pseudocode); try to read it and understand what is going on in the code:
def all_in(a, b):
    count = 0                      # keep a counter
    for i in a:                    # loop through the items in a
        for j in b:                # nested loop: for every item in a, check if it is in b
            if i == j:
                count = count + 1  # increment the counter by 1 if it is in b
                break              # stop at the first match so repeats in b are not counted twice
    if count == len(a):            # check if the counter equals the number of elements in a
        return True                # if so, return True
    return False                   # False otherwise
1) Check the length of both arrays. If the lengths are not equal, return false and stop, i.e. the arrays are not equal.
2) Otherwise, copy one of the arrays into a temp variable and loop through the other, removing each current value from the copy.
3) Check whether the copy is empty. If it is, the arrays are equal; otherwise they are not.
Make sure to use temp variables and copy your array; otherwise you may end up removing elements from the original array.
E.g. I have the array (/1,3,4,5,7,9,11/); how do I remove its 3rd element? I couldn't find an array function that does that, and a loop doesn't seem like an elegant solution, since I don't know how to append to an array (that is, add an element after the previously defined element).
I want to remove all even elements from an array... I know there is only one.
I can find its index using MINLOC, but I don't know how to remove an element from an array.
Let
a = (/1,3,4,5,7,9,11/)
then
pack(a,mod(a,2)/=0)
will return the odd elements of a. This isn't quite the same as removing the 3rd element, but your question suggests that removing the even element(s) is really what you want to do.
If you declare
integer, dimension(:), allocatable :: oddones
then
oddones = pack(a,mod(a,2)/=0)
will leave oddones containing the odd elements of a. You'll need an up-to-date compiler to use this automatic allocation.
Note that in Fortran, as in any sane language, arrays are of fixed size so removing an element isn't really supported. However, if a itself were allocatable then you could use a on the lhs of the expression. Let's leave it to the philosophers whether or not a remains the same under this operation.
I have not used FORTRAN in a while, but look at the functions CSHIFT and EOSHIFT. I think you can't change the size of the array.
If you work with a fixed maximum size, but want dynamic behaviour within, you can just shift the elements:
integer, parameter :: max=10
integer :: a(max)
integer :: length_a, index, i
length_a = 8
a(1:length_a)=[1, 3, 4, 5, 7, 9, 11, 13]
!remove (third) element
index = 3
do i=index,length_a-1
  a(i)=a(i+1)
end do
length_a=length_a-1
The advantage is that this does not use an array temporary.
You could just do some slicing.
integer, parameter:: max = 10
integer used ! number of elements used
integer, dimension(1:max):: store
...
used = max
...
! Remove element n
if (n > 0 .and. n < used) then
  store(n:(used - 1)) = store((n+1):used)
end if
if (n > 0) used = used - 1
A working example, thanks to this discussion. Thanks, Dick Hendrickson and Dave Frank.
Quote:
joel GUERRERO wrote
(Thursday, September 02, 2004 6:29 PM)
I have this question, how can I do to add a record or to delete a
record from a list or array n fortran 90??
That is, let suppose that I have the following array:
(1 4 3 9 10 2 15 8)
And I want to add the value 13 in the 4th position in order to obtain:
(1 4 3 13 9 10 2 15 8)
You asked above several weeks back, which led to my asking the newsgroup what syntax F2003 adds to facilitate this operation.
Dick Hendrickson's response indicates below would work using F2003
syntax.
integer,allocatable :: v(:)
v = [1, 4, 3, 9, 10, 2, 15, 8]
v = [v(:3), 13, v(4:)] ! insert 13 into expanded size v at v(4)
dave_frank 9/3/2004 8:07:59 AM
Working answer: as asked, removes 1 matching item only:
program hello
  integer a(8)
  integer b(7)
  a=[1, 3, 4, 5, 7, 9, 11, 13]
  index = minloc(a, dim=1, mask=(mod(a, 2) .eq. 0))
  b=[a(1:index-1), a(index+1:size(a))]
  print *, b
end program Hello
You may be able to loop, "while (index)" with additional edits as necessary, to remove all matches, if you like.
I have for example 5 arrays with some inserted elements (numbers):
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
I need to find the most common elements in those arrays, and every chosen element should run all the way until it ends (see the example below). In this example that would be the bold combination (or the same one with "30" at the end, which counts as the same) because it contains the smallest number of different elements (only two: 4 and 2/30).
This combination (see below) isn't good because once I pick, for example, "4", it must continue until it ends (the next array mustn't contain "4" at all). So each combination must run all the way to its end.
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
EDIT2: OR
1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30
OR anything else is NOT good.
Is there some algorithm to speed this thing up (if I have thousands of arrays with hundreds of elements in each one)?
To make it clear: the solution must contain the lowest number of different elements, and the groups (of the same number) must be ordered from the largest group first to the smallest group last. So in the example above 4,4,4,2 is better than 4,2,2,2 because in the first solution the group of 4's is larger than the group of 2's.
EDIT: To be more specific, the solution must contain the smallest number of different elements, and those elements must be grouped from first to last. So if I have three arrays like
1,2,3
1,4,5
4,5,6
Solution is 1,1,4 or 1,1,5 or 1,1,6 NOT 2,5,5 because 1's have larger group (two of them) than 2's (only one).
Thanks.
EDIT3: I can't be more specific :(
EDIT4: @spintheblack 1,1,1,2,4 is the correct solution because a number used for the first time (let's say at position 1) can't be used later (unless it's in the SAME group of 1's). I would say that grouping has "priority". Also, I didn't mention it (sorry about that), but the numbers in the arrays are NOT sorted in any way; I typed them that way in this post because it was easier for me to follow.
Here is the approach you want to take, if arrays is an array that contains each individual array.
1. Starting at i = 0
2. current = arrays[i]
3. Loop i from i+1 to len(arrays)-1
4. new = current & arrays[i] (set intersection, finds common elements)
5. If there are any elements in new, do step 6, otherwise skip to 7
6. current = new, return to step 3 (continue loop)
7. print or yield an element from current, current = arrays[i], return to step 3 (continue loop)
Here is a Python implementation:
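def mce(arrays):
    count = 1
    current = set(arrays[0])
    for i in range(1, len(arrays)):
        new = current & set(arrays[i])
        if new:
            # the run continues: keep only the elements still in common
            count += 1
            current = new
        else:
            # the run ends: print one of its elements, repeated for the run's length
            print(" ".join([str(current.pop())] * count), end=" ")
            count = 1
            current = set(arrays[i])
    print(" ".join([str(current.pop())] * count))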
>>> mce([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]])
4 4 4 2
If all are number lists, and are all sorted, then:
1. Convert to an array of bitmaps.
2. Keep ANDing the bitmaps until you hit zero. The position of the 1 in the previous value indicates the first element.
3. Restart step 2 from the next element.
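The bitmap steps can be sketched with Python integers standing in for the bitmaps. This assumes non-negative integers and non-empty arrays; the function name and the (element, run length) return shape are my own choices.

```python
def bitmap_runs(arrays):
    """Greedy runs via bitwise AND; returns (element, run_length) pairs."""
    runs = []
    current = 0   # bitmap of elements common to the current run
    count = 0     # number of arrays in the current run
    for values in arrays:
        bits = 0
        for v in values:
            bits |= 1 << v               # step 1: convert the array to a bitmap
        if count and current & bits:
            current &= bits              # step 2: keep ANDing while non-zero
            count += 1
        else:
            if count:
                # position of the lowest set bit in the previous value
                runs.append(((current & -current).bit_length() - 1, count))
            current = bits               # step 3: restart from this array
            count = 1
    if count:
        runs.append(((current & -current).bit_length() - 1, count))
    return runs


print(bitmap_runs([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]]))
# [(4, 3), (2, 1)]
```

Since the bitmaps are built bit by bit here, the input does not actually need to be sorted for this version.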
This has now turned into a graphing problem with a twist.
The problem is a directed acyclic graph of connections between stops, and the goal is to minimize the number of line switches when riding on a train/tram.
I.e. this list of sets:
1,4,8,10 <-- stop A
1,2,3,4,11,15 <-- stop B
2,4,20,21 <-- stop C
2,30 <-- stop D, destination
He needs to pick lines that are available at his exit stop, and his arrival stop, so for instance, he can't pick 10 from stop A, because 10 does not go to stop B.
So, this is the set of available lines and the stops they stop on:
             A     B     C     D
line 1  -----X-----X-----------------
line 2  -----------X-----X-----X-----
line 3  -----------X-----------------
line 4  -----X-----X-----X-----------
line 8  -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
If we consider that a line under consideration must go between at least 2 consecutive stops, let me highlight the possible choices of lines with equal signs:
             A     B     C     D
line 1  -----X=====X-----------------
line 2  -----------X=====X=====X-----
line 3  -----------X-----------------
line 4  -----X=====X=====X-----------
line 8  -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----
He then needs to pick a way that transports him from A to D, with the minimal number of line switches.
Since he explained that he wants the longest rides first, the following sequence seems the best solution:
take line 4 from stop A to stop C, then switch to line 2 from C to D
Code example:
stops = [
    [1, 4, 8, 10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30],
]

def calculate_possible_exit_lines(stops):
    """
    only return lines that are available at both exit
    and arrival stops, discard the rest.
    """
    result = []
    for index in range(0, len(stops) - 1):
        lines = []
        for value in stops[index]:
            if value in stops[index + 1]:
                lines.append(value)
        result.append(lines)
    return result

def all_combinations(lines):
    """
    produce all combinations which travel from one end
    of the journey to the other, across available lines.
    """
    if not lines:
        yield []
    else:
        for line in lines[0]:
            for rest_combination in all_combinations(lines[1:]):
                yield [line] + rest_combination

def reduce(combination):
    """
    reduce a combination by returning the number of
    times each value appear consecutively, ie.
    [1,1,4,4,3] would return [2,2,1] since
    the 1's appear twice, the 4's appear twice, and
    the 3 only appear once.
    """
    result = []
    while combination:
        count = 1
        value = combination[0]
        combination = combination[1:]
        while combination and combination[0] == value:
            combination = combination[1:]
            count += 1
        result.append(count)
    return tuple(result)

def calculate_best_choice(lines):
    """
    find the best choice by reducing each available
    combination down to the number of stops you can
    sit on a single line before having to switch,
    and then picking the one that has the most stops
    first, and then so on.
    """
    available = []
    for combination in all_combinations(lines):
        count_stops = reduce(combination)
        available.append((count_stops, combination))
    available = [k for k in reversed(sorted(available))]
    return available[0][1]

possible_lines = calculate_possible_exit_lines(stops)
print("possible lines: %s" % (str(possible_lines), ))
best_choice = calculate_best_choice(possible_lines)
print("best choice: %s" % (str(best_choice), ))
This code prints:
possible lines: [[1, 4], [2, 4], [2]]
best choice: [4, 4, 2]
As I said, I list the lines available between consecutive stops, so each entry in the solution above can be read either as the line you leave a stop on or as the line you arrive at the next stop on.
So the route is:
Hop onto line 4 at stop A and ride on that to stop B, then to stop C
Hop onto line 2 at stop C and ride on that to stop D
There are probably edge-cases here that the above code doesn't work for.
However, I'm not bothering more with this question. The OP has demonstrated a complete incapability in communicating his question in a clear and concise manner, and I fear that any corrections to the above text and/or code to accommodate the latest comments will only provoke more comments, which leads to yet another version of the question, and so on ad infinitum. The OP has gone to extraordinary lengths to avoid answering direct questions or to explain the problem.
I am assuming that "distinct elements" do not have to actually be distinct, they can repeat in the final solution. That is if presented with [1], [2], [1] that the obvious answer [1, 2, 1] is allowed. But we'd count this as having 3 distinct elements.
If so, then here is a Python solution:
def find_best_run(first_array, *argv):
    # initialize data structures.
    this_array_best_run = {}
    for x in first_array:
        this_array_best_run[x] = (1, (1,), (x,))
    for this_array in argv:
        # find the best runs ending at each value in this_array
        last_array_best_run = this_array_best_run
        this_array_best_run = {}
        for x in this_array:
            for (y, pattern) in last_array_best_run.items():
                (distinct_count, lengths, elements) = pattern
                if x == y:
                    lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))
                else:
                    distinct_count += 1
                    lengths = tuple(lengths + (1,))
                    elements = tuple(elements + (x,))
                if x not in this_array_best_run:
                    this_array_best_run[x] = (distinct_count, lengths, elements)
                else:
                    (prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
                    # fewer distinct elements wins; lengths only break ties
                    if distinct_count < prev_count or (
                            distinct_count == prev_count and prev_lengths < lengths):
                        this_array_best_run[x] = (distinct_count, lengths, elements)
    # find the best overall run
    best_count = len(argv) + 10  # Needs to be bigger than any possible answer.
    for (distinct_count, lengths, elements) in this_array_best_run.values():
        if distinct_count < best_count:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
        elif distinct_count == best_count and best_lengths < lengths:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
    # convert it into a more normal representation.
    answer = []
    for (length, element) in zip(best_lengths, best_elements):
        answer.extend([element] * length)
    return answer

# example
print(find_best_run(
    [1,4,8,10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30]))  # prints [4, 4, 4, 2] (30 in place of the final 2 is equally good)
Here is an explanation. The ...this_run dictionaries have keys which are elements in the current array, and they have values which are tuples (distinct_count, lengths, elements). We are trying to minimize distinct_count, then maximize lengths (lengths is a tuple, so this will prefer the element with the largest value in the first spot) and are tracking elements for the end. At each step I construct all possible runs which are a combination of a run up to the previous array with this element next in sequence, and find which ones are best to the current. When I get to the end I pick the best possible overall run, then turn it into a conventional representation and return it.
If you have N arrays of length M, this should take O(N*M*M) time to run.
I'm going to take a crack here based on the comments, please feel free to comment further to clarify.
We have N arrays and we are trying to find the 'most common' value over all arrays when one value is picked from each array. There are several constraints: 1) we want the smallest number of distinct values; 2) the most common is the maximal grouping of similar letters (changing from above for clarity). Thus, 4 t's and 1 p beats 3 x's and 2 y's.
I don't think either problem can be solved greedily. Here's a counterexample: [[1,4],[1,2],[1,2],[2],[3,4]]. A greedy algorithm would pick [1,1,1,2,4] (three distinct numbers), whereas [4,2,2,2,4] uses only two distinct numbers.
This looks like a bipartite matching problem, but I'm still coming up with the formulation..
EDIT : ignore; This is a different problem, but if anyone can figure it out, I'd be really interested
EDIT 2 : For anyone that's interested, the problem that I misinterpreted can be formulated as an instance of the Hitting Set problem, see http://en.wikipedia.org/wiki/Vertex_cover#Hitting_set_and_set_cover. Basically the left hand side of the bipartite graph would be the arrays and the right hand side would be the numbers, edges would be drawn between arrays that contain each number. Unfortunately, this is NP complete, but the greedy solutions described above are essentially the best approximation.