I found a puzzle online on interviewStreet and tried to solve it as follows:
There is an infinite integer grid on which N people have their houses. They decide to
unite at a common meeting place, which is someone's house. From any given cell, all 8
adjacent cells are reachable in 1 unit of time, e.g. (x,y) can be reached from (x-1,y+1)
in a single unit of time. Find a common meeting place which minimizes the sum of the
travel times of all the persons.
I first thought about writing an O(N²) solution, but the constraints are
1 <= N <= 10^5 and the absolute value of each coordinate in the input can be at most 10^9.
So I changed my first approach: instead of looking at the problem in terms of distances and travel times, I looked at the different houses as different bodies with different weights, and instead of calculating all the distances, I look for the center of gravity of the group of bodies.
Here's the code of my "solve" function; vectorToTreat is a length×2 table storing all the data about the points on the grid, and resul is the number to print to stdout:
long long solve(long long** vectorToTreat, int length){
    long long resul = 0;
    int i;
    long long x = 0;
    long long y = 0;
    int tmpCur = -1;
    long long tmp = -1;

    /* centre of gravity of all the houses */
    for(i = 0; i < length; i++){
        x += vectorToTreat[i][0];
        y += vectorToTreat[i][1];
    }
    x = x / length;
    y = y / length;

    /* house closest (in Chebyshev distance) to the centre of gravity */
    tmp = max(absol(vectorToTreat[0][0]-x), absol(vectorToTreat[0][1]-y));
    tmpCur = 0;
    for(i = 1; i < length; i++){
        if(max(absol(vectorToTreat[i][0]-x), absol(vectorToTreat[i][1]-y)) < tmp){
            tmp = max(absol(vectorToTreat[i][0]-x), absol(vectorToTreat[i][1]-y));
            tmpCur = i;
        }
    }

    /* total travel time if everyone meets at that house */
    for(i = 0; i < length; i++){
        if(i != tmpCur)
            resul += max(absol(vectorToTreat[i][0]-vectorToTreat[tmpCur][0]),
                         absol(vectorToTreat[i][1]-vectorToTreat[tmpCur][1]));
    }
    return resul;
}
The problem is that I pass only 12 of the 13 official test cases, and I don't see what I'm doing wrong. Any ideas?
Thanks in advance.
AE
The key to this problem is the notion of centroid of a set of points. The meeting place is the closest house to the centroid for the set of points representing all the houses. With this approach you can solve the problem in linear time, i.e. O(N). I did it in Python, submitted my solution and passed all tests.
However, it is easy to build a data set for which the centroid approach does not work. Here's an example:
[(0, 0), (0, 1), (0, 2), (0, 3),
(1, 0), (1, 1), (1, 2), (1, 3),
(2, 0), (2, 1), (2, 2), (2, 3),
(3, 0), (3, 1), (3, 2), (3, 3),
(101, 101)]
The best solution is meeting at the house at (2, 2) and the cost is 121 (you can find this with exhaustive search - O(N^2)). However, the centroid approach gives a different result:
centroid is (7, 7)
closest house to centroid is (3, 3)
cost of meeting at (3, 3) is 132
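For reference, a tiny brute-force check of this counterexample could look like the following (my own sketch, not part of the original answer; it simply evaluates every house as the candidate meeting place):

houses = [(x, y) for x in range(4) for y in range(4)] + [(101, 101)]

def total_cost(meet, houses):
    # Chebyshev travel time summed over all houses
    return sum(max(abs(meet[0] - hx), abs(meet[1] - hy)) for hx, hy in houses)

best = min(houses, key=lambda h: total_cost(h, houses))
print(best, total_cost(best, houses))   # prints (2, 2) 121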
Test cases on the web site are obviously shaped in such a way that the centroid solution is OK, or perhaps they just wanted to figure out whether you know about the notion of centroid.
I didn't read your code, but consider the following example:
2 guys live at (0, 0)
1 guy lives at (2, 0)
4 guys live at (3, 0)
The center of gravity is at (2, 0), with minimum total travel time of 8, but the optimum solution is at (3, 0) with minimum total travel time of 7.
Hello, and thank you for your answers and comments; they were very helpful.
I finally gave up on my algorithm using the center of gravity: when I ran some samples through it, I noticed that when the houses are gathered in different villages with different distances between them, the algorithm does not work.
If we consider the example that #Rostor stated above:
(0,0), (1,0), (2000,0), (3000,0), (3001,0), (3002, 0), (3003, 0)
The algorithm using the center of gravity answers that the 3rd house is the solution, but the right answer is the 4th house.
The right notion to use in this kind of problem is the median, adapted to the number of dimensions involved.
Here is a great article about the geometric median; I hope it helps.
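To make the 1-D intuition concrete, here is a small sketch of my own (not from the original post) comparing the mean and the median on the village example above; the median house gives the smaller total distance:

xs = [0, 1, 2000, 3000, 3001, 3002, 3003]

def total_distance(meet, xs):
    return sum(abs(meet - x) for x in xs)

mean_house = min(xs, key=lambda x: abs(x - sum(xs) / len(xs)))   # house nearest the mean
median_house = sorted(xs)[len(xs) // 2]                          # the median house

print(mean_house, total_distance(mean_house, xs))       # 2000 -> 8005
print(median_house, total_distance(median_house, xs))   # 3000 -> 7005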
SOLUTION:
If all the points are on a line and people can move in only 2 directions (left and right):
sort the points and compute two arrays, one with the cost if everyone moves only left and the other if everyone moves only right.
Add both arrays and find the minimum to get the solution.
If people can move in only 4 directions (left, down, up, right), you can apply the same rule per axis; all you need to support is that when you sort along one axis you must be able to undo the sort, so when sorting you must also save the sorting permutation.
If people can move in 8 directions (as in the question), you can use the same algorithm as for 4 directions (the 2nd algorithm), since if you observe the movements carefully you can see that the same number of moves is possible if everybody moves only diagonally; there is no need to move left, right, up or down, only left-up, up-right, left-down and down-right, as long as (x+y) % 2 == 0 holds for each point (x,y). Imagine that the grid is a chessboard and the houses stand on black squares only.
Before applying the 2nd algorithm you have to transform the points so that
(x,y) becomes (x+y, x-y); this is a rotation of the points by 45 degrees. Then you apply the 2nd algorithm and divide the result by 2.
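To make the above concrete, here is a rough O(N log N) sketch in Python of that idea (my own illustration, not the answerer's code; the function and variable names are mine). It rotates the coordinates, computes the summed absolute differences per axis with sorting and prefix sums, and halves the combined result to get the total Chebyshev travel time for every candidate house:

def min_total_travel_time(houses):
    # houses: list of (x, y) integer tuples; the meeting place must be one of them
    n = len(houses)
    us = [x + y for x, y in houses]   # 45-degree rotation:
    vs = [x - y for x, y in houses]   # Chebyshev in (x,y) == Manhattan in (u,v) / 2

    def abs_diff_sums(vals):
        # res[i] = sum_j |vals[i] - vals[j]|, computed via sorting + prefix sums
        order = sorted(range(n), key=lambda i: vals[i])
        prefix = [0]
        for i in order:
            prefix.append(prefix[-1] + vals[i])
        total, res = prefix[-1], [0] * n
        for rank, i in enumerate(order):
            smaller = prefix[rank]              # sum of the `rank` smaller values
            larger = total - prefix[rank + 1]   # sum of the values above vals[i]
            res[i] = (vals[i] * rank - smaller) + (larger - vals[i] * (n - 1 - rank))
        return res

    du, dv = abs_diff_sums(us), abs_diff_sums(vs)
    return min((du[i] + dv[i]) // 2 for i in range(n))

print(min_total_travel_time([(x, y) for x in range(4) for y in range(4)] + [(101, 101)]))  # 121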
"...which is someone's house" means you pick an occupied house, not an arbitrary location.
Edit: oops, max(abs(a-A), abs(b-B)) replaces (abs(a-A) + abs(b-B)). See L_p spaces for more details on what happens when p -> infinity.
The distance from (a,b) to (A,B) is max(abs(a-A),abs(b-B)). A brute force way is to compute the total travel time to meet at each occupied house, keeping track of the best meeting place so far.
This may take a while. Sorting by distance from the center of mass may let you prioritize the search order. I see you are already using the right center-of-mass calculation for this metric: a simple average of the first coordinates and a simple average of the second coordinates.
If you think a bit about it, the distance function you get as the travel time between (x1,y1) and (x2,y2) is:
def dist(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    dx = abs(x2 - x1)
    dy = abs(y2 - y1)
    return max(dx, dy)
You can see this if you make a sketch on paper with a grid.
So you only have to iterate over each house, sum up the travel times to all the others and take the house with the minimum sum.
The full solution is:

houses = [(7, 4), (1, 1), (3, 2), (-3, 2), (2, 7), (8, 3), (10, 9)]

def dist(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    dx = abs(x1 - x2)
    dy = abs(y1 - y2)
    return max(dx, dy)

def summed_time_to(p0, houses):
    return sum(dist(p0, p1) for p1 in houses)

distances = [(summed_time_to(p, houses), i) for i, p in enumerate(houses)]
distances.sort()
min_dist = distances[0][0]

print("best houses are:")
for d, i in distances:
    if d == min_dist:
        print(i, "at", houses[i])
I wrote a quick-and-dirty grid-distance tester in Scala, which compares the average with the minimum of an exhaustive search:
class Coord (val x: Int, val y: Int) {
  def delta (other: Coord) = {
    val dx = math.abs (x - other.x)
    val dy = math.abs (y - other.y)
    List (dx, dy).max
  }
  override def toString = " (" + x + ":" + y + ") "
}

def run (M: Int) {
  val r = util.Random
  // reproducible set:
  // r.setSeed (17)
  val ucells = (1 to 2 * M).map (dummy => new Coord (r.nextInt (M), r.nextInt (M))).toSet take (M) toSeq
  val cells = ucells.sortWith ((a, b) => (a.x < b.x || a.x == b.x && a.y <= b.y))

  def distanceSum (lc: Seq[Coord], cell: Coord) = lc.map (c => cell.delta (c)).sum

  val exhaustiveSearch = for (x <- 0 to M - 1;
                              y <- 0 to M - 1)
                         yield (distanceSum (cells, new Coord (x, y)))

  def sum (lc: Seq[Coord]) = ((0, 0) /: lc) ((a, b) => (a._1 + b.x, a._2 + b.y))

  def avg (lc: Seq[Coord]) = {
    val s = sum (lc)
    val l = lc.size
    new Coord ((s._1 + l / 2) / l, (s._2 + l / 2) / l)
  }

  val av = avg (ucells)
  val avgMethod = distanceSum (cells, av)

  def show (cells: Seq[Coord]) {
    val sc = cells.sortWith ((a, b) => (a.x < b.x || a.x == b.x && a.y <= b.y))
    var idx = 0
    print ("\t")
    (0 to M).foreach (i => print (" " + (i % 10)))
    println ()
    for (x <- 0 to M - 1) {
      print (x + "\t")
      for (y <- 0 to M - 1) {
        if (idx < M && sc (idx).x == x && sc (idx).y == y) {
          print (" x")
          idx += 1
        }
        else if (x == av.x && y == av.y) print (" A")
        else print (" -")
      }
      println ()
    }
  }

  show (cells)
  println ("exhaustive Search: " + exhaustiveSearch.min)
  println ("avgMethod: " + avgMethod)
  exhaustiveSearch.sliding (M, M).toList.map (println)
}
Here is some sample output:
run (10)
0 1 2 3 4 5 6 7 8 9 0
0 - x - - - - - - - -
1 - - - - - - - - - -
2 - - - - - - - - - -
3 x - - - - - - - - -
4 - x - - - - - - - -
5 - - - - - - x - - -
6 - - - - A - - x - -
7 - x x - - - - - - -
8 - - - - - - - - - x
9 x - - - - - - - - x
exhaustive Search: 36
avgMethod: 37
Vector(62, 58, 59, 60, 62, 64, 67, 70, 73, 77)
Vector(57, 53, 50, 52, 54, 57, 60, 63, 67, 73)
Vector(53, 49, 46, 44, 47, 50, 53, 57, 63, 69)
Vector(49, 46, 43, 41, 40, 43, 47, 53, 59, 66)
Vector(48, 43, 41, 39, 37, 37, 43, 49, 56, 63)
Vector(47, 43, 39, 37, 36, 37, 39, 46, 53, 61)
Vector(48, 43, 39, 36, 37, 38, 40, 43, 51, 59)
Vector(50, 44, 40, 39, 38, 40, 42, 45, 49, 57)
Vector(52, 47, 44, 42, 42, 42, 45, 48, 51, 55)
Vector(55, 52, 49, 47, 46, 47, 48, 51, 54, 58)
The average isn't always the perfect position (as shown in this example), but you can follow the neighbours with an equal or better value to find the best position. It is a good starting point, and I never found a sample with a local optimum where you get stuck. This could be essential for huge datasets.
But I don't have a proof of whether this is always the case, nor of how to find the perfect position directly.
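As a small language-agnostic illustration of that neighbour-following idea, here is a sketch in Python (mine, not part of the Scala answer above, and with the same caveat that nothing here proves the absence of bad local optima). On the small weighted example from an earlier answer it walks from the centre of gravity at (2, 0) to the optimum at (3, 0):

def total_cost(p, houses):
    return sum(max(abs(p[0] - x), abs(p[1] - y)) for x, y in houses)

def hill_climb(houses):
    # start at the rounded average and keep moving to a strictly better neighbouring cell
    n = len(houses)
    cur = (round(sum(x for x, _ in houses) / n), round(sum(y for _, y in houses) / n))
    best = total_cost(cur, houses)
    improved = True
    while improved:
        improved = False
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cand = (cur[0] + dx, cur[1] + dy)
                cost = total_cost(cand, houses)
                if cost < best:
                    cur, best, improved = cand, cost, True
    return cur, best

print(hill_climb([(0, 0), (0, 0), (2, 0), (3, 0), (3, 0), (3, 0), (3, 0)]))   # ((3, 0), 7)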
I tried to solve this using the geometric median approach, but only 11 of the 13 test cases passed. This was my strategy:
1. find the centroid of the set of points.
2. then find the point closest to that centroid.
I tried as well but only passed 4 of the 13 test cases; the judge reports a segmentation fault, even though I have only made two arrays of 100001 elements each plus a few variables.
My algorithm:
find the centroid of the given points.
find the point closest to the centroid.
get the sum of all the distances using maximum(abs(a-A), abs(b-B)).
Related
So, I have this array:
numbers = [5, 9, 3, 19, 70, 8, 100, 2, 35, 27]
What I want to do is to create another array from this one, but each value of this new array must be equal to the corresponding value in the numbers array multiplied by the value that follows it.
For example: the first value of the new array should be 45, as it is the multiplication
of 5 (first value) and 9 (next value). The second value of the new array should be 27, as it is the multiplication of 9 (second
value) and 3 (next value), and so on. If there is no next value, the multiplication must be done by 2.
So, this array numbers should result in this other array: [45, 27, 57, 1330, 560, 800, 200, 70, 945, 54]
I only managed to get to this code, but I'm having problems with the index:
numbers = [5,9,3,19,70,8,100,2,35,27]
new_array = []
x = 0
while x <= 8:  # Only got it to work until 8 and not the entire index of the array
    new_array.append(numbers[x] * numbers[x + 1])
    x += 1
print(new_array)
How can I make it work no matter what the length of the array is, and, if there's no next number, multiply by 2? I've tried everything, but this was the closest I could get.
Try:
numbers = [5, 9, 3, 19, 70, 8, 100, 2, 35, 27]
out = [a * b for a, b in zip(numbers, numbers[1:] + [2])]
print(out)
Prints:
[45, 27, 57, 1330, 560, 800, 200, 70, 945, 54]
Andrej Kesely's approach is totally fine and would be the way to go for an experienced Python developer.
But I assume you are kind of new to Python, so here is a simpler approach, if you are a bit familiar with other programming languages:
# function called multiply, taking an int[], returns int[]
def multiply(values):
    newData = []
    valuesLength = len(values) - 1
    for i in range(valuesLength):
        newData.append(values[i] * values[i+1])
    newData.append(values[valuesLength] * 2)
    return newData

# init int[], calling multiply-function and printing the data
numbers = [5,9,3,19,70,8,100,2,35,27]
newData = multiply(numbers)
print(newData)
The multiply function basically initializes an empty list, then loops over the passed values, multiplying each with the following value; it leaves the loop one value early and finally appends the last value multiplied by 2.
With the same approach as yours, but making use of len(numbers):
numbers = [5,9,3,19,70,8,100,2,35,27]
new_array = []
x = 0
while x < len(numbers):
    nxt = 2 if x + 1 >= len(numbers) else numbers[x + 1]
    new_array.append(numbers[x] * nxt)
    x += 1
print(new_array)
NOTE: The shorthand for nxt = 2.... is explained in the first comment of this answer: https://stackoverflow.com/a/14461963/724039
I would like to increase the minimum 'distance' between values in an array. For example, if I have the array
44,45,47,51,65,66
I would like the minimum 'distance' to be 2. So, the desired output would be
44,46,48,51,65,67
I've tried doing something like
prevValue = array[0]
array.pop(0)
for a in array:
    if(prevValue + 1 >= a):
        a += 1
This wasn't the entire code, as I had created temp arrays so as not to mess up the original one. But this logic does not work.
Has anybody done anything similar? I was looking at np.arange(), but that wasn't the desired use case.
Thank you!
The line a += 1 only modifies your local variable named a; it would not modify any elements in a list. Here is one way to do what you want:
list1 = [44, 45, 47, 51, 65, 66]
list2 = []
prev = None
for a in list1:
    if prev and prev + 1 >= a:
        a = prev + 2
    list2.append(a)
    prev = a
print(list2)
They are actually called "lists", not "arrays", in Python. Using the correct search terms might help you find answers better on your own.
Try (assuming lst is sorted):
lst = [44, 45, 47, 51, 65, 66]
distance = 2
for i in range(1, len(lst)):
    diff = lst[i] - lst[i - 1]
    if diff < distance:
        lst[i] += distance - diff
print(lst)
Prints:
[44, 46, 48, 51, 65, 67]
For distance = 5:
[44, 49, 54, 59, 65, 70]
We have an array of N positive elements. We can perform M operations on this array. In each operation we have to select a subarray (contiguous) of length W and increase each of its elements by 1. Each element of the array can be increased at most K times.
We have to perform these operations such that the minimum element in the array is maximized.
1 <= N, W <= 10^5
1 <= M, K <= 10^5
Time limit: 1 sec
I can think of an O(n^2) solution, but it is exceeding the time limit. Can somebody provide an O(n log n) or better solution for this?
P.S. This is an interview question.
It was asked in a Google interview and I solved it using a sliding window, a heap and range-increment logic. I will solve the problem in 3 parts:
1. Find the minimum of every subarray of size W. This can be done in O(n) with a sliding window backed by a monotonic deque. The minimum of every window is inserted into a min-heap as a triple: [array_value, left_index, right_index].
2. Now make an auxiliary array of size N initialised to 0. Perform a pop operation on the heap M times, and in each pop operation perform 3 tasks:
value, left_index, right_index = heap.pop() # theoretical function to pop minimum
increment the value by 1,
increment the auxiliary array by 1 at left_index and decrement it by 1 at
right_index + 1 (the range-increment trick, illustrated right after these steps),
insert this triple back into the heap (with the incremented value and the same indexes).
3. After performing the M operations, traverse the given array together with the auxiliary array, adding the cumulative sum up to index i to the element at index i.
Return the minimum of the array.
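As a tiny standalone illustration of that range-increment (difference array) trick from step 2, with hypothetical names of my own:

def apply_range_increments(n, ranges):
    # diff[left] += 1 and diff[right + 1] -= 1 mark a +1 over arr[left..right];
    # a prefix sum over diff then yields how much each index was incremented.
    diff = [0] * (n + 1)
    for left, right in ranges:
        diff[left] += 1
        diff[right + 1] -= 1
    out, running = [], 0
    for i in range(n):
        running += diff[i]
        out.append(running)
    return out

print(apply_range_increments(6, [(0, 2), (1, 4)]))   # [1, 2, 2, 1, 1, 0]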
Time Complexity
O(N) <- for minimum element in every window + building heap.
O(M*logN) <- Extracting and inserting into heap.
O(N) <- For traversing to add cumulative sum.
So, overall it is O(N + M*logN + N), which is O(N + M*logN).
Space Complexity
O(N) <- Extra array + heap.
A few things above can easily be optimised; for example, when inserting values into the heap only left_index needs to be stored, since right_index = left_index + k - 1.
My Code
from heapq import heappop, heappush
from collections import deque

def find_maximised_minimum(arr, n, m, k):
    """
    arr -> Array, n -> Size of array
    m -> increment operations that can be performed
    k -> window size
    """
    heap = []
    q = deque()

    # sliding window + heap building
    for i in range(k):
        while q and arr[q[-1]] > arr[i]:
            q.pop()
        q.append(i)
    for i in range(k, n):
        heappush(heap, [arr[q[0]], i - k, i - 1])
        while q and q[0] <= i - k:
            q.popleft()
        while q and arr[q[-1]] > arr[i]:
            q.pop()
        q.append(i)
    heappush(heap, [arr[q[0]], n - k, n - 1])

    # auxiliary array
    temp = [0 for i in range(n)]

    # performing M increment operations
    while m:
        top = heappop(heap)
        temp[top[1]] += 1
        try:
            temp[top[2] + 1] -= 1
        except IndexError:
            # when the index is last, so just ignore
            pass
        top[0] += 1
        heappush(heap, top)
        m -= 1

    # finding cumulative sum
    sumi = 0
    for i in range(n):
        sumi += temp[i]
        arr[i] += sumi

    print(min(arr))

if __name__ == '__main__':
    # find([1, 2, 3, 4, 5, 6], 6, 5, 2)
    # find([73, 77, 60, 100, 94, 24, 31], 7, 9, 1)
    # find([24, 41, 100, 70, 97, 89, 38, 68, 41, 93], 10, 6, 5)
    # find([88, 36, 72, 72, 37, 76, 83, 18, 76, 54], 10, 4, 3)
    find_maximised_minimum([98, 97, 23, 13, 27, 100, 75, 42], 8, 5, 1)
What if we kept a copy of the array sorted ascending, pointing each element to its original index? Think about the order of priority when incrementing the elements. Also, does the final order of operations matter?
Once the lowest element reaches the next lowest element, what must then be incremented? And if we apply k operations to any one element, does it matter in which of the length-w windows those increments were applied?
I have a table of values stored into a list of lists like:
A = [ [a[1],b[1],c[1]],
[a[2],b[2],c[2]],
...
[a[m],b[m],c[m]]]
with
a[i] < b[i]
b[i] < a[i+1]
0 < c[i] < 1
and a numpy array such as:
X = [x[1], x[2], ..., x[n]]
I need to create an array
Y = [y[1], y[2], ..., y[n]]
where each value of Y will correspond to
for i in [1, 2, ..., n]:
    for k in [1, 2, ..., m]:
        if a[k] < x[i] < b[k]:
            y[i] = c[k]
        else:
            y[i] = 1
Please note that X and Y have the same length, but A is totally different. Y can take any value in the third column of A (c[k] for k= 1,2,... m), as long as a[k] < x[i] < b[k] is met (for k= 1,2,... m and for i= 1,2,... n).
In the actual cases I am working on, n = 6789 and m = 6172.
I could do the verification using nested "for" loops, but it is really slow. What is the fastest way to accomplish this? What if X and Y were 2D numpy arrays?
SAMPLE DATA:
a = [10, 20, 30, 40, 50, 60, 70, 80, 90]
b = [11, 21, 31, 41, 51, 61, 71, 81, 91]
c = [ 0.917, 0.572, 0.993 , 0.131, 0.44, 0.252 , 0.005, 0.375, 0.341]
A = [[d, e, f] for d, e, f in zip(a, b, c)]
X = [1, 4, 10.2, 20.5, 25, 32, 41.3, 50.5, 73]
EXPECTED RESULTS:
Y = [1, 1, 0.917, 0.572, 1, 1, 1, 0.44, 1]
Approach #1: Using brute-force comparison with broadcasting -
import numpy as np
# Convert to numpy arrays
A_arr = np.array(A)
X_arr = np.array(X)
# Mask that represents "if a[k] < x[i] < b[k]:" for all i,k
mask = (A_arr[:,None,0]<X_arr) & (X_arr<A_arr[:,None,1])
# Get row/col indices where the mask has 1s, i.e. the conditionals were satisfied
R, C = np.where(mask)

# Setup output numpy array and set values in it from the third column of A,
# using the interval (row) index for the value and the X (column) index for the position
Y = np.ones(X_arr.size)
Y[C] = A_arr[R, 2]
Approach #2: Based on binning with np.searchsorted -
import numpy as np
# Convert A to 2D numpy array
A_arr = np.asarray(A)
# Setup intervals for binning later on
intv = A_arr[:,:2].ravel()
# Perform binning & get interval & grouped indices for each X
intv_idx = np.searchsorted(intv, X, side='right')
grp_intv_idx = np.floor(intv_idx/2).astype(int)
# Get mask of valid indices, i.e. X elements are within grouped intervals
mask = np.fmod(intv_idx,2)==1
# Setup output array
Y = np.ones(len(X))
# Extract col-3 elements with grouped indices and valid ones from mask
Y[mask] = A_arr[:,2][grp_intv_idx[mask]]
# Remove (set to 1's) elements that fall exactly on bin boundaries
Y[np.in1d(X,intv)] = 1
Please note that if you need the output as a list, you can convert the numpy array to a list with a call like this - Y.tolist().
Sample run -
In [480]: A
Out[480]:
[[139.0, 355.0, 0.5047342078960846],
[419.0, 476.0, 0.3593886192040009],
[580.0, 733.0, 0.3137694021600973]]
In [481]: X
Out[481]: [555, 689, 387, 617, 151, 149, 452]
In [482]: Y
Out[482]:
array([ 1. , 0.3137694 , 1. , 0.3137694 , 0.50473421,
0.50473421, 0.35938862])
With 1-d arrays, it's not too bad:
a,b,c = np.array(A).T
mask = (a<x) & (x<b)
y = np.ones_like(x)
y[mask] = c[mask]
If x and y are higher-dimensional, then your A matrix will also need to be bigger. The basic concept works the same, though.
I have an array of already ordered values (e.g. vec = [20, 54, 87, 233]). The array contains ~300 elements. I have a value which I need to search for in this array. A successful search is not only the exact value but also anything within +/- 5 of it; for example, in this case values like 17 or 55 should also be considered as found. What is the most efficient way to do this? I used the loop below, but I guess it does not take into account that my array is already ordered. In addition, in the case of a non-empty result I have to check manually how distant the value was, because find does not return that. This is not a big problem, since my "finds" are only 15%.
bRes = find(vec >= Value-5 & vec <= Value+5);
if ~isempty(bRes)
    distGap = GetGapDetails(Value, vec);
    return;
end
Thanks!
Vadim
The best way to search for a value in a list that is already sorted is a binary search, which takes only O(log(n)) time. This is better than comparing the value with every item in the list, which costs O(n). As far as I know, Matlab does not have a function to do exactly this. As already mentioned by Natan, you can (a)buse the built-in function histc for this, which is written in C and presumably does a binary search.
function good = is_within_range(value, vector, threshold)
% check that vector is sorted, comment this out for speed
assert(all(diff(vector) > 0))
assert(threshold > 0)
% pad vector with +- inf for histc
vector = [-inf, vector, inf];
% find index of value in vector, so that vector(ind) <= value < vector(ind+1)
% abuse histc, ignore bincounts
[~, ind] = histc(value, vector);
% check if we are within +- threshold from a value in vector,
% either below or above
good = (value <= vector(ind) + threshold) | value >= (vector(ind+1) - threshold);
Some quick tests:
>> is_within_range(0, [10, 30, 80], 5)
ans = 0
>> is_within_range(4, [10, 30, 80], 5)
ans = 0
>> is_within_range(5, [10, 30, 80], 5)
ans = 1
>> is_within_range(10, [10, 30, 80], 5)
ans = 1
>> is_within_range(15, [10, 30, 80], 5)
ans = 1
>> is_within_range(16, [10, 30, 80], 5)
ans = 0
>> is_within_range(31, [10, 30, 80], 5)
ans = 1
>> is_within_range(36, [10, 30, 80], 5)
ans = 0
And as a bonus, this function is vectorized, so you can test more than one value at the same time:
>> is_within_range([0, 4, 5, 10, 15, 16, 31, 36], [10, 30, 80], 5)
ans =
0 0 1 1 1 0 1 0
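For readers coming from other languages, roughly the same binary-search-plus-threshold check could be sketched in Python with the standard bisect module (this is my own illustration, not part of the MATLAB answer above; the function name simply mirrors is_within_range):

from bisect import bisect_left

def is_within_range(value, sorted_vec, threshold):
    i = bisect_left(sorted_vec, value)   # first index with sorted_vec[i] >= value
    near_right = i < len(sorted_vec) and sorted_vec[i] - value <= threshold
    near_left = i > 0 and value - sorted_vec[i - 1] <= threshold
    return near_left or near_right

print(is_within_range(15, [10, 30, 80], 5))   # True
print(is_within_range(16, [10, 30, 80], 5))   # False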
This will be somewhat more efficient:
bRes = vec >= Value-5 & vec <= Value+5;
if any(bRes) ...
You are right that MATLAB will likely not take advantage of the fact that 'vec' is already sorted. You could write a binary search to zero in on the range of interest (that is, work in O(log(N)) time rather than O(N) time), but with only 300 elements in the array, I suspect your current implementation will hold up well.
Let's say your array is stored in the variable 'A' and your value is 'v':
A(A>v+5 | A<v-5) = [];