I have been working on this challenge: Count Triplets, and after a lot of hard work, my algorithm did not work out for every test case.
In the discussion I found a piece of code and tried to figure out what it really does, but I am still not able to understand how this code works.
Solution:
from collections import defaultdict
arr = [1,3,9,9,27,81]
r = 3
v2 = defaultdict(int)
v3 = defaultdict(int)
count = 0
for k in arr:
    count += v3[k]
    v3[k*r] += v2[k]
    v2[k*r] += 1
print(count)
The above code works for every test case perfectly. I have printed the values of k, v2 and v3 to understand it, but I still don't see how the code counts the triplets so smoothly. I could not come up with that solution even in my dreams, and I wonder how people are smart enough to work this out. Nevertheless, I would be glad to get a proper explanation. Thanks.
Output for k,v2,v3
from collections import defaultdict
arr = [1,3,9,9,27,81]
r = 3
v2 = defaultdict(int)
v3 = defaultdict(int)
count = 0
for k in arr:
    count += v3[k]
    v3[k*r] += v2[k]
    v2[k*r] += 1
    print(k, count, v2, v3)
OUTPUT
1 0 defaultdict(<class 'int'>, {1: 0, 3: 1}) defaultdict(<class 'int'>, {1: 0, 3: 0})
3 0 defaultdict(<class 'int'>, {1: 0, 3: 1, 9: 1}) defaultdict(<class 'int'>, {1: 0, 3: 0, 9: 1})
9 1 defaultdict(<class 'int'>, {27: 1, 1: 0, 3: 1, 9: 1}) defaultdict(<class 'int'>, {27: 1, 1: 0, 3: 0, 9: 1})
9 2 defaultdict(<class 'int'>, {27: 2, 1: 0, 3: 1, 9: 1}) defaultdict(<class 'int'>, {27: 2, 1: 0, 3: 0, 9: 1})
27 4 defaultdict(<class 'int'>, {27: 2, 1: 0, 3: 1, 81: 1, 9: 1}) defaultdict(<class 'int'>, {27: 2, 1: 0, 3: 0, 81: 2, 9: 1})
81 6 defaultdict(<class 'int'>, {1: 0, 3: 1, 243: 1, 81: 1, 9: 1, 27: 2}) defaultdict(<class 'int'>, {1: 0, 3: 0, 243: 1, 81: 2, 9: 1, 27: 2})
1. The problem
The function has two parameters, namely:
arr: an array of integers
r: an integer, the common ratio
So, the input can be something like
arr: [1, 2, 2, 4]
r: 2
The goal is to return the count of triplets that form a geometric progression.
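To pin down what counts as a triplet - indices i < j < k with arr[j] == arr[i]*r and arr[k] == arr[j]*r - here is a small brute-force sketch (the name countTripletsBruteForce is mine; it is only a slow O(n^3) reference, not a solution):

from itertools import combinations

def countTripletsBruteForce(arr, r):
    # check every index triple i < j < k explicitly
    return sum(1 for i, j, k in combinations(range(len(arr)), 3)
               if arr[j] == arr[i] * r and arr[k] == arr[j] * r)

print(countTripletsBruteForce([1, 2, 2, 4], 2))          # 2 -> (1, 2, 4) chosen two ways
print(countTripletsBruteForce([1, 3, 9, 9, 27, 81], 3))  # 6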
2. How to solve it
There are various ways to solve it. For instance, from SagunB, based on the comment from RobertsN:
Can be done in O(n) -> single pass through data
No division necessary and single multiplications by R are all that's needed
Using map(C++) or dict(Java, Python) is a must -> can be unordered map (saves O(logN))
Try to think forward when reading a value -> will this value form part of a triplet later?
No need to consider (R == 1) as a corner case
from collections import Counter
# Complete the countTriplets function below.
def countTriplets(arr, r):
    r2 = Counter()
    r3 = Counter()
    count = 0
    for v in arr:
        if v in r3:
            count += r3[v]
        if v in r2:
            r3[v*r] += r2[v]
        r2[v*r] += 1
    return count
Or, like the code you posted:
from collections import defaultdict
# Complete the countTriplets function below.
def countTriplets(arr, r):
    v2 = defaultdict(int)
    v3 = defaultdict(int)
    count = 0
    for k in arr:
        count += v3[k]
        v3[k*r] += v2[k]
        v2[k*r] += 1
    return count
3. End result
Both versions pass all 13 of the current test cases on HackerRank.
4. Explanation of your case
The comments from RobertsN pretty much explain the code above (which is very similar to yours). Still, to get a better feel for how the code works, just print what happens to count, v2 and v3.
Assuming you'll have as input
4 2
1 2 2 4
The expected output is
2
Also, we know that by definition both v2 and v3 will look like
defaultdict(<class 'int'>, {})
which leaves only the for loop to understand. What can cause some confusion there is the += operator, but that was already addressed by me in another answer.
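As a quick reminder of that behaviour (a minimal sketch): with defaultdict(int), both reading a missing key and doing += on it first create the key with value 0, which is why keys with value 0 keep showing up in the printouts below.

from collections import defaultdict

d = defaultdict(int)
d[3] += 1    # missing key 3 is created as 0, then incremented to 1
print(d[7])  # 0 -> even a plain read inserts the missing key with value 0
print(d)     # defaultdict(<class 'int'>, {3: 1, 7: 0})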
So, now to understand the rest we can change the loop to
for k in arr:
    print(f"Looping...")
    print(f"k: {k}")
    print(f"v3_before_count: {v3}")
    count += v3[k]
    print(f"count: {count}")
    print(f"k*r: {k*r}")
    print(f"v3_before: {v3}")
    v3[k*r] += v2[k]
    print(f"v3[k*r]: {v3[k*r]}")
    print(f"v2[k]: {v2[k]}")
    print(f"v3_after: {v3}")
    print(f"v2_before: {v2}")
    v2[k*r] += 1
    print(f"v2_after: {v2}")
    print(f"v2[k*r]: {v2[k*r]}")
This will allow you to see
Looping...
k: 1
v3_before_count: defaultdict(<class 'int'>, {})
count: 0
k*r: 2
v3_before: defaultdict(<class 'int'>, {1: 0})
v2_before_v3: defaultdict(<class 'int'>, {1: 0})
v3[k*r]: 0
v2[k]: 0
v3_after: defaultdict(<class 'int'>, {1: 0, 2: 0})
v2_before: defaultdict(<class 'int'>, {1: 0})
v2_after: defaultdict(<class 'int'>, {1: 0, 2: 1})
v2[k*r]: 1
Looping...
k: 2
v3_before_count: defaultdict(<class 'int'>, {1: 0, 2: 0})
count: 0
k*r: 4
v3_before: defaultdict(<class 'int'>, {1: 0, 2: 0})
v2_before_v3: defaultdict(<class 'int'>, {1: 0, 2: 0})
v3[k*r]: 1
v2[k]: 1
v3_after: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 1})
v2_before: defaultdict(<class 'int'>, {1: 0, 2: 1})
v2_after: defaultdict(<class 'int'>, {1: 0, 2: 1, 4: 1})
v2[k*r]: 1
Looping...
k: 2
v3_before_count: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 1})
count: 0
k*r: 4
v3_before: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 1})
v2_before_v3: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 1})
v3[k*r]: 2
v2[k]: 1
v3_after: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 2})
v2_before: defaultdict(<class 'int'>, {1: 0, 2: 1, 4: 1})
v2_after: defaultdict(<class 'int'>, {1: 0, 2: 1, 4: 2})
v2[k*r]: 2
Looping...
k: 4
v3_before_count: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 2})
count: 2
k*r: 8
v3_before: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 2})
v2_before_v3: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 2})
v3[k*r]: 2
v2[k]: 2
v3_after: defaultdict(<class 'int'>, {1: 0, 2: 0, 4: 2, 8: 2})
v2_before: defaultdict(<class 'int'>, {1: 0, 2: 1, 4: 2})
v2_after: defaultdict(<class 'int'>, {1: 0, 2: 1, 4: 2, 8: 1})
v2[k*r]: 1
and draw your own conclusions. What can we observe from that?
count increases in the last loop from 0 to 2.
k goes through all values of the arr - so it'll be 1, 2, 2 and 4.
in the initial loop, v3_before_count is {} and v3_before is {1:0}
etc.
Most likely this process will lead to more questions, and answering them will leave you closer to understanding it.
So the code is tracking potential pairs and triplets as it walks through the array.
For each value in the array:
# Increment count by the number of triplets that end with k
count += v3[k]
# Increment the number of potential triplets that will end with k*r
v3[k*r] += v2[k]
# Increment the number of potential pairs that end with k*r
v2[k*r] += 1
The number of triplets for any given k is the number of pairs for any given k/r that we've encountered up to this point.
Note that throughout the loop, v3[k] and v2[k] will often be zero, until k hits a k*r value predicted in a previous iteration.
I have been trying to make sense of it, and finally this C# code should be clear to follow:
static long countTriplets(List<long> arr, long r)
{
    // doubles[x]: how many elements seen so far are waiting for x (= element * r) to form a pair
    var doubles = new Dictionary<long, long>();
    // triplets[x]: how many pairs seen so far are waiting for x to complete a triplet
    var triplets = new Dictionary<long, long>();
    long count = 0;
    foreach (var key in arr)
    {
        long keyXr = key * r;
        if (triplets.ContainsKey(key))
            count += triplets[key];
        if (doubles.ContainsKey(key))
        {
            if (triplets.ContainsKey(keyXr))
                triplets[keyXr] += doubles[key];
            else
                triplets.Add(keyXr, doubles[key]);
        }
        if (doubles.ContainsKey(keyXr))
            doubles[keyXr]++;
        else
            doubles.Add(keyXr, 1);
    }
    return count;
}
from collections import defaultdict
arr = [1,3,9,9,27,81]
r = 3
v2 = defaultdict(int)  # on a missing key, returns 0
v3 = defaultdict(int)  # on a missing key, returns 0
count = 0
for k in arr:
    # update the count; counting starts propagating with a delay of 2 elements
    count += v3[k]
    # number of pairs so far that a future k*r would complete into a triplet
    v3[k*r] += v2[k]
    # number of elements so far that a future k*r would extend into a pair
    v2[k*r] += 1
print(count)
It's best to understand it on an example - suppose we have the array 1 1 1 1 1 (so r = 1),
and we are at i = 3, i.e. 1 1 1 >1< 1.
v2 currently holds the count of pairs ending with >1<, built from 1 1 1, 1 1 >1<; in general there are n-1 of them for length(array) = n.
Now at v3 we build the count recursively from v2, as follows: for each pair created and counted in v2 we assign a last component; there are as many new options as there are pairs.
So for i = 3 (a dot marks the skipped element):
1 1 . 1 (v2 = 1) - this pair carries over from the earlier sum
+
. 1 1 1 (v2 = 2) - these are new
1 . 1 1 (v2 = 2)
Hope this helps!
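As a sanity check for the r = 1 case: every choice of three indices out of the five 1s forms a triplet, so the answer should be C(5,3) = 10. A small sketch (the loop above wrapped in a function; math.comb is only used for the cross-check):

from collections import defaultdict
from math import comb

def count_triplets(arr, r):
    v2, v3 = defaultdict(int), defaultdict(int)
    count = 0
    for k in arr:
        count += v3[k]
        v3[k*r] += v2[k]
        v2[k*r] += 1
    return count

print(count_triplets([1, 1, 1, 1, 1], 1))  # 10
print(comb(5, 3))                          # 10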
Potential value of a number X: the number of triplets we would get if a later number uses X as its predecessor to complete a triplet.
Let's take an example: 1 2 4 with r = 2.
S1: with 1: no triplet, because 1/2 = 0.5 and 1/2/2 = 0.25 are not available. Add 1 to the hashmap.
S2: with 2: 1 potential triplet can be formed if the final number (the 4) is reached. Add 2 to the hashmap.
S3: with 4: 1 potential triplet can be formed if the final number (the 8) is reached. Add 4 to the hashmap. At the same time we have 1 complete triplet, because 4/2 and 4/2/2 already exist in the hashmap.
But how do we know there is only 1? Because in order to reach the 4 of a triplet, you must go through the 2, and we only have one 2 before the 4.
So the total is 1. Easy.
What if the input is 1 2 2 2 4?
We have the potentials: 1: 0; 2: 1 (one 1 before each 2); 4: 3 (three 2s before the 4) => 3 triplets.
Let's add a 1 to the input, so we have 1 1 2 2 2 4 with r = 2.
With the 1st 2, we have 2 potential triplets, because there are two 1s before it.
With the 2nd 2, the potential doubles (4 in total).
With the 3rd 2, it triples.
So in total it is 2 (number of 1s) x 3 (number of 2s) = 6 potential triplets.
And when the index reaches the number 4, similarly to step S3 above, the total number of triplets is 6.
Here is a demonstration of the two hashmaps as we iterate through the array:
Input: 1 1 2 2 2 4
Potential: {1:0, 2:6, 4:3}
Count: {1:2, 2:3, 4:1}

Input: 1 1 2 2 2 4 8 16
Potential: {1:0, 2:6, 4:3, 8:1, 16:1}
Count: {1:2, 2:3, 4:1, 8:1, 16:1}
For the second input in the table above, we can say:
with the triplet pattern (1, 2, 4) we have 6 triplets (the potential at number 2)
with the triplet pattern (2, 4, 8) we have 3 triplets (the potential at number 4)
with the triplet pattern (4, 8, 16) we have 1 triplet (the potential at number 8)
So the total is 10 triplets.
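Here is a rough Python transcription of the two hashmaps described above (the names counts and potential are mine, and I use integer checks instead of the JavaScript undefined checks), run on the second input to reproduce the table and the total of 10:

def count_triplets_potential(arr, r):
    counts = {}     # how many times each value has appeared so far
    potential = {}  # potential[x]: pairs (x/r, x) seen so far, i.e. triplets a later x*r would complete
    total = 0
    for x in arr:
        counts.setdefault(x, 0)
        potential.setdefault(x, 0)
        if x % r == 0 and x // r in counts:
            prev = x // r
            total += potential[prev]      # x closes every pair that ends at x/r
            potential[x] += counts[prev]  # x extends every earlier x/r into a new pair
        counts[x] += 1
    return total, potential, counts

total, potential, counts = count_triplets_potential([1, 1, 2, 2, 2, 4, 8, 16], 2)
print(total)      # 10
print(potential)  # {1: 0, 2: 6, 4: 3, 8: 1, 16: 1}
print(counts)     # {1: 2, 2: 3, 4: 1, 8: 1, 16: 1}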
And this is my javascript solution
function countTriplets(arr, r) {
    const numberAtThisPointOf = {};
    const potentialTripletOf = {};
    let total = 0;
    for (let i = 0; i < arr.length; i++) {
        const key = arr[i];
        if (numberAtThisPointOf[key] === undefined) {
            potentialTripletOf[key] = 0;
            numberAtThisPointOf[key] = 0;
        }
        // if key is the final part of a triplet, the two smaller numbers must already exist;
        // the `key % r === 0` and `(key/r) % r === 0` checks avoid fractional keys in JavaScript
        if (key % r === 0 && numberAtThisPointOf[key/r] !== undefined && numberAtThisPointOf[key/r/r] !== undefined && (key/r) % r === 0) {
            total += potentialTripletOf[key/r];
        }
        // update the potential of the current key
        if (numberAtThisPointOf[key/r] !== undefined && key % r === 0) {
            potentialTripletOf[key] += numberAtThisPointOf[key/r];
        }
        numberAtThisPointOf[key]++;
    }
    return total;
}
It can be thought of like this.
A geometric progression has the form: A, AR, ARR, ...
Now consider an element of arr:
if the element == ARR, the third term of a triplet, we have completed a triplet, hence update the count.
if the element == AR, the second term of a triplet, then the next element of the GP will be ARR (the element multiplied by R), so that is recorded in the ARR (r3) dictionary.
if the element == A, the first term, then the next element of the GP will be AR (the element multiplied by R), hence that is recorded in the AR (r2) dictionary.
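A minimal annotated sketch of that view, reusing the r2/r3 names from the Counter answer above (each line of the loop corresponds to one of the three roles):

from collections import Counter

def count_triplets_gp(arr, r):
    r2 = Counter()  # r2[x]: elements seen so far that are waiting for x as their AR (second term)
    r3 = Counter()  # r3[x]: (A, AR) pairs seen so far that are waiting for x as their ARR (third term)
    count = 0
    for k in arr:
        count += r3[k]     # k acts as ARR: it completes every waiting (A, AR) pair
        r3[k*r] += r2[k]   # k acts as AR: every A waiting for k now forms a pair waiting for k*r
        r2[k*r] += 1       # k acts as A: it starts waiting for k*r as its second term
    return count

print(count_triplets_gp([1, 3, 9, 9, 27, 81], 3))  # 6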
Given arrays (say row vectors) A and B, how do I find an array C such that merging B and C will give A?
For example, given
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];
then
C = multiset_diff(A, B) % Should be [4, 6, 4, 3, 1, 5]
(the order of the result does not matter here).
For the same A, if B = [2, 4, 5], then the result should be [6, 4, 3, 3, 1, 5, 5].
(Since there were two 4s in A and one 4 in B, the result C should have 2 - 1 = 1 4 in it. Similarly for the other values.)
PS: Note that setdiff would remove all instances of 2, 3, and 5, whereas here they need to be removed just however many times they appear in B.
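For reference, what is being asked for here is exactly a multiset difference; a quick sketch of the expected behaviour in Python terms (not a MATLAB solution, just to pin the definition down):

from collections import Counter

A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5]
B = [2, 3, 5, 5]

C = list((Counter(A) - Counter(B)).elements())  # each count in B is subtracted from the count in A
print(sorted(C))  # [1, 3, 4, 4, 5, 6] -> same multiset as [4, 6, 4, 3, 1, 5], order not preserved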
Performance: I ran some quick-n-dirty benchmarks locally; here are the results for future reference:
@heigele's nested loop method performs best for small lengths of A (say up to N = 50 or so elements). It does 3x better for small (N=20) As, and 1.5x better for medium-sized (N=50) As, compared to the next best method - which is:
@obchardon's histc-based method. This is the one that performs best once A's size N reaches 100 and above. For example, it does 3x better than the above nested loop method when N = 200.
@matt's for+find method does comparably to the histc method for small N, but quickly degrades in performance for larger N (which makes sense, since the entire C == B(x) comparison is run on every iteration).
(The other methods are either several times slower or invalid at the time of writing.)
Still another approach using the histc function:
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];
uA = unique(A);
hca = histc(A,uA);
hcb = histc(B,uA);
res = repelem(uA,hca-hcb)
We simply calculate the number of repeated elements in each vector according to the unique values of vector A, then we use repelem to create the result.
This solution does not preserve the initial order, but that doesn't seem to be a problem for you.
I use histc for Octave compatibility, but this function is deprecated, so you can also use histcounts.
Here's a vectorized way. Memory-inefficient, mostly for fun:
tA = sum(triu(bsxfun(@eq, A, A.')), 1);
tB = sum(triu(bsxfun(@eq, B, B.')), 1);
result = setdiff([A; tA].', [B; tB].', 'rows', 'stable');
result = result(:,1).';
The idea is to make each entry unique by tagging it with an occurrence number. The vectors become 2-column matrices, setdiff is applied with the 'rows' option, and then the tags are removed from the result.
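The same tagging idea, sketched outside MATLAB just to make the trick explicit (pair every value with its running occurrence number, then take an ordinary set difference on the pairs); the helper name tag_occurrences is mine:

from collections import defaultdict

def tag_occurrences(xs):
    seen = defaultdict(int)
    tagged = []
    for x in xs:
        seen[x] += 1
        tagged.append((x, seen[x]))  # (value, occurrence number) is unique per element
    return tagged

A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5]
B = [2, 3, 5, 5]
C = [v for v, _ in set(tag_occurrences(A)) - set(tag_occurrences(B))]
print(sorted(C))  # [1, 3, 4, 4, 5, 6]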
You can use the second output of ismember to find the indexes where elements of B are in A, and diff to remove duplicates:
This answer assumes that B is already sorted. If that is not the case, B has to be sorted before executing the above solution.
For the first example:
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 3, 5, 5];
%B = sort(B); Sort if B is not sorted.
[~,col] = ismember(B,A);
indx = find(diff(col)==0);
col(indx+1) = col(indx)+1;
A(col) = [];
C = A;
>>C
4 6 4 3 1 5
For the second example:
A = [2, 4, 6, 4, 3, 3, 1, 5, 5, 5];
B = [2, 4, 5, 5];
%B = sort(B); Sort if B is not sorted.
[~,col] = ismember(B,A);
indx = find(diff(col)==0);
col(indx+1) = col(indx)+1;
A(col) = [];
C = A;
>>C
6 4 3 3 1 5
I'm not a fan of loops, but for random perturbations of A this was the best I came up with.
C = A;
for x = 1:numel(B)
    C(find(C == B(x), 1, 'first')) = [];
end
I was curious about the effect of different orderings of A on a solution approach, so I set up a test like this:
Ctruth = [1 3 3 4 5 5 6];
for testNumber = 1:100
    Atest = A(randperm(numel(A)));
    C = myFunction(Atest,B);
    C = sort(C);
    assert(all(C==Ctruth));
end
Strongly inspired by Matt, but on my machine 40% faster:
function A = multiDiff(A,B)
    for j = 1:numel(B)
        for i = 1:numel(A)
            if A(i) == B(j)
                A(i) = [];
                break;
            end
        end
    end
end
I have a 1D array of sorted non-unique numbers. The number of times they repeat is random.
It is associated with an array of weights of the same size. For a given series of identical elements, the associated series of weights may or may not have repeated elements as well, and in the whole array of weights there may or may not be repeated elements. E.g.:
pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])
I need to extract an array of unique elements of pos, but where the unique element is the one with the greatest weight.
The working solution I came up with involves looping:
pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])
# Get the number of occurences of the elements in pos but throw away the unique array, it's not the one I want.
_, ucounts = np.unique(pos, return_counts=True)
# Initialize the output array.
unique_pos_idx = np.zeros([ucounts.size], dtype=np.uint32)
last = 0
for i in range(ucounts.size):
    maxpos = np.argmax( weights[last:last+ucounts[i]] )
    unique_pos_idx[i] = last + maxpos
    last += ucounts[i]
# Result is:
# unique_pos_idx = [1 2 6 7]
but I’m not using much of the Python language or Numpy (apart from the use of numpy arrays) so I wonder if there is a more Pythonesque and/or more efficient solution than even a Cython version of the above?
Thanks
Here's one vectorized approach -
sidx = np.lexsort([weights,pos])
out = sidx[np.r_[np.flatnonzero(pos[1:] != pos[:-1]), -1]]
Possible improvement(s) on performance -
1] A faster way to get the sorted indices sidx with scaling -
sidx = (pos*(weights.max()+1) + weights).argsort()
2] The indexing at the end could be made faster with boolean-indexing, specially when dealing with many such intervals/groupings -
out = sidx[np.concatenate((pos[1:] != pos[:-1], [True]))]
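To see that the original two-liner reproduces the expected indices from the question, here is a quick check on the sample arrays (assuming the same pos/weights as in the question):

import numpy as np

pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10])
weights = np.array([2, 10, 20, 8, 5, 7, 15, 7, 2])

sidx = np.lexsort([weights, pos])  # sort by pos, ties broken by weight (both ascending)
out = sidx[np.r_[np.flatnonzero(pos[1:] != pos[:-1]), -1]]
print(out)  # [1 2 6 7], same as the loopy version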
Runtime test
All approaches :
def org_app(pos, weights):
    _, ucounts = np.unique(pos, return_counts=True)
    unique_pos_idx = np.zeros([ucounts.size], dtype=np.uint32)
    last = 0
    for i in range(ucounts.size):
        maxpos = np.argmax( weights[last:last+ucounts[i]] )
        unique_pos_idx[i] = last + maxpos
        last += ucounts[i]
    return unique_pos_idx

def vec_app(pos, weights):
    sidx = np.lexsort([weights,pos])
    return sidx[np.r_[np.flatnonzero(pos[1:] != pos[:-1]), -1]]

def vec_app_v2(pos, weights):
    sidx = (pos*(weights.max()+1) + weights).argsort()
    return sidx[np.concatenate((pos[1:] != pos[:-1], [True]))]
Timings and verification -
For the setup, let's use the sample and tile it 10000 times with an offset, so that we get 10000 times as many intervals. Also, let's use unique numbers in weights, so that the argmax indices aren't confused by identical numbers:
In [155]: # Setup input
...: pos = np.array([3, 3, 7, 7, 9, 9, 9, 10, 10,])
...: pos = (pos + 10*np.arange(10000)[:,None]).ravel()
...: weights = np.random.choice(10*len(pos), size=len(pos), replace=0)
...:
...: print np.allclose(org_app(pos, weights), vec_app(pos, weights))
...: print np.allclose(org_app(pos, weights), vec_app_v2(pos, weights))
...:
True
True
In [156]: %timeit org_app(pos, weights)
...: %timeit vec_app(pos, weights)
...: %timeit vec_app_v2(pos, weights)
...:
10 loops, best of 3: 56.4 ms per loop
100 loops, best of 3: 14.8 ms per loop
1000 loops, best of 3: 1.77 ms per loop
In [157]: 56.4/1.77 # Speedup with vectorized one over loopy
Out[157]: 31.864406779661017
I would like to know if there is a better way (in case my implementation is correct) to find a sub-sequence of integers in a given array. I have implemented the solution using golang (if this is an impediment for a review, I could use a different language). If I am not mistaken, the below implementation is close to O(b).
package main
import "fmt"
func main() {
    a := []int{1, 2, 3}
    b := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
    r := match(a, b)
    fmt.Println("Match found for case 1: ", r)
    a = []int{1, 2, 3}
    b = []int{4, 5, 6, 7, 8, 9}
    r = match(a, b)
    fmt.Println("Match found for case 2: ", r)
    a = []int{1, 2, 3}
    b = []int{1, 5, 3, 7, 8, 9}
    r = match(a, b)
    fmt.Println("Match found for case 3: ", r)
    a = []int{1, 2, 3}
    b = []int{4, 5, 1, 7, 3, 9}
    r = match(a, b)
    fmt.Println("Match found for case 4: ", r)
    a = []int{1, 2, 3}
    b = []int{4, 5, 6, 1, 2, 3}
    r = match(a, b)
    fmt.Println("Match found for case 5: ", r)
    a = []int{1, 2, 3}
    b = []int{1, 2, 1, 2, 3}
    r = match(a, b)
    fmt.Println("Match found for case 6: ", r)
    a = []int{1, 2, 3, 4, 5}
    b = []int{4, 1, 5, 3, 6, 1, 2, 4, 4, 5, 7, 8, 1, 2, 2, 4, 1, 3, 3, 4}
    r = match(a, b)
    fmt.Println("Match found for case 7: ", r)
    a = []int{1, 2, 1, 2, 1}
    b = []int{1, 1, 2, 2, 1, 2, 1}
    r = match(a, b)
    fmt.Println("Match found for case 8: ", r)
}
func match(a []int, b []int) bool {
    if len(b) < len(a) {
        return false
    }
    lb := len(b) - 1
    la := len(a) - 1
    i := 0
    j := la
    k := 0
    counter := 0
    for {
        if i > lb || j > lb {
            break
        }
        if b[i] != a[k] || b[j] != a[la] {
            i++
            j++
            counter = 0
            continue
        } else {
            i++
            counter++
            if k < la {
                k++
            } else {
                k = 0
            }
        }
        if counter >= la+1 {
            return true
        }
    }
    return counter >= la+1
}
Correctness
As discussed in the comment section, there is a family of string matching algorithms, normally categorized into single-pattern and multiple-pattern matching algorithms. Your case belongs to the single-pattern string matching problem.
To my knowledge, the most well-known algorithm is the KMP algorithm, which uses dynamic programming, and an alternative is the Rabin-Karp algorithm, which uses a rolling-hash technique to speed up the process. Both run in O(max(a,b)).
However, your code does not look much like the usual implementations of these algorithms, at least in my experience, so I question its correctness in the first place. You can try cases like a = {1, 2, 1, 2, 1}, b = {1, 1, 2, 2, 1, 2, 1} to see that it does not give the correct result.
Therefore you can either:
Abandon the current algorithm, learn the standard ones and implement them, or
Outline the logic and sketch a proof of your current algorithm, then compare it with the logic behind those standard algorithms to verify its correctness.
I will leave this part to you.
Complexity
To directly answer your OP:
No, O(max(a,b)) is the best you can achieve for this problem, which is also the complexity of the standard algorithms mentioned above.
My understanding is that this actually makes sense: in the worst case, you HAVE TO read each character of the longer string at least once.
Your current algorithm is also clearly O(b): you loop with i from 0 to the length of b, and no matter which branch you fall into, i increases by 1, giving O(b) in total.
Therefore complexity is not actually the problem; correctness is.
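If you go the standard route, here is a rough sketch of KMP adapted to integer slices (written in Python for brevity; the structure carries over to Go directly, and the function name contains_subarray is mine):

def contains_subarray(a, b):
    # Return True if a appears as a contiguous run inside b; O(len(a) + len(b)).
    if not a:
        return True
    # fail[i]: length of the longest proper prefix of a[:i+1] that is also a suffix of it
    fail = [0] * len(a)
    k = 0
    for i in range(1, len(a)):
        while k > 0 and a[i] != a[k]:
            k = fail[k - 1]
        if a[i] == a[k]:
            k += 1
        fail[i] = k
    # scan b, reusing the table so matched prefixes are never re-checked
    k = 0
    for x in b:
        while k > 0 and x != a[k]:
            k = fail[k - 1]
        if x == a[k]:
            k += 1
        if k == len(a):
            return True
    return False

print(contains_subarray([1, 2, 3], [4, 5, 6, 1, 2, 3]))           # True
print(contains_subarray([1, 2, 1, 2, 1], [1, 1, 2, 2, 1, 2, 1]))  # False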
Since you are only looking for a sequence, I would probably convert everything to the string type and use the standard strings package.
Playground
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.Contains("1, 2, 3, 4, 5, 6, 7, 8, 9", "1, 2, 3"))
fmt.Println(strings.Contains("4, 5, 6, 7, 8, 9", "1, 2, 3"))
fmt.Println(strings.Contains("1, 5, 3, 7, 8, 9", "1, 2, 3"))
fmt.Println(strings.Contains("4, 5, 1, 7, 3, 9", "1, 2, 3"))
fmt.Println(strings.Contains("4, 5, 6, 1, 2, 3", "1, 2, 3"))
fmt.Println(strings.Contains("4, 5, 6, 1, 2, 3, 2", "1, 2, 2, 3"))
fmt.Println(strings.Contains("1, 2, 1, 2, 3", "1, 2, 3"))
}
I have two lists of input (call one a, the other b), and I am trying to see whether a can be matched to b if elements of a are removed. For example, if a = [1, 1, 2, 3, 3, 3, 4] and b = [1, 2, 3], the function would find that it can remove one occurrence of 1, two occurrences of 3 and the only occurrence of 4 so that a == b. I don't know how I would go about getting this to work in Python.
a = [1, 1, 2, 3, 3, 3, 4]
b = [1, 2, 3]
i = 0
ans = False
if len(a) >= len(b):
    for item in a:
        if item == b[i]:
            i += 1
            if i >= len(b):
                ans = True
                break
print ans
Not quite as efficient as sophiadw's answer, but this approach will tell you which items in a aren't needed to match b, and the input lists don't need to be sorted.
from collections import Counter
#returns False if a can't be pared down to match b.
#If it can be pared down, returns a dictionary of which items to remove.
def can_match(a,b):
    c_a = Counter(a)
    c_b = Counter(b)
    if c_b - c_a:
        return False
    return c_a - c_b
a = [1,1,2,3,3,3,4]
b = [1,2,3]
print can_match(a,b)
Result:
Counter({3: 2, 1: 1, 4: 1})
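The key detail is that Counter subtraction keeps only positive counts, so c_b - c_a is empty (hence falsy) exactly when every element of b occurs in a at least as many times. A quick illustration with the same data:

from collections import Counter

c_a = Counter([1, 1, 2, 3, 3, 3, 4])
c_b = Counter([1, 2, 3])

print(c_b - c_a)  # Counter() -> nothing in b is missing from a, so a can be pared down
print(c_a - c_b)  # Counter({3: 2, 1: 1, 4: 1}) -> the surplus items to remove from a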