Generate a list of unique permutations

Say I have a list of 3 symbols :
l:`s1`s2`s3
What is the q-way to generate the following list of n*(n+1)/2 permutations?
(`s1;`s1),(`s1;`s2),(`s1;`s3),(`s2;`s2),(`s2;`s3),(`s3;`s3)
This can be seen in the context of a correlation matrix, where I want the upper triangular part of the matrix, including the diagonal.
Of course the size of my initial list will exceed 3, so I would like a generic function to perform this operation.
I know how to generate the diagonal elements:
q) {(x,y)}'[l;l]
(`s1`s1;`s2`s2;`s3`s3)
But I don't know how to generate the non-diagonal elements.

Another solution you might find useful:
q)l
`s1`s2`s3
q){raze x,/:'-1_{1_x}\[x]}l
s1 s1
s1 s2
s1 s3
s2 s2
s2 s3
s3 s3
This uses the scan accumulator to create a list of lists of symbols, with each dropping the first element:
q)-1_{1_x}\[l]
`s1`s2`s3
`s2`s3
,`s3
The extra -1_ is needed since the scan will also return an empty list at the end. Then join each element of the list onto this result using an each-right and an each:
{x,/:'-1_{1_x}\[x]}l
(`s1`s1;`s1`s2;`s1`s3)
(`s2`s2;`s2`s3)
,`s3`s3
Finally use a raze to get the distinct permutations.
EDIT: could also use
q){raze x,/:'til[count x]_\:x}l
s1 s1
s1 s2
s1 s3
s2 s2
s2 s3
s3 s3
which doesn't need the scan at all and is very similar to the scan solution performance-wise!

I would try below code
{distinct asc each x cross x}`s1`s2`s3
Here,
cross generates all (s_i, s_j) pairs,
asc each sorts every pair by index, so `s3`s1 becomes `s1`s3,
distinct removes duplicates.
Not the most efficient way, but a very short one.
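Outside q, these n*(n+1)/2 pairs are exactly what Python's itertools calls combinations with replacement, which makes a handy cross-check (illustration only, not part of the q answers):

```python
from itertools import combinations_with_replacement

l = ['s1', 's2', 's3']
# Upper triangle of the correlation matrix, diagonal included:
# n*(n+1)/2 = 6 pairs for n = 3, in the same order as the q output.
pairs = list(combinations_with_replacement(l, 2))
```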

If I am understanding the question correctly (apologies if I have missed something), the below should give you what you are looking for:
q)test:`s1`s2`s3`s4`s5
q)(til cnt) _' raze (-1+cnt:count test)cut test,'/:test
(`s1`s1;`s2`s1;`s3`s1;`s4`s1;`s5`s1)
(`s2`s2;`s3`s2;`s4`s2;`s5`s2)
(`s3`s3;`s4`s3;`s5`s3)
(`s4`s4;`s5`s4)
,`s5`s5


Algorithm for searching an array for 5 elements which sum to a value

[I recently asked a similar question, Search unsorted array for 3 elements which sum to a value,
and got wonderful answers, thank you all! :)]
I need your help for solving the following problem:
I am looking for an algorithm, the time-complexity must be ϴ( n³ ).
The algorithm searches an unsorted array (of n integers) for 5 different integers
which sum to a given z.
E.g.: for the input: ({2,5,7,6,3,4,9,8,21,10} , 22)
the output should be true for we can sum up 2+7+6+3+4=22
(the sorting doesn't really matter. The array can be sorted first without affecting the complexity.
So you can look at the problem as if the array is already sorted.)
-No memory constraints-
-We only know that the array elements are n integers.-
Any help would be appreciated.
Algorithm:
1) Generate an array consisting of pairs of your initial integers and sort it. That step will take O(n^2 * log (n^2)) time.
2) Choose a value from your initial array. O(n) ways.
3) Now you have a very similar problem to the linked one. You have to choose two pairs such that their sum will be equal to z - chosen value. Thankfully, you have an array of all pairs, already sorted, of length O(n^2). Finding such pairs should be straightforward -- same thing you did in a 3 integer sum problem. You make two pointers and move both of them O(n^2) times in total.
O(n^3) total complexity.
You may get into some problems with finding pairs that consist of your chosen value. Skip every pair that consists of your chosen value (just move the pointer further when you reach such a pair like it never existed).
Let's say that you have two pairs, p1 and p2, such that sum(p1) + sum(p2) + chosen value = z. If all of the integers in p1 and p2 are different, you have the solution. If not, that's where it gets a little bit messy.
Let's fix p1 and check the next value after p2. It may have the same sum as p2 since two different pairs can have same sum. If it does, definitely you will not have the same collision with p1 as you had with p2, but you may get a collision with the other integer of p1. If so, check the second value after p2, if it also has the same sum -- it definitely won't have any collision with p1.
So assuming that there are at least 3 pairs with same sum as p1 or p2, you will always find a solution checking 3 values for fixed p1 or checking 3 values for fixed p2.
The only possibility left is that there are less than 3 pairs with same sum as p1 and there are less than 3 pairs with same sum as p2. You can choose them in up to 4 ways -- just check each possibility.
It is a bit unpleasant, but in constant amount of operations you are able to handle such problems. That means the total complexity is O(n^3).
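A runnable sketch of the above in Python (five_sum is my name for it; collisions are handled naively by scanning the runs of equal-sum pairs, which keeps the logic short while staying within the O(n^3) bound for non-degenerate inputs):

```python
def five_sum(arr, z):
    # Step 1: all index pairs (i < j), sorted by their sum.
    n = len(arr)
    pairs = sorted((arr[i] + arr[j], i, j)
                   for i in range(n) for j in range(i + 1, n))
    m = len(pairs)
    for k in range(n):                    # Step 2: choose one value.
        target = z - arr[k]
        lo, hi = 0, m - 1                 # Step 3: two pointers over the pairs.
        while lo < hi:
            s = pairs[lo][0] + pairs[hi][0]
            if s < target:
                lo += 1
            elif s > target:
                hi -= 1
            else:
                # Collision handling: scan the runs of equal sums at both
                # pointers for a combination using 5 distinct indices.
                lo2 = lo
                while lo2 < hi and pairs[lo2][0] == pairs[lo][0]:
                    hi2 = hi
                    while hi2 > lo2 and pairs[hi2][0] == pairs[hi][0]:
                        used = {k, pairs[lo2][1], pairs[lo2][2],
                                pairs[hi2][1], pairs[hi2][2]}
                        if len(used) == 5:
                            return True
                        hi2 -= 1
                    lo2 += 1
                lo += 1
    return False
```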

Regex with no 2 consecutive a's and b's

I have been trying out some regular expressions lately. Now, I have 3 symbols a, b and c.
I first looked at a case where I don't want 2 consecutive a's. The regex would be something like:
((b|c) + a(b|c))*(a + epsilon)
Now I'm wondering if there's a way to generalize this problem to say something like:
A regular expression with no two consecutive a's and no two consecutive b's. I tried stuff like:
(a(b|c) + b(a|c) + c)* (a + b + epsilon)
But this accepts inputs such as "abba" or "baab", which have 2 consecutive a's (or b's), which is not what I want. Can anyone suggest a way out?
If you can't do a negative match then perhaps you can use negative lookahead to exclude strings matching aa and bb? Something like the following (see Regex 101 for more information):
(?!.*(aa|bb).*)^.*$
I (think I) solved this by hand-drawing a finite state machine, then, generating a regex using FSM2Regex. The state machine is written below (with the syntax from the site):
#states
s0
s1
s2
s3
#initial
s0
#accepting
s1
s2
s3
#alphabet
a
b
c
#transitions
s0:a>s1
s0:b>s2
s0:c>s3
s1:b>s2
s1:c>s3
s2:a>s1
s2:c>s3
s3:c>s3
s3:a>s1
s3:b>s2
If you look at the transitions, you'll notice it's fairly straightforward: I have states that correspond to a "sink" for each letter of the alphabet, and I only allow transitions out of that state for the other letters (not the "sink" letter). For example, s1 is the "sink" for a. From all other states, you can get to s1 with an a. Once you're in s1, though, you can only get out of it with a b or a c, which have their own "sinks" s2 and s3 respectively. Because we can repeat c, s3 has a transition to itself on the character c. Paste the block of text into the site, and it'll draw all this out for you and generate the regex.
The regex it generated for me is:
c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+(a+cc*a+(b+cc*b)(cc*b)*(a+cc*a))(cc*a+(b+cc*b)(cc*b)*(a+cc*a))*(c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+b+$)+b+a
Which, I'm pretty sure, is not optimal :)
EDIT: The generated regex uses + as the choice operator (usually known to us coders as |), which means it's probably not suitable to pasting into code. However, I'm too scared to change it and risk ruining my regex :)
You can use back references to match the previous char:
string input = "acbbaacbba";
string pattern = @"([ab])\1";
var matchList = Regex.Matches(input, pattern);
This pattern will match bb, aa and bb in the input above. If there is no match in your input, it does not contain a repeated a or b.
Explanation:
([ab]): define a group, you can extend your symbols here
\1: back referencing the group, so for example, when 'a' is matched, \1 would be 'a'
check this page: http://www.regular-expressions.info/backref.html
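As a quick sanity check of the lookahead idea in Python's re module (illustration only; the snippet above is C#):

```python
import re

# Reject any string over {a, b, c} that contains "aa" or "bb".
valid = re.compile(r'^(?!.*(?:aa|bb))[abc]*$')

def no_double_a_or_b(s):
    return valid.match(s) is not None
```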

How to locate in a huge list of numbers, two numbers where xi=xj?

I have the following question, and it screams at me for a solution with hashing:
Problem :
Given a huge list of numbers, x1........xn, where xi <= T, we'd like to know
whether or not there exist two indices i, j where x_i == x_j.
Find an algorithm for the problem with O(n) running time (in expectation).
My solution at the moment : We use hashing, where we'll have a mapping function h(x) using chaining.
First - we build a new array, let's call it A, where each cell is a linked list - this would be the destination array.
Now - we run on all the n numbers and map each element in x1........xn, to its rightful place, using the hash function. This would take O(n) run time.
After that we'll run over A and look for collisions. If we find a cell where length(A[k]) > 1,
then we return the xi and xj that were mapped to the value stored in A[k]. The total run time here is O(n) in the worst case, e.g. when the two matching numbers (if they indeed exist) were mapped to the last cell of A.
The same approach can be made roughly twice as fast on average - still O(n) expected, but with better constants.
No need to map all the elements into the hash and then go over it - a faster solution could be:
for each element e:
    if e is in the table:
        return e
    else:
        insert e into the table
Also note that if T < n, there must be a duplicate within the first T+1 elements, by the pigeonhole principle.
Also, for small T, you can use a simple array of size T; no hash is needed (hash(x) = x). Initializing the array to zeros takes O(T), which is O(n) in this case.
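The early-exit loop above, as runnable Python (first_duplicate is my naming; a set plays the role of the hash table):

```python
def first_duplicate(xs):
    seen = set()
    for e in xs:
        if e in seen:      # second occurrence: x_i == x_j found
            return e
        seen.add(e)        # first occurrence: remember it
    return None
```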

Given an unordered list of integers, return a value not present in the list

I have an algorithm problem that I came across at work but have not been able to come up with a satisfactory solution for. I browsed this forum some and the closest I have come to the same problem is How to find a duplicate element in an array of shuffled consecutive integers?.
I have a list of N elements of integers which can contain the elements 1-M (M>N); further, the list is unsorted. I want a function that will take this list as input and return a value in the range 1-M not present in the list. The list contains no duplicates. I was hoping for an O(N) solution, without using additional space.
UPDATE: function cannot change the original list L
for instance N = 5 M = 10
List (L): 1, 2, 4, 8, 3
then f(L) = 5
To be honest I don't care if it returns an element other than 5, just so long as it meets the constraints above.
The only solution I have come up with so far uses an additional array of M elements. Walk through the input list and set the corresponding array elements to 1 if present in the list. Then iterate over this array again and return the index of the first element with value 0. As you can see this uses additional O(M) space and takes two passes.
Any help would be appreciated.
Thanks for the help everyone. The stack overflow community is definitely super helpful.
To give everyone a little more context of the problem I am trying to solve.
I have a set of M tokens that I give out to some clients (one per client). When a client is done with a token it gets returned to my pile. As you can see, the original order in which I give out tokens is sorted.
so M = 3 Tokens
client1: 1 <2,3>
client2: 2 <3>
client1 return: 1 <1,3>
client 3: 3 <1>
Now the question is giving client4 token 1. I could at this stage give client 4 token 2 and sort the list. Not sure if that would help. In any case if I come up with a nice clean solution I will be sure to post it
Just realised I might have confused everyone. I do not have the list of free tokens with me when I am called. I could statically maintain such a list but I would rather not.
You can do divide and conquer. Basically given the range 1..m, do a quicksort style swapping with m/2 as the pivot. If there are less than m/2 elements in the first half, then there is a missing number and iteratively find it. Otherwise, there is a missing number in the second half. Complexity: n+n/2+n/4... = O(n)
def findmissing(x, startIndex, endIndex, minVal, maxVal):
    pivot = (minVal+maxVal)/2
    i = startIndex
    j = endIndex
    while(True):
        while( (x[i] <= pivot) and (i<j) ):
            i+=1
        if i>=j:
            break
        while( (x[j] > pivot) and (i<j) ):
            j-=1          # j moves down from endIndex, not up
        if i>=j:
            break
        swap(x,i,j)
    k = findlocation(x,pivot)
    if (k-startIndex) < (pivot-minVal):
        findmissing(x,startIndex, k, minVal, pivot)
    else:
        findmissing(x, k+1, endIndex, pivot+1, maxVal)
I have not implemented the end condition which I will leave it to you.
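Here is a variant of the same divide-and-conquer as runnable Python, with an end condition filled in. It partitions into copies rather than swapping in place (so the original list is untouched, at the cost of O(N) scratch space); find_missing and its signature are my own:

```python
def find_missing(xs, lo, hi):
    # xs: distinct integers drawn from [lo, hi], with at least one value absent.
    while True:
        if not xs:
            return lo                     # the whole remaining range is missing
        mid = (lo + hi) // 2
        left = [v for v in xs if v <= mid]
        if len(left) < mid - lo + 1:      # fewer elements than slots: gap below
            xs, hi = left, mid
        else:                             # lower half is full: gap is above
            xs, lo = [v for v in xs if v > mid], mid + 1
```

The work halves each round (n + n/2 + n/4 + ... = O(n) element inspections).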
You can have O(N) time and space. You can be sure there is an absent element within 1..N+1, so make an array of N+1 elements, and ignore numbers larger than N+1.
If M is large compared to N, say M>2N, generate a random number in 1..M and check if it is not on the list in O(N) time, O(1) space. The probability you will find a solution in a single pass is at least 1/2, and therefore (geometric distribution) the expected number of passes is constant, average complexity O(N).
If M is N+1 or N+2, use the approach described here.
Can you do something like a counting sort? Create an array of size M, then go through the list once (N) and set the array element indexed by each value minus one to 1. After looping once through N, search 0->(M-1) until you find the first entry with a zero value.
Should be O(N+M).
Array L of size M: [0=0, 1=0, 2=0, 3=0, 4=0, 5=0, 6=0, 7=0, 8=0, 9=0]
After looping through N elements: [0=1, 1=1, 2=1, 3=1, 4=0, 5=0, 6=0, 7=1, 8=0, 9=0]
Search array 0->(M-1) finds index 4 is zero, therefore 5 (4+1) is the first integer not in L.
After reading your update I guess you are making it overly complex. First of all, let me list down what I get from your question:
You need to give a token to the client regardless of its order; quoting from your original post:
for instance N = 5 M = 10 List (L): 1, 2, 4, 8, 3 then f(L) = 5 To be
honest i dont care if it returns an element other than 5, just so long
as it meets the contraints above
Secondly, you are already maintaining a list of "M" tokens.
The client fetches a token and, after using it, returns it back to you.
Given these 2 points, why don't you implement a TokenPool?
Implement your M list based on a Queue
Whenever a client asks for a token, fetch one from the queue, i.e. remove it from the queue. This way your queue will always contain exactly those tokens which haven't been given away. You are doing it in O(1).
Whenever a client is done with a token, he returns it back to you. Add it back to the queue. Again O(1).
In the whole implementation, you wouldn't have to loop through any list. All you have to do is generate the tokens once and insert them into the queue.
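A minimal sketch of that TokenPool, assuming Python's collections.deque as the queue (the class and method names are mine):

```python
from collections import deque

class TokenPool:
    def __init__(self, m):
        self.free = deque(range(1, m + 1))  # all M tokens start out free

    def acquire(self):
        return self.free.popleft()          # hand out a token, O(1)

    def release(self, token):
        self.free.append(token)             # take a token back, O(1)
```

The queue holds exactly the free tokens at all times, so no list is ever scanned.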

One-way flight trip problem

You are going on a one-way indirect flight trip that includes a very large (billions, say) number of transfers.
You are not stopping twice in the same airport.
You have 1 ticket for each part of your trip.
Each ticket contains src and dst airport.
All the tickets you have are randomly sorted.
You forgot the original departure airport (very first src) and your destination (last dst).
Design an algorithm to reconstruct your trip with minimum big-O complexity.
Attempting to solve this problem I have started to use a symmetric difference of two sets, Srcs and Dsts:
1)Sort all src keys in array Srcs
2)Sort all dst keys in array Dsts
3) Take the symmetric difference of both arrays to find the non-duplicates - they are your first src and last dst
4)Now, having the starting point, traverse both arrays using the binary search.
But I suppose there must be another more effective method.
Construct a hashtable and add each airport into the hash table.
<key,value> = <airport, count>
Count for the airport increases if the airport is either the source or the destination. So for every airport the count will be 2 ( 1 for src and 1 for dst) except for the source and the destination of your trip which will have the count as 1.
You need to look at each ticket at least once. So complexity is O(n).
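In Python this counting answer might look like the following sketch (endpoints is my name; the start is told apart from the finish by whether the count-1 airport ever appears as a src):

```python
from collections import Counter

def endpoints(tickets):
    counts = Counter()
    for src, dst in tickets:
        counts[src] += 1          # each airport counted once per ticket side
        counts[dst] += 1
    ends = [a for a, c in counts.items() if c == 1]
    srcs = {src for src, _ in tickets}
    start = next(a for a in ends if a in srcs)       # departure: appears as a src
    finish = next(a for a in ends if a not in srcs)  # arrival: only ever a dst
    return start, finish
```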
Summary: below a single-pass algorithm is given. (I.e., not just linear; it looks at each ticket exactly once, which is of course the optimal number of visits per ticket.) I put the summary here because there are many seemingly equivalent solutions and it would be hard to spot why I added another one. :)
I was actually asked this question in an interview. The concept is extremely simple: each ticket is a singleton list, with conceptually two elements, src and dst.
We index each such list in a hashtable using its first and last elements as keys, so we can find in O(1) whether a list starts or ends at a particular element (airport). For each ticket, when we see that it starts where another list ends, we just link the lists (O(1)). Similarly, if it ends where another list starts, we do another list join. Of course, when we link two lists, we basically destroy the two and obtain one. (The chain of N tickets will be constructed after N-1 such links.)
Care is needed to maintain the invariant that the hashtable keys are exactly the first and last elements of the remaining lists.
All in all, O(N).
And yes, I answered that on the spot :)
Edit Forgot to add an important point. Everyone mentions two hashtables, but one does the trick as well, because the algorithm's invariant includes that at most one ticket list starts or ends in any single city (if there are two, we immediately join the lists at that city, and remove that city from the hashtable). Asymptotically there is no difference; it's just simpler this way.
Edit 2 Also of interest is that, compared to solutions using 2 hashtables with N entries each, this solution uses one hashtable with at most N/2 entries (which happens if we see the tickets in an order of, say, 1st, 3rd, 5th, and so on). So this uses about half memory as well, apart from being faster.
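A compact Python sketch of this single-hashtable linking (naming mine): chains are plain lists, each indexed under its first and its last airport, and every ticket is examined exactly once.

```python
def reconstruct(tickets):
    ends = {}   # airport -> the chain currently starting or ending there
    for src, dst in tickets:
        chain = [src, dst]
        prev = ends.get(src)
        if prev is not None and prev[-1] == src:   # a chain arrives at src
            del ends[prev[0]], ends[src]
            chain = prev + [dst]
        nxt = ends.get(dst)
        if nxt is not None and nxt[0] == dst:      # a chain departs from dst
            del ends[dst], ends[nxt[-1]]
            chain = chain + nxt[1:]
        ends[chain[0]] = chain                     # reindex the merged chain
        ends[chain[-1]] = chain
    return next(iter(ends.values()))               # the single remaining chain
```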
Construct two hash tables (or tries), one keyed on src and the other on dst. Choose one ticket at random and look up its dst in the src-hash table. Repeat that process for the result until you hit the end (the final destination). Now look up its src in the dst-keyed hash table. Repeat the process for the result until you hit the beginning.
Constructing the hash tables takes O(n) and constructing the list takes O(n), so the whole algorithm is O(n).
EDIT: You only need to construct one hash table, actually. Let's say you construct the src-keyed hash table. Choose one ticket at random and like before, construct the list that leads to the final destination. Then choose another random ticket from the tickets that have not yet been added to the list. Follow its destination until you hit the ticket you initially started with. Repeat this process until you have constructed the entire list. It's still O(n) since worst case you choose the tickets in reverse order.
Edit: got the table names swapped in my algorithm.
It's basically a dependency graph where every ticket represents a node and the src and dst airport represents directed links, so use a topological sort to determine the flight order.
EDIT: Although since this is an airline ticket and you know you actually made an itinerary you could physically perform, sort by departure date and time in UTC.
EDIT2: Assuming each airport you have a ticket to uses a three character code, you can use the algorithm described here (Find three numbers appeared only once) to determine the two unique airports by xoring all the airports together.
EDIT3: Here's some C++ to actually solve this problem using the xor method. The overall algorithm is as follows, assuming a unique encoding from airport to an integer (either assuming a three letter airport code or encoding the airport location in an integer using latitude and longitude):
First, XOR all the airport codes together. This should be equal to the initial source airport XOR the final destination airport. Since we know that the initial airport and the final airport are unique, this value should not be zero. Since it's not zero, there will be at least one bit set in that value. That bit corresponds to a bit that is set in one of the airports and not set in the other; call it the designator bit.
Next, set up two buckets, each with the XORed value from the first step. Now, for every ticket, bucket each airport according to whether it has the designator bit set or not, and xor the airport code with the value in the bucket. Also keep track for each bucket how many source airports and destination airports went to that bucket.
After you process all the tickets, pick one of the buckets. The number of source airports sent to that bucket should be one greater or less than the number of destination airports sent to that bucket. If the number of source airports is less than the number of destination airports, that means the initial source airport (the only unique source airport) was sent to the other bucket. That means the value in the current bucket is the identifier for the initial source airport! Conversely, if the number of destination airports is less than the number of source airports, the final destination airport was sent to the other bucket, so the current bucket is the identifier for the final destination airport!
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct ticket
{
    int src;
    int dst;
};

int get_airport_bucket_index(
    int airport_code,
    int discriminating_bit)
{
    return (airport_code & discriminating_bit)==discriminating_bit ? 1 : 0;
}

void find_trip_endpoints(const ticket *tickets, size_t ticket_count, int *out_src, int *out_dst)
{
    int xor_residual= 0;
    for (const ticket *current_ticket= tickets, *end_ticket= tickets + ticket_count; current_ticket!=end_ticket; ++current_ticket)
    {
        xor_residual^= current_ticket->src;
        xor_residual^= current_ticket->dst;
    }
    // now xor_residual will be equal to the starting airport xor ending airport
    // since starting airport!=ending airport, they have at least one bit that is not in common
    //
    int discriminating_bit= xor_residual & (-xor_residual);
    assert(discriminating_bit!=0);
    int airport_codes[2]= { xor_residual, xor_residual };
    int src_count[2]= { 0, 0 };
    int dst_count[2]= { 0, 0 };
    for (const ticket *current_ticket= tickets, *end_ticket= tickets + ticket_count; current_ticket!=end_ticket; ++current_ticket)
    {
        int src_index= get_airport_bucket_index(current_ticket->src, discriminating_bit);
        airport_codes[src_index]^= current_ticket->src;
        src_count[src_index]+= 1;
        int dst_index= get_airport_bucket_index(current_ticket->dst, discriminating_bit);
        airport_codes[dst_index]^= current_ticket->dst;
        dst_count[dst_index]+= 1;
    }
    assert((airport_codes[0]^airport_codes[1])==xor_residual);
    assert(abs(src_count[0]-dst_count[0])==1); // all airports with the bit set/unset will be accounted for as well as either the source or destination
    assert(abs(src_count[1]-dst_count[1])==1);
    assert((src_count[0]-dst_count[0])==-(src_count[1]-dst_count[1]));
    int src_index= src_count[0]-dst_count[0]<0 ? 0 : 1;
    // if src_count < dst_count for this bucket, the unpaired final destination landed here,
    // so everything else cancels and the remaining XOR residual is the initial source airport
    assert(get_airport_bucket_index(airport_codes[src_index], discriminating_bit)!=src_index);
    *out_src= airport_codes[src_index];
    *out_dst= airport_codes[!src_index];
    return;
}
int main()
{
    ticket test0[]= { { 1, 2 } };
    ticket test1[]= { { 1, 2 }, { 2, 3 } };
    ticket test2[]= { { 1, 2 }, { 2, 3 }, { 3, 4 } };
    ticket test3[]= { { 2, 3 }, { 3, 4 }, { 1, 2 } };
    ticket test4[]= { { 2, 1 }, { 3, 2 }, { 4, 3 } };
    ticket test5[]= { { 1, 3 }, { 3, 5 }, { 5, 2 } };
    int initial_src, final_dst;
    find_trip_endpoints(test0, sizeof(test0)/sizeof(*test0), &initial_src, &final_dst);
    assert(initial_src==1);
    assert(final_dst==2);
    find_trip_endpoints(test1, sizeof(test1)/sizeof(*test1), &initial_src, &final_dst);
    assert(initial_src==1);
    assert(final_dst==3);
    find_trip_endpoints(test2, sizeof(test2)/sizeof(*test2), &initial_src, &final_dst);
    assert(initial_src==1);
    assert(final_dst==4);
    find_trip_endpoints(test3, sizeof(test3)/sizeof(*test3), &initial_src, &final_dst);
    assert(initial_src==1);
    assert(final_dst==4);
    find_trip_endpoints(test4, sizeof(test4)/sizeof(*test4), &initial_src, &final_dst);
    assert(initial_src==4);
    assert(final_dst==1);
    find_trip_endpoints(test5, sizeof(test5)/sizeof(*test5), &initial_src, &final_dst);
    assert(initial_src==1);
    assert(final_dst==2);
    return 0;
}
Create two data structures:
Route
{
    start
    end
    list of flights where flight[n].dest = flight[n+1].src
}
List of Routes
And then:
foreach (flight in random set)
{
    added_to_route = false;
    foreach (route in list of routes)
    {
        if (flight.src == route.end)
        {
            if (!added_to_route)
            {
                add flight to end of route
                added_to_route = true
            }
            else
            {
                merge routes
                next flight
            }
        }
        if (flight.dest == route.start)
        {
            if (!added_to_route)
            {
                add flight to start of route
                added_to_route = true
            }
            else
            {
                merge routes
                next flight
            }
        }
    }
    if (!added_to_route)
    {
        create route
    }
}
Put in two Hashes:
to_end = src -> des;
to_beg = des -> src
Pick any airport as a starting point S.
while (to_end[S] != null)
    S = to_end[S];
S is now your final destination. Repeat with the other map to find your starting point.
Without properly checking, this feels O(N), provided you have a decent Hash table implementation.
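That walk fits in a few lines of Python (order_trip is my name; it returns the reconstructed origin and final destination):

```python
def order_trip(tickets):
    to_end = {src: dst for src, dst in tickets}
    to_beg = {dst: src for src, dst in tickets}
    s = tickets[0][0]        # pick any airport as starting point S
    while s in to_end:       # follow src -> dst to the final destination
        s = to_end[s]
    dest = s
    while s in to_beg:       # follow dst -> src back to the origin
        s = to_beg[s]
    return s, dest
```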
A hash table won't work for large sizes (such as the billions in the original question); anyone who has worked with them knows that they're only good for small sets. You could instead use a binary search tree, which would give you complexity O(n log n).
The simplest way is with two passes: The first adds them all to the tree, indexed by src. The second walks the tree and collects the nodes into an array.
Can we do better? We can, if we really want to: we can do it in one pass. Represent each ticket as a node in a linked list. Initially, each node has null values for the next pointer. For each ticket, enter both its src and dest in the index. If there's a collision, that means we already have the adjacent ticket; connect the nodes and delete the match from the index. When you're done, you'll have made only one pass, and have an empty index and a linked list of all the tickets in order.
This method is significantly faster: it's only one pass, not two; and the store is significantly smaller (worst case: n/2 ; best case: 1; typical case: sqrt(n)), enough so that you might be able to actually use a hash instead of a binary search tree.
Each airport is a node. Each ticket is an edge. Make an adjacency matrix to represent the graph. This can be done as a bit field to compress the edges. Your starting point will be the node that has no path into it (its column will be empty). Once you know this you just follow the paths that exist.
Alternately you could build a structure indexable by airport. For each ticket you look up its src and dst. If either is not found then you need to add new airports to your list. When each is found you set the departure airport's exit pointer to point to the destination, and the destination's arrival pointer to point to the departure airport. When you are out of tickets you must traverse the entire list to determine which airport does not have a path in.
Another way would be to have a variable-length list of mini-trips that you connect together as you encounter each ticket. Each time you add a ticket you see if the ends of any existing mini-trip match either the src or dest of your ticket. If not, then your current ticket becomes its own mini-trip and is added to the list. If so, then the new ticket is tacked on to the end(s) of the existing trip(s) that it matches, possibly splicing two existing mini-trips together, in which case it would shorten the list of mini-trips by one.
This is the simple case of a single path state machine matrix.
Sorry for the pseudo-code being in C# style, but it was easier to express the idea with objects.
First, construct a turnpike matrix.
Read my description of what a turnpike matrix is (don't bother with the FSM answer, just the explanation of a turnpike matrix) at What are some strategies for testing large state machines?.
However, the restrictions you describe make the case a simple single path state machine. It is the simplest state machine possible with complete coverage.
For a simple case of 5 airports,
vert nodes=src/entry points,
horiz nodes=dst/exit points.
A1 A2 A3 A4 A5
A1 x
A2 x
A3 x
A4 x
A5 x
Notice that for each row, as well as for each column, there should be no more than one transition.
To get the path of the machine, you would sort the matrix into
A1 A2 A3 A4 A5
A2 x
A1 x
A3 x
A4 x
A5 x
Or sort into a diagonal square matrix - an eigen vector of ordered pairs.
A1 A2 A3 A4 A5
A2 x
A5 x
A1 x
A3 x
A4 x
where the ordered pairs are the list of tickets:
a2:a1, a5:a2, a1:a3, a3:a4, a4:a5.
or in more formal notation,
<a2,a1>, <a5,a2>, <a1,a3>, <a3,a4>, <a4,a5>.
Hmmm .. ordered pairs huh? Smelling a hint of recursion in Lisp?
<a2,<a1,<a3,<a4,a5>>>>
There are two modes of the machine:
trip planning - you don't know how many airports there are, and you need a generic trip plan for an unspecified number of airports
trip reconstruction - you have all the turnpike tickets of a past trip, but they are all one big stack in your glove compartment/luggage bag.
I am presuming your question is about trip reconstruction. So, you pick one ticket after another randomly from that pile of tickets.
We presume the ticket pile is of indefinite size.
tak mnx cda
bom 0
daj 0
phi 0
Where 0 value denotes unordered tickets. Let us define unordered ticket as a ticket where its dst is not matched with the src of another ticket.
The following next ticket finds that mnx(dst) = kul(src) match.
tak mnx cda kul
bom 0
daj 1
phi 0
mnx 0
At any moment you pick the next ticket, there is a possibility that it connects two sequential airports. If that happens, you create a cluster node out of those two nodes:
<bom,tak>, <daj,<mnx,kul>>
and the matrix is reduced,
tak cda kul
bom 0
daj L1
phi 0
where
L1 = <daj,<mnx,kul>>
which is a sublist of the main list.
Keep on picking the next random tickets.
tak cda kul svn xml phi
bom 0
daj L1
phi 0
olm 0
jdk 0
klm 0
Match either existent.dst to new.src
or existent.src to new.dst:
tak cda kul svn xml
bom 0
daj L1
olm 0
jdk 0
klm L2
<bom,tak>, <daj,<mnx,kul>>, <<klm,phi>, cda>
The above topological exercise is for visual comprehension only. The following is the algorithmic solution.
The concept is to cluster ordered pairs into sublists to reduce the burden on the hash structures we will use to house the tickets. Gradually, there will be more and more pseudo-tickets (formed from merged matched tickets), each containing a growing sublist of ordered destinations. Finally, there will remain one single pseudo-ticket containing the complete itinerary vector in its sublist.
As you see, perhaps, this is best done with Lisp.
However, as an exercise of linked lists and maps ...
Create the following structures:
class Ticket : MapEntry<src, Vector<dst> > {
    src, dst
    Vector<dst> dstVec; // sublist of mergers
    // constructor
    Ticket(src, dst){
        this.src = src;
        this.dst = dst;
        this.dstVec.append(dst);
    }
}
class TicketHash<x> {
    x -> TicketMapEntry;
    void add(Ticket t){
        super.put(t.x, t);
    }
}
So that effectively,
TicketHash<src> {
    src -> TicketMapEntry;
    void add(Ticket t){
        super.put(t.src, t);
    }
}

TicketHash<dst> {
    dst -> TicketMapEntry;
    void add(Ticket t){
        super.put(t.dst, t);
    }
}
TicketHash<dst> mapbyDst = hash of map entries(dst->Ticket), key=dst
TicketHash<src> mapbySrc = hash of map entries(src->Ticket), key=src
When a ticket is randomly picked from the pile,
void pickTicket(Ticket t){
    // does t.dst exist in mapbyDst?
    // i.e. attempt to match src of next ticket to dst of an existent ticket.
    Ticket zt = dstExists(t);
    // check if the merged ticket also matches the other end.
    if (zt != null)
        t = zt;
    // attempt to match dst of next ticket to src of an existent ticket.
    if (srcExists(t) != null) return;
    // otherwise if unmatched either way, add the new ticket
    else {
        // Add t.dst to list of existing dst
        mapbyDst.add(t);
        mapbySrc.add(t);
    }
}
Check for existent dst:
Ticket dstExists(Ticket t){
    // find existing ticket whose dst matches t.src
    Ticket zt = mapbyDst.getEntry(t.src);
    if (zt == null) return null; // no match
    // an ordered pair is matched...
    // Merge new ticket into existent ticket;
    // retain existent ticket and discard new ticket.
    Ticket xt = mapbySrc.getEntry(t.src);
    // append sublist of new ticket to sublist of existent ticket
    xt.srcVec.join(t.srcVec); // join the two linked lists.
    // remove the matched dst ticket from mapbyDst
    mapbyDst.remove(zt);
    // replace it with the merged ticket from mapbySrc
    mapbyDst.add(zt);
    return zt;
}
Check for an existent src:
Ticket srcExists(Ticket t){
    // find an existing run whose src matches t.dst
    Ticket zt = mapbySrc.getEntry(t.dst);
    if (zt == null) return null; // no match
    // an ordered pair is matched:
    // prepend the new ticket's sublist to the existent ticket's sublist
    t.dstVec.join(zt.dstVec);
    zt.dstVec = t.dstVec;
    // the run's start has moved, so re-key the merged ticket in mapbySrc
    mapbySrc.remove(zt);
    zt.src = t.src;
    mapbySrc.add(zt);
    return zt;
}
I have a feeling the above has quite a few typos, but the concept should be right. If anyone spots one, please help correct it.
The easiest way is with hash tables, but that doesn't have the best worst-case complexity (O(n^2)).
Instead:
Create a bunch of nodes containing (src, dst) O(n)
Add the nodes to a list and sort by src O(n log n)
For each (destination) node, search the list for the corresponding (source) node O(n log n)
Find the start node (for instance, using a topological sort, or marking nodes in step 3) O(n)
Overall: O(n log n)
(For both algorithms, we assume the length of the strings is negligible ie. comparison is O(1))
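The steps above might be sketched in Python like this (my sketch, not from the answer itself; it uses `bisect` binary searches instead of a hash table, matching the sort-based approach):

```python
from bisect import bisect_left

def reconstruct(tickets):
    # steps 1-2: make (src, dst) nodes and sort them by src -- O(n log n)
    nodes = sorted(tickets)
    srcs = [s for s, _ in nodes]
    # step 4 variant: the start is the only src that never appears as a dst
    # (checked by binary search over the sorted dsts to stay hash-free)
    dsts = sorted(d for _, d in nodes)
    def is_dst(x):
        i = bisect_left(dsts, x)
        return i < len(dsts) and dsts[i] == x
    start = next(s for s in srcs if not is_dst(s))
    # step 3: follow the chain, binary-searching each dst among the srcs
    route = [start]
    for _ in nodes:
        i = bisect_left(srcs, route[-1])
        route.append(nodes[i][1])
    return route

print(reconstruct([('c', 'd'), ('a', 'b'), ('d', 'e'), ('b', 'c')]))
# ['a', 'b', 'c', 'd', 'e']
```

Every step is a sort or a batch of binary searches, so the whole thing stays within the claimed O(n log n).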
No need for hashes or anything like that.
The real input size here is not necessarily the number of tickets (say n), but the total 'size' (say N) of the tickets, i.e. the total number of characters needed to encode them.
If we have an alphabet of k characters (here k is roughly 42) we can use bucket-sort techniques to sort an array of n strings with total size N, encoded over an alphabet of k characters, in O(n + N + k) time. The following works if n <= N (trivial) and k <= N (well, N is in the billions, isn't it).
In the order the tickets are given, extract all airport codes from the tickets and store them in a struct that has the code as a string and the ticket index as a number.
Bucketsort that array of structs according to their code
Run through that sorted array and assign an ordinal number (starting from 0) to each newly encountered airport code. For all elements with the same code (they are consecutive), go to the ticket (we have stored its number with the code) and change the code (pick the right one, src or dst) of the ticket to the ordinal number.
During this run through the array we may identify original source src0.
Now all tickets have src and dst rewritten as ordinal numbers, and the tickets may be interpreted as one list starting in src0.
Do a list ranking (= topological sort, keeping track of the distance from src0) on the tickets.
If you assume a joinable list structure that can store everything (probably on disk):
Create 2 empty hash tables S and D
grab the first element
look up its src in D
If found, remove the associated node from D and link it to the current node
If not found, insert the node into S keyed on src
repeat from 3 the other way src<->des, S<->D
repeat from 2 with the next node.
O(n) time. As for space, the birthday paradox (or something much like it) will keep your data set a lot smaller than the full set. In the bad-luck case where it still gets too large (worst case is O(n)), you can evict random runs from the hash table and insert them at the end of the processing queue. Your speed could go to pot, but as long as you stay far below the threshold for expecting collisions (~O(sqrt(n))) you should expect to see your dataset (the tables and input queue combined) regularly shrink.
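Under the no-repeated-airports assumption, the S/D scheme above might look like this in Python (my sketch; plain lists stand in for the joinable list structure, and the names are mine):

```python
def link_runs(tickets):
    S, D = {}, {}                     # runs keyed by their first / last city
    for src, dst in tickets:          # steps 2/7: grab the next node
        run = [src, dst]
        left = D.pop(src, None)       # step 3: a run ending at our src?
        if left is not None:          # step 4: link it in front of ours
            del S[left[0]]
            run = left + run[1:]
        right = S.pop(run[-1], None)  # step 6: a run starting at our dst?
        if right is not None:         # link it after ours
            del D[right[-1]]
            run = run + right[1:]
        S[run[0]] = run               # step 5: (re)insert keyed on src and dst
        D[run[-1]] = run
    (route,) = S.values()             # one run remains: the full itinerary
    return route

print(link_runs([('c', 'd'), ('a', 'b'), ('d', 'e'), ('b', 'c')]))
# ['a', 'b', 'c', 'd', 'e']
```

Each ticket does a constant number of dict operations, giving the O(n) expected time claimed above (list concatenation here would be O(1) with an actual joinable list structure).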
It seems to me like a graph-based approach is called for here.
Each airport is a node, each ticket is an edge. Let's make every edge undirected for now.
In the first stage you are building the graph: for each ticket, you lookup the source and destination and build an edge between them.
Now that the graph is constructed, we know that it is acyclic and that there is a single path through it. After all, you only have tickets for trips you took, and you never visited the same airport twice.
In the second stage, you are searching the graph: pick any node, and initiate a search in both directions until you find you cannot continue. These are your source and destination.
If you need to specifically say which was the source and which was the destination, add a direction property to each edge (but keep it an undirected graph). Once you have the candidate source and destination, you can tell which is which based on the edge connected to them.
The complexity of this algorithm would depend on the time it takes to look up a particular node. If you could achieve O(1), then the time would be linear. You have n tickets, so it takes O(n) steps to build the graph, then O(n) to search and O(n) to reconstruct the path. Still O(n). An adjacency matrix will give you that.
If you can't spare the space, you could do a hash for the nodes, which would give you O(1) under optimal hashing and all that crap.
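As a rough illustration of this approach (my sketch, with Python dicts playing the O(1) node lookup): the two degree-1 nodes are the endpoints, and the direction kept on the edges tells you which one is the source:

```python
from collections import defaultdict

def find_route(tickets):
    nxt = dict(tickets)        # directed edge: src -> dst
    degree = defaultdict(int)  # undirected degree of each airport
    for s, d in tickets:
        degree[s] += 1
        degree[d] += 1
    # the endpoints are the two degree-1 nodes; the source is the
    # one that still has an outgoing edge
    ends = [a for a, deg in degree.items() if deg == 1]
    src = ends[0] if ends[0] in nxt else ends[1]
    # walk the single path from the source
    route = [src]
    while route[-1] in nxt:
        route.append(nxt[route[-1]])
    return route

print(find_route([('c', 'd'), ('a', 'b'), ('d', 'e'), ('b', 'c')]))
# ['a', 'b', 'c', 'd', 'e']
```

Building the graph, finding the endpoints, and walking the path are each one O(n) pass.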
Note that if the task were only to determine the source and destination airports (instead of reconstructing the whole trip), the puzzle would probably become more interesting.
Namely, assuming that airport codes are given as integers, the source and destination airports can be determined using O(1) passes of the data and O(1) additional memory (i.e. without resorting to hashtables, sorting, binary search, and the like).
Of course, once you find the source, it also becomes a trivial matter to index and traverse the full route, but from that point on the whole thing will require at least O(n) additional memory anyway (unless you can sort the data in place, which, by the way, allows solving the original task in O(n log n) time with O(1) additional memory).
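For the curious, here is one way such an O(1)-memory endpoint search could go (my sketch, assuming integer codes, no repeated airports, and distinct endpoints): every intermediate airport occurs exactly once as a src and once as a dst, so XOR-ing every code cancels the intermediates and leaves origin ^ destination, after which the classic "two numbers occurring an odd number of times" bit trick separates them:

```python
def find_endpoints(tickets):
    # pass 1: xor every code; intermediates cancel out,
    # leaving origin ^ destination
    x = 0
    for s, d in tickets:
        x ^= s ^ d
    # pass 2: split the codes on the lowest bit where origin and
    # destination differ, and xor one class to isolate one endpoint
    bit = x & -x
    a = 0
    for s, d in tickets:
        if s & bit:
            a ^= s
        if d & bit:
            a ^= d
    b = x ^ a
    # pass 3: the origin is whichever endpoint occurs as a src
    origin = a if any(s == a for s, _ in tickets) else b
    return origin, x ^ origin

# route 5 -> 3 -> 1 -> 4, tickets shuffled
print(find_endpoints([(3, 1), (5, 3), (1, 4)]))  # (5, 4)
```

Three passes over the data, a constant number of integer variables: O(1) passes and O(1) extra memory, as claimed.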
Let's forget the data structures and graphs for a moment.
First I need to point out that everybody has made the assumption that there are no loops. If the route goes through one airport twice, then it's a much larger problem.
But let's keep the assumption for now.
The input data is in fact an ordered set already. Every ticket is an element of the relation that introduces order to a set of airports. (English is not my mother tongue, so these might not be correct math terms)
Every ticket holds information like this: airportX < airportY, so while doing one pass through the tickets an algorithm can recreate an ordered list starting from just any airport.
Now let's drop the "linear assumption". No order relation can be defined out of that kind of input. The data then has to be treated as production rules for a formal grammar, where the grammar's vocabulary set is the set of airport names.
A ticket like that:
src: A
dst: B
is in fact a pair of productions:
A->AB
B->AB
from which you only can keep one.
Now you have to generate every possible sentence, but you can use every production rule only once. The longest sentence that uses each of its productions exactly once is a correct solution.
Prerequisites
First of all, create some kind of subtrip structure that contains a part of your route.
For example, if your complete trip is a-b-c-d-e-f-g, a subtrip could be b-c-d, i.e. a connected subpath of your trip.
Now, create two hashtables that map a city to the subtrip structure the city is contained in. One hashtable is keyed by the city each subtrip begins with, the other by the city each subtrip ends with. That means one city can occur at most once in each of the hashtables.
As we will see later, not every city needs to be stored, but only the beginning and the end of each subtrip.
Constructing subtrips
Now, take the tickets one after another. We assume the ticket goes from x to y (represented by (x,y)). Check whether x is the end of some subtrip s (since every city is visited only once, it cannot be the end of more than one subtrip). If so, just add the current ticket (x,y) at the end of s. If there is no subtrip ending with x, check whether there is a subtrip t beginning with y. If so, add (x,y) at the beginning of t. If there is also no such subtrip t, just create a new subtrip containing just (x,y).
Dealing with subtrips should be done using some special "tricks".
Creating a new subtrip s containing (x,y) should add x to the hashtable for "subtrip beginning cities" and add y to the hashtable for "subtrip ending cities".
Adding a new ticket (x,y) at the beginning of the subtrip s=(y,...), should remove y from the hashtable of beginning cities and instead add x to the hashtable of beginning cities.
Adding a new ticket (x,y) at the end of the subtrip s=(...,x), should remove x from the hashtable of ending cities and instead add y to the hashtable of ending cities.
With this structure, finding the subtrips corresponding to a city can be done in amortized O(1).
After this is done for all tickets, we have some subtrips. Note that we have at most ceil(n/2) = O(n) such subtrips after the procedure.
Concatenating subtrips
Now, we just consider the subtrips one after another. If we have a subtrip s=(x,...,y), we look in our hashtable of ending cities to see if there's a subtrip t=(...,x) ending with x. If so, we concatenate t and s into a new subtrip. If not, we know that s is our first subtrip; then we look whether there's another subtrip u=(y,...) beginning with y. If so, we concatenate s and u. We do this until just one subtrip is left (this subtrip is then our whole original trip).
I hope I didn't overlook something, but this algorithm should run in O(n):
constructing all subtrips (at most O(n)) can be done in O(n), if we implement adding tickets to a subtrip in O(1). This should be no problem, if we have some nice pointer structure or something like that (implementing subtrips as linked lists). Also changing two values in the hashtable is (amortized) O(1). Thus, this phase consumes O(n) time.
concatenating the subtrips until just one is left can also be done in O(n). To see this, we just need to look at what is done in the second phase: hashtable lookups, which need amortized O(1), and subtrip concatenations, which can be done in O(1) with pointer concatenation or something similar.
Thus, the whole algorithm takes time O(n), which might be the optimal O-bound, since at least every ticket might need to be looked at.
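A possible rendering of the two phases in Python (my naming; subtrips as plain lists and the two hashtables as dicts, so the prepends and concatenations here are O(k) rather than the O(1) a linked-list implementation would give):

```python
def reconstruct_trip(tickets):
    begins, ends = {}, {}            # subtrips keyed by first / last city
    # phase 1: grow subtrips one ticket at a time
    for x, y in tickets:
        s = ends.pop(x, None)
        if s is not None:            # a subtrip ...-x: append (x,y) at its end
            s.append(y)
            ends[y] = s
        else:
            t = begins.pop(y, None)
            if t is not None:        # a subtrip y-...: prepend (x,y)
                t.insert(0, x)
                begins[x] = t
            else:                    # otherwise start a new subtrip
                s = [x, y]
                begins[x] = s
                ends[y] = s
    # phase 2: concatenate subtrips until a single one remains
    while len(begins) > 1:
        s = next(iter(begins.values()))
        t = ends.get(s[0])
        if t is not None:            # a subtrip ends where s begins
            del begins[s[0]]
            del ends[s[0]]
            t.extend(s[1:])
            ends[t[-1]] = t
        else:                        # s is the first subtrip; pull in its successor
            u = begins.pop(s[-1])
            del ends[s[-1]]
            s.extend(u[1:])
            ends[s[-1]] = s
    (trip,) = begins.values()
    return trip

print(reconstruct_trip([('a', 'b'), ('c', 'd'), ('e', 'f'), ('b', 'c'), ('d', 'e')]))
# ['a', 'b', 'c', 'd', 'e', 'f']
```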
I have written a small Python program that uses two hash tables: one for counts and another for the src-to-dst mapping.
The complexity depends on the implementation of the dictionary: if the dictionary is O(1), the complexity is O(n); if the dictionary is O(lg n), as in an STL map, the complexity is O(n lg n).
import random

# actual journey: a -> b -> c -> ... -> g -> h
journey = [('a','b'), ('b','c'), ('c','d'), ('d','e'), ('e','f'), ('f','g'), ('g','h')]

# shuffle the journey
random.shuffle(journey)
print("shuffled journey : ", journey)

# hash table with the count of each place
map_count = {}
# hash table for finding the route; contains the src -> dst mapping
map_route = {}

# fill the hash tables
for source, dest in journey:
    map_route[source] = dest
    map_count[source] = map_count.get(source, 0) + 1
    map_count[dest] = map_count.get(dest, 0) + 1

# find the start point: the entry with count == 1 whose
# key exists in map_route (i.e. it has an outgoing ticket)
start = ''
for key in map_count:
    if map_count[key] == 1 and key in map_route:
        start = key
        break

print("journey started at : %s" % start)

route = []           # the route
n = len(journey)     # number of tickets
while n:
    route.append((start, map_route[start]))
    start = map_route[start]
    n -= 1

print(" Route : ", route)
I provide here a more general solution to the problem:
You can stop several times at the same airport, but you have to use every ticket exactly once.
You can have more than 1 ticket for each part of your trip.
Each ticket contains src and dst airport.
All the tickets you have are randomly sorted.
You forgot the original departure airport (very first src) and your destination (last dst).
My method returns a list of cities (a vector) containing all the specified cities, if such a chain exists, and an empty list otherwise. When there are several ways to travel the cities, the method returns the lexicographically smallest list.
#include<vector>
#include<string>
#include<unordered_map>
#include<unordered_set>
#include<set>
#include<map>
using namespace std;
struct StringPairHash
{
size_t operator()(const pair<string, string> &p) const {
return hash<string>()(p.first) ^ hash<string>()(p.second);
}
};
void calcItineraryRec(const multimap<string, string> &cities, string start,
vector<string> &itinerary, vector<string> &res,
unordered_set<pair<string, string>, StringPairHash> &visited, bool &found)
{
if (visited.size() == cities.size()) {
found = true;
res = itinerary;
return;
}
if (!found) {
auto pos = cities.equal_range(start);
for (auto p = pos.first; p != pos.second; ++p) {
if (visited.find({ *p }) == visited.end()) {
visited.insert({ *p });
itinerary.push_back(p->second);
calcItineraryRec(cities, p->second, itinerary, res, visited, found);
itinerary.pop_back();
visited.erase({ *p });
}
}
}
}
vector<string> calcItinerary(vector<pair<string, string>> &citiesPairs)
{
if (citiesPairs.size() < 1)
return {};
multimap<string, string> cities;
set<string> uniqueCities;
for (auto entry : citiesPairs) {
cities.insert({ entry });
uniqueCities.insert(entry.first);
uniqueCities.insert(entry.second);
}
for (const auto &startCity : uniqueCities) {
vector<string> itinerary;
itinerary.push_back(startCity);
unordered_set<pair<string, string>, StringPairHash> visited;
bool found = false;
vector<string> res;
calcItineraryRec(cities, startCity, itinerary, res, visited, found);
if (res.size() - 1 == cities.size())
return res;
}
return {};
}
Here is an example of usage:
int main()
{
vector<pair<string, string>> cities = { {"Y", "Z"}, {"W", "X"}, {"X", "Y"}, {"Y", "W"}, {"W", "Y"}};
vector<string> itinerary = calcItinerary(cities); // { "W", "X", "Y", "W", "Y", "Z" }
// another route is possible {W Y W X Y Z}, but the route above is lexicographically smaller.
cities = { {"Y", "Z"}, {"W", "X"}, {"X", "Y"}, {"W", "Y"} };
itinerary = calcItinerary(cities); // empty, no way to travel all cities using each ticket exactly one time
}
