If a regular language is built using only the Kleene star, is it possible that it comes from the concatenation of two non-regular languages?

I want to know: given a regular language L that is built using only the Kleene star operator (e.g. (ab)*), is it possible that L can be generated by the concatenation of two non-regular languages? I am trying to prove that L can only be generated by the concatenation of two regular languages.
Thanks.

This statement is false. Consider these two languages over Σ = {a}:
L1 = { a^n | n is a power of two } ∪ { ε }
L2 = { a^n | n is not a power of two } ∪ { ε }
Neither of these languages is regular (the first can be proven nonregular using the Myhill-Nerode theorem, and the second is closely related to the complement of L1 and can also be proven nonregular).
However, I'm going to claim that L1L2 = a*. First, note that any string in the concatenation L1L2 has the form a^n and therefore is an element of a*. Next, take any string in a*; let it be a^n. If n is a power of two, then it can be formed as the concatenation of a^n from L1 and ε from L2. Otherwise, n isn't a power of two, and it can be formed as the concatenation of ε from L1 and a^n from L2. Therefore, L1L2 = a*, so the theorem you're trying to prove is false.
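If you want to see this concretely, here is a small brute-force check (a Python sketch of my own, encoding a^n by the integer n, with 0 standing for ε): it verifies that every length up to a bound decomposes as a sum of a member of L1 and a member of L2.

def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

N = 200  # length bound for the experiment
L1 = {n for n in range(1, N + 1) if is_power_of_two(n)} | {0}      # 0 encodes epsilon
L2 = {n for n in range(1, N + 1) if not is_power_of_two(n)} | {0}  # 0 encodes epsilon

# Every a^n with n <= N should decompose as a^p . a^q with p in L1, q in L2.
concat = {p + q for p in L1 for q in L2}
assert all(n in concat for n in range(N + 1))
print("L1L2 covers a^0 .. a^%d" % N)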
Hope this helps!

Related

Haskell: List v. Array, difference in performance

Another question from a Haskell n00b.
I'm comparing the efficiency of various methods used to solve Problem #14 on the Project Euler website. In particular, I'm hoping to better understand the factors driving the difference in evaluation time for four (slightly) different approaches to solving the problem.
(Descriptions of problem #14 and the various approaches are below.)
First, a quick overview of Problem #14. It has to do with "Collatz numbers" (i.e., the same programming exercise as my previous post, which explored a different aspect of Haskell). The Collatz number of a given integer is the length of the Collatz sequence for that integer. The Collatz sequence of an integer is calculated as follows: the first number ("n0") in the sequence is that integer itself; if n0 is even, the next number in the sequence ("n1") is equal to n0 / 2; if n0 is odd, then n1 is equal to 3 * n0 + 1. We continue recursively extending the sequence until we reach 1, at which point the sequence is finished. For example, the Collatz sequence for 5 is {5, 16, 8, 4, 2, 1} (because 16 = 3 * 5 + 1, 8 = 16 / 2, 4 = 8 / 2, ...).
Problem 14 asks us to find the integer below 1,000,000 which has the largest Collatz number. To that effect, we can consider a function "collatz" which, when passed an integer "n" as an argument, returns the integer below n with the largest Collatz number. In other words, collatz 1000000 gives us the answer to Problem #14.
For the purposes of this exercise (i.e., understanding differences in evaluation time), we can consider Haskell versions of 'collatz' which vary across two dimensions:
(1) Implementation: Do we store the dataset of Collatz numbers (which will be generated for all integers 1..n) as a list or an array? I call this the "implementation" dimension, i.e., a function's implementation is either "list" or "array".
(2) Algorithm: do we calculate the Collatz number for any given integer n by extending out the Collatz sequence until it is complete (i.e., until we reach 1)? Or do we only extend out the sequence until we reach a number k which is smaller than n (at which point we can simply use the Collatz number of k, which we've already calculated)? I call this the "algorithm" dimension, i.e., a function's algorithm is either "complete" (calculation of Collatz number for each integer) or "partial". The latter obviously requires fewer operations.
Below are the four possible versions of the "collatz" function: array / partial, list / partial, array / complete and list / complete:
import Data.Array ( (!), listArray, assocs )
import Data.Ord ( comparing )
import Data.List ( maximumBy )

-- array implementation; partial algorithm (FEWEST OPERATIONS)
collatzAP x = maximumBy (comparing snd) $ assocs a where
  a = listArray (0,x) (0 : 1 : [c n n | n <- [2..x]])
  c n i = let z = if even i then div i 2 else 3*i+1
          in if i < n then a ! i else 1 + c n z

-- list implementation; partial algorithm
collatzLP x = maximum a where
  a = zip (0 : 1 : [c n n | n <- [2..x]]) [0..x]
  c n i = let z = if even i then div i 2 else 3*i+1
          in if i < n then fst (a !! i) else 1 + c n z

-- array implementation; complete algorithm
collatzAC x = maximumBy (comparing snd) $ assocs a where
  a = listArray (0,x) (0 : 1 : [c n n | n <- [2..x]])
  c n i = let z = if even i then div i 2 else 3*i+1
          in if i == 1 then 1 else 1 + c n z

-- list implementation; complete algorithm (MOST OPERATIONS)
collatzLC x = maximum a where
  a = zip (0 : 1 : [c n n | n <- [2..x]]) [0..x]
  c n i = let z = if even i then div i 2 else 3*i+1
          in if i == 1 then 1 else 1 + c n z
Regarding speed of evaluation: I know that arrays are far faster to access than lists (i.e., O(1) vs. O(n) access time for a given index n) so I expected the 'array' implementation of "collatz" to be faster than the 'list' implementation, ceteris paribus. Also, I expected the 'partial' algorithm to be faster than the 'complete' algorithm (ceteris paribus), given it needs to perform fewer operations in order to construct the dataset of Collatz numbers.
Testing our four functions across inputs of varying size, we observe the following evaluation times (comments below):
It's indeed the case that the 'array/partial' version is the fastest version of "collatz" (by a good margin). However, I find it a bit counter-intuitive that 'list/complete' isn't the slowest version. That honor goes to 'list/partial', which is more than 20x slower than 'list/complete'!
My question: Is the difference in evaluation time between 'list/partial' and 'list/complete' (as compared to that between 'array/partial' and 'array/complete') entirely due to the difference in access efficiency between lists and arrays in Haskell? Or am I not performing a "controlled experiment" (i.e., are there other factors at play)?
I do not understand how a question about the relative performance of two algorithms that work with lists is related to arrays at all... but here is my take:
Try to avoid indexing lists, especially long lists, if performance is of any concern. Indexing is really a traversal (as you know). 'List/partial' is indexing/traversing a lot; 'list/complete' is not. Hence the difference between 'array/complete' and 'list/complete' is negligible, and the difference between 'list/partial' and the rest is huge.
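If it helps, here is a cross-language illustration (a Python sketch of mine, with a hand-rolled singly linked list standing in for Haskell's list type): indexing position i of a linked list costs O(i), so indexing every position is quadratic overall, while array indexing stays O(1) per access.

import time

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, nxt=None):
        self.value, self.next = value, nxt

def from_list(xs):
    # build a singly linked list from a Python list
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def nth(node, i):
    # O(i): walk i links, just like Haskell's (!!)
    for _ in range(i):
        node = node.next
    return node.value

n = 20000
arr = list(range(n))
lst = from_list(arr)

t0 = time.perf_counter()
sum(arr[i] for i in range(n))        # O(1) per access: linear overall
t1 = time.perf_counter()
sum(nth(lst, i) for i in range(n))   # O(i) per access: quadratic overall
t2 = time.perf_counter()
print("array: %.3fs, linked list: %.3fs" % (t1 - t0, t2 - t1))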

Largest triangle in convex hull

The question has already been answered, but the main problem I am facing is in understanding one of the answers.
From
https://stackoverflow.com/a/1621913/2673063
How is the following algorithm O(n) ?
It states as
By first sorting the points / computing the convex hull (in O(n log n) time) if necessary, we can assume we have the convex polygon/hull with the points cyclically sorted in the order they appear in the polygon. Call the points 1, 2, 3, … , n. Let (variable) points A, B, and C, start as 1, 2, and 3 respectively (in the cyclic order). We will move A, B, C until ABC is the maximum-area triangle. (The idea is similar to the rotating calipers method, as used when computing the diameter (farthest pair).)
With A and B fixed, advance C (e.g. initially, with A=1, B=2, C is advanced through C=3, C=4, …) as long as the area of the triangle increases, i.e., as long as Area(A,B,C) ≤ Area(A,B,C+1). This point C will be the one that maximizes Area(ABC) for those fixed A and B. (In other words, the function Area(ABC) is unimodal as a function of C.)
Next, advance B (without changing A and C) if that increases the area. If so, again advance C as above. Then advance B again if possible, etc. This will give the maximum-area triangle with A as one of the vertices. (The part up to here should be easy to prove, and simply doing this separately for each A would give O(n^2). But read on.) Now advance A again, if it improves the area, etc.
Although this has three "nested" loops, note that B and C always advance "forward", and they advance at most 2n times in total (similarly A advances at most n times), so the whole thing runs in O(n) time.
As the author of the answer that is the subject of the question, I feel obliged to give a more detailed explanation of the O(n) runtime.
Firstly, just as an example, here is a figure from the paper, showing the first few steps of the algorithm, for a particular sample input (a 12-gon). First we start with A, B, C as three consecutive vertices (step 1 in the figure), advance C as long as area increases (steps 2 to 6), then advance B, and so on.
The triangles with asterisks above them are the "anchored local maxima", i.e., the ones that are best for a given A (i.e., advancing either C or B would decrease the area).
As far as the runtime being O(n): Let the "actual" value of B, in terms of the number of times it's been incremented and ignoring the wrap around, be nB, and similarly for C be nC. (In other words, B = nB % n and C = nC % n.) Now, note that,
("B is ahead of A") whatever the value of A, we have A ≤ nB < A + n
nB is always increasing
So, as A varies from 0 to n, we know that nB only varies between 0 and 2n: it can be incremented at most 2n times. Similarly nC. This shows that the running time of the algorithm, which is proportional to the total number of times A, B and C are incremented, is bounded by O(n) + O(2n) + O(2n), which is O(n).
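To make the counting argument concrete, here is a sketch (Python, my own illustration rather than the paper's code) that manipulates the unwrapped counters nB and nC directly; they only ever increase and are kept below A + n, so the loop bodies run O(n) times in total, exactly as argued above.

def area2(p, q, r):
    # twice the (unsigned) area of triangle pqr, via the cross product
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def largest_triangle(hull):
    # hull: vertices of a convex polygon in cyclic order
    n = len(hull)
    best = 0.0
    nB, nC = 1, 2              # "actual" counters: B = nB % n, C = nC % n
    for A in range(n):
        nB = max(nB, A + 1)    # keep B ahead of A ...
        nC = max(nC, nB + 1)   # ... and C ahead of B
        improved = True
        while improved:
            improved = False
            # advance C while the area does not decrease
            while (nC + 1 < A + n and
                   area2(hull[A], hull[nB % n], hull[(nC + 1) % n]) >=
                   area2(hull[A], hull[nB % n], hull[nC % n])):
                nC += 1
            # advance B if that strictly improves the area
            if (nB + 1 < nC and
                area2(hull[A], hull[(nB + 1) % n], hull[nC % n]) >
                area2(hull[A], hull[nB % n], hull[nC % n])):
                nB += 1
                improved = True
        best = max(best, area2(hull[A], hull[nB % n], hull[nC % n]) / 2.0)
    return best

Note that bounding nB and nC by A + n also sidesteps the wrap-around worry discussed in the edit below.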
Think about it like this: each of A, B, C are pointers that, at any given moment, point towards one of the elements of the convex hull. Due to the way the algorithm increments them, each one of them will point to each element of the convex hull at most once. Therefore, each one will iterate over a collection of O(n) elements. They will never be reset, once one of them has passed an element, it will not pass that element ever again.
Since there are 3 pointers (A, B, C), we have time complexity 3 * O(n) = O(n).
Edit:
As the code is presented in the provided link, it seems possible that it is not O(n), since B and C wrap around the array. However, according to the description, this wrapping does not sound necessary: before seeing the code, I imagined the method stopping the advancement of B and C past n. In that case, it would definitely be O(n). As the code is presented, however, I'm not sure.
It might still be that, for some mathematical reason, B and C still iterate only O(n) times in the entirety of the algorithm, but I can't prove that. Neither can I prove that it is correct to not wrap around (as long as you take care of index out of bounds errors).

Determining k of LR(k) from this example?

I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:
Expression:
LogicalOrExpression
LogicalOrExpression ? Expression : LogicalOrExpression
LogicalOrExpression:
LogicalAndExpression
LogicalOrExpression || LogicalAndExpression
LogicalAndExpression:
EqualityExpression
LogicalAndExpression && EqualityExpression
EqualityExpression:
RelationalExpression
EqualityExpression EqualityOperator RelationalExpression
EqualityOperator:
==
!=
RelationalExpression:
AdditiveExpression
RelationalExpression RelationalOperator AdditiveExpression
RelationalOperator:
<
>
<=
>=
AdditiveExpression:
MultiplicativeExpression
AdditiveExpression AdditiveOperator MultiplicativeExpression
AdditiveOperator:
+
-
MultiplicativeExpression:
UnaryExpression
MultiplicativeExpression MultiplicativeOperator UnaryExpression
MultiplicativeOperator:
*
/
%
UnaryExpression:
PrimaryExpression
UnaryOperator UnaryExpression
UnaryOperator:
+
-
!
PrimaryExpression:
BoolLiteral // TERMINAL
IntegerLiteral // TERMINAL
Identifier // TERMINAL
( Expression )
I want to try using shift/reduce parsing, and so would like to know: what is the smallest k (if any) for which this grammar is LR(k)? (And, more generally, how can one determine k for an arbitrary grammar, if that is possible?)
The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.
All operator-precedence grammars are LR(1). It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently always LR(1).
FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG), for which it is easy to answer the question about being LR(k) for some k, since the answer is always "yes, with k == 1". Just for precision, here are the properties. We use the normal convention where a grammar G is a 4-tuple (N, Σ, P, S): N is a set of non-terminals; Σ is a set of terminals; P is a set of productions; and S is the start symbol. We write V for N ∪ Σ. In any grammar, we have:
N ∩ Σ = ∅
S ∈ N
P ⊂ V+ × V*
The "context-free" requirement restricts P so that every left-hand side is a single non-terminal:
P ⊂ N × V*
In an operator grammar, P is further restricted: no right-hand side is empty, and no right-hand side has two consecutive non-terminals:
P ⊂ N × (V+ − V*NNV*)
In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations Leads and Trails [2], where
T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V
Then:
t1 ⋖ t2 iff ∃v such that t2 Leads v and N → V*t1vV* ∈ P
t1 ⋗ t2 iff ∃v such that t1 Trails v and N → V*vt2V* ∈ P
t1 ≐ t2 iff N → V*t1t2V* ∈ P or N → V*t1V′t2V* ∈ P
An intuitive way of thinking about those relations is this: normally, when we do the derivations, we just substitute the RHS for the LHS; but suppose we substitute ⋖ RHS ⋗ instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding ≐ between any two consecutive terminals which have no precedence symbol between them. From that, we just read off the relations.
Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.
Verifying that an operator grammar has mutually exclusive precedence relations is straightforward; Leads and Trails require a transitive closure over First and Last, which is roughly O(|G|^2) (it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over all productions in the grammar, which is O(|G|).
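To make that concrete, here is a sketch in Python (my own code and naming, assuming the grammar is already an operator grammar, given as a dict from non-terminal to right-hand sides): it computes Leads and Trails by fixpoint and then reads the three relations off the productions, reporting every pair of terminals related in more than one way.

from collections import defaultdict

def precedence_relations(grammar):
    # grammar: dict non-terminal -> list of RHS tuples; terminals are
    # the symbols that never appear as keys. Returns (rel, conflicts);
    # the grammar has the precedence property iff conflicts is empty.
    nts = set(grammar)
    leads, trails = defaultdict(set), defaultdict(set)
    changed = True
    while changed:                            # fixpoint for Leads/Trails
        changed = False
        for v, rhss in grammar.items():
            for rhs in rhss:
                for sym in rhs:
                    new = (leads[sym] if sym in nts else {sym}) - leads[v]
                    if new:
                        leads[v] |= new
                        changed = True
                    if sym not in nts:        # stop at the first terminal
                        break
                for sym in reversed(rhs):     # symmetrically for Trails
                    new = (trails[sym] if sym in nts else {sym}) - trails[v]
                    if new:
                        trails[v] |= new
                        changed = True
                    if sym not in nts:
                        break
    rel = defaultdict(set)
    for rhss in grammar.values():
        for rhs in rhss:
            for x, y in zip(rhs, rhs[1:]):            # adjacent pairs
                if x not in nts and y in nts:         # t1 N ...: t1 ⋖ Leads(N)
                    for t2 in leads[y]:
                        rel[x, t2].add('<')
                elif x in nts and y not in nts:       # ... N t2: Trails(N) ⋗ t2
                    for t1 in trails[x]:
                        rel[t1, y].add('>')
                elif x not in nts and y not in nts:   # t1 t2: t1 ≐ t2
                    rel[x, y].add('=')
            for x, m, y in zip(rhs, rhs[1:], rhs[2:]):  # t1 N t2: t1 ≐ t2
                if x not in nts and m in nts and y not in nts:
                    rel[x, y].add('=')
    conflicts = {k: v for k, v in rel.items() if len(v) > 1}
    return rel, conflicts

# A tiny expression grammar (a slice of the question's): no conflicts,
# so it has the precedence property.
g = {'E': [('T',), ('E', '+', 'T')],
     'T': [('P',), ('T', '*', 'P')],
     'P': [('id',), ('(', 'E', ')')]}
rel, conflicts = precedence_relations(g)
print(conflicts)   # {}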
From Donald Knuth's On the Translation of Languages from Left to Right, in the abstract:
It is shown that the problem of whether or not a grammar is LR(k) for some k is undecidable,
In other words,
Given a grammar G, "∃k. G ∊ LR(k)" is undecidable.
Therefore, the best we can do in general is to try constructing a parser for LR(0), then LR(1), LR(2), etc. Either you succeed at some point, or you give up when k becomes large; the undecidability result means there is no k at which you can safely stop trying.
This specific grammar
In this specific case, I happen to know that the grammar you give is LALR(1), which means it must therefore be LR(1). I know this because I have written LALR parsers for similar languages. It can't be LR(0) for obvious reasons (the grammar {A -> x, A -> A + x} is not LR(0)).

How to define Xor in Coq and prove its properties

This should be an easy question. I'm new with Coq.
I want to define the exclusive or in Coq (which to the best of my knowledge is not predefined). The important part is to allow for multiple propositions (e.g. Xor A B C D).
I also need the two properties:
(Xor A1 A2 ... An) /\ ~A1 -> Xor A2 ... An
(Xor A1 A2 ... An) /\ A1 -> ~A2 /\ ... /\ ~An
I'm currently having trouble defining the function for an undefined number of variables. I tried to define it by hand for two, three, four and five variables (that's how many I need). But then proving the properties is a pain and seems very inefficient.
Given your second property, I assume that your definition of exclusive or at higher arities is “exactly one of these propositions is true” (and not “an odd number of these propositions is true” or “at least one of these propositions is true and at least one is false”, which are other possible generalizations).
This exclusive or is not associative. This means you can't just define higher-arity xor as xor(A1,…,An) = xor(A1, xor(A2, …)). You need a global definition, and this means that the type constructor must take a list of arguments (or some other data structure, but a list is the most obvious choice).
Inductive xor : list Prop -> Prop := …
You now have two reasonable choices: build your definition of xor inductively from first principles, or invoke a list predicate. The list predicate would be “there is a unique element in the list matching this predicate”. Since the standard list library does not define this predicate, and defining it is slightly harder than defining xor, we'll build xor inductively.
The argument is a list, so let's break down the cases:
xor of an empty list is always false;
xor of the list (cons A L) is true iff either of these two conditions is met:
A is true and none of the elements of L are true;
A is false and exactly one of the elements of L is true.
This means we need to define an auxiliary predicate on lists of propositions, nand, characterizing the lists of false propositions. There are many possibilities here: fold the /\ operator, induct by hand, or call a list predicate (again, not in the standard list library). I'll induct by hand, but folding /\ is another reasonable choice.
Require Import List.
Inductive nand : list Prop -> Prop :=
| nand_nil : nand nil
| nand_cons : forall (A:Prop) L, ~A -> nand L -> nand (A::L).
Inductive xor : list Prop -> Prop :=
| xor_t : forall (A:Prop) L, A -> nand L -> xor (A::L)
| xor_f : forall (A:Prop) L, ~A -> xor L -> xor (A::L).
Hint Constructors nand xor.
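If it helps intuition, here is the same pair of definitions transcribed into a boolean model (a Python sketch of mine, with bool standing in for Prop):

def nand(props):
    # nand: every element of the list is false
    return all(not p for p in props)

def xor(props):
    # xor: exactly one element of the list is true
    if not props:
        return False                     # no constructor builds xor []
    head, tail = props[0], props[1:]
    # xor_t: head true and the rest all false
    # xor_f: head false and exactly one of the rest true
    return (head and nand(tail)) or (not head and xor(tail))

assert xor([False, True, False])
assert not xor([True, True, False])
assert not xor([])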
The properties you want to prove are simple corollaries of inversion properties: given a constructed type, break down the possibilities (if you have a xor, it's either a xor_t or a xor_f). Here's a manual proof of the first; the second is very similar.
Lemma xor_tail : forall A L, xor (A::L) -> ~A -> xor L.
Proof.
intros. inversion_clear H.
contradiction.
assumption.
Qed.
Another set of properties you're likely to want is the equivalences between nand and the built-in conjunction. As an example, here's a proof that nand (A::nil) is equivalent to ~A. Proving that nand (A::B::nil) is equivalent to ~A/\~B and so on are merely more of the same. In the forward direction, this is once more an inversion property (analyse the possible constructors of the nand type). In the backward direction, this is a simple application of the constructors.
Lemma nand1 : forall A, nand (A::nil) <-> ~A.
Proof.
split; intros.
inversion_clear H. assumption.
constructor. assumption. constructor.
Qed.
You're also likely to need substitution and rearrangement properties at some point. Here are a few key lemmas that you may want to prove (these shouldn't be very difficult, just induct on the right stuff):
forall A1 A2 L, (A1 <-> A2) -> (xor (A1::L) <-> xor (A2::L))
forall K L1 L2, (xor L1 <-> xor L2) -> (xor (K ++ L1) <-> xor (K ++ L2))
forall K A B L, xor (K ++ A::B::L) <-> xor (K ++ B::A::L)
forall K L M N, xor (K ++ L ++ M ++ N) <-> xor (K ++ M ++ L ++ N)
Well, I suggest you start with Xor for 2 arguments and prove its properties.
Then, if you want to generalize, you can define Xor taking a list of arguments; you should be able to define it and prove its properties using your 2-argument Xor.
I could give some more details, but I think it's more fun to do it on your own. Let me know how it goes :).

Find if two arrays contain the same set of integers without extra space and faster than N log N

I came across this post, which reports the following interview question:
Given two arrays of numbers, find whether the two arrays have the same set of integers. Suggest an algorithm that can run faster than N log N without extra space.
The best that I can think of is the following:
(a) sort each array, and then (b) have two pointers moving along the two arrays and check if you find different values... but step (a) already has N log N complexity :(
(a) scan the shortest array and put its values into a map, and then (b) scan the second array and check if you find a value that is not in the map... here we have linear complexity, but I use extra space.
... so, I can't think of a solution for this question.
Ideas?
Thank you for all the answers. I feel many of them are right, but I decided to choose ruslik's one, because it gives an interesting option that I did not think about.
You can try a probabilistic approach by choosing a commutative function for accumulation (e.g., addition or XOR) and a parametrized hash function.
unsigned addition(unsigned a, unsigned b);
unsigned hash(int n, int h_type);

unsigned hash_set(int* a, int num, int h_type) {
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}
This way, the number of tries needed before you can decide that the probability of a false positive is below a certain threshold does not depend on the number of elements, so the check is linear.
EDIT: In the general case, the probability of the sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: to decide as fast as possible if the arrays are surely different or if there is a probability of them being equivalent, in which case a slow deterministic method should be used. The final average complexity will be O(n), but the worst-case scenario will have the complexity of the deterministic method.
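In Python, a sketch of the prefilter might look like this (the hash family is an arbitrary multiply-xor mixer of my own choosing, purely for illustration):

def hash_fn(n, h_type):
    # a parametrized mixer; any decent hash family would do
    x = (n * (2654435761 + 2 * h_type + 1)) & 0xFFFFFFFF
    return x ^ (x >> 16)

def fingerprint(arr, h_type):
    acc = 0
    for v in arr:
        acc = (acc + hash_fn(v, h_type)) & 0xFFFFFFFF  # commutative accumulation
    return acc

def probably_equal(a, b, trials=8):
    # disagreement on any fingerprint means surely different;
    # agreement on all of them still calls for a deterministic check
    return len(a) == len(b) and all(
        fingerprint(a, t) == fingerprint(b, t) for t in range(trials))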
You said "without extra space" in the question but I assume that you actually mean "with O(1) extra space".
Suppose that all the integers in the arrays are less than k. Then you can use in-place radix sort to sort each array in time O(n log k) with O(log k) extra space (for the stack, as pointed out by yi_H in comments), and compare the sorted arrays in time O(n log k). If k does not vary with n, then you're done.
I'll assume that the integers in question are of fixed size (eg. 32 bit).
Then, radix-quicksorting both arrays in place (aka "binary quicksort") is constant space and O(n).
In the case of unbounded integers, I believe (but cannot prove, even though it is probably doable) that you cannot break the O(n k) barrier, where k is the number of digits of the greatest integer in either array.
Whether this is better than O(n log n) depends on how k is assumed to scale with n, and therefore depends on what the interviewer expects of you.
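For concreteness, here is a sketch of the in-place binary quicksort for fixed-width non-negative integers (Python, my own rendering; the sign bit would need special handling for signed values): partition on each bit from the most significant down and recurse on both halves.

def radix_sort(a, lo=0, hi=None, bit=31):
    # in-place binary radix sort: O(n * k) time, O(k) stack for k-bit keys
    if hi is None:
        hi = len(a)
    if hi - lo <= 1 or bit < 0:
        return
    i, j = lo, hi
    mask = 1 << bit
    while i < j:
        if a[i] & mask:              # belongs in the upper partition
            j -= 1
            a[i], a[j] = a[j], a[i]
        else:
            i += 1
    radix_sort(a, lo, i, bit - 1)    # bit clear
    radix_sort(a, i, hi, bit - 1)    # bit set

Running both arrays through this and comparing them element-wise gives the O(n k) check with O(k) auxiliary space described above.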
A special, no harder case is when one array holds 1, 2, ..., n. This was discussed many times:
How to tell if an array is a permutation in O(n)?
Algorithm to determine if array contains n...n+m?
mathoverflow
and despite many tries, no deterministic solutions using O(1) space and O(n) time were shown. Either you can cheat the requirements in some way (reuse the input space, assume the integers are bounded) or use a probabilistic test.
Probably this is an open problem.
Here is a co-RP algorithm:
In linear time, iterate over the first array (A), building the polynomial Pa = (A[0] − x)(A[1] − x)...(A[n−1] − x). Do the same for array B, naming this polynomial Pb.
We now want to answer the question "is Pa = Pb?" We can check this probabilistically as follows. Select a number r uniformly at random from the range [0...4n] and compute d = Pa(r) - Pb(r) in linear time. If d = 0, return true; otherwise return false.
Why is this valid? First of all, observe that if the two arrays contain the same elements, then Pa = Pb, so Pa(r) = Pb(r) for all r. With this in mind, we can easily see that this algorithm will never erroneously reject two identical arrays.
Now we must consider the case where the arrays are not identical. By the Schwartz-Zippel Lemma, P(Pa(r) − Pb(r) = 0 | Pa ≠ Pb) ≤ n/4n. So the probability that we accept the two arrays as equivalent when they are not is at most 1/4.
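Here is a sketch of the test (Python; evaluating modulo a large prime is my addition, since computing the products over the integers would need huge numbers in a fixed-width setting, and it only adds a negligible chance of a spurious collision):

import random

def probably_same_multiset(a, b, prime=(1 << 61) - 1, trials=20):
    # evaluate Pa(r) and Pb(r) mod a large prime at random points r
    if len(a) != len(b):
        return False
    for _ in range(trials):
        r = random.randrange(prime)
        pa = pb = 1
        for x in a:
            pa = pa * ((x - r) % prime) % prime
        for y in b:
            pb = pb * ((y - r) % prime) % prime
        if pa != pb:
            return False   # a witness: the arrays surely differ
    return True            # no witness found: probably the same multiset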
The usual assumption for these kinds of problems is Theta(log n)-bit words, because that's the minimum needed to index the input.
sshannin's polynomial-evaluation answer works fine over finite fields, which sidesteps the difficulties with limited-precision registers. All we need is a prime of the appropriate size (easy to find under the same assumptions that support a lot of public-key crypto) or an irreducible polynomial in (Z/2)[x] of the appropriate degree (the difficulty here is multiplying polynomials quickly, but I think the algorithm would be o(n log n)).
If we can modify the input with the restriction that it must maintain the same set, then it's not too hard to find space for radix sort. Select the (n/log n)th element from each array and partition both arrays. Sort the size-(n/log n) pieces and compare them. Now use radix sort on the size-(n - n/log n) pieces. From the previously processed elements, we can obtain n/log n bits, where bit i is on if a[2*i] > a[2*i + 1] and off if a[2*i] < a[2*i + 1]. This is sufficient to support a radix sort with n/(log n)^2 buckets.
In the algebraic decision tree model, there are known Omega(N log N) lower bounds for computing set intersection (irrespective of the space limits).
For instance, see here: http://compgeom.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf
So unless you use clever bit-manipulation or hashing approaches, you cannot do better than N log N.
For instance, if you use only comparisons, you cannot do better than N log N.
You can break the O(n log n) barrier if you have some restrictions on the range of the numbers, but it's not possible to do this if you cannot use any extra memory (you would need really silly restrictions to be able to do that).
I would also like to note that even O(n log n) with sorting is not trivial under an O(1) space limit: merge sort uses O(n) space, and quicksort (which is not even strictly O(n log n)) needs O(log n) space for the stack. You have to use heapsort or smoothsort.
Some companies like to ask questions which cannot be solved, and I think it is good practice: as a programmer you have to know both what's possible and how to code it, and also what the limits are, so you don't waste your time on something that's not doable.
Check this question for a couple of good techniques to use:
Algorithm to tell if two arrays have identical members
For each integer i, check that the number of occurrences of i in the two arrays is either zero in both or nonzero in both, by iterating over the arrays.
Since the number of integers is constant the total runtime is O(n).
No, I wouldn't do this in practice.
I was just wondering whether there is a way to hash the cumulative contents of both arrays and compare the hashes, assuming the hash function doesn't produce collisions for two differing patterns.
Why not find the sum, product and XOR of all the elements of one array and compare them with the corresponding values for the other array?
The XOR of the elements of an array may be zero for two different arrays, e.g. for
2,2,3,3
1,1,2,2
But what if you compare the XOR of the elements of the two arrays for equality? Consider this:
10,3
12,5
Here the XOR of both arrays is the same: (10^3) = (12^5) = 9. But their sum and product are different. I think two different sets of elements cannot have the same sum, product and XOR; this can be analysed by simple bit-value examination.
Is there anything wrong with this approach?
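For what it's worth, the proposed triple check is easy to write down (a Python sketch; it demonstrates the 10,3 versus 12,5 example above, not the conjecture):

from functools import reduce
from operator import mul, xor

def fingerprint(xs):
    # (sum, product, xor) of all elements
    return (sum(xs), reduce(mul, xs, 1), reduce(xor, xs, 0))

print(fingerprint([10, 3]))  # (13, 30, 9)
print(fingerprint([12, 5]))  # (17, 60, 9): same xor, different sum and product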
I'm not sure I correctly understood the problem, but if you are interested in the integers that are in both arrays:
If N is much greater than 2^SizeOf(int) (where SizeOf(int) is the number of bits in an integer: 16, 32, 64), there is one solution:
a = Array(N); //length(a) = N
b = Array(M); //length(b) = M
//x86-64: an integer consists of 64 bits.
//The first 2^64 / 64 elements of a are about to be reused as a bit
//vector, so compare those elements against b directly.
for i := 0 to 2^64 / 64 - 1 do //very big, but CONST
    for k := 0 to M - 1 do
        if a[i] = b[k] then doSomething; //detected
//Mark the value of each remaining element of a in the bit vector.
for i := 2^64 / 64 to N - 1 do
    if not isSetBit(a[a[i] div 64], a[i] mod 64) then
        setBit(a[a[i] div 64], a[i] mod 64);
//Probe the bit vector with each element of b.
for i := 0 to M - 1 do
    if isSetBit(a[b[i] div 64], b[i] mod 64) then doSomething; //detected
O(N), without additional structures.
All I know is that comparison-based sorting cannot possibly be faster than O(N log N), so we can eliminate most of the "common" comparison-based sorts. I was thinking of doing a bucket sort. Perhaps if this question were asked in an interview, the best response would first be to clarify what sort of data those integers represent. For example, if they represent a person's age, then we know the range of values is limited and can use bucket sort in O(n). However, this will not be in place...
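A sketch of that bounded-range idea (Python; the 0..130 age bound is just my example): counting occurrences runs in O(n) time but needs O(range) extra space, which is exactly why it is not in place.

def same_multiset_bounded(a, b, lo=0, hi=130):
    # counting-sort style comparison for values known to lie in [lo, hi]
    if len(a) != len(b):
        return False
    counts = [0] * (hi - lo + 1)
    for x in a:
        counts[x - lo] += 1
    for y in b:
        counts[y - lo] -= 1
    return all(c == 0 for c in counts)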
If the arrays have the same size, and there are guaranteed to be no duplicates, sum each of the arrays. If the sum of the values is different, then they contain different integers.
Edit: You can then sum the log of the entries in the arrays. If that is also the same, then you have the same entries in the array.
