DFA for all words that contain an even number of 0's and whose number of 1's is divisible by 3 - dfa

I am really trying to solve this one, but it seems quite impossible for me. Can someone show me how?

So here's an idea:
States E0, E1 and E2 indicate an even number of 0s together with the remainder of the number of 1s modulo 3; e.g. E2 means an even number of 0s and (number of 1s) mod 3 = 2. Likewise O0, O1 and O2 stand for an odd number of 0s.
If you read a 0 you jump from the E state to the corresponding O state and vice versa.
If you read a 1 you move to the next remainder, cycling 0 -> 1 -> 2 -> 0.
The accepting state is E0: an even number of 0s and (number of 1s) mod 3 = 0.
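To make the construction concrete, here is a small Python sketch of that 6-state automaton (the state names E0/E1/E2/O0/O1/O2 and the function name are illustrative choices, not part of the original answer):

TRANSITIONS = {
    # state: (next state on '0', next state on '1')
    # E/O = even/odd number of 0s; the digit = (number of 1s) mod 3
    'E0': ('O0', 'E1'), 'E1': ('O1', 'E2'), 'E2': ('O2', 'E0'),
    'O0': ('E0', 'O1'), 'O1': ('E1', 'O2'), 'O2': ('E2', 'O0'),
}

def accepts(word):
    state = 'E0'                 # start state: zero 0s (even), zero 1s (mod 3 = 0)
    for ch in word:
        state = TRANSITIONS[state][0 if ch == '0' else 1]
    return state == 'E0'         # accept: even 0s and (number of 1s) mod 3 = 0

# accepts("00111") -> True, accepts("0101") -> False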

Related

Change the minimum number of entries in an array so that the sum of any k consecutive items is even

We are given an array of integers. We have to change the minimum number of those integers however we'd like so that, for some fixed parameter k, the sum of any k consecutive items in the array is even.
Example:
N = 8; K = 3;
A = {1,2,3,4,5,6,7,8}
We can change 3 elements (4th,5th,6th)
so the array can be {1,2,3,5,6,7,7,8}
then
1+2+3=6 is even
2+3+5=10 is even
3+5+6=14 is even
5+6+7=18 is even
6+7+7=20 is even
7+7+8=22 is even
There's a very nice O(n)-time solution to this problem that, at a high level, works like this:
Recognize that determining which items to flip boils down to determining a pattern that repeats across the array of which items to flip.
Use dynamic programming to determine what that pattern is.
Here's how to arrive at this solution.
First, some observations. Since all we care about here is whether the sums are even or odd, we actually don't care about the numbers' exact values. We just care about whether they're even or odd. So let's begin by replacing each number with either 0 (if the number is even) or 1 (if it's odd). Now, our task is to make each window of k elements have an even number of 1s.
Second, the pattern of 0s and 1s that results after you've transformed the array has a surprising shape: it's simply a repeated copy of the first k elements of the array. For example, suppose k = 5 and we decide that the array should start off as 1 0 1 1 1. What must the sixth array element be? Well, in moving from the first window to the second, we dropped a 1 off the front of the window, changing the parity to odd. We therefore have to have the next array element be a 1, which means that the sixth array element must be a 1, equal to the first array element. The seventh array element then has to be a 0, since in moving from the second window to the third we drop off a zero. This process means that whatever we decide on for the first k elements turns out to determine the entire final sequence of values.
This means that we can reframe the problem in the following way: break the original input array of n items into n/k blocks of size k. We're now asked to pick a sequence of 0s and 1s such that
this sequence differs in as few places as possible from the n/k blocks of k items each, and
the sequence has an even number of 1s.
For example, given the input sequence
0 1 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1
and k = 3, we would form the blocks
0 1 1, 0 1 1, 1 0 0, 1 0 1, 1 1 0, 1 1 1
and then try to find a pattern of length three with an even number of 1s in it such that replacing each block with that pattern requires the fewest number of edits.
Let's see how to take that problem on. Let's work one bit at a time. For example, we can ask: what's the cost of making the first bit a 0? What's the cost of making the first bit a 1? The cost of making the first bit a 0 is equal to the number of blocks that have a 1 at the front, and the cost of making the first bit a 1 is equal to the number of blocks that have a 0 at the front. We can work out the cost of setting each bit, individually, to either zero or one. That gives us a matrix like this one:
                     | Bit #0 | Bit #1 | Bit #2 | Bit #3 |  ...   | Bit #k-1
---------------------+--------+--------+--------+--------+--------+----------
Cost of setting to 0 |        |        |        |        |        |
Cost of setting to 1 |        |        |        |        |        |
We now need to choose a value for each column with the goal of minimizing the total cost picked, subject to the constraint that we pick an even number of bits to be equal to 1. And this is a nice dynamic programming exercise. We consider subproblems of the form
What is the lowest cost you can make out of the first m columns from the table, provided your choice has parity p of items chosen from the bottom row?
We can store this in a (k + 1) × 2 table T[m][p], where, for example, T[3][even] is the lowest cost you can achieve using the first three columns with an even number of items set to 1, and T[6][odd] is the lowest cost you can achieve using the first six columns with an odd number of items set to 1. This gives the following recurrence:
T[0][even] = 0 (using zero columns costs nothing)
T[0][odd] = ∞ (you cannot have an odd number of bits set to 1 if you use no columns)
T[m+1][p] = min(T[m][p] + cost of setting this bit to 0, T[m][!p] + cost of setting this bit to 1) (either use a zero and keep the same parity, or use a 1 and flip the parity).
This can be evaluated in time O(k), and the resulting minimum cost is given by T[k][even]. You can use a standard DP table walk to reconstruct the optimal solution from this point.
Overall, here's the final algorithm:
create a table costs[k+1][2], all initially zero.

/* Populate the costs table. costs[m][0] is the cost of setting bit m
 * to 0; costs[m][1] is the cost of setting bit m to 1. We work this
 * out by breaking the input into blocks of size k, then seeing, for
 * each item within each block, what its parity is. The cost of setting
 * that bit to the other parity then increases by one.
 */
for i = 0 to n - 1:
    parity = array[i] % 2
    costs[i % k][!parity]++        // Cost of changing this entry

/* Do the DP algorithm to find the minimum cost. */
create array T[k + 1][2]
T[0][0] = 0
T[0][1] = infinity
for m from 1 to k:
    for p from 0 to 1:
        T[m][p] = min(T[m - 1][p]  + costs[m - 1][0],
                      T[m - 1][!p] + costs[m - 1][1])
return T[k][0]
Overall, we do O(n) work with our initial pass to work out the costs of setting each bit, independently, to 0 or to 1. We then do O(k) work with the DP step at the end. The overall work is therefore O(n + k), and assuming k ≤ n (otherwise the problem is trivial) the cost is O(n).
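For reference, here is a compact Python version of the pseudocode above (a sketch only: the names are mine, and it returns just the minimum number of changes without reconstructing which entries to change):

def min_changes(array, k):
    INF = float('inf')
    n = len(array)

    # costs[m][b]: how many positions congruent to m (mod k) would have to
    # change for the pattern's m-th entry to have parity b
    costs = [[0, 0] for _ in range(k)]
    for i in range(n):
        parity = array[i] % 2
        costs[i % k][1 - parity] += 1

    # T[m][p]: lowest cost for the first m pattern positions, where p is the
    # parity of the number of 1s chosen so far
    T = [[INF, INF] for _ in range(k + 1)]
    T[0][0] = 0
    for m in range(1, k + 1):
        for p in (0, 1):
            T[m][p] = min(T[m - 1][p] + costs[m - 1][0],
                          T[m - 1][1 - p] + costs[m - 1][1])
    return T[k][0]

# min_changes([1, 2, 3, 4, 5, 6, 7, 8], 3) -> 3, matching the example above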

Determine the adjacency of two fibonacci number

I have many Fibonacci numbers. If I want to determine whether two Fibonacci numbers are adjacent or not, one basic approach is as follows:
Get the index of the first fibonacci number, say i1
Get the index of the second fibonacci number, say i2
Get the absolute value of i1-i2, that is |i1-i2|
If the value is 1, then return true.
else return false.
In the first and second steps, it may take many comparisons to get the correct index by looking it up in an array.
In the third step, it needs one subtraction and one absolute-value operation.
I want to know whether there is another approach to quickly determine the adjacency of the Fibonacci numbers.
I don't care whether this question is solved mathematically or by any hacking techniques.
If anyone has an idea, please let me know. Thanks a lot!
No need to find the index of either number.
Given that the two numbers belong to the Fibonacci series, if their difference is greater than the smaller of the two then they are not adjacent. Otherwise they are.
This is because the Fibonacci series follows the rule:
F(n) = F(n-1) + F(n-2) where F(n)>F(n-1)>F(n-2).
So F(n) - F(n-1) = F(n-2),
=> Diff(n, n-1) = F(n-2) < F(n-1) = min(F(n), F(n-1))
The difference between two adjacent Fibonacci numbers will always be less than the smaller of the two.
NOTE: This only holds if the numbers belong to the Fibonacci series.
Simply calculate the difference between them. If it is smaller than the smaller of the two numbers, they are adjacent; if it is bigger, they are not.
Each triplet in the Fibonacci sequence a, b, c conforms to the rule
c = a + b
So for every pair of adjacent Fibonaccis (x, y), the difference between them (y-x) is equal to the value of the previous Fibonacci, which of course must be less than x.
If 2 Fibonaccis, say (x, z), are not adjacent, then their difference must be greater than the smaller of the two. At minimum (if they are one Fibonacci apart), the difference is equal to the Fibonacci between them, which is of course greater than the smaller of the two numbers.
For four consecutive Fibonaccis (a, b, c, d),
since c = a + b
and d = b + c,
then d - b = (b + c) - b = c
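As a quick illustration, here is a tiny Python sketch of that difference test (the function name is mine; it assumes both arguments really are Fibonacci numbers, and the degenerate pair (1, 2), where the difference equals the smaller number, would need special handling):

def are_adjacent_fibs(x, y):
    small, large = min(x, y), max(x, y)
    # adjacent iff the gap between them is the Fibonacci before the smaller one
    return large - small < small

# are_adjacent_fibs(13, 21) -> True, are_adjacent_fibs(13, 34) -> False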
By Binet's formula, the nth Fibonacci number is approximately phi**n/sqrt(5), where phi is the golden ratio. You can use base-phi logarithms to recover the index easily:
from math import log, sqrt

def fibs(n):
    nums = [1, 1]
    for i in range(n - 2):
        nums.append(sum(nums[-2:]))
    return nums

phi = (1 + sqrt(5))/2

def fibIndex(f):
    return round(log(sqrt(5)*f, phi))
To test this:
for f in fibs(20): print(fibIndex(f),f)
Output:
2 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
Of course,
def adjacentFibs(f, g):
    return abs(fibIndex(f) - fibIndex(g)) == 1
This fails with 1, 1 -- but there is little point in explicitly testing for such an edge case. Add it in if you want.
At some stage, floating-point round-off error will become an issue. For that, you would need to replace math.log by an integer log algorithm (e.g. one which involves binary search).
On Edit:
I concentrated on the question of how to recover the index (and I will keep that answer, since it is an interesting problem in its own right), but as @LeandroCaniglia points out in their excellent comment, this is overkill if all you want to do is check whether two Fibonacci numbers are adjacent, since another consequence of Binet's formula is that sufficiently large adjacent Fibonacci numbers have a ratio which differs from phi by a negligible amount. You could do something like:
def adjFibs(f, g):
    f, g = min(f, g), max(f, g)
    if g <= 34:
        return adjacentFibs(f, g)
    else:
        return abs(g/f - phi) < 0.01
This assumes that they are indeed Fibonacci numbers. The index-based approach can be used to verify that they are (calculate the index and then use the full-fledged Binet's formula with that index).

GROUPING_ID functionality

I don't quite understand how this function works. I've been looking over the documentation below, and I have some issues.
http://msdn.microsoft.com/en-us/library/bb510624.aspx
So, I understand how GROUPING() works perfectly, but I find it quite impossible to comprehend how the output of GROUPING_ID is computed, because it doesn't match the explanation.
For example, I have the following string of ones and zeroes: 010. The documentation says it's equal to 2. I also read in a SQL book that each bit (starting from the rightmost) is worth 2 raised to the power of the bit position minus one.
So, (2^2 - 1) + (2^1 - 1) + (2^0 - 1), but isn't that the same for every binary number (100/101/110/etc.)? And the result isn't 2 either....
EDIT 1 :
This is how the explanation from the book is:
Another function that you can use to identify the grouping sets is GROUPING_ID. This function accepts the list of grouped columns as inputs and returns an integer representing a bitmap. The rightmost bit represents the rightmost input. The bit is 0 when the respective element is part of the grouping set and 1 when it isn’t. Each bit represents 2 raised to the power of the bit position minus 1; so the rightmost bit represents 1, the one to the left of it 2, then 4, then 8, and so on. The result integer is the sum of the values representing elements that are not part of the grouping set because their bits are turned on. Here’s a query demonstrating the use of this function.
There has to be an error, because there is no way the number is calculated as 2^(position) - 1. Is it a mistake? I've been calculating with 2^(bit position) * (bit value) and the outputs are correct.
For example, I've done this:
GROUPING_ID(a,b,c),
GROUPING(a),
GROUPING(b),
GROUPING(c)
And let's say we have the following output
3, 0, 1, 1
So our binary string is 011, and 3 is the output of the GROUPING_ID function. If we calculate from the string:
2^0 * 1 + 2^1 * 1 + 2^2 * 0 = 1 + 2 + 0 = 3
There is no other logic that I can see here; I cannot calculate with a minus as the quote above says. The thing is that the (rather strange) definition on MSDN seems to be somewhat similar to this one:
Each GROUPING_ID argument must be an element of the GROUP BY list. GROUPING_ID () returns an integer bitmap whose lowest N bits may be lit.
A lit bit indicates the corresponding argument is not a grouping column for the given output row. The lowest-order bit corresponds to argument N, and the N-1th lowest-order bit corresponds to argument 1.
First of all, when they say
Each bit represents 2 raised to the power of the bit position minus 1
they do not mean 2^position - 1 but rather 2^(position - 1). Apparently, for the purpose of their description they chose to count bits from 1 (for the rightmost bit) rather than from 0.
Secondly, each bit represents the said value when it is set, i.e. when it is 1. So, naturally, you do not do just
2^(1-1) + 2^(2-1) + ... + 2^(N-1)
but rather
bit1 × 2^(1-1) + bit2 × 2^(2-1) + ... + bitN × 2^(N-1)
which is the normal way of converting the binary representation to the decimal one and is also the method you have shown near the end of your question.
Let's say we have the binary number 0101.
Going from right to left:
1 -> (2^0 * 1) = 1
0 -> (2^1 * 0) = 0
1 -> (2^2 * 1) = 4
0 -> (2^3 * 0) = 0
Summing all the results gives 5,
so 0101 (binary) = 5 (decimal).
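In other words, GROUPING_ID is just the GROUPING() bits read as a binary number, with the leftmost argument as the most significant bit. Here is a tiny illustrative Python snippet (not SQL, just to show the arithmetic):

def grouping_id(bits):
    # bits listed left to right, e.g. [GROUPING(a), GROUPING(b), GROUPING(c)]
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

# grouping_id([0, 1, 0]) -> 2 and grouping_id([0, 1, 1]) -> 3,
# matching the examples discussed above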

Find the missing number in a group {0......2^k -1} range

Given an array that has the numbers {0 ... 2^k - 1} except for one number, find a good algorithm that finds the missing number.
Please notice, you can only use:
for A[i], return the value of bit j.
swap A[i] with A[j].
My answer: use divide & conquer. Check bit number K of all the numbers; if bit K (we are at the LSB for now) is 0 then move the number to the left side, and if bit K is 1 then move the number to the right side.
After the 1st iteration we'd have two groups, where one of them is bigger than the other, so we continue to do the same thing with the smaller group, and I think I need to check bit K-1 this time.
But for some reason, when I tried it with 8 numbers, from 0...7, and removed 4 (say I want to find out that 4 is the missing number), the algorithm didn't work out so well. So where is my mistake?
I assume you can build an XOR-of-bits function using "return the value of bit j".
The answer will be the XOR of all the numbers.
PROOF: a xor (2^k - 1 - a) = 2^k - 1 (a and 2^k - 1 - a have different bits in the first k positions).
Then 0 xor 1 xor ... xor 2^k-1 = (0 xor 2^k-1) xor (1 xor 2^k-2) ... (2^(k-1) pairs) = 0.
If number n is missing, the result will be n, because 0 xor 1 xor 2 ... xor n-1 xor n+1 xor ... = 0 xor 1 xor 2 ... xor n-1 xor n+1 xor ... xor n xor n = 0 xor n = n.
EDIT: This will not work if k = 1.
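Here is what that looks like as a short Python sketch (the function name is mine; it assumes the array contains every value in {0 ... 2^k - 1} except one, with k > 1):

from functools import reduce
from operator import xor

def missing_number(a):
    return reduce(xor, a)    # XOR of everything present equals the absent value

# missing_number([5, 1, 3, 2, 6, 7, 0]) -> 4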
Ron,
your solution is correct. This problem smells of Quicksort, doesn't it?
What you do with the Kth bit (all 0's to the left, 1's to the right) is called a partition - you need to find the misplaced elements in pairs and swap them. It's the process used in Hoare's Selection and in Quicksort, with a special element classification - no need for a pivot element.
You forgot to say in the problem statement how many elements there are in the array (2^k-2 or more), i.e. whether repetitions are allowed.
If repetitions are not allowed, every partition will indeed be imbalanced by one element. The algorithm to use is an instance of Hoare's Selection (only partition the smaller half). At every partition stage, the number of elements to be considered is halved, hence O(N) running time. This is optimal since every element needs to be known before the solution can be found.
[If repetitions are allowed, use modified Quicksort (recursively partition both halves) until you arrive at an empty half. The running time is probably O(N Lg(N)) then, but this needs to be checked.]
You say that the algorithm failed on your test case: you probably mis-implemented some detail.
An example:
Start with
5132670 (this is range {0..7})
After partitioning on bit weight=4 you get
0132|675
where the shortest half is
675 (this is range {4..7})
After partitioning on bit weight=2, you get
5|67
where the shortest half is
5 (this is range {4..5})
After partitioning on bit weight=1, you get
|5
where the shortest half is empty (this is range {4}).
Done.
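If it helps, here is a rough Python sketch of this partition-and-recurse idea (the structure and names are mine; the bit test and swap used inside correspond to the two operations allowed in the problem statement):

def find_missing(a, k):
    # a holds every value in {0 .. 2^k - 1} except one; a is reordered in place
    lo, hi = 0, len(a)
    missing = 0
    for bit in reversed(range(k)):       # bit weights 2^(k-1), ..., 2, 1
        # partition a[lo:hi]: values with this bit 0 first, then values with it 1
        i, j = lo, hi - 1
        while i <= j:
            if (a[i] >> bit) & 1 == 0:
                i += 1
            else:
                a[i], a[j] = a[j], a[i]
                j -= 1
        zeros = i - lo
        half = (hi - lo + 1) // 2        # size each half would have if complete
        if zeros < half:                 # the zero half is short: missing bit is 0
            hi = i
        else:                            # the one half is short: missing bit is 1
            missing |= 1 << bit
            lo = i
    return missing

# find_missing([5, 1, 3, 2, 6, 7, 0], 3) -> 4, as in the worked example above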
For n, just add them all up and subtract the result from n*(n+1)/2.
n*(n+1)/2 is the sum of all numbers 1...n. If one of them is missing, then the sum of the remaining n-1 numbers will be n*(n+1)/2 - missingNumber.
So your answer is: missingNumber = n*(n+1)/2 - (sum of the array), where n is 2^k - 1.
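As a quick Python sketch of the same idea (n = 2^k - 1 is the largest value that should be present):

def missing_by_sum(a, k):
    n = (1 << k) - 1
    return n * (n + 1) // 2 - sum(a)

# missing_by_sum([5, 1, 3, 2, 6, 7, 0], 3) -> 4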
Given the fact that for a given bit position j there are exactly 2^(k-1) numbers which have it set to 0, and 2^(k-1) which have it set to 1, use the following algorithm:
start with an array B of booleans of size k
init the array to false everywhere
for each number A[i]
    for each position j
        get the value v
        if v is 1 invert the boolean at position j
    end for
end for
If a position is false at the end, then the missing number has a zero at this position; otherwise it has a one (for k > 1; if k = 1 then it is the inverse). Now, to implement your array of booleans,
create an array of size 2k, where the lower k entries are set to 0 and the upper k are set to 1. Then
invert the boolean at position j
is simply
swap B[j] with B[j+k].
With this representation, the missing number can be read from the lower k elements of the array B. This algorithm is still O(k*2^k), but you can call it O(n*log(n)) in the size n of the input.
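The same idea, written as a plain Python sketch with an ordinary parity accumulator per bit position instead of the swap-based boolean array (names are mine; remember that for k = 1 the answer is inverted, as noted above):

def missing_by_bit_parity(a, k):
    missing = 0
    for j in range(k):
        parity = 0
        for x in a:
            parity ^= (x >> j) & 1           # flip on every 1 seen at position j
        # a complete set has 2^(k-1) ones at each position (an even count for k > 1),
        # so odd parity means the missing number has a 1 at position j
        if parity:
            missing |= 1 << j
    return missing

# missing_by_bit_parity([5, 1, 3, 2, 6, 7, 0], 3) -> 4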
You can consider the elements as strings of k bits, and at each step i, if the number of ones (or zeros) at position i equals 2^(k-i), remove all of those strings and continue. For example, with
100 111 010 101 110 000 011
first
100 111 101 110 will all be removed (their first bit is 1, and there are 2^(k-1) = 4 of them),
and among 010 000 011, the strings 010 and 011 will be removed because their second bit is 1.
000 remains, and since its rightmost bit is zero (the string ending in 1 is the one missing from this last group), the missing number is 001.

Finding the maximum area in given binary data

I have a problem describing an algorithm for finding the maximum rectangular area of binary data in which 1 occurs k times more often than 0. The data is always n^2 bits.
For example, the data for n = 4 looks like:
1 0 1 0
0 0 1 1
0 1 1 1
1 1 0 1
The value of k can be 1 .. j (k = 1 means that the numbers of 0s and 1s are equal).
For the above example data and k = 1, the solution is:
1 0 1 0 <- 4 x '0' and 4 x '1'
0 0 1 1
0 1 1 1
1 1 0 1
But in this example:
1 1 1 0
0 1 0 0
0 0 0 0
0 1 1 1
Solution would be:
1 1 1 0
0 1 0 0
0 0 0 0
0 1 1 1
I tried a few brute-force algorithms, but for n > 20 they get too slow. Can you advise me how I should solve this problem?
As RBerteig proposed, the problem can also be described like this: "In a given square bitmap with cells set to 1 or 0 by some arbitrary process, find the largest rectangular area where the 1's and 0's occur in a specified ratio, k."
Brute force should do just fine here for n < 100, if properly implemented: the solution below has O(n^4) time and O(n^2) memory complexity. 10^8 operations should be well under 1 second on a modern PC (especially considering that each operation is very cheap: a few additions and subtractions).
Some observations
There are O(n^4) sub-rectangles to consider, and each of them can be a solution.
If we can find the number of 1's and 0's in each sub-rectangle in O(1) (constant time), we'll solve the problem in O(n^4) time.
If we know the number of 1's in some sub-rectangle, we can find the number of zeroes (via the area).
So, the problem is reduced to the following: create a data structure that allows finding the number of 1's in any sub-rectangle in constant time.
Now, imagine we have a sub-rectangle [i0..i1]x[j0..j1]. I.e., it occupies the rows between i0 and i1 and the columns between j0 and j1. And let count_ones be the function that counts the number of 1's in a sub-rectangle.
This is the main observation:
count_ones([i0..i1]x[j0..j1]) = count_ones([0..i1]x[0..j1]) - count_ones([0..i0 - 1]x[0..j1]) - count_ones([0..i1]x[0..j0 - 1]) + count_ones([0..i0 - 1]x[0..j0 - 1])
Same observation with practical example:
AAAABBB
AAAABBB
CCCCDDD
CCCCDDD
CCCCDDD
CCCCDDD
If we need to find number of 1's in D sub-rectangle (3x4), we can do it by taking number of 1's in the whole rectangle (A + B + C + D), subtracting number of 1's in (A + B) rectangle, subtracting number of 1's in (A + C) rectangle, and adding number of 1's in (A) rectangle. (A + B + C + D) - (A + B) - (A + C) + (A) = D
Thus, we need a table sums, where sums[i][j] contains the number of 1's in the sub-rectangle [0..i]x[0..j].
You can create this table in O(n^2), but even the direct way to fill it (for each i and j iterate all elements of [0..i][0..j] area) will be O(n^4).
Having this table,
count_ones([i0..i1]x[j0..j1]) = sums[i1][j1] - sums[i0 - 1][j1] - sums[i1][j0 - 1] + sums[i0 - 1][j0 - 1]
Therefore, the overall time complexity of O(n^4) is achieved.
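Here is a Python sketch of this approach (names are mine; grid is a list of lists of 0/1 values, and the function returns the largest area whose 1s outnumber its 0s exactly k to 1, or 0 if no such rectangle exists):

def max_area(grid, k):
    n = len(grid)
    # sums[i][j] = number of 1's in the sub-rectangle [0 .. i-1] x [0 .. j-1]
    sums = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            sums[i + 1][j + 1] = (grid[i][j] + sums[i][j + 1]
                                  + sums[i + 1][j] - sums[i][j])

    def count_ones(i0, i1, j0, j1):          # corners inclusive
        return (sums[i1 + 1][j1 + 1] - sums[i0][j1 + 1]
                - sums[i1 + 1][j0] + sums[i0][j0])

    best = 0
    for i0 in range(n):
        for i1 in range(i0, n):
            for j0 in range(n):
                for j1 in range(j0, n):
                    area = (i1 - i0 + 1) * (j1 - j0 + 1)
                    ones = count_ones(i0, i1, j0, j1)
                    if ones == k * (area - ones):    # number of 1's = k * number of 0's
                        best = max(best, area)
    return best

For the first 4 x 4 example in the question and k = 1, this returns 8 (for instance the top two rows, which contain four 1s and four 0s).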
This is still brute force, but something you should note is that you don't have to recompute everything from scratch for a new i*j rectangle. Instead, for each possible rectangle size, you can move the rectangle across the n*n grid one step at a time, decrementing the counts for the bits no longer within the rectangle and incrementing the counts for the bits that newly entered the rectangle. You could potentially combine this with varying the rectangle size, and try to find an optimal pattern for moving and resizing the rectangle.
Just some hints:
You could impose better restrictions on the values. The requirement leads to the condition
N1*(k+1) == S*k, where N1 is the number of ones in an area and S = dx*dy is its surface.
It can be rewritten in a better form:
N1/k == S/(k+1).
Because the greatest common divisor of two consecutive numbers n and n+1 is always 1, N1 has to be a multiple of k and dx*dy a multiple of k+1. This greatly reduces the possible space of solutions, and the larger k is, the better (for the dx*dy case you'll need to play with the prime divisors of k+1).
Now, because you need just the surface of the largest area with this property, it is wise to start from the largest areas and move to smaller ones. By trying values of dx*dy from n^2 down to k+1 that satisfy the divisibility and bounding conditions, you'll find the solution quite fast, much faster than O(n^4), for one particular reason: except in cases where the array was specially constructed, for a random input the probability that some area of surface S among the (n-dx+1)*(n-dy+1) candidate positions contains exactly N1 ones out of S values grows steadily as S decreases (large values of k make the probability smaller, but at the same time they make the filter on dx and dy pairs stronger).
Also, this problem: http://ioinformatics.org/locations/ioi99/contest/land/land.shtml looks somewhat similar; maybe you'll find some ideas in its solution.
