Construct an NFA/DFA/Regular Expression - theory

Hey, so I'm trying to make an NFA/DFA or regular expression for this language.
l = {Even-length Strings over the alphabet {0,1} of at least length 6 that begin and end with the same symbol.}
This is the NFA I have so far:
[image of the proposed NFA not included]

The proposed NFA isn't doing what you want. For example it accepts a string of nine 1's. It will also accept nine 1's followed by eight 0's.
For a regex, suppose for now we're only concerned with the "starting and ending with 1" case. We want always to add characters in groups of 2. Adding a second character is 1(0|1). The minimum length string has 6 characters and ends with 1. That suggests two 1's with four of any character in the middle: 1(0|1)(0|1)(0|1)(0|1)1. Now we need to allow any number of pairs of characters added in the middle so the length always remains even. This suggests 1 (0|1) (0|1) (0|1) (0|1) ((0|1)(0|1))* 1. To recap, this is all even length strings of length at least 6 that start and end in 1.
The same for first/last character 0 is just substitution: 0 (0|1) (0|1) (0|1) (0|1) ((0|1)(0|1))* 0
So an answer is these two "or"ed with alternation:
1 (0|1) (0|1) (0|1) (0|1) ((0|1)(0|1))* 1 | 0 (0|1) (0|1) (0|1) (0|1) ((0|1)(0|1))* 0
Note I've added spaces for readability in all regexes above.
The idea for an NFA is similar. If you are familiar with Thompson's algorithm, this is roughly what it produces from the regex.
Note this is nearly a DFA. There are a couple of states with non-deterministic transitions. You can apply one of the well-known NFA to DFA conversion algorithms to get it. Or you can just reason it out. A good exercise.
I've been assuming "pure" classical regexes. If you allow the normal regex library syntax, the final result can be shorter: 1[01]{2}([01]{2})+1|0[01]{2}([01]{2})+0
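As a sanity check, both regexes can be compared against a direct statement of the language. This is only a minimal sketch in Python: the names CLASSICAL, SHORT, in_language and all_strings are made up for illustration, and it brute-forces strings only up to length 10.

```python
import re
from itertools import product

# The "pure" regex from the answer, with the readability spaces removed.
CLASSICAL = re.compile(
    r"1(0|1)(0|1)(0|1)(0|1)((0|1)(0|1))*1"
    r"|0(0|1)(0|1)(0|1)(0|1)((0|1)(0|1))*0"
)
# The shorter library-syntax version.
SHORT = re.compile(r"1[01]{2}([01]{2})+1|0[01]{2}([01]{2})+0")


def in_language(s):
    """Even length, at least 6 symbols, same first and last symbol."""
    return len(s) >= 6 and len(s) % 2 == 0 and s[0] == s[-1]


def all_strings(max_len):
    """Yield every string over {0,1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


# Collect every string on which a regex disagrees with the definition.
mismatches = [
    s for s in all_strings(10)
    if (CLASSICAL.fullmatch(s) is not None) != in_language(s)
    or (SHORT.fullmatch(s) is not None) != in_language(s)
]
print(mismatches)  # an empty list means both regexes agree with the definition
```

fullmatch is used so that the whole string has to match, which is what acceptance by an automaton means.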


Daily Coding Problem 260 : Reconstruct a jumbled array - Intuition?

I'm going through the question below.
The sequence [0, 1, ..., N] has been jumbled, and the only clue you have for its order is an array representing whether each number is larger or smaller than the last. Given this information, reconstruct an array that is consistent with it.
For example, given [None, +, +, -, +], you could return [1, 2, 3, 0, 4].
I went through the solution on this post but still unable to understand it as to why this solution works. I don't think I would be able to come up with the solution if I had this in front of me during an interview. Can anyone explain the intuition behind it? Thanks in advance!
This answer tries to give a general strategy for finding an algorithm to tackle this type of problem. It does not try to prove why the given solution is correct, but lays out a route towards such a solution.
A tried and tested way to tackle this kind of problem (actually a wide range of problems) is to start with small examples and work your way up. This works for puzzles, and just as well for problems encountered in reality.
First, note that the question is formulated deliberately to not point you in the right direction too easily. It makes you think there is some magic involved. How can you reconstruct a list of N numbers given only the list of plusses and minuses?
Well, you can't. For 10 numbers, there are 10! = 3628800 possible permutations, but only 2⁹ = 512 possible lists of signs. That's a huge difference. Most original lists will come out completely different after reconstruction.
Here's an overview of how to approach the problem:
Start with very simple examples
Try to work your way up, adding a bit of complexity
If you see something that seems a dead end, try increasing complexity in another way; don't spend too much time with situations where you don't see progress
While exploring alternatives, revisit old dead ends, as you might have gained new insights
Try whether recursion could work:
given a solution for N, can we easily construct a solution for N+1?
or even better: given a solution for N, can we easily construct a solution for 2N?
Given a recursive solution, can it be converted to an iterative solution?
Does the algorithm do some repetitive work that can be postponed to the end?
....
So, let's start simple (writing 0 for the None at the start):
very short lists are easy to guess:
'0++' → 0 1 2 → clearly only one solution
'0--' → 2 1 0 → only one solution
'0-+' → 1 0 2 or 2 0 1 → hey, there is no unique outcome, though the question only asks for one of the possible outcomes
lists with only plusses:
'0++++++' → 0 1 2 3 4 5 6 → only possibility
lists with only minuses:
'0-------'→ 7 6 5 4 3 2 1 0 → only possibility
lists with one minus, the rest plusses:
'0-++++' → 1 0 2 3 4 5 or 5 0 1 2 3 4 or ...
'0+-+++' → 0 2 1 3 4 5 or 0 5 1 2 3 4 or ...
→ no very obvious pattern seems to emerge
maybe some recursion could help?
given a solution for N, appending one sign more?
appending a plus is easy: just repeat the solution and append the largest plus 1
appending a minus, after some thought: increase all the numbers by 1 and append a zero
→ hey, we have a working solution, but maybe not the most efficient one
the algorithm just appends to an existing list, no need to really write it recursively (although the idea is expressed recursively)
appending a plus can be improved, by storing the largest number in a variable so it doesn't need to be searched at every step; no further improvements seem necessary
appending a minus is more troublesome: the list needs to be traversed with each append
what if instead of appending a zero, we append -1, and do the adding at the end?
this clearly works when there is only one minus
when two minus signs are encountered, the first time append -1, the second time -2
→ hey, this works for any number of minuses encountered, just store its counter in a variable and sum with it at the end of the algorithm
This is in bird's eye view one possible route towards coming up with a solution. Many routes lead to Rome. Introducing negative numbers might seem tricky, but it is a logical conclusion after contemplating the recursive algorithm for a while.
It works because all changes are sequential, either adding one or subtracting one, starting both the increasing and the decreasing sequences from the same place. That guarantees we have a sequential list overall. For example, given the arbitrary
[None, +, -, +, +, -]
turned vertically for convenience, we can see
None 0
+ 1
- -1
+ 2
+ 3
- -2
Now just shift them up by two (to account for -2):
2 3 1 4 5 0
+ - + + -
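The route sketched above (append max+1 for a plus, min-1 for a minus, then shift everything at the end) can be written down in a few lines. This is a minimal sketch, not the referenced solution itself; the function name reconstruct and the use of the strings "+" and "-" for the signs are assumptions made for illustration.

```python
def reconstruct(signs):
    """Return a permutation of 0..N consistent with signs.

    signs[0] is None; every other entry is "+" or "-".
    """
    result = [0]      # the first element starts at 0
    low = high = 0    # smallest and largest value handed out so far
    for sign in signs[1:]:
        if sign == "+":
            high += 1
            result.append(high)   # larger than everything before it
        else:
            low -= 1
            result.append(low)    # smaller than everything before it
    # low is <= 0; subtracting it shifts the values into the range 0..N
    return [value - low for value in result]


print(reconstruct([None, "+", "+", "-", "+"]))       # [1, 2, 3, 0, 4]
print(reconstruct([None, "+", "-", "+", "+", "-"]))  # [2, 3, 1, 4, 5, 0]
```

The shift is done once at the end, so the whole thing is a single pass over the signs plus one pass over the result.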
Let's first look at a solution which (I think) is easier to understand, formalize and prove correct (I will only explain it, not prove it formally):
We name A[0..N] our input array (where A[k] is None if k = 0 and is + or - otherwise) and B[0..N] our output array (where B[k] is in the range [0, N] and all values are unique)
First, note that a solution to our problem (find B such that B[k] > B[k-1] if A[k] == + and B[k] < B[k-1] if A[k] == -) can be obtained by solving a stricter problem:
Find B such that B[k] == max(B[0..k]) if A[k] == + and B[k] == min(B[0..k]) if A[k] == -.
This strengthens the requirement from "a value must be larger or smaller than the last" to "a value must be larger or smaller than every value before it".
So a solution to this problem is a solution to the original one as well.
Now how do we approach this problem?
A greedy solution is sufficient. It is easy to show that the value associated with the last + must be the largest number overall (which is N), the one associated with the second-to-last + must be the second largest (which is N-1), etc.
Likewise, the value associated with the last - must be the smallest number overall (which is 0), the one associated with the second-to-last - must be the second smallest (which is 1), etc.
So we can fill B from right to left, keeping track of how many + we have seen (call this X) and how many - we have seen (call this Y). If the current symbol is a +, we put N-X in B and increase X by 1; if it is a -, we put 0+Y in B and increase Y by 1.
In the end we fill B[0] with the only remaining value, which is equal to both Y and N-X.
An interesting property of this solution is that the values associated with a - are exactly the values from 0 to Y-1 (where Y is now the total number of -) in reverse sorted order; the values associated with a + are exactly the values from N-X+1 to N (where X is now the total number of +) in sorted order; and B[0] is always Y, which equals N-X.
So the - will have all the values strictly smaller than B[0], reverse sorted, and the + will have all the values strictly bigger than B[0], sorted.
This property is the key to understanding why the solution proposed here works:
It takes B[0] equal to 0 and then fills B following the property. This isn't a solution yet because the values are not in the range [0, N], but a simple translation moves the range to [0, N].
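The right-to-left greedy fill described in this answer can be sketched as follows. The function name reconstruct_greedy and the "+"/"-" string encoding are assumptions for illustration; B[0] is computed here simply as whichever value has not been used.

```python
def reconstruct_greedy(signs):
    """Fill the output from right to left.

    The last "+" gets N, the one before it N-1, and so on;
    the last "-" gets 0, the one before it 1, and so on.
    """
    n = len(signs) - 1
    b = [None] * (n + 1)
    x = 0  # number of "+" seen so far (scanning from the right)
    y = 0  # number of "-" seen so far (scanning from the right)
    for k in range(n, 0, -1):
        if signs[k] == "+":
            b[k] = n - x
            x += 1
        else:
            b[k] = y
            y += 1
    # Exactly one value in 0..N is still unused; it goes to the front.
    (b[0],) = set(range(n + 1)) - set(b[1:])
    return b


print(reconstruct_greedy([None, "+", "+", "-", "+"]))  # [1, 2, 3, 0, 4]
```

On this input it returns the same permutation as the example in the question.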
The idea is to produce a permutation of [0,1...N] which follows the pattern of [+,-...]. There are many permutations that fit, not just a single one. For instance, look at the example provided:
[None, +, +, -, +], you could return [1, 2, 3, 0, 4].
But you could also have returned other solutions, just as valid: [2,3,4,0,1], [0,3,4,1,2] are also solutions. The only concern is that the first number needs at least two numbers above it for positions [1],[2], and that you leave one number for the end which is lower than the one before and after it.
So the question isn't finding the one and only pattern which is scrambled, but to produce any permutation which will work with these rules.
This algorithm answers two questions for the next member of the list: which number is higher/lower than the previous one, and which number hasn't been used yet. It takes a starting number and essentially creates two lists: an ascending list for the '+' and a descending list for the '-'. This way we guarantee that the next member is higher/lower than the previous one (because it's in fact higher/lower than all previous members, a stricter condition than the one required) and for the same reason we know this number wasn't used before.
So the intuition of the referenced algorithm is to start with a reference number and work your way through. Let's assume we start from 0. In the first place we put 0+1, which is 1. We keep 0 as our lowest and 1 as the highest.
l[0] h[1] list[1]
the next symbol is '+' so we take the highest number and raise it by one to 2, and update both the list with a new member and the highest number.
l[0] h[2] list [1,2]
The next symbol is '+' again, and so:
l[0] h[3] list [1,2,3]
The next symbol is '-' and so we have to put in our 0. Note that if the following symbol were also '-' we would have to stop, since we have no lower number left to produce.
l[0] h[3] list [1,2,3,0]
Luckily for us, we've chosen well and the last symbol is '+', so we can put in our 4 and call it a day.
l[0] h[4] list [1,2,3,0,4]
This is not necessarily the smartest solution, as it can never know whether the starting number will solve the sequence, and it always progresses by 1. That means that for some patterns [+,-...] it will not be able to find a solution. But for the pattern provided it works well with 0 as the initial starting point. If we chose the number 1 it would also work and produce [2,3,4,0,1], but for 2 and above it will fail. It will never produce the solution [0,3,4,1,2].
I hope this helps understanding the approach.
This is not an explanation for the question put forward by OP.
Just want to share a possible approach.
Given: N = 7
Index:   0 1 2 3 4 5 6 7
Pattern: X + - + - + - +   //X = None
Go from 0 to N
[1] fill all '-' starting from right going left.
Index:   0 1 2 3 4 5 6 7
Pattern: X + - + - + - +   //X = None
Answer:  . . 2 . 1 . 0 .   //. = still vacant
[2] fill all the vacant places i.e [X & +] starting from left going right.
Index:   0 1 2 3 4 5 6 7
Pattern: X + - + - + - +   //X = None
Answer:  3 4 . 5 . 6 . 7
Final:
Pattern: X + - + - + - +   //X = None
Answer:  3 4 2 5 1 6 0 7
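The two passes above can be sketched like this. The function name reconstruct_two_pass and the "+"/"-" string encoding are assumptions for illustration, not part of the original post.

```python
def reconstruct_two_pass(signs):
    """Pass 1: number the "-" positions 0, 1, 2, ... from right to left.
    Pass 2: number the remaining positions (the leading None and every "+")
    with the next unused values from left to right.
    """
    n = len(signs) - 1
    answer = [None] * (n + 1)
    next_value = 0
    for k in range(n, 0, -1):          # right to left, "-" only
        if signs[k] == "-":
            answer[k] = next_value
            next_value += 1
    for k in range(n + 1):             # left to right, vacant places
        if answer[k] is None:
            answer[k] = next_value
            next_value += 1
    return answer


pattern = [None, "+", "-", "+", "-", "+", "-", "+"]
print(reconstruct_two_pass(pattern))  # [3, 4, 2, 5, 1, 6, 0, 7]
```

Every "-" position ends up smaller than all vacant positions and smaller than any "-" to its left, which is why each comparison with the previous element comes out right.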
My answer definitely is too late for your problem but if you need a simple proof, you probably would like to read it:
min_last (or min_so_far) is a decreasing value starting from 0.
max_last (or max_so_far) is an increasing value starting from 0.
Each sign in the input, excluding the first one which is None, either increases max_so_far by one ("+") or decreases min_so_far by one ("-"). So max_so_far - min_so_far is exactly equal to N, right? The values produced so far cover the range [min_so_far, max_so_far], but you need the range [0, N]. Because min_so_far <= 0, you get the final solution by subtracting min_so_far from (or, equivalently, adding abs(min_so_far) to) each value of the current answer.

Binary search modification

I have been attempting to solve the following problem. I have a sequence of positive
integer numbers which can be very long (several millions of elements). This
sequence can contain "jumps" in the element values. A jump
means that two consecutive elements differ from each other by more than 1.
Example 01:
1 2 3 4 5 6 7 0
In the above mentioned example the jump occurs between 7 and 0.
I have been looking for an efficient algorithm (in terms of running time) for
finding the position where this jump occurs. This issue is complicated by the
fact that there can be a situation when two jumps are present and one of them
is the jump which I am looking for and the other one is a wrap-around which I
am not looking for.
Example 02:
9 1 2 3 4 6 7 8
Here the first jump between 9 and 1 is a wrap-around. The second jump between
4 and 6 is the jump which I am looking for.
My idea is to somehow modify the binary search algorithm but I am not sure whether it is possible due to the wrap-around presence. It is worth saying that at most two jumps can occur and between these jumps the elements are sorted. Does anybody have any idea? Thanks in advance for any suggestions.
You cannot find an efficient solution (efficient meaning not looking at all numbers, which is O(n)) since you cannot conclude anything about your numbers by looking at fewer than all of them. For example, if you only look at every second number (still O(n) but with a better factor) you would miss double jumps like these: 1 5 3. You can and must look at every single number and compare it to its neighbours. You could split your workload and use a multicore approach but that's about it.
Update
If you have the special case that there is only 1 jump in your list and the rest is sorted (e.g. 1 2 3 7 8 9) you can find this jump rather efficiently. You cannot use vanilla binary search since the list might not be fully sorted and you don't know what number you are searching for, but you could use a variation of exponential search, which bears some resemblance to it.
We need the following assumptions for this algorithm to work:
There is only 1 jump (I ignore the "wrap around jump" since it is not technically between any consecutive elements)
The list is otherwise sorted and it is strictly monotonically increasing
With these assumptions we are now basically searching for an interruption in our monotonicity. That means we are looking for two elements a and b that are n positions apart but do not fulfil b = a + n. This equation must hold if there is no jump between the two elements. Now you only need to find elements which do not fulfil this in a nonlinear manner, hence the exponential approach. This pseudocode could be such an algorithm:
let numbers be an array of length n fulfilling our assumptions
start = 0
stepsize = 1
while (start < n-1)
    // never step past the last index
    while (start + stepsize > n-1)
        stepsize -= 1
    stop = start + stepsize
    while (numbers[stop] != numbers[start] + stepsize)
        // the jump must be between start and stop
        if (stepsize == 1)
            // congratulations, the jump is between start and start + 1
            return start
        else
            stepsize /= 2
            stop = start + stepsize
    // no jump between start and stop: move on and widen the step
    start += stepsize
    stepsize *= 2
no jump found
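A runnable version of the same idea might look like the sketch below. It assumes, as the answer does, a single jump in a list that otherwise increases by exactly 1 per element; the function name find_jump is made up for illustration, and the step is clamped so that the probe index never leaves the list.

```python
def find_jump(numbers):
    """Return the index i such that the jump lies between numbers[i] and
    numbers[i + 1], or None if every step is exactly +1.

    Assumes at most one jump and steps of exactly 1 everywhere else.
    """
    n = len(numbers)
    start = 0
    step = 1
    while start < n - 1:
        # Never probe past the last index.
        step = min(step, n - 1 - start)
        # If the values start..start+step are consecutive, the element
        # `step` positions further on must be exactly `step` larger.
        while numbers[start + step] != numbers[start] + step:
            if step == 1:
                return start      # the jump is right after `start`
            step //= 2            # narrow the window and probe again
        start += step             # no jump in this window: move on
        step *= 2                 # and widen the next window
    return None


print(find_jump([1, 2, 3, 7, 8, 9]))        # 2
print(find_jump([1, 2, 3, 4, 5, 6, 7, 0]))  # 6
print(find_jump(list(range(10))))           # None
```

The wrap-around case from Example 02 is not handled here; this only covers the single-jump case the answer restricts itself to.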

How would you set the NgPattern of an Angular NgMessages Application for two given years as a date range [duplicate]

I'm trying to use the range pattern [01-12] in regex to match two digit mm, but this doesn't work as expected.
You seem to have misunderstood how character class definitions work in regex.
To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:
0[1-9]|1[0-2]
References
regular-expressions.info/Character Classes
Numeric Ranges (have many examples on matching strings interpreted as numeric ranges)
Explanation
A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.
The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.
Beginners often make the mistake of defining things like [this|that]. This doesn't "work". This character class defines [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, | or a. More than likely (this|that) is what is intended.
References
regular-expressions.info/Brackets for Grouping and Alternation with the vertical bar
How ranges are defined
So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].
That is, - in a character class definition doesn't define a numeric range in the pattern. Regex engines don't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 a).
Range definition instead uses ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character definition [0-9] includes all character whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.
See also
Wikipedia/ASCII
Another example: A to Z
Let's take a look at another common character class definition [a-zA-Z]
In ASCII:
A = 65, Z = 90
a = 97, z = 122
This means that:
[a-zA-Z] and [A-Za-z] are equivalent
In most flavors, [a-Z] is likely to be an illegal character range
because a (97) is "greater than" Z (90)
[A-z] is legal, but also includes these six characters:
[ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
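The claims about what a character class really contains are easy to check by brute force. The following is a small sketch using Python's re module (other regex flavors may treat an inverted range differently); the helper names chars_matched and is_valid_pattern are made up for illustration.

```python
import re


def chars_matched(char_class):
    """Return every ASCII character that the given character class matches."""
    pattern = re.compile(char_class)
    return "".join(chr(c) for c in range(128) if pattern.fullmatch(chr(c)))


def is_valid_pattern(source):
    """Report whether Python's re module accepts the pattern at all."""
    try:
        re.compile(source)
        return True
    except re.error:
        return False


print(chars_matched("[01-12]"))  # 012
print(chars_matched("[24-48]"))  # 248

# Characters matched by [A-z] that are not letters.
extra = sorted(set(chars_matched("[A-z]")) - set(chars_matched("[A-Za-z]")))
print(extra)

print(is_valid_pattern("[a-Z]"))  # False in Python: bad character range
```

The list printed for extra contains exactly the six characters with codes 91 to 96.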
Related questions
is the regex [a-Z] valid and if yes then is it the same as [a-zA-Z]
A character class in regular expressions, denoted by the [...] syntax, specifies the rules to match a single character in the input. As such, everything you write between the brackets specifies how to match a single character.
Your pattern, [01-12] is thus broken down as follows:
0 - match the single digit 0
or, 1-1, match a single digit in the range of 1 through 1
or, 2, match a single digit 2
So basically all you're matching is 0, 1 or 2.
In order to do the matching you want, matching two digits, ranging from 01-12 as numbers, you need to think about how they will look as text.
You have:
01-09 (ie. first digit is 0, second digit is 1-9)
10-12 (ie. first digit is 1, second digit is 0-2)
You will then have to write a regular expression for that, which can look like this:
  +-- a 0 followed by 1-9
  |
  |      +-- a 1 followed by 0-2
  |      |
<-+--> <-+-->
0[1-9]|1[0-2]
      ^
      |
      +-- vertical bar, this roughly means "OR" in this context
Note that trying to combine them in order to get a shorter expression will fail, by giving false positive matches for invalid input.
For instance, the pattern [0-1][0-9] would basically match the numbers 00-19, which is a bit more than what you want.
I tried finding a definite source for more information about character classes, but for now all I can give you is this Google Query for Regex Character Classes. Hopefully you'll be able to find some more information there to help you.
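To see the difference between the exact pattern and the tempting shorter one, both can be run over every two-digit string. A minimal sketch in Python; the names STRICT, LOOSE and two_digit are made up for illustration.

```python
import re

STRICT = re.compile(r"0[1-9]|1[0-2]")   # the pattern from the answer
LOOSE = re.compile(r"[0-1][0-9]")       # the shorter, too permissive one

# Every two-digit string from "00" to "99".
two_digit = [f"{n:02d}" for n in range(100)]

strict_hits = [s for s in two_digit if STRICT.fullmatch(s)]
loose_hits = [s for s in two_digit if LOOSE.fullmatch(s)]

print(strict_hits)  # "01" through "12"
print(loose_hits)   # "00" through "19"
```

fullmatch is used so that the whole string has to match; with a plain search, a string such as "013" would still contain a match.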
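This also works if single-digit months are written without a leading zero:
^([1-9]|1[0-2])$
[1-9] matches single digits between 1 and 9
1[0-2] matches double digits between 10 and 12 (writing [0-1][0-2] here would also let 00, 01 and 02 through)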
There are some good examples here
The []s in a regex denote a character class. If no ranges are specified, it implicitly ors every character within it together. Thus, [abcde] is the same as (a|b|c|d|e), except that it doesn't capture anything; it will match any one of a, b, c, d, or e. All a range indicates is a set of characters; [ac-eg] says "match any one of: a; any character between c and e; or g". Thus, your match says "match any one of: 0; any character between 1 and 1 (i.e., just 1); or 2".
Your goal is evidently to specify a number range: any number between 01 and 12 written with two digits. In this specific case, you can match it with 0[1-9]|1[0-2]: either a 0 followed by any digit between 1 and 9, or a 1 followed by any digit between 0 and 2. In general, you can transform any number range into a valid regex in a similar manner. There may be a better option than regular expressions, however, or an existing function or module which can construct the regex for you. It depends on your language.
Use this:
0?[1-9]|1[012]
07: valid
7: valid
0: no match
00: no match
13: no match
21: no match
To test a pattern as 07/2018 use this:
/^(0?[1-9]|1[012])\/([2-9][0-9]{3})$/
(Date range between 01/2000 to 12/9999 )
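The month/year pattern can be exercised in the same way. This sketch uses Python rather than JavaScript, so the /.../ delimiters are dropped and fullmatch takes the place of the ^ and $ anchors; the names MONTH_YEAR and is_valid are made up for illustration.

```python
import re

# Month 1-12 with an optional leading zero, a slash, then a year 2000-9999.
MONTH_YEAR = re.compile(r"(0?[1-9]|1[012])/([2-9][0-9]{3})")


def is_valid(text):
    """True if the whole string is a month/year in the accepted range."""
    return MONTH_YEAR.fullmatch(text) is not None


for sample in ["07/2018", "7/2018", "12/9999", "13/2018", "00/2018", "12/1999"]:
    print(sample, is_valid(sample))
```

The first three samples are accepted and the last three are rejected.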
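As polygenelubricants says, yours would look for 0|1-1|2 rather than what you wish for, because character classes (things in []) match characters rather than strings.
My solution to keep mm-yyyy (for the years 2020 to 2049) is ^0*([1-9]|1[0-2])-(20[2-4][0-9])$. Note that 0* allows any number of leading zeros; use 0[1-9] in place of 0*[1-9] if the month must be exactly two digits.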

Indexing an array with a string (C)

I have an array of unsigned integers, each corresponding to a string with 12 characters, that can contain 4 different characters, namely 'A','B','C','D'. Thus the array will contain 4^12 = 16777216 elements. The ordering of the elements in the array is arbitrary; I can choose which one corresponds to each string. So far, I have implemented this as simply as that:
unsigned int my_array[16777216];
char my_string[12];
int index = string_to_index(my_string);
my_array[index] = ...;
string_to_index() simply assigns 2 bits per character like this:
A --> 00, B --> 01, C --> 10, D --> 11
For example, ABCDABCDABCD corresponds to the index (000110110001101100011011)2 = (1776411)10
However, I know for a fact that each string that is used to access the array is the previous string shifted once to the left with a new last character. For example after I access with ABCDABCDABCD, the next access will use BCDABCDABCDA, or BCDABCDABCDB, BCDABCDABCDC, BCDABCDABCDD.
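For reference, the 2-bits-per-character encoding and the effect of the "shift left, append one character" access pattern on the index can be sketched as follows. This is written in Python for brevity rather than C, it is not the asker's actual code, and the names CODES, MASK and next_index are made up for illustration.

```python
CODES = {"A": 0, "B": 1, "C": 2, "D": 3}   # 2 bits per character
LENGTH = 12                                # characters per string
MASK = (1 << (2 * LENGTH)) - 1             # keeps the low 24 bits


def string_to_index(s):
    """Pack the string into an integer, first character in the top bits."""
    index = 0
    for ch in s:
        index = (index << 2) | CODES[ch]
    return index


def next_index(index, new_char):
    """Index of the string shifted left by one with new_char appended."""
    return ((index << 2) | CODES[new_char]) & MASK


current = string_to_index("ABCDABCDABCD")
print(current)                   # 1776411
print(next_index(current, "A"))  # same as string_to_index("BCDABCDABCDA")
```

With this encoding the four possible successors of a string always occupy four adjacent array slots, but those slots are generally far from the current index.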
So my question is:
Is there a better way to implement the string_to_index function to take under consideration this last fact, so that elements that are consecutively accessed are closer in the array? I am hoping to improve my caching performance by doing so.
edit: Maybe I was not very clear: I am looking for a completely different string to index correspondence scheme, so that the indexes of ABCDABCDABCD and BCDABCDABCDA are closer.
If the following assumptions are true for your problem, then the solution you implemented is the best one.
The rightmost character of the next string is selected randomly, with equal probability for each valid character
The start of the sequence is not always the same (it is random).
Reason:
When I first read your question I came up with the following tree (for simplicity I reduced your problem to strings of length three with only 2 possible characters, A and B). Note that the leftmost child of the root node (AAA in this case) is always the same as the root node (AAA), hence I am not building that branch further.
      AAA
     /   \
          AAB
        /     \
     ABA       ABB
    /   \     /   \
  BAA   BAB BBA   BBB
In this tree each node has its next possible sequence as child nodes. To improve on cache you need to traverse this tree using breadth-first traversal and store it in the array in the same order. For the above tree we get following string index combination.
AAA 0
AAB 1
ABA 2
ABB 3
BAA 4
BAB 5
BBA 6
BBB 7
Assuming value(A) = 0 and value(B) = 1, index can be calculated as
index = 2^0 * (value(string[2])) + 2^1 * (value(string[1])) + 2^2 * (value(string[0]))
This is same solution as you are using.
I have written a python script to check this for other combinations too (like string of length 4 characters with A B C as possible characters). Script link
So unless the 2 assumptions made at the beginning are false, your solution already takes care of cache optimisation.
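The linked script is not reproduced here. A check along the same lines could look like the sketch below, which walks the successor tree breadth-first and compares the visiting order with the plain base-k index order; the function names bfs_order and base_k_order are made up for illustration.

```python
from collections import deque
from itertools import product


def bfs_order(alphabet, length):
    """Visit all strings breadth-first, starting from the all-first-letter
    string; the children of a node are the node shifted left by one
    character with each letter of the alphabet appended."""
    start = alphabet[0] * length
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for ch in alphabet:
            child = node[1:] + ch
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def base_k_order(alphabet, length):
    """All strings in the order given by the positional (base-k) index."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


print(bfs_order("AB", 3))
print(bfs_order("AB", 3) == base_k_order("AB", 3))        # True
print(bfs_order("ABC", 4) == base_k_order("ABC", 4))      # True
```

For these small cases the breadth-first order coincides with the positional index order, which is the observation the answer relies on.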
I think we could define "closer" first.
For example, we could define a function F which takes a method of calculating the indices of strings. Then F will check every string's index and return a certain value based on the distance of neighbor strings' indices.
Then we can compare various ways of calculating the index and find a best one.
Of course we could examine shorter strings first.

Bitmask to flip bits ... without XOR?

Pretty simple, really. I want to negate an integer which is represented in 2's complement, and to do so, I need to first flip all the bits in the byte. I know this is simple with XOR--just use XOR with a bitmask 11111111. But what about without XOR? (i.e. just AND and OR). Oh, and in this crappy assembly language I'm using, NOT doesn't exist. So no dice there, either.
You can't build a NOT gate out of AND and OR gates.
As I was asked to explain, here it is nicely formatted. Let's say you have any number of AND and OR gates. Your inputs are A, 0 and 1. You have six possibilities as you can make three pairs out of three signals (pick one that's left out) and two gates. Now:
Operation   Result
A AND A     A
A AND 1     A
A AND 0     0
A OR A      A
A OR 1      1
A OR 0      A
So after you feed any of your signals into the first gate, your new set of signals is still just A, 0 and 1. Therefore any combination of these gates and signals will only ever give you A, 0 or 1. If your final output is A, then it differs from !A for both values of A. If your final output is 0, then for A = 0 it is not !A; likewise, if it is 1, then for A = 1 it is not !A.
Edit: that monotonicity comment is also correct! Let me repeat it here: if you change any of the inputs of AND / OR from 0 to 1 then the output won't decrease. Therefore if you claim to have built a NOT gate, I will change your input from 0 to 1; your output can't decrease, but it should. That's a contradiction.
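The closure argument can also be checked mechanically. In the sketch below each signal is represented by its truth table, i.e. the pair (output when A = 0, output when A = 1); the names A, ZERO, ONE, NOT_A and closure are made up for illustration.

```python
from itertools import product

# Truth tables as (value when A = 0, value when A = 1).
A = (0, 1)
ZERO = (0, 0)
ONE = (1, 1)
NOT_A = (1, 0)


def closure(signals):
    """Return every signal reachable from `signals` using AND and OR gates."""
    known = set(signals)
    while True:
        new = set()
        for f, g in product(known, repeat=2):
            new.add(tuple(x & y for x, y in zip(f, g)))  # f AND g
            new.add(tuple(x | y for x, y in zip(f, g)))  # f OR g
        if new <= known:          # nothing new appeared: done
            return known
        known |= new


reachable = closure({A, ZERO, ONE})
print(sorted(reachable))   # [(0, 0), (0, 1), (1, 1)]
print(NOT_A in reachable)  # False
```

The reachable set never grows beyond the three starting signals, so the truth table of NOT is not among them.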
Does (foo & ~bar) | (~foo & bar) do the trick?
Edit: Oh, NOT doesn't exist. Didn't see that part!
