DFA of two simple languages, and the product construction of those two languages

The language below is the intersection of two simpler languages. First, identify the simpler languages and give the state diagrams of the DFAs that recognize them. Then, use the product construction to build a DFA that recognizes the language specified below; give its state diagram before and after simplification if there are any unneeded states or states that can be combined.
Language: {w ∈ {0,1}* | w contains an odd number of 0s and the sum of its 0s and 1s is equal to 1}
This is my proposed solution: https://imgur.com/a/lh5Hwfr Should the bottom two states be connected with 0s?
Also, what would be the DFA if it was OR instead of AND?

Here's a drawing I hope will help understand how to do this:
Language A is "odd number of zeros". States are labeled Z0 and Z1 indicating even or odd number of zeros.
Language B is "exactly one one" (which is equivalent to "sum of digits equals one"). States are labeled I0, I1 and I2, indicating zero, one, or more than one ones.
Language A+B can be interpreted as A∩B (ignoring the dotted circles) or A∪B (counting the dotted circles). If building A∩B, states Z0I2 and Z1I2 can be joined together.
I hope this gives not only an answer to the exact problem in the question, but also an idea how to build similar answers for similar problems.
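To make the construction concrete, here is a minimal sketch in Python (my own code, not taken from the linked image), using the state names Z0/Z1 and I0/I1/I2 from the drawing. Both component DFAs step in lockstep, and only the accepting set differs between AND and OR:

```python
# Product construction sketch for A = "odd number of 0s" and
# B = "exactly one 1"; state names Z0/Z1 and I0/I1/I2 follow the drawing.

# Component DFAs as transition tables: (state, symbol) -> state.
dZ = {('Z0', '0'): 'Z1', ('Z0', '1'): 'Z0',
      ('Z1', '0'): 'Z0', ('Z1', '1'): 'Z1'}
dI = {('I0', '0'): 'I0', ('I0', '1'): 'I1',
      ('I1', '0'): 'I1', ('I1', '1'): 'I2',
      ('I2', '0'): 'I2', ('I2', '1'): 'I2'}

# Product DFA: states are pairs, and both components step together.
states = [(z, i) for z in ('Z0', 'Z1') for i in ('I0', 'I1', 'I2')]
delta = {((z, i), a): (dZ[(z, a)], dI[(i, a)])
         for (z, i) in states for a in '01'}
start = ('Z0', 'I0')
accept_and = {('Z1', 'I1')}                      # A ∩ B: both components accept
accept_or = {(z, i) for (z, i) in states
             if z == 'Z1' or i == 'I1'}          # A ∪ B: either component accepts

def accepts(w, accepting):
    s = start
    for a in w:
        s = delta[(s, a)]
    return s in accepting

assert accepts('01', accept_and)       # one 0 (odd) and exactly one 1
assert not accepts('001', accept_and)  # two 0s (even)
assert accepts('000', accept_or)       # odd number of 0s suffices for the union
```

Note that for A∩B the two states paired with I2 are both dead, which is exactly why Z0I2 and Z1I2 can be merged.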


Venn diagram notation for all other sets except one

I'm trying to find a Venn diagram notation that can illustrate data that is only in a single set.
If I can select data from all the other sets, without knowing how many there are, then I can find the intersection of their complement, to select data only in the targeting set.
My current solution looks like this, but it assumes the existence of sets B and C.
The eventual diagram is expected to look like this:
One way to do it would be by using a system based on regions rather than sets. In your case, it would be the region that belongs to set A but does not belong to any other set. You can find the rationale to do that here. The idea is to express the region as a binary chain where 1 means "belongs to set n" and 0 means "does not belong to set n", where n is determined by the ordering of the sets.
In your example, you might define A as the last set, and therefore as the last bit. With three sets CBA, your region would be 001. The nice thing about this is that the leading zeroes can be naturally disregarded. Your region would be 1b, no matter how many sets there are (the b is for "binary").
You might even extend the idea by translating the number to another base. For instance, say that you want to express the region of elements belonging to set B only. With the same ordering as before, it would be 010 or 10b. But you can also express it as a decimal number and say "region 2". This expression would be valid if sets A and B exist, independently of the presence of any other set.
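The encoding above is easy to mechanize. Here is a small sketch (the helper name `region_number` is my own, purely illustrative) that turns membership flags into a region number, showing that leading zero-bits from extra sets do not change the result:

```python
# Region-number sketch: membership bits ordered from highest to lowest,
# with the target set A as the last (lowest) bit, as in the text.
def region_number(member_of, ordering):
    """member_of: the set names the element belongs to.
    ordering: set names from highest bit down to lowest."""
    bits = ''.join('1' if s in member_of else '0' for s in ordering)
    return int(bits, 2)

# "Only in A" is region 001 -> 1, and leading zeroes vanish naturally:
assert region_number({'A'}, ['C', 'B', 'A']) == 1
assert region_number({'A'}, ['D', 'C', 'B', 'A']) == 1
assert region_number({'B'}, ['C', 'B', 'A']) == 2   # "region 2" in the text
```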

is there a C function for regex using a deterministic automaton?

The POSIX regex functions compile the regular expressions into non-deterministic finite automata (NFAs). One problem with that is there is no way to tell at compilation time whether those automata will use excessive stack space or take excessive cpu time. That makes them (in some sense) unsuitable for use in a real time system.
An equivalent deterministic finite automaton executes in linear time. Its disadvantage is that it may use an excessive number of states, which translates to a large amount of program memory. On the plus side, though, you know the number of states used at the time you compile the regular expression.
That means you can know, at regular expression compile time, whether it is suitable for your application. That brings me to my question: is there a regular expression library for C that compiles to a DFA? A related question whose answer might be just as useful: is there a regular expression library for C that gives useful information on memory and CPU utilization?
Ken
1. Yes. 2. It's a matter of simple algebra. 3. Here:
https://github.com/RockBrentwood/RegEx
(originally in the comp.compilers archive.)
Here is an early description on comp.compilers, from which this ultimately descended:
https://compilers.iecc.com/comparch/article/93-05-083
and a later description:
https://compilers.iecc.com/comparch/article/93-10-022
An older version of the RegEx C programs on GitHub may be found in the AI Repository at Carnegie Mellon University here:
https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/parsing/regex
I will try to retcon the 1993-2021 evolution-stream of it into the current GitHub snapshot so that you can have the whole history, rather than just the snapshot of the latest versions. (It would be nice if GitHub supported retconning and retrofitting history-streams, by the way.)
An automaton is little more than the graphic display of a finite right linear system of inequations. Every rational expression is the least fixed point solution to such a system, which can be unfolded from the expression purely by algebraic means.
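As a minimal worked instance of that unfolding (standard material, not from the post): a single unknown with right recursion only is the Arden-style special case, and the least fixed point can be read off by purely algebraic expansion:

```latex
% Arden-style special case: one unknown, right recursion only.
% The least solution of  X >= a + bX  is  X = b^* a :
\[
   X \;=\; a + bX
   \;\Longrightarrow\;
   X \;=\; a + b(a + bX) \;=\; a + ba + b^{2}a + \cdots \;=\; b^{*}a .
\]
```

A full right linear system is just the many-unknown version of this, one inequation per automaton state.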
This is a general result of Kleene algebra, so it goes well beyond just regular expressions; e.g. rational subsets of any monoid; a special case being rational subsets of product monoids, which includes rational transductions as a special case. And the algebraic method used in the C routines is mostly (but not entirely) generic to Kleene algebras.
I'm trying to adapt the calculation in {nfa,dfa}.c to handle both inputs and outputs. There are a few places where it makes specific assumptions that the Kleene algebra is the free Kleene algebra ( = regular expression algebra). That has to be modified, to allow it to be generalized to non-free Kleene algebras, like the rational transductions.
Regular expressions over an alphabet $X$ comprise the Kleene algebra $ℜX^*$ of the rational subsets of the free monoid $X^*$ generated by $X$. Correspondingly, $ℜX^*$ is the free Kleene algebra generated by $X$.
The underlying theory (the one with respect to which "free" is defined) can be 1st-order or 2nd-order.
The 1st-order theory (notwithstanding Conway's "no finite axiomatization" result, mis-stated and mis-applied as a folklore theorem) is a finitely axiomatized algebra consisting of (a) the axioms for a semiring, with an idempotent sum $x + x = x$ (usually denoted $x | x$) ... i.e. a "dioid", and (b) the corresponding partial ordering relation defined by ($x ≥ y ⇔ (∃z) x = z + y ⇔ x = x + y$); (c) the Kleene star operator $x ↦ x^*$, which (d) provides least fixed point solutions $b^* a c^* = μx (x ≥ a + bx + xc)$. (A set of axioms to embody (d) are $x^* = 1 + x x^*$ and $x ≥ a + bx + xc ⇒ x ≥ b^* a c^*$.) That axiomatization dates from the mid-1990s and is due to Kozen.
The algebra presented by the 1st order theory is not closed under congruence relations (because, in fact, all computations can be represented by a Kleene algebra taken over a suitably defined non-trivial monoid; so the word problem isn't solvable either). The 2nd order formulation - which predates the 1st order formulation - is the closure of the 1st order formulation under congruence. It has (a) the axioms of a dioid and (b) the least fixed points of all rational subsets and (c) distributivity with respect to the rational least fixed point. The last two axioms can be narrowed and combined into a single axiom for the least fixed point: $μ_{n≥0}(ab^nc) = ab^*c$.
Using the terminology in LNCS 4988 (https://link.springer.com/chapter/10.1007%2F978-3-540-78913-0_14), this comprises the category of "rational dioids" = rationally closed idempotent semirings. It has a tensor product ⊗, which was part of the suite of additional infrastructure and expanded terminology laid out in LNCS 11194 (pp. 21-36, 37-52) https://dblp.org/db/conf/RelMiCS/ramics2018.html.
The software requires and uses only the 1st order formulation.
Rational transductions over an input alphabet $X$ and output alphabet $Y$, similarly, comprise the rational subsets $ℜ(X^* × Y^*)$ of the product monoid $X^* × Y^*$; and in the rational-dioid category, the rational transduction algebra is the tensor product $ℜ(X^* × Y^*) = ℜX^* ⊗ ℜY^*$ of the respective regular expression algebras.
In turn, that algebra is effectively just the algebra of regular expressions over the disjoint union of $X$ and $Y$ modulo the commutativity rule $xy = yx$ for $x ∈ X, y ∈ Y$, so the process can be adapted and generalized to:
(a) "transducers" - where both X and Y are present,
(b) "generators", where only $Y$ is present and $X = {1}$,
(c) "recognizers", where only $X$ is present and $Y = {1}$ and even
(d) generalizations of these where multiple input and/or output channels are allowed.
Example: the Kleene algebra $ℜX_0^* ⊗ ℜX_1^* ⊗ ℜY_0^* ⊗ ℜY_1^*$ would be for transducers with two input channels (one with alphabet $X_0$ the other with alphabet $X_1$) and two output channels (with respective alphabets $Y_0$ and $Y_1$).

Make DFA of the set of all strings from {0,1} whose tenth symbol from the right end is a 1

What will be the transition diagram of the set of all strings from {0,1} whose tenth symbol from the right end is a 1 ?
I know the regular expression is (0+1)*1(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)
But I couldn't turn it into DFA.
I encountered this question in Introduction to Automata Theory, Languages, and Computations by John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman.
First, we need to get hold of the NFA here. It has 11 states in a line: the start state has a self-loop on 0,1 and a transition to the next state on 1 (not 0). The other 9 transitions, from q_i to q_{i+1}, are on both 0 and 1. The last state, q_10, is the accept state.
That was a fairly straightforward approach for the NFA. Now if we convert this NFA to a DFA, considering all 2^11 subsets of the set of states of the NFA, we get the solution.
I will add the NFA diagram when I get some time, and try to figure out an easy trick to build the DFA without considering all the possibilities.
You can look at Example 1.30 from Introduction to the Theory of Computation - M. Sipser - 3rd Edition at page 51. There they showed an NFA for strings having 1 at the third position from the end would have 4 states but that corresponding DFA would have 8 states. Not posting the screenshots for copyright issues.
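One concrete reading of the "easy trick" (this is my assumption about what is intended, not from either book): the DFA's state only needs to remember the last ten symbols read, so 2^10 = 1024 states suffice rather than all 2^11 subsets. Packing those ten symbols into a 10-bit integer gives a direct simulation:

```python
# Sketch of the 1024-state DFA: the state is the last ten symbols read,
# packed into a 10-bit integer (an assumed reading of the "easy trick").
MASK = (1 << 10) - 1

def step(state, symbol):                 # symbol is '0' or '1'
    return ((state << 1) | (symbol == '1')) & MASK

def accepts(w):
    state = 0                            # as if the input were preceded by 0s
    for c in w:
        state = step(state, c)
    return bool(state & (1 << 9))        # bit 9 holds the 10th symbol from the right

assert accepts('1000000000')
assert not accepts('0111111111')
assert not accepts('1')                  # shorter than 10: there is no 10th symbol
```

Each 10-bit state corresponds to exactly one reachable subset of the NFA's states, which is why the subset construction collapses from 2^11 candidates to 1024 actual states.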
The regular expression for the given language is:
(0+1)*1(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)(0+1)
Hope you got it!

Sorting and making "genes" in output bitstrings from a genetic algorithm

I was wondering if anybody had suggestions as to how I could analyze an output bitstring that is being permuted by a genetic algorithm. In particular, it would be nice if I could identify patterns of bits (I'm calling them genes here) that seem to yield a desirable CV score. The difficulty comes in trying to examine these datasets because there are so many of them (I already have something like 30 million bitstrings that are 140 bits long, and I'll probably hit over 100 million pretty quickly). So even after I sort out the desirable data there are still a lot of potential datasets, and doing similarity comparisons by eye is out of the question. My questions are:
How should I compare for similarity between these bitstrings?
How can I identify "genes" in these bitstrings in an algorithmic (aka programmable) way?
As you want to extract common gene patterns, what about looking at the positions where the two strings agree? So if you have
set1 = 11011101110011...
set2 = 11001100000110...
# apply bitwise equality (XNOR): 1 wherever the bits match
set1 XNOR set2 = 11101110001010...
The result now shows which genes are the same, and could be used in further analysis.
For the similarity part you need to do an exclusive-or (XOR). The result of this bit-wise operation will give you the difference between two bit strings, and is probably the most efficient and easy way of doing it (for pair comparison). As an example:
>>> from bitarray import bitarray
>>> a = bitarray('0001100111')
>>> b = bitarray('0100110110')
>>> a ^ b
bitarray('0101010001')
Then you can either count the differences, inspect quickly where the differences lie, etc.
For the second part, it depends on the representation of course, and on the programming language (PL) chosen for the implementation. Most PL libraries will have a search function, that retrieves all or at least the first of the indexes where some pattern is found in a string (or bitstring, or bitstream...). You just have to refer to the documentation of your chosen PL to know more about the performance if you have more than one option for the task.
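For example, with the plain-string representation, a stdlib-only search that returns every (possibly overlapping) occurrence of a candidate gene pattern looks like this (the pattern below is purely illustrative):

```python
# Find every index where a candidate "gene" pattern occurs in a bitstring,
# allowing overlapping matches (str.find with a moving start index).
def find_all(bitstring, pattern):
    hits, i = [], bitstring.find(pattern)
    while i != -1:
        hits.append(i)
        i = bitstring.find(pattern, i + 1)
    return hits

assert find_all('110111011101', '1101') == [0, 4, 8]
```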

NFA to DFA conversion = deterministic?

I am struggling a bit with the meaning of determinism and nondeterminism. I get the difference when it comes to automata, but I can't seem to find an answer for the following: is an NFA-to-DFA transformation deterministic?
If multiple DFAs can be constructed for the same regular language, does that mean that the result of an NFA-to-DFA transformation is not unique, and thus that the algorithm is nondeterministic?
I'm happy with any information you guys might be able to provide.
Thanks in advance!
There are two different concepts at play here. First, you are correct that there can be many different DFAs equivalent to the same NFA, just as there can be many NFAs that are all equivalent to one another.
Independently, there are several algorithms for converting an NFA into a DFA. The standard algorithm taught in most introductory classes on formal languages is the subset construction (also called the powerset construction). That algorithm is deterministic - there's a specific sequence of steps to follow to convert an NFA to a DFA, and accordingly you'll always get back the same DFA whenever you feed in the same NFA. You could conceivably have a nondeterministic algorithm for converting an NFA to a DFA, where the algorithm might produce one of many different DFAs as output, but to the best of my knowledge there aren't any famous algorithms of this sort.
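The determinism of the subset construction is easy to see in code: every step is a fixed set computation, so feeding in the same NFA always yields the same DFA. A minimal sketch (no ε-transitions, for brevity; the tiny example NFA is my own):

```python
# Deterministic subset construction (no epsilon transitions, for brevity).
def subset_construction(nfa_delta, start, accepts, alphabet):
    """nfa_delta: dict (state, symbol) -> set of states."""
    start_d = frozenset([start])
    dfa_delta, todo, seen = {}, [start_d], {start_d}
    while todo:
        S = todo.pop()
        for a in alphabet:
            # Union of all NFA moves from the states in S on symbol a.
            T = frozenset(q for s in S for q in nfa_delta.get((s, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accepts = {S for S in seen if S & accepts}
    return start_d, dfa_delta, dfa_accepts

# Tiny NFA for "strings over {0,1} ending in 1".
delta = {('q0', '0'): {'q0'}, ('q0', '1'): {'q0', 'q1'}}
s, d, f = subset_construction(delta, 'q0', {'q1'}, '01')
# Running it again gives an identical result: the algorithm is deterministic.
assert (s, d, f) == subset_construction(delta, 'q0', {'q1'}, '01')
```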
Hope this helps!
DFA means deterministic finite automaton, whereas NFA means nondeterministic finite automaton.
In a DFA, every state has a transition for every input symbol. If {a, b} is the input alphabet, then every state has a transition for both a and b; such an automaton is a deterministic finite automaton.
In an NFA, a state need not have a transition for every input symbol: it may have zero, one, or several transitions on the same symbol, and epsilon transitions are also allowed. Dead states can be left implicit.
An NFA usually needs fewer states than the corresponding DFA. Every DFA is also an NFA, but an NFA is not in general a DFA; nevertheless, every NFA can be converted to an equivalent DFA.
