Regular Languages and Concatenation - concatenation

Regular languages are closed under concatenation - this is demonstrable by having the accepting state(s) of one language with an epsilon transition to the start state of the next language.
If we consider the language L = {a^n | n >=0}, this language is regular (it is simply a*). If we concatenate it with another language L = {b^n | n >=0}, which is also regular, we end up with a^nb^n, but we obviously know this isn't regular.
Where am I going wrong with my logic here?

The definition of the concatenation of two languages L1 and L2 is the set of all strings wx where w ∈ L1 and x ∈ L2. This means that L1L2 consists of all possible strings formed by pairing one string from L1 and one string from L2, which isn't necessarily the same as pairing up matching strings from each language.
As a result, as #Oli Charlesworth pointed out, the language you get back here isn't actually { anbn | n in N }. Instead, it's the language { anbm | n in N and m in N }, which is the language a*b*. This language is regular, since it's given by the regular languages.
Hope this helps!

Related

Regex with no 2 consecutive a's and b's

I have been trying out some regular expressions lately. Now, I have 3 symbols a, b and c.
I first looked at a case where I don't want 2 consecutive a's. The regex would be something like:
((b|c + a(b|c))*(a + epsilon)
Now I'm wondering if there's a way to generalize this problem to say something like:
A regular expression with no two consecutive a's and no two consecutive b's. I tried stuff like:
(a(b|c) + b(a|c) + c)* (a + b + epsilon)
But this accepts inputs such as"abba" or "baab" which will have 2 consecutive a's (or b's) which is not what I want. Can anyone suggest me a way out?
If you can't do a negative match then perhaps you can use negative lookahead to exclude strings matching aa and bb? Something like the following (see Regex 101 for more information):
(?!.*(aa|bb).*)^.*$
I (think I) solved this by hand-drawing a finite state machine, then, generating a regex using FSM2Regex. The state machine is written below (with the syntax from the site):
#states
s0
s1
s2
s3
#initial
s0
#accepting
s1
s2
s3
#alphabet
a
b
c
#transitions
s0:a>s1
s0:b>s2
s0:c>s3
s1:b>s2
s1:c>s3
s2:a>s1
s2:c>s3
s3:c>s3
s3:a>s1
s3:b>s2
If you look at the transitions, you'll notice it's fairly straightforward- I have states that correspond to a "sink" for each letter of the alphabet, and I only allow transitions out of that state for other letters (not the "sink" letter). For example, s1 is the "sink" for a. From all other states, you can get to s1 with an a. Once you're in s1, though, you can only get out of it with a b or a c, which have their own "sinks" s2 and s3 respectively. Because we can repeat c, s3 has a transition to itself on the character c. Paste the block text into the site, and it'll draw all this out for you, and generate the regex.
The regex it generated for me is:
c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+(a+cc*a+(b+cc*b)(cc*b)*(a+cc*a))(cc*a+(b+cc*b)(cc*b)*(a+cc*a))*(c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+b+$)+b+a
Which, I'm pretty sure, is not optimal :)
EDIT: The generated regex uses + as the choice operator (usually known to us coders as |), which means it's probably not suitable to pasting into code. However, I'm too scared to change it and risk ruining my regex :)
You can use back references to match the prev char
string input = "acbbaacbba";
string pattern = #"([ab])\1";
var matchList = Regex.Matches(input, pattern);
This pattern will match: bb, aa and bb. If you don't have any match in your input pattern, it means that it does not contain a repeated a or b.
Explanation:
([ab]): define a group, you can extend your symbols here
\1: back referencing the group, so for example, when 'a' is matched, \1 would be 'a'
check this page: http://www.regular-expressions.info/backref.html

Union and intersection of two languages

Given the language
L1 = {(i|l)p(f|g)n(f|h)m(f|i)r(l|m)p : n + m > r > 0, p >= 0}
and
L2 = (f|g)*(h|i)+
make an automaton for L1 ∪ L2 and (another one for) L1 ∩ L2.
I know that the L1 is a CFL and you need a PDA to parse it and I know that L2 is a RL and a DFA is to be used.
My question is: how do you make the intersection (and the union)? That is, what is the actual language L3 = L1 ∩ L2 on which you make the automaton and how do you compute it?
Union is easy - if you have PDAs for two languages, a nondeterministic PDA that accepts the union can be had by introducing a new initial state with epsilon transitions to each of the initial states of the automata for your languages. The accepted language will be exactly whatever is accepted by either (or both) automata.
Intersection is a little more tricky. Now, you say you want an automaton - that could refer to any theoretical machine, including Turing machines or, equivalently, two-stack PDAs. If you are OK with creating a two-stack PDA then your task is straightforward. Use the Cartesian product machine construction used to construct a DFA for A and B given DFAs for A and B, and have each transition of the form ((a, b), w) -> (a', b') push a symbol (x, y) onto the stack iff (a, w)->a' pushes x onto A's stack and (b, w)->b' pushes y onto B's stack.
Of course, if you start with a DFA for the RL and perform the above construction you only need one stack and the resulting automaton is a PDA, not a two-stack PDA.

Concatenation of a infinite language and a finite language

Why is it that the concatenation of a infinite language and a finite language always finite iff the language is not the empty set? I thought concatenating an infinite language with the empty set would just be the infinite language.
This statement is false. Try concatenating Σ* and Σ. This gives back Σ+, which is infinite.
I think that you meant
The concatenation of an infinite language I and a finite language F is infinite iff F ≠ ∅
This statement is true. If F = ∅ then IF is the empty set because the concatenation of any language and the empty language is the empty language. Specifically, IF = { wx | w in I and x in F }, so if F is empty, there are no x's that satisfy the condition x in F.
On the other hand, if x ≠ ∅ we can prove that IF is infinite. Consider any string w ∈ I whose length is longer than any string in F. Then the set wF = { wx | x in F } is infinite because there's at least one string in wF in-between any two multiples of |w|. Since wF ⊆ IF, this means that IF is infinite.
Hope this helps!

Context Free pumping lemma

Is the following language context free?
L = {a^i b^k c^r d^s | i+s = k+r, i,k,r,s >= 0}
I've tried to come up with a context free grammar to generate this but I can not, so I'm assuming its not context free. As for my proof through contradiction:
Assume that L is context free,
Let p be the constant given by the pumping lemma,
Choose string S = a^p b^p c^p d^p where S = uvwxy
As |vwx| <= p, then at most vwx can contain two distinct symbols:
case a) vwx contains only a single type of symbol, therefore uv^2wx^2y will result in i+s != k+r
case b) vwx contains two types of symbols:
i) vwx is composed of b's and c's, therefore uv^2wx^2y will result in i+s != k+r
Now my problem is that if vwx is composed of either a's and b's, or c's and d's, then pumping them won't necessary break the language as i and k or s and r could increase in unison resulting in i+s == k+r.
Am I doing something wrong or is this a context free language?
I can't come up with a CFG to generate that particular language at the top of my head either, but we know that a language is context free iff some pushdown automata recognizes it.
Designing such a PDA won't be too difficult. Some ideas to get you started:
we know i+s=k+r. Equivalently, i-k-r+s = 0 (I wrote it in that order since that is the order in they appear). The crux of the problem is deciding what to do with the stack if (k+r)>i.
If you aren't familiar with PDA's or cannot use them to answer the problem, at least you know now that it is Context Free.
Good luck!
Here is a grammar that accepts this language:
A -> aAd
A -> B
A -> C
B -> aBc
B -> D
C -> bCd
C -> D
D -> bDc
D -> ε

Subset Sum TI Basic Programming

I'm trying to program my TI-83 to do a subset sum search. So, given a list of length N, I want to find all lists of given length L, that sum to a given value V.
This is a little bit different than the regular subset sum problem because I am only searching for subsets of given lengths, not all lengths, and recursion is not necessarily the first choice because I can't call the program I'm working in.
I am able to easily accomplish the task with nested loops, but that is becoming cumbersome for values of L greater than 5. I'm trying for dynamic solutions, but am not getting anywhere.
Really, at this point, I am just trying to get the list references correct, so that's what I'm looking at. Let's go with an example:
L1={p,q,r,s,t,u}
so
N=6
let's look for all subsets of length 3 to keep it relatively short, so L = 3 (6c3 = 20 total outputs).
Ideally the list references that would be searched are:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,3,4}
{1,3,5}
{1,3,6}
{1,4,5}
{1,4,6}
{1,5,6}
{2,3,4}
{2,3,5}
{2,3,6}
{2,4,5}
{2,4,6}
{2,5,6}
{3,4,5}
{3,4,6}
{3,5,6}
{4,5,6}
Obviously accomplished by:
FOR A,1,N-2
FOR B,A+1,N-1
FOR C,B+1,N
display {A,B,C}
END
END
END
I initially sort the data of N descending which allows me to search for criteria that shorten the search, and using FOR loops screws it up a little at different places when I increment the values of A, B and C within the loops.
I am also looking for better dynamic solutions. I've done some research on the web, but I can't seem to adapt what is out there to my particular situation.
Any help would be appreciated. I am trying to keep it brief enough as to not write a novel but explain what I am trying to get at. I can provide more details as needed.
For optimisation, you simply want to skip those sub-trees of the search where you already now they'll exceed the value V. Recursion is the way to go but, since you've already ruled that out, you're going to be best off setting an upper limit on the allowed depths.
I'd go for something like this (for a depth of 3):
N is the total number of array elements.
L is the desired length (3).
V is the desired sum
Y[] is the array
Z is the total
Z = 0
IF Z <= V
FOR A,1,N-L
Z = Z + Y[A]
IF Z <= V
FOR B,A+1,N-L+1
Z = Z + Y[B]
IF Z <= V
FOR C,B+1,N-L+2
Z = Z + Y[C]
IF Z = V
DISPLAY {A,B,C}
END
Z = Z - Y[C]
END
END
Z = Z - Y[B]
END
END
Z = Z - Y[A]
END
END
Now that's pretty convoluted but it basically check at every stage whether you've already exceed the desired value and refuses to check lower sub-trees as an efficiency measure. It also keeps a running total for the current level so that it doesn't have to do a large number of additions when checking at lower levels. That's the adding and subtracting of the array values against Z.
It's going to get even more complicated when you modify it to handle more depth (by using variables from D to K for 11 levels (more if you're willing to move N and L down to W and X or if TI BASIC allows more than one character in a variable name).
The only other non-recursive way I can think of doing that is to use an array of value groups to emulate recursion with iteration, and that will look only slightly less hairy (although the code should be less nested).

Resources