How do I design the transition functions for this pushdown automaton?

I'm studying for a test on PDAs, and I want to know how to design a pushdown automaton that recognizes the following language:
L = {a^max(0,n-m) b^n a^m | n, m >= 0}
How can I design a transition function that recognizes whether n - m is greater than 0?
And please, if you have some course materials with solved exercises of this level, post a link.

You can decide where to go from the current state based on the symbol on top of the stack; you use symbols on the stack to keep notes about the state of the parsing.
Here is how I think this would work:
these are symbols read from the input, THESE are symbols on the stack.
The symbol X on the bottom of the stack means that n <= m;
do not confuse X with Z, which is the initial symbol of the stack and helps determine when the stack is empty.
There are probably some problems with this solution, but the overall approach should be correct.
... and good luck with your test :-)
First you read all the a symbols at the beginning of the string and push an A onto the stack for each of them, or push X if there was no a.
Then you read all the b symbols:
if the stack is empty (Z is on top), or B is on top, or X is on top, then you push another B onto the stack;
if the stack has A on top, then you remove it.
The last step is to read the final a symbols:
if there is a B on the stack, remove it;
if there is an X on the stack, keep it there;
if the stack is empty (Z on top), then this must be the end of the string.
another edit:
sorry if the above isn't clear ... I'll try to formalize it.
The accepting states are (4) and (5), the starting state is (1), and the automaton is nondeterministic.
and the transition rules:
state (1) : read the first batch of a symbols
(1) a; Z / AZ -> (1)
(1) a; A / AA -> (1)
(1) epsilon; A / A -> (2)
(1) epsilon; Z / Z -> (2)
(1) epsilon; Z / XZ -> (2) (this pushes the X when no a was read, as described above)
state (2) : read b symbols
(2) b; Z / BZ -> (2)
(2) b; X / BX -> (2)
(2) b; B / BB -> (2)
(2) b; A / epsilon -> (2)
(2) epsilon; B / B -> (3)
(2) epsilon; X / X -> (3)
(2) epsilon; Z / Z -> (3)
state (3) : read the last as
(3) a; B / epsilon -> (3)
(3) epsilon; X / X -> (4)
(3) epsilon; Z / Z -> (5)
state (4) : the trailing a's when m >= n
(4) a; X / X -> (4)
state (5) is for accepting the exact string when m <= n
(And just to be absolutely clear -- when there is no move out of a state and the reading cursor is not at the end of the word, then the word is not accepted.)
This could maybe be made a bit simpler by using additional states instead of the stack symbol X, but I guess you can do that yourself :-)
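To double-check tables like this, it helps to just simulate the machine. Here is a rough sketch in Python (illustrative only, not course material): it encodes the rules above, including the epsilon rule that pushes X, explores configurations breadth-first, and accepts by final state once the whole input is consumed.

from collections import deque

# (state, input symbol or '' for an epsilon move, stack top) ->
#   list of (next state, string replacing the top; leftmost char = new top)
TRANS = {
    (1, 'a', 'Z'): [(1, 'AZ')],
    (1, 'a', 'A'): [(1, 'AA')],
    (1, '',  'A'): [(2, 'A')],
    (1, '',  'Z'): [(2, 'Z'), (2, 'XZ')],  # second option: push X when no a was read
    (2, 'b', 'Z'): [(2, 'BZ')],
    (2, 'b', 'X'): [(2, 'BX')],
    (2, 'b', 'B'): [(2, 'BB')],
    (2, 'b', 'A'): [(2, '')],
    (2, '',  'B'): [(3, 'B')],
    (2, '',  'X'): [(3, 'X')],
    (2, '',  'Z'): [(3, 'Z')],
    (3, 'a', 'B'): [(3, '')],
    (3, '',  'X'): [(4, 'X')],
    (3, '',  'Z'): [(5, 'Z')],
    (4, 'a', 'X'): [(4, 'X')],
}
ACCEPTING = {4, 5}

def accepts(word):
    # breadth-first search over configurations (state, input position, stack)
    start = (1, 0, 'Z')
    seen, queue = {start}, deque([start])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(word) and state in ACCEPTING:
            return True
        moves = [('', pos)]                     # epsilon moves
        if pos < len(word):
            moves.append((word[pos], pos + 1))  # consume one input symbol
        for sym, new_pos in moves:
            for new_state, push in TRANS.get((state, sym, stack[0]), []):
                cfg = (new_state, new_pos, push + stack[1:])
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False

# strings of the form a^max(0,n-m) b^n a^m ...
assert all(accepts(w) for w in ['', 'a', 'ab', 'abba', 'baaa', 'bbaa'])
# ... and some strings that are not in the language
assert not any(accepts(w) for w in ['aab', 'bba', 'aabba', 'abab'])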

The easiest way is probably to write a grammar for the language and then build a PDA for that.
The easiest way to write a grammar is to first split the language based on that 'max', so it's easier to see what is going on:
L = L1 ∪ L2
L1 = { b^n a^m | m >= n >= 0 }
L2 = { a^(n-m) b^n a^m | n >= m >= 0 }
now rewrite L1 and L2 to make them a bit simpler ( j = m-n, k = n-m )
L1 = { b^n a^n a^j | j,n >= 0 }
L2 = { a^k b^k b^m a^m | k,m >= 0 }
these turn into very simple grammars
L1 := BA A
L2 := AB BA
AB := a AB b | \epsilon
BA := b BA a | \epsilon
A := a A | \epsilon
L := L1 | L2
Now build a PDA from them -- it's easiest to use an automated tool, but it can be done manually if you need to show all the work.
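If you want to sanity-check the split before building the PDA, a brute-force comparison against the original definition is quick to write. A rough Python sketch (in_L1, in_L2 and in_L_direct are just illustrative names):

import re
from itertools import product

def in_L1(w):
    # L1 = { b^n a^m : m >= n >= 0 }
    g = re.fullmatch(r'(b*)(a*)', w)
    return g is not None and len(g.group(2)) >= len(g.group(1))

def in_L2(w):
    # L2 = { a^k b^(k+m) a^m : k, m >= 0 }
    g = re.fullmatch(r'(a*)(b*)(a*)', w)
    if g is None:
        return False
    k, n, m = map(len, g.groups())
    if n == 0:
        # the greedy match puts a b-free string entirely into the first
        # group; the only b-free member of L2 is the empty string
        return k == 0 and m == 0
    return k == n - m and n >= m

def in_L_direct(w):
    # the original definition: w = a^max(0,n-m) b^n a^m
    return any(w == 'a' * max(0, n - m) + 'b' * n + 'a' * m
               for n in range(len(w) + 1) for m in range(len(w) + 1))

# the union of L1 and L2 agrees with L on all short strings
for length in range(9):
    for letters in product('ab', repeat=length):
        w = ''.join(letters)
        assert (in_L1(w) or in_L2(w)) == in_L_direct(w), w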

I think that language L is a context-sensitive or type 1 language in the Chomsky hierarchy. I could be wrong.
Language L1 = { a^n b^n c^n : n >= 1 } is an example of a type 1 language, and this looks pretty similar. It sounds like a trick question to me! I don't think there is a type 2 or context-free grammar or PDA that could recognize or generate L.


Questions about an XOR article

https://florian.github.io/xor-trick/
There's a part in the article that reads (the first step operates on (x,y))
x ^= y # => (x ^ y, y)
y ^= x # => (x ^ y, y ^ x ^ y) = (x ^ y, x)
x ^= y # => (x ^ y ^ x, x) = (y, x)
I would have thought the second line would be
y ^= x # =>(x ^ y ^ x, y ^ x) = (y, y ^ x)
then third line
x ^= y # => (y, y ^ x ^ y) = (y, x)
If that part of the article is correct and I'm in error about how it should work, any tips about what I'm missing?
Later, the article has
If we analyze the individual bits in u ^ v, then every 0 means that
the bit had the same value in both u and v. Every 1 means that the
bits differed.
Using this, we find the first 1 in u ^ v, i.e. the first position i
where u and v have to differ. Then we partition A as well as the
numbers from 1 to n according to that bit. We end up with two
partitions, each of which contains two sets:
Partition 0
The set of all values from 1 to n where the i-th bit is 0
The set of all values from A where the i-th bit is 0
Partition 1
The set of all values from 1 to n where the i-th bit is 1
The set of all values from A where the i-th bit is 1
Since u and v differ in position i, we know that they have to be in
different partitions.
Let me see if I get this. Partition 0 contains one of u or v, we don't know which. Partition 1 contains v or u, whichever wasn't in Partition 0. We operate each with ^u^v to get u and v respectively. So yes, the array has to be partitioned into two parts, and I think I understand the basis for the partition and why it has to be done (to later isolate u and v after operating ^u^v).
But why is it noted that each partition contains two sets? I'm assuming the i-th bit is the first "1" bit in u ^ v. Wouldn't it be enough to partition the array into two parts, one partition with the i-th bit being 0 and one with the i-th bit being 1? 🤔 What's the significance of the set of all values from 1 to n where the i-th bit is 0, or 1, respectively?
Or is it not significant that the partitions each have two sets, and the sets are just a leftover byproduct of how the partitions were determined?
Thanks for any answers.
You ask two related questions; I'll answer both.
First, in this XOR-int-swap code the commented x and y do not stand for the current variable values; rather, they stand for the original values of those variables:
x ^= y # => (x ^ y, y)
y ^= x # => (x ^ y, y ^ x ^ y) = (x ^ y, x)
x ^= y # => (x ^ y ^ x, x) = (y, x)
The first operation x ^= y sets x to original_x ^ original_y.
The second operation y ^= x sets y to original_y ^ original_x ^ original_y, which is just original_x.
The third operation x ^= y sets x to original_x ^ original_y ^ original_x, which becomes original_y.
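A short Python trace makes this concrete (x0 and y0 stand for the original values; an illustrative snippet, not from the article):

x0, y0 = 5, 9
x, y = x0, y0
x ^= y                     # x == x0 ^ y0
y ^= x                     # y == y0 ^ (x0 ^ y0) == x0
x ^= y                     # x == (x0 ^ y0) ^ x0 == y0
assert (x, y) == (y0, x0)  # swapped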
The second part is about the question "How can we use XOR to find two missing values in a range/sequence of numbers?"
Your understanding is correct that by using XOR on both the sequence (without missing numbers) and the array (with two missing numbers), our final result is just the XOR of the two missing numbers, since doing XOR with the same number twice cancels itself out.
Since the two missing numbers have to be different, they will differ at some bit index i, at which their XOR will have a 1.
Once we have this index we can just run the algorithm that checks for a single missing number, simply by partitioning the numbers on whether they have the i-th bit set or not. And because the two numbers differ on the i-th bit, each partition will have exactly one of the missing numbers.
why is it noted that each partition contains two sets
The word 'set' might be the confusing part here; what is meant is "a set of numbers that was XORed with". The general idea is that:
we create a variable a = 0
we XOR a with all numbers in the sequence that have the i-th bit set (this is the first 'set')
we XOR a with all numbers in our array that have the i-th bit set (this is the second 'set')
Again the XOR operations cancel each other out; only for the one missing number was the XOR executed just once, so a is 0 ^ missing_number, which is missing_number.
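Putting it all together, a small Python sketch of the two-missing-values algorithm (find_two_missing is just an illustrative name; the article's own code may differ):

def find_two_missing(a, n):
    # XOR of 1..n and of the array: all present values cancel in pairs,
    # leaving u ^ v for the two missing values u and v
    uv = 0
    for x in range(1, n + 1):
        uv ^= x
    for x in a:
        uv ^= x
    # lowest set bit of u ^ v: a position i where u and v differ
    bit = uv & -uv
    # XOR together the two 'sets' whose i-th bit is set: the values from
    # 1..n and the values from the array; everything cancels except the
    # one missing value that falls into this partition
    u = 0
    for x in range(1, n + 1):
        if x & bit:
            u ^= x
    for x in a:
        if x & bit:
            u ^= x
    return sorted((u, u ^ uv))  # the other missing value is u ^ (u ^ v)

assert find_two_missing([1, 4, 3], 5) == [2, 5]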
Find a detailed explanation of this method in this answer here.

Understanding "well founded" proofs in Coq

I'm writing a fixpoint that requires an integer to be incremented "towards" zero at every iteration. This is too complicated for Coq to recognize as a decreasing argument automatically, and I'm trying to prove that my fixpoint will terminate.
I have been copying (what I believe is) an example of a well-foundedness proof for a step function on Z from the standard library. (Here)
Require Import ZArith.Zwf.
Section wf_proof_wf_inc.
Variable c : Z.
Let Z_increment (z:Z) := (z + ((Z.sgn c) * (-1)))%Z.
Lemma Zwf_wf_inc : well_founded (Zwf c).
Proof.
unfold well_founded.
intros a.
Qed.
End wf_proof_wf_inc.
which creates the following context:
c : Z
Z_increment := fun z : Z => (z + Z.sgn c * -1)%Z : Z -> Z
a : Z
============================
Acc (Zwf c) a
My question is what does this goal actually mean?
I thought that the goal I'd have to prove for this would at least involve the step function that I want to show has the "well founded" property, "Z_increment".
The most useful explanation I have looked at is this but I've never worked with the list type that it uses and it doesn't explain what is meant by terms like "accessible".
Basically, you don't need to do a well-founded proof; you just need to prove that your function decreases the (natural number) abs(z). More concretely, you can implement abs (z:Z) : nat := z_to_nat (z * Z.sgn z) (with some appropriate conversion to nat) and then use this as a measure with Function, something like Function foo z {measure abs z} := ....
The well-founded business is for showing relations are well-founded: the idea is that you can prove your function terminates by showing it "decreases" some well-founded relation R (think of it as <); that is, the definition of f x makes recursive subcalls f y only when R y x. For this to work, R has to be well-founded, which intuitively means it has no infinitely descending chains. CPDT's general recursion chapter has a really good explanation of how this really works.
How does this relate to what you're doing? The standard library proves that, for all lower bounds c, x < y is a well-founded relation on Z if additionally it's only applied to y >= c. I don't think this applies to you - instead you move towards zero, so you can just decrease abs z with the usual < relation on nats. The standard library already has a proof that this relation is well-founded, and that's what Function ... {measure ...} uses.

Big integer addition code

I am trying to implement big integer addition in CUDA using the following code:
__global__ void add(unsigned *A, unsigned *B, unsigned *C /*output*/, int radix) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    A[id] = A[id] + B[id];
    C[id] = A[id] / radix;
    __syncthreads();
    A[id] = A[id] % radix + ((id > 0) ? C[id - 1] : 0);
    __syncthreads();
    C[id] = A[id];
}
but it does not work properly, and I also don't know how to handle the extra carry bit. Thanks
TL;DR: build a carry-lookahead adder where each individual adder sums modulo radix instead of modulo 2.
Additions need incoming carries
The problem in your model is that you have a rippling carry. See ripple-carry adders.
If you were in an FPGA that wouldn't be a problem because they have dedicated logic to do that fast (carry chains, they're cool). But alas, you're on a GPU !
That is, for a given id, you only know the input carry (thus whether you are going to sum A[id]+B[id] or A[id]+B[id]+1) when all the sums with smaller id values have been computed. As a matter of fact, initially, you only know the first carry.
A[3]+B[3] + ? A[2]+B[2] + ? A[1]+B[1] + ? A[0]+B[0] + 0
| | | |
v v v v
C[3] C[2] C[1] C[0]
Characterize the carry output
And each sum also has a carry output, which isn't on the drawing. So you have to think of the addition in this larger scheme as a function with 3 inputs and 2 outputs : (C, c_out) = add(A, B, c_in)
In order to not wait O(n) for the sum to complete (where n is the number of items your sum is cut into), you can precompute all the possible results at each id. That isn't such a huge load of work, since A and B don't change, only the carries. So you have 2 possible outputs : (c_out0, C) = add(A, B, 0) and (c_out1, C') = add(A, B, 1).
Now with all these results, we need to basically implement a carry lookahead unit.
For that, we need to figure out two functions of each sum's carry output, P and G:
P a.k.a. all of the following definitions
Propagate
"if a carry comes in, then a carry will go out of this sum"
c_out1 && !c_out0
A + B == radix-1
G a.k.a. all of the following definitions
Generate
"whatever carry comes in, a carry will go out of this sum"
c_out1 && c_out0
c_out0
A + B >= radix
So in other terms, c_out = G or (P and c_in). So now we have a start of an algorithm that can tell us easily for each id the carry output as a function of its carry input directly :
1. At each id, compute C[id] = A[id] + B[id] + 0
2. Get G[id] = C[id] > radix - 1
3. Get P[id] = C[id] == radix - 1
Logarithmic tree
Now we can finish in O(log(n)). Tree-like computations are nasty on GPUs, but this is still shorter than waiting. Indeed, from 2 additions next to each other, we can get a group G and a group P:
For id and id+1 :
4. step = 2
5. if id % step == 0, do steps 6 through 10; otherwise, do nothing
6. group_P = P[id] and P[id+step/2]
7. group_G = (P[id+step/2] and G[id]) or G[id+step/2]
8. c_in[id+step/2] = G[id] or (P[id] and c_in[id])
9. step = step * 2
10. if step < n, go to 5
At the end (after repeating steps 5-10 for every level of your tree, with fewer ids every time), everything will be expressed in terms of the Ps and Gs which you computed, and of c_in[0], which is 0. On the Wikipedia page there are formulas for grouping by 4 instead of 2, which will get you an answer in O(log_4(n)) instead of O(log_2(n)).
Hence the end of the algorithm :
11. At each id, get c_in[id]
12. return (C[id] + c_in[id]) % radix
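Here is a serial reference model of the whole scheme, sketched in Python for brevity (the real kernel would replace the sequential carry loop with the logarithmic P/G tree of steps 4-10; least-significant-digit-first arrays are an assumption of this sketch):

def add_bignum(A, B, radix):
    # A, B: equal-length digit arrays, least significant digit first,
    # each digit in [0, radix)
    n = len(A)
    S = [A[i] + B[i] for i in range(n)]        # step 1: raw digit sums
    G = [S[i] > radix - 1 for i in range(n)]   # step 2: generate
    P = [S[i] == radix - 1 for i in range(n)]  # step 3: propagate
    # serial stand-in for the logarithmic tree:
    # c_in[i+1] = G[i] or (P[i] and c_in[i]), with c_in[0] = 0
    c_in = [0] * (n + 1)
    for i in range(n):
        c_in[i + 1] = 1 if G[i] or (P[i] and c_in[i]) else 0
    digits = [(S[i] + c_in[i]) % radix for i in range(n)]
    return digits, c_in[n]                     # result digits, carry out

# 999 + 1 in radix 10: the carry has to ripple across all three digits
assert add_bignum([9, 9, 9], [1, 0, 0], 10) == ([0, 0, 0], 1)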
Take advantage of hardware
What we really did in this last part was mimic the circuitry of a carry-lookahead adder with logic. However, we already have adders in the hardware that do similar things (by definition).
Let us replace our definitions of P and G based on radix by those based on 2, like the logic inside our hardware, mimicking a sum of 2 bits a and b at each stage: P = a ^ b (xor) and G = a & b (logical and). In other words, a = P | G and b = G. So if we create an intP integer and an intG integer, where each bit is respectively the P and G we computed from each id's sum (limiting us to 64 sums), then the addition (intP | intG) + intG has the exact same carry propagation as our elaborate logical scheme.
The reduction to form these integers will still be a logarithmic operation I guess, but that was to be expected.
The interesting part is that each bit of the sum is a function of its carry input. Indeed, every bit of the sum is eventually a function of 3 bits: (a + b + c_in) % 2.
If at that bit P == 1, then a + b == 1, thus (a + b + c_in) % 2 == !c_in
Otherwise, a + b is either 0 or 2, and (a + b + c_in) % 2 == c_in
Thus we can trivially form the integer (or rather bit-array) int_cin = ((P|G)+G) ^ P with ^ being xor.
Thus we have an alternate ending to our algorithm, replacing steps 4 and later :
4. At each id, shift P and G by id: P = P << id and G = G << id
5. Do an OR-reduction to get intG and intP, which are the OR of all the P and G for id 0..63
6. Compute (once) int_cin = ((intP | intG) + intG) ^ intP
7. At each id, get c_in = int_cin & (1 << id) ? 1 : 0;
8. return (C[id] + c_in) % radix
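A quick Python model of that bit trick (Python's unbounded integers stand in for the 64-bit registers assumed above; illustrative only):

def carries_via_int_add(P, G):
    # pack the per-id P and G bits into integers (bit id <-> sum id)
    intP = sum(p << i for i, p in enumerate(P))
    intG = sum(g << i for i, g in enumerate(G))
    # (intP | intG) + intG propagates carries exactly as the digit sums
    # would; XOR-ing its sum bits with P recovers every incoming carry
    int_cin = ((intP | intG) + intG) ^ intP
    return [(int_cin >> i) & 1 for i in range(len(P))]

# same 999 + 1 example as before: P = [0, 1, 1], G = [1, 0, 0]
assert carries_via_int_add([0, 1, 1], [1, 0, 0]) == [0, 1, 1]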
PS : Also, watch out for integer overflow in your arrays, if radix is big. If it isn't then the whole thing doesn't really make sense I guess...
PPS: in the alternate ending, if you have more than 64 items, characterize each group of 64 by its P and G as if radix were 2^64, re-run the same steps at the higher level (reduction, get c_in), and then get back to the lower level and apply step 7 with the carry in from the higher level.

C code to Haskell

So, I would like to convert a part of C code to Haskell. I wrote this part (it's a simplified example of what I want to do) in C, but being the newbie I am in Haskell, I can't really make it work.
float g(int n, float a, float p, float s)
{
    int c;
    while (n > 0)
    {
        c = n % 2;
        if (!c) s += p;
        else s -= p;
        p *= a;
        n--;
    }
    return s;
}
Anyone got any ideas/solutions?
Lee's translation is already pretty good (well, he confused the odd and even cases(1)), but he fell into a couple of performance traps.
g n a p s =
  if n > 0
    then
      let c  = n `mod` 2
          s' = (if c == 0 then (-) else (+)) s p
          p' = p * a
      in g (n-1) a p' s'
    else s
1. He used mod instead of rem. The latter maps to machine division, the former performs additional checks to ensure a non-negative result. Thus mod is a bit slower than rem, and if either satisfies the needs - because they yield identical results in the case where both arguments are non-negative; or because the result is only compared to 0 (both conditions are satisfied here) - rem is preferable. Even better, and a bit more idiomatic, is to use even (which uses rem for the reasons mentioned above). The difference is not huge, though.
2. No type signature. That means that the code is (type-class) polymorphic, and thus no strictness analysis is possible, nor any specialisations. If the code is used in the same module at a specific type, GHC can (and usually will, if optimisations are enabled) create a specialised version for that specific type that allows strictness analysis and some other optimisations (inlining of class methods like (+) etc.); in that case, one does not pay the polymorphism penalty. But if the use site is in a different module, that cannot happen. If (type-class) polymorphic code is desired, one should mark it INLINABLE or INLINE (for GHC < 7), so that its unfolding is exposed in the .hi file and the function can be specialised and optimised at the use site.
3. Since g is recursive, it cannot be inlined [meaning, GHC cannot inline it; in principle it is possible] at use sites, which often would enable more optimisations than a mere specialisation.
4. One technique that often allows better optimisation for recursive functions is the worker/wrapper transformation. One creates a wrapper that calls a recursive (local) worker; then the non-recursive wrapper can be inlined, and when the worker is called with known arguments, that can enable further optimisations like constant folding or, in the case of function arguments, inlining. In particular the latter often has an enormous impact, when combined with a static-argument-transformation (arguments that never change in the recursive calls are not passed as arguments to the recursive worker).
In this case, we only have one static argument of type Float, so a worker/wrapper transformation with a SAT typically makes no difference (as a rule of thumb, a SAT pays off when
the static argument is a function
several non-function arguments are static
so by this rule, we shouldn't expect any benefit from w/w + SAT, and in general, there is none). Here we have one special case where w/w + SAT can make a big difference, and that is when the factor a is 1. GHC has {-# RULES #-} that eliminate multiplication by 1 for various types, and with such a short loop body, a multiplication more or less per iteration makes a difference, the running time is reduced by about 40% after points 3 and 4 have been applied. (There are no RULES for multiplication by 0 or by -1 for floating point types because 0*x = 0 resp. (-1)*x = -x don't hold for NaNs.) For all other a, the w/w + SATed
{-# INLINABLE g #-}
g n a p s = worker n p s
  where
    worker n p s
      | n <= 0    = s
      | otherwise = let s' = if even n then s + p else s - p
                    in worker (n-1) (p*a) s'
does not perform measurably different from the top-level recursive version with the same optimisations done.
5. Strictness. GHC's strictness analyser is good, but not perfect. It cannot see far enough through the algorithm to determine that the function is
strict in p if n >= 1 (assuming addition - (+) - is strict in both arguments)
also strict in a if n >= 2 (assuming strictness of (*) in both arguments)
and then produce a worker that is strict in both. Instead you get a worker that uses an unboxed Int# for n and an unboxed Float# for s (I'm using the type Int -> Float -> Float -> Float -> Float here, corresponding to the C), and boxed Floats for a and p. Thus in each iteration you get two unboxings and a re-boxing. That costs (relatively) a lot of time, since besides that it's just a bit of simple arithmetic and tests.
Help GHC along a bit, and make the worker (or g itself, if you don't do the worker/wrapper transform) strict in p (a bang pattern, for example). That is enough to allow GHC to produce a worker using unboxed values throughout.
6. Using division to test parity (not applicable if the type is Int and the LLVM backend is used).
GHC's optimiser hasn't got down to the low-level bits very much yet, so the native code generator emits a division instruction for
x `rem` 2 == 0
and, when the rest of the loop body is as cheap as it is here, that costs a lot of time. LLVM's optimiser has already been taught to replace that with a bitmasking at type Int, so with ghc -O2 -fllvm you don't need to do that manually. With the native code generator, substituting that with
x .&. 1 == 0
(needs import Data.Bits of course) produces a significant speedup (on normal platforms where a bitwise and is much faster than a division).
The final result
{-# INLINABLE g #-}
g n a p s = worker n p s
  where
    worker k !ap acc
      | k > 0     = worker (k-1) (ap*a) (if k .&. (1 :: Int) == 0 then acc + ap else acc - ap)
      | otherwise = acc
performs not measurably different (for the tested values) from the result of gcc -O3 -msse2 loop.c, except for a = -1, where gcc replaces the multiplication with a negation (assuming all NaNs equivalent).
(1) He's not alone in that,
c = n % 2;
if (!c) s += p;
else s -= p;
seems to be really tricky, as far as I can see everybody(2) got that wrong.
(2) With one exception ;)
As a first step, let's simplify your code:
float g(int n, float a, float p, float s) {
    if (n <= 0) return s;
    float s2 = n % 2 == 0 ? s + p : s - p;
    return g(n - 1, a, a*p, s2);
}
We have turned your original function into a recursive one that exhibits a certain structure. It's a sequence! We can turn this into Haskell conveniently:
gs :: Bool -> Float -> Float -> Float -> [Float]
gs nb a p s = s : gs (not nb) a (a*p) (if nb then s - p else s + p)
Finally we just need to index this list:
g :: Int -> Float -> Float -> Float -> Float
g n a p s = gs (even n) a p s !! n
The code is not tested, but it should work. If not, it's probably just an off-by-one error.
Here is how I would tackle this problem in Haskell. First, I observe that there are several loops merged into one here: we are
forming a geometric sequence (whose starting element is a suitably signed version of p)
taking a prefix of the sequence
summing the result
So my solution follows this structure as well, with a tiny bit of s and p thrown in for good measure because that's what your code does. In a from-scratch version, I'd probably drop those two parameters entirely.
g n a p s = sum (s : take n (iterate (*(-a)) start)) where
  start | odd n     = -p
        | otherwise = p
A fairly direct translation would be:
g n a p s =
  if n > 0
    then
      let c  = n `mod` 2
          s' = (if c == 0 then (-) else (+)) s p
          p' = p * a
      in g (n-1) a p' s'
    else s
Looking at the signature of the g function (i.e., float g(int n, float a, float p, float s)), you know that your Haskell function will receive 4 arguments and return a float, thus:
g :: Integer -> Float -> Float -> Float -> Float
Let us now look into the loop. We see that n <= 0 is the stop case and n--; is the decreasing step used in the recursive call. Therefore:
g :: Integer -> Float -> Float -> Float -> Float
g n a p s | n <= 0 = s
For n > 0, you have another conditional, if (!(n % 2)) s += p; else s -= p;, inside the loop. If n is odd then you will do s += p, p *= a and n--. In Haskell it will be:
g :: Integer -> Float -> Float -> Float -> Float
g n a p s | n <= 0 = s
          | odd n  = g (n-1) a (p*a) (s+p)
If n is even then you will do s -= p, p *= a and n--. Thus:
g :: Integer -> Float -> Float -> Float -> Float
g n a p s | n <= 0    = s
          | odd n     = g (n-1) a (p*a) (s+p)
          | otherwise = g (n-1) a (p*a) (s-p)
To expand on @Landei's and @MathematicalOrchid's comments below the question: the algorithm proposed to solve the problem at hand is always O(n). However, if you realize that what you're actually doing is computing a partial sum of a geometric series, you can use the well-known summation formula:
g n a p s = s + (-1)^n * p * ((-a)^n - 1) / (-a - 1)
This will be faster as the exponentiation can be done faster than O(n) by repeated squaring or other clever methods, which are likely automatically employed for integer powers by modern compilers.
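As a quick cross-check of the closed form against the loop, here is a small numeric test (sketched in Python so it runs standalone; note that a = -1 makes the denominator zero, because the ratio -a is then 1, so it needs a special case):

def g_loop(n, a, p, s):
    # direct transcription of the C loop
    while n > 0:
        s = s + p if n % 2 == 0 else s - p
        p *= a
        n -= 1
    return s

def g_formula(n, a, p, s):
    if a == -1:  # ratio -a == 1: the partial sum is just n equal terms
        return s + (-1) ** n * p * n
    return s + (-1) ** n * p * ((-a) ** n - 1) / (-a - 1)

for n in range(8):
    for a in (-2.0, -1.0, 0.5, 1.0, 3.0):
        assert abs(g_loop(n, a, 1.5, 2.0) - g_formula(n, a, 1.5, 2.0)) < 1e-6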
You can encode loops almost-naturally with the Haskell Prelude function until :: (a -> Bool) -> (a -> a) -> a -> a:
g :: Int -> Float -> Float -> Float -> Float
g n a p s =
  fst.snd $
  until ((<= 0).fst)
        (\(n,(!s,!p)) -> (n-1, (if even n then s+p else s-p, p*a)))
        (n,(s,p))
The bang-patterns !s and !p mark strictly-calculated intermediate variables, to prevent excessive laziness which would otherwise harm efficiency.
until pred step start repeatedly applies the step function until pred, called with the last generated value, holds, starting from the initial value start. It can be represented by the pseudocode:
def until (pred, step, start):
    while (true):
        if pred(start): return (start)
        start := step(start)

// well, actually,

def until (pred, step, start):
    if pred(start): return (start)
    call until (pred, step, step(start))
The first pseudocode is equivalent to the second (which is how until is actually implemented) in the presence of tail call optimization, which is why in many functional languages where TCO is present loops are encoded via recursion.
So in Haskell, until is coded as
until p f x | p x       = x
            | otherwise = until p f (f x)
But it could have been coded differently, making explicit the interim results:
until p f x = last $ go x -- or, last (go x)
  where go x | p x       = [x]
             | otherwise = x : go (f x)
Using the Haskell standard higher-order functions break and iterate this could be written as a stream-processing code,
until p f x = let (_,(r:_)) = break p (iterate f x) in r
-- or: span (not.p) ....
or just
until p f x = head $ dropWhile (not.p) $ iterate f x -- or, equivalently,
-- head . dropWhile (not.p) . iterate f $ x
If TCO weren't present in a given Haskell implementation, the last version would be the one to use.
Hopefully this makes clearer how the stream-processing code from Daniel Wagner's answer comes about,
g n a p s = s + (sum . take n . iterate (*(-a)) $ if odd n then (-p) else p)
because the predicate involved is about counting down from n, and
fst . snd . head . dropWhile ((> 0).fst) $
iterate (\(n,(!s,!p)) -> (n-1, (if even n then s+p else s-p, p*a)))
(n,(s,p))
===
fst . snd . head . dropWhile ((> 0).fst) $
iterate (\(n,(!s,!p)) -> (n-1, (s+p, p*(-a))))
(n,(s, if odd n then (-p) else p)) -- 0 is even
===
fst . (!! n) $
iterate (\(!s,!p) -> (s+p, p*(-a)))
(s, if odd n then (-p) else p)
===
foldl' (+) s . take n . iterate (*(-a)) $ if odd n then (-p) else p
In pure FP, the stream-processing paradigm makes all history of a computation available, as a stream (list) of values.

How can I prove if this language is regular or not?

How can I prove if this language is regular or not?
L = {a^n b^n : n >= 1} ∪ {a^n b^(n+2) : n >= 1}
I'll give an approach and a sketch of a proof; there might be some holes in it that I believe you can fill in yourself.
The idea is to use the Myhill-Nerode theorem: show that there is an infinite number of equivalence classes for R_L; from the theorem you can then derive that the language is not regular.
Define two types of sets:
G_j = {a^n b^k | n - k = j, k >= 1} for each j in [-2, -1, 0, 1, ...]
H_j = {a^j} for each j in [0, 1, ...]
G_illegal = {a,b}* \ (G_j ∪ H_j) [union taken over each j in the specified ranges]
It is easy to see that for each x in G_illegal, and for each z in {a,b}*: xz is not in L.
So, for every x,y in G_illegal and for each z in {a,b}*: xz in L <-> yz in L.
Also, for each z in {a,b}* and for each x, y in some G_j [the same j for both]:
if z contains an a, both xz and yz are not in L
if z = b^j, then xz = a^n b^k b^j, and since k + j = n, xz is in L. The same applies for y, so yz is in L.
if z = b^(j+2), then xz = a^n b^k b^(j+2), and since k + j + 2 = n + 2, xz is in L. The same applies for y, so yz is in L.
otherwise, z is b^i with i != j and i != j+2, and you get that both xz and yz are not in L.
So, for every j and for every x,y in G_j and for each z in {a,b}*: xz in L <-> yz in L.
Prove the same for every H_j using the same approach.
Also, it is easy to show that for each x in G_j ∪ H_j and for each y in G_illegal: for z = b^j, xz is in L and yz is not in L.
For x in G_j and y in H_i, for z = a b^(j+1): it is easy to see that xz is not in L and yz is in L.
It is easy to see that for x, y in G_j and G_i respectively, or for x, y in H_j and H_i: for z = b^j, xz is in L while yz is not.
We just proved that the sets we created are the equivalence classes for R_L from the Myhill-Nerode theorem, and since we have an infinite number of these sets [we have H_j and G_j for every j], we can derive from the theorem that the language L is not regular.
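To make the infinite family of classes concrete, a brute-force check that the prefixes a^1, a^2, a^3, ... are pairwise distinguishable is easy to write. A rough Python sketch (in_L and separated are just illustrative names):

def in_L(w):
    # L = {a^n b^n : n >= 1} ∪ {a^n b^(n+2) : n >= 1}
    n = len(w) - len(w.lstrip('a'))
    k = len(w) - n
    return w == 'a' * n + 'b' * k and n >= 1 and k in (n, n + 2)

def separated(u, v, limit=12):
    # some extension z = b^k puts exactly one of u, v into L
    return any(in_L(u + 'b' * k) != in_L(v + 'b' * k) for k in range(limit))

# any two distinct prefixes a^i, a^j lie in different equivalence classes
for i in range(1, 8):
    for j in range(i + 1, 8):
        assert separated('a' * i, 'a' * j)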
You could just use the pumping lemma for regular languages. It says that for a regular language there is an integer n such that any string of the language of length at least n can be split into xyz with |xy| <= n and |y| > 0, and pumping the y part keeps the string in the language; that means, if xy^i z is not in the language for some i, then the language is not regular.
The proof goes like this, a kind of adversary argument. Suppose someone tells you that this language is regular. Then ask him for a number n > 0. You build a convenient string of length greater than n, and you give it to the adversary. He partitions the string into x, y, z in any way he wants, as long as |xy| <= n. Then you have to pump y (repeat it i times) until you find a string that is not in that same language.
In this case, I tell you: give me n. You fix n. Then I tell you: take the string a^n b^(n+2), and split it. In any way that you can split this string, you will always have y = a^k with k > 0, since you are forced to make |xy| <= n and my string begins with a^n. Here is the trick: you give the adversary a string such that any way he can split it, he gives you a part that you can pump. So now we pump y down, i.e. repeat it 0 times, and get a^m b^(n+2) with m < n, which is not in your language. Done. (Pumping up needs more care here: if y happens to be aa, then repeating it one extra time gives a^(n+2) b^(n+2), which is back in the language, so pumping down is the safe choice.)
The proof of this theorem goes around saying that if you have a regular language then you have an automaton with n states for some fixed n. If a string has more than n characters, then it must go through some cycle in your automaton. If we name x the part of the string before entering the cycle, and y the part in the cycle, it's clear that we can pump y as many times as we want, because we can keep running around the cycle as many times as we want, and the resulting string has to be in the language, because it will be recognized by that automaton. To use the theorem to prove non-regularity, since we don't know what the supposed automaton will look like, we have to leave to the adversary the choice of n and of the position of the cycle inside the automaton (there will be no automaton, but you say to the adversary something like: dare to give me an automaton and I will show you it cannot exist).
