Converting a Context-Free Grammar to a PDA - theory

I am attempting to convert the following CFG to a pushdown automaton:
S → AS | A
A → 0A | 1B | 1
B → 0B | 0
I'm not really sure how to approach this problem, or the problem of CFG->PDA in general.

Conversion of a context-free grammar to a pushdown automaton:
Steps to convert a CFG to a pushdown automaton:
Step-1: The first symbol on the R.H.S. of every production must be a terminal symbol.
Step-2: To achieve this, convert the productions of the CFG into Greibach normal form (GNF).
Step-3: The PDA will have only one state {q}.
Step-4: The start symbol of the CFG will be the initial stack symbol of the PDA.
Step-5: For each non-terminal symbol A, add the following rule:
δ(q, ε, A) = (q, α)
where A → α is a production rule. If A has several productions, δ(q, ε, A) contains one pair (q, α) for each of them.
Step-6: For each terminal symbol a, add the following rule:
δ(q, a, a) = (q, ε)
The resulting PDA accepts by empty stack.
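Steps 3 to 6 can be sketched in Python for the grammar in the question. This is only an illustration, not a definitive implementation. The names `build_pda` and `accepts` are made up for this sketch, the stack is modelled as a string with its top at index 0, and the search prunes long stacks, which is only valid because this grammar has no ε-productions.

```python
from collections import deque

# Grammar from the question: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": ["AS", "A"],
    "A": ["0A", "1B", "1"],
    "B": ["0B", "0"],
}
TERMINALS = {"0", "1"}

def build_pda(grammar, terminals):
    """Return delta as {(input symbol or "", stack top): [strings to push]}."""
    delta = {}
    for head, bodies in grammar.items():
        # Step 5: replace a nonterminal on top of the stack, reading no input.
        delta[("", head)] = list(bodies)
    for a in terminals:
        # Step 6: pop a terminal when it matches the next input symbol.
        delta[(a, a)] = [""]
    return delta

def accepts(delta, start, word):
    """Breadth-first search over (position, stack); accept by empty stack."""
    seen = set()
    queue = deque([(0, start)])
    while queue:
        pos, stack = queue.popleft()
        if (pos, stack) in seen:
            continue
        seen.add((pos, stack))
        if not stack:
            if pos == len(word):
                return True
            continue
        # Assumption: no epsilon-productions, so every stack symbol still
        # needs at least one input symbol; longer stacks can never succeed.
        if len(stack) > len(word) - pos:
            continue
        top, rest = stack[0], stack[1:]
        for body in delta.get(("", top), []):
            queue.append((pos, body + rest))
        if pos < len(word) and (word[pos], top) in delta:
            queue.append((pos + 1, rest))
    return False

pda = build_pda(GRAMMAR, TERMINALS)
print(accepts(pda, "S", "0110"), accepts(pda, "S", "0"))
```

The single state q is left implicit because every transition starts and ends in it.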

You can use the JFLAP application to do the conversion for you.
http://www.jflap.org/
Beyond this, the application has several other interesting features that will help you study formal languages.
I've been using it for about two weeks and I'm loving it.

Related

printing stack with variable names with gdb?

I was just reading this article.
In the article, the author uses gdb to look around in a C executable.
At one point, when a breakpoint is hit, the author says to have a look at the stack, and shows this output:
STACK:
0x00007fffffffdf40│+0x0000: 0x00007fffffffe058 → 0x00007fffffffe380
0x00007fffffffdf48│+0x0008: 0x0000000100401050
0x00007fffffffdf50│+0x0010: 0x00007fffffffe050 → 0x0000000000000001
0x00007fffffffdf58│+0x0018: 0x0000000000402004 → “p#ssw0rD”
0x00007fffffffdf60│+0x0020: 0x0000000000000000 ← $rbp
0x00007fffffffdf68│+0x0028: 0x00007ffff7ded0b3 → <__libc_start_main+243> mov edi, eax
0x00007fffffffdf70│+0x0030: 0x00007ffff7ffc620 → 0x0005043700000000
0x00007fffffffdf78│+0x0038: 0x00007fffffffe058 → 0x00007fffffffe380 →
This is nice, but how do I generate this output in gdb?
I've been googling for a while with no luck.
Also, in this output there are two different columns of hex addresses. I'm guessing one points to the stack; what is the other one, and which is which?
The author doesn't state it explicitly, but in their gdb output you can see the prompt gef>. This indicates they are likely making use of the gef addon for gdb.
I have never used this addon myself, but you can see in some of the example output on the gef site that the addon has a stack view identical to the output you gave above.
The gef addon makes use of gdb's Python API to provide additional features for gdb, one of which appears to be the alternative stack view.
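As for the columns: in this style of dump, each row reads as the stack address, then its offset from the top of the stack, then the value stored at that address, with arrows following that value when it is itself a pointer. A small Python sketch, assuming exactly the row layout shown in the question, splits a row into those parts. The function name `parse_stack_row` is hypothetical and not part of gdb or gef.

```python
import re

# One row: "<stack address>│+<offset>: <stored value> [→ dereferenced ...]".
# "\u2502" is the box-drawing bar that separates address and offset.
ROW = re.compile(
    r"^(0x[0-9a-fA-F]+)\u2502\+(0x[0-9a-fA-F]+):\s+(0x[0-9a-fA-F]+)"
)

def parse_stack_row(line):
    """Split one dump row into address, offset and stored value (as ints)."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError("not a stack row: %r" % line)
    address, offset, value = (int(group, 16) for group in match.groups())
    return {"address": address, "offset": offset, "value": value}

row = parse_stack_row("0x00007fffffffdf48\u2502+0x0008: 0x0000000100401050")
# Address minus offset gives the address of the first row (the stack top).
print(hex(row["address"] - row["offset"]))
```

For the sample row this prints the address of the first row in the dump, which is what makes the first column the stack address and the last column the stored value.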

difference yap and swi-prolog reading canonical lists

I have the following test code that tries to read a file into a list:
open('raw250-split1.pl', read, Stream),
read(Stream,train_xs(TrainXs)),
length(TrainXs, MaxTrain).
I will omit part of the output because the file is quite large.
It works well with YAP:
➜ chill git:(master) ✗ yap [18/06/19| 5:48PM]
% Restoring file /usr/lib/Yap/startup.yss
YAP 6.2.2 (x86_64-linux): Sat Sep 17 13:59:03 UTC 2016
?- open('raw250-split1.pl', read, Stream),
read(Stream, train_xs(TrainXs)),
length(TrainXs, MaxTrain).
MaxTrain = 225,
Stream = '$stream'(3),
TrainXs = [[parse([which,rivers,run,through,states,bordering,new,mexico,/],answer(_A,(river(_A),traverse(_A,_B),next_to(_B,_C),const(_C,stateid('new mexico')))))],
<omitted output>
,[parse([what,is,the,largest,state,capital,in,population,?],answer(_ST,largest(_SU,(capital(_ST),population(_ST,_SU)))))]]
But on SWI-Prolog, it produces a type error:
➜ chill git:(master) ✗ swipl [18/06/19| 7:24PM]
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit http://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- open('raw250-split1.pl', read, Stream),
read(Stream, train_xs(TrainXs)),
length(TrainXs, MaxTrain).
ERROR: raw250-split1.pl:4:
Type error: `list' expected, found `parse(which.(rivers.(run.(through.(states.(bordering.(new.(mexico.((/).[])))))))),
<omitted output>
,answer(_67604,(state(_67604),next_to(_67604,_67628),const(_67628,stateid(kentucky))))).[].(parse(what.((is).(the.(largest.(state.(capital.(in.(population.((?).[])))))))),answer(_67714,largest(_67720,(capital(_67714),population(_67714,_67720))))).[].[]))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))' (a compound)
In:
[10] throw(error(type_error(list,...),context(...,_67800)))
[7] <user>
Note: some frames are missing due to last-call optimization.
Re-run your program in debug mode (:- debug.) to get more detail.
What might be causing this error?
The file raw250-split1.pl can be found at the FTP URL below, if you'd like to try it.
Thank you for the help!
I am trying to migrate to SWI-Prolog some earlier code that was written in
SICStus 3 #3: Thu Sep 12 09:54:27 CDT 1996 or earlier
by Raymond J. Mooney ftp://ftp.cs.utexas.edu/pub/mooney/chill/.
All my questions with this tag are related to this task. I'm new to Prolog; help and suggestions are welcome!
The raw250-split1.pl file was apparently written using canonical notation. The traditional list functor is ./2 but SWI-Prolog 7.x changed it to '[|]'/2 in order to use ./2 for other purposes. This results in the variable TrainXs being instantiated by the read/2 call to a compound term whose argument is not a list:
?- open('raw250-split1.pl', read, Stream), read(Stream,train_xs(TrainXs)).
Stream = <stream>(0x7f8975e08e90),
TrainXs = parse(which.(rivers.(run.(through.(states.(bordering.(... . ...)))))), answer(_94, (river(_94), traverse(_94, _100), next_to(_100, _106), const(_106, stateid('new mexico'))))).[].(parse(what.((is).(the.(highest.(point.(... . ...))))), answer(_206, (high_point(_204, _206), const(_204, stateid(montana))))).[].(parse(what.((is).(the.(most.(... . ...)))), answer(_298, largest(_300, (population(_298, _300), state(...), ..., ...)))).[].(parse(through.(which.(states.(... . ...))), answer(_414, (state(_414), const(..., ...), traverse(..., ...)))).[].(parse(what.((is).(... . ...)), answer(_500, longest(_500, river(...)))).[].(parse(how.(... . ...), answer(_566, (..., ...))).[].(parse(... . ..., answer(..., ...)).[].(parse(..., ...).[].(... . ... .(... . ...))))))))).
YAP still uses the ./2 functor for lists, which explains why it can handle it. A workaround for SWI-Prolog is to start it with the --traditional command-line option:
$ swipl --traditional
...
?- open('raw250-split1.pl', read, Stream), read(Stream,train_xs(TrainXs)).
Stream = <stream>(0x7faeb2f77700),
TrainXs = [[parse([which, rivers, run, through, states, bordering|...], answer(_94, (river(_94), traverse(_94, _100), next_to(_100, _106), const(_106, stateid('new mexico')))))], [parse([what, is, the, highest, point|...], answer(_206, (high_point(_204, _206), const(_204, stateid(montana)))))], [parse([what, is, the, most|...], answer(_298, largest(_300, (population(_298, _300), state(...), ..., ...))))], [parse([through, which, states|...], answer(_414, (state(_414), const(..., ...), traverse(..., ...))))], [parse([what, is|...], answer(_500, longest(_500, river(...))))], [parse([how|...], answer(_566, (..., ...)))], [parse([...|...], answer(..., ...))], [parse(..., ...)], [...]|...].
The type error you get is due to the length/2 expecting a list when the first argument is bound.
There is a tilde as the last character in that file, which makes the syntax invalid, so you should remove it before reading. I don't know why YAP accepts the file as valid; it should raise an error AFAIK.
There is a read option dotlists/1 in SWI-Prolog:
dotlists(Bool)
If true (default false), read .(a,[]) as a
list, even if lists are internally not constructed
using the dot as functor. This is primarily intended
to read the output from write_canonical/1 from
other Prolog systems. See section 5.1.
http://www.swi-prolog.org/pldoc/man?predicate=read_term/2
This gives you the desired result, without changing the mode:
Welcome to SWI-Prolog (threaded, 64 bits, version 8.1.0)
?- read_term(X, [dotlists(true)]).
|: .(a,.(b,.(c,[]))).
X = [a, b, c].
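To see why a chain of ./2 cells is "a list" only by convention, here is a rough Python model. It assumes a term is represented as a nested tuple ('.', Head, Tail) and the empty list as the string "[]". This representation and the name `dot_to_list` are invented for the sketch and have nothing to do with how either Prolog system stores terms.

```python
def dot_to_list(term):
    """Turn nested ('.', Head, Tail) cells ending in "[]" into a Python list."""
    items = []
    while term != "[]":
        is_cell = isinstance(term, tuple) and len(term) == 3 and term[0] == "."
        if not is_cell:
            # Mirrors the SWI-Prolog complaint: the term is not built from
            # the functor this reader treats as the list constructor.
            raise TypeError("list expected, found %r" % (term,))
        _, head, term = term
        items.append(head)
    return items

# .(a,.(b,.(c,[]))) from the read_term/2 example above
canonical = (".", "a", (".", "b", (".", "c", "[]")))
print(dot_to_list(canonical))
```

A reader that expects a different constructor (as SWI-Prolog 7 does with '[|]'/2) sees the same nesting but does not recognise it as a list, which is what the dotlists option changes.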

How are various glyphs encoded inside a PDF content stream?

I am working on a program that outputs PDF documents. Given a sequence of UTF-8 encoded characters and the name of a font that shall be used to render it, I would like to show the appropriate glyphs that make the actual content of the document. I would like to be able to display national characters such as č or ö. It would be great to support ligatures like ae or ffi.
The problem is, I do not know how the actual glyphs to be shown are specified (inside a content stream, for example).
If I, for example, want to display the string "Hello World", I need not to worry about encoding, I simply write (Hello World)Tj. The PDF reader will then use the appropriate font to render this string.
But what if I wanted to show the string
It is difficult to read the PDF specification all day. Prostě dočista nemožné!
with the ligatures ffi, fi and ea and the Czech national symbols ě, č and é in a given font, how would I proceed?
I am trying to get through the PDF specification, but it is not easy.
How do I find out the "code of the glyph" that corresponds to a given character or ligature?
How is this code encoded within a PDF content stream?
Help is much appreciated.
Edit: I may have overestimated the problem. Counting the glyphs that are needed to display a "common European document", I cannot think of a way how this number could exceed 256. If my assumptions are correct, I can remap the encoding of the font completely. This should be sufficient to cover all common symbols of the latin alphabet, numbers, punctuation, common symbols like ( and [ and still I would have plenty of room for national symbols, ligatures and other elements of high-quality typography. (I can implement a priority queue to select the most used ligatures if the total number of glyphs shall exceed 256.)
That being said, I do not think I need to use the CID-keyed fonts.
Still, I wonder how to map UTF-8 encoded characters onto the glyphs of an arbitrary font. I have the AFM of the font available. For the DejaVu font, for example, the character information looks like this:
C 63 ; WX 536 ; N question ; B 67 -15 488 743 ;
C 64 ; WX 1000 ; N at ; B 65 -174 930 705 ;
C 65 ; WX 722 ; N A ; B -6 0 732 730 ;
But after the 256th character is mapped, the codes are -1:
C 255 ; WX 564 ; N ydieresis ; B -3 -223 563 767 ;
C -1 ; WX 722 ; N Amacron ; B -6 0 732 899 ;
C -1 ; WX 596 ; N amacron ; B 49 -15 568 746 ;
For example, if I had the sequence 11100010 10000010 10101100 (Euro sign) in my input, how would I know what glyph name it corresponds to so that I can map it in the /Encoding dictionary?
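For illustration, the two pieces of data in this question can be handled as follows: decoding the UTF-8 bytes to a Unicode code point, and reading one character-metric line from the AFM. This is a sketch that assumes integer metrics, as in the lines shown. The function name `parse_afm_char_metric` is hypothetical, and the step from code point to glyph name still needs a separate lookup table that is not shown here.

```python
def parse_afm_char_metric(line):
    """Parse one 'C ... ; WX ... ; N ... ; B ... ;' line from an AFM file."""
    entry = {}
    for field in line.split(";"):
        parts = field.split()
        if not parts:
            continue
        key, values = parts[0], parts[1:]
        if key == "C":
            # -1 marks a glyph that has no code in the font's default encoding.
            entry["code"] = int(values[0])
        elif key == "WX":
            entry["width"] = int(values[0])
        elif key == "N":
            entry["name"] = values[0]
        elif key == "B":
            entry["bbox"] = tuple(int(v) for v in values)
    return entry

# The three bytes from the question decode to a single code point.
euro = bytes([0b11100010, 0b10000010, 0b10101100]).decode("utf-8")
print(hex(ord(euro)))
print(parse_afm_char_metric("C -1 ; WX 722 ; N Amacron ; B -6 0 732 899 ;"))
```

The glyph name in the N field is what goes into an /Encoding dictionary's differences array. The C field only tells you where the font puts the glyph by default.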
Encoding varies based on the font type. Typically, there is a font resource that is defined as the current font, and within that font dictionary is a reference to a base font and a means of describing the encoding (via the /Encoding key). If that key doesn't exist, the encoding will be "standard", but you can use other simple encodings such as /MacRomanEncoding and /WinAnsiEncoding for the value of the encoding, or you can specify a base encoding and an encoding delta (a /Differences array) to show the differences.
Easy so far - as long as you're working with 8-bit characters. Many early apps would create a couple of different fonts, one with, say, Roman encoding and another that maps Roman character codes to otherwise unavailable characters. In order to do that, your encoding delta would include references to the ligatures and other typically non-encoded symbols. This works great for Type 1 fonts, but is specifically contraindicated by the spec in the section on TrueType fonts:
A nonsymbolic font should specify MacRomanEncoding or WinAnsiEncoding as the value of its Encoding entry, with no Differences array
This is vastly different when you want to use, say, Unicode. In that case you would be using a CID font (a font based on character IDs), and the font references a procedure that maps from a character encoding in your string to a character ID in your font (and vice versa). I would strongly recommend that you read and fully understand section 9.7 in the PDF specification on Composite Fonts, which describes everything you need in order to encode UTF-16BE into strings to get them to render properly in PDF. It is decidedly non-trivial in that there are a lot of details that, if missed, will result in a blank rendered page in Acrobat.
As a software engineer who professionally writes code that produces and consumes PDF, let me state that when I get tasked with having to put in special cases in my code to deal with non-spec compliant PDF, a little piece of me dies inside. Please, please, don't even think of releasing any documents you produce into the wild until they pass Preflight at the least. This is not the same as "Acrobat renders it so it must be OK." Let me give you an example - I've seen a number of files in the wild that include fonts that are missing the key elements of the FontDescriptor dictionary, including /Ascent, /Descent, /CapHeight, etc. These render in Acrobat, but are in violation of the spec since each of those is required. I know how Acrobat handles that - it comes with an enormous database of font metrics and looks up the value if it can't find it in the file (heck, it might even ignore the metrics in the file). I don't have that luxury, so I have to do a number of (potentially expensive/invalid) stop gap measures.
You might want to consider using a library to do this work for you - maybe iText, which has a decent enough licensing scheme for education because, I get it, you're a student. There are some C-based libraries too. Maybe you can figure out a way to make Ghostscript do your bidding.
If you are unwilling or unable to follow my advice with regards to cleaving to the specification or to use a library which ostensibly does so, please do me the favor of at least filling out the /Creator and /Producer strings in the Document Information Dictionary referenced by the trailer (see sections 14.3.3 and section 7.5.5). That way, when I have to parse/consume/manipulate your documents, I will have a way to directly cast aspersions on your parentage.
Let's go top down and start with the page object - I'm using output from my own library and am stripping out what I think you don't need:
1 0 obj <<
/Type /Page
/Parent 18 0 R
/Resources <<
/Font <<
/U0 13 0 R
>>
/ProcSet [ /PDF /Text ]
>>
/MediaBox [ 0 0 612 792 ]
/Contents 19 0 R
/Dur -1
>>
endobj
U0 is a reference to a font that will be used for unicode text.
The content stream is intended to print the following text: Greek: Γειά σου κόσμος.
BT /U0 24 Tf 72 670 Td
(\000G\000r\000e\000e\000k\000:\000 \003\223\003\265\003\271\003\254\000 \003\303\003\277\003\305\000 \003\272\003\314\003\303\003\274\003\277\003\302)
Tj ET
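The string operand above is just the text encoded as UTF-16BE, with every byte outside printable ASCII written as a three-digit octal escape. A minimal Python sketch of that step follows. The function name `pdf_utf16be_literal` is made up here. It assumes a setup like the one in this answer, where two-byte codes are used directly, so characters outside the Basic Multilingual Plane are out of scope.

```python
def pdf_utf16be_literal(text):
    """Return a PDF literal string whose bytes are `text` in UTF-16BE."""
    out = []
    for byte in text.encode("utf-16-be"):
        if byte in b"()\\":
            # Parentheses and backslash must be escaped inside ( ... ).
            out.append("\\" + chr(byte))
        elif 0x20 <= byte < 0x7F:
            out.append(chr(byte))
        else:
            # Everything else becomes a three-digit octal escape.
            out.append("\\%03o" % byte)
    return "(" + "".join(out) + ")"

# "Greek: " followed by the Greek words from the example, written with
# escapes so the exact (precomposed) code points are unambiguous.
greeting = ("Greek: \u0393\u03b5\u03b9\u03ac \u03c3\u03bf\u03c5 "
            "\u03ba\u03cc\u03c3\u03bc\u03bf\u03c2")
print(pdf_utf16be_literal(greeting) + " Tj")
```

Writing the same bytes as a hexadecimal string (angle brackets) would be equally valid PDF. The literal form is used here only to match the example.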
The font dictionary referenced looks like this:
13 0 obj <<
/BaseFont /DejaVuSansCondensed
/DescendantFonts [ 4 0 R ]
/ToUnicode 14 0 R
/Type /Font
/Subtype /Type0
/Encoding /Identity-H
>>
endobj
The /ToUnicode entry points to a stream containing the following PostScript code, which is defined by the CID font specification:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
1 beginbfrange
<0000> <FFFF> <0000>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end
The DescendantFonts array points to this object:
4 0 obj <<
/Subtype /CIDFontType2
/Type /Font
/BaseFont /DejaVuSansCondensed
/CIDSystemInfo 7 0 R
/FontDescriptor 8 0 R
/DW 1000
/W 9 0 R
/CIDToGIDMap 10 0 R
>>
The CIDToGIDMap is a compressed stream with the actual map, the CIDSystemInfo is <</Registry (Adobe) /Ordering (UCS) /Supplement 0>> (it's a reference because I share it among all Unicode fonts that I output). The FontDescriptor is straightforward boilerplate, and the W array is derived from the font metrics.
With all this detail, are you understanding why I don't say lightly, "walk away before you pollute my environment any further"?
I'm really beginning to question the nature of this assignment. Writing a simple PDF is one thing, but writing code that can handle full Unicode in any arbitrary OpenType/TrueType font requires you to understand the CID spec and the TrueType spec (hint: I have a full TrueType parser that can extract all the metrics for any glyph in a font so that I can output the /W array).
If, however, you are required to only output to Type 1 fonts, well my friend, your life got a whole lot easier, because you would take your entire UTF-8 stream, read it as Unicode and, for every unique character that comes in, build a map from the Unicode character to a glyph name and an internal character number by using this table. The internal character number is essentially the unique index of the character that came in, mod 256. So for example, if you have fewer than 257 unique characters on the page, you will have exactly one font that is encoded to map to the characters in the order that they arrived. If you had "abcba" for input, the output string in PDF would be (\000\001\002\001\000) and would map to a font with an encoding dictionary with a differences array that would be [0/a/b/c]. If you have n unique characters where n > 256, you're going to have ceil(n / 256) fonts, each with its own encoding.
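A rough Python sketch of that bookkeeping follows. The `GLYPH_NAMES` dictionary is a tiny hypothetical stand-in for a real character-to-glyph-name table, and `encode_for_type1` is an invented name. Each character is numbered in order of first appearance, and every 256 characters start a new font with its own differences array.

```python
# Hypothetical stand-in for a full Unicode-to-glyph-name table.
GLYPH_NAMES = {
    "a": "a", "b": "b", "c": "c",
    "\u010d": "ccaron", "\u00f6": "odieresis",
}

def encode_for_type1(text, glyph_names=GLYPH_NAMES):
    """Return (differences arrays, per-character (font index, code) pairs)."""
    order = {}
    for ch in text:
        if ch not in order:
            order[ch] = len(order)        # index in order of first appearance
    chars = list(order)
    differences = []
    for start in range(0, len(chars), 256):
        chunk = chars[start:start + 256]  # one font per 256 unique characters
        names = " ".join("/" + glyph_names[ch] for ch in chunk)
        differences.append("[0 " + names + "]")
    codes = [(order[ch] // 256, order[ch] % 256) for ch in text]
    return differences, codes

differences, codes = encode_for_type1("abcba")
print(differences)
print(codes)
```

For "abcba" there is a single font whose differences array starts at code 0 with /a /b /c, and the codes 0, 1, 2, 1, 0 correspond to the string (\000\001\002\001\000) in the text.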
If your teacher/professor wants anything but Type 1 fonts in a short period of time, s/he has unrealistic expectations for the students and/or low expectations for the quality of output. You should ask whether you are required to handle CID fonts and if you are, then your professor is at the very least a sadist. It took me, a seasoned professional, about 4 days to write a TrueType parser for extracting widths. I had the advantage of (1) using a managed language (C#) which cut down on concerns that will be biting your ass in C and was also able to use reflection to automate parsing and (2) when I don't have interruptions, I write solid code about 10-20 times faster than a typical student, so my 32 hours would translate into 320 student hours, more or less (then again, my code has different constraints than yours - it has to consume any crap font it gets gracefully), so let's call it 200 or less if you're allowed to steal something like stb. That's just for getting one particular element in the font descriptor.

COBOL expression as index in table array

Just a short, quick question.
How do you use an expression as an index into a COBOL array?
For example, if my index is k=1, I would like to do the following to reach the element at k=2:
element(k+1)
Unfortunately this is not acceptable in COBOL and I would like to know if there is any alternative?
I'm not sure why you think that won't work, as long as you put it in a Cobol statement.
ID DIVISION.
PROGRAM-ID. SUBMOD.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  A-NICELY-NAMED-TABLE.
    05  FILLER OCCURS 2 TIMES.
        10  A-NICELY-NAMED-ENTRY PIC X.
01  ANOTHER-PIECE-OF-DATA PIC X VALUE SPACE.
01  A-NICELY-NAMED-SUBSCRIPT BINARY
                             PIC 9(4).
LINKAGE SECTION.
01  L-INPUT PIC X(4).
01  L-TO-HEX PIC BXBXBXBX.
PROCEDURE DIVISION USING L-INPUT L-TO-HEX.
    MOVE "A" TO A-NICELY-NAMED-ENTRY ( 1 )
    MOVE "B" TO A-NICELY-NAMED-ENTRY ( 2 )
    MOVE 1 TO A-NICELY-NAMED-SUBSCRIPT
    IF A-NICELY-NAMED-ENTRY ( A-NICELY-NAMED-SUBSCRIPT + 1 )
        EQUAL TO "B"
        MOVE A-NICELY-NAMED-ENTRY
            ( A-NICELY-NAMED-SUBSCRIPT + 1 )
            TO ANOTHER-PIECE-OF-DATA
    END-IF
    DISPLAY ">" ANOTHER-PIECE-OF-DATA "<"
    GOBACK
    .
Output is:
>B<
With reference to your comment, it is not a "strictness" thing by any means. It is that "+ 1" is one thing, a "relative subscript", and "+1" is something else: it is a second subscript.
Depending on your compiler, you may be able to code:
MOVE ELEMENT(k++1) ...
You may have to put up with some moaning from the compiler, and I suppose in some it may not work. It would, however, be a horrible way to write Cobol.
I'd suggest not using names like ELEMENT. Too likely at some point in the future to appear as a "reserved word" for Cobol. Don't be shorthand. Use good names, use effective spacings. It'll help you understand your program a little later, and will help anyone else who has to look at it.

How Cobol dynamic call works using group as program identifier?

I have the following call statement:
CALL PROG USING
     DFH
     L000
     ZONE-E
     ZONE-S.
This call is dynamic and uses PROG.
PROG is a group defined as:
01 XX00.
   10 PROG.
      15 XX00-S06 PICTURE X(6)
                  VALUE SPACE.
      15 XX00-S02 PICTURE X(2)
                  VALUE SPACE.
   10 XX00-S92 PICTURE 9(02)
               VALUE ZERO.
   10 XX00-S91 PICTURE 9(1)
               VALUE ZERO.
   10 XX00-S9Z PICTURE 9(1)
               VALUE ZERO.
   10 XX00-9B0 PICTURE X(05)
               VALUE SPACE.
   10 XX00-0B0 PICTURE X(02)
               VALUE SPACE.
   10 XX00-BB1 PICTURE X(01)
               VALUE SPACE.
   10 XX00-SFN PICTURE X(07)
I cut it here, but there are many more fields after this...
It seems that the actual program name to use is stored in:
XX00-S06
and
XX00-S02
I also have other cases where the name is spread over 3 or 4 fields, and the program name length is not always 8.
So my question is: how does COBOL know where to pick the right program name in the group? What are the resolution rules?
Configuration: I use the Micro Focus Net Express compiler and the environment is UniKix.
Dynamic call rules in COBOL are fairly simple. Given something like:
CALL WS-NAME USING...
COBOL will resolve the program name currently stored in WS-NAME against the load module libraries available to it, using a linear search. The first load module entry point name that matches WS-NAME is used.
It doesn't matter how complex, or simple, the definition of WS-NAME is. The total length used for the name is whatever the length of WS-NAME is. For example:
01 WS-NAME.
   05 WS-NAME-FIRST-PART PIC X(3).
   05 WS-NAME-MIDDLE-PART PIC X(2).
   05 WS-NAME-LAST-PART PIC X(3).
WS-NAME is composed of 3 subordinate fields giving a total of 8 characters. You can populate these individually or just move something into WS-NAME as a whole. If the length of a name is less than 8 characters, the trailing characters will be set to spaces in any longer receiving field. For example:
01 WS-SHORT-NAME.
   05 WS-SHORT-NAME-FIRST-PART PIC X(4) VALUE 'AAAA'.
   05 WS-SHORT-NAME-LAST-PART PIC X(2) VALUE 'BB'.
Here WS-SHORT-NAME is only 6 characters long. Moving WS-SHORT-NAME to any longer PIC X type variable, as in:
MOVE WS-SHORT-NAME TO WS-NAME
will result in WS-NAME taking on the value 'AAAABBbb' (the two trailing b characters stand for spaces). During the library search for a matching entry point name, the trailing spaces are not significant, so on the CALL statement you could use either:
CALL WS-NAME
or
CALL WS-SHORT-NAME
and they will resolve to the same entry point.
I am not sure what the length rules are for Micro Focus COBOL, but for IBM z/OS, dynamically called program names cannot exceed 8 characters (if they do, the name is truncated to 8 characters).
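The rule described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as stated in this answer (the group is one contiguous string, trailing spaces are ignored, and there is an 8-character limit as on IBM z/OS). It is not a description of what any particular runtime actually does, and `effective_program_name` is an invented name.

```python
def effective_program_name(*fields, max_len=8):
    """Model the name a dynamic CALL sees for a group item.

    `fields` are the contents of the subordinate items, in storage order.
    `max_len` reflects the 8-character limit mentioned for IBM z/OS; other
    environments may allow longer names.
    """
    raw = "".join(fields)            # a group is just its fields back to back
    return raw[:max_len].rstrip(" ") # truncate, then ignore trailing spaces

# WS-SHORT-NAME ('AAAA' + 'BB') and WS-NAME after the MOVE ('AAAABB' + 2 spaces)
print(effective_program_name("AAAA", "BB"))
print(effective_program_name("AAA", "AB", "B  "))
```

Both calls produce the same name, which matches the statement that CALL WS-NAME and CALL WS-SHORT-NAME resolve to the same entry point.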
I will add a little more to NeilB's answer, with information specific to Micro Focus COBOL.
FYI: PROGRAM-ID names and entry points are restricted to 30-31 characters (check the "System Limits and Programming Restrictions" section in your docs).

Resources