Extracting Arabic proper names from a text using Stanford Parser

I am trying to extract Arabic proper names from a text using the Stanford Parser.
For example, if I have the input sentence:
تكريم سعد الدين الشاذلى
then the Arabic Stanford parser produces the tree:
(ROOT (NP (NN تكريم) (NP (NNP سعد) (DTNNP الدين) (NNP الشاذلى))))
I want to extract the proper name:
سعد الدين الشاذلى
which has the sub-tree:
(NP (NNP سعد) (DTNNP الدين) (NNP الشاذلى))
I have tried this: similar question
but there is something wrong in this line:
List<TaggedWord> taggedWords = (Tree) lp.apply(str);
The error comes from assigning a Tree to a List<TaggedWord>.
Another thing I did not understand is where I could use the suggested taggedYield() function.
Any ideas, please?

This is pretty basic Java with respect to the library, but what you want is:
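import java.util.List;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.trees.Tree;

// lp is the parser instance and str the input sentence, as in the question.
Tree tree = lp.apply(str);
// taggedYield() returns the leaves of the tree paired with their POS tags.
List<TaggedWord> taggedWords = tree.taggedYield();
for (TaggedWord tw : taggedWords) {
    // contains("NNP") also accepts the DTNNP tag.
    if (tw.tag().contains("NNP")) {
        System.err.println(tw.word());
    }
}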

How to get the node name with Graphviz and libcgraph?

I'm trying to update some old code which used to work with Graphviz 2.26 and iterated over all the nodes of a graph and did something with their names:
for (Agnode_t *n = agfstnode(graph); n; n = agnxtnode(graph, n)) {
... use n->name ...
}
However, in recent (2.30+?) versions of Graphviz, the cgraph library is used for node representation, and its Agnode_t struct doesn't have a name field.
I know about the agnode() function, which allows looking up a node by name, but there doesn't seem to be any function to go in the other direction. Am I missing something, or is there really no way to access the name of an existing node with cgraph?
You can use the function agnameof, which is listed in the "Generic Objects" section of the cgraph manpage. In the loop above, agnameof(n) takes the place of n->name:
char *agnameof(void*);

How to visualize a graph made with igraph in C?

I started to learn igraph in C and I was wondering how I can visualize a graph made with this library. I've seen that with igraph in R one just uses the plot function and the graph is plotted, but if I use C, should I print the graph to a file and then use another program to visualize it, or what is the usual way?
Thanks!
edit: This kind of graph.
Follow the Unix philosophy, and have your program output the description of the graph (in text format, or in an easily processed form if no pure text format is easily available).
(Note that this also applies to image formats; the NetPBM (.pnm, or .pbm, .pgm, and .ppm) formats are easy to generate in pure C (to e.g. standard output), and if necessary, the NetPBM tools can be used to convert to any other image format you might wish.)
For example, if your program outputs
graph {
rankdir=LR;
"6" -- "4";
"4" -- "5";
"3" -- "4";
"3" -- "2";
"5" -- "2";
"5" -- "1";
"2" -- "1";
}
then redirecting the output to e.g. output.dot and running dot -Tx11 output.dot will display a graph similar to the one shown in the Wikipedia Graph article.
You mention that you are using igraph, and luckily this library already supports writing graphs in the DOT format. See the igraph_write_graph_dot() function.
The DOT language is specified here, but it really is quite simple. -- denotes an undirected edge, and -> a directed edge. The rankdir=LR; line is a graph attribute, and tells DOT that it should try to order the nodes seen from left to right. The default is from top to bottom. You can add node attributes too, for example "6" [ label="Six" ]; would change the label of node "6" to Six. Edge attributes work exactly the same way; so using "2" -- "1" [ taillabel="Z" ]; adds "Z" near node "2" end of the edge between nodes "2" and "1". It is best to quote node names, even though the quotes are not necessary if the node name starts with a letter and does not match a graph attribute name.
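As a rough sketch of how a C program might emit such a description from a plain edge list: the struct edge type and the write_dot function below are names made up for this illustration, not part of igraph or Graphviz.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical edge type: a pair of integer node ids. */
struct edge {
    int from;
    int to;
};

/* Write the edges as a DOT graph. Pass directed != 0 for a digraph
   with "->" edges, or 0 for an undirected graph with "--" edges. */
void write_dot(FILE *out, const struct edge *edges, size_t count, int directed)
{
    const char *arrow = directed ? "->" : "--";

    fprintf(out, "%s {\n", directed ? "digraph" : "graph");
    fprintf(out, "  rankdir=LR;\n");
    for (size_t i = 0; i < count; i++) {
        /* Node names are quoted, as recommended above. */
        fprintf(out, "  \"%d\" %s \"%d\";\n", edges[i].from, arrow, edges[i].to);
    }
    fprintf(out, "}\n");
}
```

Calling write_dot(stdout, edges, count, 0) and piping the program's output to dot -Tx11 should then display the graph.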
Here is a useful hint, when printing trees or linked lists:
Use %p (a pointer to the node) as the node name, and label="value" to set the visible label of the node to value. For example, if you have
struct node {
    struct node *left;
    struct node *right;
    int value;
};
then a simple function pair,
void print_tree_recursive(FILE *out, struct node *curr)
{
    /* Describe the node itself: pointer as name, value as visible label. */
    fprintf(out, " \"%p\" [ label=\"%d\" ];\n", (void *)curr, curr->value);
    if (curr->left) {
        print_tree_recursive(out, curr->left);
        fprintf(out, " \"%p\" -> \"%p\" [ taillabel=\"L\" ];\n", (void *)curr, (void *)curr->left);
    }
    if (curr->right) {
        print_tree_recursive(out, curr->right);
        fprintf(out, " \"%p\" -> \"%p\" [ taillabel=\"R\" ];\n", (void *)curr, (void *)curr->right);
    }
}
void print_tree(FILE *out, struct node *tree)
{
    fprintf(out, "digraph {\n");
    if (tree)
        print_tree_recursive(out, tree);
    fprintf(out, "}\n");
    fflush(out);
}
will print a nice directed graph of any tree. It is easy to modify to print linked lists (both singly and doubly linked). Note how the helper function describes the node first (the fprintf with label=), and the edges separately (the fprintfs with taillabel=).
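For a singly linked list, the same idea might look like the sketch below. The struct list_node type and the print_list function are hypothetical names used only for this example, and rankdir=LR is just a layout preference so that the list reads left to right.

```c
#include <stdio.h>

/* Hypothetical singly linked list node. */
struct list_node {
    struct list_node *next;
    int value;
};

/* Print the list as a DOT digraph: one node statement per element,
   then one edge to its successor, if it has one. */
void print_list(FILE *out, struct list_node *head)
{
    fprintf(out, "digraph {\n");
    fprintf(out, "  rankdir=LR;\n");
    for (struct list_node *curr = head; curr; curr = curr->next) {
        /* The pointer is the node name, the value is the visible label. */
        fprintf(out, "  \"%p\" [ label=\"%d\" ];\n", (void *)curr, curr->value);
        if (curr->next)
            fprintf(out, "  \"%p\" -> \"%p\";\n", (void *)curr, (void *)curr->next);
    }
    fprintf(out, "}\n");
    fflush(out);
}
```

A doubly linked list would add a second edge per node for the prev pointer.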
If you print the graph to standard output, you can either redirect the output to a file and display or convert it using dot -Tformat filename, or you can pipe the output directly to | dot -Tx11 to see the generated graph.
I frequently use the Graphviz DOT format for checking whether my mental picture of data structure linkage matches the reality. I find it an extremely useful tool, and keep recommending it for anyone working with complex data structures.
To plot directed graphs try GraphViz (https://www.graphviz.org).
Alternatively, you could use a tool like Gephi (https://gephi.org) if you are willing to write the data into a file in a manner compliant with one of their supported formats (https://gephi.org/users/supported-graph-formats/). GML looks pretty straightforward.

How to get values from a config file

I have the following Config.cfg
[DD]
user=**
password=***
database=***
IPServidor=****
port=***
[Controller]
Control1=8
Temp=5
Hum=7
Link=8
Volt=9
[Controller]
Control2=10
Temp=5
Hum=7
Link=8
Volt=9
I would like to read only the controller values and print them to the screen like
Controller_8: 5,7,8,9
I do not want to use libconfig or glib because I get errors about undefined functions. I installed them and I have the headers, but I do not know why it does not work, so I want another solution. My first thought is to use strchr to find the lines I want (ignoring the [DD] section in my case) and strtok to extract only the values of Temp, Hum, Link and Volt:
char buffer1[100];
FILE *f = fopen("/home/pi/Desktop/Config.cfg","r");
while(fgets(buffer1, sizeof(buffer1), f))
{
printf("%s",buffer1);
char *pos1 = strchr(buffer1,'Controller');
if (pos1)
{
item = strtok (buffer1,"Control");
printf("Results: %s\n", buffer1);
}
}
The above code is not correct; it is just a thought. Is there a better way?
Don't try parsing ini files, use some existing library.
Ini file parsing is included in a number of "frameworks", for instance in Gtk+ or on Windows. If you can't access those, you can still use some standalone library, for instance: http://ndevilla.free.fr/iniparser/
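If linking a library really is not an option, a hand-rolled reader for this one file can stay small. The sketch below is not a general ini parser: it assumes every key=value line inside a [Controller] section holds an integer, and that the controller number is the value of the key starting with "Control". The names struct controller and report_controllers are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Values collected from one [Controller] section (hypothetical type). */
struct controller {
    int id, temp, hum, link, volt;
};

/* Print one section in the form "Controller_8: 5,7,8,9". */
static void print_controller(FILE *out, const struct controller *c)
{
    fprintf(out, "Controller_%d: %d,%d,%d,%d\n",
            c->id, c->temp, c->hum, c->link, c->volt);
}

/* Read an ini-style stream and report every [Controller] section.
   Returns the number of sections reported. */
int report_controllers(FILE *in, FILE *out)
{
    char line[128], key[64];
    struct controller c;
    int in_controller = 0, count = 0, value;

    memset(&c, 0, sizeof c);
    while (fgets(line, sizeof line, in)) {
        if (line[0] == '[') {
            /* Any section header closes the section before it. */
            if (in_controller) {
                print_controller(out, &c);
                count++;
            }
            in_controller = (strncmp(line, "[Controller]", 12) == 0);
            memset(&c, 0, sizeof c);
        } else if (in_controller &&
                   sscanf(line, " %63[^= ] = %d", key, &value) == 2) {
            if (strncmp(key, "Control", 7) == 0)
                c.id = value;
            else if (strcmp(key, "Temp") == 0)
                c.temp = value;
            else if (strcmp(key, "Hum") == 0)
                c.hum = value;
            else if (strcmp(key, "Link") == 0)
                c.link = value;
            else if (strcmp(key, "Volt") == 0)
                c.volt = value;
        }
    }
    /* The last section has no header after it, so flush it here. */
    if (in_controller) {
        print_controller(out, &c);
        count++;
    }
    return count;
}
```

With the file from the question opened as f, report_controllers(f, stdout) would print one line per [Controller] section and skip [DD] entirely.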

Trying to make match on a rule that uses "recursive" identifier in flex

I have this line:
0, 6 -> W(1) L(#);
or
\# -> #shift_right R W(1) L
I have to parse this line with flex, and take every element from every part of the arrow and put it in a list. I know how to match simple things, but I don't know how to match multiple things with the same rule. I'm not allowed to increase the limit for rules. I have a hint: parse the pieces, pieces will then combine, and I can use states, but I don't know how to do that, and I can't find examples on the net. Can someone help me?
So, here an example:
{
a -> W(b) #invert_loop;
b -> W(a) #invert_loop;
-> L(#)
}
When this section begins I have to create a structure for each line: what is on the left of -> goes into a vector (those are parameters), and the right side goes into a list, where each term is itself another structure. For the terms on the right side I wrote rules such as:
writex W([a-zA-Z0-9.#]) for W(anything).
So I need to parse these lines so that I can put the parameters and the structures into the big structure. Something like this (for the first line):
new bigStruct with param = a and list of struct = W(anything), #invert (a notation for a reference to another structure)
So what I need to know is how to parse these lines so that I can create and fill these bigStructs, reusing the rules for the simple structures (I have all I need for those structures, but I don't know how to parse so that I can use these methods).
Sorry for my English; I hope this time I was clearer about what I want.
Last-minute edit: I have matched the whole line with one rule and then work on it with strtok. Is there a way to use the previous rules to see what type of structure I have to create? I mean, instead of writing lots of ifs, to use writex W([a-zA-Z0-9.#]) to know that I have to create that kind of structure?
OK, let's see how this snippet works for you:
/* These are exclusive start conditions (%x), so their rules do not overlap; for inclusive ones, use %s. */
%x dataStructure
%x addRules
%%
<dataStructure>"->" { BEGIN addRules; }
\{ { BEGIN dataStructure; }
<addRules>; { BEGIN dataStructure; }
<dataStructure>\} { BEGIN INITIAL; }
<addRules>\} { BEGIN INITIAL; }
<dataStructure>[^,\-} \t\n]+ { ECHO; } /* outputs each comma-separated token; stops before "->" and "}" */
<dataStructure>. { } /* ignore anything else */
<dataStructure>\n { } /* ignore newlines */
<addRules>[^;} \t\n]+ { ECHO; } /* outputs each space-separated rule; stops before ";" and "}" */
<addRules>. { } /* ignore anything else */
<addRules>\n { } /* ignore newlines */
%%
I'm not entirely sure what it is you want. Edit your original post to include the contents of your comments, with examples, and please structure your English better. If you can't explain what you want without contradicting yourself, I can't help you.

apr-utils apr_strmatch regex syntax

I want to port the following regex from python:
HASH_REGEX = re.compile("([a-fA-F0-9]{32})")
if HASH_REGEX.match(target):
print "We have match"
to C with apr-utils apr_strmatch function:
pattern = apr_strmatch_precompile(pool, "([a-fA-F0-9]{32})", 0);
if (NULL != apr_strmatch(pattern, target, strlen(target))) {
printf("We have match!\n");
}
The problem is that I don't understand which regex syntax (or dialect) the apr-utils apr_strmatch function uses. Searching for documentation and examples turned up nothing.
Thanks in advance for your advice.
apr_strmatch doesn't do regular expression matching at all; it does ordinary substring search using the Boyer–Moore–Horspool algorithm (see source).
For RE matching in C, try PCRE.
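PCRE is one option. On POSIX systems the regcomp/regexec functions from <regex.h> are another, and the sketch below uses those rather than PCRE. The leading ^ is there because Python's re.match only matches at the start of the string; the function name starts_with_hash is made up for the example.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if target begins with 32 hexadecimal digits, 0 if it does
   not, and -1 if the pattern could not be compiled. */
int starts_with_hash(const char *target)
{
    regex_t re;
    int matched;

    /* REG_EXTENDED enables the {32} interval syntax; REG_NOSUB skips
       capture bookkeeping, since only a yes/no answer is needed. */
    if (regcomp(&re, "^[a-fA-F0-9]{32}", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, target, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Compiling the pattern on every call keeps the sketch short; code that checks many strings would compile it once and reuse the regex_t.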
