libxml2 and XPath traversing children and siblings in ANSI C - c

I have done a fair bit of XML work in Perl and now I need to do it in ANSI C for a project. Here's the code I wrote, with a snippet of the XML. I have had some success, but I am having problems getting siblings. I am sure it's super easy, but I just can't get it. There are two functions: one that simply gets the node set (copied directly from xmlsoft.org), and a second one that is mine.
xmlXPathObjectPtr getnodeset (xmlDocPtr doc, xmlChar *xpath){
xmlXPathContextPtr context;
xmlXPathObjectPtr result;
context = xmlXPathNewContext(doc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
return NULL;
}
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
return NULL;
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
return NULL;
}
return result;
}
void reader(const char *xmlDoc, char *xpath)
{
xmlDocPtr doc;
xmlXPathObjectPtr xpathresult;
xmlNodeSetPtr node;
xmlNodeSetPtr node2;
xmlChar *title;
int cnt;
// parse feed in memory to xml object
doc = xmlReadMemory(xmlDoc,strlen(xmlDoc),"noname.xml",NULL,0);
if (!doc) criterr("Error parsing xml document");
// get xpath node set (ttn retrieves the value from the token table)
xpathresult = getnodeset(doc, ( xmlChar * ) xpath);
if (xpathresult) {
node = xpathresult->nodesetval;
printf("Content-type: text/html\n\n");
for (cnt=0;cnt<node->nodeNr; cnt++) {
title = xmlNodeListGetString(doc, node->nodeTab[cnt]->xmlChildrenNode,1);
printf("%d) title= %s<br/>\n",cnt,title);
xmlFree(title);
}
xmlXPathFreeObject(xpathresult);
} else {
criterr("Xpath failed");
}
// free the document exactly once, whichever branch ran
xmlFreeDoc(doc);
xmlCleanupParser();
criterr("Success");
}
and the xml snippet
<item>
<title>this is the title</title>
<link>this is the link</link>
<description>this is the description</description>
</item>
If I use an XPath like //item/title I get all the titles. What I really want is to select each item and then, inside the node->nodeNr loop, easily get its title, link and description, since I have hundreds of 'item' blocks. I'm just not sure how to get the children or siblings of that block easily.

Use xmlNextElementSibling. To find functions like this, go to the Tree API documentation and search for "sibling".
Here is your loop, now also getting the link:
for (cnt=0;cnt<node->nodeNr; cnt++) {
xmlNodePtr titleNode = node->nodeTab[cnt];
// titleNode->next gives an empty text node, so skip to the next element instead:
xmlNodePtr linkNode = xmlNextElementSibling(titleNode);
xmlChar *link = NULL; // title is already declared in reader()
title = xmlNodeListGetString(doc, titleNode->xmlChildrenNode,1);
// linkNode is NULL when the title has no following element sibling
if (linkNode) link = xmlNodeListGetString(doc, linkNode->xmlChildrenNode,1);
printf("%d) title= %s<br/>, link=%s\n",cnt,title,link ? (char *)link : "");
xmlFree(title);
xmlFree(link);
}
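titleNode->next can also be made to point directly at the link element; see "How to get these XML elements with libxml2?".
To get the children, call xmlFirstElementChild on the parent node and then loop with xmlNextElementSibling (or follow node->next and skip the non-element nodes yourself).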

Related

libxml2 get offset into XML text of node

I need to know at which offset into an XML string a specific arbitrary node somewhere in the DOM can be found, after xmlReadMemory was used to build the DOM. The problem is that I can't figure out where to get the xmlParserCtxtPtr to pass as the first argument to xmlParserFindNodeInfo, because my entire parsing process yields no such context, only an xmlDoc.
The following code worked for me (the libxml2 documentation leaves a lot to be desired; I had to download the source code and dig through the library until I understood enough to hack this together). The key is:
xmlSetFeature(ctxt, "gather line info", (void *)&v);
Here is some code to illustrate:
const char *xml = ...
xmlParserCtxt *ctxt = NULL;
xmlDoc *doc = NULL;
if (!(ctxt = xmlCreateDocParserCtxt((const unsigned char *)xml)))
return -1;
int v = 1;
xmlSetFeature(ctxt, "gather line info", (void *)&v);
if (xmlParseDocument(ctxt) == -1)
{
xmlFreeParserCtxt(ctxt);
return -1;
}
else
{
if ((ctxt->wellFormed) || ctxt->recovery)
doc = ctxt->myDoc;
else
{
xmlFreeParserCtxt(ctxt);
return -1;
}
}
// use doc to get a node and then xmlParserFindNodeInfo(ctxt, node)
…
xmlFreeParserCtxt(ctxt);

Returning error when traversing through a tree

I am trying to calculate the frequency of each node as I add them to the tree, instead of inserting a new element. For some reason, when comparing a new key to every element in the current tree, the if statement will not return 1 if they are identical. BUT the function will still add 1 to the frequency of the existing node. This is very puzzling to me, as I don't know why it would skip over the return 1 and continue searching through the tree. Thank you in advance for any help or advice.
struct:
typedef struct node {
char* key;
struct node *left;
struct node *right;
int height;
int frequency;
}node;
This is my parsing function:
while(fgets(str, 100, textFile)) {
token = strtok(str," \n");
while (token != NULL)
{
key = strdup(token);
if((sameFrequency(root, key)==1)&&root!=NULL) {
printf("%s", key);
free(key);
token = strtok (NULL, " \n");
}
else {
root = insert(root, key);
//printf("%s\n", key);
free(key);
token = strtok (NULL, " \n");
}
}
if(ferror(textFile))
{
printf("you done messed up a-a-ron");
return(0);
}
}
Function to check the frequency of each node:
int sameFrequency(node *node, char* key) {
if (node != NULL) {
if(strcmp(key, node->key)==0){ //This statement is true in some cases, but will not return the 1
node->frequency = node->frequency+1;
printf("%d\n",node->frequency);
return 1;
}
sameFrequency(node->left, key);
sameFrequency(node->right, key);
}
else return 0;
}
Input would look something like this:
wrn69 flr830 flr662 flr830 flr830
flr231
The output (after printing in preOrder):
key: wrn69, frequency: 1
key: flr830, frequency: 3
key: flr662, frequency: 1
key: flr231, frequency: 1
key: flr830, frequency: 1
key: flr830, frequency: 1
I want this to print everything shown, but I don't want the same key to be inserted into the tree again, just increment its frequency by 1.
TL;DR: The function skips over the return value but still runs the code in the if statement. I have no idea what's wrong, even after debugging.
I'm not sure what your code is trying to do, since you do not define your node struct; however, your function int sameFrequency(node *node, char* key) has an obvious bug: not all code paths return a value. Reformatting a bit for clarity, you can see that if strcmp(key, node->key)!=0 then the return value is undefined:
int sameFrequency(node *node, char* key) {
if (node != NULL) {
if(strcmp(key, node->key)==0){
node->frequency = node->frequency+1;
printf("%d\n",node->frequency);
return 1;
}
else {
sameFrequency(node->left, key);
sameFrequency(node->right, key);
// Continue on out of the "if" statements without returning anything.
}
}
else {
return 0;
}
// NO RETURN STATEMENT HERE
}
My compiler generates a warning for this:
warning C4715: 'sameFrequency' : not all control paths return a value
Surely yours must be doing so as well, unless you intentionally disabled them. Such warnings are important, and should always be cleared up before finishing your code.
I'm guessing you want to do something like this, perhaps?
int sameFrequency(node *node, char* key) {
if (node != NULL) {
if(strcmp(key, node->key)==0){
node->frequency = node->frequency+1;
printf("%d\n",node->frequency);
return 1;
}
else {
int found;
if ((found = sameFrequency(node->left, key)) != 0)
return found;
if ((found = sameFrequency(node->right, key)) != 0)
return found;
return 0;
}
}
else {
return 0;
}
}
This clears the compiler warning.
Incidentally, the following if statement is probably in the wrong order:
if((sameFrequency(root, key)==1)&&root!=NULL) {
Since the operands of && in C are evaluated left to right, and the right operand is skipped when the left one is false, the following makes more sense:
if(root!=NULL && (sameFrequency(root, key)==1)) {
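If the tree is ordered by key, the search and the insert can be combined into one descent, so a duplicate key never reaches the insert path. The sketch below assumes strcmp ordering and leaves out the height field and any rebalancing, because the question's insert function is not shown. The names insertOrCount and frequencyOf are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char *key;
    struct node *left;
    struct node *right;
    int frequency;
} node;

/* Hypothetical helper: inserts key, or bumps its frequency if it is
 * already present. Assumes the tree is ordered by strcmp(). */
node *insertOrCount(node *root, const char *key)
{
    if (root == NULL) {
        node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->key = malloc(strlen(key) + 1);   /* the node owns its own copy */
        if (n->key == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->key, key);
        n->left = n->right = NULL;
        n->frequency = 1;
        return n;
    }
    int cmp = strcmp(key, root->key);
    if (cmp == 0)
        root->frequency++;                  /* duplicate: count it, no new node */
    else if (cmp < 0)
        root->left = insertOrCount(root->left, key);
    else
        root->right = insertOrCount(root->right, key);
    return root;
}

/* Returns the stored frequency of key, or 0 if key is not in the tree. */
int frequencyOf(const node *root, const char *key)
{
    while (root != NULL) {
        int cmp = strcmp(key, root->key);
        if (cmp == 0)
            return root->frequency;
        root = (cmp < 0) ? root->left : root->right;
    }
    return 0;
}
```

With this shape the parsing loop reduces to root = insertOrCount(root, token); for each token. Because each node copies the key, the caller does not need strdup and cannot free a key the tree still points at.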

Parse a GML file (from a shp one) in C

Using ogr2ogr, I convert a shp file into a GML one.
Then I want to parse this file in my C function.
sprintf(buffer, "PATH=/Library/Frameworks/GDAL.framework/Programs:$PATH:/usr/local/bin ogr2ogr -f \"GML\" files/Extraction/coord.gml %s", lectureFichier);
system(buffer);
sprintf(buff, "sed \"2s/.*/\\<ogr:FeatureCollection\\>/\" files/Extraction/coord.gml | sed '3,6d' > files/Extraction/temp.xml");
system(buff);
FILE *fichier = NULL;
FILE *final = NULL;
fichier = fopen("files/Extraction/temporaire.csv", "w+");
xmlDocPtr doc;
xmlChar *xpath = (xmlChar*) "//keyword";
xmlNodeSetPtr nodeset;
xmlXPathContextPtr context;
xmlXPathObjectPtr result;
int i;
doc = xmlParseFile("files/Extraction/temp.xml");
When I execute the program, I get an error for every line because the namespace prefixes (gml or ogr) are not defined.
Example of temp.xml
<ogr:FeatureCollection>
<gml:boundedBy>
<gml:Box>
<gml:coord><gml:X>847001.4933830451</gml:X><gml:Y>6298087.567566251</gml:Y></gml:coord>
<gml:coord><gml:X>859036.8755179688</gml:X><gml:Y>6309720.622619263</gml:Y></gml:coord>
</gml:Box>
</gml:boundedBy>
<gml:featureMember>
Do you have any idea how to make the program recognize these new namespaces?
EDIT:
xmlDocPtr doc;
xmlChar *xpath = (xmlChar*) "//keyword";
xmlNodeSetPtr nodeset;
xmlXPathContextPtr context;
xmlXPathRegisterNs(context, "ogr", "http://ogr.maptools.org/");
xmlXPathRegisterNs(context, "gml", "http://www.opengis.net/gml");
xmlXPathObjectPtr result;
int i;
doc = xmlParseFile("files/Extraction/temp.xml");
if (doc == NULL ) {
fprintf(stderr,"Document not parsed successfully. \n");
return 0;
}
context = xmlXPathNewContext(doc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
return 0;
}
xpath = "//gml:coordinates/text()";
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
return 0;
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
return 0;
}
After adding what you've given me, I get a segfault and I really don't know where it comes from, but it seems I'm getting closer to the answer.
Do you have any idea where I went wrong?
I would think you just need to add the namespace declarations to the FeatureCollection element, so it looks like this:
<ogr:FeatureCollection
xmlns:ogr="http://ogr.maptools.org/"
xmlns:gml="http://www.opengis.net/gml">
You can presumably do that in your sed script.
When querying namespaced elements with XPath, you need to register your namespaces first, on a context that has already been created with xmlXPathNewContext. So you might need to do something like this:
xmlXPathRegisterNs(context, BAD_CAST "ogr", BAD_CAST "http://ogr.maptools.org/");
xmlXPathRegisterNs(context, BAD_CAST "gml", BAD_CAST "http://www.opengis.net/gml");
Then, when you want to query a gml or ogr element, you would do so like this:
xpath = BAD_CAST "//gml:coordinates/text()";
xmlXPathEvalExpression(xpath, context);

How to add a xml node constructed from string in libxml2

I am using libxml2 to encode data in an XML file. My data contains tags, so it includes the characters "<" and ">". When it is written to XML, these characters are converted into "&lt;" and "&gt;". Is there any way to solve this problem? I want those tags to be XML nodes when decoding the file, so CDATA is not a solution. Please suggest a solution. Thanks.
Example Code:
xmlNewChild(node, NULL, (xmlChar *)"ADDRESS", (xmlChar *)"<street>Park Street</street><city>kolkata</city>");
and output of above code is:
<person>
<ADDRESS>&lt;street&gt;Park Street&lt;/street&gt;&lt;city&gt;Kolkata&lt;/city&gt;</ADDRESS>
If you want a string to be treated as XML, you should parse it with xmlReadMemory and obtain an xmlDoc from it. That can be useful for larger strings, but usually the document is built with single-step instructions, as in Joachim's answer. Here I present an xmlAddChildFromString function that does it the string way.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
/// Returns 0 on failure, 1 otherwise
int xmlAddChildFromString(xmlNodePtr parent, const xmlChar *newNodeStr)
{
// wrap the fragment in a dummy <a> element so that it has a single root
size_t len = strlen((const char *)newNodeStr);
char *newNodeStrWrapped = calloc(len + 10, 1);
if (!newNodeStrWrapped) return 0;
strcat(newNodeStrWrapped, "<a>");
strcat(newNodeStrWrapped, (const char *)newNodeStr);
strcat(newNodeStrWrapped, "</a>");
xmlDocPtr newDoc = xmlReadMemory(
newNodeStrWrapped, (int)strlen(newNodeStrWrapped),
NULL, NULL, 0);
free(newNodeStrWrapped);
if (!newDoc) return 0;
xmlNodePtr newNode = xmlDocCopyNode(
xmlDocGetRootElement(newDoc),
parent->doc,
1);
xmlFreeDoc(newDoc);
if (!newNode) return 0;
xmlNodePtr addedNode = xmlAddChildList(parent, newNode->children);
if (!addedNode) {
xmlFreeNode(newNode);
return 0;
}
newNode->children = NULL; // Thanks to milaniez
newNode->last = NULL; // for fixing
xmlFreeNode(newNode); // the memory leak.
return 1;
}
int
main(int argc, char **argv)
{
xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
xmlNodePtr root = xmlNewNode(NULL, BAD_CAST "root");
xmlDocSetRootElement(doc, root);
xmlAddChildFromString(root,
BAD_CAST "<street>Park Street</street><city>kolkata</city>");
xmlDocDump(stdout, doc);
xmlFreeDoc(doc);
return(0);
}
You have to call xmlNewChild in a chain: one call for the parent node, then one call for each sub-node:
xmlNodePtr addressNode = xmlNewChild(node, NULL, (xmlChar *) "address", NULL);
xmlNewChild(addressNode, NULL, (xmlChar *) "street", (xmlChar *) "Park Street");
xmlNewChild(addressNode, NULL, (xmlChar *) "city", (xmlChar *) "Kolkata");
You can try the function xmlParseInNodeContext. It lets you parse raw XML in the context of a parent node and constructs a node that can be attached to that parent.
For example:
const char * xml = "<a><b><c>blah</c></b></a>";
xmlNodePtr new_node = NULL;
// we assume that 'parent' node is already defined
xmlParseInNodeContext(parent, xml, strlen(xml), 0, &new_node);
if (new_node) xmlAddChild(parent, new_node);
I'm now using the following code to inject XML text (possibly containing multiple elements) into an existing node. Thanks to Nazar and nwellnhof for the answer and for referring me here from my own question, "Injecting a string into an XML node without content escaping":
std::string xml = "<a>" + str + "</a>";
xmlNodePtr pNewNode = nullptr;
xmlParseInNodeContext(pParentNode, xml.c_str(), (int)xml.length(), 0, &pNewNode);
if (pNewNode != nullptr)
{
// add new xml node children to parent
xmlNode *pChild = pNewNode->children;
while (pChild != nullptr)
{
xmlAddChild(pParentNode, xmlCopyNode(pChild, 1));
pChild = pChild->next;
}
xmlFreeNode(pNewNode);
}
It takes the string (str), adds a surrounding element (<a>...</a>), parses the result using xmlParseInNodeContext, and then adds the children of the new node to the parent. It is important to add the children of the new node, not the new node itself, to avoid having <a>...</a> in the final XML.

libxml2 can't get content from node

I am using libxml in C and this is how I create xml:
xmlDocPtr createXmlSegment(char *headerContent, char *dataContent)
{
xmlDocPtr doc;
doc = xmlNewDoc(BAD_CAST "1.0");
xmlNodePtr rdt, header, data;
rdt = xmlNewNode(NULL, BAD_CAST "rdt-segment");
xmlSetProp(rdt, "id", "1");
header = xmlNewNode(NULL,BAD_CAST "header");
data = xmlNewNode(NULL, BAD_CAST "data");
xmlNodeSetContent(header, BAD_CAST headerContent);
xmlNodeSetContent(data, BAD_CAST dataContent);
xmlAddChild(rdt, header);
xmlAddChild(rdt, data);
xmlDocSetRootElement(doc, rdt);
return doc;
}
and this is how I want get data from that xml:
int getDataFromXmlSegment(char *data, char *header, char *content)
{
xmlDocPtr doc = xmlReadMemory(data, strlen(data), NULL, NULL, XML_PARSE_NOBLANKS);
xmlNode *rdt = doc->children;
xmlNode *headerNode = rdt->children;
header = (char *)headerNode->content;
content = (char *)headerNode->next->content;
printf("header: %s, content: %s", header, content);
return EXIT_SUCCESS;
}
When I test headerNode->name or ->next->name, the names are correct (they are the names of those elements), but content returns null. Does anyone know where the problem is?
Short answer: use xmlNodeGetContent.
Element nodes themselves don't contain content. Instead, they have children text nodes, and those contain content. The contents of an element may be a mix of text and tags, and this allows it to maintain the ordering, represent entities, etc.
You could iterate over the child nodes and look at THEIR content members, but xmlNodeGetContent does that for you, and will handle child tags and entities properly.

Resources