How to add an XML node constructed from a string in libxml2 - c

I am using libxml2 to encode data in an XML file. My data contains tags with "<" and ">". When it is converted to XML, these characters are escaped as "&lt;" and "&gt;". Is there any way to avoid this? I want those tags to be XML nodes when decoding the file, so CDATA is not a solution. Any suggestions would be appreciated. Thanks.
Example Code:
xmlNewChild(node, NULL, (xmlChar *)"ADDRESS", (xmlChar *)"<street>Park Street</street><city>kolkata</city>");
and output of above code is:
<person>
<ADDRESS>&lt;street&gt;Park Street&lt;/street&gt;&lt;city&gt;kolkata&lt;/city&gt;</ADDRESS>

If you want a string to be treated as XML, you should parse it and obtain an xmlDoc from it, using xmlReadMemory. This can be useful for larger strings, but usually the document is built with single-step instructions, as in Joachim's answer. Here I present an xmlAddChildFromString function that does the job from a string.
#include <stdio.h>
#include <stdlib.h> // calloc, free
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
/// Returns 0 on failure, 1 otherwise
int xmlAddChildFromString(xmlNodePtr parent, xmlChar *newNodeStr)
{
// room for the "<a>" and "</a>" wrapper plus the terminating NUL
char *newNodeStrWrapped = calloc(strlen((const char *)newNodeStr) + 10, 1);
if (!newNodeStrWrapped) return 0;
strcat(newNodeStrWrapped, "<a>");
strcat(newNodeStrWrapped, (const char *)newNodeStr);
strcat(newNodeStrWrapped, "</a>");
xmlDocPtr newDoc = xmlReadMemory(
newNodeStrWrapped, (int)strlen(newNodeStrWrapped),
NULL, NULL, 0);
free(newNodeStrWrapped);
if (!newDoc) return 0;
xmlNodePtr newNode = xmlDocCopyNode(
xmlDocGetRootElement(newDoc),
parent->doc,
1);
xmlFreeDoc(newDoc);
if (!newNode) return 0;
xmlNodePtr addedNode = xmlAddChildList(parent, newNode->children);
if (!addedNode) {
xmlFreeNode(newNode);
return 0;
}
newNode->children = NULL; // Thanks to milaniez
newNode->last = NULL; // for fixing
xmlFreeNode(newNode); // the memory leak.
return 1;
}
int
main(int argc, char **argv)
{
xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
xmlNodePtr root = xmlNewNode(NULL, BAD_CAST "root");
xmlDocSetRootElement(doc, root);
xmlAddChildFromString(root,
BAD_CAST "<street>Park Street</street><city>kolkata</city>");
xmlDocDump(stdout, doc);
xmlFreeDoc(doc);
return(0);
}

You have to call xmlNewChild in a chain: one call for the parent node, then one call for each sub-node:
xmlNodePtr addressNode = xmlNewChild(node, NULL, (xmlChar *) "address", NULL);
xmlNewChild(addressNode, NULL, (xmlChar *) "street", (xmlChar *) "Park Street");
xmlNewChild(addressNode, NULL, (xmlChar *) "city", (xmlChar *) "Kolkata");

You can try the function xmlParseInNodeContext. It parses raw XML in the context of a parent node and constructs a node list that can be attached to the parent.
For example:
const char * xml = "<a><b><c>blah</c></b></a>";
xmlNodePtr new_node = NULL;
// we assume that 'parent' node is already defined
xmlParseInNodeContext(parent, xml, strlen(xml), 0, &new_node);
if (new_node) xmlAddChild(parent, new_node);

I'm now using the following code to inject XML text (possibly containing multiple elements) into an existing node. Thanks to Nazar and nwellnhof for the answer and for referring me from my question (Injecting a string into an XML node without content escaping) to this one:
std::string xml = "<a>" + str + "</a>";
xmlNodePtr pNewNode = nullptr;
xmlParseInNodeContext(pParentNode, xml.c_str(), (int)xml.length(), 0, &pNewNode);
if (pNewNode != nullptr)
{
// add new xml node children to parent
xmlNode *pChild = pNewNode->children;
while (pChild != nullptr)
{
xmlAddChild(pParentNode, xmlCopyNode(pChild, 1));
pChild = pChild->next;
}
xmlFreeNode(pNewNode);
}
It takes the string (str), adds a surrounding element (<a>...</a>), parses the result using xmlParseInNodeContext, and then adds the children of the new node to the parent. It is important to add the children of the new node, and not the new node itself, to avoid having <a>...</a> in the final XML.

Related

libxml2 get offset into XML text of node

I need to know at which offset into an xml string a specific arbitrary node somewhere in dom can be found after xmlReadMemory was used to get dom. The problem is I can't figure out where to get the xmlParserCtxtPtr from to pass as first argument to xmlParserFindNodeInfo because my entire process of parsing yields no such context; only a xmlDoc.
The following code worked for me (the libxml2 documentation leaves a lot to be desired; I had to download the source code and dig into the library until I understood enough to hack this together). The key is:
xmlSetFeature(ctxt, "gather line info", (void *)&v);
Here is some code to illustrate:
const char *xml = ...
xmlParserCtxt *ctxt = NULL;
xmlDoc *doc = NULL;
if (!(ctxt = xmlCreateDocParserCtxt((const unsigned char *)xml)))
return -1;
int v = 1;
xmlSetFeature(ctxt, "gather line info", (void *)&v);
if (xmlParseDocument(ctxt) == -1)
{
xmlFreeParserCtxt(ctxt);
return -1;
}
else
{
if ((ctxt->wellFormed) || ctxt->recovery)
doc = ctxt->myDoc;
else
{
xmlFreeParserCtxt(ctxt);
return -1;
}
}
// use doc to get a node and then xmlParserFindNodeInfo(ctxt, node)
…
xmlFreeParserCtxt(ctxt);

Parse a GML file (from a shp one) in C

My problem is that I convert a shp file into a GML one using ogr2ogr.
Then I want to parse this file in my C function.
sprintf(buffer, "PATH=/Library/Frameworks/GDAL.framework/Programs:$PATH:/usr/local/bin ogr2ogr -f \"GML\" files/Extraction/coord.gml %s", lectureFichier);
system(buffer);
sprintf(buff, "sed \"2s/.*/\\<ogr:FeatureCollection\\>/\" files/Extraction/coord.gml | sed '3,6d' > files/Extraction/temp.xml");
system(buff);
FILE *fichier = NULL;
FILE *final = NULL;
fichier = fopen("files/Extraction/temporaire.csv", "w+");
xmlDocPtr doc;
xmlChar *xpath = (xmlChar*) "//keyword";
xmlNodeSetPtr nodeset;
xmlXPathContextPtr context;
xmlXPathObjectPtr result;
int i;
doc = xmlParseFile("files/Extraction/temp.xml");
When I execute the program, I get an error for every line because the namespace prefixes (gml or ogr) are not defined.
Example of temp.xml
<ogr:FeatureCollection>
<gml:boundedBy>
<gml:Box>
<gml:coord><gml:X>847001.4933830451</gml:X><gml:Y>6298087.567566251</gml:Y></gml:coord>
<gml:coord><gml:X>859036.8755179688</gml:X><gml:Y>6309720.622619263</gml:Y></gml:coord>
</gml:Box>
</gml:boundedBy>
<gml:featureMember>
Do you have any idea how to make the program aware of these new namespaces?
EDIT:
xmlDocPtr doc;
xmlChar *xpath = (xmlChar*) "//keyword";
xmlNodeSetPtr nodeset;
xmlXPathContextPtr context;
xmlXPathRegisterNs(context, "ogr", "http://ogr.maptools.org/");
xmlXPathRegisterNs(context, "gml", "http://www.opengis.net/gml");
xmlXPathObjectPtr result;
int i;
doc = xmlParseFile("files/Extraction/temp.xml");
if (doc == NULL ) {
fprintf(stderr,"Document not parsed successfully. \n");
return 0;
}
context = xmlXPathNewContext(doc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
return 0;
}
xpath = "//gml:coordinates/text()";
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
return 0;
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
return 0;
}
After adding what you've given me, I get a segmentation fault and I really don't know where it comes from, but it seems I'm getting closer to the answer.
Do you have any idea where I went wrong?
I would think you just need to add the namespace declarations to the FeatureCollection element, so it looks like this:
<ogr:FeatureCollection
xmlns:ogr="http://ogr.maptools.org/"
xmlns:gml="http://www.opengis.net/gml">
You can presumably do that in your sed script.
When querying namespaced elements with XPath, you need to register your namespaces on the XPath context first (after the context has been created with xmlXPathNewContext). So you might need to do something like this:
xmlXPathRegisterNs(context, BAD_CAST "ogr", BAD_CAST "http://ogr.maptools.org/");
xmlXPathRegisterNs(context, BAD_CAST "gml", BAD_CAST "http://www.opengis.net/gml");
Then, when you want to query a gml or ogr element, you would do so like this:
xpath = BAD_CAST "//gml:coordinates/text()";
xmlXPathEvalExpression(xpath, context);

Unexpected Results using fts_children() in C

I have been beating my head against a wall over this fts_children() question. The man page, http://www.kernel.org/doc/man-pages/online/pages/man3/fts.3.html, clearly states:
As a special case, if fts_read() has not yet been called for a hierarchy,
fts_children() will return a pointer to the files in the logical directory
specified to fts_open(), that is, the arguments specified to fts_open().
Which I take to mean that a linked list of all the files in the current directory is returned. Well, I am finding that not to be the case, and I would really appreciate some help. I expected a linked list to be returned, which I would then iterate through to find the file with the matching file name (the end goal). For now, I am just trying to iterate through the linked list (baby steps), but it returns one file and then exits the loop. This does not make sense to me. Any help would be very much appreciated!
Opening of file system:
char* const path[PATH_MAX] = {directory_name(argv[argc-index]), NULL};
char* name = file_name(argv[argc-index]);
if ((file_system = fts_open(path, FTS_COMFOLLOW, NULL)) == NULL){
fprintf(stderr,"%s:%s\n", strerror(errno), getprogname());
exit(EXIT_FAILURE);
}/*Ends the files system check if statement*/
/*Displays the information about the specified file.*/
file_ls(file_system,name, flags);
For clarification, the directory_name parses the inputted path from the user and returns something like /home/tpar44. That directory is then opened.
Searching within the file system:
void
file_ls(FTS* file_system, char* file_name, int* flags){
FTSENT* parent = NULL;
//dint stop = 0;
parent = fts_children(file_system, 0);
while( parent != NULL ){
printf("parent = %s\n", parent->fts_name);
parent = parent->fts_link;
}
}
Thanks!
I think this is entirely by design.
...that is, the arguments specified to fts_open()...
What it says is that it will list the root elements in the path_argv parameter for your convenience. It treats the path_argv array as a logical directory itself.
In other words this:
int main(int argc, char* const argv[])
{
char* const path[] = { ".", "/home", "more/root/paths", NULL };
FTS* file_system = fts_open(path, FTS_COMFOLLOW | FTS_NOCHDIR, &compare);
if (file_system)
{
file_ls(file_system, "", 0);
fts_close(file_system);
}
return 0;
}
Will output
parent = .
parent = /home
parent = more/root/paths
Which, in fact, it does (see http://liveworkspace.org/code/c2d794117eae2d8af1166ccd620d29eb).
Here is a more complete sample that shows complete directory traversal:
#include<stdlib.h>
#include<stdio.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fts.h>
#include<string.h>
#include<errno.h>
int compare (const FTSENT**, const FTSENT**);
void file_ls(FTS* file_system, const char* file_name, int* flags)
{
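FTSENT* node = fts_children(file_system, 0);
// NULL with errno == 0 just means "no children"; only a non-zero errno is an error
if (node == NULL && errno != 0)
perror("fts_children");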
while (node != NULL)
{
// TODO use file_name and flags
printf("found: %s%s\n", node->fts_path, node->fts_name);
node = node->fts_link;
}
}
int main(int argc, char* const argv[])
{
FTS* file_system = NULL;
FTSENT* node = NULL;
if (argc<2)
{
printf("Usage: %s <path-spec>\n", argv[0]);
exit(255);
}
char* const path[] = { argv[1], NULL };
const char* name = "some_name";
file_system = fts_open(path, FTS_COMFOLLOW | FTS_NOCHDIR, &compare);
if (file_system)
{
file_ls(file_system, name, 0); // shows roots
while( (node = fts_read(file_system)) != NULL)
file_ls(file_system, name, 0); // shows child elements
fts_close(file_system);
}
return 0;
}
int compare(const FTSENT** one, const FTSENT** two)
{
return (strcmp((*one)->fts_name, (*two)->fts_name));
}
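For the asker's end goal (finding a file by name in one directory), a minimal sketch could call fts_read() once so that the root directory becomes the current entry, and only then call fts_children(), which at that point lists the directory's contents rather than the fts_open() arguments. The helper name dir_contains and the choice of FTS_PHYSICAL are assumptions of this sketch, not something taken from the answer above.

```c
#include <errno.h>
#include <fts.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if `dir` directly contains an entry named
 * `wanted`, 0 if it does not, and -1 on error (including when `dir` is not
 * a directory). */
int dir_contains(const char *dir, const char *wanted)
{
    char *const paths[] = { (char *)dir, NULL };
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_COMFOLLOW | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return -1;

    int found = -1;
    /* The first fts_read() returns the root directory itself (pre-order). */
    FTSENT *root = fts_read(fts);
    if (root != NULL && root->fts_info == FTS_D) {
        /* Now fts_children() lists the entries inside that directory. */
        FTSENT *child = fts_children(fts, 0);
        if (child != NULL || errno == 0) {
            found = 0;
            for (; child != NULL; child = child->fts_link) {
                if (strcmp(child->fts_name, wanted) == 0) {
                    found = 1;
                    break;
                }
            }
        }
    }

    fts_close(fts);
    return found;
}
```

The list returned by fts_children() is owned by the FTS stream, so the sketch does not free it; fts_close() releases it.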

libxml2 and XPath traversing children and siblings in ANSI C

I have done a fair bit of XML work in Perl and now I need to do it in ANSI C for a project. Here's the code I wrote, with a snippet of the XML. I have had some success, but I am having problems getting siblings. I am sure it's super easy, but I just can't get it. There are two functions: one that simply gets the node set (copied directly from xmlsoft.org), and a second one that is mine.
xmlXPathObjectPtr getnodeset (xmlDocPtr doc, xmlChar *xpath){
xmlXPathContextPtr context;
xmlXPathObjectPtr result;
context = xmlXPathNewContext(doc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
return NULL;
}
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
return NULL;
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
return NULL;
}
return result;
}
void reader(const char *xmlDoc, char *xpath)
{
xmlDocPtr doc;
xmlXPathObjectPtr xpathresult;
xmlNodeSetPtr node;
xmlNodeSetPtr node2;
xmlChar *title;
int cnt;
// parse feed in memory to xml object
doc = xmlReadMemory(xmlDoc,(int)strlen(xmlDoc),"noname.xml",NULL,0);
if (!doc) criterr("Error parsing xml document");
// get xpath node set (ttn retrieves the value from the token table)
xpathresult = getnodeset(doc, ( xmlChar * ) xpath);
if (xpathresult) {
node = xpathresult->nodesetval;
printf("Content-type: text/html\n\n");
for (cnt=0;cnt<node->nodeNr; cnt++) {
title = xmlNodeListGetString(doc, node->nodeTab[cnt]->xmlChildrenNode,1);
printf("%d) title= %s<br/>\n",cnt,title);
xmlFree(title);
}
xmlXPathFreeObject(xpathresult);
} else {
criterr("Xpath failed");
}
// free the document exactly once, then release the parser globals
xmlFreeDoc(doc);
xmlCleanupParser();
criterr("Success");
}
and the xml snippet
<item>
<title>this is the title</title>
<link>this is the link</link>
<description>this is the description</description>
</item>
If I use an XPath like //item/title I get all the titles, but what I really want is to get the item and then, in the node->nodeNr loop, easily get the title, link and description, as I have hundreds of 'item' blocks. I'm just not sure how to get the children or siblings of that block easily.
Use xmlNextElementSibling. How does one locate it? Go to Tree API, search for sibling.
And this is your loop, now also getting the link:
for (cnt=0;cnt<node->nodeNr; cnt++) {
xmlNodePtr titleNode = node->nodeTab[cnt];
// titleNode->next gives a whitespace-only text node, so better:
xmlNodePtr linkNode = xmlNextElementSibling(titleNode);
if (linkNode == NULL) continue; // no element follows this title
title = xmlNodeListGetString(doc, titleNode->xmlChildrenNode,1);
xmlChar *link = xmlNodeListGetString(doc, linkNode->xmlChildrenNode,1);
printf("%d) title= %s<br/>, link=%s\n",cnt,title,link);
xmlFree(title);
xmlFree(link);
}
titleNode->next can also be made to point to the link; see "how to get these XML elements with libxml2?".
And getting children? Start from xmlFirstElementChild, then follow node->next (or xmlNextElementSibling, to skip text nodes) until it is NULL.

libxml2 can't get content from node

I am using libxml in C and this is how I create xml:
xmlDocPtr createXmlSegment(char *headerContent, char *dataContent)
{
xmlDocPtr doc;
doc = xmlNewDoc(BAD_CAST "1.0");
xmlNodePtr rdt, header, data;
rdt = xmlNewNode(NULL, BAD_CAST "rdt-segment");
xmlSetProp(rdt, "id", "1");
header = xmlNewNode(NULL,BAD_CAST "header");
data = xmlNewNode(NULL, BAD_CAST "data");
xmlNodeSetContent(header, BAD_CAST headerContent);
xmlNodeSetContent(data, BAD_CAST dataContent);
xmlAddChild(rdt, header);
xmlAddChild(rdt, data);
xmlDocSetRootElement(doc, rdt);
return doc;
}
and this is how I want to get data from that xml:
int getDataFromXmlSegment(char *data, char *header, char *content)
{
xmlDocPtr doc = xmlReadMemory(data, strlen(data), NULL, NULL, XML_PARSE_NOBLANKS);
xmlNode *rdt = doc->children;
xmlNode *headerNode = rdt->children;
header = (char *)headerNode->content;
content = (char *)headerNode->next->content;
printf("header: %s, content: %s", header, content);
return EXIT_SUCCESS;
}
When I test headerNode->name or ->next->name, the names are correct (they are the names of those elements), but content returns null. Does anyone know where the problem is?
Short answer: use xmlNodeGetContent.
Element nodes themselves don't contain content. Instead, they have children text nodes, and those contain content. The contents of an element may be a mix of text and tags, and this allows it to maintain the ordering, represent entities, etc.
You could iterate over the child nodes and look at THEIR content members, but xmlNodeGetContent does that for you, and will handle child tags and entities properly.
