Parsing XML values in C

I have a simple XML string defined in C code as follows:
char xmlstr[] = "<root><str1>Welcome</str1><str2>to</str2><str3>wonderland</str3></root>";
I want to parse xmlstr to fetch the values of the str1, str2 and str3 tags.
I am using the libxml2 library. As I am not very experienced with XML handling, I am unable to get the values of the required tags. I tried some examples from the net, but I keep getting wrong output.

Using the libxml2 library, parsing your string would look something like this:
char xmlstr[] = ...;
char *str1, *str2, *str3;
xmlDocPtr doc = xmlReadDoc(BAD_CAST xmlstr, "http://someurl", NULL, 0);
xmlNodePtr root, child;
if(!doc)
{ /* error */ }
root = xmlDocGetRootElement(doc);
Now that we have parsed a DOM structure out of your XML string, we can extract the values by iterating over all child nodes of your root tag:
for (child = root->children; child != NULL; child = child->next)
{
    if (xmlStrcmp(child->name, BAD_CAST "str1") == 0)
    {
        /* xmlNodeGetContent returns a copy; release it with xmlFree() when done */
        str1 = (char *)xmlNodeGetContent(child);
    }
    /* repeat for str2 and str3 */
    ...
}

I usually do XML parsing with the Mini-XML library.
I hope this helps:
http://www.minixml.org/documentation.php/basics.html

Related

How to remove '&'-words encoding from libxml2?

I have an XML file which should be parsed and processed. For that reason I'm using libxml2.
The xml file I have looks something like this:
test.xml
<root>
<tag attr1="VALUE_1 &quot;" attr2="VALUE_2&#xA;VALUE_3" />
</root>
I want to get the attribute contents, BUT libxml2 seems to decode the '&'-words (I don't know what they are called).
The code I use is the following one:
LIBXML_TEST_VERSION
xmlDoc *doc;
doc = xmlReadFile("test.xml", NULL, XML_PARSE_IGNORE_ENC);
xmlNode *root;
root = xmlDocGetRootElement(doc);
xmlNode *node;
node = root->children;
while (node != NULL) {
if (node->type == XML_ELEMENT_NODE) {
xmlAttr *attr;
attr = node->properties;
while (attr != NULL) {
xmlNode *child;
child = attr->children;
while (child != NULL) {
if (child->type == XML_TEXT_NODE ||
child->type == XML_CDATA_SECTION_NODE)
printf("%s\n", child->content);
child = child->next;
}
attr = attr->next;
}
}
node = node->next;
}
So basically I want to print the attribute values, BUT they are being parsed with some formatting (I guess). When I run this code, I see the following output:
VALUE_1 "
VALUE_2
VALUE_3
As you can see, it translated the '&'-words. How can I tell libxml2 not to do that and give me the literal text values?
You simply can't. libxml2 will always decode numeric character references like &#xA; and predefined entities like &quot;. But &#x41; and A, for example, are semantically equivalent. If you really need to tell them apart, you're probably doing something wrong elsewhere in your XML pipeline. If you want a literal &#xA; in an attribute value, you have to encode it as &amp;#xA;.
Note that the expansion can be controlled for other, user-defined entities via the XML_PARSE_NOENT parser flag, but this won't affect numeric character references.

Appending data in existing XML using libxml2

Well, I am trying to append data to an XML file using C and the libxml2 module, but I am facing a lot of problems as I am fairly new to this.
My code is designed to first fetch me an Element Node from the XML file based on the user input and then grab the parent of that child node and append another child in it.
XML FILE:
<policyList>
<policySecurity>
<policyName>AutoAdd</policyName>
<deviceName>PA-722</deviceName>
<status>ACTIVE</status>
<srcZone>any</srcZone>
<dstZone>any</dstZone>
<srcAddr>5.5.5.5</srcAddr>
<dstAddr>5.5.5.4</dstAddr>
<srcUser>any</srcUser>
<application>any</application>
<service>htds</service>
<urlCategory>any</urlCategory>
<action>deny</action>
</policySecurity>
<policySecurity>
<policyName>Test-1</policyName>
<deviceName>PA-710</deviceName>
<status>ACTIVE</status>
<srcZone>any</srcZone>
<dstZone>any</dstZone>
<srcAddr>192.168.1.23</srcAddr>
<dstAddr>8.8.8.8</dstAddr>
<srcUser>vivek</srcUser>
<application>any</application>
<service>http</service>
<urlCategory>any</urlCategory>
<action>deny</action>
</policySecurity>
</policyList>
C CODE:
int main(){
xmlDocPtr pDoc = xmlReadFile("/var/www/db/db_policy.xml", NULL, XML_PARSE_NOBLANKS | XML_PARSE_NOERROR | XML_PARSE_NOWARNING | XML_PARSE_NONET);
if (pDoc == NULL)
{
fprintf(stderr, "Document not parsed successfully.\n");
return 0;
}
root_element = xmlDocGetRootElement(pDoc);
if (root_element == NULL)
{
fprintf(stderr, "empty document\n");
xmlFreeDoc(pDoc);
return 0;
}
printf("Root Node is %s\n", root_element->name);
xmlChar* srcaddr = "5.5.5.5";
xmlChar *xpath = (xmlChar*) "//srcAddr";
xmlNodeSetPtr nodeset;
xmlXPathObjectPtr result;
int i;
xmlChar *keyword;
xmlXPathContextPtr context;
xmlNodePtr resdev;
xmlChar* resd;
context = xmlXPathNewContext(pDoc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
}
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
};
if (result) {
nodeset = result->nodesetval;
for (i=0; i < nodeset->nodeNr; i++) {
keyword = xmlNodeListGetString(pDoc, nodeset->nodeTab[i]->xmlChildrenNode, 1);
printf("keyword: %s\n", keyword);
if(strcmp(keyword, srcaddr) == 0){
xmlNodePtr pNode = xmlNewNode(0, (xmlChar*)"service");
xmlNodeSetContent(pNode, (xmlChar*)"nonser");
xmlAddSibling(result, pNode);
printf("added");
}
xmlFree(keyword);
}
xmlXPathFreeObject (result);
}
xmlFreeDoc(pDoc);
xmlCleanupParser();
return (1);
}
On running this code, it gets compiled and executed(with a few warnings, but nothing that hinders execution), but it does not add anything to my XML File.
This topic is old, but I just had a similar problem, so I am sharing my findings for anyone who still runs into it.
> On running this code, it gets compiled and executed(with a few warnings, but nothing that hinders execution), but it does not add anything to my XML File.
First of all: in my opinion, warnings in C are much worse than errors, because they let you run incorrect code. So my very first advice is not to ignore warnings (although I am not really in a position to advise anyone).
Second: When I was running this code, I saw a warning which makes sense:
> warning: passing argument 1 of ‘xmlAddSibling’ from incompatible
> pointer type [-Wincompatible-pointer-types]
>
> note: expected ‘xmlNodePtr {aka struct _xmlNode *}’ but argument is of
> type ‘xmlXPathObjectPtr {aka struct _xmlXPathObject *}’
If you check the documentation of xmlAddSibling at http://www.xmlsoft.org/html/libxml-tree.html, you can see:
xmlNodePtr xmlAddSibling (xmlNodePtr cur, xmlNodePtr elem)
This means both arguments must be of type xmlNodePtr. However, result has the type xmlXPathObjectPtr, which is a completely different pointer type. What you really want is to add the new node next to the node you matched with the string comparison (if(strcmp(keyword, srcaddr) == 0)).
So your way of finding that node is correct, but there are two problems. First, you never update result: inside the for loop it is nodeset->nodeTab[i] that changes, and nothing is ever stored in result (an XPath result object is not a node in the tree anyway). Second, even if you assigned nodeset->nodeTab[i] to result, the pointer types would still be incompatible, as discussed above. So you have to call xmlAddSibling on the right node with the right pointer type. As the structure below shows, nodeTab has the type xmlNodePtr *, so nodeset->nodeTab[i] is an xmlNodePtr: the matching srcAddr element. xmlAddSibling appends the new node to the end of that element's sibling list, which makes it a new child of the enclosing policySecurity element.
Structure xmlNodeSet
struct _xmlNodeSet {
    int nodeNr;          /* number of nodes in the set */
    int nodeMax;         /* size of the array as allocated */
    xmlNodePtr *nodeTab; /* array of nodes in no particular order */
};
So you should change the:
xmlAddSibling(result, pNode);
to:
xmlAddSibling(nodeset->nodeTab[i], pNode);
Finally, you never saved the changes. Save them by adding
xmlSaveFileEnc("note.xml", pDoc, "UTF-8");
before
xmlFreeDoc(pDoc);
With these changes, I was able to run your code with your XML file and with no warnings.
Your commands modify the DOM representation of the XML in memory, but you missed writing it back to the file. So adding the following line should solve your problem:
...
}
// write back to file:
xmlSaveFileEnc("/var/www/db/db_policy.xml", pDoc, "UTF-8");
xmlFreeDoc(pDoc);
xmlCleanupParser();
return (1);

libxml2 fails to parse from buffer but parses successfully from file

I have a function that writes an XML document to a buffer using the libxml2 writer, but when I try to parse the document from memory using xmlParseMemory, it only returns parser errors. I have also tried writing the document to a file and parsing it using xmlParseFile and it parses successfully.
This is how I initialize the writer and buffer for the xml document.
int rc, i = 0;
xmlTextWriterPtr writer;
xmlBufferPtr buf;
// Create a new XML buffer, to which the XML document will be written
buf = xmlBufferCreate();
if (buf == NULL)
{
printf("testXmlwriterMemory: Error creating the xml buffer\n");
return;
}
// Create a new XmlWriter for memory, with no compression.
// Remark: there is no compression for this kind of xmlTextWriter
writer = xmlNewTextWriterMemory(buf, 0);
if (writer == NULL)
{
printf("testXmlwriterMemory: Error creating the xml writer\n");
return;
}
// Start the document with the xml default for the version,
// encoding UTF-8 and the default for the standalone
// declaration.
rc = xmlTextWriterStartDocument(writer, NULL, ENCODING, NULL);
if (rc < 0)
{
printf
("testXmlwriterMemory: Error at xmlTextWriterStartDocument\n");
return;
}
I pass the xml document to another function to be validated using
int ret = validateXML(buf->content);
Here is the first part of validateXML
int validateXML(char *buffer)
{
xmlDocPtr doc;
xmlSchemaPtr schema = NULL;
xmlSchemaParserCtxtPtr ctxt;
char *XSDFileName = XSDFILE;
char *XMLFile = buffer;
int ret = 1;
doc = xmlReadMemory(XMLFile, sizeof(XMLFile), "noname.xml", NULL, 0);
doc is always NULL after calling this function, which means that it failed to parse the document.
Here are the errors that running the program returns
Entity: line 1: parser error : ParsePI: PI xm space expected
<?xm
^
Entity: line 1: parser error : ParsePI: PI xm never end ...
<?xm
^
Entity: line 1: parser error : Start tag expected, '<' not found
<?xm
^
I have been unable to figure this out for quite a while now and I am out of ideas. If anyone has any, I would be grateful if you would share it.
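You are using sizeof to determine the size of the XML data. For a char pointer, sizeof gives the size of the pointer itself (4 bytes on a 32-bit build, which is why the parser only ever sees "<?xm"), not the length of the string. What you need is strlen: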
doc = xmlReadMemory(XMLFile, strlen(XMLFile), "noname.xml", NULL, 0);

How can libxml2 be used to parse data from XML?

I have looked around at the libxml2 code samples and I am confused about how to piece them all together.
What are the steps needed when using libxml2 to just parse or extract data from an XML file?
I would like to get hold of, and possibly store information for, certain attributes. How is this done?
I believe you first need to create a parse tree. Maybe this article can help; look through the section titled "How to Parse a Tree with Libxml2".
libxml2 provides various examples showing basic usage.
http://xmlsoft.org/examples/index.html
For your stated goals, tree1.c would probably be most relevant.
tree1.c: Navigates a tree to print element names
Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element names in document order.
http://xmlsoft.org/examples/tree1.c
Once you have an xmlNode struct for an element, its "properties" member is a linked list of attributes. Each xmlAttr object has a "name" member (the attribute name), a "children" member (a list of text nodes holding the attribute value), and a "next" member that points to the next attribute (or NULL for the last one).
http://xmlsoft.org/html/libxml-tree.html#xmlNode
http://xmlsoft.org/html/libxml-tree.html#xmlAttr
I found these two resources helpful when I was learning to use libxml2 to build an RSS feed parser:
Tutorial with SAX interface
Tutorial using the DOM Tree (code example for getting an attribute value included)
Here is the complete process for extracting XML/HTML data from a file on the Windows platform.
First, download the pre-compiled .dll files from http://xmlsoft.org/sources/win32/
Also download the dependencies iconv.dll and zlib1.dll from the same page.
Extract all .zip files into the same directory, for example D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into the c:\windows\system32 directory.
Create a file named libxml_test.cpp and copy the following code into it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>

/* Recursively print every element and text node from a_node downwards */
void traverse_dom_trees(xmlNode *a_node)
{
    xmlNode *cur_node = NULL;
    if (NULL == a_node)
    {
        return;
    }
    for (cur_node = a_node; cur_node; cur_node = cur_node->next)
    {
        if (cur_node->type == XML_ELEMENT_NODE)
        {
            /* Decide here whether the current element should be excluded */
            printf("Node type: Element, name: %s\n", cur_node->name);
        }
        else if (cur_node->type == XML_TEXT_NODE)
        {
            /* Process the text node here; its text is in cur_node->content */
            printf("Node type: Text, node content: %s, content length %d\n",
                   (char *)cur_node->content, (int)strlen((char *)cur_node->content));
        }
        traverse_dom_trees(cur_node->children);
    }
}

int main(int argc, char **argv)
{
    htmlDocPtr doc;
    xmlNode *root_element = NULL;
    if (argc != 2)
    {
        printf("\nInvalid argument. Usage: %s <file.html>\n", argv[0]);
        return 1;
    }
    /* Macro to check that the API matches the DLL we are using */
    LIBXML_TEST_VERSION
    doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
    if (doc == NULL)
    {
        fprintf(stderr, "Document not parsed successfully.\n");
        return 1;
    }
    root_element = xmlDocGetRootElement(doc);
    if (root_element == NULL)
    {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return 1;
    }
    printf("Root Node is %s\n", root_element->name);
    traverse_dom_trees(root_element);
    xmlFreeDoc(doc);     // free document
    xmlCleanupParser();  // free globals
    return 0;
}
Open the Visual Studio Command Prompt.
Go to the D:\demo directory.
Execute the command cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib
Run the binary with the command libxml_test.exe test.html (here test.html may be any valid HTML file).
You can refer to this answer.
There, the data is stored in a structure and then used further by passing the structure's address to a function.
You can find the detailed C code there.
Code: this

LibXML internal and output encodings

I'm trying to write XML files with libxml2 in ISO-8859-1.
But from the documentation it seems that every text node I create must first be converted to UTF-8, which is libxml's internal encoding. Then, when calling xmlSaveFormatFileEnc(), libxml converts to the target encoding and adds the encoding declaration to the document.
Is this assumption correct?
For now my code goes roughly like this:
xmlNode *root_element = NULL, *node4 = NULL;
xmlDoc *doc = NULL;
doc = xmlNewDoc(BAD_CAST XML_DEFAULT_VERSION);
root_element = xmlNewDocNode(doc, NULL, BAD_CAST("root"),
NULL);
char * input_str = getLatin1Data();
isolat1ToUTF8(utf8_str, &file_size, input_str, &inlen);
node4 = xmlNewCDataBlock(doc, BAD_CAST list_content, xmlStrlen(BAD_CAST utf8_str));
xmlAddChild(root_element, node4);
xmlSaveFormatFileEnc("test_file.xml", doc, "UTF-8", 1);
xmlFreeDoc(doc);
Your assumption is right. Wherever an xmlChar * is expected, as in xmlNewCDataBlock or xmlNewText, the string must be UTF-8:
From include/libxml/xmlstring.h (libxml 2.8.0):
/**
* xmlChar:
*
* This is a basic byte in an UTF-8 encoded string.
* It's unsigned allowing to pinpoint case where char * are assigned
* to xmlChar * (possibly making serialization back impossible).
*/
