Compare two XMLNodes in C (libxml library) - c

I'm parsing some xml files in C using libxml library. I want to compare two xmlnodes to see whether they contain the same data or not. Is there any function available to do so?

The libxml API docs seem reasonable and suggest that xmlBufGetNodeContent and xmlBufContent might do what you want.
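xmlNodePtr node1, node2;   /* the two nodes to compare */
......
/* xmlBuf is an opaque type, so work through xmlBufPtr handles, and use one
   buffer per node: with a shared buffer both content pointers would refer
   to the same storage. */
xmlBufPtr buf1, buf2;
......
xmlChar *content1 = NULL;
xmlChar *content2 = NULL;
if (xmlBufGetNodeContent(buf1, node1) == 0) {
    content1 = xmlBufContent(buf1);
}
if (xmlBufGetNodeContent(buf2, node2) == 0) {
    content2 = xmlBufContent(buf2);
}
/* xmlChar is unsigned char, so compare with xmlStrcmp rather than strcmp */
if (content1 != NULL && content2 != NULL && xmlStrcmp(content1, content2) == 0) {
    /* nodes match */
}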

I don't think the API calls xmlBufGetNodeContent and xmlBufContent are usable here: the data type involved in those calls, xmlBufPtr, is not available, at least not in libxml2 2.7.6.
I used a different API call instead, xmlNodeDump or xmlNodeGetContent. I hope this helps others with a similar question.

Related

How to remove '&'-words encoding from libxml2?

I have an XML file which should be parsed and processed. For that reason I'm using libxml2.
The xml file I have looks something like this:
test.xml
<root>
<tag attr1="VALUE_1 &quot;" attr2="VALUE_2&#xA;VALUE_3" />
</root>
I want to get the attribute contents, but libxml2 seems to decode the '&'-words (I don't know what they are called).
The code I use is the following one:
LIBXML_TEST_VERSION
xmlDoc *doc;
doc = xmlReadFile("test.xml", NULL, XML_PARSE_IGNORE_ENC);
xmlNode *root;
root = xmlDocGetRootElement(doc);
xmlNode *node;
node = root->children;
while (node != NULL) {
    if (node->type == XML_ELEMENT_NODE) {
        xmlAttr *attr;
        attr = node->properties;
        while (attr != NULL) {
            xmlNode *child;
            child = attr->children;
            while (child != NULL) {
                if (child->type == XML_TEXT_NODE ||
                    child->type == XML_CDATA_SECTION_NODE)
                    printf("%s\n", child->content);
                child = child->next;
            }
            attr = attr->next;
        }
    }
    node = node->next;
}
So basically I want to print the attribute values, but they are being parsed with some formatting applied (I guess). When I run this code, I see the following output:
VALUE_1 "
VALUE_2
VALUE_3
As you can see, it translated the '&'-words. How can I tell libxml2 not to do that and give me the literal text values?
You simply can't. libxml2 will always decode numeric character references like &#xA; and predefined entities like &quot;. But &#x41; and A, for example, are semantically equivalent. If you really need to tell them apart, you're probably doing something wrong elsewhere in your XML pipeline. If you want a literal &#xA; in an attribute value, you have to encode it as &amp;#xA;.
Note that the expansion can be controlled for other, user-defined entities via the XML_PARSE_NOENT parser flag, but this won't affect numeric character references.

What is "IsA()" function in C?

In pure C code in different projects that involve PostgreSQL server programming, which I'm working with now, I keep encountering the function "IsA()", which returns a boolean and, I suppose, checks whether two instances of a struct belong to the same struct type.
One of them:
https://github.com/guotao0628/pipelinedb/blob/master/src/backend/executor/nodeBitmapAnd.c#L123
for (i = 0; i < nplans; i++)
{
    PlanState *subnode = bitmapplans[i];
    TIDBitmap *subresult;
    subresult = (TIDBitmap *) MultiExecProcNode(subnode);
    if (!subresult || !IsA(subresult, TIDBitmap)) /* what's IsA(...)? */
        elog(ERROR, "unrecognized result from subplan");
    if (result == NULL)
        result = subresult; /* first subplan */
I need to port some of that C code to another strictly typed language. Hence, I need to know how "IsA()" is implemented under the hood, but I haven't found it anywhere. Supposedly it's defined in some library.
Where can I find its definition?
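IsA is a macro, defined in a header file (src/include/nodes/nodes.h) in the PostgreSQL source code. It does not compare two instances: it checks the type tag stored at the start of one node against the tag for the named type.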
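For porting purposes, the mechanism can be sketched in a few lines. The two macros below follow the shape used in PostgreSQL's nodes.h; the enum values and the two struct layouts are simplified stand-ins invented for this example, not the real definitions.

```c
/* Simplified stand-in for PostgreSQL's NodeTag enum (the real one is much
 * larger); these three values are assumptions for illustration. */
typedef enum NodeTag {
    T_Invalid = 0,
    T_TIDBitmap,
    T_PlanState
} NodeTag;

/* Every node struct starts with a NodeTag, so any node pointer can be
 * read through this minimal header type. */
typedef struct Node {
    NodeTag type;
} Node;

/* Hypothetical, cut-down node structs; the tag must be the first member */
typedef struct TIDBitmap {
    NodeTag type;
    int     nentries;
} TIDBitmap;

typedef struct PlanState {
    NodeTag type;
    int     plan_id;
} PlanState;

/* Read the tag of any node, then compare it with the tag named T_<type> */
#define nodeTag(nodeptr)     (((const Node *) (nodeptr))->type)
#define IsA(nodeptr, _type_) (nodeTag(nodeptr) == T_##_type_)
```

So IsA(subresult, TIDBitmap) expands to a comparison of subresult's tag with T_TIDBitmap. In a language with real runtime type information, this would typically map to a type test on a common base type or to matching on a tagged union.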

String list in GLib/GTK2

How do I work with a list of strings in GLib/GTK2? I previously worked with QStringList in the Qt library and am now looking for how to do the same things in GLib/GTK2. I know that there are GList and GString data types, but I don't understand how to work with them properly. Searching Google for the keywords 'glib gstring glist' didn't help, and I can't find a good tutorial.
Really, I only need some basic functionality for now: create a list, fill it with strings in a loop, check whether the list contains a given string, and clear the list. That's all.
In Qt I can do
QStringList list;
list << "first" << "second" << "third";
for (int i = 0; i < list.length(); ++i) {
QString str = list.at(i);
if (str == "second") {
doSomeActions();
}
}
list.clear();
What is the analogue in GLib? In the real application the strings will be allocated dynamically, so clear() must free all the pointers.
The GNOME developer documentation gives answers to all your questions. For GList, there are already examples given in the descriptions.
https://developer.gnome.org/glib/2.56/glib-Doubly-Linked-Lists.html
https://developer.gnome.org/glib/2.56/glib-Strings.html
Since you're not really asking a specific question, I can't give you a specific answer. Feel free to ask again if something is unclear after reading and trying out the given resources.
The Qt snippet with GList would look something like this (with dynamic allocation):
GList *list = NULL;
/* g_list_append returns the (possibly new) start of the list, so keep it */
list = g_list_append(list, g_strdup("first"));
list = g_list_append(list, g_strdup("second"));
list = g_list_append(list, g_strdup("third"));
for (GList *l = list; l != NULL; l = l->next) {
    if (g_strcmp0(l->data, "second") == 0) {
        doSomeActions();
    }
}
/* frees every string with g_free, then the list nodes themselves */
g_list_free_full(list, g_free);
list = NULL;

directory traverse c

I'm trying to traverse a directory and check for duplicate files.
void findDuplicates(){
    char *dot[] = {".", 0};
    FTS *ftsp, *temp_ftsp;
    FTSENT *entry, *temp_entry;
    int fts_options = FTS_LOGICAL;
    ftsp = fts_open(dot, fts_options, NULL);
    while((entry = fts_read(ftsp)) != NULL){
        temp_ftsp = ftsp;
        while((temp_entry = fts_read(temp_ftsp)) != NULL){
            compareEntries(temp_ftsp, ftsp);
        }
    }
}
But it doesn't traverse the directory the way I wanted it to. After the second while loop finishes, the call
entry = fts_read(ftsp)
returns NULL. Is there an easy fix for this, or should I use something else?
You need to restructure your approach. Assigning temp_ftsp = ftsp only copies the pointer, so both loops read from the same stream: the inner while exhausts the list of files, and the outer loop therefore succeeds once and then fails.
A better approach is probably to store the files so you can just compare each new incoming file against the stored ones, or use a recursive approach. Both will require memory.

How can libxml2 be used to parse data from XML?

I have looked around at the libxml2 code samples and I am confused on how to piece them all together.
What are the steps needed when using libxml2 to just parse or extract data from an XML file?
I would like to get hold of, and possibly store information for, certain attributes. How is this done?
I believe you first need to create a parse tree. Maybe this article can help; look through the section titled "How to Parse a Tree with Libxml2".
libxml2 provides various examples showing basic usage.
http://xmlsoft.org/examples/index.html
For your stated goals, tree1.c would probably be most relevant.
tree1.c: Navigates a tree to print element names
Parse a file to a tree, use xmlDocGetRootElement() to get the root element, then walk the document and print all the element name in document order.
http://xmlsoft.org/examples/tree1.c
Once you have an xmlNode struct for an element, the "properties" member is a linked list of attributes. Each xmlAttr object has a "name" and "children" object (which are the name/value for that attribute, respectively), and a "next" member which points to the next attribute (or null for the last one).
http://xmlsoft.org/html/libxml-tree.html#xmlNode
http://xmlsoft.org/html/libxml-tree.html#xmlAttr
I found these two resources helpful when I was learning to use libxml2 to build an RSS feed parser.
Tutorial with SAX interface
Tutorial using the DOM Tree (code example for getting an attribute value included)
Here is the complete process for extracting XML/HTML data from a file on the Windows platform.
First, download the pre-compiled .dll from http://xmlsoft.org/sources/win32/
Also download its dependencies, iconv.dll and zlib1.dll, from the same page.
Extract all .zip files into the same directory, for example D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into the c:\windows\system32 directory.
Create a libxml_test.cpp file and copy the following code into it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>
void traverse_dom_trees(xmlNode *a_node)
{
    xmlNode *cur_node = NULL;

    if (NULL == a_node)
    {
        return;
    }

    for (cur_node = a_node; cur_node; cur_node = cur_node->next)
    {
        if (cur_node->type == XML_ELEMENT_NODE)
        {
            /* Check here whether the current node should be excluded */
            printf("Node type: Element, name: %s\n", cur_node->name);
        }
        else if (cur_node->type == XML_TEXT_NODE && cur_node->content != NULL)
        {
            /* Process the text node here; its text is in cur_node->content */
            printf("Node type: Text, node content: %s, content length %d\n",
                   (char *)cur_node->content, (int)strlen((char *)cur_node->content));
        }
        /* Recurse into the children of the current node */
        traverse_dom_trees(cur_node->children);
    }
}
int main(int argc, char **argv)
{
    htmlDocPtr doc;
    xmlNode *root_element = NULL;

    if (argc != 2)
    {
        printf("\nInvalid argument\n");
        return 1;
    }

    /* Macro to check that the API matches the DLL we are using */
    LIBXML_TEST_VERSION

    doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
    if (doc == NULL)
    {
        fprintf(stderr, "Document not parsed successfully.\n");
        return 1;
    }

    root_element = xmlDocGetRootElement(doc);
    if (root_element == NULL)
    {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return 1;
    }

    printf("Root Node is %s\n", root_element->name);
    traverse_dom_trees(root_element);

    xmlFreeDoc(doc);       /* free the document */
    xmlCleanupParser();    /* free global parser state */
    return 0;
}
Open the Visual Studio Command Prompt.
Go to the D:\demo directory.
Execute the command cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib
Run the binary with the command libxml_test.exe test.html (here test.html may be any valid HTML file).
You can refer to this answer.
There, the data is stored in a structure and then used by passing the structure's address to a function.
You can find the detailed C code there.
code ->> this
