How to disable XXE in libxml2in C? - c

Requirement: When i pass the following request to my application,
1) How to do XML validation on such input xml which is risk
2) How to disable XXE in libxml2 i.e. should not parse the ENTITY field
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY foo SYSTEM "file:///etc/issue">
]><TRANSACTION>
<FUNCTION_TYPE>LINE_ITEM</FUNCTION_TYPE>
<COMMAND>ADD</COMMAND>
<COUNTER>3</COUNTER>
<MAC>qof2EtycqT9YMcmOfKowpyXVbRpgM/7rncS3liK4JOs=</MAC>
<MAC_LABEL>P_206</MAC_LABEL>
<RUNNING_TAX_AMOUNT>0.00</RUNNING_TAX_AMOUNT>
<RUNNING_TRANS_AMOUNT>1.00</RUNNING_TRANS_AMOUNT>
<LINE_ITEMS>
<MERCHANDISE>
<LINE_ITEM_ID>1</LINE_ITEM_ID>
<DESCRIPTION>&foo;</DESCRIPTION>
<QUANTITY>1</QUANTITY>
<UNIT_PRICE>5.00</UNIT_PRICE>
<EXTENDED_PRICE>5.00</EXTENDED_PRICE>
</MERCHANDISE>
</LINE_ITEMS>
</TRANSACTION>
I understand starting with libxml2 version 2.9, XXE has been disabled by default. But we are using 2.7.7 version currently.
According to this link XML_ENTITY_PROCESSING
The Enum xmlParserOption should not have the following options defined in libxml2:
XML_PARSE_NOENT: Expands entities and substitutes them with replacement text
XML_PARSE_DTDLOAD: Load the external DTD
Till now i was using xmlParseMemory function to parse an XML in-memory block and build a tree. This function does not take any parameter to set the xmlParserOption.
Then i Changed to xmlReadMemory function which also does same thing as xmlParseMemory function but takes different parameters.
docPtr = xmlReadMemory(szXMLMsg, iLen, "noname.xml", NULL, XML_PARSE_RECOVER);
Still I observe that ENTITY field is getting parsed.
Could anyone help me? Please let me know if you need any more additional information.
Thank you for your time.
Regards
Praveen

If you don't specify XML_PARSE_NOENT, the ENTITY declaration is still parsed but the entity won't be replaced. Also, the file /etc/issue won't be opened which you can verify with strace. So to protect from XXE, you simply don't pass the XML_PARSE_NOENT parser option.
The name of the option is a bit misleading, XML_PARSE_NOENT means that no entity nodes should be created in the parsed document. Consequently every entity is expanded. A better name would be something like XML_PARSE_EXPAND_ENTITIES.
If you really want to make sure or you want to expand entities with fine-grained control over which URLs to load, you can install your own external entity loader using xmlSetExternalEntityLoader. If your handler always returns NULL, you're on the safe side. But note that the external entity loader is used to load all kinds of external resources, so completely disabling it might break other stuff (XIncludes or XSLT stylesheets, for example).
EDIT: I have no idea why the entity is replaced in your case. Here's a test program:
#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
static xmlNodePtr
find_node(xmlNodePtr parent, const char *name) {
for (xmlNodePtr cur = parent->children; cur != NULL; cur = cur->next) {
if (cur->type == XML_ELEMENT_NODE
&& xmlStrcmp(cur->name, (const xmlChar*)name) == 0
) {
return cur;
}
}
fprintf(stderr, "Element '%s' not found\n", name);
abort();
return NULL;
}
int
main(int argc, char **argv) {
static const char buf[] =
"<?xml version=\"1.0\"?>\n"
"<!DOCTYPE foo [\n"
"<!ENTITY foo SYSTEM \"file:///etc/issue\">\n"
"]><TRANSACTION>\n"
"<FUNCTION_TYPE>LINE_ITEM</FUNCTION_TYPE>\n"
"<COMMAND>ADD</COMMAND>\n"
"<COUNTER>3</COUNTER>\n"
"<MAC>qof2EtycqT9YMcmOfKowpyXVbRpgM/7rncS3liK4JOs=</MAC>\n"
"<MAC_LABEL>P_206</MAC_LABEL>\n"
"<RUNNING_TAX_AMOUNT>0.00</RUNNING_TAX_AMOUNT>\n"
"<RUNNING_TRANS_AMOUNT>1.00</RUNNING_TRANS_AMOUNT>\n"
"<LINE_ITEMS>\n"
"<MERCHANDISE>\n"
"<LINE_ITEM_ID>1</LINE_ITEM_ID>\n"
"<DESCRIPTION>&foo;</DESCRIPTION>\n"
"<QUANTITY>1</QUANTITY>\n"
"<UNIT_PRICE>5.00</UNIT_PRICE>\n"
"<EXTENDED_PRICE>5.00</EXTENDED_PRICE>\n"
"</MERCHANDISE>\n"
"</LINE_ITEMS>\n"
"</TRANSACTION>\n";
xmlDocPtr doc = xmlReadMemory(buf, sizeof(buf), "noname.xml", NULL,
XML_PARSE_RECOVER);
xmlNodePtr trans = find_node((xmlNodePtr)doc, "TRANSACTION");
xmlNodePtr items = find_node(trans, "LINE_ITEMS");
xmlNodePtr merch = find_node(items, "MERCHANDISE");
xmlNodePtr desc = find_node(merch, "DESCRIPTION");
for (xmlNodePtr cur = desc->children; cur != NULL; cur = cur->next) {
if (cur->type == XML_ENTITY_REF_NODE) {
printf("entity ref node\n");
}
else {
printf("other node of type: %d\n", cur->type);
}
}
xmlFreeDoc(doc);
return 0;
}
If I compile with
gcc -std=c99 -O2 -I/usr/include/libxml2 so.c -lxml2 -o so
and run it, the result is
entity ref node

Related

How to remove '&'-words encoding from libxml2?

I have an XML file which should be parsed and processed. For that reason I'm using libxml2.
The xml file I have looks something like this:
test.xml
<root>
<tag attr1="VALUE_1 "" attr2="VALUE_2
VALUE_3" />
</root>
And I want to get the attribute contents. BUT the libxml2 seems to encode the '&'-words (don't know how to call them).
The code I use is the following one:
LIBXML_TEST_VERSION
xmlDoc *doc;
doc = xmlReadFile("test.xml", NULL, XML_PARSE_IGNORE_ENC);
xmlNode *root;
root = xmlDocGetRootElement(doc);
xmlNode *node;
node = root->children;
while (node != NULL) {
if (node->type == XML_ELEMENT_NODE) {
xmlAttr *attr;
attr = node->properties;
while (attr != NULL) {
xmlNode *child;
child = attr->children;
while (child != NULL) {
if (child->type == XML_TEXT_NODE ||
child->type == XML_CDATA_SECTION_NODE)
printf("%s\n", child->content);
child = child->next;
}
attr = attr->next;
}
}
node = node->next;
}
So basically I want to print the attribute values, BUT they are being parsed with a formatting (I guess). When I run this code than I see following output:
VALUE_1 "
VALUE_2
VALUE_3
As you can see it translated the '&'-words. How can I hint the libxml2 to not do that and give me the literal text values.
You simply can't. libxml2 will always decode numeric character references like
and predefined entities like ". But A and A, for example, are semantically equivalent. If you really need to tell them apart, you're probably doing something wrong elsewhere in your XML pipeline. If you want a literal
in an attribute value, you have to encode it as &#xA;.
Note that the expansion can be controlled for other, user-defined entities via the XML_PARSE_NOENT parser flag, but this won't affect numeric character references.

Appending data in existing XML using libxml2

Well, I am trying to append data using C programming and libxml2 modulel but am facing a lot of problems as I am fairly new to this.
My code is designed to first fetch me an Element Node from the XML file based on the user input and then grab the parent of that child node and append another child in it.
XML FILE:
<policyList>
<policySecurity>
<policyName>AutoAdd</policyName>
<deviceName>PA-722</deviceName>
<status>ACTIVE</status>
<srcZone>any</srcZone>
<dstZone>any</dstZone>
<srcAddr>5.5.5.5</srcAddr>
<dstAddr>5.5.5.4</dstAddr>
<srcUser>any</srcUser>
<application>any</application>
<service>htds</service>
<urlCategory>any</urlCategory>
<action>deny</action>
</policySecurity>
<policySecurity>
<policyName>Test-1</policyName>
<deviceName>PA-710</deviceName>
<status>ACTIVE</status>
<srcZone>any</srcZone>
<dstZone>any</dstZone>
<srcAddr>192.168.1.23</srcAddr>
<dstAddr>8.8.8.8</dstAddr>
<srcUser>vivek</srcUser>
<application>any</application>
<service>http</service>
<urlCategory>any</urlCategory>
<action>deny</action>
</policySecurity>
</policyList>
C CODE:
int main(){
xmlDocPtr pDoc = xmlReadFile("/var/www/db/db_policy.xml", NULL, XML_PARSE_NOBLANKS | XML_PARSE_NOERROR | XML_PARSE_NOWARNING | XML_PARSE_NONET);
if (pDoc == NULL)
{
fprintf(stderr, "Document not parsed successfully.\n");
return 0;
}
root_element = xmlDocGetRootElement(pDoc);
if (root_element == NULL)
{
fprintf(stderr, "empty document\n");
xmlFreeDoc(pDoc);
return 0;
}
printf("Root Node is %s\n", root_element->name);
xmlChar* srcaddr = "5.5.5.5";
xmlChar *xpath = (xmlChar*) "//srcAddr";
xmlNodeSetPtr nodeset;
xmlXPathObjectPtr result;
int i;
xmlChar *keyword;
xmlXPathContextPtr context;
xmlNodePtr resdev;
xmlChar* resd;
context = xmlXPathNewContext(pDoc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
}
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
};
if (result) {
nodeset = result->nodesetval;
for (i=0; i < nodeset->nodeNr; i++) {
keyword = xmlNodeListGetString(pDoc, nodeset->nodeTab[i]->xmlChildrenNode, 1);
printf("keyword: %s\n", keyword);
if(strcmp(keyword, srcaddr) == 0){
xmlNodePtr pNode = xmlNewNode(0, (xmlChar*)"service");
xmlNodeSetContent(pNode, (xmlChar*)"nonser");
xmlAddSibling(result, pNode);
printf("added");
}
xmlFree(keyword);
}
xmlXPathFreeObject (result);
}
xmlFreeDoc(pDoc);
xmlCleanupParser();
return (1);
}
On running this code, it gets compiled and executed(with a few warnings, but nothing that hinders execution), but it does not add anything to my XML File.
I think this topic is old but I just had a similar problem. So, I am just sharing for those who still have similar problems.
On running this code, it gets compiled and executed(with a few warnings, but nothing that hinders execution), but it does not add anything to my XML File.
First of all: In my opinion warnings in C are so much worse than errors because it lets you run the wrong code. So, my very first advice is not to ignore the warnings (although I am not in a position to advise anyone but anyway).
Second: When I was running this code, I saw a warning which makes sense:
> warning: passing argument 1 of ‘xmlAddSibling’ from incompatible
> pointer type [-Wincompatible-pointer-types]
>
> note: expected ‘xmlNodePtr {aka struct _xmlNode *}’ but argument is of
> type ‘xmlXPathObjectPtr {aka struct _xmlXPathObject *}’
As you check the xmlAddSibling from http://www.xmlsoft.org/html/libxml-tree.html you can see:
xmlNodePtr xmlAddSibling (xmlNodePtr cur, xmlNodePtr elem)
Which means the type of both of the arguments should be of xmlNodePtr. However, "result" has the type of xmlXPathObjectPtr which means the pointer types are completely different. What you really want to do is to add a child to a parent that you have found based on the string that you compared: (if(strcmp(keyword, srcaddr) == 0)).
So your way to find the parent is completely correct. But two problems are: first you never updated the "result" (if we assume you imagined the "result" is the parent which is not correct) because "nodeset->nodeTab[i]" is in a for loop that never puts anything in "result". The second problem is even if you updated the "result" based on "nodeset->nodeTab[i]", still they have different types of the pointers (as we discussed previously). So, you have to use xmlAddSibling for the correct parent and with the correct pointer type. As you can see hereunder, the "nodeTab" has the type of "xmlNodePtr" which we were looking for, and "nodeset->nodeTab[i]" is the parent.
Structure xmlNodeSet
struct _xmlNodeSet {
int nodeNr : number of nodes in the set
int nodeMax : size of the array as allocated
> `xmlNodePtr * nodeTab : array of nodes in no particular order`
}
So you should change the:
xmlAddSibling(result, pNode);
to:
xmlAddSibling(nodeset->nodeTab[i], pNode);
Finally: you didn't save the changes. So, save it by adding
xmlSaveFileEnc("note.xml", pDoc, "UTF-8");
before
xmlFreeDoc(pDoc);
With these changes, I was able to run your code with your XML file and with no warnings.
Your commands modify the DOM representation of the XML in memory, but you missed writing it back to the file. So adding the following line should solve your problem:
...
}
// write back to file:
xmlSaveFileEnc("/var/www/db/db_policy.xml", pDoc, "UTF-8");
xmlFreeDoc(pDoc);
xmlCleanupParser();
return (1);

Change namespace of node libxml2

I want to change namespace of a node in xml.
doc = novi_xml_getdoc(doc_name);
if(doc==NULL){
return -1;
}
sprintf(buff, "//%s:capable-switch",ofprefix[ofconfig_version]);
node = xmlXPathEvalExpression(xpath, context)
if(node == NULL){
return -1;
}
xmlNsPtr ns = xmlNewNs(node,"new-namespace", "prefix");
xmlSetNs(node, ns);
xmlSaveFormatFile (doc_name, doc, 1);
xmlFreeDoc(doc);
But this does not change the namespace of the node. The namespace remains same. I saw couple of examples, but all are related to changing namespace of childnode.
Moreover, I guess if we could modify the node by other way such as deleting and creating it again it will work. But not dont know how to link this node with its child nodes.
The node result of xmlXPathEvalExpression isn't directly the xmlNodePtr you're expecting, it's an xmlXPathObjectPtr.
http://www.xmlsoft.org/html/libxml-xpath.html#xmlXPathObject
You'll need to drill in to node->nodesetval->nodeTab[0] to get the first hit of the expression. It's also good to test node->nodesetval for NULL, check node->nodesetval->nodeNr for how many hits there are, etc.

Find pathname from dlopen handle on OSX

I have dlopen()'ed a library, and I want to invert back from the handle it passes to me to the full pathname of shared library. On Linux and friends, I know that I can use dlinfo() to get the linkmap and iterate through those structures, but I can't seem to find an analogue on OSX. The closest thing I can do is to either:
Use dyld_image_count() and dyld_get_image_name(), iterate over all the currently opened libraries and hope I can guess which one corresponds to my handle
Somehow find a symbol that lives inside of the handle I have, and pass that to dladdr().
If I have apriori knowledge as to a symbol name inside of the library I just opened, I can dlsym() that and then use dladdr(). That works fine. But in the general case where I have no idea what is inside this shared library, I would need to be able to enumerate symbols to do that, which I don't know how to do either.
So any tips on how to lookup the pathname of a library from its dlopen handle would be very much appreciated. Thanks!
Here is how you can get the absolute path of a handle returned by dlopen.
In order to get the absolute path, you need to call the dladdr function and retrieve the Dl_info.dli_fname field.
In order to call the dladdr function, you need to give it an address.
In order to get an address given a handle, you have to call the dlsym function with a symbol.
In order to get a symbol out of a loaded library, you have to parse the library to find its symbol table and iterate over the symbols. You need to find an external symbol because dlsym only searches for external symbols.
Put it all together and you get this:
#import <dlfcn.h>
#import <mach-o/dyld.h>
#import <mach-o/nlist.h>
#import <stdio.h>
#import <string.h>
#ifdef __LP64__
typedef struct mach_header_64 mach_header_t;
typedef struct segment_command_64 segment_command_t;
typedef struct nlist_64 nlist_t;
#else
typedef struct mach_header mach_header_t;
typedef struct segment_command segment_command_t;
typedef struct nlist nlist_t;
#endif
static const char * first_external_symbol_for_image(const mach_header_t *header)
{
Dl_info info;
if (dladdr(header, &info) == 0)
return NULL;
segment_command_t *seg_linkedit = NULL;
segment_command_t *seg_text = NULL;
struct symtab_command *symtab = NULL;
struct load_command *cmd = (struct load_command *)((intptr_t)header + sizeof(mach_header_t));
for (uint32_t i = 0; i < header->ncmds; i++, cmd = (struct load_command *)((intptr_t)cmd + cmd->cmdsize))
{
switch(cmd->cmd)
{
case LC_SEGMENT:
case LC_SEGMENT_64:
if (!strcmp(((segment_command_t *)cmd)->segname, SEG_TEXT))
seg_text = (segment_command_t *)cmd;
else if (!strcmp(((segment_command_t *)cmd)->segname, SEG_LINKEDIT))
seg_linkedit = (segment_command_t *)cmd;
break;
case LC_SYMTAB:
symtab = (struct symtab_command *)cmd;
break;
}
}
if ((seg_text == NULL) || (seg_linkedit == NULL) || (symtab == NULL))
return NULL;
intptr_t file_slide = ((intptr_t)seg_linkedit->vmaddr - (intptr_t)seg_text->vmaddr) - seg_linkedit->fileoff;
intptr_t strings = (intptr_t)header + (symtab->stroff + file_slide);
nlist_t *sym = (nlist_t *)((intptr_t)header + (symtab->symoff + file_slide));
for (uint32_t i = 0; i < symtab->nsyms; i++, sym++)
{
if ((sym->n_type & N_EXT) != N_EXT || !sym->n_value)
continue;
return (const char *)strings + sym->n_un.n_strx;
}
return NULL;
}
const char * pathname_for_handle(void *handle)
{
for (int32_t i = _dyld_image_count(); i >= 0 ; i--)
{
const char *first_symbol = first_external_symbol_for_image((const mach_header_t *)_dyld_get_image_header(i));
if (first_symbol && strlen(first_symbol) > 1)
{
handle = (void *)((intptr_t)handle | 1); // in order to trigger findExportedSymbol instead of findExportedSymbolInImageOrDependentImages. See `dlsym` implementation at http://opensource.apple.com/source/dyld/dyld-239.3/src/dyldAPIs.cpp
first_symbol++; // in order to remove the leading underscore
void *address = dlsym(handle, first_symbol);
Dl_info info;
if (dladdr(address, &info))
return info.dli_fname;
}
}
return NULL;
}
int main(int argc, const char * argv[])
{
void *libxml2 = dlopen("libxml2.dylib", RTLD_LAZY);
printf("libxml2 path: %s\n", pathname_for_handle(libxml2));
dlclose(libxml2);
return 0;
}
If you run this code, it will yield the expected result: libxml2 path: /usr/lib/libxml2.2.dylib
After about a year of using the solution provided by 0xced, we discovered an alternative method that is simpler and avoids one (rather rare) failure mode; specifically, because 0xced's code snippet iterates through each dylib currently loaded, finds the first exported symbol, attempts to resolve it in the dylib currently being sought, and returns positive if that symbol is found in that particular dylib, you can have false positives if the first exported symbol from an arbitrary library happens to be present inside of the dylib you're currently searching for.
My solution was to use _dyld_get_image_name(i) to get the absolute path of each image loaded, dlopen() that image, and compare the handle (after masking out any mode bits set by dlopen() due to usage of things like RTLD_FIRST) to ensure that this dylib is actually the same file as the handle passed into my function.
The complete function can be seen here, as a part of the Julia Language, with the relevant portion copied below:
// Iterate through all images currently in memory
for (int32_t i = _dyld_image_count(); i >= 0 ; i--) {
// dlopen() each image, check handle
const char *image_name = _dyld_get_image_name(i);
uv_lib_t *probe_lib = jl_load_dynamic_library(image_name, JL_RTLD_DEFAULT);
void *probe_handle = probe_lib->handle;
uv_dlclose(probe_lib);
// If the handle is the same as what was passed in (modulo mode bits), return this image name
if (((intptr_t)handle & (-4)) == ((intptr_t)probe_handle & (-4)))
return image_name;
}
Note that functions such as jl_load_dynamic_library() are wrappers around dlopen() that return libuv types, but the spirit of the code remains the same.

How can libxml2 be used to parse data from XML?

I have looked around at the libxml2 code samples and I am confused on how to piece them all together.
What are the steps needed when using libxml2 to just parse or extract data from an XML file?
I would like to get hold of, and possibly store information for, certain attributes. How is this done?
I believe you first need to create a Parse tree. Maybe this article can help, look through the section which says How to Parse a Tree with Libxml2.
libxml2 provides various examples showing basic usage.
http://xmlsoft.org/examples/index.html
For your stated goals, tree1.c would probably be most relevant.
tree1.c: Navigates a tree to print
element names
Parse a file to a tree, use
xmlDocGetRootElement() to get the root
element, then walk the document and
print all the element name in document
order.
http://xmlsoft.org/examples/tree1.c
Once you have an xmlNode struct for an element, the "properties" member is a linked list of attributes. Each xmlAttr object has a "name" and "children" object (which are the name/value for that attribute, respectively), and a "next" member which points to the next attribute (or null for the last one).
http://xmlsoft.org/html/libxml-tree.html#xmlNode
http://xmlsoft.org/html/libxml-tree.html#xmlAttr
I found these two resources helpful when I was learning to use libxml2 to build a rss feed parser.
Tutorial with SAX interface
Tutorial using the DOM Tree (code example for getting an attribute value included)
Here, I mentioned complete process to extract XML/HTML data from file on windows platform.
First download pre-compiled .dll form http://xmlsoft.org/sources/win32/
Also download its dependency iconv.dll and zlib1.dll from the same page
Extract all .zip files into the same directory. For Ex: D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into c:\windows\system32 deirectory
Make libxml_test.cpp file and copy following code into that file.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>
void traverse_dom_trees(xmlNode * a_node)
{
xmlNode *cur_node = NULL;
if(NULL == a_node)
{
//printf("Invalid argument a_node %p\n", a_node);
return;
}
for (cur_node = a_node; cur_node; cur_node = cur_node->next)
{
if (cur_node->type == XML_ELEMENT_NODE)
{
/* Check for if current node should be exclude or not */
printf("Node type: Text, name: %s\n", cur_node->name);
}
else if(cur_node->type == XML_TEXT_NODE)
{
/* Process here text node, It is available in cpStr :TODO: */
printf("node type: Text, node content: %s, content length %d\n", (char *)cur_node->content, strlen((char *)cur_node->content));
}
traverse_dom_trees(cur_node->children);
}
}
int main(int argc, char **argv)
{
htmlDocPtr doc;
xmlNode *roo_element = NULL;
if (argc != 2)
{
printf("\nInvalid argument\n");
return(1);
}
/* Macro to check API for match with the DLL we are using */
LIBXML_TEST_VERSION
doc = htmlReadFile(argv[1], NULL, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
if (doc == NULL)
{
fprintf(stderr, "Document not parsed successfully.\n");
return 0;
}
roo_element = xmlDocGetRootElement(doc);
if (roo_element == NULL)
{
fprintf(stderr, "empty document\n");
xmlFreeDoc(doc);
return 0;
}
printf("Root Node is %s\n", roo_element->name);
traverse_dom_trees(roo_element);
xmlFreeDoc(doc); // free document
xmlCleanupParser(); // Free globals
return 0;
}
Open Visual Studio Command Promt
Go To D:\demo directory
execute cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib command
Run binary using libxml_test.exe test.html command(Here test.html may be any valid HTML file)
You can refere this answer.
here they store data into structure format and use further by passing structure address to a function.
You can find detail code in c for use.
code ->> this

Resources