libxml2 get offset into XML text of node - c

I need to know the offset within an XML string at which an arbitrary node of the DOM is located, after the DOM was built with xmlReadMemory. The problem is that I can't figure out where to get the xmlParserCtxtPtr to pass as the first argument to xmlParserFindNodeInfo, because my parsing process never yields such a context, only an xmlDoc.

The following code worked for me (the libxml2 documentation leaves a lot to be desired; I had to download the source code and dig through the library until I understood enough to hack this together). The key is:
xmlSetFeature(ctxt, "gather line info", (void *)&v);
Here is some code to illustrate:
const char *xml = ...
xmlParserCtxt *ctxt = NULL;
xmlDoc *doc = NULL;
if (!(ctxt = xmlCreateDocParserCtxt((const unsigned char *)xml)))
return -1;
int v = 1;
xmlSetFeature(ctxt, "gather line info", (void *)&v);
if (xmlParseDocument(ctxt) == -1)
{
xmlFreeParserCtxt(ctxt);
return -1;
}
else
{
if ((ctxt->wellFormed) || ctxt->recovery)
doc = ctxt->myDoc;
else
{
xmlFreeParserCtxt(ctxt);
return -1;
}
}
// use doc to get a node and then xmlParserFindNodeInfo(ctxt, node)
…
xmlFreeParserCtxt(ctxt);

Related

libgit2 does not return a valid blob

I'm trying to get a blob of a repository with libgit2:
#include <git2.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
git_libgit2_init();
git_repository *repo = NULL;
int error = git_repository_open(&repo, "/home/martin/Dokumente/TestRepository");
if (error < 0) {
const git_error *e = git_error_last();
printf("Error %d/%d: %s\n", error, e->klass, e->message);
exit(error);
}
git_diff *diff = NULL;
git_diff_options opts = GIT_DIFF_OPTIONS_INIT;
opts.flags |= GIT_DIFF_IGNORE_WHITESPACE;
opts.flags |= GIT_DIFF_INCLUDE_UNTRACKED;
error = git_diff_index_to_workdir(&diff, repo, NULL, &opts);
if (error < 0) {
const git_error *e = git_error_last();
printf("Error %d/%d: %s\n", error, e->klass, e->message);
exit(error);
}
git_patch* patch = nullptr;
git_patch_from_diff(&patch, diff, 0);
bool oldFile = false;
const git_diff_delta *dd = git_patch_get_delta(patch);
const git_oid &id = (!oldFile) ? dd->new_file.id : dd->old_file.id;
git_object *obj = nullptr;
git_object_lookup(&obj, repo, &id, GIT_OBJECT_ANY);
git_blob* blob = reinterpret_cast<git_blob *>(obj);
const char* pointer = (const char*)git_blob_rawcontent(blob);
// cleanup
git_object_free(obj);
git_patch_free(patch);
git_diff_free(diff);
git_repository_free(repo);
return 0;
}
The repository
create a new repository
commit a file like:
1
2
3
4
remove the 4 again, but do not commit
let the program run
Expected:
The program runs fine.
Observed:
obj is still a nullptr after executing git_object_lookup().
When setting the variable oldFile to true, the program runs fine and the pointer "pointer" contains the raw blob.
Does anybody know why I don't get a valid object from git_object_lookup() back?
The problem is that you're looking up an object with the id dd->new_file.id. That file lives in the working directory and hasn't been added or committed yet, so its content isn't in the repository's object database. git_object_lookup() therefore can't find an object matching that OID; it returns an error and leaves obj as a null pointer.
If you want the current working directory content, you must first write it into the object database as a blob using git_blob_create_from_workdir; the lookup will then succeed. So your new code might look like:
bool oldFile = false;
const git_diff_delta *dd = git_patch_get_delta(patch);
git_oid id;
if (!oldFile) {
error = git_blob_create_from_workdir(&id, repo, dd->new_file.path);
if (error < 0) {
const git_error *e = git_error_last();
printf("Error %d/%d: %s\n", error, e->klass, e->message);
exit(error);
}
} else {
id = dd->old_file.id;
}
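git_object *obj = nullptr;
error = git_object_lookup(&obj, repo, &id, GIT_OBJECT_ANY);
if (error < 0) {
const git_error *e = git_error_last();
printf("Error %d/%d: %s\n", error, e->klass, e->message);
exit(error);
}
git_blob* blob = reinterpret_cast<git_blob *>(obj);
const char* pointer = (const char*)git_blob_rawcontent(blob);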
When you diff between the index and the workdir, the new side of the delta represents the file in the working directory. Its id is the hash of the file on disk. Unless you explicitly insert that blob into the repository's object store by some other means, there's no reason for it to be there yet.

Apparently allocating memory and freeing it properly but program still crashes

So I've got a weird problem and can't seem to solve it. I have an ADT called TEAM:
typedef struct Team {
char *name;
int points;
int matches_won;
int goal_difference;
int goals_for;
}TEAM;
I created a function to initialize variables of the TEAM* type with a given name:
TEAM *createTEAM (char *name){
int error_code;
if (name != NULL){
if(strcmp(name, "") != 0){
TEAM *new_team = (TEAM*)malloc(sizeof(TEAM));
new_team->name = (char*)malloc(sizeof(char)*strlen(name));
strcpy(new_team->name, name);
new_team->points = 0;
new_team->matches_won = 0;
new_team->goal_difference = 0;
new_team->goals_for = 0;
return new_team;
}else{
error_code = EMPTY_STRING_CODE;
}
} else {
error_code = NULL_STRING_CODE;
}
printf("Erro ao criar time.\n");
printError(error_code);
return NULL;
}
I also created a function to delete one of these TEAM* variables properly:
void deleteTEAM (TEAM *team_to_remove){
free(team_to_remove->name);
team_to_remove->name = NULL;
free(team_to_remove);
team_to_remove = NULL;
}
But when one or multiple test functions that I created (example below) run, the program sometimes crashes, sometimes doesn't. I've noticed that changing the names I use affects whether it crashes or not, even if they don't affect the test results.
int create_team_01(){
int test_result;
TEAM *Teste = createTEAM("Cruzeiro");
if (strcmp(Teste->name, "Cruzeiro") == 0){
test_result = TRUE;
}else test_result = FALSE;
_assert(test_result); //just a macro function that will check the argument and return 1 if it's false
deleteTEAM(Teste);
return 0;
}
I don't see any problems with memory allocation or freeing. Still, the debugger complains a lot about the first free() (can't find bounds) of the deleteTEAM function. Any ideas? Thanks a lot in advance for any help.
P.S.: I've even tried checking the mallocs' results, but it doesn't seem to be the problem either, so I removed it for the sake of simplicity.
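One likely culprit, offered as a reading of the code above rather than a confirmed diagnosis: malloc(sizeof(char)*strlen(name)) reserves no room for the terminating '\0', so strcpy writes one byte past the end of the allocation. A heap overflow of that kind would be consistent with the debugger complaining at the first free() and with the crash depending on the length of the name. A minimal sketch of a corrected pair of functions follows; the struct and function names are copied from the question, and the error reporting (printError and the error codes) is left out for brevity.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Team {
    char *name;
    int points;
    int matches_won;
    int goal_difference;
    int goals_for;
} TEAM;

/* Returns a heap-allocated TEAM, or NULL on bad input or allocation failure. */
TEAM *createTEAM(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return NULL;

    TEAM *new_team = malloc(sizeof *new_team);
    if (new_team == NULL)
        return NULL;

    /* strlen() does not count the terminating '\0', so add 1 for it. */
    new_team->name = malloc(strlen(name) + 1);
    if (new_team->name == NULL) {
        free(new_team);
        return NULL;
    }
    strcpy(new_team->name, name);

    new_team->points = 0;
    new_team->matches_won = 0;
    new_team->goal_difference = 0;
    new_team->goals_for = 0;
    return new_team;
}

/* Frees a TEAM created by createTEAM; passing NULL is a no-op. */
void deleteTEAM(TEAM *team_to_remove)
{
    if (team_to_remove == NULL)
        return;
    free(team_to_remove->name);
    free(team_to_remove);
}
```

Note that assigning NULL to team_to_remove inside deleteTEAM only changes the function's local copy of the pointer; a caller that wants to avoid a dangling pointer has to reset its own variable after the call.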

How to add a xml node constructed from string in libxml2

I am using libxml2 to encode data into an XML file. My data contains tags, i.e. "<" and ">" characters. When it is converted into XML, these characters are escaped to "&lt;" and "&gt;". Is there any way to avoid this? I want those tags to become XML nodes when the file is decoded, so CDATA is not a solution for this problem. Thanks.
Example Code:
xmlNewChild(node, NULL, (xmlChar *)"ADDRESS", (xmlChar *)"<street>Park Street</street><city>kolkata</city>");
and output of above code is:
<person>
<ADDRESS>&lt;street&gt;Park Street&lt;/street&gt;&lt;city&gt;kolkata&lt;/city&gt;</ADDRESS>
If you want a string to be treated as XML, you should parse it and obtain an xmlDoc from it using xmlReadMemory. This can be useful for larger strings, but usually a document is built with single-step instructions, as in Joachim's answer. Here I present an xmlAddChildFromString function that does the job starting from a string.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
/// Returns 0 on failure, 1 otherwise
int xmlAddChildFromString(xmlNodePtr parent, const char *newNodeStr)
{
// room for the string plus the "<a>" and "</a>" wrapper and the terminator
char *newNodeStrWrapped = calloc(strlen(newNodeStr) + 10, 1);
if (!newNodeStrWrapped) return 0;
strcat(newNodeStrWrapped, "<a>");
strcat(newNodeStrWrapped, newNodeStr);
strcat(newNodeStrWrapped, "</a>");
xmlDocPtr newDoc = xmlReadMemory(
newNodeStrWrapped, (int)strlen(newNodeStrWrapped),
NULL, NULL, 0);
free(newNodeStrWrapped);
if (!newDoc) return 0;
xmlNodePtr newNode = xmlDocCopyNode(
xmlDocGetRootElement(newDoc),
parent->doc,
1);
xmlFreeDoc(newDoc);
if (!newNode) return 0;
xmlNodePtr addedNode = xmlAddChildList(parent, newNode->children);
if (!addedNode) {
xmlFreeNode(newNode);
return 0;
}
newNode->children = NULL; // Thanks to milaniez
newNode->last = NULL; // for fixing
xmlFreeNode(newNode); // the memory leak.
return 1;
}
int
main(int argc, char **argv)
{
xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
xmlNodePtr root = xmlNewNode(NULL, BAD_CAST "root");
xmlDocSetRootElement(doc, root);
xmlAddChildFromString(root,
"<street>Park Street</street><city>kolkata</city>");
xmlDocDump(stdout, doc);
xmlFreeDoc(doc);
return(0);
}
You have to call xmlNewChild in a chain: one call for the parent node and one call for each sub-node:
xmlNodePtr addressNode = xmlNewChild(node, NULL, (xmlChar *) "address", NULL);
xmlNewChild(addressNode, NULL, (xmlChar *) "street", (xmlChar *) "Park Street");
xmlNewChild(addressNode, NULL, (xmlChar *) "city", (xmlChar *) "Kolkata");
You can try the function xmlParseInNodeContext. It parses raw XML in the context of a parent node and constructs a node that can then be attached to that parent.
For example:
const char * xml = "<a><b><c>blah</c></b></a>";
xmlNodePtr new_node = NULL;
// we assume that 'parent' node is already defined
xmlParseInNodeContext(parent, xml, strlen(xml), 0, &new_node);
if (new_node) xmlAddChild(parent, new_node);
I'm now using the following code to inject XML text (possibly containing multiple elements) into an existing node. Thanks to Nazar and nwellnhof for the answer above and for pointing me from my own question ("Injecting a string into an XML node without content escaping") to this one:
std::string xml = "<a>" + str + "</a>";
xmlNodePtr pNewNode = nullptr;
xmlParseInNodeContext(pParentNode, xml.c_str(), (int)xml.length(), 0, &pNewNode);
if (pNewNode != nullptr)
{
// add new xml node children to parent
xmlNode *pChild = pNewNode->children;
while (pChild != nullptr)
{
xmlAddChild(pParentNode, xmlCopyNode(pChild, 1));
pChild = pChild->next;
}
xmlFreeNode(pNewNode);
}
It takes the string (str), wraps it in a surrounding element (<a>...</a>), parses the result using xmlParseInNodeContext, and then adds the children of the new node to the parent. It is important to add the children of the new node, not the new node itself, to avoid having <a>...</a> in the final XML.

libxml2 and XPath traversing children and siblings in ANSI C

I have done a fair bit of XML work in Perl and now I need to do it in ANSI C for a project. Here's the code I wrote, along with a snippet of the XML. I have had some success, but I am having problems getting siblings. I am sure it's super easy, but I just can't get it. There are two functions: one that simply gets the node set (copied directly from xmlsoft.org), and a second one that is mine.
xmlXPathObjectPtr getnodeset (xmlDocPtr doc, xmlChar *xpath){
xmlXPathContextPtr context;
xmlXPathObjectPtr result;
context = xmlXPathNewContext(doc);
if (context == NULL) {
printf("Error in xmlXPathNewContext\n");
return NULL;
}
result = xmlXPathEvalExpression(xpath, context);
xmlXPathFreeContext(context);
if (result == NULL) {
printf("Error in xmlXPathEvalExpression\n");
return NULL;
}
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
printf("No result\n");
return NULL;
}
return result;
}
void reader(const char *xmlDoc, char *xpath)
{
xmlDocPtr doc;
xmlXPathObjectPtr xpathresult;
xmlNodeSetPtr node;
xmlChar *title;
int cnt;
// parse feed in memory to xml object
doc = xmlReadMemory(xmlDoc,strlen(xmlDoc),"noname.xml",NULL,0);
if (!doc) criterr("Error parsing xml document");
// get xpath node set
xpathresult = getnodeset(doc, ( xmlChar * ) xpath);
if (xpathresult) {
node = xpathresult->nodesetval;
printf("Content-type: text/html\n\n");
for (cnt=0;cnt<node->nodeNr; cnt++) {
title = xmlNodeListGetString(doc, node->nodeTab[cnt]->xmlChildrenNode,1);
printf("%d) title= %s<br/>\n",cnt,title);
xmlFree(title);
}
xmlXPathFreeObject(xpathresult);
} else {
criterr("Xpath failed");
}
// free the document exactly once, on both paths
xmlFreeDoc(doc);
xmlCleanupParser();
criterr("Success");
}
and the xml snippet
<item>
<title>this is the title</title>
<link>this is the link</link>
<description>this is the description</description>
</item>
If I use an XPath like //item/title I get all the titles, but what I really want is to select each item and then, inside the node->nodeNr loop, get its title, link and description easily. I have hundreds of 'item' blocks, and I'm just not sure how to get the children or siblings of a node.
Use xmlNextElementSibling. How does one locate it? Go to the Tree API documentation and search for "sibling".
Here is your loop, now also getting the link:
for (cnt=0;cnt<node->nodeNr; cnt++) {
xmlNodePtr titleNode = node->nodeTab[cnt];
// titleNode->next gives empty text element, so better:
xmlNodePtr linkNode = xmlNextElementSibling(titleNode);
xmlChar *link;
if (!linkNode) continue; // no element follows this title
title = xmlNodeListGetString(doc, titleNode->xmlChildrenNode,1);
link = xmlNodeListGetString(doc, linkNode->xmlChildrenNode,1);
printf("%d) title= %s, link= %s<br/>\n",cnt,title,link);
xmlFree(title);
xmlFree(link);
}
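titleNode->next can also be made to point directly to the link element if the parser is told to drop blank text nodes; see the question "How to get these XML elements with libxml2?".
And getting children? Start from xmlFirstElementChild(node) and advance with xmlNextElementSibling until it returns NULL.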

Problem in retrieving the ini file through web page

I am using an .ini file to store some values, and I retrieve values from it using iniparser.
When I hardcode the query and retrieve the value through the command line, I am able to load the ini file and perform some operations.
But when I pass the query through HTTP, I get an error (file not found), i.e., the ini file can't be loaded.
Command line :
int main(void)
{
printf("Content-type: text/html; charset=utf-8\n\n");
char* data = "/cgi-bin/set.cgi?pname=x&value=700&url=http://IP/home.html";
//perform some operation
}
Through http:
.html
function SetValue(id)
{
var val;
var URL = window.location.href;
if(id =="set")
{
document.location = "/cgi-bin/set.cgi?pname="+rwparams+"&value="+val+"&url="+URL;
}
}
.c
int Value(const char* pname)
{
dictionary * ini ;
char key[100];
int val;
ini = iniparser_load("file.ini");
if(ini == NULL)
{
printf("ERROR : .ini File Missing");
return -1;
}
//key for fetching the value, e.g. "ValueList:x"
snprintf(key, sizeof(key), "ValueList:%s", pname);
val = iniparser_getint(ini, key, -1);
//release the dictionary on every path
iniparser_freedict(ini);
if(val < 0)
{
//key not found, or a negative value was stored
return 0;
}
return val;
}
void get_Value(char* pname,char* value)
{
int result =0;
result = Value(pname);
printf("Result : %d",result);
}
int main(void)
{
printf("Content-type: text/html; charset=utf-8\n\n");
char* data = getenv("QUERY_STRING");
//char* data = "/cgi-bin/set.cgi?pname=x&value=700&url=http://10.50.25.40/home.html";
//Parse to get the values separately as parameter name, parameter value, url
//Calling get_Value method to set the value
get_Value(final_para,final_val);
}
file.ini
[ValueList]
x = 100;
y = 70;
When the request is sent through the HTML page, I always get ".ini File Missing". If the request is hardcoded in the C file, it works fine.
How can I resolve this?
Perhaps you have a problem with encoding of the URL parameters? You can't just pass any arbitrary string through a URL - there are some characters that must be encoded. Read this page about URL encoding.
Showing the value of the data string in your C program could be of great help with solving your problem.
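To illustrate the encoding point: the url= parameter in the question carries a full URL, and its ':' and '/' characters (as well as any '?' or '&') would need to be percent-encoded before being appended to the query string. In the browser this is normally done in JavaScript; the C sketch below performs the same transformation and can be handy for building a test query by hand. percent_encode is a hypothetical helper name, not part of iniparser or of any CGI library.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated percent-encoded copy of `src` (caller frees),
 * or NULL on allocation failure. Only the RFC 3986 "unreserved" characters
 * (A-Z a-z 0-9 - . _ ~) are copied unchanged; every other byte becomes %XX. */
char *percent_encode(const char *src)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(src);
    size_t i;
    char *out = malloc(3 * len + 1); /* worst case: every byte becomes %XX */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') ||
                         c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

With this helper, the value http://IP/home.html from the question would be sent as http%3A%2F%2FIP%2Fhome.html.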
Update:
There could be a difference in where your program executes when it is called by the web server rather than directly by you. Are you sure it's being executed with the same "current directory"? Chances are it's different, in which case opening the ini file by a relative name fails. Try printing the current directory (e.g. using the getcwd function) and compare both cases.
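The check suggested above can be sketched as follows. It assumes a POSIX system where getcwd is declared in <unistd.h>; print_cwd is a hypothetical helper name. In the CGI program it would be called after the Content-type header has been printed, so that the output shows up in the browser.

```c
#include <stdio.h>
#include <limits.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback for systems that do not define it */
#endif

/* Writes the process's current working directory to `out`.
 * Returns 0 on success, -1 if getcwd() fails. */
int print_cwd(FILE *out)
{
    char cwd[PATH_MAX];

    if (getcwd(cwd, sizeof cwd) == NULL) {
        fprintf(out, "getcwd failed\n");
        return -1;
    }
    fprintf(out, "Current directory: %s\n", cwd);
    return 0;
}
```

If the two runs print different directories, passing an absolute path to iniparser_load instead of "file.ini" is one way to make the lookup independent of where the web server starts the program.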
