Read contents from xml file and store in an array - c

I'm working with XML for the first time and am having trouble storing the contents of an XML file in an array. I'm using libxml2 to parse the file, and I can read the data and print it. The code is given below:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>
#include <wchar.h>
wchar_t buffer[7][50]={"\0"};
static void parseDoc(const char *docname)
{
xmlDocPtr doc;
xmlNodePtr cur;
xmlChar *key;
int i=0;
doc = xmlParseFile(docname);
if (doc == NULL ) {
fprintf(stderr,"Document not parsed successfully. \n");
return;
}
cur = xmlDocGetRootElement(doc);
if (cur == NULL)
{
fprintf(stderr,"empty document\n");
xmlFreeDoc(doc);
return;
}
cur = cur->xmlChildrenNode;
while (cur != NULL)
{
key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
wmemcpy(buffer[i],(wchar_t*)(key),size(key)); /*segmentation fault at this stage*/
printf("Content : %s\n", key);
xmlFree(key);
i++;
cur = cur->next;
}
xmlFreeDoc(doc);
return;
}
int main(void)
{
const char *docname="/home/workspace/TestProject/Text.xml";
parseDoc (docname);
return (1);
}
The sample XML file is provided below:
<?xml version="1.0"?>
<story>
<author>John Fleck</author>
<datewritten>June 2, 2002</datewritten>
<keyword>example keyword</keyword>
<headline>This is the headline</headline>
<para>This is the body text.</para>
</story>
The file contents printed to the screen were as follows:
Content : null
Content : John Fleck
Content : null
Content : June 2, 2002
Content : null
Content : example keyword
Content : null
Content : This is the headline
Content : null
Content : This is the body text.
I suspect that the NULL content in some places is breaking the copy and causing the segmentation fault. Please let me know how to fix the problem, and whether there is a better way to do this. I did a similar XML file read with the MSXML parser before, and this is my first time with Linux APIs.
EDIT: The copying is now done as shown below, but the contents of the wchar_t array are garbled. Further help would be appreciated.
while (cur != NULL) {
key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
if(key!=NULL)
{
wmemcpy(DiscRead[i],(const wchar_t *)key,sizeof(key));
i++;
}
printf("keyword: %s\n", key);
xmlFree(key);
cur = cur->next;
}

Your code has multiple problems:
You use wchar_t for your string array. This isn't appropriate for the UTF-8 encoded strings you'll get from libxml2. You should stick with xmlChar or use char.
You use xmlNodeListGetString to get the text content of nodes passing cur->xmlChildrenNode as node list. The latter will be NULL for text nodes, so xmlNodeListGetString will return NULL as an error condition. You should simply call xmlNodeGetContent on the current node but only if it is an element node.
Using xmlChildrenNode as field name is deprecated. You should use children.
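The call to wmemcpy is dangerous because nothing ties the number of characters copied to the length of the string or the size of the destination. I'd suggest something safer like strlcpy (a BSD function that not every C library provides).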
Try something like this:
char buffer[7][50];
static void parseDoc(const char *docname)
{
xmlDocPtr doc;
xmlNodePtr cur;
xmlChar *key;
int i = 0;
doc = xmlParseFile(docname);
if (doc == NULL) {
fprintf(stderr, "Document not parsed successfully. \n");
return;
}
cur = xmlDocGetRootElement(doc);
if (cur == NULL) {
fprintf(stderr, "empty document\n");
xmlFreeDoc(doc);
return;
}
for (cur = cur->children; cur != NULL; cur = cur->next) {
if (cur->type != XML_ELEMENT_NODE)
continue;
key = xmlNodeGetContent(cur);
strlcpy(buffer[i], (const char *)key, sizeof(buffer[i])); /* xmlChar is unsigned char, so cast */
printf("Content : %s\n", key);
xmlFree(key);
i++;
}
xmlFreeDoc(doc);
}
You should also check that i doesn't overrun the number of strings in your array.
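The bounds check could look something like the following. This is only a sketch in plain C that leaves libxml2 out: store_string, MAX_STRINGS and MAX_LEN are made-up names, and snprintf stands in for strlcpy because the latter isn't available in every C library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 7   /* hypothetical: number of rows in the array */
#define MAX_LEN     50  /* hypothetical: bytes per row, including the '\0' */

static char buffer[MAX_STRINGS][MAX_LEN];

/*
 * Copy src into row *count of buffer and advance *count.
 * Returns 0 on success, -1 if src is NULL or the array is already full.
 * Text longer than MAX_LEN - 1 bytes is truncated, never overrun.
 */
static int store_string(const char *src, int *count)
{
    assert(count != NULL);
    if (src == NULL || *count < 0 || *count >= MAX_STRINGS)
        return -1;
    /* snprintf always NUL-terminates when the size is non-zero. */
    snprintf(buffer[*count], sizeof buffer[*count], "%s", src);
    (*count)++;
    return 0;
}
```

In the loop above, a call such as store_string((const char *)key, &i) could take the place of the strlcpy and i++ lines, with the loop stopping once it returns -1. Note that byte-wise truncation may cut a multi-byte UTF-8 character in half.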

The buffer array is not large enough. Increase its size to buffer[7+3][50].
wchar_t buffer[7][50]={"\0"};
...
while (cur != NULL) {
wmemcpy(buffer[i],(wchar_t*)(key),size(key)); /*segmentation fault */
printf("Content : %s\n", key);
...
i++;
}
The output is 10 lines of "Content : ...". Thus i is incremented from 0 to 9. But buffer may only be indexed from 0 to 6. Indexing 7 and beyond is undefined behavior, which eventually manifested itself as a segmentation fault.

Related

How to find the character index of an XPath match with libxml2?

Given an XML file (stored in, say, sample.xml), and an XPath expression (say, //storyinfo), I want to get the character index in the XML file of the start of each node that results from evaluating the XPath expression.
A line number and a column number would also be fine.
What I've tried:
Given a sample XML like this one, stored in sample.xml:
<?xml version="1.0" encoding="utf-8"?>
<story>
<storyinfo>
<author>John Fleck</author>
<datewritten>June 2, 2002</datewritten>
<keyword>example keyword</keyword>
</storyinfo>
<body>
<headline>This is the headline</headline>
<para>This is the body text.</para>
</body>
</story>
I can get the "storyinfo" node using an XPath expression with libxml2, like so:
#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/xpath.h>
int
main(void)
{
int i, size;
char *filename;
xmlChar *xpath_expr;
xmlDocPtr doc;
xmlNodePtr node;
xmlNodeSetPtr nodes;
xmlXPathContextPtr xpath_ctx;
xmlXPathObjectPtr xpath_obj;
/* Read file. */
filename = "sample.xml";
doc = xmlParseFile(filename);
if (doc == NULL) {
fprintf(stderr, "Error: unable to parse file \"%s\"\n",
filename);
exit(1);
}
/* Evaluate XPath expression. */
xpath_ctx = xmlXPathNewContext(doc);
if (xpath_ctx == NULL) {
fprintf(stderr, "Error: unable to create new XPath context\n");
xmlFreeDoc(doc);
exit(1);
}
xpath_expr = (xmlChar *)"//storyinfo";
xpath_obj = xmlXPathEvalExpression(xpath_expr, xpath_ctx);
if (xpath_obj == NULL) {
fprintf(stderr,
"Error: unable to evaluate XPath expression \"%s\"\n",
xpath_expr);
xmlXPathFreeContext(xpath_ctx);
xmlFreeDoc(doc);
exit(1);
}
/* Print XPath matches. */
nodes = xpath_obj->nodesetval;
size = (nodes) ? nodes->nodeNr : 0;
for (i = 0; i < size; ++i) {
node = nodes->nodeTab[i];
printf("Match %d - Name: %s\n", i, node->name);
printf("Match %d - Content: %s\n", i, node->content);
printf("Match %d - Line: %d\n", i, node->line);
printf("Match %d - Extra: %d\n", i, node->extra);
}
/* Cleanup. */
xmlXPathFreeObject(xpath_obj);
xmlXPathFreeContext(xpath_ctx);
xmlFreeDoc(doc);
return 0;
}
Here I saw that in libxml2, the node has a field line, with a value of 3, which is the line number in which the node is located in the XML file. However, that is not helpful if the XML is all in a single line (which isn't uncommon).
What I want instead is the character index: in the sample.xml above, that would be 50. Either that, or a line and column number: in the example above, that would be line 3, column 3.

C program failing to insert to file contents the previously read contents

I have a problem in my C program when I try to write the contents I previously read from my file back out and save a new one. It fails in the while loop, and I don't understand why, since there are contents to reinsert.
Here's my code:
void init(){
char pn[30],pd[30],pp[30];
if ((flptr = fopen("MASTER.dat","r+")) == NULL) {
printf("Couldnt Get Cred");
return;
}
fscanf(flptr,"%s %s %s",pn,pd,pp);
while(!feof(flptr)){
r = (struct Records *) malloc(sizeof(struct Records));
int fr = fscanf(flptr,"%s %s %f",r->PartNum,r->PartDesc, &r->PartPrice);
if(fr == EOF){
printf("HERE");
break;
}
if(head == NULL){
head = r;
}
else{
tail->next = r;
}
tail = r;
}
fclose(flptr);
}
void put(){
if ((flptr = fopen("MASTER.dat","r")) == NULL) {
printf("Couldnt Get Cred");
return;
}
r = head;
fprintf(flptr,"PartNumber PartDescription PartPrice\n");
while (r != NULL){
fprintf(flptr,"%s %s %f\n", r->PartNum, r->PartDesc, r->PartPrice);
r = r->next;
}
fprintf(flptr,"Changes SAVED.");
fclose(flptr);
}
In the function put you open the file for reading, not for writing, so your fprintf calls have no effect, and the file is not even created if it doesn't exist:
if ((flptr = fopen("MASTER.dat","r")) == NULL) {
must be
if ((flptr = fopen("MASTER.dat","w")) == NULL) {
If you later try to read that nonexistent file with init, it will not succeed.
Apart from that, put and init both use and modify the global variable r. I encourage you to use a local variable to avoid possible problems.
Why do you open the file with "r+" in init when you only read it?
When you read a string through (f)scanf, I encourage you to limit the length so that you don't write past the end of the receiving buffer, which is undefined behavior, and to always check the result. For instance, replace
fscanf(flptr,"%s %s %s",pn,pd,pp);
by
if (fscanf(flptr,"%29s %29s %29s",pn,pd,pp) != 3) {
printf("invalid file content\n");
return;
}
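Putting these points together, a round trip might be sketched as below. This is an illustration rather than a drop-in replacement: struct Record, read_records and write_records are hypothetical names, the record layout is guessed from the format strings in the question, and an array replaces the linked list to keep the example short.

```c
#include <assert.h>
#include <stdio.h>

struct Record {             /* hypothetical stand-in for struct Records */
    char  part_num[30];
    char  part_desc[30];
    float part_price;
};

/*
 * Skip the three-word header line, then read up to max records.
 * Returns the number of records read, or -1 if the header is missing.
 */
static int read_records(FILE *fp, struct Record *out, int max)
{
    char h1[30], h2[30], h3[30];
    int n = 0;

    assert(fp != NULL && out != NULL);
    if (fscanf(fp, "%29s %29s %29s", h1, h2, h3) != 3)
        return -1;

    /* Loop on fscanf's return value instead of feof(): the loop ends
       on end-of-file and on a malformed record alike. */
    while (n < max &&
           fscanf(fp, "%29s %29s %f", out[n].part_num,
                  out[n].part_desc, &out[n].part_price) == 3)
        n++;
    return n;
}

/* Write the header and n records. The stream must be open for writing. */
static int write_records(FILE *fp, const struct Record *rec, int n)
{
    assert(fp != NULL && rec != NULL);
    if (fprintf(fp, "PartNumber PartDescription PartPrice\n") < 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (fprintf(fp, "%s %s %f\n", rec[i].part_num,
                    rec[i].part_desc, rec[i].part_price) < 0)
            return -1;
    return 0;
}
```

Both functions take the FILE pointer as a parameter rather than using a global, so the caller decides the mode: "r" for read_records and "w" for write_records.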

How to read a .txt file from user input in C

I know how to hardcode a file name into a program, but when I try a similar tactic with scanf, nothing happens. I have an error check that looks at whether the file exists and has the right format, but every time I enter the file name, the printf statement below the scanf doesn't print. Why is that? I also found out that I am opening the file, but the while loop is infinite, which doesn't make sense. I have tried another solution, shown below, with the same results.
void parseFile(struct student_record_node** head)
{
FILE*input;
const int argCount = 4;
char filename[100]="";
const char rowformat[] = "%20s %20s %d %d";
struct student_record record;
struct student_record_node* node = NULL;
printf("\n Please Enter the FULL Path of the .txt file you like to load. \n");
scanf("%s",filename);
input = fopen(filename, "r");
printf("I am here");
if(input == NULL)
{
printf("Error: Unable to open file.\n");
exit(EXIT_FAILURE);
}
while(!feof(input))
{
/* creating blank node to fill repeatedly until end of document*/
memset(&record, 0, sizeof(struct student_record));
if(fscanf(input, rowformat, record.first_name_,record.last_name_,&record.student_id_,&record.student_age_) != argCount)
{
continue;
}
/* set node into the doubly linked list */
node = student_record_allocate();
/* copies values from the blank node reading from document into node in my linked list */
strcpy(node->record_->first_name_, record.first_name_);
strcpy(node->record_->last_name_, record.last_name_);
node->record_->student_id_ = record.student_id_;
node->record_->student_age_ = record.student_age_;
/* check if node right after absolute head is empty if so fills it */
if(*head == NULL)
{
*head = node;
}
else
{
printf(" stuck in loop\n");
/* if current isn't null start linking the node in a list */
appendNode(head,node);
}
}
fclose(input);
printf(" end of parsefile");
}
When I get to the parseFile() function, I enter NEW.txt, which is in the correct format and in the same folder as the program itself. I know that my check is working: when I enter a .txt file that doesn't exist or that is empty, it gets caught like it should.
The expected behavior is that the program loads the list from new.txt into a doubly linked list, then returns to a menu that gives the user options. The doubly linked list can then be manipulated: adding students, changing data, deleting, saving, and printing the current roster. I have trouble using gdb with this program since I receive new.txt in parseFile.
Sample of New.txt contents (it's just first name, last name, ID, age):
Belinda Homes 345 50
Scott Crown 456 18
Failed Solution: Using fgetc instead of feof
int c = fgetc(input);
while(c != EOF)
{
printf("\n in loop \n");
/* creating blank node to fill repeatedly until end of document*/
memset(&record, 0, sizeof(struct student_record));
if(fscanf(input, rowformat, record.first_name_,record.last_name_,&record.student_id_,&record.student_age_) != argCount)
{
continue;
}
/* set node into the doubly linked list */
node = student_record_allocate();
/* copies values from the blank node reading from document into node in my linked list */
strcpy(node->record_->first_name_, record.first_name_);
strcpy(node->record_->last_name_, record.last_name_);
node->record_->student_id_ = record.student_id_;
node->record_->student_age_ = record.student_age_;
/* check if node right after absolute head is empty if so fills it */
if(*head == NULL)
{
*head = node;
}
else
{
printf(" stuck in loop\n");
/* if current isn't null start linking the node in a list */
appendNode(head,node);
}
c = fgetc(input);
}
This is the easiest way to read from a file and print it out. I assume you want to print the file or do something with it.
int c;
FILE *file;
file = fopen("test.txt", "r");
if (file) {
while ((c = getc(file)) != EOF){
putchar(c);
}
fclose(file);
}
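For the record-reading loop in the question itself, one common pattern is to read a whole line with fgets and parse it with sscanf. In the fgetc variant above, continue skips the c = fgetc(input) at the bottom of the loop, so c never changes once a parse fails; and a while(!feof(input)) loop does not end on a read error. The sketch below avoids both. The names struct student, parse_student_line and load_students are hypothetical, the field sizes are inferred from the "%20s" format, and an array stands in for the linked list.

```c
#include <assert.h>
#include <stdio.h>

struct student {            /* hypothetical stand-in for struct student_record */
    char first_name[21];    /* "%20s" needs room for 20 chars plus '\0' */
    char last_name[21];
    int  id;
    int  age;
};

/* Parse one "First Last Id Age" line. Returns 1 on success, 0 otherwise. */
static int parse_student_line(const char *line, struct student *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%20s %20s %d %d",
                  out->first_name, out->last_name,
                  &out->id, &out->age) == 4;
}

/*
 * Read every line of fp and store well-formed records in out (up to max).
 * The loop ends when fgets returns NULL, which covers both end-of-file
 * and read errors. Assumes lines are shorter than 128 bytes.
 */
static int load_students(FILE *fp, struct student *out, int max)
{
    char line[128];
    int n = 0;

    assert(fp != NULL && out != NULL);
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        if (parse_student_line(line, &out[n]))
            n++;            /* malformed lines are simply skipped */
    }
    return n;
}
```

As for the missing "I am here" message: it has no trailing newline, so it may sit in the stdout buffer; ending it with \n or calling fflush(stdout) should make it appear right away.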

Read a XML file and print the tags using C

There is an XML file, and I have to identify, store, and print the unique tags present in it.
Example XML File:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
I need to store the tags (note, to, from, heading, body, etc.) in an array and print them afterwards.
Below is the code I tried, but I'm having trouble checking for and removing the / from closing tags in order to identify duplicate tags.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/*Max number of characters to be read/write from file*/
#define MAX_CHAR_FOR_FILE_OPERATION 1000000
int read_and_show_the_file()
{
FILE *fp;
char text[MAX_CHAR_FOR_FILE_OPERATION];
int i;
fp = fopen("/tmp/test.txt", "r");
if(fp == NULL)
{
printf("File Pointer is invalid\n");
return -1;
}
//Ensure array write starts from beginning
i = 0;
//Read over file contents until either EOF is reached or maximum characters is read and store in character array
while( (fgets(&text[i++],sizeof(char)+1,fp) != NULL) && (i<MAX_CHAR_FOR_FILE_OPERATION) ) ;
const char *p1, *p2, *temp;
temp = text;
while(p2 != strrchr(text, ">"))
{
p1 = strstr(temp, "<");
p2 = strstr(p1, ">");
size_t len = p2-p1;
char *res = (char*)malloc(sizeof(char)*(len));
strncpy(res, p1+1, len-1);
res[len] = '\0';
printf("'%s'\n", res);
temp = p2 + 1;
}
fclose(fp);
return 0;
}
main()
{
if( (read_and_show_the_file()) == 0)
{
printf("File Read and Print is successful\n");
}
return 0;
}
I also tried strcmp, as in if(strcmp(res[0],"/")==0), to check for a closing tag, but it doesn't work and causes a segmentation fault. I couldn't find an example in C. Please review and suggest a fix.
Below is the output:
'note'
'to'
'/to' //(Want to remove these closing tags from output)
'from'
'/from' //(Want to remove these closing tags from output)
and so on..
Segmentation fault also occurring.
This addresses just one of the segmentation faults in your question:
You have to give strcmp a string; you cannot give it a single character (res[0]). But since you don't need to compare strings, why not just compare the first character (res[0]=='/')?
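Building on the first-character check, a sketch of collecting each opening tag name once might look like this. It works on text already held in memory and is not a real XML parser: it assumes tags without attributes, comments, or declarations, and collect_tags, MAX_TAGS and MAX_NAME are names invented for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_TAGS 32     /* hypothetical limits for this sketch */
#define MAX_NAME 32

/*
 * Scan text for <name> tags and store each distinct opening-tag name once.
 * Closing tags (</name>) and names that are empty or too long are skipped.
 * Returns the number of names stored in tags.
 */
static int collect_tags(const char *text, char tags[][MAX_NAME], int max)
{
    int count = 0;
    const char *p = text;

    assert(text != NULL && tags != NULL);
    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        if (end == NULL)
            break;                          /* unterminated tag: stop */
        size_t len = (size_t)(end - p - 1); /* characters between < and > */
        if (p[1] != '/' && len > 0 && len < MAX_NAME) {
            int seen = 0;
            for (int i = 0; i < count && !seen; i++)
                seen = strlen(tags[i]) == len &&
                       memcmp(tags[i], p + 1, len) == 0;
            if (!seen && count < max) {
                memcpy(tags[count], p + 1, len);
                tags[count][len] = '\0';    /* fits because len < MAX_NAME */
                count++;
            }
        }
        p = end + 1;
    }
    return count;
}
```

Because the names are copied into fixed-size rows with an explicit terminator, there is no malloc to size and no res[len] write past the end of an allocation. Note also that strchr and strrchr take a character such as '>', not a string such as ">".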

error in my search linked list implementation

My program doesn't seem to be opening the text files properly.
I have a path.txt which is a string representation of all the paths of folders and text files which I have created. When running the program however, it will not output the LINES of the text file the user asked for.
OUTPUT
enter text file
warning: this program uses gets(), which is unsafe.
a1.txt
It should have output:
This is a1
text of a1.txt:
This is a1
Text file path.txt (this is how my folder is set up with the text files):
a/a1.txt
a/a2.txt
a/b/b3.txt
a/b/b4.txt
a/c/c4.txt
a/c/c5.txt
a/c/d/d6.txt
a/c/d/g
a/c/d/h
a/c/e/i/i7.txt
a/c/f/j/k/k8.txt
code:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
typedef struct sMyPath{
char *element;
struct sMyPath *next;
} tMyPath;
int main(void)
{
FILE *pFile;
pFile = fopen("path.txt", "r");
char inputstr[1024];
tMyPath *curr, *first = NULL, *last = NULL;
//get the text file, and put it into a string inputstr
if (pFile != NULL)
{
while(!feof(pFile))
{
fgets(inputstr, sizeof(inputstr), pFile);
}
fclose(pFile);
}
else
{
printf("Could not open the file.\n");
}
//using tokens to get each piece of the string
//seperate directories and text files, put it into a link list
char *token = strtok(inputstr, "/");
while (token != NULL)
{
if(last == NULL){
//creating node for directory
first = last = malloc (sizeof (*first));
first -> element = strdup (token);
first -> next = NULL;
} else {
last -> next = malloc (sizeof (*last));
last = last -> next;
last -> element = strdup (token);
last -> next = NULL;
}
token = strtok(NULL, "/");
}
//ask user for txt file
char pathU[20];
printf("enter text file\n");
gets(pathU);
//check if text file exist, if yes output entires in text file, else say no
while(first != NULL)
{
if(first -> element == pathU)
{
FILE *nFile;
char texxt[300];
nFile = fopen(pathU, "r");
while (!feof(nFile))
{
fgets(texxt, 300, nFile);
printf("%s", texxt);
}
}
else if(first == NULL)
{
printf("invalid file name\n");
}
else
{
first = first -> next;
}
}
return 0;
}
I see two possible requirements/implementations.
1) With your implementation, every list node contains just a file name or a directory name, not the full path name. If you need to store the entire path name, use '\n' as the delimiter:
char *token = strtok(inputstr, "\n");
and
token = strtok(NULL, "\n");
This assumes that when your input is a/a1.txt, your current directory contains the directory a, which in turn contains the file a1.txt.
2) Otherwise, your existing code expects a1.txt to be in the current directory, though that contradicts the contents of the input file.
Either way, the code below is the culprit:
if(first -> element == pathU)
which compares the pointers and not the strings. Replace it with:
if( strcmp( first -> element, pathU ) == 0 )
I could suggest a better solution if your requirements were clearer.
The problem seems to be in the string comparison: first -> element == pathU. Here you are comparing pointers, not the characters of the strings. Use strcmp instead: if (strcmp(first -> element, pathU) == 0) ...
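To make the difference concrete, here is a small sketch of a list search built on strcmp. The names struct path_node, strip_newline and find_path are hypothetical. strip_newline is included because swapping the unsafe gets (removed from the language in C11) for fgets leaves a trailing newline in the input, which would otherwise make every comparison fail.

```c
#include <assert.h>
#include <string.h>

struct path_node {              /* hypothetical stand-in for tMyPath */
    const char *element;
    struct path_node *next;
};

/* Cut the string at the first '\r' or '\n', removing what fgets leaves behind. */
static void strip_newline(char *s)
{
    assert(s != NULL);
    s[strcspn(s, "\r\n")] = '\0';
}

/*
 * Return the first node whose element equals name, or NULL if none does.
 * strcmp compares the characters; == would only compare the addresses.
 */
static const struct path_node *find_path(const struct path_node *head,
                                         const char *name)
{
    assert(name != NULL);
    for (const struct path_node *n = head; n != NULL; n = n->next)
        if (strcmp(n->element, name) == 0)
            return n;
    return NULL;
}
```

Walking the list with a separate cursor, instead of advancing first itself, also keeps the head pointer intact, and the NULL return gives one clear place to report "invalid file name".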

Resources