What is an XML parser? Using Expat - c

This might seem like a simple question.
But I have been looking for an XML parser to use in one of my applications that is running on Linux.
I am using Expat and have parsed my XML file by reading one in. However, the output is the same as the input.
This is my file I am reading in:
<?xml version="1.0" encoding="utf-8"?>
<books>
<book>
<id>1</id>
<name>Hello, world!</name>
</book>
</books>
However, after I have passed this, I get exactly the same as the output. It makes me wonder what the parser is for?
Just one more thing: I am using Expat, which seems quite difficult to use. My code is below. It reads in a file, but my application will have to parse a buffer received from a socket, not a file. Does anyone have any samples of this?
int parse_xml(char *buff)
{
FILE *fp;
fp = fopen("mybook.xml", "r");
if(fp == NULL)
{
printf("Failed to open file\n");
return 1;
}
/* Obtain the file size. */
fseek (fp, 0, SEEK_END);
size_t file_size = ftell(fp);
rewind(fp);
XML_Parser parser = XML_ParserCreate(NULL);
int done;
memset(buff, 0, sizeof(buff));
do
{
size_t len = fread(buff, 1, file_size, fp);
done = len < sizeof(buff);
if(XML_Parse(parser, buff, len, done) == XML_STATUS_ERROR)
{
printf("%s at line %d\n", XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
}
while(!done);
fclose(fp);
XML_ParserFree(parser);
return 0;
}

Expat is an event-driven parser. You have to write code to deal with tags, attributes etc. and then register that code with the parser as callbacks. Until you register handlers, the parser only checks that the input is well-formed, which is why you see nothing new. There is an article here which describes how to do this.
Regarding reading from a socket, depending on your platform you may be able to treat the socket like a file handle. Otherwise, you need to do your own reading from the socket and then pass the data to Expat explicitly. There is an API to do this. However, I'd try to get it working with ordinary files first.

It took a while to wrap my head around XML parsing (though I do it in Perl, not C). Basically, you register callback functions. The parser will ping your callback for each node and pass in a data structure containing all kinds of juicy bits (like plaintext, any attributes, children nodes, etc). You have to maintain some kind of state information--like a hash tree you plug stuff into, or a string that contains all the guts, but none of the XML.
Just remember that XML is not linear and it doesn't make much sense to parse it like a long hunk of text. Instead, you parse it like a tree. Good luck.

Instead of expat, you might want to have a look at libxml2, which is probably already included in your distribution. It's a lot more powerful than expat, and gives you all sorts of goodies: DOM (tree mode), SAX (streaming mode), XPath (indispensable to do anything complex with XML IMHO) and more. It's not as lightweight as expat, but it's a lot easier to use.

Well, you chose the most complicated XML parser (event-driven parsers are more difficult to handle). Why Expat and not libxml?

Related

Proper methods to Copy files/folders programmatically in C using POSIX functions

These terms may not be 100% accurate, but I'm using the GCC compiler and POSIX library. I have C code compiled with the SQLite amalgamation file to a single executable.
In the user interface that exchanges JSON messages with the C program, I'd like to make it possible for users to copy the SQLite database files they create through the C program, and copy a full directory/folder.
Thus far, I've been able to rename and move files and folders programmatically.
I've read many questions and answers here, at Microsoft's C runtime library, and other places but I must be missing the fundamental points. I'm using regular old C, not C++ or C#.
My question is are there POSIX functions similar to rename(), _mkdir(), rmdir(), remove(), _stat(), that allow for programmatic copying of files and folders in Windows and Linux?
If not, can one just make a new folder and/or file and fread/fwrite the bytes from the original file to the new file?
I am primarily concerned with copying SQLite database files, although I wouldn't mind knowing the answer in general also.
Is this answer an adequate method?
Is the system() function a poor method? It seems to work quite well. However, it took awhile to figure out how to stop the messages, such as "copied 2 files" from being sent to stdout and shutting down the requesting application since it's not well-formed JSON. This answer explains how and has a link to Microsoft "Using command redirection operators". A /q in xcopy may or may not be necessary also, but certainly didn't do the job alone.
Thank you very much for any direction you may be able to provide.
The question that someone suggested as an answer and placed the little submission box on this question is one that I had already linked to in my question. I don't mean to be rude but, if it had answered my question, I would not have written this one. Thank you whoever you are for taking the time to respond, I appreciate it.
I don't see how that would be a better option than using system() because with the right parameters all the sub-directories and files of a single parent folder can be copied in one statement without having to iterate through all of them manually. Is there any reason why it would not be better to use system() apart from the fact that code will need to be different for each OS?
Handling errors is a bit different because system() doesn't return an errno but an exit code; however, the errors can be redirected from stderr to a file and pulled from there when necessary.
rename(): POSIX
_mkdir(): not POSIX. You want mkdir, which is. mkdir takes two arguments, the second of which (the permission mode) should usually be 0777; the process umask then removes bits from it.
rmdir(): POSIX
remove(): POSIX
_stat(): not POSIX, you want stat() which is.
_stat and _mkdir are called as such on the Windows C library because they're not quite compatible with the modern Unix calls. _mkdir is missing an argument, and _stat looks like a very old version of the Unix call. You'll have trouble on Windows with files larger than 2GB.
You could do:
#ifdef _WIN32
int mkdir(const char *path, int mode) { return _mkdir(path); } /* In the original C we could have #defined this but that doesn't work anymore */
#define stat _stat64
#endif
but if you do so, test it like crazy.
In the end, you're going to be copying stuff with stdio; this loop works. (beware the linked answer; it has bugs that'll bite ya.)
int copyfile(const char *src, const char *dst)
{
const int bufsz = 65536;
char *buf = malloc(bufsz);
if (!buf) return -1; /* like mkdir, rmdir, return 0 for success, -1 for failure */
FILE *hin = fopen(src, "rb");
if (!hin) { free(buf); return -1; }
FILE *hout = fopen(dst, "wb");
if (!hout) { free(buf); fclose(hin); return -1; }
size_t buflen;
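while ((buflen = fread(buf, 1, bufsz, hin)) > 0) { /* fread needs the input stream as its 4th argument */
if (buflen != fwrite(buf, 1, buflen, hout)) { /* likewise fwrite needs the output stream */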
fclose(hout);
fclose(hin);
free(buf);
return -1; /* IO error writing data */
}
}
free(buf);
int r = ferror(hin) ? -1 : 0; /* check if fread had indicated IO error on input */
fclose(hin);
return r | (fclose(hout) ? -1 : 0); /* final case: check if IO error flushing buffer -- don't omit this it really can happen; calling `fflush()` won't help. */
}
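For the "copy a full directory" part of the question, the same stdio loop can be combined with opendir/readdir and mkdir. What follows is only a sketch for POSIX systems (on Windows you would need the _mkdir shim mentioned above and a different directory-listing API). The names copy_file and copy_tree are mine, not from any library. It follows symlinks via stat(), so a symlink loop would recurse forever, and it does not preserve permissions or timestamps.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Copy one regular file with stdio; returns 0 on success, -1 on failure. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    char buf[65536];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;  /* write error */
            break;
        }
    }
    if (ferror(in))
        rc = -1;      /* read error */
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;      /* error flushing the last buffer */
    return rc;
}

/* Recursively copy src to dst. dst must not exist yet when src is a
   directory. Returns 0 on success, -1 on the first failure. */
int copy_tree(const char *src, const char *dst)
{
    struct stat st;
    if (stat(src, &st) == -1)
        return -1;
    if (!S_ISDIR(st.st_mode))
        return copy_file(src, dst);

    if (mkdir(dst, 0777) == -1)  /* the umask trims the mode */
        return -1;
    DIR *d = opendir(src);
    if (!d)
        return -1;

    int rc = 0;
    struct dirent *ent;
    while (rc == 0 && (ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        char from[PATH_MAX], to[PATH_MAX];
        int a = snprintf(from, sizeof from, "%s/%s", src, ent->d_name);
        int b = snprintf(to, sizeof to, "%s/%s", dst, ent->d_name);
        if (a < 0 || b < 0 ||
            (size_t)a >= sizeof from || (size_t)b >= sizeof to)
            rc = -1;  /* path too long */
        else
            rc = copy_tree(from, to);
    }
    closedir(d);
    return rc;
}
```

Compared with system("cp -r ...") or xcopy, this keeps error reporting inside your program and writes nothing to stdout, at the cost of walking the tree yourself.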

POSIX ways for generation of a temporary file with a nice file name

I want to generate a temporary file with a "nice" name like
my-app-Mar27-120357-Qf3K0a.html
while following the best practices for security.
POSIX offers me mkstemp(3) which takes a filename template (typically something like /tmp/my-app-XXXXXX) but it has two problems:
I need to choose the output directory myself. When I see glibc tempnam(3) (which is deprecated for a security reason) considers many factors, I wish to let the library function choose it.
There's no extension in the file name
The second item can be addressed by mkstemps(3) which takes a number of characters to keep as a user-defined extension. In my case, I can pass my-app-Mar27-120357-XXXXXX.html and 5
but it has its own problems:
I still need to choose the output directory
It isn't perfectly portable. NetBSD seems to lack it.
So I'm considering using the deprecated tempnam(3) to generate a filename with the output directory path, overwriting the filename part with X characters, feeding it to mkstemp(3), and then renaming the file to my preferred format. So the problem lies in the last step, renaming without overwriting an existing file; is that possible in POSIX?
Or could there be any better alternatives?
Let mkstemp make the file it wants to make, in the POSIX-compliant way that it wants to. Use symlink to make a symbolic link from a source file and path of your choice to a destination that matches whatever comes from using mkstemp. Remove the symbolic link when you're done.
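A minimal sketch of that idea, assuming the caller picks the directory and the nice name. The function name make_named_temp and the "my-app-XXXXXX" template are made up for illustration. symlink() fails with EEXIST instead of replacing an existing name, which gives the "no overwrite" behaviour asked about.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a temp file in dir with mkstemp() and a symlink dir/nice_name
   pointing at it. On success returns the open file descriptor and stores
   the real file's path in real_path; on failure returns -1. */
int make_named_temp(const char *dir, const char *nice_name,
                    char *real_path, size_t real_size)
{
    char link_path[4096];
    int n = snprintf(real_path, real_size, "%s/my-app-XXXXXX", dir);
    if (n < 0 || (size_t)n >= real_size)
        return -1;
    n = snprintf(link_path, sizeof link_path, "%s/%s", dir, nice_name);
    if (n < 0 || (size_t)n >= sizeof link_path)
        return -1;

    int fd = mkstemp(real_path);  /* creates and opens the file atomically */
    if (fd == -1)
        return -1;

    /* Both names live in the same directory, so the link target can be
       just the file-name part of the temp file. */
    const char *target = strrchr(real_path, '/') + 1;
    if (symlink(target, link_path) == -1) {  /* EEXIST if the name is taken */
        close(fd);
        unlink(real_path);
        return -1;
    }
    return fd;
}
```

When you're done, unlink both the symlink and the real file. Note that anything opening the nice name follows the link, so this only helps if consumers are fine with a symlink rather than a regular file.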
Another approach is to simply munge the template and add your path. We describe such a function in the BEDOPS toolkit here, used by the sort-bed application to allow the end user to specify where temporary intermediate files are stored: https://github.com/bedops/bedops/blob/6da835468565dfc30a3fcb65807e91fcf133ea2b/applications/bed/sort-bed/src/SortDetails.cpp#L115
FILE *
createTmpFile(char const* path, char** fileName)
{
FILE* fp;
int fd;
char* tmpl;
if (path == NULL)
{
*fileName = NULL; /* tmpfile() has no name to report back to the caller */
return tmpfile();
}
tmpl = static_cast<char*>( malloc(1 + strlen(path) + L_tmpnam) );
strcpy(tmpl, path);
strcpy(tmpl+strlen(path), "/sb.XXXXXX");
fd = mkstemp(tmpl);
if(fd == -1)
{
fprintf(stderr, "unable to create temp file!\n");
free(tmpl); /* don't leak the template on failure */
return NULL;
}
fp = fdopen(fd, "wb+");
*fileName = static_cast<char*>( malloc(strlen(tmpl) + 1) );
strcpy(*fileName, tmpl);
free(tmpl);
return fp;
}
This uses the L_tmpnam macro, part of the stdio library, to set the number of characters that the variable tmpl (the filename, ultimately) can store.
This compiles and works under Linux and OS X (BSD) hosts and also uses POSIX routines.
It is more complex than my other solution but it might work better for your use case.

Writing information from files into an archive file?

I'm a relatively decent Java programmer, but completely new to C. Thus some of these functions, pointers, etc are giving me some trouble...
I'm trying to create an archive file (I'm basically re-writing the ar sys call). I can fstat the files I want, store the necessary information into a struct I've defined. But now comes my trouble. I want to write the struct to the archive file. So I was thinking I could use sprintf() to put my struct into a buffer and then just write the buffer.
sprintf(stat_buffer, "%s", file_struct);
write(fd, stat_buffer, 60);
This doesn't appear to work. I can tell the size of the archive file is increasing by the desired 60 bytes, but if I cat the file, it prints nonsense.
Also, trying to write the actual contents of the file isn't working either...
while (iosize = read(fd2, text_buffer, 512) > 0) {
write(fd, text_buffer, iosize);
if (iosize == -1) {
perror("read");
exit(1);
}
}
I'm sure this is a relatively easy fix, just curious as to what it is!
Thanks!
%s is used to print a string, so sprintf will stop when it meets a \0 character. (It expects a pointer to a NUL-terminated string; passing a struct instead is undefined behaviour.)
Instead, you could write your structure directly to the file: write(fd, &file_struct, sizeof(file_struct)); but you won't be able to read it with cat. Another program (an unarchiver) can still read the file content back into a structure: read(fd, &file_struct, sizeof(file_struct));
This approach is not perfect anyway, because it stores the structure using your computer's endianness and struct layout, so it won't be portable. If you want to do it right, check out the ar file format specification.
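Two sketches that may help here. The first formats a 60-byte text header in the common ar layout as I understand it: name 16 bytes, mtime 12, uid 6, gid 6, octal mode 8, size 10, then the two magic bytes "`\n", all space-padded. The second fixes the copy loop from the question: in `iosize = read(...) > 0` the comparison binds tighter than the assignment, so iosize ends up as 0 or 1 rather than the byte count. The names format_ar_header and copy_fd are made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define AR_HDR_SIZE 60

/* Fill hdr with exactly 60 bytes (no NUL terminator). Names longer than
   16 characters are truncated in this sketch; real ar implementations
   have their own conventions for long names. Returns 0, or -1 if a
   numeric field does not fit its column. */
int format_ar_header(char hdr[AR_HDR_SIZE], const char *name,
                     const struct stat *st)
{
    char tmp[AR_HDR_SIZE + 1];
    int n = snprintf(tmp, sizeof tmp, "%-16.16s%-12lld%-6u%-6u%-8o%-10lld`\n",
                     name,
                     (long long)st->st_mtime,
                     (unsigned)st->st_uid,
                     (unsigned)st->st_gid,
                     (unsigned)st->st_mode,
                     (long long)st->st_size);
    if (n != AR_HDR_SIZE)  /* a field overflowed its width */
        return -1;
    memcpy(hdr, tmp, AR_HDR_SIZE);
    return 0;
}

/* Copy everything from in_fd to out_fd. Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[512];
    ssize_t n;
    /* Parentheses matter: assign first, then compare with 0. */
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {  /* write() may write less than requested */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w == -1)
                return -1;
            off += w;
        }
    }
    return n == -1 ? -1 : 0;
}
```

Because the header is plain text, cat shows something readable, and write(fd, hdr, AR_HDR_SIZE) produces the 60 bytes you were expecting.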

how can I search for a file using C

Over the last few days I've been looking for a way to search for a file based on a pattern (*-stack.txt, for example) and have had a very difficult time finding one. Does anyone know of a way to do this? I have searched on Google and elsewhere but could not really find anything of use :/ This would just serve to search a Linux directory for files that match a certain pattern.
(an example of a directory plus output)
/dev/shm/123-stack.txt abc-stack.txt overflow-stack.txt
Searching for *-stack.txt would return all of the above files.
Your best bet is probably glob(3). It does almost exactly what you want. From what you've said a sketch of the proper code is
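/* needs <glob.h>, <limits.h> and <stdio.h> */
char glob_pattern[PATH_MAX];
glob_t glob_result;
snprintf(glob_pattern, PATH_MAX, "%s/%s", directory, file_pattern);
if (glob(glob_pattern, 0, NULL, &glob_result) == 0) { /* 0 means at least one match */
    for (size_t i = 0; i < glob_result.gl_pathc; ++i) {
        char *path = glob_result.gl_pathv[i];
        /* process path */
    }
    globfree(&glob_result); /* release the list that glob() allocated */
}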
I think you should use the opendir function (it is a library call rather than a system call), together with readdir, like it's described in this question.
But it's going to be a lot more work on top of that - hence higher-level languages providing better interfaces.
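For comparison, a rough sketch of the opendir route, using fnmatch(3) to do the wildcard matching on each entry name. The function name find_matching is made up for this example. With flags set to 0, a leading "*" also matches dot-files.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/* Print every entry in dir whose name matches the shell-style pattern.
   Returns the number of matches, or -1 if the directory can't be opened. */
int find_matching(const char *dir, const char *pattern)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (fnmatch(pattern, ent->d_name, 0) == 0) {  /* 0 means "matches" */
            printf("%s/%s\n", dir, ent->d_name);
            count++;
        }
    }
    closedir(d);
    return count;
}
```

Calling find_matching("/dev/shm", "*-stack.txt") would list the three files from the question. Unlike glob, this does not sort the results or expand wildcards in the directory part.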

D file I/O functions

I'm just learning D. Looks like a great language, but I can't find any info about the file I/O functions. I may be being dim (I'm good at that!), so could somebody point me in the right direction, please?
Thanks
Basically, you use the File structure from std.stdio.
import std.stdio;
void writeTest() {
auto f = File("1.txt", "w"); // create a file for writing,
scope(exit) f.close(); // and close the file when we're done.
// (optional)
f.writeln("foo"); // write 2 lines of text to it.
f.writeln("bar");
}
void readTest() {
auto f = File("1.txt"); // open file for reading,
scope(exit) f.close(); // and close the file when we're done.
// (optional)
foreach (str; f.byLine) // read every line in the file,
writeln(":: ", str); // and print it out.
}
void main() {
writeTest();
readTest();
}
What about the std.stdio module?
For stuff related specifically to files (file attributes, reading/writing a file in one go), look in std.file. For stuff that generalizes to standard streams (stdin, stdout, stderr) look in std.stdio. You can use std.stdio.File for both physical disk files and standard streams. Don't use std.stream, as this is scheduled for deprecation and doesn't work with ranges (D's equivalent to iterators).
Personally I find C-style file I/O favourable. I find it one of the most clear to use I/O's, especially if you work with binary files. Even in C++ I don't use streams, beside added safety it's just plain clumsy (much as I prefer printf over streams, excellent how D has a type-safe printf!).

Resources