I wrote a wrapper for some basic libxml2 tasks, like grabbing element content, stepping into child nodes, etc.
My supervisor has just asked me to make sure I'm parsing the XML file serially and not loading the entire DOM into memory.
I'm pretty sure I'm doing it serially, but I couldn't find any documentation saying one way or the other.
Any help is appreciated, thanks!
libxml2 can operate in either mode; it just depends on how your code uses it. You can either parse the full file into a DOM, or use SAX callbacks to parse serially. What does your parsing code look like?
There are two different APIs you can use.
xmlTextReader is the streaming reader you'd want to use: call xmlTextReaderRead() repeatedly to advance the parser through the file.
http://xmlsoft.org/xmlreader.html
If you're working with xmlDocPtr/xmlNodePtr objects returned by things like xmlParseFile, then that's the tree-based DOM API.
http://xmlsoft.org/examples/index.html#tree1.c
I'd like to use the PlantUML syntax to define component structures, which I want to process in my own tool. However, I'd like to avoid having to write a PlantUML parser. Is there some sort of intermediate representation in PlantUML that I could use for that? It would be perfect to have, e.g., a JSON structure that contains all diagram objects and the relations among them in a concise way.
I could not find anything in the docs; maybe someone with more insight into the project can help?
As Jean-Marc Volle pointed out, the project github.com/jupe/puml2code can process puml files and generate source code in different languages using Handlebars templates. Currently, code generation is limited to the classes in a puml file.
I have used puml2code as a starting point for a new project, github.com/robbito/puml2json, which simplifies the process a bit, as it doesn't require Handlebars: JSON is generated directly from the PlantUML code. puml2json currently also supports only a subset of PlantUML.
I want to play a media file from a memory stream using LibVLC like so:
//Ideally it would go like this:
LibVLC.MediaFromStream = new MemoryStream(File.ReadAllBytes(File_Path));
Of course this is a very oversimplified version of what I want but hopefully it conveys what I am looking for.
The reason is that I want a good amount of portability for what I'm doing, without having to track file locations and such. I'd rather have a massive clump of data in a single file that can be read from than have to track the locations of one or many more files.
I know this has something to do with the LibVLC IMEM Access module. However, looking at what information I've been able to find on that, I feel like I've been tossed from a plane and have just a few minutes to learn how to fly before I hit the ground.
See my answer to a similar question here:
https://stackoverflow.com/a/31316867/2202445
In summary, the API:
libvlc_media_t *libvlc_media_new_callbacks(libvlc_instance_t *instance,
                                           libvlc_media_open_cb open_cb,
                                           libvlc_media_read_cb read_cb,
                                           libvlc_media_seek_cb seek_cb,
                                           libvlc_media_close_cb close_cb,
                                           void *opaque);
allows exactly this. All four callbacks are passed in, but according to the LibVLC documentation only the read callback must be non-NULL; the seek callback, for example, may be NULL if the stream does not support seeking. I give an example of a partial implementation in the answer linked above.
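Here is a sketch of the callbacks over an in-memory buffer, written to match the LibVLC 3.x callback typedefs (libvlc_media_open_cb, libvlc_media_read_cb, libvlc_media_seek_cb, libvlc_media_close_cb). struct mem_source, struct mem_cursor and the mem_* function names are hypothetical. The file deliberately does not include vlc/vlc.h, so it compiles on its own:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical description of an in-memory media blob; passed as the
   `opaque` argument of libvlc_media_new_callbacks(). */
struct mem_source {
    const unsigned char *data;
    size_t size;
};

/* Per-playback cursor created by mem_open; LibVLC hands it back to the
   read/seek/close callbacks as their `opaque` argument. */
struct mem_cursor {
    const struct mem_source *src;
    size_t pos;
};

int mem_open(void *opaque, void **datap, uint64_t *sizep) {
    struct mem_cursor *cur = malloc(sizeof *cur);
    if (cur == NULL)
        return -1;               /* non-zero = failed to open */
    cur->src = opaque;
    cur->pos = 0;
    *datap = cur;
    *sizep = cur->src->size;     /* total length is known for a buffer */
    return 0;
}

ssize_t mem_read(void *opaque, unsigned char *buf, size_t len) {
    struct mem_cursor *cur = opaque;
    size_t left = cur->src->size - cur->pos;
    size_t n = len < left ? len : left;  /* returning 0 signals end of stream */
    memcpy(buf, cur->src->data + cur->pos, n);
    cur->pos += n;
    return (ssize_t)n;
}

int mem_seek(void *opaque, uint64_t offset) {
    struct mem_cursor *cur = opaque;
    if (offset > cur->src->size)
        return -1;
    cur->pos = (size_t)offset;
    return 0;
}

void mem_close(void *opaque) {
    free(opaque);  /* releases the cursor allocated in mem_open */
}
```

With LibVLC 3.x you would then call libvlc_media_new_callbacks(vlc, mem_open, mem_read, mem_seek, mem_close, &src) and play the result with libvlc_media_player_new_from_media(). The mem_source and its bytes must stay alive until playback ends.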
There is no LibVLC API for imem, at least not presently.
You can however still use imem in your LibVLC application, but it's not straightforward...
If you run vlc -H | grep imem, you will see something like this (these are only some of the options; there are others):
--imem-get <string> Get function
--imem-release <string> Release function
--imem-cookie <string> Callback cookie string
--imem-data <string> Callback data
You can pass values for these switches either when you create your libvlc instance via libvlc_new(), or when you prepare media via libvlc_media_add_option().
Getting the values for these switches is a bit trickier, since you need to pass the actual in-memory address (pointer) of the callback functions declared in your own application, written as an integer. You end up passing something like "--imem-get 812911313", for example.
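Here is a small sketch of producing those option strings. It assumes the callback signatures declared in VLC's modules/access/imem.c; they are mirrored here as imem_get_fn and imem_release_fn, which are hypothetical names. It also assumes imem reads the value back as an integer, as described above. format_imem_option is hypothetical. Casting a function pointer to uintptr_t is implementation-defined in ISO C, but it works on mainstream platforms:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Callback shapes as assumed from VLC's modules/access/imem.c. */
typedef int (*imem_get_fn)(void *data, const char *cookie, int64_t *dts,
                           int64_t *pts, unsigned *flags, size_t *size,
                           void **buffer);
typedef void (*imem_release_fn)(void *data, const char *cookie, size_t size,
                                void *buffer);

/* Writes "<prefix><name>=<address>" into out, with the address as a decimal
   integer. Use prefix "--" for libvlc_new() arguments and ":" for
   libvlc_media_add_option(). Returns snprintf()'s result. */
int format_imem_option(char *out, size_t outlen, const char *prefix,
                       const char *name, uintptr_t address) {
    return snprintf(out, outlen, "%s%s=%" PRIuPTR, prefix, name, address);
}
```

For example, with your own get callback my_get (hypothetical), call format_imem_option(opt, sizeof opt, "--", "imem-get", (uintptr_t)my_get). The same pattern presumably applies to imem-release and to imem-data.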
There are downsides to doing it this way, e.g. you may not be able to seek backwards/forwards in the stream.
I've done this successfully in Java, but not C# (never tried).
If you want to play media data stored in a single file, an alternative to consider is to store your media in a zip or rar archive, since VLC has plugins that play media directly from inside such archives.
The bulk of the examples I can find for libxml2 are all about loading/parsing XML files. But I'm only interested in writing them; the code will never have to parse any files. There is an example using different writers, where it shows how to use the file, memory, DOM and tree models.
Looking through the code, I don't see any significant differences between them when it comes to writing. How does one decide which is better to use? (In other words, in what cases is one better than the others?)
The differences between the four functions you mention are minimal; it's all about where the output goes. As Alex mentioned, if memory is a concern, xmlNewTextWriterFilename has the advantage of not needing to hold the result in memory.
The xmlWriter API, to which all the methods you mentioned belong, is one of the APIs offered. The other of note is the tree API. xmlWriter is more like calling write() to print to a file, and the tree is more like building nested structs in memory.
The tree-based versions can be good if your data is constructed in a non-linear fashion: going back and adding or changing things based on later information, etc. That would require some workarounds or caching with the streaming xmlWriter interface, since you can't change things once they've been output. The in-memory tree, however, can be fully tweaked until the instant it's serialized.
The downside of the tree API is that it has to keep the entire document in memory; the rule of thumb is that a parsed tree needs roughly 4x the size of the serialized XML file.
My decision usually depends on whether I expect to create large documents. If not, I use the tree API, so the flexibility is there if I want it. If I know efficiency will be a concern or I'll be working with large documents, the streaming xmlWriter is the way to go.
Tree API examples can be found here: http://xmlsoft.org/examples/index.html#Tree
If you're on a device with limited memory, you probably don't want the DOM or memory-based approaches. Instead, write the file out as you iterate through the data structure you are converting to XML.
This is in the C language.
I want to know how I can write a program to look up all the input fields of a website (any website) and then fill them in. I can write a simple web browser in VBS, but how can I analyse the input fields? Even better would be if I could click a field and have its name put in a box... that would be ideal.
Can anyone help? Thanks :)
Are you sure you want to do this in C?
I ask because it is not easy. First of all, you need to be able to issue an HTTP GET request for the web page you want to examine. For this you probably want libcurl; at any rate, you definitely don't want to write that from scratch.
Next, you need to process the HTML you get and find all the input fields. You do NOT want to do this with regular expressions, if only for the sake of bobince's blood pressure. The bit you need to take away is that HTML is not a regular language, so you need a real parser. Enter libxml2, which includes an HTML parser alongside its XML parser. I'm sure there are other libraries for parsing HTML, too.
Finally, once you have the fields, you need to populate them and submit the correct request, as specified by the FORM's ACTION and METHOD attributes.
This of course assumes you know what the fields should be filled in with, and that nothing else is going on. If the web form is validated with JavaScript (I sincerely hope they validate the request on the server too, but they might only provide feedback via JS), you won't benefit from that, unless you're going to integrate a JS engine, in which case you might as well write a browser.
This is not a trivial task and it is the reason there are accessibility standards for HTML, because otherwise it becomes tricky to interpret the form without human interaction.
Of course, this all assumes said HTML is well-formed, which isn't always the case...
I might suggest another approach. BeautifulSoup is a well-known Python web scraping library that works very well. Python as a language also makes string manipulation easier, which will dramatically cut down your development time. I'd suggest giving the need to use C some serious thought, given the size and complexity of the task you want to undertake versus your need to get a result quickly. If you have a lot of time, by all means go for C.
I want to be able to read the content of PDF files. I need to do that with C on Linux.
The closest I could get to this was here, but I think Haru can only create PDFs and is not able to read them (not 100% sure).
PS: I only need the plain text from the PDF.
Check out libpoppler. I've never used it for extracting text, only for querying PDF attributes. It's pretty easy to use.
How well do you need to parse them?
Just extracting strings should be relatively easy, fully accurate rendering is harder.
Take a look at the source for Evince or Ghostscript?
This is for C++, but it might be a good starting point for understanding PDF structure: http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx
Another possibility, though I've never used it, is VersyPDF. It claims to allow you to edit PDFs... http://versypdf.sybrex-systems-ltd.qarchive.org/