libxml2 writer differences - c

The bulk of the examples I can find for libxml2 are all about loading/parsing XML files, but I'm only interested in writing them; the code will never have to parse any files. There is an example that uses the different writers, showing how to use the file, memory, DOM and tree models.
Looking through the code, I don't see any significant differences between them when it comes to writing. How does one decide which is better to use? (In other words, in what cases is one better than the others?)

The differences between the four functions you specify are minimal; it's all about where the content goes. As Alex mentioned, if memory is a concern, using xmlNewTextWriterFilename has the advantage of not needing to hold the result in memory.
The xmlWriter API, to which all the methods you mentioned belong, is one of the APIs offered. The other of note is the tree API. xmlWriter is more like calling write() to print to a file, and the tree is more like building nested structs in memory.
The tree-based versions can be good if your data is constructed in a non-linear fashion, going back and adding/changing things based on later information, etc. That would require workarounds/caching with the streaming xmlWriter interface, since you can't change things once they've been output. The in-memory tree, however, can be tweaked freely right up until the instant it's serialized.
The tree API has the downside that it has to keep the entire document in memory; the rule of thumb is that the memory requirement for a parsed tree is roughly 4x the size of the serialized XML file.
My decision usually depends on whether I expect to create large documents. If not, I use the tree API, as the flexibility will be there if I want it. If I know efficiency will be a concern or I'll be working with large documents, the streaming xmlWriter is the way to go.
Tree API examples can be found here: http://xmlsoft.org/examples/index.html#Tree

If you're on a device with limited memory, you probably don't want to use DOM or memory-based approaches. In that case, you probably want to write out the file as you iterate through the data structure you want to write to XML.
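For concreteness, here is a minimal sketch of the same trivial document written both ways. The element names are invented for illustration, and error checking is omitted for brevity:

    #include <libxml/xmlwriter.h>
    #include <libxml/tree.h>

    /* Streaming xmlWriter: output is produced as each call is made,
     * so nothing already written can be changed afterwards. */
    static void write_streaming(void)
    {
        xmlTextWriterPtr writer = xmlNewTextWriterFilename("out-stream.xml", 0);
        xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
        xmlTextWriterStartElement(writer, BAD_CAST "playlist");
        xmlTextWriterWriteElement(writer, BAD_CAST "song",
                                  BAD_CAST "Example Title");
        xmlTextWriterEndElement(writer);   /* closes <playlist> */
        xmlTextWriterEndDocument(writer);
        xmlFreeTextWriter(writer);
    }

    /* Tree API: the whole document lives in memory and can be edited
     * freely until it is serialized at the end. */
    static void write_tree(void)
    {
        xmlDocPtr  doc  = xmlNewDoc(BAD_CAST "1.0");
        xmlNodePtr root = xmlNewNode(NULL, BAD_CAST "playlist");
        xmlDocSetRootElement(doc, root);
        xmlNodePtr song = xmlNewChild(root, NULL, BAD_CAST "song",
                                      BAD_CAST "Example Title");
        xmlNewProp(song, BAD_CAST "rating", BAD_CAST "5"); /* still editable */
        xmlSaveFormatFileEnc("out-tree.xml", doc, "UTF-8", 1);
        xmlFreeDoc(doc);
    }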

Improving save/load performance with large arrays of structures in MATLAB

I have a very large array of structs (more than 100k structs) that have to be saved to a file. Later, these have to be loaded and processed one at a time. The current approach is to save it using just save. This takes ~8s to save and ~100s to load.
I've tried a couple of ways to speed this up:
Using the -v6 flag with save. This sped things up, but not significantly.
Serializing and deserializing using getByteStreamFromArray() and getArrayFromByteStream() respectively. This had no effect. Specifically, serializing and deserializing took just as long as simply saving and loading.
(still working on this) Serializing the array, saving it, loading it, then only deserializing each structure as it is processed (rather than the whole array)
Does anyone have any recommendations to improve performance in this situation? It seems like it would be a common problem.
I believe that getByteStreamFromArray() and getArrayFromByteStream() are used by save() and load() under the hood, so your results are not very surprising to me. You might get better performance using hand-crafted serialization functions that crawl down your structs and only save what's really needed. Additional savings can possibly be achieved by compressing the saved data. You can read some implementation ideas here: http://undocumentedmatlab.com/blog/improving-save-performance
Note #1 - YMMV based on Matlab release, data, and platform
Note #2 - for readers who are not aware of this, getByteStreamFromArray() and getArrayFromByteStream() are both undocumented Matlab functions. The only [unofficial] explanation of their behavior, AFAIK, is provided here: http://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data

Persisting Objects while Still Preserving Loose Coupling

I'm working on a project on a microcontroller and I need to persist some settings. Pretend this is an iPod: I need to save various settings like CurrentSongPlaying, CurrentVolume, etc., so that when I turn it on again I can restore those settings. The trouble I'm running into is that it makes sense to store all my non-volatile settings in a single struct that I can serialize/de-serialize from memory, but I can't find a way to do that without the class doing the serialization/de-serialization having to know about every class that contains a setting, just for size/type information. Is there some sort of design pattern that will allow me to persist all these settings to memory without having to know what I'm saving?
Looks like you just need an associative array. An associative array (or map) is a container that allows you to map different values to unique keys. It can have a fixed or dynamic size depending on the implementation. Coupled with a proper serialization mechanism, it allows you to save and restore its state without having to know its content in advance.
However, C does not provide this data structure out-of-the-box. Look at this question for a few implementations. The most common implementation is the hash table, also called a hash map.
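As a rough illustration of that idea on a memory-constrained target, here is a minimal sketch of a fixed-size associative array in C; all names and sizes are invented for the example:

    #include <string.h>
    #include <stdint.h>

    #define MAX_ENTRIES 16
    #define KEY_LEN     16

    /* One slot of a tiny fixed-size associative array: key -> 32-bit value. */
    struct setting {
        char     key[KEY_LEN];
        uint32_t value;
    };

    static struct setting table[MAX_ENTRIES];
    static int n_entries;

    /* Insert or update a setting by key. Returns 0 on success, -1 if full. */
    int setting_put(const char *key, uint32_t value)
    {
        for (int i = 0; i < n_entries; i++) {
            if (strcmp(table[i].key, key) == 0) {
                table[i].value = value;
                return 0;
            }
        }
        if (n_entries == MAX_ENTRIES)
            return -1;
        strncpy(table[n_entries].key, key, KEY_LEN - 1);
        table[n_entries].key[KEY_LEN - 1] = '\0';
        table[n_entries].value = value;
        n_entries++;
        return 0;
    }

    /* Look a setting up; fills *out and returns 0 on success, -1 if absent. */
    int setting_get(const char *key, uint32_t *out)
    {
        for (int i = 0; i < n_entries; i++) {
            if (strcmp(table[i].key, key) == 0) {
                *out = table[i].value;
                return 0;
            }
        }
        return -1;
    }

Because the table is one flat array, persisting it is a single write of n_entries slots, and the serialization code never needs to know which individual settings exist.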
OOP and classes are not easy to implement in C.
If using C is a must, I would write the struct to a file, then read it back and parse it during initialization upon reboot.
You can think of this as serializing your structs yourself.
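A minimal sketch of that approach, with invented field names for the iPod-style example (on a real microcontroller the fopen/fwrite calls would map onto your flash or EEPROM driver):

    #include <stdio.h>
    #include <stdint.h>

    /* All persistent settings collected in one plain struct. A version
     * field lets the loader reject blobs written by an older layout. */
    struct nv_settings {
        uint32_t version;
        uint32_t current_song;
        uint8_t  current_volume;
    };

    #define NV_VERSION 1u

    int nv_save(const struct nv_settings *s, const char *path)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        size_t ok = fwrite(s, sizeof *s, 1, f);
        fclose(f);
        return ok == 1 ? 0 : -1;
    }

    int nv_load(struct nv_settings *s, const char *path)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        size_t ok = fread(s, sizeof *s, 1, f);
        fclose(f);
        if (ok != 1 || s->version != NV_VERSION) return -1;
        return 0;
    }

Dumping the raw struct like this ties the file format to one compiler's padding and endianness, which is usually acceptable when the same firmware both writes and reads the data.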

How to implement lossless URL shortening

First, a bit of context:
I'm trying to implement URL shortening on my own server (in C, if that matters). The aim is to avoid long URLs while still being able to restore a context from a shortened URL.
Currently I have an implementation that creates a session on the server, identified by a certain ID. This works, but consumes memory on the server (which is undesirable, since it's an embedded server with limited resources, and the device's main purpose isn't serving web pages but doing other cool stuff).
Another option would be to use cookies or HTML5 webstorage to store the session information in the client.
But what I'm really looking for is a way to pack the URL parameters into a single parameter that I attach to the URL, and to be able to reconstruct the original parameters from that one parameter.
My first thought was to use Base64 encoding to put all the parameters into one, but that produces an even larger URL.
Currently I'm thinking of compressing the URL parameters (using some compression algorithm like zip, bz2, ...), Base64-encoding the compressed binary blob, and using that as the context. When I get the parameter back, I can Base64-decode it, decompress the result, and have the original URL parameters in hand.
The question is: is there any other possibility I'm overlooking that would let me losslessly compress a long list of URL parameters into a single, smaller one?
Update:
After the comments from home, I realized I had overlooked that compression itself adds overhead to the data (for example, the header that zipping adds to the content), so the compressed data can end up even larger than the original.
So (as home states in his comments), I'm starting to think that compressing the whole list of URL parameters is only really useful once the parameters exceed a certain length; otherwise I could end up with an even larger URL than before.
You can always roll your own compression. If you apply some Huffman coding, the result will usually be smaller (but Base64-encoding it afterwards grows it again, so the net effect may not be optimal).
I'm using a custom compression strategy on an embedded project I work on, where I first apply lzjb (a Lempel-Ziv derivative; follow the link for the source code, a really tight implementation from OpenSolaris), followed by Huffman coding of the compressed result.
The lzjb algorithm doesn't perform too well on very short inputs, though (around 16 bytes or less, in which case I leave the data uncompressed).
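For reference, the compress-then-Base64 pipeline from the question might look roughly like this using zlib (one possible library choice, not something the thread prescribes); the parameter string and buffer sizes are illustrative:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <zlib.h>

    /* URL-safe Base64 (RFC 4648 "base64url", '=' padding dropped). */
    static const char b64url[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    static void b64url_encode(const unsigned char *in, size_t len, char *out)
    {
        size_t o = 0;
        for (size_t i = 0; i < len; i += 3) {
            uint32_t v = in[i] << 16;
            if (i + 1 < len) v |= in[i + 1] << 8;
            if (i + 2 < len) v |= in[i + 2];
            out[o++] = b64url[(v >> 18) & 63];
            out[o++] = b64url[(v >> 12) & 63];
            if (i + 1 < len) out[o++] = b64url[(v >> 6) & 63];
            if (i + 2 < len) out[o++] = b64url[v & 63];
        }
        out[o] = '\0';
    }

    int main(void)
    {
        const char *params = "artist=Foo&album=Bar&track=7&repeat=all";
        uLongf zlen = compressBound(strlen(params));
        unsigned char *zbuf = malloc(zlen);

        /* Deflate the parameter string; only worth it past some length. */
        if (compress2(zbuf, &zlen, (const Bytef *)params,
                      strlen(params), Z_BEST_COMPRESSION) != Z_OK)
            return 1;

        char *token = malloc(zlen * 4 / 3 + 4);
        b64url_encode(zbuf, zlen, token);
        printf("?ctx=%s\n", token);

        free(zbuf);
        free(token);
        return 0;
    }

Compile with -lz. Note that the zlib format itself adds a small header and checksum, which is exactly the overhead the update above worries about, so skipping compression below some input length is sensible.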

Using Flyweight Pattern in database-driven application

Can anyone please give me an example of a situation in a database-driven application where I should use the Flyweight pattern?
How can I tell that I should use the Flyweight pattern at some point in my application?
I have learned the Flyweight pattern, but I haven't been able to find an appropriate place to use it in my database-driven business applications.
Except in a very specialized database application, Flyweight might be used by your application, but probably not for any class that represents an entity persisted in your database. Flyweight is used when there might otherwise be a need for so many instantiations of a class that, if you instantiated one every discrete time you needed it, performance would suffer. So instead, you instantiate a much smaller number of them and reuse them for each required instance, just changing the data values for each use. This would be useful in a situation where, for example, you might have to instantiate thousands of such classes each second, which is generally not the case for entities persisted in a database.
You should apply any pattern when it naturally suggests itself as a solution to a concrete problem - not go looking for places in your application where you can apply a given pattern.
Flyweight's purpose is to address memory issues, so it only makes sense to apply it after you have profiled an application and determined that you have a ton of identical instances.
Colors and Brushes from the Base Class Library come to mind as examples.
Since a very important part of Flyweight is that the shared implementation is immutable, good candidates in a data-driven application would be what Domain-Driven Design refers to as Value Objects - but it only becomes relevant if you have a lot of identical values.
[Not a DB guy so this is my best guess]
The real bonus of the flyweight pattern is that you can reuse data if you need to. Another example is word processing, where ideally you would have an object per character in your document, but that would eat up way too much memory, so the flyweight pattern lets you store only one of each unique value that you need.
A second (and perhaps the simplest) way to look at it is as object pooling, only you're pooling on a "per-field" level as opposed to a "per-object" level.
In fact, now that I think about it, it's not unlike using a (comparatively small) chunk of memory in C(++) to store some raw data that you then pull values out of via pointer manipulation.
[See this Wikipedia article].
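To make the word-processing example concrete, here is a minimal C sketch of the idea: one shared (intrinsic) glyph record per distinct character, with the per-occurrence (extrinsic) position kept outside it. All names and metrics are invented:

    #include <stdio.h>

    /* Intrinsic state: shared by every occurrence of the same character.
     * Only one glyph record exists per distinct character code. */
    struct glyph {
        char ch;
        int  width;   /* pretend font metric */
    };

    static struct glyph glyph_pool[128];   /* one flyweight per ASCII char */

    static const struct glyph *glyph_for(char ch)
    {
        struct glyph *g = &glyph_pool[(unsigned char)ch];
        g->ch = ch;
        g->width = (ch == ' ') ? 4 : 8;    /* toy metric */
        return g;                          /* always the same shared record */
    }

    /* Extrinsic state: what varies per occurrence lives with the caller. */
    int main(void)
    {
        const char *text = "hello hello";
        int x = 0;
        for (const char *p = text; *p; p++) {
            const struct glyph *g = glyph_for(*p);   /* shared, not copied */
            printf("'%c' at x=%d\n", g->ch, x);
            x += g->width;
        }
        /* Both occurrences of "hello" reuse the same five glyph records. */
        return 0;
    }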

Cheapest Way To Export/Import Array Contents To File - AS3/AIR

I'm working on a basic editor application. It uses an array of varying size that I want to store to disk. This will eventually be in an AIR application, but for now it's just an AS3 project in Flex.
I want to store the array in a file. The application edits the data, so it doesn't need to be human readable. I want it to be in whatever format will be quickest to store and load back into the array when I need that data again.
Any recommendations?
Edit: It strikes me that importing/exporting in such a way that the result can be immediately cast as an Array would probably be the cheapest approach, rather than some sort of iteration, if that's possible. Another obvious option is storing the data as a simple comma-delimited string and using String.split() to get an array back. Though again, the question is what would be cheapest, and I'm not quite convinced that's it.
I'll also add that it needs to be in some sort of permanent file, so a shared object, while possibly the fastest, isn't really a long-term solution.
I think the fastest and easiest way is to use a shared object. It stores native objects, so there are no serialization/deserialization steps involved. Just assign the value and read it back.
Performance-wise, it's probably the fastest route as well. If you're working with a large dataset and are sure it will be an AIR app, you can use AIR's database, but that will definitely take much more work.
First, take a look at this answer.
As for saving the contents of an Array, consider JSON using the export tools provided by Adobe.
