How to read an XML with libXML2 using no string functions?


Essentially, I have multiple XSD files for one file format. The format is the main way of configuring the program, so I need versions of the schema with English strings, versions with German strings, etc. And if the file says in its <schema> statement which XSD it's using, it seems like I should be able to go through it without ever making a string comparison.
e.g. I want to avoid doing this:
xmlNodePtr cur = xmlDocGetRootElement(doc);
if(!xmlStrcmp(cur->name, (const xmlChar *) "settings"))
{
// do things
}
Because the string "settings" will change depending on the XSD file used: it could be "paramètres" or "Einstellungen", etc. Typically this is done with a separate strings file, but it seems like the XSD has all the information needed to serve as that strings file.
However, it's unclear to me whether, for example, if the nth attribute of an element is defined by the XSD as optional with a default value, libxml2 will tell me it's the nth attribute, and give me the default value, when I iterate over the node's properties.
Similarly, it seems like there should be a way to find out that an element is the nth element in an <xs:choice>, or that a value is the nth item of an <xs:enumeration>, but I can't for the life of me figure out how to do it.
e.g. in this enumeration:
<xs:restriction base="xs:string">
<xs:enumeration value="glucose"/>
<xs:enumeration value="fructose"/>
<xs:enumeration value="sucrose"/>
</xs:restriction>
"glucose" would be 0, "fructose" 1, "sucrose" 2, etc in the order that they appear.
Is there a decent way to do this?
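As far as I know, libxml2 does not report which xs:enumeration facet (or which xs:choice branch) a value matched, so the ordinal has to be recovered by comparing against the values in declaration order. The comparison still happens, but it is driven by data rather than by hard-coded literals. A minimal sketch, assuming the enumeration values have already been pulled out of each XSD into per-language tables; `sugar_names`, `enum_index` and the German spellings are illustrative, not part of libxml2:

```c
#include <string.h>

/* Illustrative ordinals, in declaration order of the xs:enumeration facets. */
enum sugar { SUGAR_GLUCOSE, SUGAR_FRUCTOSE, SUGAR_SUCROSE, SUGAR_COUNT };

/* Hypothetical per-language tables; each row must list the values in the
   same order as that language's XSD declares them. */
static const char *const sugar_names[][SUGAR_COUNT] = {
    { "glucose", "fructose", "sucrose" },    /* English schema */
    { "Glukose", "Fruktose", "Saccharose" }  /* German schema  */
};

/* Returns the declaration-order index of value in values[0..count-1],
   or -1 if the value is not one of the enumerated values. */
static int enum_index(const char *const *values, int count, const char *value)
{
    for (int i = 0; i < count; i++)
        if (strcmp(values[i], value) == 0)
            return i;
    return -1;
}
```

The rest of the program then switches on `enum sugar` and never sees a localized string.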

Your troubles stem from the design decision that you've made to maintain parallel XSDs with semantically identical components yet lexically different names.
Don't do that.
It's a terrible design decision that'll undermine the benefits of a standard XML vocabulary. Choose a naming convention, including a single natural language, and use it consistently.
Save I18N for the translation of content (see Best Practices for XML Internationalization), not markup.

Related

Nest xmlDoc into existing xmlTextWriter

I think I'm missing something trivial but I'm losing a bunch of time on this, so its solution may be useful to others too:
I'm working with libxml2 2.9.8 (pure C, not C++ bindings) under linux.
I have an external (non-libxml) tree structure representing an XML file and I'm trying to write into a string representation using libxml2. All is trivial and working nice traversing it and writing using xmlTextWriter API (it is a struct with simple attributes, like
typedef struct _simplifiedNode {
char *tag;
char *content;
struct _simplifiedNode *parent;
struct _simplifiedNodeList *children;
} simplifiedNode;
), except at a certain point I encounter a string node that may contain the string representation of an xml document. I can parse it using the xmlReadMemory API, but then I need to nest it (and not its escaped string representation) into the on-going writer, including namespaces and attributes.
Is there a trivial way I am missing to do this recursively having the parsed doc/root element, without introspecting every sub-element?
e.g.
I'm producing the following document using xmlTextWriter API
<Title>
TitleValue
</Title>
<Date>
2018-11-26
</Date>
<Content>
The Content node in the non-libxml tree is a leaf node with tag Content containing a string like
char *content = "<SomeXmlComplexDocument ss:someattr=\"attrval\">Somecontent</SomeXmlComplexDocument>";
What I want to achieve, instead of getting the escaped text
<Content>&lt;SomeXmlComplexDocument&gt; ... </Content>
is to re-inject the document, after parsing and validating the content with xmlReadMemory, so that I obtain
<Content>
<SomeXmlComplexDocument ss:someattr="attrval">Somecontent</SomeXmlComplexDocument>
</Content>
Namespaces and attributes should be preserved.

To serialize the inner XML fragments unescaped, you can simply use xmlTextWriterWriteRaw. This won't check whether the XML is well-formed, though. If you need validation, you'll have to parse the XML fragments at some point. Depending on the content model, you might have to use xmlParseBalancedChunkMemory instead of xmlReadMemory. It should also be possible to parse the result document in one go after it was written, but you'll lose information like original line numbers.

I have MDLAsset created from an SCNScene. How do I extract MDLMeshs, MDLCamera(s), and MDLLights?

I am struggling trying to traverse an MDLAsset instance created by loading an SCNScene file (.scn).
I want to identify and extract the MDLMesh objects as well as the camera(s) and lights. I see no direct way to do that.
For example I see this instance method on MDLAsset:
func childObjects(of objectClass: Swift.AnyClass) -> [MDLObject]
Is this what I use?
I have carefully labeled things in the SceneKit modeler. Can I refer to those labels? That would be ideal. Surely there is a dictionary of IDs/labels that I can get access to. What am I missing here?
UPDATE 0
I had to resort to poring over the scene graph in the Xcode debugger due to the complete lack of Apple documentation. Sigh ...
A few things. I can see the MDLMesh and MDLSubmesh objects, which are what I am after. What is the traversal approach to get to them? Similarly for lights and cameras.
I also need to know the layout of the vertex descriptors so I can keep it in sync with my shaders. Can I force a specific vertex layout on the parsed SCNScene?

MDLObject has a name (because of its conformance to the MDLNamed protocol), and also a path, which is the slash-separated concatenation of the names of its ancestors, but unfortunately, these don't contain the names of their SceneKit counterparts.
If you know you need to iterate through the entire hierarchy of an asset, you may be better off explicitly recursing through it yourself (by first iterating over the top-level objects of the asset, then recursively enumerating their children), since using childObjects(of:) repeatedly will wind up internally iterating over the entire hierarchy to collect all the objects of the specified type.
Beware that even though MDLAsset and MDLObjectContainerComponent conform to NSFastEnumeration, enumerating over them in Swift can be a little painful, and you might want to manually extend them to conform to Sequence to make your work a little easier.
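ModelIO itself is an Objective-C/Swift API, so what follows is only the shape of the single-pass walk described above, written in C against a hypothetical `struct obj` tree (these are not ModelIO types): every node is visited once and classified, instead of walking the whole hierarchy once per class as repeated childObjects(of:) calls would.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object hierarchy (NOT the ModelIO API). */
enum obj_kind { OBJ_MESH, OBJ_CAMERA, OBJ_LIGHT, OBJ_OTHER, OBJ_KIND_COUNT };

struct obj {
    enum obj_kind kind;
    const struct obj *children;
    size_t child_count;
};

/* Visits o and all its descendants exactly once, tallying every kind in a
   single pass; a real version would append each node to a per-kind list. */
static void tally(const struct obj *o, size_t counts[OBJ_KIND_COUNT])
{
    counts[o->kind]++;
    for (size_t i = 0; i < o->child_count; i++)
        tally(&o->children[i], counts);
}
```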

To get all cameras,
[asset childObjectsOfClass:[MDLCamera class]]
Similarly, to get all MDLObjects,
[asset childObjectsOfClass:[MDLObject class]]
Etc.
MDLSubmeshes aren't MDLObjects, so you traverse those on the MDLMesh.
There presently isn't a way to impose a vertex descriptor on MDL objects created from SCN objects, but that would be useful.
One thing you can do is to impose a new vertex descriptor on an existing MDL object by setting a mesh's vertexDescriptor property. See the MDLMesh.h header for some discussion.

What is the comprehension expression in AngularJS?

I have a few questions buzzing in my head about the comprehension expression:
What is the data structure which it defines?
Was it adapted from some other language?
Where is it used in AngularJS? Does this API exist for select elements only?
From the docs:
ngOptions - comprehension_expression - in one of the following forms:
for array data sources:
label for value in array
select as label for value in array
label group by group for value in array
select as label group by group for value in array track by trackexpr
for object data sources:
label for (key , value) in object
select as label for (key , value) in object
label group by group for (key, value) in object
select as label group by group for (key, value) in object

A comprehension expression is just a string formatted in a special way so that the select directive can recognize it.
There's no magic behind it, just several formats, because there are quite a few ways to process and represent your collection (the data structure of your model, whether an item or an item property is selected as the scope's model, other options for labels, grouping, etc.). When you consider all these options, it is not that strange that complex expressions are allowed.
Let's say you have such code:
<select
ng-model="color"
ng-options="c.name group by c.shade for c in colors"></select>
In order to ditch the comprehension expression and use attributes, you would write something like this:
<select
ng-model="color"
ng-data-type="object"
ng-data="colors"
ng-select="c"
ng-label="c.name"
ng-group-by="c.shade"></select>
The attribute approach might get ugly once you expand your API. Besides, with comprehension expression it's much easier to use filters.

While in one way it's true to say that a "comprehension_expression" is "just a string" as package says, on the other hand, source code is just a string. Programming languages are just strings.
A SQL SELECT statement (which could very well have inspired the syntax and features of the "comprehension_expression", though that is not obvious because the docs don't mention it; perhaps digging into some developer conversations would settle it) is just a string.
Sure they're just strings, but they have structure, which relates to the problem they are trying to solve. And the question is, is the structure adequately described? Is its pattern, how it relates to the problem at hand, made clear? Is its relationship to other structures that other people have designed apparent?
While the "comprehension_expression" is just a string, on the other hand, its complexity almost comes to it being a sort of sub-language in its own right.
But the way it is portrayed in the docs (https://docs.angularjs.org/api/ng/directive/ngOptions) does reflect the attitude that it is "just a string with some formatting". It is tucked away in the documentation for ng-options as the type of the ng-options directive. To some extent, it is not an entity in its own right, it is a second-class citizen.
The way the different formats are listed can give one a strange feeling, as if they were ad hoc, with no pattern relating the different possible formats (although there is a pattern if you look closely). Without a formal grammar with a regular structure, it makes you wonder whether they really covered all the possible options. Compare this with, say, the MySQL documentation for the SQL SELECT statement: https://dev.mysql.com/doc/refman/5.7/en/select.html
Obviously such formal syntax can be quite intimidating and is maybe not necessary for the case of the "comprehension_expression", on the other hand it can be reassuring to know it is precisely defined.
I suspect the asker of the question was somewhat unsettled by how casually the "comprehension_expression" was mentioned in the docs; it can seem like a sort of floating, ghost-like entity, just mentioned briefly but not given its own page etc.
It might be worth it having its own page, being treated as an entity in its own right, because then that invites discussion as to the design of this "sub-language". How did it come about? What are the reasons for the different features of the "sub-language"? Which features, and thus syntaxes, conflict with each other? Why can this feature be used together with that feature but not another feature? Are there inspirations from e.g. SQL, in the design of this "sub-language"?
Otherwise it seems to be an invention out of the blue, unrelated to other DSLs of its kind.
In a blog post on ng-options (https://www.undefinednull.com/2014/08/11/a-brief-walk-through-of-the-ng-options-in-angularjs/), Mr. Shidhin links to a short discussion (https://groups.google.com/forum/#!topic/angular/4EDe8xIbjLU) where exactly this issue is discussed. "Matt Hughes" also expresses the opinion that it "Seems like a lot of additional complexity for one directive."
Perhaps this is not that big a deal. I just wanted to put it out there though.

Read & Process in memory XML data in a streaming manner in C

Original question below; first, an update on the solution, in case someone has a similar problem:
For fast regex matching I found http://re2c.org/; for XML parsing, http://expat.sourceforge.net/
Is there an xml library I can use to parse xml from memory (and not from file) in a streaming manner in c?
Currently I have:
libxml2: XMLReader seems to be usable only with a file handle, not in memory
rapidxml is C++ and does not seem to expose a C interface
Requirements:
I need to process the individual XML nodes without having the whole XML (400GB uncompressed, and "only" 29GB as the original .bz2 file) in memory (the bzip'd file gets read in and decompressed piecewise, and I would pass those uncompressed pieces to the XML parser)
It does not need to be very fast, but I would prefer an efficient solution
I (most probably) don't need the path of an extracted node, so it would be fine to just discard them as soon as they have been processed by my callback (if I would need the path contrary to what I think right now, I could then still track it myself)
This is part of me trying to solve my own problem posted here (and no, it's not the same question): How to efficiently parse large bz2 xml file in C
Ideally I'd like to be able to feed the library a certain amount of bytes at a time and have a function called whenever a node is completed.
Thank you very much
Here's some pseudo-C code (much shorter than the actual C code) for better understanding:
// decompressed data gets written to buffer_ptr (BUFFER_SIZE bytes)
int ret = BZ_OK;
while( ret == BZ_OK ) {
strm.next_out = buffer_ptr; // reuse the output buffer every round
strm.avail_out = BUFFER_SIZE;
// decompresses up to avail_out bytes (refilling strm.next_in/avail_in from the file is omitted)
ret = BZ2_bzDecompress( &strm );
bytes_processed = BUFFER_SIZE - strm.avail_out;
bytes_processed_total += bytes_processed;
// here I would like to pass bytes_processed bytes of buffer_ptr to xmlreader
}
About the data I want to parse: http://wiki.openstreetmap.org/wiki/OSM_XML
At the moment I only need certain <node ...> nodes from this, which have subnode <tag k="place" v="country|county|city|town|village"> (the '|' means at least one of those in this context, in the file it's of course only "country" etc without the '|')

xmlReaderForMemory from libxml2 seems a good fit to me (though I haven't used it, so I may be wrong).
The char * buffer needs to point to a valid XML document (which can be a part of your entire XML file). You can get one by reading your file in chunks, making sure each chunk you hand over is a well-formed XML fragment.
What's the structure of your XML file ? A root containing subsequent similar nodes or a fully fledged tree ?
If I had an XML like this:
<root>
<node>...</node>
<node>...</node>
<node>...</node>
</root>
I'd read starting from the opening <node> till the closing </node> and then parse it with the xmlReaderForMemory function, do what I need to do, then go on with the next <node> node.
Of course, if your <node> content is too complex or long, you may have to go a few levels deeper:
<node>
<subnode>....</subnode>
<subnode>....</subnode>
<subnode>....</subnode>
<subnode>....</subnode>
</node>
and read from the file until you have the entire <subnode> node (while keeping track of the fact that you're inside a <node>).
I know it's ugly, but it's a viable approach. Or you can try a SAX parser (I don't know whether a C implementation exists).
SAX parsing fires events on each node start and node end, so you can ignore everything until you find your nodes and process only those.
Another viable way can be using some external tools to filter the whole XML (XQuery or XPath processors) in order to extract just your interesting nodes from the whole file, obtain a smaller doc and then work on it.
EDIT: Zorba was a good XQuery framework with a command-line processor; it may be a good place to look.
EDIT2: given the size of your data, an alternative solution could be to handle the file as plain text: read and decompress it in chunks and then match something like
<yourNode>.*</yourNode>
with regexp.
If you're on Linux/Unix you should have the POSIX regex library. Check this question on S.O. for further insights.

What's the difference between an object and a struct in OOP?

What distinguishes an object from a struct?
When and why do we use an object as opposed to a struct?
How does an array differ from both, and when and why would we use an array as opposed to an object or a struct?
I would like to get an idea of what each is intended for.

Obviously you can blur the distinctions according to your programming style, but generally a struct is a structured piece of data. An object is a sovereign entity that can perform some sort of task. In most systems, objects have some state and as a result have some structured data behind them. However, one of the primary functions of a well-designed class is data hiding — exactly how a class achieves whatever it does is opaque and irrelevant.
Since classes can be used to represent classic data structures such as arrays, hash maps, trees, etc, you often see them as the individual things within a block of structured data.
An array is a block of unstructured data. In many programming languages, every separate thing in an array must be of the same basic type (such as every one being an integer number, every one being a string, or similar) but that isn't true in many other languages.
As guidelines:
use an array as a place to put a large group of things with no other inherent structure or hierarchy, such as "all receipts from January" or "everything I bought in Denmark"
use structured data to compound several discrete bits of data into a single block, such as you might want to combine an x position and a y position to describe a point
use an object where there's a particular actor or thing that thinks or acts for itself
The implicit purpose of an object is therefore directly to associate tasks with the data on which they can operate and to bundle that all together so that no other part of the system can interfere. Obeying proper object-oriented design principles may require discipline at first but will ultimately massively improve your code structure and hence your ability to tackle larger projects and to work with others.
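In C terms, the distinction can be sketched like this: a plain struct only holds data, while an "object" bundles state with behavior (here through a function pointer), so different shapes can answer the same call differently. The names are illustrative only:

```c
#include <stddef.h>

/* A struct as plain structured data: no behavior attached. */
struct point { double x, y; };

/* An "object" emulated in C: state plus the behavior that acts on it. */
struct shape {
    double (*area)(const struct shape *self);  /* behavior */
    double w, h;                               /* state */
};

static double rect_area(const struct shape *s) { return s->w * s->h; }
static double tri_area(const struct shape *s)  { return 0.5 * s->w * s->h; }

/* The caller asks each object for its area without knowing its kind. */
static double total_area(const struct shape *shapes, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += shapes[i].area(&shapes[i]);
    return sum;
}
```

Languages with built-in classes generate this pairing of data and function table for you and add access control on top.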

Generally speaking, objects bring the full object oriented functionality (methods, data, virtual functions, inheritance, etc, etc) whereas structs are just organized memory. Structs may or may not have support for methods / functions, but they generally won't support inheritance and other full OOP features.
Note that I said generally speaking ... individual languages are free to overload terminology however they want to.
Arrays have nothing to do with OO. Indeed, pretty much every language around supports arrays. Arrays are just blocks of memory, generally containing a series of similar items, usually indexable somehow.

What distinguishes an object from a struct?
There is no notion of "struct" in OOP. The definition of structures depends on the language used. For example, in C++ classes and structs are the same, except that class members are private by default while struct members are public, to maintain compatibility with C structs. In C#, on the other hand, struct is used to create value types while class is used for reference types. C has structs and is not object oriented.
When and why do we use an object as opposed to a struct?
Again this depends on the language used. Normally structures are used to represent PODs (Plain Old Data), meaning that they don't specify behavior that acts on the data and are mainly used to represent records and not objects. This is just a convention and is not enforced in C++.
How does an array differ from both,
and when and why would we use an
array as opposed to an object or a
struct?
An array is very different. An array is normally a homogeneous collection of elements indexed by an integer. A struct is a heterogeneous collection where elements are accessed by name. You'd use an array to represent a collection of objects of the same type (an array of colors for example) while you'd use a struct to represent a record containing data for a certain object (a single color which has red, green, and blue elements)
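The color example from this answer, written in C: a struct for one record whose fields are accessed by name, and an array for a homogeneous collection indexed by position. The palette contents are arbitrary:

```c
/* A struct: one record whose fields are accessed by name. */
struct color { unsigned char red, green, blue; };

/* An array: a homogeneous collection of colors indexed by integer position. */
static const struct color palette[] = {
    { 255, 0, 0 },  /* red   */
    { 0, 128, 0 },  /* green */
    { 0, 0, 255 },  /* blue  */
};

/* Number of elements, computed from the array's size. */
static int palette_size(void) { return (int)(sizeof palette / sizeof palette[0]); }
```

`palette[2].blue` combines both: an index selects the record, a name selects the field.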

Short answer (in C# and other .NET languages): structs are value types; classes (objects) are reference types.

By their nature, an object has methods, a struct doesn't.
(nothing stops you from having an object without methods, just as nothing stops you from, say, storing an integer in a float-typed variable)

When and why do we use an object as opposed to a struct?
This is a key question. I am using structs and procedural code modules to provide most of the benefits of OOP. Structs provide most of the data storage capability of objects (other than read-only properties). Procedural modules provide code completion similar to that provided by objects: I can enter module.function in the IDE instead of object.method, and the resulting code looks the same. Most of my functions now return structs rather than single values. The effect on my code has been dramatic, with readability going up and the number of lines greatly reduced. I do not know why procedural programming that makes extensive use of structs is not more common. Why not just use OOP? Some of the languages that I use are only procedural (PureBasic), and the use of structs allows some of the benefits of OOP to be experienced. Other languages allow a choice of procedural or OOP (VBA and Python). I currently find it easier to use procedural programming, and in my discipline (ecology) I find it very hard to define objects. When I can't figure out how to group data and functions together into objects in a philosophically coherent collection, I don't have a basis for creating classes/objects. With structs and functions, there is no need to define a hierarchy of classes. I am free to shuffle functions between modules, which helps me improve the organisation of my code as I go. Perhaps this is a precursor to going OO.
Code written with structs has higher performance than OOP based code. OOP code has encapsulation, inheritance and polymorphism, however I think that struct/function based procedural code often shares these characteristics. A function returns a value only to its caller and only within scope, thereby achieving encapsulation. Likewise a function can be polymorphic. For example, I can write a function that calculates the time difference between two places with two internal algorithms, one that considers the international date line and one that does not. Inheritance usually refers to methods inheriting from a base class. There is inheritance of sorts with functions that call other functions and use structs for data transfer. A simple example is passing up an error message through a stack of nested functions. As the error message is passed up, it can be added to by the calling functions. The result is a stack trace with a very descriptive error message. In this case a message inherited through several levels. I don't know how to describe this bottom up inheritance, (event driven programming?) but it is a feature of using functions that return structs that is absent from procedural programming using simple return values. At this point in time I have not encountered any situations where OOP would be more productive than functions and structs. The surprising thing for me is that very little of the code available on the internet is written this way. It makes me wonder if there is any reason for this?

Arrays are ordered collections of items that are (usually) of the same type. Items are accessed by index. Classic arrays allow only integer indices, but modern languages often provide so-called associative arrays (dictionaries, hashes, etc.) that allow using, e.g., strings as indices.
A structure is a collection of named values (fields) which may be of different types (e.g. field a stores integer values, field b string values, etc.). Structures (a) group logically connected values together and (b) simplify code changes by hiding details (e.g. changing the structure's layout doesn't affect the signatures of functions working with it). The latter is called 'encapsulation'.
Theoretically, an object is an instance of a structure that exhibits some behavior in response to messages being sent (i.e., in most languages, it has methods). Thus the real usefulness of an object lies in this behavior, not in its fields.
Different objects can demonstrate different behavior in response to the same messages (the same methods being called), which is called 'polymorphism'.
In many (but not all) languages objects belong to some classes and classes can form hierarchies (which is called 'inheritance').
Since object methods can work with the object's fields directly, the fields can be hidden from all code except these methods (e.g. by marking them as private). Thus the level of encapsulation for objects can be higher than for structs.
Note that different languages attach different semantics to these terms.
E.g.:
in CLR languages (C#, VB.NET, etc.) structs are value types, typically stored on the stack, in registers, or inline in their containing object, while class instances are allocated on the heap.
in C++ struct members are public by default, while class members (and thus the fields of objects) are private by default.
in some dynamic languages objects are just associative arrays which store values and methods.

I also think it's worth mentioning that the concept of a struct is very similar to an "object" in Javascript, which is defined very differently than objects in other languages. They are both referenced like "foo.bar" and the data is structured similarly.

As I see it an object at the basic level is a number of variables and a number of methods that manipulate those variables, while a struct on the other hand is only a number of variables.
I use an object when I want to include methods, and a struct when I just want a collection of variables to pass around.
An array and a struct are similar in principle: they're both a number of variables. However, it's more readable to write myStruct.myVar than myArray[4]. You could use an enum to specify the array indexes, writing myArray[indexOfMyVar], and basically get the same functionality as a struct.
Of course you can use constants or something else instead of variables, I'm just trying to show the basic principles.

This answer may need the attention of a more experienced programmer but one of the differences between structs and objects is that structs have no capability for reflection whereas objects may. Reflection is the ability of an object to report the properties and methods that it has. This is how 'object explorer' can find and list new methods and properties created in user defined classes. In other words, reflection can be used to work out the interface of an object. With a structure, there is no way that I know of to iterate through the elements of the structure to find out what they are called, what type they are and what their values are.
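C illustrates the point: a struct carries no runtime description of its fields. A common workaround is a hand-written field table built with offsetof, which gives a crude, manual form of reflection; `receipt`, `field_desc` and the one-letter type codes are hypothetical:

```c
#include <stddef.h>

struct receipt { int id; double amount; };

/* Manual "reflection": name, byte offset and a type code for each field. */
struct field_desc { const char *name; size_t offset; char type; };

static const struct field_desc receipt_fields[] = {
    { "id",     offsetof(struct receipt, id),     'i' },
    { "amount", offsetof(struct receipt, amount), 'd' },
};

/* Reads a double-typed field of obj through its descriptor. */
static double get_double(const void *obj, const struct field_desc *f)
{
    return *(const double *)((const char *)obj + f->offset);
}
```

The table has to be kept in sync with the struct by hand, which is exactly the bookkeeping that languages with built-in reflection do for you.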
If one is using structs as a replacement for objects, then one can use functions to provide the equivalent of methods. At least in my code, structs are often used for returning data from user-defined functions in modules which contain the business logic. Structs and functions are as easy to use as objects, but functions lack support for XML comments. This means that I constantly have to look at the comment block at the top of a function to see just what it does. Often I have to read the function's source code to see how edge cases are handled. When functions call other functions, I often have to chase something several levels deep, and it becomes hard to figure things out. This leads to another benefit of OOP over structs and functions: OOP has XML comments which show up as tooltips in the IDE (in most but not all OOP languages), and in OOP there are also defined interfaces and often an object diagram (if you choose to make one). It is becoming clear to me that the defining advantage of OOP is the capability of documenting which code does what and how it relates to other code: the interface.
