In JSON-LD, do node types have any effect on transformations?

I've been playing around with JSON-LD for a bit now and I'm having a hard time understanding the point of node types (@type inside a node object). It seems like their only purpose is to give your node a type. This would allow a generic consumer to do something particular with your node depending on its type. They don't seem to affect anything else, though.
Do node types have any effect on transformation operations like expanding and compacting?

I believe the answer is no, @type has no effect on JSON-LD transformations. Transformations are just about different tree representations of the same data graph. The purpose of declaring a @type for a node is primarily semantic -- e.g. it gives you a way to indicate that the thing which has the properties "name", "email", and "affiliation" is a "Person".
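To make that concrete, here is a minimal sketch using the pyld library (the context and node contents are invented for illustration). Expansion resolves the @type value against the context just as it resolves any other term, and compaction reverses it; the type is carried along, not acted upon:

```python
from pyld import jsonld

doc = {
    "@context": {
        "name": "http://schema.org/name",
        "Person": "http://schema.org/Person"
    },
    "@type": "Person",
    "name": "Alice"
}

expanded = jsonld.expand(doc)
# The term "Person" is expanded to its full IRI, like any other term:
# [{'@type': ['http://schema.org/Person'],
#   'http://schema.org/name': [{'@value': 'Alice'}]}]

compacted = jsonld.compact(expanded, doc["@context"])
# Round-trips back to the short form; nothing special happened
# because of the node's type.
```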
As you say, this also means a consumer can perform a particular task depending on that type. Doing so might still make use of transformations: for instance, I might write some script that chooses to apply a JSON-LD framing transformation to extract "name" and "email" whenever I come across nodes of @type Person. Without @type information, there would be no obvious way to write this so as to avoid picking up names or emails of other types (e.g. of Organizations). Note that this example exploits the fact that because we have a shared notion of what a "@type": "Person" node is, we can anticipate that it might have properties such as name and email.
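A hedged sketch of that framing idea, again with pyld (the graph contents are made up for the example):

```python
from pyld import jsonld

data = {
    "@context": {"@vocab": "http://schema.org/"},
    "@graph": [
        {"@type": "Person", "name": "Alice", "email": "alice@example.com"},
        {"@type": "Organization", "name": "Acme", "email": "info@acme.example"}
    ]
}

# The frame matches on @type, so only Person nodes are selected.
frame = {
    "@context": {"@vocab": "http://schema.org/"},
    "@type": "Person"
}

framed = jsonld.frame(data, frame)
# Only Alice's node should appear in the result; the Organization's
# name and email are not picked up.
```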

Related

SCD-2 in data modelling: how do I detect changes?

I know the concept of SCD-2 and I'm trying to improve my skills with it by doing some practice.
I have the following scenario/experiment:
I call a REST API daily to extract information about companies.
On my initial load into the DB everything is new, so everything is very easy.
The next day I call the same REST API, which may return the same companies, but some of them may (or may not) have changes (e.g., they changed their size, profits, location, ...).
I know SCD-2 would be really simple if the REST API returned only records with changes, but in this case it may return records without changes as well.
In this scenario, how do people detect whether a company's data has changed in order to apply SCD-2? Do they compare all the fields?
Is there any example out there that I can see?
There is no standard SCD-2, nor even a single concept of it. It is a general term for a large number of possible approaches. The only way forward is to practice and see what is suitable for your use case.
In any case you must identify the natural key of the dimension and the set of attributes for which you want to keep history.
You may of course make it more complex by deciding to use your own surrogate key.
You mentioned that there are two main types of interface for the process:
• You get periodically a full set of the dimension data
• You get the “changes only” (aka delta interface)
Paradoxically, the former is much simpler to handle than the latter.
First of all, in the full dimensional snapshot the natural key is unique, unlike in the delta interface (where you may get several changes for one entity).
Additionally, with a delta interface you have to handle late delivery of changes, or even changes delivered in the wrong order.
The next important decision is whether you expect deletes to occur. This is again trivial in the full interface; for the delta interface you must define some convention for how this information is passed.
A connected question is whether a previously deleted entity can be reused (i.e. reappear in the data).
If you support delete/reuse you'll have to think about how to represent them in your dimension table.
In any case you will need some additional columns in the dimension to cover the historical information.
Some implementations use a change_timestamp; others use a validity interval with valid_from and valid_to.
Still other implementations claim that an additional sequence number is required, so you avoid the trap of multiple changes with identical timestamps.
So you see that before you look for a particular implementation you need to carefully decide on the options above. For example, the full and the delta interface lead to completely different implementations.
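As an illustration of the full-snapshot case, here is a rough Python sketch of one common approach (the column names company_id, size, profits, and location come from the question and are otherwise assumptions): hash the tracked attributes and open a new version only when the hash differs.

```python
import hashlib
from datetime import date

# Attributes for which we keep history; anything else is ignored.
TRACKED = ("size", "profits", "location")

def row_hash(rec):
    """Hash only the tracked attributes, so untracked fields never trigger a new version."""
    payload = "|".join(str(rec.get(col)) for col in TRACKED)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def apply_scd2(current, snapshot, today=None):
    """current:  {natural_key: latest row, incl. 'hash', 'valid_from', 'valid_to'}
       snapshot: today's full extract, each record carrying the natural key 'company_id'."""
    today = today or date.today()
    new_versions = []
    for rec in snapshot:
        key = rec["company_id"]
        h = row_hash(rec)
        old = current.get(key)
        if old is None or old["hash"] != h:
            if old is not None:
                old["valid_to"] = today          # close the superseded version
            new_versions.append({**rec, "hash": h,
                                 "valid_from": today, "valid_to": None})
    return new_versions
```

Persisting the hash with each row means the comparison never has to touch the individual columns again; the alternative is a column-by-column comparison, which gets tedious as the attribute list grows.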

Referencing previously defined items in JSON-LD

I'm trying to wrap my head around defining JSON-LD correctly for my website. The bit I'm not sure about is how to reference previously defined JSON-LD items without having to copy and paste them.
I know that each item can be given an @id property, but how should I correctly utilize it (if I even can)?
For example, suppose I create an Organization item with an @id of https://example.com/#Organization.
When I need to reference that item again, is it correct to simply specify that @id again and nothing more?
Also, am I correct in assuming that I can do this even if the item isn't defined on the page from which I'm referencing it?
In the case of the Organization type, my understanding is that you should declare it only on the home page rather than on every page. So if the user is currently on a product page and I want to reference the organization, it isn't defined on the page I'm on, but it has been declared elsewhere.
You're correct that using the same @id in different places allows you to make statements about the same thing. In fact, the JSON-LD Flattening algorithm, which is used as part of Framing, consolidates these all together into a single node object.
JSON-LD is a format for Linked Data, and it is reasonable to say that statements made about the same resource on different locations (pages) can be merged together, and if you form a Knowledge Graph from information across multiple locations, this is effectively what you're doing. A Knowledge Graph will typically reduce the JSON-LD (or other equivalent syntactic representation) to RDF Triples/Quads, where each "page" effectively defines a graph, which can be combined to create a larger Dataset. You can then query the dataset in different ways to retrieve that information, which can result in the separate statements being consolidated.
Most applications, however, will likely look for a complete definition of a resource in a single location. But for something like Organization, you could imagine that different Employee resources might be made, where there is a relation such as :Employee :worksFor :Organization, so that the page for an Organization would not be expected to also list every employee in that organization, but a more comprehensive Knowledge Graph made from the merge of all of those separate resources could be used to reconstruct it.
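To sketch that merge concretely with pyld (the vocabulary and page contents are invented; only the @id reuse matters): the product page refers to the organization by @id alone, and flattening consolidates the two sets of statements into one node object.

```python
from pyld import jsonld

home_page = {
    "@context": {"@vocab": "http://schema.org/"},
    "@id": "https://example.com/#Organization",
    "@type": "Organization",
    "name": "Example Corp"
}

product_page = {
    "@context": {"@vocab": "http://schema.org/"},
    "@type": "Product",
    "name": "Widget",
    # A bare node reference: just the @id, nothing more.
    "manufacturer": {"@id": "https://example.com/#Organization"}
}

merged = jsonld.flatten([home_page, product_page],
                        {"@vocab": "http://schema.org/"})
# In merged["@graph"], all statements about
# https://example.com/#Organization end up in a single node object,
# and the Product's manufacturer points at it by reference.
```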

Relational database design for hierarchical data?

I am trying to design a database to act as a language dictionary where each word is associated not only with its definition but also with its grammatical "taxon". E.g., it should look something like this:
"eat": verb.imperative
"eat": verb.present
"ate": verb.past
"he": pronoun.masculine.singular
"she": pronoun.feminine.singular
"heiress": noun.feminine.singular
"heirs": noun.masculine.plural
"therefore": adverb
"but": conjunction
It seems that a natural data structure to hold such a grammatical "taxonomy" should be some kind of tree or graph. Although I haven't thought it through, I presume that should make it easier to perform queries of the type
plural OF masculine OF "heiress" -> "heirs"
At this point, however, I am just trying to come up with the least ineffective way to store such a dictionary in a regular relational database (namely LibreOffice Base). What do you suggest the data schema should be like? Is there something more efficient than the brute-force method where I'd have as many boolean columns as there are grammatical types and sub-types? E.g., "she" would be true for the columns pronoun, feminine, and singular, but false for all other columns (verb, adverb, conjunction, etc.)?
This is a really wide-open question, and there are many applications and much related research. Let me give some pointers based on software I have used.
One column would be the lexeme, for example "eat." A second column would give the part of speech, which in your data above would be a string or other identifier that shows whether it is a verb, pronoun, noun, adverb or conjunction.
It might make sense to create another table for verb information. For example, tense, aspect and mood might each be separate columns. But these columns would only make sense for verbs. For the nouns table, the columns would include number (singular, plural) and gender, and perhaps whether it is a count or mass noun. Pronouns would also include person (first, second or third person).
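A minimal sketch of that layout using Python's built-in sqlite3 (the table and column names are illustrative, and LibreOffice Base would use its own SQL dialect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lexeme (
    id   INTEGER PRIMARY KEY,
    form TEXT NOT NULL,      -- e.g. 'eat', 'heir'
    pos  TEXT NOT NULL       -- 'verb', 'noun', 'pronoun', 'adverb', ...
);
CREATE TABLE verb_features (
    lexeme_id INTEGER REFERENCES lexeme(id),
    tense  TEXT,             -- 'past', 'present', ...
    aspect TEXT,
    mood   TEXT              -- 'imperative', 'indicative', ...
);
CREATE TABLE noun_features (
    lexeme_id INTEGER REFERENCES lexeme(id),
    number TEXT,             -- 'singular' or 'plural'
    gender TEXT,             -- 'masculine', 'feminine', ...
    countability TEXT        -- 'count' or 'mass'
);
""")
conn.execute("INSERT INTO lexeme (form, pos) VALUES ('eat', 'verb'), ('heir', 'noun')")
```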
Do you plan to include every form of every word? For example, will this database store "eats" and "eating" as well as "jumps" and "jumping?" It is much more efficient to store rules like "-s" for present singular and "-ing" for progressive. Then if there are exceptions, for example "ate," it can be described as having the underlying form of "eat" + "-ed." This rule would go under the "eat" lexeme, and there would be no separate "ate" entry.
There are also rules such as the plural changing words that end in -y to -ies. This would go under the plural noun suffix ("-s"), not under individual nouns.
With these things in mind, I offer a more specific answer to your question: No, I do not think this data is best described hierarchically, nor with a tree or graph, but rather analytically and relationally. LibreOffice Base would be a reasonable choice for a fairly simple project of this type, using macros to help with the processing.
So for:
"heiress" -> masculine plural = "heirs"
The first thing to do would be to analyze "heiress" as "heir" + feminine. Then compose the desired wordform by combining "heir" and "-s."
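A toy Python sketch of that analyze-then-compose step (the lexicon entries and the suffix rule are invented placeholders):

```python
# Each stored wordform is analyzed into a stem plus features.
LEXICON = {
    "heiress": {"stem": "heir", "gender": "feminine", "number": "singular"},
}
SUFFIXES = {"plural": "s"}  # the regular rule; exceptions live on the lexeme

def derive(wordform, gender, number):
    entry = LEXICON[wordform]
    stem = entry["stem"]               # 'heiress' -> 'heir'; the bare stem
                                       # is already the masculine form
    suffix = SUFFIXES.get(number, "")  # plural -> '-s'
    return stem + suffix

print(derive("heiress", "masculine", "plural"))  # heirs
```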
I was going to add a list of related software such as the Python NLTK, but for one thing, the list of available software is nearly endless, and for another, software recommendations are off-topic for Stack Overflow.

neo4j - how do I model nodes without a schema?

I read somewhere that neo4j and other NoSQL databases are schemaless. So what does schemaless mean? I would like to know more about it, with a use case.
You don't need to define a schema as you would have to do, e.g., in MySQL with a table. Instead, you can add properties and their values to each individual node (entry) as you like.
E.g.: if you look at the address book on an Android phone, a person entry can have a multitude of properties: phone numbers, addresses, names. Some people have a lot of attributes, some have none.
Doing something like that with a schema (e.g. table structure) is really hard, and requires advance planning of what your fields are, and how you want to query them in the future.
Without a schema you can more or less play it by ear, and add things as needed.
What does need deciding, though, is what to add as a property of a node and what to model as a related node. E.g., is an address a node, or just a property of a person? (Most likely a separate node, but it depends on your use case.)
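For example, a minimal sketch with the official Python driver (the connection details are placeholders): no structure is declared up front, the two Person nodes carry different property sets, and the address is modeled as a related node.

```python
from neo4j import GraphDatabase

# Placeholder connection details for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # No table definition, no ALTER: each node simply has whatever it has.
    session.run("CREATE (:Person {name: 'Alice', phone: '555-0100', "
                "email: 'alice@example.com'})")
    session.run("CREATE (:Person {name: 'Bob'})")  # same label, far fewer properties
    # Address as a separate, related node rather than a property:
    session.run("MATCH (p:Person {name: 'Alice'}) "
                "CREATE (p)-[:LIVES_AT]->(:Address {city: 'Berlin'})")

driver.close()
```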

What pattern applies to encapsulating "contextual" queries?

At the moment, my project at work has a very inefficient loop which is suffering the n + 1 problem to a great degree. (6n + 1, I think.) Currently, a number of web services instantiate an object whose constructor builds a canonical representation of one of our ORM objects -- call them Foo and FooView(). There are a number of places where a collection of Foo is built; each instance of Foo is passed to FooView and has its (pseudo-)foreign key fields queried in another database to build a textual representation, so that, for example, we can return <fooColor>Blue</fooColor> rather than <fooColor>5</fooColor>. The sets of these properties--Colors, Shapes, and other similarly general properties--are relatively small, and obviously should be pulled into memory.
There is also another, more complex query, which is contributing to the 6n + 1 problem. This is a set of metadata fields. Each Foo has a Source. Each Source can have one, none, or many metadata fields defined for its subset of Foos. Empty XML tags are required for metadata fields which apply to a given Foo's Source. Currently, the four(!) ORM queries(!) used to build this XML are located inside the FooView constructor, meaning they get executed for each and every Foo.
My goal is as follows:
Query for general properties, like Color, Shapes, etc. before anything else.
Run the query to generate the collection of Foo. Store the primary keys in a list.
Using the list of primary keys, run the heinous multi-join, raw SQL query to generate Foo.Metadata.
Call FooView, providing the collection of Foo along with a context object containing the items built in steps 1 and 3. FooView will provide the interleaving logic, using the context object rather than database lookups.
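A rough sketch of the shape I have in mind (FooContext and the plain-dict rows are stand-ins for our real ORM types):

```python
class FooContext:
    """Everything FooView needs, fetched up front in a fixed number of queries."""
    def __init__(self, colors, shapes, metadata_by_foo_id):
        self.colors = colors                          # step 1: e.g. {5: "Blue"}
        self.shapes = shapes
        self.metadata_by_foo_id = metadata_by_foo_id  # step 3: keyed by Foo primary key

class FooView:
    def __init__(self, foo, ctx):
        # Interleaving logic reads from ctx instead of issuing queries.
        self.color = ctx.colors[foo["color_id"]]
        self.metadata = ctx.metadata_by_foo_id.get(foo["id"], {})

def build_views(foos, ctx):
    return [FooView(foo, ctx) for foo in foos]        # step 4: zero queries per Foo

# Stand-in usage:
ctx = FooContext(colors={5: "Blue"}, shapes={},
                 metadata_by_foo_id={1: {"source": "A"}})
views = build_views([{"id": 1, "color_id": 5}], ctx)
```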
Is this a sound practice? It will certainly solve some of the performance problems in generating the FooView, but where should this thing live? Should I call it FooHelper? FooContext? FooService? Is this a design pattern, or is there one I should be using to make this more logical?
Thanks!
