How do I compact and/or frame a json-ld document so that IRI values are expressed succinctly as well as keys? - json-ld

Given an original JSON-LD document like this example, which defines the sources for some thing1:
[
  {
    "@id": "https://example.com/thing1",
    "https://example.com/sources": [
      {
        "@id": "https://example.com/vocab/countries/EN"
      },
      {
        "@id": "https://example.com/vocab/countries/FR"
      }
    ]
  }
]
(I'm simplifying quite a lot - in my real use-case this is larger and generated from RDF data. ex:vocab/countries is a SKOS ConceptScheme including EN and FR as Concepts)
I want to collapse it into something approximating what I'd use to express that in more normal JSON:
{
  "@id": "https://example.com/thing1",
  "sources": ["EN", "FR"]
}
I find I can use a context to collapse into name/values and shorten the names:
{
  "@context": {
    "@version": 1.1,
    "ex": "https://example.com/",
    "sources": {
      "@id": "ex:sources",
      "@type": "@id"
    }
  },
  "@id": "ex:thing1",
  "sources": [
    "ex:vocab/countries/EN",
    "ex:vocab/countries/FR"
  ]
}
An important element is "@type": "@id", which collapses the source definitions from (a list of) objects into key/value pairs, while the enclosing term definition maps https://example.com/sources to sources.
But I cannot find a way to do the same for the values, so that they become EN and FR instead of ex:vocab/countries/EN and ex:vocab/countries/FR. My experiments with adding @base and @vocab properties to the context don't work the way I expected.
I also need to do this in a scoped way, so that other properties besides sources can be defined which reference different vocabularies. For instance I might want to include languages, which could include terms from a vocabulary representing English, French, Gaelic, Breton, etc. In other words, I can't just set a global vocabulary or base for the entire document.
Can anyone tell me if this kind of transform is possible, and if so, how to achieve it?
JSON-LD Playground link here

You could set the type of the sources values to @vocab instead of @id, and use a scoped context to set the @vocab to use. For example:
{
  "@version": 1.1,
  "ex": "https://example.com/",
  "sources": {
    "@id": "ex:sources",
    "@type": "@vocab",
    "@context": {
      "@vocab": "https://example.com/vocab/countries/"
    }
  }
}
(playground link).
This says to treat the values of sources as vocabulary-relative IRIs, and sets the base of that vocabulary for those values. You should get the following:
{
  "@context": {
    "@version": 1.1,
    "ex": "https://example.com/",
    "sources": {
      "@id": "ex:sources",
      "@type": "@vocab",
      "@context": {
        "@vocab": "https://example.com/vocab/countries/"
      }
    }
  },
  "@id": "ex:thing1",
  "sources": [
    "EN",
    "FR"
  ]
}
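To make the effect of a property-scoped @vocab more concrete, here is a minimal Python sketch of the shortening step. It is not the JSON-LD compaction algorithm (a real processor also handles term definitions, prefixes and many edge cases). The function name compact_values, the scoped_vocab mapping and the languages vocabulary IRI are all made up for illustration.

```python
# Hypothetical illustration only: mimics what a property-scoped @vocab does
# to IRI values during compaction. This is not a JSON-LD processor.

def compact_values(values, vocab):
    """Strip the vocabulary prefix from each IRI that starts with it."""
    compacted = []
    for iri in values:
        if iri.startswith(vocab) and len(iri) > len(vocab):
            compacted.append(iri[len(vocab):])
        else:
            # IRIs outside the vocabulary are left untouched.
            compacted.append(iri)
    return compacted


# One vocabulary per property, mirroring one scoped context per term.
scoped_vocab = {
    "sources": "https://example.com/vocab/countries/",
    "languages": "https://example.com/vocab/languages/",  # assumed IRI
}

expanded = {
    "sources": [
        "https://example.com/vocab/countries/EN",
        "https://example.com/vocab/countries/FR",
    ],
    "languages": ["https://example.com/vocab/languages/br"],
}

compact = {
    prop: compact_values(iris, scoped_vocab[prop])
    for prop, iris in expanded.items()
}
print(compact)
```

Because each property carries its own vocabulary, a second term such as languages can be shortened against a different base without affecting sources.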

Related

Aliasing "Type" entry of normal JSON to "@type", when value of "Type" is a JSON object

I want to store a normal JSON to a triple store. The normal JSON has its own format for Ids and Types:
{
  "Id": "123456",
  "Type": {
    "Id": "7890",
    "Name": "Person",
    ...
  }
  ...
}
I am able to flatten the document and give the value of "Id" the type "@id", using a custom context. I am stuck trying to alias "Type" to "@type".
Is there a way to use the "Type" entry of the normal JSON as an "@type" keyword using only a custom context?
I am not sure what you mean by aliasing "Type" to "@type", since your example would not be valid JSON-LD if "Id" was replaced with "@id" and "Type" was replaced with "@type".
But could perhaps a context like this be what you are looking for?
"@context": {
  "Id": "@id",
  "Type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
}
Example in JSON-LD Playground.
Or, if the different ids come from different namespaces, perhaps with scoped contexts:
"@context": {
  "@version": 1.1,
  "@base": "http://example.org/objects/",
  "Id": "@id",
  "Type": {
    "@id": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "@context": {
      "@base": "http://example.org/types/"
    }
  }
}
Also in JSON-LD Playground.
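As a rough illustration of what the two bases do, the following Python sketch resolves the "Id" values the way relative IRIs are resolved against a base. It only uses urljoin from the standard library and is not a JSON-LD processor; the helper name resolve_ids is hypothetical, and the sample values come from the question.

```python
from urllib.parse import urljoin

# Bases taken from the scoped-context example: one for objects, one for types.
OBJECT_BASE = "http://example.org/objects/"
TYPE_BASE = "http://example.org/types/"


def resolve_ids(node, base, type_base):
    """Return a copy of node with "Id" values resolved against the active base.

    Hypothetical helper: the nested "Type" object switches to type_base,
    imitating the base set by the scoped context on the "Type" term.
    """
    resolved = dict(node)
    if "Id" in resolved:
        resolved["Id"] = urljoin(base, resolved["Id"])
    if isinstance(resolved.get("Type"), dict):
        resolved["Type"] = resolve_ids(resolved["Type"], type_base, type_base)
    return resolved


doc = {"Id": "123456", "Type": {"Id": "7890", "Name": "Person"}}
result = resolve_ids(doc, OBJECT_BASE, TYPE_BASE)
print(result["Id"])
print(result["Type"]["Id"])
```

The outer identifier ends up under the objects namespace and the nested one under the types namespace, which is the separation the scoped context is meant to achieve.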

What JSON-LD structured data to use for a multi-paragraph, multi-image blogpost?

I have created the following JSON-LD for a blogpost in my blog:
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.example.com"
  },
  "headline": "My Headline",
  "articleBody": "blablabla",
  "articleSection": "bla",
  "description": "Article description",
  "inLanguage": "en",
  "image": "https://www.example.com/myimage.jpg",
  "dateCreated": "2019-01-01T08:00:00+08:00",
  "datePublished": "2019-01-01T08:00:00+08:00",
  "dateModified": "2019-01-01T08:00:00+08:00",
  "author": {
    "@type": "Organization",
    "name": "My Organization",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.example.com/logo.jpg"
    }
  },
  "publisher": {
    "@type": "Organization",
    "name": "My Organization",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.example.com/mylogo.jpg"
    }
  }
}
Now, I have some blog posts that contain multiple paragraphs, and each paragraph is accompanied by an image. Any ideas how I can depict such a structure with JSON-LD?
Background
I have created a simple blog which uses a JSON file for two purposes: (a) to feed the blog with posts instead of using a DB (by using XMLHttpRequest and JSON.parse) and (b) to add JSON-LD structured data to the code for SEO purposes.
When I read the JSON file I have to know which image belongs to which paragraph of the text in order to display it correctly.
Note: As you seem to need this only for internal purposes, and as there is typically no need to publicly provide data about this kind of structure, I think it would be best not to provide public Schema.org data about it. So you could, for example, use it to build the page, and then remove it again (or whatever works for your case). Then it would also be possible to use a custom vocabulary (under your own domain) for this, if it better fits your needs.
You could use the hasPart property to add a WebPageElement for each paragraph+image block.
Each WebPageElement can have text and image (and, again, hasPart, if you need to nest them).
Note that JSON-LD arrays are unordered by default. You can use @list to make it ordered.
"hasPart": {
  "@list": [
    {
      "@type": "WebPageElement",
      "text": "plain text",
      "image": "image-1.png"
    },
    {
      "@type": "WebPageElement",
      "text": "plain text",
      "image": "image-2.png"
    }
  ]
}
For the blog posting’s header/footer, you could use the more specific WPHeader/WPFooter instead of WebPageElement.
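Since the same JSON file also feeds the blog, the ordered hasPart structure can be generated and read back programmatically. The sketch below is one possible way to do that in Python; the helper name build_has_part and the sample paragraphs are invented for illustration.

```python
import json


def build_has_part(blocks):
    """Wrap (text, image) pairs in an ordered @list of WebPageElement nodes."""
    return {
        "@list": [
            {"@type": "WebPageElement", "text": text, "image": image}
            for text, image in blocks
        ]
    }


# Hypothetical paragraph/image pairs, in display order.
blocks = [
    ("First paragraph.", "image-1.png"),
    ("Second paragraph.", "image-2.png"),
]

post = {
    "@context": "http://schema.org",
    "@type": "BlogPosting",
    "headline": "My Headline",
    "hasPart": build_has_part(blocks),
}
print(json.dumps(post, indent=2))

# Reading it back: each element keeps its paragraph and image together.
pairs = [(part["text"], part["image"]) for part in post["hasPart"]["@list"]]
```

Keeping text and image inside one element is what lets the rendering code know which image belongs to which paragraph.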

JSON schema deeper object uniqueness

I'm trying to get into JSON Schema definitions and wanted to find out how to achieve deeper object uniqueness in the schema definition. Please look at the following example definition, in this case a simple IO of a module.
{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "required": ["modulIOs"],
  "properties": {
    "modulIOs": {
      "type": "array",
      "uniqueItems": true,
      "items": {
        "allOf": [
          {
            "type": "object",
            "required": ["ioPosition", "ioType", "ioFunction"],
            "additionalProperties": false,
            "properties": {
              "ioPosition": {
                "type": "integer"
              },
              "ioType": {
                "type": "string",
                "enum": ["in", "out"]
              },
              "ioFunction": {
                "type": "string"
              }
            }
          }
        ]
      }
    }
  }
}
When I validate the following against, e.g., draft-06, I get a positive validation.
{
  "modulIOs": [
    {
      "ioPosition": 1,
      "ioType": "in",
      "ioFunction": "240 V AC in"
    },
    {
      "ioPosition": 1,
      "ioType": "in",
      "ioFunction": "24 V DC in"
    }
  ]
}
I'm aware that the validation succeeds because the validator does what it is intended to do: it checks the structure of a JSON object. But is there a way to validate object values in deeper objects, or do I need to perform the check elsewhere?
This is not currently possible with JSON Schema (as of draft-07).
There is an issue raised on the official spec repo on GitHub for this: https://github.com/json-schema-org/json-schema-spec/issues/538
If you (or anyone reading this) really want this, please give the first issue comment a thumbs-up.
It's currently unlikely to make it into the next draft, and even if it did, implementations may be slow to pick it up.
You'll need to do this validation after your JSON Schema validation process.
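A post-validation check of this kind can be quite small. The following Python sketch assumes that two entries count as duplicates when they share both ioPosition and ioType; that choice of key fields is an assumption, as is the function name find_duplicate_ios.

```python
from collections import Counter


def find_duplicate_ios(document, key_fields=("ioPosition", "ioType")):
    """Return the key tuples that occur more than once in modulIOs.

    key_fields is an assumption about what "unique" means here; adjust it
    to whatever combination of properties must not repeat.
    """
    keys = [
        tuple(io[field] for field in key_fields)
        for io in document["modulIOs"]
    ]
    counts = Counter(keys)
    return [key for key, count in counts.items() if count > 1]


# The instance from the question, which passes schema validation.
document = {
    "modulIOs": [
        {"ioPosition": 1, "ioType": "in", "ioFunction": "240 V AC in"},
        {"ioPosition": 1, "ioType": "in", "ioFunction": "24 V DC in"},
    ]
}

duplicates = find_duplicate_ios(document)
print(duplicates)  # [(1, 'in')]
```

Run this after schema validation has succeeded, so the required fields are known to be present.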
You can validate the values of your object's fields with JSON Schema validation keywords.
For example, if you need to check that ioPosition is between 0 and 100, you can use:
"ioPosition": {
  "type": "integer",
  "minimum": 0,
  "maximum": 100
}
If you need to validate the ioFunction field, you can use a regular expression such as:
"ioFunction": {
  "type": "string",
  "pattern": "^[0-9]+ V [AD]C"
}
Take a look at json-schema-validation.
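To try out these two constraints without a schema validator, here is a small Python sketch that applies the same range and pattern checks by hand. The function name check_io is hypothetical. The character class is written [AD]; a comma inside the brackets would also match a literal comma.

```python
import re

# Same idea as the schema fragment: digits, " V ", then AC or DC.
IO_FUNCTION_PATTERN = re.compile(r"^[0-9]+ V [AD]C")


def check_io(io):
    """Return a list of human-readable problems for one I/O entry."""
    problems = []

    position = io.get("ioPosition")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(position, int) or isinstance(position, bool):
        problems.append("ioPosition must be an integer")
    elif not 0 <= position <= 100:
        problems.append("ioPosition must be between 0 and 100")

    function = io.get("ioFunction")
    # JSON Schema patterns are not implicitly anchored, hence search().
    if not isinstance(function, str) or not IO_FUNCTION_PATTERN.search(function):
        problems.append("ioFunction must start with '<number> V AC' or '<number> V DC'")

    return problems


print(check_io({"ioPosition": 1, "ioFunction": "240 V AC in"}))
print(check_io({"ioPosition": 101, "ioFunction": "24 V ,C in"}))
```

These checks constrain individual values only; they do not address uniqueness across array items.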

Annotating nested structures/values in JSON-LD

Say I have a JSON object with some properties in a nested object.
{
"title": "My Blog Post",
"meta": {
"publishedAt": "2016-08-01T00:00:00Z"
}
}
Is there an easy way I can just add a @context to my top-level object to reach these properties (i.e. just "pass through" the meta object)? Something along these lines:
{
  "@context": {
    "title": "schema:name",
    "meta.publishedAt": {
      "@type": "xsd:date",
      "@id": "schema:datePublished"
    }
  },
  "@id": "/my-article",
  "title": "My Blog Post",
  "meta": {
    "publishedAt": "2016-08-01T00:00:00Z"
  }
}
I would like to avoid having to add (duplicate) @id to the nested object, which is how I would otherwise have solved it:
{
  "@context": {
    "title": "schema:name",
    "meta": { "@id": "_:meta", "@container": "@set" },
    "publishedAt": {
      "@type": "xsd:date",
      "@id": "schema:datePublished"
    }
  },
  "@id": "/my-article",
  "title": "My Blog Post",
  "meta": {
    "@id": "/my-article",
    "publishedAt": "2016-08-01T00:00:00Z"
  }
}
This solution works, but requires duplication, and comes from ethanresnick's comments on GitHub about annotating JSON API. He noted in another issue that @context is not "quite expressive enough to annotate the JSON API structure". I was hoping to prove him wrong, at least with regard to this issue.
I just discovered that the latest JSON-LD spec includes a new section on nested properties. Defining your context like this should result in the desired output:
{
  "@context": {
    "title": "schema:name",
    "meta": "@nest",
    "publishedAt": {
      "@type": "xsd:date",
      "@id": "schema:datePublished",
      "@nest": "meta"
    }
  },
  ...
}
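Conceptually, a term aliased to the nesting keyword asks the processor to treat the properties inside that object as if they were declared directly on the enclosing node. The Python sketch below imitates only that idea; it is not how a JSON-LD processor is implemented, and the function name hoist_nested is made up.

```python
def hoist_nested(node, nest_terms):
    """Return a copy of node with the contents of nesting terms pulled up.

    Hypothetical helper: keys listed in nest_terms are treated as pure
    grouping objects, so their properties are merged into the parent.
    """
    flat = {}
    for key, value in node.items():
        if key in nest_terms and isinstance(value, dict):
            flat.update(hoist_nested(value, nest_terms))
        else:
            flat[key] = value
    return flat


doc = {
    "@id": "/my-article",
    "title": "My Blog Post",
    "meta": {"publishedAt": "2016-08-01T00:00:00Z"},
}

hoisted = hoist_nested(doc, {"meta"})
print(hoisted)
```

After hoisting, publishedAt sits next to title on the article node, which is where the term definition for schema:datePublished expects to find it.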
If what you're trying to do is eat the meta element, then no, this can't be done in JSON-LD.
There have been discussions about doing an inverse-index that could do something like this, but I don't see an issue. You might create one at https://github.com/json-ld/json-ld.org/issues. At some point the CG, or a newly formed WG will start looking at feature requests for a new version.

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed settings are all at their defaults. I have a very small data set of two items for experimentation purposes. The items have several fields but I query only on one, which is a multi-valued/array of strings field. The doc looks like this:
{
  "_index": "index_profile",
  "_type": "items",
  "_id": "ega",
  "_version": 1,
  "found": true,
  "_source": {
    "clicked": [
      "ega"
    ],
    "profile_topics": [
      "Twitter",
      "Entertainment",
      "ESPN",
      "Comedy",
      "University of Rhode Island",
      "Humor",
      "Basketball",
      "Sports",
      "Movies",
      "SnapChat",
      "Celebrities",
      "Rite Aid",
      "Education",
      "Television",
      "Country Music",
      "Seattle",
      "Beer",
      "Hip Hop",
      "Actors",
      "David Cameron",
      ... // other topics
    ],
    "id": "ega"
  }
}
A sample query is:
GET /index_profile/items/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [{
        "terms": {
          "profile_topics": [
            "Basketball"
          ]
        }
      }]
    }
  }
}
Again, there are only two items, and the one listed should match the query because the profile_topics field contains the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should clause.
With Solr I would probably specify that the fields are multi-valued string arrays with no norms and no analyzer, so that profile_topics values are not stemmed or tokenized, since every value should be treated as a single token (spaces included). I am not sure this would solve the problem, but it is how I treat similar data in Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue. If so, how do I solve this so that even with two items the query returns ega? If possible, I'd like to solve this index- or type-wide rather than per field.
Basketball (with capital B) in terms will not be analyzed. This means this is the way it will be searched in the Elasticsearch index.
You say you have the defaults. If so, indexing Basketball under profile_topics field means that the actual term in the index will be basketball (with lowercase b) which is the result of the standard analyzer. So, either you set profile_topics as not_analyzed or you search for basketball and not Basketball.
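The mismatch can be reproduced outside Elasticsearch with a crude stand-in for the analyzer. The Python sketch below only approximates the standard analyzer (split on non-word characters, then lowercase); the real analyzer follows Unicode text segmentation rules, and the helper names here are invented.

```python
import re


def standard_analyze(text):
    """Rough approximation of the standard analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches(term, indexed_terms):
    """A term query looks the term up as-is, with no analysis applied."""
    return term in indexed_terms


profile_topics = ["Twitter", "Basketball", "University of Rhode Island"]

# Analyzed field: every value is tokenized and lowercased at index time.
analyzed_terms = {
    token for value in profile_topics for token in standard_analyze(value)
}
# not_analyzed field: every value is stored as one exact term.
raw_terms = set(profile_topics)

print(term_matches("Basketball", analyzed_terms))  # False
print(term_matches("basketball", analyzed_terms))  # True
print(term_matches("University of Rhode Island", raw_terms))  # True
```

The first lookup fails for the same reason the query returns no hits: the index holds basketball, while the unanalyzed query term is Basketball.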
Read this about terms.
Regarding setting all the fields to not_analyzed, you could do that with a dynamic template. With a template you can also do what Logstash does: define a .raw subfield for each string field, so that only this subfield is not_analyzed. The original/parent field still holds the analyzed version of the same text, which you may want to use in the future.
Take a look at this dynamic template. It's the one Logstash is using.
More specifically:
{
  "template": "your_indices_name-*",
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "omit_norms": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}

Resources