I'm trying to work out the best way to handle localisation in JSON-LD. The spec has information on String Internationalization that allows you to specify different translations for string values:
{
  "@context":
  {
    ...
    "occupation": { "@id": "ex:occupation", "@container": "@language" }
  },
  "name": "Yagyū Muneyoshi",
  "occupation":
  {
    "ja": "忍者",
    "en": "Ninja",
    "cs": "Nindža"
  }
  ...
}
This covers translation but not internationalization where the content changes depending on locale.
E.g.
{
  "@context":
  {
    "@id": "http://example.org/carousel#mycarousel",
    "@language": "ja"
    ...
  },
  "slides": ["http://example.org/japan.jpg"]
}
{
  "@context":
  {
    "@id": "http://example.org/carousel#mycarousel",
    "@language": "es"
    ...
  },
  "slides": ["http://example.org/spain.jpg"]
}
Does anyone know whether the above is invalid under the JSON-LD spec, i.e. having different field values depending on the @language while their @ids are the same? If it is not valid, is there an alternative approach that could work?
Yes, the above is invalid. @language is only used to annotate strings with their language. What you are looking for is higher-level information, so you need to use a vocabulary for it. Schema.org, for instance, has http://schema.org/inLanguage for this, and various others exist as well. Which one you use depends on the specific use case.
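A minimal sketch of that vocabulary-based approach, written in Python only so the two documents can be built and compared. It assumes each locale gets its own @id; the `-ja` / `-es` suffixes, the helper name `localized_carousel` and the `http://example.org/vocab#slides` term are hypothetical and not part of the original question.

```python
import json

SCHEMA = "http://schema.org/"


def localized_carousel(lang: str, slide_url: str) -> dict:
    """Build one JSON-LD node per locale (all names here are illustrative)."""
    return {
        "@context": {
            "inLanguage": SCHEMA + "inLanguage",
            # Hypothetical term: the question does not show how "slides" is mapped.
            "slides": {"@id": "http://example.org/vocab#slides", "@type": "@id"},
        },
        # Assumption: a distinct @id per locale, so the two nodes stay separate.
        "@id": f"http://example.org/carousel#mycarousel-{lang}",
        # The locale is stated as data, not via @language in the context.
        "inLanguage": lang,
        "slides": [slide_url],
    }


ja = localized_carousel("ja", "http://example.org/japan.jpg")
es = localized_carousel("es", "http://example.org/spain.jpg")
print(json.dumps(ja, indent=2, ensure_ascii=False))
```

Whether inLanguage is the right property depends on the type of the node; another vocabulary's language property could be substituted in the same place.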
Related
Google's Structured Data Testing Tool doesn't seem to like JSON-LD's approach to string internationalization, which puts @language in a value object. For example:
{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": [{"@language": "ar", "@value": "أياس"},
           {"@language": "en", "@value": "Eyas"}]
}
or
{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": {"@language": "ar", "@value": "أياس"}
}
don't seem to work. I also tried adding "@type": "Text" but that doesn't seem to make it happy either.
Is there an accepted way of specifying multiple language representations of the same thing in Schema.org JSON-LD that is respected by search engines?
I know there's "inLanguage" for certain types, but that is not general enough to, e.g., work with a Person.
I'm trying to standardize a property in a JSON-LD document. A simple example:
json-ld
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {
      "@id": "1",
      "rdfs:label": "A title"
    },
    {
      "@id": "2",
      "dcterms:title": "Another title"
    }
  ]
}
frame (failing attempt)
{
  "type": "array",
  "items": {
    "title": ["rdfs:label", "dcterms:title"]
  }
}
This produces an empty graph, instead of this:
desired output
[
  {
    "title": "A title"
  },
  {
    "title": "Another title"
  }
]
The documentation at https://json-ld.org/primer/latest/#framing seems to be a work in progress, and there are not many examples or tutorials covering JSON-LD framing.
Playground example
Framing is used to shape the data in a JSON-LD document, using an example frame document which is used to both match the flattened data and show an example of how the resulting data should be shaped
https://json-ld.org/spec/latest/json-ld-framing/#framing
That being said, re-shaping data does not mean you can change the semantics. rdfs:label and dcterms:title are different things in the source data and will be different things in the result; you cannot merge them into a "title" property that expands to only one URI (which one?). If you could, the result would have different semantics than the source, but framing is only meant to change the structure.
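If a single "title" key is still wanted, one option is to treat the merge as an explicit application-level step on the parsed JSON rather than as framing. A rough Python sketch, assuming the compacted @graph from the question; the helper name `to_titles` and the preference order are assumptions, and the output is plain JSON that deliberately drops the distinction between the two properties.

```python
# Properties treated as "a title", in order of preference (an assumption).
TITLE_PROPERTIES = ("rdfs:label", "dcterms:title")


def to_titles(document: dict) -> list:
    """Collapse several title-like properties into one plain 'title' key.

    This is ordinary post-processing, not JSON-LD framing: the result no
    longer records which vocabulary term each title came from.
    """
    result = []
    for node in document.get("@graph", []):
        for prop in TITLE_PROPERTIES:
            if prop in node:
                result.append({"title": node[prop]})
                break  # keep only the first matching property per node
    return result


doc = {
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@graph": [
        {"@id": "1", "rdfs:label": "A title"},
        {"@id": "2", "dcterms:title": "Another title"},
    ],
}

print(to_titles(doc))  # [{'title': 'A title'}, {'title': 'Another title'}]
```

Because the lookup uses the compact keys literally, it only works on documents compacted with the same prefixes; running it on expanded output would need the full IRIs instead.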
As a simple exercise, I wanted to take some test data from a little app of mine, which produces a user record in JSON, and turn it into JSON-LD. Testing on JSON-LD.org's playground gives some help, but I don't know if I'm doing it right.
The original is:
[
  {
    "Id": 1,
    "Username": "Dave",
    "Colour": "green"
  }
]
So I have a person, who has a username, an ID and an associated colour.
What I've got so far is:
{
  "@context": {
    "name": "http://schema.org/name",
    "Colour": {
      "@id": "http://dbpedia.org/ontology/Colour",
      "@type": "http://schema.org/Text",
      "@language": "en"
    }
  },
  "@type": "http://schema.org/Person",
  "@Id": "http://example.com/player/1",
  "sameAs" : "https://www.facebook.com/DaveAlger",
  "Id": 1,
  "name": "David Alger",
  "Username": "Dave",
  "Colour": "green"
}
So I'm declaring that it has a @type of Person, and giving it a URI as its @id.
I'm also using the "sameAs" idea, which I saw in a blog post once, but I'm unclear whether it is supported out of the box.
Then I've tried to create a @context. There I've added a name and given it a reference. I've tried to create something for "colour" too. I'm not sure whether pointing to a DBpedia reference about "colour" and specifying a @type and @language is good or not.
I suppose the final thing is "username", but that feels so deeply internal to a site that it doesn't make sense to "Link" it at all.
I'm aware this data is perhaps not even worth linking, this is very much a learning exercise for me.
I don’t think that http://dbpedia.org/ontology/Colour should be used like that. It’s a class, not a property. The property that has http://dbpedia.org/ontology/Colour as range is http://dbpedia.org/ontology/colour. (That said, I’m not sure if you really intend that the person should have a colour, rather than something related to this person.)
If you want to provide the language of the colour strings, you should not specify the datatype; @language is sufficient (if a value is typed, it can’t have a language anymore; by using @language, it’s implied that the value is a string).
You are using @Id for specifying the node’s URI, but it must be @id.
The properties sameAs, Id and Username are not defined in your @context.
If you intend to use Schema.org’s sameAs property, you could define it similar to what you did with name, but you should specify that the value is a URI:
"sameAs": {
  "@id": "http://schema.org/sameAs",
  "@type": "@id"
},
For Username, you could use FOAF’s nick property, or maybe Schema.org’s alternateName property.
No idea which property you could use for Id (depends on your case if this is useful for others at all, or if this is only relevant for your internal system).
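Putting those points together, the document might end up looking roughly like the following, shown here as a Python dict so it can be serialized and checked. This is a sketch under assumptions: it picks Schema.org's alternateName for Username (FOAF's nick would be defined the same way) and leaves Id out because no property was suggested for it.

```python
import json

corrected = {
    "@context": {
        "name": "http://schema.org/name",
        # The value is a URI, so coerce it with "@type": "@id".
        "sameAs": {"@id": "http://schema.org/sameAs", "@type": "@id"},
        # Assumption: alternateName chosen over foaf:nick for this sketch.
        "Username": "http://schema.org/alternateName",
        # Lower-case "colour" is the property; only @language, no datatype.
        "Colour": {"@id": "http://dbpedia.org/ontology/colour", "@language": "en"},
    },
    "@type": "http://schema.org/Person",
    "@id": "http://example.com/player/1",  # lower-case keyword
    "sameAs": "https://www.facebook.com/DaveAlger",
    "name": "David Alger",
    "Username": "Dave",
    "Colour": "green",
}

# "Id" is omitted: without a term definition, a JSON-LD processor would
# drop the key during expansion anyway.
print(json.dumps(corrected, indent=2))
```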
Is it possible to define which part of the text in which of the indexed text fields matches the query?
No, as far as I know and can tell from the Jira, no such feature currently exists. You can, of course, attempt to highlight the parts of the text yourself, but that requires you to implement both the highlighting and the stemming according to the rules applied by MongoDB.
The whole feature is somewhat complicated - even consuming it - as can be seen from the respective elasticsearch documentation.
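To give an idea of what highlighting the text yourself involves, here is a naive client-side sketch in Python. It only wraps exact, case-insensitive whole-word matches; it does not reproduce MongoDB's stemming or stop-word rules, so a document that matched via a stemmed form (e.g. "bunches" for "bunch") would come back unhighlighted. The function name `highlight` and the `<em>` tags are assumptions for illustration.

```python
import re


def highlight(text: str, terms: list, open_tag: str = "<em>", close_tag: str = "</em>") -> str:
    """Wrap whole-word, case-insensitive matches of any search term in tags."""
    if not terms:
        return text
    # Escape each term so punctuation in a query is matched literally.
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


print(highlight("A bunch of bananas, one variety per Bunch.", ["variety", "bunch"]))
# A <em>bunch</em> of bananas, one <em>variety</em> per <em>Bunch</em>.
```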
Refer to the MongoDB documentation on highlighting:
db.fruit.aggregate([
  {
    $searchBeta: {
      "search": {
        "path": "description",
        "query": ["variety", "bunch"]
      },
      "highlight": {
        "path": "description"
      }
    }
  },
  {
    $project: {
      "description": 1,
      "_id": 0,
      "highlights": { "$meta": "searchHighlights" }
    }
  }
])
I'm afraid that solution applies only to MongoDB Atlas at the moment, @LF00.
I copy here part of an example from the JSON-LD standard:
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "picture": { "@id": "foaf:depiction", "@type": "@id" }
  },
  "picture": "http://twitter.com/account/profile_image/markuslanthaler"
}
I don't get why we should use the @id in the @context. It should be:
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "picture": {
      "@type": [ "@id", "foaf:depiction" ]
    }
  },
  "picture": "http://twitter.com/account/profile_image/markuslanthaler"
}
Do you have any explanation?
A few years later
I guess the above means the following in a more reusable form:
{
  "http://xmlns.com/foaf/0.1/depiction": "http://twitter.com/account/profile_image/markuslanthaler"
}
It is a lot easier to understand if we check the flattened form first and try to compact it gradually. So the @id is the IRI of the property and the @type is the type of the value, which here is @id. That can be confusing, but it just means that we are expecting an IRI as the value.
@id tells the JSON-LD processor how to expand the term. It can be omitted if you use @vocab, or if the term is in the form of a compact IRI.
@type tells the processor how to handle plain-string values of that term used within the body of a JSON-LD document. It can be something like an XSD datatype, @id or @vocab. The last two are very similar, except one is evaluated against the document base and the other as a vocabulary term.
If the value of a term definition in the context is a string, rather than an object, it's shorthand for specifying an object with just @id. Don't think of the context as an RDFS/OWL vocabulary, but as a kind of prefix mechanism.
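The "prefix mechanism" view can be illustrated with a toy expander in Python. This is a sketch, not a JSON-LD processor: it handles only a flat document with string values, ignores @vocab, @base, nested nodes and arrays, and the function names `expand_iri`, `expand_term` and `expand` are made up for illustration. It shows @id deciding what the term expands to and "@type": "@id" deciding how the plain string value is interpreted.

```python
def expand_iri(value: str, context: dict) -> str:
    """Expand a compact IRI (prefix:suffix) if the prefix is a known term."""
    prefix, sep, suffix = value.partition(":")
    if sep and not suffix.startswith("//") and prefix in context:
        return expand_term(prefix, context) + suffix
    return value  # already absolute, or not something this toy can expand


def expand_term(term: str, context: dict) -> str:
    """Resolve a term to an IRI using its definition in the context."""
    definition = context.get(term)
    if isinstance(definition, str):
        # String shorthand is equivalent to an object with just @id.
        definition = {"@id": definition}
    if isinstance(definition, dict) and "@id" in definition:
        return expand_iri(definition["@id"], context)
    return expand_iri(term, context)


def expand(document: dict) -> dict:
    """Expand the top-level properties of a flat document (toy version)."""
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords are not handled in this sketch
        definition = context.get(key)
        if definition is None and ":" not in key:
            continue  # undefined terms are dropped
        is_iri = isinstance(definition, dict) and definition.get("@type") == "@id"
        # "@type": "@id" turns the string into a node reference, not a literal.
        expanded[expand_term(key, context)] = [{"@id": value} if is_iri else {"@value": value}]
    return expanded


doc = {
    "@context": {
        "foaf": "http://xmlns.com/foaf/0.1/",
        "picture": {"@id": "foaf:depiction", "@type": "@id"},
    },
    "picture": "http://twitter.com/account/profile_image/markuslanthaler",
}
print(expand(doc))
```

Without the "@type": "@id" entry, the same value would come out as {"@value": ...}, i.e. a plain string literal rather than a link to another resource.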