Aliasing "Type" entry of normal JSON to "#type", when value of "Type" is a JSON object - json-ld

I want to store normal JSON in a triple store. The JSON has its own format for Ids and Types:
{
  "Id": "123456",
  "Type": {
    "Id": "7890",
    "Name": "Person",
    ...
  }
  ...
}
I am able to flatten the document and give the value of "Id" the type "@id", using a custom context. I am stuck trying to alias "Type" to "@type".
Is there a way to use the "Type" entry of the normal JSON as an "@type" keyword using only a custom context?

I am not sure what you mean by aliasing "Type" to "@type", since your example would not be valid JSON-LD if "Id" was replaced with "@id" and "Type" was replaced with "@type".
But could perhaps a context like this be what you are looking for?
"#context": {
"Id": "#id",
"Type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
}
Example in JSON-LD Playground.
Or, if the different ids come from different namespaces, perhaps with scoped contexts:
"#context": {
"#version": 1.1,
"#base": "http://example.org/objects/",
"Id": "#id",
"Type": {
"#id": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"#context": {
"#base": "http://example.org/types/"
}
}
}
Also in JSON-LD Playground.
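For illustration, a document shaped like the one in the question might then look as follows. This is a sketch, not part of the original answer: the example.org namespaces come from the context above, and the added "@vocab" for remaining keys such as "Name" is an assumption (without it those keys would simply be dropped when expanding):
{
  "@context": {
    "@version": 1.1,
    "@base": "http://example.org/objects/",
    "@vocab": "http://example.org/vocab/",
    "Id": "@id",
    "Type": {
      "@id": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
      "@context": {
        "@base": "http://example.org/types/"
      }
    }
  },
  "Id": "123456",
  "Type": {
    "Id": "7890",
    "Name": "Person"
  }
}
Expanding this should yield a node http://example.org/objects/123456 whose rdf:type is the node http://example.org/types/7890, with the scoped "@base" resolving the nested "Id" against the types namespace.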


JSON Schema attribute definition: Unsupported field schema for field... : Unknown field type undefined

I'm trying to build a form based on a JSON schema and using the react-jsonschema-form component for ReactJS.
I want to use this to display a form that users can fill in with various settings (CSS/HTML selectors, etc.) that will be used to parse playlist track data on remote webpages.
I have a selector definition in my JSON schema, like this:
"definitions":{
"selector":{
"$id": "#/definitions/selector",
"type": "object",
"title": "CSS or HTML selector used to extract datas.",
"required": [],
"properties":{
"path":{
"type": "string",
"title": "Selector path",
"default": "",
"examples": ["#playlist .track .title"]
},
"multiple":{
"type": "boolean",
"title": "Should we extract multiple values?",
"default": false,
"readOnly": true
}
}
}
}
My JSON schema for parsing tracks then looks like this:
"track":{
"$id": "#/properties/selectors/properties/track",
"type": "object",
"title": "Track selectors",
"required": [
"title"
],
"properties":{
"artist":{
"$ref": "#/definitions/selector",
"title": "Track artist"
},
"title":{
"$ref": "#/definitions/selector",
"title": "Track title"
},
"album":{
"$ref": "#/definitions/selector",
"title": "Track album"
},
"image":{
"$ref": "#/definitions/selector",
"title": "Track image"
},
"duration":{
"$ref": "#/definitions/selector",
"title": "Track duration"
},
"location":{
"$ref": "#/definitions/selector",
"title": "Track locations",
"properties":{
"multiple":{
"default": true
}
}
},
"link":{
"$ref": "#/definitions/selector",
"title": "Track links",
"properties":{
"multiple":{
"default": true
}
}
}
}
}
As you can see, the selector definition is used for the artist, title, album, image, duration, location and link attributes.
The multiple attribute of the selector definition tells the parser whether it should extract a single value or multiple values.
Its readOnly attribute is set to true because the user cannot choose this: I know a track has only ONE artist, title, album, image and duration, but it might have several locations (files) and links attached.
"location":{
"$ref": "#/definitions/selector",
"title": "Track locations",
"properties":{
"multiple":{
"default": true
}
}
},
"link":{
"$ref": "#/definitions/selector",
"title": "Track links",
"properties":{
"multiple":{
"default": true
}
}
}
This is where I have a problem:
When displaying those two fields, react-jsonschema-form renders those errors:
Track locations
multiple
Unsupported field schema for field root_selectors_track_location_multiple: Unknown field type undefined.
{
"default": true
}
Track links
multiple
Unsupported field schema for field root_selectors_track_link_multiple: Unknown field type undefined.
{
"default": true
}
I don't know if this occurs because my JSON schema is not well defined (can I "override" the properties of a definition like this?) or if this is a bug in react-jsonschema-form.
Can someone help?
Thanks for your long read!
From what I can read in the documentation, the react-jsonschema-form library supports draft-07 of the JSON Schema specification. In that case, any schema keyword besides $ref at this sub-schema level is simply ignored.
From https://json-schema.org/understanding-json-schema/structuring.html#ref
In Draft 4-7, $ref behaves a little differently. When an object
contains a $ref property, the object is considered a reference, not a
schema. Therefore, any other properties you put in that object will
not be treated as JSON Schema keywords and will be ignored by the
validator. $ref can only be used where a schema is expected.
You will need to change your schema to reflect this. An option could be to surround your extension with allOf:
"location": {
"allOf": [
{ "$ref": "#/definitions/selector" },
{
"title": "Track locations",
"properties": {
"multiple":{
"default": true
}
}
}
]
}
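Presumably the same wrapping would be needed for the link property from your schema, for example:
"link": {
  "allOf": [
    { "$ref": "#/definitions/selector" },
    {
      "title": "Track links",
      "properties": {
        "multiple": {
          "default": true
        }
      }
    }
  ]
}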

How do I compact and/or frame a json-ld document so that IRI values are expressed succinctly as well as keys?

Given an original JSON-LD document like this example, which defines the sources for some thing1:
[
  {
    "@id": "https://example.com/thing1",
    "https://example.com/sources": [
      {
        "@id": "https://example.com/vocab/countries/EN"
      },
      {
        "@id": "https://example.com/vocab/countries/FR"
      }
    ]
  }
]
(I'm simplifying quite a lot - in my real use-case this is larger and generated from RDF data. ex:vocab/countries is a SKOS ConceptScheme including EN and FR as Concepts)
I want to collapse it into something approximating what I'd use to express that in more normal JSON:
{
  "@id": "https://example.com/thing1",
  "sources": ["EN", "FR"]
}
I find I can use a context to collapse into name/values and shorten the names:
{
  "@context": {
    "@version": 1.1,
    "ex": "https://example.com/",
    "sources": {
      "@id": "ex:sources",
      "@type": "@id"
    }
  },
  "@id": "ex:thing1",
  "sources": [
    "ex:vocab/countries/EN",
    "ex:vocab/countries/FR"
  ]
}
An important element is "@type": "@id", which collapses the source definitions from (a list of) objects into key/value pairs, and the enclosing context term maps https://example.com/sources to sources.
But I cannot find a way to do the same for the values, so that they become EN and FR instead of ex:vocab/countries/EN and ex:vocab/countries/FR. My experiments with adding @base and @vocab properties to the context don't appear to work the way I expected them to.
I also need to do this in a scoped way, so that other properties besides sources can be defined which reference different vocabularies. For instance I might want to include languages, which could include terms from a vocabulary representing English, French, Gaelic, Breton, etc. In other words, I can't just set a global vocabulary or base for the entire document.
Can anyone tell me if this kind of transform is possible, and if so, how to achieve it?
JSON-LD Playground link here
You could set the expected type of the sources values to "@vocab" instead of "@id", and use a scoped context to set the @vocab to use. For example:
{
  "@version": 1.1,
  "ex": "https://example.com/",
  "sources": {
    "@id": "ex:sources",
    "@type": "@vocab",
    "@context": {
      "@vocab": "https://example.com/vocab/countries/"
    }
  }
}
(playground link).
This says to treat the values of sources as vocabulary-relative IRIs, and sets the vocabulary base to use for those values. You should get the following:
{
  "@context": {
    "@version": 1.1,
    "ex": "https://example.com/",
    "sources": {
      "@id": "ex:sources",
      "@type": "@vocab",
      "@context": {
        "@vocab": "https://example.com/vocab/countries/"
      }
    }
  },
  "@id": "ex:thing1",
  "sources": [
    "EN",
    "FR"
  ]
}
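For the scoping requirement mentioned in the question, other properties can get their own scoped @vocab in the same way. A sketch for a hypothetical languages term (the vocabulary URL is assumed for illustration) added to the same context:
"languages": {
  "@id": "ex:languages",
  "@type": "@vocab",
  "@context": {
    "@vocab": "https://example.com/vocab/languages/"
  }
}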

Manipulate field value of copy-field in Apache Solr

I have a simple string "PART_NUMBER" value as a field in Solr. I would like to add an additional field that places that value in a URL. To do this, I created a new field type, a field, and a copy field:
"add-field-type": {
"name": "endpoint_url",
"class": "solr.TextField",
"positionIncrementGap": "100",
"analyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.PatternReplaceFilterFactory",
"pattern": "([\\s\\S]*)",
"replacement": "http://myurl/$1.jpg"
}
]
}
},
"add-field": {
"name": "URL",
"type": "endpoint_url",
"stored": true,
"indexed": true
},
"add-copy-field":{ "source":"PART_NUMBER", "dest":"URL" }
As some of you probably guessed, my query output looks like
{
  "id": "1",
  "PART_NUMBER": "ABCD1234",
  "URL": "ABCD1234",
  "_version_": 1645658574812086272
}
This is because the endpoint_url field type only affects the indexed value, not the stored one. Indeed, when running the value through the analysis screen, I get
http://myurl/ABCD1234.jpg
My question: is there any way to apply a tokenizer or filter and feed the result back into the stored field value? I would prefer this output when returning the result:
{
  "id": "1",
  "PART_NUMBER": "ABCD1234",
  "URL": "http://myurl/ABCD1234.jpg",
  "_version_": 1645658574812086272
}
Is this possible to do in Solr?
The solution was posted here:
Custom Solr analyzers not being used during indexing
I need to use an update request processor in order to change the field value before analysis. The process is described here:
https://lucene.apache.org/solr/guide/8_1/update-request-processors.html
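As a rough illustration of that approach (not from the original answer; the chain name is an assumption), a solrconfig.xml update processor chain could clone PART_NUMBER into URL and rewrite it before it is indexed and stored:
<updateRequestProcessorChain name="part-number-to-url">
  <!-- Copy PART_NUMBER into URL at update time -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">PART_NUMBER</str>
    <str name="dest">URL</str>
  </processor>
  <!-- Rewrite the cloned value into the full URL; literalReplacement must be
       false so that the $1 capture group reference is applied -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">URL</str>
    <str name="pattern">(.*)</str>
    <str name="replacement">http://myurl/$1.jpg</str>
    <bool name="literalReplacement">false</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
With the value rewritten before it reaches the index, URL can be a plain string field, and the copy-field/analyzer workaround is no longer needed. The chain is selected with the update.chain request parameter or configured as the default chain.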

Apache Nifi: Parse data with UpdateRecord Processor

I'm trying to parse some data in NiFi (1.7.1) using the UpdateRecord processor.
The original data are JSON files that I would like to convert to Avro, based on a schema.
The Avro conversion is OK, but as part of that conversion I also need to parse one array element from the JSON data into a different structure in Avro.
This is a sample data of the input json:
{ "geometry" : {
"coordinates" : [ [ 4.963087975800593, 45.76365595859971 ], [ 4.962874487781098, 45.76320922779652 ], [ 4.962815443439148, 45.763116079159374 ], [ 4.962744732112515, 45.763010484202866 ], [ 4.962096825239138, 45.762112721939246 ] ]} ...}
And this is its schema (specified in the RecordReader):
{ "type": "record",
"name": "features",
"fields": [
{
"name": "geometry",
"type": {
"type": "record",
"name": "geometry",
"fields": [
{
"name": "coordinatesJson",
"type": {
"type": "array",
"items": {
"type": "array",
"items": "double"
}
}
},
]
}
},
....
]
}
As you can see, coordinates is an array of arrays.
And I need to parse that data into Avro, based on this schema (specified in the RecordWriter):
{
  "name": "outputdata",
  "type": "record",
  "fields": [
    {
      "name": "coordinatesAvro",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "coordinatesAvro",
          "fields": [
            {
              "name": "X",
              "type": "double"
            },
            {
              "name": "Y",
              "type": "double"
            }
          ]
        }
      }
    },
    .....
  ]
}
The problem is that I'm not able to map coordinatesJson to coordinatesAvro using RecordPath functions.
I tried several mappings, like:
Property                      Value
/coordinatesJson[0..-1]/X     /geometry/coordinatesAvro[*][0]
/coordinatesJson[0..-1]/Y     /geometry/coordinatesAvro[*][1]
It should be a pretty straightforward parsing step, but as I said, I've been going in circles trying to achieve this for a while.
Any help would be really appreciated.
When I run into something like that, I do the following:
1) Transform the JSON into JSON with the structure I need (in your case: coordinatesAvro) using an ExecuteScript processor (a sketch follows below). I have used ECMAScript because you can simply parse the JSON and work with the objects (transform them).
2) ConvertJSONToAvro with one common schema (coordinatesAvro in your case) for the Reader and Writer.
It works very well and I have used it on big data cases. This is one possible solution for your problem.
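A minimal sketch of such an ExecuteScript (ECMAScript) body, assuming the input shown in the question and that the reshaped array simply replaces the geometry element (field names follow the writer schema; everything else here is illustrative, not the original poster's script):
// Reshape geometry.coordinates ([[x, y], ...]) into coordinatesAvro ([{X, Y}, ...])
var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");
var OutputStreamCallback = Java.type("org.apache.nifi.processor.io.OutputStreamCallback");
var IOUtils = Java.type("org.apache.commons.io.IOUtils");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");

var flowFile = session.get();
if (flowFile != null) {
    var text = null;
    // Read the incoming JSON content
    session.read(flowFile, new InputStreamCallback(function (inputStream) {
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
    }));

    var json = JSON.parse(text);
    // Turn each [x, y] pair into an {X, Y} record
    json.coordinatesAvro = json.geometry.coordinates.map(function (pair) {
        return { "X": pair[0], "Y": pair[1] };
    });
    delete json.geometry;

    var output = JSON.stringify(json);
    // Write the transformed JSON back to the flow file
    flowFile = session.write(flowFile, new OutputStreamCallback(function (outputStream) {
        outputStream.write(new java.lang.String(output).getBytes(StandardCharsets.UTF_8));
    }));
    session.transfer(flowFile, REL_SUCCESS);
}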

Pact, ensure key names in array

If the returned JSON is a map, all key names specified in the response body will be checked for existence. So
...
"response": {
  "status": 200,
  "body": {
    "field1": "value1"
  }
...
will ensure that the body contains a key "field1"; if it is missing, an error occurs.
But what if the response body is an array? I see no way to test whether all (or at least one) of the elements in this array have a specific key name. But this is important: I want to be warned if key names in the backend change, because that would create errors in my application.
You can use eachLike to specify that array elements match a particular format. The correct syntax depends on which Pact framework you're using, but with pact-js, you would say:
const { somethingLike: like, term, eachLike } = pact
....
willRespondWith: {
  status: 200,
  body: eachLike({
    "field1": "value1"
  })
}
Here is the relevant part of the documentation.
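As a small illustrative variation (assuming a recent pact-js where the matchers live on the Matchers export), eachLike also accepts a minimum count, which is useful when the array must not be empty:
const { Matchers } = require("@pact-foundation/pact");
const { eachLike } = Matchers;

// Every array element must match the template, and at least one element must be present
const willRespondWith = {
  status: 200,
  body: eachLike({ "field1": "value1" }, { min: 1 }),
};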
Your example suggests you're writing the Pact file yourself - if this is the case, you can use the [*] notation to describe any array element, as described in the specification:
"response":
{
"status": 200,
"body":
[
{
"field1": "value1"
}
],
...
"matchingRules": {
"$.body": {
"min": 1,
"match": "type"
},
"$.body[*].field1": {
"match": "type"
},
...
