JSON Schema: array must contain a specific string

There are several questions on this subject, but none of them seem to address this particular issue, nor does the JSON Schema documentation, so maybe it cannot be done.
The issue is that I have an array that can have any of 4 strings as values, which is easy enough to achieve with this schema:
...
"attributes": {
"type": "array",
"items": {
"type": "string",
"enum": [
"controls",
"autoplay",
"muted",
"loop"
]
},
"additionalItems": false
}
...
So the values in the array can only be one of those four. However, "controls" must always be part of the array, while the other three are optional. If it were an array of objects we could use required, but I'm not sure how to check that an array contains a specific value.
Thanks for any help!

You can use the contains keyword:
"attributes": {
"type": "array",
"items": {
"type": "string",
"enum": [
"controls",
"autoplay",
"muted",
"loop"
]
},
"contains": {
"const": "controls"
},
"additionalItems": false
}
From the specification:
6.4.6. contains
The value of this keyword MUST be a valid JSON Schema.
An array instance is valid against "contains" if at least one of its
elements is valid against the given schema.
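To see the keyword in action, here is a minimal sketch using the Python jsonschema package (my choice of validator is an assumption; any draft-06-or-later validator should behave the same way):
from jsonschema import validate, ValidationError

schema = {
    "type": "array",
    "items": {"enum": ["controls", "autoplay", "muted", "loop"]},
    "contains": {"const": "controls"}
}

validate(["controls", "muted"], schema)     # passes: "controls" is present

try:
    validate(["autoplay", "loop"], schema)  # fails: no element matches the "contains" subschema
except ValidationError as e:
    print(e.message)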

Related

JSONSchema keyword "type" when encased inside "items" fails to validate

I'm trying to write a JSON validator to check files before runtime, but there's a really odd issue happening with using "type".
I know "type" is a reserved word, but JSON Schema doesn't have an issue with it when it isn't used as a keyword/value pairing, as I found in this other question: Key values of 'key' and 'type' in json schema.
The solution to their problem was nesting "type": { "type": "string" } inside "properties", and it does work. However, my implementation requires it to be inside an array. Here's the snippet of my code:
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "method": {
      "type": "array",
      "items": {
        "type": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "provider": {
          "type": "array"
        }
      }
    }
  }
}
Oddly enough, VS Code doesn't have a problem with it when the file is isolated, but when it's included in the main code it doesn't like it and offers no solution. Regardless, validating it with Python yields:
...
raise exceptions.SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: {'type': 'string'} is not valid under any of the given schemas
Failed validating 'anyOf' in metaschema['allOf'][1]['properties']['properties']['additionalProperties']['$dynamicRef']['allOf'][1]['properties']['items']['$dynamicRef']['allOf'][3]['properties']['type']:
    {'anyOf': [{'$ref': '#/$defs/simpleTypes'},
               {'items': {'$ref': '#/$defs/simpleTypes'},
                'minItems': 1,
                'type': 'array',
                'uniqueItems': True}]}

On schema['properties']['method']['items']['type']:
    {'type': 'string'}
What further confuses me is that https://www.jsonschemavalidator.net/ tells me
Expected array or string for 'type', got StartObject. Path 'properties.method.items.type', line 8, position 17.
yet JSON Schema Faker is able to generate a fake file without any problems. The generated fake JSON also returns the same error when validated with Python and JSONSchemaValidator.
I'm a beginner, and any help or insight will be greatly appreciated. Thanks for your time.
Edit: here's the snippet of the input data as requested.
{
  ...
  "method": [
    {
      "type": "action",
      "name": "name of the chaos experiment to use here",
      "provider": [
      ]
    }
  ]
}
Arrays don't have properties; arrays have items. The schema as you have included it is not valid; are you sure you don't mean to have this?
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "method": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": {
            "type": "string"
          },
          "name": {
            "type": "string"
          },
          "provider": {
            "type": "array"
          }
        }
      }
    }
  }
}
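As a quick sanity check, here is a hedged sketch that validates a sample method entry against the corrected schema, using the Python jsonschema package (the library your traceback comes from); the instance values are illustrative only:
from jsonschema import validate

schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "method": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "name": {"type": "string"},
                    "provider": {"type": "array"}
                }
            }
        }
    }
}

# Illustrative instance; the real data comes from the question's input file.
instance = {"method": [{"type": "action", "name": "some experiment", "provider": []}]}

validate(instance, schema)  # no SchemaError and no ValidationError: the schema is now well-formed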

How to add required to sub array in a json schema?

I'm creating a JSON schema to define the necessary data and its data types. Some of the data needs to be marked as required, but I couldn't find how to do this for nested items in the documentation.
For this JSON schema:
{
  "type": "object",
  "required": [
    "version",
    "categories"
  ],
  "properties": {
    "version": {
      "type": "string",
      "minLength": 1,
      "maxLength": 1
    },
    "categories": {
      "type": "array",
      "items": [
        {
          "title": {
            "type": "string",
            "minLength": 1
          },
          "body": {
            "type": "string",
            "minLength": 1
          }
        }
      ]
    }
  }
}
and JSON like this:
{
  "version": "1",
  "categories": [
    {
      "title": "First",
      "body": "Good"
    },
    {
      "title": "Second",
      "body": "Bad"
    }
  ]
}
I want to make title required, too. It's inside a sub-array. How do I set that in the JSON schema?
There are a few things wrong with your schema. I'm going to assume you're using JSON Schema draft 2019-09.
First, you want items to be an object, not an array, as you want it to apply to every item in the array.
If "items" is a schema, validation succeeds if all elements in the
array successfully validate against that schema.
If "items" is an array of schemas, validation succeeds if each
element of the instance validates against the schema at the same
position, if any.
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-02#section-9.3.1.1
Second, since the value of items should be a schema, you need to treat it like a schema in its own right.
If we take the item from your items array as a schema, it doesn't actually do anything; you need to nest those definitions in a properties keyword...
{
  "properties": {
    "title": {
      "type": "string",
      "minLength": 1
    },
    "body": {
      "type": "string",
      "minLength": 1
    }
  }
}
Finally, now that the value of your items keyword is a schema (a subschema), you can add any keywords you would normally use, such as required, the same as you did at the top level.
{
  "required": [
    "title"
  ],
  "properties": {
    ...
  }
}
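Put together, a rough sketch of the corrected schema and a validation call might look like this (using the Python jsonschema package purely as an example validator):
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "required": ["version", "categories"],
    "properties": {
        "version": {"type": "string", "minLength": 1, "maxLength": 1},
        "categories": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["title"],
                "properties": {
                    "title": {"type": "string", "minLength": 1},
                    "body": {"type": "string", "minLength": 1}
                }
            }
        }
    }
}

try:
    # The second category is missing "title", so validation should fail.
    validate({"version": "1", "categories": [{"title": "First", "body": "Good"}, {"body": "Bad"}]}, schema)
except ValidationError as e:
    print(e.message)  # 'title' is a required property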

JSON schema deeper object uniqueness

I'm trying to get into JSON Schema definitions and wanted to find out how to achieve deeper object uniqueness in the schema definition. Please look at the following example definition, in this case a simple I/O definition for a module.
{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "object",
  "required": ["modulIOs"],
  "properties": {
    "modulIOs": {
      "type": "array",
      "uniqueItems": true,
      "items": {
        "allOf": [
          {
            "type": "object",
            "required": ["ioPosition", "ioType", "ioFunction"],
            "additionalProperties": false,
            "properties": {
              "ioPosition": {
                "type": "integer"
              },
              "ioType": {
                "type": "string",
                "enum": ["in", "out"]
              },
              "ioFunction": {
                "type": "string"
              }
            }
          }
        ]
      }
    }
  }
}
When I validate the following against it (e.g. with a draft-06 validator), validation passes.
{"modulIOs":
[
{
"ioPosition":1,
"ioType":"in",
"ioFunction":"240 V AC in"
},
{
"ioPosition":1,
"ioType":"in",
"ioFunction":"24 V DC in"
}
]
}
I'm aware that the validation is successful because the validator does what it's intended to do: it checks the structure of a JSON object. But is there a way to validate the values of deeper objects, or do I need to perform that check elsewhere?
This is not currently possible with JSON Schema (at draft-7).
There is an issue raised on the official spec repo github for this: https://github.com/json-schema-org/json-schema-spec/issues/538
If you (or anyone reading this) really want this, please give a thumbs-up to the first issue comment.
It's currently unlikely to make it into the next draft, and even if it did, implementations may be slow to pick it up.
You'll need to do this validation after your JSON Schema validation process.
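For example, if the goal is that no two items share the same ioPosition, a minimal sketch of such a post-validation check in Python could look like this (the function name and error handling are illustrative choices, not part of JSON Schema):
def check_unique_io_positions(document):
    # Runs after JSON Schema validation: reject duplicate ioPosition values.
    positions = [io["ioPosition"] for io in document["modulIOs"]]
    duplicates = sorted({p for p in positions if positions.count(p) > 1})
    if duplicates:
        raise ValueError(f"duplicate ioPosition values: {duplicates}")

check_unique_io_positions({
    "modulIOs": [
        {"ioPosition": 1, "ioType": "in", "ioFunction": "240 V AC in"},
        {"ioPosition": 1, "ioType": "in", "ioFunction": "24 V DC in"}
    ]
})  # raises ValueError: duplicate ioPosition values: [1]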
You can validate the values of your object's fields using JSON Schema validation.
For example, if you need to check if ioPosition is between 0 and 100 you can use:
"ioPosition": {
"type": "integer",
"minimum": 0,
"maximum": 100
}
If you need to validate the ioFunction field, you can use a regular expression, such as:
"ioFunction": {
"type": "string",
"pattern": "^[0-9]+ V [A,D]C"
}
Take a look at json-schema-validation.

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed settings are all default config. I have a very small data set of two items for experimentation purposes. The items have several fields, but I query on only one, which is a multi-valued/array-of-strings field. The doc looks like this:
{
  "_index": "index_profile",
  "_type": "items",
  "_id": "ega",
  "_version": 1,
  "found": true,
  "_source": {
    "clicked": [
      "ega"
    ],
    "profile_topics": [
      "Twitter",
      "Entertainment",
      "ESPN",
      "Comedy",
      "University of Rhode Island",
      "Humor",
      "Basketball",
      "Sports",
      "Movies",
      "SnapChat",
      "Celebrities",
      "Rite Aid",
      "Education",
      "Television",
      "Country Music",
      "Seattle",
      "Beer",
      "Hip Hop",
      "Actors",
      "David Cameron",
      ... // other topics
    ],
    "id": "ega"
  }
}
A sample query is:
GET /index_profile/items/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [{
        "terms": {
          "profile_topics": [
            "Basketball"
          ]
        }
      }]
    }
  }
}
Again, there are only two items, and the one listed should match the query because its profile_topics field contains the "Basketball" term. The other item does not match. Yet I only get a result if I ask for clicked = ega in the should clause.
With Solr I would probably specify that the fields are multi-valued string arrays with no norms and no analyzer, so that profile_topics values are not stemmed or tokenized, since every value should be treated as a single token (even with the spaces). Not sure this would solve the problem, but it is how I treat similar data in Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue; if so, how do I solve this so that even with two items the query will return ega? If possible I'd like to solve this index- or type-wide rather than per field.
Basketball (with a capital B) in a terms query will not be analyzed. This means it will be looked up in the Elasticsearch index exactly as written.
You say you have the defaults. If so, indexing Basketball under the profile_topics field means that the actual term in the index will be basketball (with a lowercase b), which is the result of the standard analyzer. So either you set profile_topics as not_analyzed, or you search for basketball and not Basketball.
Read this about terms.
As for setting all the fields to not_analyzed, you could do that with a dynamic template. With a template you can also do what Logstash does: define a .raw subfield for each string field, where only this subfield is not_analyzed. The original/parent field still holds the analyzed version of the same text, which you may want to use in the future.
Take a look at this dynamic template. It's the one Logstash is using.
More specifically:
{
  "template": "your_indices_name-*",
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "omit_norms": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}
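For the immediate problem, a sketch of the corrected query, searching for the lowercased term that the standard analyzer actually put into the index, could look like this (sent here from Python with the requests library against a node assumed to be at localhost:9200; adjust to your setup):
import requests

query = {
    "size": 10,
    "query": {
        "bool": {
            "should": [
                # lowercase "basketball": terms queries are not analyzed,
                # so the value must match the indexed (lowercased) term exactly
                {"terms": {"profile_topics": ["basketball"]}}
            ]
        }
    }
}

resp = requests.post("http://localhost:9200/index_profile/items/_search", json=query)
print(resp.json()["hits"]["total"])  # should now report the matching document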

minItems doesn't seem to validate correctly in JSON schema

I'm writing a simple JSON schema and using minItems to validate the number of items in a given array. My schema is as follows:
{
  "title": "My Schema",
  "type": "object",
  "properties": {
    "root": {
      "type": "array",
      "properties": {
        "id": {
          "type": "string"
        },
        "myarray": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "minItems": 4,
          "uniqueItems": true
        },
        "boolean": {
          "type": "boolean"
        }
      },
      "required": ["id", "myarray", "boolean"]
    }
  },
  "required": [
    "root"
  ],
  "additionalProperties": false
}
Now I would expect the following JSON to fail validation, given that the myarray element has nothing in it. But when using this online validator, it passes. Have I done something wrong, or is the schema validator I'm using faulty?
{
  "root": [
    {
      "id": "1234567890",
      "myarray": [],
      "boolean": true
    }
  ]
}
I am not sure why, or what it is called, but the correct schema definition for your requirement is shown further down.
From what I understand of the JSON Schema definitions, you should declare the properties of an array's items inside the items declaration. In your schema you were defining properties outside of the array item declaration.
In your schema there are two different kinds of array declaration: once with just a simple item type (a string, for the myarray property), and once with a complex object (named myComplexType in the code below). Have a look at the definitions of both, how they are structured, and how they would be interpreted.
The corrected schema:
{
  "title": "My Schema",
  "type": "object",
  "properties": {
    "root": {
      "type": "array",
      "items": {                    <-- Difference here - "items" instead of "properties"
        "type": "object",           <-- here - define the array items as a complex object
        "title": "myComplexType",   <-- here - named for easier referencing
        "properties": {             <-- and here - now we can define the actual properties of the object
          "id": {
            "type": "string"
          },
          "myarray": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "minItems": 4,
            "uniqueItems": true
          },
          "boolean": {
            "type": "boolean"
          }
        }
      },
      "required": [
        "id",
        "myarray",
        "boolean"
      ]
    }
  },
  "required": [
    "root"
  ],
  "additionalProperties": false
}
Remove the comments I added with <-- when copying this over to your code; they were only added to point out where the changes are.
As a note, I don't understand why the validator didn't give an error for the 'malformed' schema; it might just be that it treated your definitions as additional properties. I'm not entirely sure.
The only thing wrong with your schema is that the root property should have type object instead of array. Because the properties keyword is not defined for arrays, it is ignored. Therefore, the part of the schema you were trying to test was completely ignored even though it was correct.
Here is the relevant passage from the specification
Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed.
http://json-schema.org/latest/json-schema-validation.html#rfc.section.4.1
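Either way, once the properties actually sit on a schema that applies to the data (for example via items, as in the first answer), the empty myarray is rejected as expected. A minimal sketch with the Python jsonschema package (my choice of validator is an assumption):
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "required": ["root"],
    "additionalProperties": False,
    "properties": {
        "root": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "myarray": {
                        "type": "array",
                        "items": {"type": "string"},
                        "minItems": 4,
                        "uniqueItems": True
                    },
                    "boolean": {"type": "boolean"}
                }
            }
        }
    }
}

try:
    validate({"root": [{"id": "1234567890", "myarray": [], "boolean": True}]}, schema)
except ValidationError as e:
    print(e.message)  # reports that [] has too few items for minItems: 4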
