I use JSON Schema to describe my data structure.
My data contains specific arrays of the following structure:
[
"string1",
"string2",
...
"stringN",
{
"object": "as",
"last": "item"
}
]
I'm wondering how to describe this in JSON Schema (especially the "last item is an object" part).
If I knew the number of string items, "prefixItems" would do the thing (but there could be any number of them).
If the object was the first (not last) item, "prefixItems" together with "items" would work.
If I use "contains", it only checks that the object is somewhere in my array, not that it is the last item.
It seems that I need something like "reversePrefixItems", if such an option existed, but it doesn't exist.
So, what is a proper way to describe the last item of an array? (and optionally all the preceding ones - knowing their type but not their total count)
There has been discussion of a proposed keyword, postfixItems, but it has not yet made it into the spec. I don't think there is a way to do this currently.
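Until such a keyword exists, one workaround is to check the "last item is an object" rule in application code, next to whatever the schema itself can express. Below is a minimal Python sketch, assuming the array has already been parsed. The function name is_strings_then_object is made up for illustration and is not part of any JSON Schema library.

```python
def is_strings_then_object(array):
    """Return True if every item but the last is a string and the last is an object (dict)."""
    if not isinstance(array, list) or not array:
        return False
    # Split the array into "everything before the last item" and "the last item".
    *leading, last = array
    return all(isinstance(item, str) for item in leading) and isinstance(last, dict)


print(is_strings_then_object(["string1", "string2", {"object": "as", "last": "item"}]))  # True
print(is_strings_then_object([{"object": "first"}, "string1"]))  # False
```

An array that holds only the object (no leading strings) passes this check; tighten the condition if at least one string is required.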
I'm using OpenRefine to pull in information on publisher policies using the Sherpa Romeo API (Sherpa Romeo is a site that aggregates publisher policies). I've got that.
Now I need to parse the returned JSON so that those with certain pieces of information remain. The results I'm interested in need to include the following:
'any_website',
'any_repository',
'institutional_repository',
'non_commercial_institutional_repository',
'non_commercial_repository'
These pieces of information all fall under an array called "permitted_oa". For some reason, I can't even work out how to pull out just that array. I've tried writing GREL expressions such as
value.parseJson().items.permitted_oa
but it never returns anything.
I wish I could share the JSON but it's too big.
I can see a couple of issues here.
Firstly, the Sherpa API response "items" is an array (i.e. a list of things). When you have an array in the JSON, you either have to select a particular item from the array, or you have to explicitly work through the list of things in the array (aka iterate across the array) in your GREL. If you've previously worked with arrays in GREL you'll be familiar with this, but if you haven't:
value.parseJson().items[0] -> first item in the array
value.parseJson().items[1] -> second item in the array
value.parseJson().items[2] -> third item in the array etc. etc.
If you know there is only ever going to be a single item in the array then you can safely use value.parseJson().items[0]
However, if you don't know how many items will be in the array and you are interested in them all, you will have to iterate over the array using a GREL control such as "forEach":
forEach(value.parseJson().items, v, v)
is a way of iterating over the array - each time the GREL finds an item in the array, it will assign it to a variable "v" and then you can do a further operation on that value using "v" as you would usually use "value" (see https://docs.openrefine.org/manual/grel#foreache1-v-e2 for an example of using forEach on an array)
Another possibility is to use join on the array. This will join all the things in an array into a string.
value.parseJson().items.join("|")
It looks like the Sherpa JSON uses Arrays liberally so you may find more arrays you have to deal with to get to the values you want.
Secondly, in the JSON you pasted, "permitted_oa" isn't directly in the "items" entries but in another array called "publisher_policy", so you'll need to navigate that as well. So:
value.parseJson().items[0].publisher_policy[0].permitted_oa[0]
would get you the first permitted_oa object in the first publisher_policy in the first item in the items array. If you wanted to (for example) get a list of locations from the JSON you have pasted you could use:
value.parseJson().items[0].publisher_policy[0].permitted_oa[0].location.location.join("|")
This will give you a pipe ("|") separated list of locations, based on the assumption that there is only a single item, a single publisher_policy and a single permitted_oa - which is true in the case of the JSON you've supplied here (but might not always be true).
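The same traversal, without assuming a single entry at each level, can be sketched outside OpenRefine. The Python below is an illustration only: the sample response is made up (including the "named_repository" value), and its shape (items, then publisher_policy, then permitted_oa, then location.location) is assumed from the paths used above.

```python
import json

# Hypothetical, heavily trimmed response; real Sherpa Romeo records carry many more fields.
SAMPLE = json.loads("""
{"items": [
  {"publisher_policy": [
    {"permitted_oa": [
      {"location": {"location": ["any_website", "named_repository"]}},
      {"location": {"location": ["institutional_repository"]}}
    ]}
  ]}
]}
""")

# The locations of interest listed in the question.
WANTED = {
    "any_website",
    "any_repository",
    "institutional_repository",
    "non_commercial_institutional_repository",
    "non_commercial_repository",
}


def wanted_locations(response):
    """Collect every wanted location, iterating over each array instead of indexing [0]."""
    found = []
    for item in response.get("items", []):
        for policy in item.get("publisher_policy", []):
            for permitted in policy.get("permitted_oa", []):
                for location in permitted.get("location", {}).get("location", []):
                    if location in WANTED and location not in found:
                        found.append(location)
    return found


print("|".join(wanted_locations(SAMPLE)))  # any_website|institutional_repository
```

Each nested loop plays the role of one forEach in GREL, so no level relies on there being exactly one element.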
For ease of notation, I will use JSON in the following, though the anti-pattern can be programmed in many languages.
Let's say that I have a sensible JSON such as
{
"SomeProperty": "SomeValue",
"SomeOtherProperty": 42,
"Items": [
"ValueOfItem0", "ValueOfItem1"
]
}
It has simple entries and an array of items. An alternative way of representing the data, which I think is an anti-pattern and whose name I am looking for, is
{
"someProperty": "someValue",
"someOtherProperty": 42,
"Item0": "valueOfItem0",
"Item1": "valueOfItem1",
"NumberOfItems": 2
}
Instead of by the array, the items are kept 'together' by the keys, which a consumer of the anti-pattern JSON would need to predict. For this reason, the NumberOfItems property has been added, though by using a TryGet-like technique, the property can be made obsolete. Why would anyone do this? Limitations of the serializer.
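A consumer can undo this layout by probing the numbered keys until one is missing, which is the TryGet-like technique mentioned above. Here is a short Python sketch, assuming the keys follow exactly the Item0, Item1, ... pattern from the example; the function name is made up.

```python
def collect_numbered_items(obj, prefix="Item"):
    """Gather obj[prefix + "0"], obj[prefix + "1"], ... into a list, stopping at the first gap."""
    items = []
    index = 0
    while f"{prefix}{index}" in obj:
        items.append(obj[f"{prefix}{index}"])
        index += 1
    return items


flat = {
    "someProperty": "someValue",
    "someOtherProperty": 42,
    "Item0": "valueOfItem0",
    "Item1": "valueOfItem1",
    "NumberOfItems": 2,
}
print(collect_numbered_items(flat))  # ['valueOfItem0', 'valueOfItem1']
```

Because the loop stops at the first missing key, NumberOfItems is never read, which is why that property can be dropped.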
Comments:
My search has revealed nothing. The sort-of opposite direction is the "Arrject", therefore my humble suggestion for the described anti-pattern, if yet unnamed, would be "Orray", a name already used by Star Wars.
I'm generating some JSON with XStream for an object that contains an ArrayList. I have the @XStreamImplicit annotation set for the specific field. When I have 2 elements in the array it converts to JSON properly...
"cap": [
"switch",
"switch2"
]
However if there is only one element in the array I get this...
"cap": "switch"
Therefore it's not showing up as an array (even though there is only one element in it). This makes it a challenge to deserialize it on the other end (which I do not have control of). What I want it to do is the following...
"cap": [
"switch"
]
I'm assuming this is some sort of automatic assumption made for efficiency's sake in the library, either in XStream or in Jettison. However, is there a way to force the array setting here?
I'm trying to update a sub-document of a sub-array, but I can't count on the element's position in the array to always be the same, so how do I update it?
For example, here's my document, I just want to push another tag object (key/val) onto the tags array of real_pagenum: 1 of the pdf with the _id of ObjectId("50634e26dc7c22c64d00000a")
{
"_id" : ObjectId("503b83dfad79cc8d26000004"),
"pdfs" : [
{
"_id" : ObjectId("50634e26dc7c22c64d00000a"),
"pages" : [
{
"real_pagenum" : 1,
"_id" : ObjectId("50634e74dc7c22c64d00002b"),
"tags" : [
{
"key" : "Item",
"val" : "foo"
},
{
"key" : "Item",
"val" : "bar"
}
]
},
{
"real_pagenum" : 2,
"_id" : ObjectId("50634e74dc7c22c64d00002b")
}
],
"title" : "PDF3",
"version" : 3
}
],
}
To reiterate, the goal is to first target the right pdf by _id, then the right page by real_pagenum and push into that pdf's page's tags array.
I had tried:
db.projects.update({'pdfs.pdf_id':ObjectId("50634e25dc7c22c64d000007")},
{$push:{'pages.$.tags':{'key':'foo','val':'bar'}}},false,false);
but that doesn't get to the level I need. I've read that I can use a combination of the positional operator with actual element position, but again, I can't guarantee or know the element's position.
This is currently not possible. You can only address the highest level array in your document tree with the positional operator. Any deeper arrays can only be accessed if you know the exact element position.
This is currently an open issue : https://jira.mongodb.org/browse/SERVER-831
It is a currently filed bug in MongoDB that the positional operator cannot be used for nested arrays:
https://jira.mongodb.org/browse/SERVER-831
The positional operator also cannot be used multiple times in one query, for the time being.
The best way to handle this in your case may be to change your schema. In general, having arrays within a document that will grow to an arbitrary size is not a good idea, since BSON documents have a maximum size of 16 MB.
Especially since in this case it seems like your document is basically just an array of pdfs. Have you considered creating a separate collection for pdfs?
The other answers are correct as far as doing an atomic update of the document.
I strongly agree that this looks like a less than optimal schema. A document per pdf might make more sense: you would then only have an array of pages within each document, which you can update atomically with the positional operator.
If you cannot modify the schema, you can still make the update you want, although it may not be thread safe. You can read the entire document into your application, modify the appropriate fields in code and then save the item. If you wanted to make sure that multiple threads wouldn't stomp over each other, you would have to keep a version in each document and increment it on each "update", which would be conditional on the version not having been changed in the meantime. It's more complex and it still requires an additional field in the current schema, but at least it allows you to do what you state you need to do.
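The version-check approach can be sketched as follows. This Python example uses an in-memory dictionary as a stand-in for the collection, so FakeCollection, replace_if_version, push_tag and the doc_version field are all hypothetical names, not MongoDB driver calls. With a real driver, the conditional write would be an update whose filter includes the expected version.

```python
import copy


class FakeCollection:
    """In-memory stand-in for a collection; NOT a MongoDB driver API."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        self._docs[doc["_id"]] = copy.deepcopy(doc)

    def find_one(self, doc_id):
        return copy.deepcopy(self._docs[doc_id])

    def replace_if_version(self, doc, expected_version):
        """Store doc only if the stored doc_version still equals expected_version."""
        if self._docs[doc["_id"]]["doc_version"] != expected_version:
            return False
        self._docs[doc["_id"]] = copy.deepcopy(doc)
        return True


def push_tag(collection, doc_id, pdf_id, real_pagenum, tag, max_attempts=5):
    """Read the document, add the tag in code, and write back only if nobody else wrote first."""
    for _ in range(max_attempts):
        doc = collection.find_one(doc_id)
        expected = doc["doc_version"]
        for pdf in doc["pdfs"]:
            if pdf["_id"] != pdf_id:
                continue
            for page in pdf["pages"]:
                if page["real_pagenum"] == real_pagenum:
                    page.setdefault("tags", []).append(tag)
        doc["doc_version"] = expected + 1
        if collection.replace_if_version(doc, expected):
            return True
        # Another writer got in first: loop to re-read and retry.
    return False


projects = FakeCollection()
projects.insert({
    "_id": "p1",
    "doc_version": 0,
    "pdfs": [{"_id": "pdf1", "pages": [{"real_pagenum": 1}, {"real_pagenum": 2}]}],
})
print(push_tag(projects, "p1", "pdf1", 1, {"key": "Item", "val": "baz"}))  # True
```

The page is located by its real_pagenum value rather than by its index, so the element's position in the array does not matter.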
I have a collection with records that look like this
{
"_id":{
"$oid":"4e3923963b123b59b73bde67"
},
"ident":"terrainHome",
"columns":[
[
"4e3fbe57dccd1a0cc47509ab",
"4e3fbe57dccd1a0cc47509ac"
],
[
]
]
}
Each document can have two or three columns.
Each column is an array of blocks, which are stored in a different collection.
I want a query that will return the ident for the document that contains a given block.
I tried
db.things.find({ columns[0] : "4e3fbe57dccd1a0cc47509ac" });
but this didn't work
I'll keep trying. :)
You are mixing types here (ObjectId != String). You should always keep things as ObjectId not sometimes as strings, as you have in the array. This is probably not the root of your problem, but could be problematic later.
In your example you can do this:
db.things.find({ "columns.0" : "4e3fbe57dccd1a0cc47509ac" });
Generally arrays of arrays can be challenging to query on when they are more structured (like embedded docs).
If you don't care about WHERE the match is, try
db.things.find({ 'columns' : { $in : ["4e3fbe57dccd1a0cc47509ac"] } });
This works
db.things.find({"$where":"typeof(this.columns[0]) != \"undefined\" && this.columns[0].indexOf(\"4e48ed8245333bd40d000010\") != -1"});
I'll accept my own answer in a couple of days or so, unless someone can suggest a simpler solution.
Also, here's a screen grab of the mongo console showing the two suggested solutions that didn't work (thanks though for taking the time), plus the complicated solution I've found that works. The screen grab shows that there are documents in the collection, the two suggested find commands, and the $where JavaScript solution, which works for the first item in the array and the second item in the array, and returns no records for a non-matching id.
I've tried dozens of variations of the suggested solutions, but they all turn up blank results.
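For reference, the check performed by the $where expression can be written as a plain predicate over an already-loaded document. This is a Python sketch, assuming the document shape shown in the question. The helper name is hypothetical, and unlike the expression above it looks in every column rather than only the first.

```python
def ident_if_contains_block(doc, block_id):
    """Return the document's ident if any of its columns contains block_id, else None."""
    for column in doc.get("columns", []):
        if block_id in column:
            return doc.get("ident")
    return None


doc = {
    "ident": "terrainHome",
    "columns": [
        ["4e3fbe57dccd1a0cc47509ab", "4e3fbe57dccd1a0cc47509ac"],
        [],
    ],
}
print(ident_if_contains_block(doc, "4e3fbe57dccd1a0cc47509ac"))  # terrainHome
print(ident_if_contains_block(doc, "4e48ed8245333bd40d000010"))  # None
```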