Ordered fixed-length arrays in MongoDB

I'm new to MongoDB and am trying to design how I store my data so that I can run the kinds of queries I want. Say I have a document that looks like this:
{
"foo":["foo1","foo2","foo3"],
"bar":"baz"
}
The array "foo" always has length 3, and the order of its items is meaningful. I would like to make a query that finds all documents where the second element ("foo2" in the example) equals some value. Essentially, I want to treat "foo" like any ordinary array and index into it in a search, something like "foo"[1] == something.
Does MongoDB support this? Or would it be more correct to store my data like this:
{
"foo":{
"foo1":"val1",
"foo2":"val2",
"foo3":"val3"
},
"bar":"baz"
}
instead? Thanks.

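The first schema you asked about (the plain array) is fine.
To insert at a specific index of an array, use the $push operator with the $each and $position modifiers (see the $position page in the MongoDB manual).
To query at a specific index, use dot notation, key.index, where the index is zero-based. For example, this matches documents whose second element of foo is "foo2":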
db.users.find({"foo.1":"foo2"})
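To make the zero-based dot notation concrete, here is a small in-memory Python sketch. It only mimics how a path such as "foo.1" selects an array element; it is not MongoDB, and it ignores other matching rules (for example a numeric segment that names an object key, or arrays of subdocuments). The helpers `resolve_path` and `find_positional` are hypothetical.

```python
from typing import Any

# Sentinel meaning "this path does not exist in the document".
_MISSING = object()


def resolve_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as "foo.1"; numeric segments index lists (zero-based)."""
    current = doc
    for segment in path.split("."):
        if isinstance(current, list) and segment.isdigit():
            index = int(segment)
            if index >= len(current):
                return _MISSING
            current = current[index]
        elif isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return _MISSING
    return current


def find_positional(docs: list, path: str, value: Any) -> list:
    """Hypothetical helper: return the documents whose value at `path` equals `value`."""
    return [doc for doc in docs if resolve_path(doc, path) == value]


docs = [
    {"foo": ["foo1", "foo2", "foo3"], "bar": "baz"},
    {"foo": ["foo2", "foo1", "foo3"], "bar": "qux"},
]

# Rough in-memory analogue of db.users.find({"foo.1": "foo2"}).
matches = find_positional(docs, "foo.1", "foo2")
print(matches)  # only the first document matches
```

With the real driver, the same filter dictionary is passed unchanged, e.g. `collection.find({"foo.1": "foo2"})` in PyMongo.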

Related

How to improve or index postgresql's jsonb array field?

I usually use a jsonb field to store array data.
For example, to store customers' barcode info, I create a table like this:
create table customers(fcustomerid bigint, fcodes jsonb);
Each customer has one row, and all of its barcode info is stored in the fcodes field, like this:
[
{
"barcode":"000000001",
"codeid":1,
"product":"Coca Cola",
"createdate":"2021-01-19",
"lottorry":true,
"lottdate":"2021-01-20",
"bonus":50
},
{
"barcode":"000000002",
"codeid":2,
"product":"Coca Cola",
"createdate":"2021-01-19",
"lottorry":false,
"lottdate":"",
"bonus":0
},
...
{
"barcode":"000500000",
"codeid":500000,
"product":"Pepsi Cola",
"createdate":"2021-01-19",
"lottorry":false,
"lottdate":"",
"bonus":0
}
]
The jsonb array may store millions of barcode objects with the same structure. Perhaps this is not a good idea, but when I have thousands of customers I can keep all the data in one table, with one row per customer and all of its data in one field. It looks very terse and is easy to manage.
For this kind of application, how can I efficiently insert, modify, or query the data?
I can use jsonb_insert to append one object, like this:
update customers
set fcodes = jsonb_insert(fcodes, '{-1}', '{...}'::jsonb, true)  -- true = insert after the last element
where fcustomerid = 999;
Modifying an object is a little more difficult, because I need to know its index first. If I use the incrementing key codeid as the array index, things look easier, and I can use jsonb_set, like this:
update customers
set fcodes = jsonb_set(fcodes, array[(mycodeid - 1)::text, 'lottorry'], 'true'::jsonb)  -- array index = codeid - 1
where fcustomerid = 999;
But if I want to query the objects in the jsonb array by createdate, bonus, lottorry, or product, I have to use a jsonpath expression, like this:
select jsonb_path_query_array(fcodes, '$[*] ? (@.product == "Pepsi Cola")')
from customers
where fcustomerid = 999;
or:
select jsonb_path_query_array(fcodes, '$[*] ? (@.lottdate.datetime() >= "2021-01-01".datetime() && @.lottdate.datetime() <= "2021-01-31".datetime())')
from customers
where fcustomerid = 999;
This jsonb index looks useful, but it seems to help only when searching across different rows, and my operations mostly work inside a single row's jsonb field.
I am very worried about efficiency: with millions of objects stored in one row's jsonb field, is this a good idea? And how can I improve efficiency in this scenario, especially for queries?
You are right to worry. With a JSON value that large, you will never get good performance.
Your data doesn't need JSON at all. Create a table that stores one barcode per row and has a foreign key referencing customers. Then everything will be simple and efficient.
Using JSON in the database is almost always the wrong choice, judging from the questions in this forum.
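A minimal sketch of that normalized layout, using Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere. The child table name `barcodes`, its primary key, and the index are assumptions; the column names follow the JSON keys in the question. In PostgreSQL you would use real date and boolean types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    fcustomerid INTEGER PRIMARY KEY
);
-- Hypothetical child table: one row per barcode instead of one huge JSON array.
CREATE TABLE barcodes (
    fcustomerid INTEGER NOT NULL REFERENCES customers(fcustomerid),
    codeid      INTEGER NOT NULL,
    barcode     TEXT NOT NULL,
    product     TEXT NOT NULL,
    createdate  TEXT NOT NULL,
    lottorry    INTEGER NOT NULL,  -- 0/1; SQLite has no boolean type
    lottdate    TEXT,
    bonus       INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (fcustomerid, codeid)
);
CREATE INDEX barcodes_customer_product ON barcodes (fcustomerid, product);
""")

conn.execute("INSERT INTO customers VALUES (999)")
conn.executemany(
    "INSERT INTO barcodes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (999, 1, "000000001", "Coca Cola", "2021-01-19", 1, "2021-01-20", 50),
        (999, 2, "000000002", "Coca Cola", "2021-01-19", 0, None, 0),
        (999, 500000, "000500000", "Pepsi Cola", "2021-01-19", 0, None, 0),
    ],
)

# Modifying one barcode needs only its key, not its position in an array.
conn.execute(
    "UPDATE barcodes SET lottorry = 1 WHERE fcustomerid = ? AND codeid = ?",
    (999, 2),
)

# Filtering by product can use the (fcustomerid, product) index.
pepsi = conn.execute(
    "SELECT codeid FROM barcodes WHERE fcustomerid = ? AND product = ?",
    (999, "Pepsi Cola"),
).fetchall()

# Date-range filter; ISO date strings compare correctly as text.
january = conn.execute(
    "SELECT codeid FROM barcodes WHERE fcustomerid = ? "
    "AND lottdate BETWEEN '2021-01-01' AND '2021-01-31'",
    (999,),
).fetchall()
print(pepsi, january)
```

Each insert, update, or query now touches only the rows it needs instead of rewriting or scanning one multi-megabyte jsonb value.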

How to filter based on the last element in ArrayField in django

I'm using a PostgreSQL database, which supports an array data type, and Django provides PostgreSQL-specific model fields for it.
My question is how can I filter objects based on the last element of the array?
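from django.contrib.postgres.fields import ArrayField
from django.db import models

class Example(models.Model):
    tags = ArrayField(models.CharField(...))

example = Example.objects.create(tags=['tag1', 'tag2', 'tag3'])
example_tag3 = Example.objects.filter(tags__2='tag3')  # index 2 = third element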
I want to filter without knowing the size of tags. Is there any dynamic filtering, something like:
example_tag3 = Example.objects.filter(tags__last='tag3')
I don't think there is a way to do that without "killing the performance" other than using raw SQL. But you should avoid doing things like this; from the PostgreSQL documentation:
Tip: Arrays are not sets; searching for specific array elements can be
a sign of database misdesign. Consider using a separate table with a
row for each item that would be an array element. This will be easier
to search, and is likely to scale better for a large number of
elements.
Adding to the above answer and comment: if changing the table structure isn't an option, you can filter on the first element of an array using field__0:
example_tag1 = Example.objects.filter(tags__0='tag1')
However, I don't see a way to access the last element directly in the documentation.
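Following the documentation's advice, here is a minimal sketch of the "separate table" layout using plain sqlite3 rather than Django's ORM. The tables `example` and `example_tag` and the `position` column are hypothetical. With one row per tag, "the last tag is tag3" becomes an ordinary query that works for any list length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE example (id INTEGER PRIMARY KEY);
-- Hypothetical child table: one row per tag, with its position in the list.
CREATE TABLE example_tag (
    example_id INTEGER NOT NULL REFERENCES example(id),
    position   INTEGER NOT NULL,
    tag        TEXT NOT NULL,
    PRIMARY KEY (example_id, position)
);
""")


def add_example(example_id, tags):
    """Store one example and its tags, keeping their order in `position`."""
    conn.execute("INSERT INTO example (id) VALUES (?)", (example_id,))
    conn.executemany(
        "INSERT INTO example_tag VALUES (?, ?, ?)",
        [(example_id, pos, tag) for pos, tag in enumerate(tags)],
    )


add_example(1, ["tag1", "tag2", "tag3"])
add_example(2, ["tag3", "tag4"])
add_example(3, ["tag3"])

# Examples whose last tag (highest position) equals the given value.
LAST_TAG_SQL = """
SELECT t.example_id
FROM example_tag AS t
WHERE t.tag = ?
  AND t.position = (SELECT MAX(position) FROM example_tag
                    WHERE example_id = t.example_id)
ORDER BY t.example_id
"""
ids = [row[0] for row in conn.execute(LAST_TAG_SQL, ("tag3",))]
print(ids)
```

In Django the same shape would be a second model with a ForeignKey to Example plus a position field.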

jsonb vs jsonb[] for multiple addresses for a customer

Is it a good idea to save multiple addresses in a jsonb field in PostgreSQL? I'm new to NoSQL and I'd like to test PostgreSQL for that. I don't want a separate table for addresses; I prefer to keep them in the same table.
But I'm in doubt: I've seen that PostgreSQL has both jsonb and jsonb[].
Which one is better to store multiple addresses?
If I use jsonb, I think I must add a prefix to every field, like this:
"1_adresse_line-1"
"1_adresse_line-2"
"1_postalcode"
"2_adresse_line-1"
"2_adresse_line-2"
"2_postalcode"
"3_adresse_line-1"
"3_adresse_line-2"
"3_postalcode"
etc.
Is it better to use jsonb[]? How does it work?
Use a jsonb (not jsonb[]!) column with a structure like this:
select
'[{
"adresse_line-1": "a11",
"adresse_line-2": "a12",
"postalcode": "code1"
},
{
"adresse_line-1": "a21",
"adresse_line-2": "a22",
"postalcode": "code2"
}
]'::jsonb;
Though, a regular table related to the main one is a better option.
Why not jsonb[]? Take a look at the JSON definition:
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
A jsonb column can therefore store an array of objects. Attempts to use an array of jsonb are probably due to a misunderstanding of this data type. I have never seen a reasonable need for such a solution.
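To see why the numeric prefixes become unnecessary, here is a small Python sketch (json module only) that regroups the prefixed keys from the question into the array-of-objects shape suggested above. The helper `group_prefixed` is hypothetical; in PostgreSQL the resulting text would go into a single jsonb column.

```python
import json
from collections import defaultdict


def group_prefixed(flat: dict) -> list:
    """Hypothetical helper: turn {"1_postalcode": ..., "2_postalcode": ...} into a list of objects."""
    groups = defaultdict(dict)
    for key, value in flat.items():
        number, _, field = key.partition("_")  # "1_adresse_line-1" -> "1", "adresse_line-1"
        groups[int(number)][field] = value
    # Order the address objects by their original prefix number.
    return [groups[n] for n in sorted(groups)]


flat = {
    "1_adresse_line-1": "a11", "1_adresse_line-2": "a12", "1_postalcode": "code1",
    "2_adresse_line-1": "a21", "2_adresse_line-2": "a22", "2_postalcode": "code2",
}
addresses = group_prefixed(flat)
document = json.dumps(addresses)  # text that could be cast to jsonb
print(document)
```

Once stored this way, PostgreSQL's jsonb containment operator can find a customer by any address, e.g. `addresses @> '[{"postalcode": "code2"}]'`.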

MongoDB: how to auto-increment a subdocument field?

The following document records a conversation between Milhouse and Bart. I would like to insert a new message with the right num (the next one in the example would be 3) in a single operation. Is that possible?
{ user_a: "Bart",
  user_b: "Milhouse",
  conversation: {
    last_msg: 2,
    messages: [
      { from: "Bart",
        msg: "Hello",
        num: 1
      },
      { from: "Milhouse",
        msg: "Wanna go out ?",
        num: 2
      }
    ]
  }
}
In MongoDB, arrays keep their order, so by adding a num attribute you're only creating more data for something you could accomplish without the additional field. Just use the position in the array instead. Grabbing the Xth message by its array position will be faster than searching for { num: X }.
If you do want to keep a num field, I don't think there's an easy way to set it other than doing a find() on conversation.last_msg before you insert the new subdocument, and then incrementing last_msg.
Depending on what you need to keep the ordering for, you might consider including a time stamp in your subdocument, which is commonly kept in conversation records anyway and may provide other useful information.
Also, I haven't used it, but there's a Mongoose plugin that may or may not be able to do what you want: https://npmjs.org/package/mongoose-auto-increment
You can't create an auto-increment field, but you can use functions to generate and manage a sequence:
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/ 
I would recommend using a timestamp rather than a numerical value. By using a timestamp, you can keep the ordering of the subdocument and make other use of it.
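A minimal in-memory Python sketch of the advice above: append messages in order, derive a message's number from its array position, and keep a timestamp as well. Plain dicts stand in for the MongoDB document, and `add_message`, `message_number`, and the `sent_at` field are hypothetical; in MongoDB itself the append would be a $push update.

```python
from datetime import datetime, timezone

# Same conversation as the question, without the redundant num/last_msg fields.
conversation = {
    "user_a": "Bart",
    "user_b": "Milhouse",
    "conversation": {
        "messages": [
            {"from": "Bart", "msg": "Hello"},
            {"from": "Milhouse", "msg": "Wanna go out ?"},
        ]
    },
}


def add_message(doc: dict, sender: str, text: str) -> int:
    """Append a timestamped message and return its 1-based number (array position + 1)."""
    messages = doc["conversation"]["messages"]
    messages.append({"from": sender, "msg": text, "sent_at": datetime.now(timezone.utc)})
    return len(messages)


def message_number(doc: dict, n: int) -> dict:
    """Fetch the n-th message (1-based) by position instead of by a num field."""
    return doc["conversation"]["messages"][n - 1]


num = add_message(conversation, "Bart", "Sure!")
print(num, message_number(conversation, num)["msg"])
```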

Paging arrays in mongodb subdocument

I have a mongo collection with documents that have a schema structured like the following:
{ _id: bla,
  fname: foo,
  lname: bar,
  subdocs: [ { subdocname: doc1,
               field1: one,
               field2: two,
               potentially_huge_array: [...]
             }, ...
  ]
}
I'm using the Ruby mongo driver, which currently does not support $elemMatch. When extracting from subdocs, I use an aggregation pipeline with project, unwind, and match stages.
What I would now like to do is to page results from the potentially_huge_array array contained in the subdocument. I have not been able to figure out how to grab just a subset of the array without dragging the entire subdoc, huge array and all, out of the db into my app.
Is there some way to do this?
Would a different schema be a better way to handle this?
Depending on how huge "huge" is, you definitely don't want it embedded in another document.
The main reason is that unless you always want the array returned with the document, you probably don't want to store it as part of the document. How you store it in another collection depends on exactly how you want to access it.
Reviewing the types of queries you most often perform on your data will usually suggest the best schema - one that will allow you to be efficient about number of queries, the amount of data returned and ease of indexing the data.
If your field is really huge and changes often, just put it in a separate collection.
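As a sketch of the separate-collection idea, the Python below stores each array element as its own record tagged with hypothetical `parent_id`, `subdocname`, and `pos` fields, and pages by position ("give me the next items after the last position I saw"). Plain lists stand in for a MongoDB collection; with a real driver the equivalent would be a query on those fields sorted by pos with a limit, backed by an index.

```python
from itertools import islice
from typing import Iterable, Iterator


def to_item_records(parent_id: str, subdocname: str, huge_array: Iterable) -> Iterator[dict]:
    """Split an embedded array into one record per element (hypothetical layout)."""
    for pos, value in enumerate(huge_array):
        yield {"parent_id": parent_id, "subdocname": subdocname, "pos": pos, "value": value}


def page(records: Iterable[dict], parent_id: str, subdocname: str,
         after_pos: int = -1, limit: int = 10) -> list:
    """Return up to `limit` items with pos > after_pos, in position order."""
    matching = (
        r for r in records
        if r["parent_id"] == parent_id
        and r["subdocname"] == subdocname
        and r["pos"] > after_pos
    )
    return list(islice(sorted(matching, key=lambda r: r["pos"]), limit))


# Placeholder ids taken from the question's schema; 25 items split into pages of 10.
items = list(to_item_records("bla", "doc1", range(25)))
first = page(items, "bla", "doc1", limit=10)
second = page(items, "bla", "doc1", after_pos=first[-1]["pos"], limit=10)
last = page(items, "bla", "doc1", after_pos=second[-1]["pos"], limit=10)
print([r["value"] for r in last])
```

Paging on the last seen position, rather than skipping N items, lets an index on (parent_id, subdocname, pos) jump straight to the next page.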
