jsonb vs jsonb[] for multiple addresses for a customer - arrays

I think it's a good idea to save multiple addresses in a jsonb field in PostgreSQL. I'm new to NoSQL and I'd like to test PostgreSQL for this. I don't want another table for addresses; I prefer to keep them in the same table.
But I'm in doubt: I've seen that PostgreSQL has both jsonb and jsonb[].
Which one is better for storing multiple addresses?
If I use jsonb, I think I have to add a prefix to every field, like this:
"1_adresse_line-1"
"1_adresse_line-2"
"1_postalcode"
"2_adresse_line-1"
"2_adresse_line-2"
"2_postalcode"
"3_adresse_line-1"
"3_adresse_line-2"
"3_postalcode"
etc.
Is it better to use jsonb[]? How does it work?

Use a jsonb (not jsonb[]!) column with a structure like this:
select '[
  {
    "adresse_line-1": "a11",
    "adresse_line-2": "a12",
    "postalcode": "code1"
  },
  {
    "adresse_line-1": "a21",
    "adresse_line-2": "a22",
    "postalcode": "code2"
  }
]'::jsonb;
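With that structure you can query the array directly. A minimal sketch, assuming the column is called addresses and lives in a customers table with an id column (names not from the question):
-- Find customers that have an address with a given postal code
-- (jsonb containment matches per array element).
select *
from customers
where addresses @> '[{"postalcode": "code1"}]'::jsonb;
-- Or unnest the array to work with individual addresses.
select c.id, a.elem->>'adresse_line-1' as line1, a.elem->>'postalcode' as postalcode
from customers c
cross join lateral jsonb_array_elements(c.addresses) as a(elem);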
Though, a regular table related to the main one is a better option.
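That normalized alternative could look roughly like this (a sketch; all table and column names are assumptions):
-- One row per address, linked to the customer by a foreign key.
create table customer_addresses (
    id             bigserial primary key,
    customer_id    bigint not null references customers(id),
    adresse_line_1 text,
    adresse_line_2 text,
    postalcode     text
);
-- Searching becomes plain SQL and is easy to index.
select * from customer_addresses where postalcode = 'code1';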
Why not jsonb[]? Take a look at the definition of JSON:
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
In a jsonb column you can therefore store an array of objects. Attempts to use an array of jsonb usually stem from a misunderstanding of this data type. I have never seen a reasonable need for such a solution.

Related

How to improve or index postgresql's jsonb array field?

I usually use a jsonb field to store array data.
For example, to store customers' barcode info, I create a table like this:
create table customers(fcustomerid bigint, fcodes jsonb);
Each customer has one row, and all of its barcode info is stored in the fcodes field, like below:
[
{
"barcode":"000000001",
"codeid":1,
"product":"Coca Cola",
"createdate":"2021-01-19",
"lottorry":true,
"lottdate":"2021-01-20",
"bonus":50
},
{
"barcode":"000000002",
"codeid":2,
"product":"Coca Cola",
"createdate":"2021-01-19",
"lottorry":false,
"lottdate":"",
"bonus":0
}
...
{
"barcode":"000500000",
"codeid":500000,
"product":"Pepsi Cola",
"createdate":"2021-01-19",
"lottorry":false,
"lottdate":"",
"bonus":0
}
]
The jsonb array may store millions of barcode objects with the same structure. Perhaps this is not a good idea, but you know, when I have thousands of customers I can store all the data in one table: each customer has one row, and all of its data sits in one field. It looks very compact and easy to manage.
For this kind of application scenario, how can I efficiently insert, modify, or query the data?
I can use jsonb_insert to insert one object, just like:
update customers
set fcodes=jsonb_insert(fcodes,'{-1}','{...}'::jsonb)
where fcustomerid=999;
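Note that with jsonb_insert's default behaviour the path '{-1}' inserts the new object before the current last element. To append at the very end, two hedged alternatives (reusing the table and column names above, with an abbreviated object for illustration):
-- Pass true as the fourth argument to insert after the element at '{-1}', i.e. append.
update customers
set fcodes = jsonb_insert(fcodes, '{-1}', '{"codeid": 500001}'::jsonb, true)
where fcustomerid = 999;
-- Or concatenate, which appends a single object to a jsonb array.
update customers
set fcodes = fcodes || '{"codeid": 500001}'::jsonb
where fcustomerid = 999;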
When I want to modify an object, I find it a little difficult, because I have to know the index of the object first. If I use the incremental key codeid as the array index, things look easier, and I can use jsonb_set, just like below:
update customers
set fcodes = jsonb_set(fcodes, array[(mycodeid - 1)::text, 'lottorry'], 'true'::jsonb)
where fcustomerid = 999;
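If the array position cannot be derived from codeid, one way to find it (a sketch under the same assumed layout; the codeid value 500000 is only an example) is to number the elements with WITH ORDINALITY and feed the 0-based index into jsonb_set:
-- Locate the array index of the element whose codeid matches,
-- then update that element's "lottorry" field in place.
update customers c
set fcodes = jsonb_set(c.fcodes, array[(t.idx - 1)::text, 'lottorry'], 'true'::jsonb)
from (
    select a.idx
    from customers
    cross join lateral jsonb_array_elements(fcodes) with ordinality as a(elem, idx)
    where fcustomerid = 999
      and (a.elem->>'codeid')::bigint = 500000  -- example codeid
) t
where c.fcustomerid = 999;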
But if I want to query the objects in the jsonb array by createdate, bonus, lottorry, or product, I have to use a jsonpath operator, just like:
select jsonb_path_query_array(fcodes, '$[*] ? (@.product == "Pepsi Cola")')
from customers
where fcustomerid = 999;
or like:
select jsonb_path_query_array(fcodes, '$[*] ? (@.lottdate.datetime() >= "2021-01-01".datetime() && @.lottdate.datetime() <= "2021-01-31".datetime())')
from customers
where fcustomerid = 999;
The jsonb index looks useful, but it helps when searching across different rows, while my operations mostly work within a single row's jsonb field.
I am very worried about efficiency: with millions of objects stored in one row's jsonb field, is this a good idea? And how can I improve efficiency in this scenario, especially for queries?
You are right to worry. With a huge JSON like that, you will never get good performance.
Your data don't need JSON at all. Create a table that stores a single barcode and has a foreign key reference to customers. Then everything will be simple and efficient.
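A minimal sketch of that layout, following the naming style of the question (and assuming fcustomerid is declared as the primary key of customers):
-- One row per barcode, linked to the customer by a foreign key.
create table barcodes (
    fbarcode    text primary key,
    fcustomerid bigint not null references customers(fcustomerid),
    fproduct    text,
    fcreatedate date,
    flottorry   boolean,
    flottdate   date,
    fbonus      integer
);
create index on barcodes (fcustomerid);
-- The jsonpath queries above become ordinary, indexable SQL:
select * from barcodes
where fcustomerid = 999 and fproduct = 'Pepsi Cola';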
Using JSON in the database is almost always the wrong choice, judging from the questions in this forum.

How to filter based on the last element in ArrayField in django

I'm using a PostgreSQL database, which supports an array datatype; in addition, Django provides PostgreSQL-specific model fields for it.
My question is how can I filter objects based on the last element of the array?
class Example(models.Model):
    tags = ArrayField(models.CharField(...))

example = Example.objects.create(tags=['tag1', 'tag2', 'tag3'])
example_tag3 = Example.objects.filter(tags__2='tag3')
I want to filter, but I don't know the size of the tags array. Is there any dynamic filtering, something like:
example_tag3 = Example.objects.filter(tags__last='tag3')
I don't think there is a way to do that without "killing the performance" other than using raw SQL (see this). But you should avoid doing things like this, from the doc:
Tip: Arrays are not sets; searching for specific array elements can be
a sign of database misdesign. Consider using a separate table with a
row for each item that would be an array element. This will be easier
to search, and is likely to scale better for a large number of
elements.
Adding to the above answer and comment, if changing the table structure isn't an option, you may filter your query based on the first element in an array by using field__0:
example_tag3 = Example.objects.filter(tags__0='tag1')
However, I don't see a way to access the last element directly in the documentation.
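For completeness, the raw SQL route mentioned in the first answer could look roughly like this (a sketch; the table name myapp_example is an assumption, and PostgreSQL arrays are 1-based):
-- Match rows whose last tags element is 'tag3'.
select *
from myapp_example
where tags[array_length(tags, 1)] = 'tag3';
From Django you could run something like this through Example.objects.raw() or wrap the expression in a RawSQL annotation.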

ordered fixed-length arrays in mongoDB

I'm new to MongoDB and trying to design the way I store my data so I can run the kinds of queries I want. Say I have a document that looks like
{
"foo":["foo1","foo2","foo3"],
"bar":"baz"
}
Where the array "foo" is always of length 3, and the order of the items are meaningful. I would like to be able to make a query that searches for all documents where "foo2" == something. Essentially I want to treat "foo" like any old array and be able to index it in a search, so something like "foo"[1] == something.
Does monogDB support this? Would it be more correct to store my data like,
{
"foo":{
"foo1":"val1",
"foo2":"val2",
"foo3":"val3"
},
"bar":"baz"
}
instead? Thanks.
The schema you have asked about is fine.
To insert at a specific index of array:
Use the $position operator. Read here.
To query at a specific index location:
Use the syntax key.index. As in:
db.users.find({"foo.1":"foo2"})

Cassandra: map collections with multiple datatypes

As stated and discussed by Mr. Ellis in dynamic-columns/wide-rows, dynamic tables are possible through the map collection. However, as far as I can see, this only works for data of the same type.
Example from the link:
CREATE TABLE users (
    user_id text PRIMARY KEY,
    name text,
    birth_year int,
    phone_numbers map<text, int>
);
INSERT INTO users (user_id, name, birth_year, phone_numbers)
VALUES ('jbellis', 'Jonathan Ellis', 1976, {'home': 1112223333, 'work': 2223334444});
Both the home and work phone_numbers are of type integer. But we need a collection with various datatypes. Say,
Create table storage ( mobile_id int PRIMARY KEY, date timestamp, data map );
Then data contains these:
{'state': String, 'protocol': Integer, 'weight': Double, 'frame': Blob ... }
So my question is: is there an alternative for this? Is this possible with CQL?
At this time, I believe it is not possible. You would be better off using a string with some sort of type information embedded in it,
i.e. {'home': 'int:1112223333', 'work': 'str:222-333-4444'},
or alternatively save a language-specific map into Cassandra as a blob, using language-specific serialization.
Maps are typed. Still, you could create one map field per type: map_int, map_text, and so on.
This has the added benefit that it will be a bit faster: collections are loaded as a whole when read, so splitting them up by type will load less data on each query. You should be able to find out the type of what you're looking for up front, I hope.
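A rough sketch of that layout in CQL, reusing the storage table from the question (the per-type column names and inserted values are illustrative):
CREATE TABLE storage (
    mobile_id   int PRIMARY KEY,
    date        timestamp,
    data_text   map<text, text>,     -- e.g. 'state'
    data_int    map<text, int>,      -- e.g. 'protocol'
    data_double map<text, double>,   -- e.g. 'weight'
    data_blob   map<text, blob>      -- e.g. 'frame'
);
INSERT INTO storage (mobile_id, date, data_text, data_int, data_double)
VALUES (1, toTimestamp(now()), {'state': 'active'}, {'protocol': 2}, {'weight': 2.5});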

Paging arrays in mongodb subdocument

I have a mongo collection with documents that have a schema structured like the following:
{ _id : bla,
  fname : foo,
  lname : bar,
  subdocs : [ { subdocname : doc1,
                field1 : one,
                field2 : two,
                potentially_huge_array : [...]
              },
              ...
            ]
}
I'm using the Ruby mongo driver, which currently does not support elemMatch, so I use an aggregation when extracting from subdocs, via a project, unwind, and match pipeline.
What I would now like to do is to page results from the potentially_huge_array array contained in the subdocument. I have not been able to figure out how to grab just a subset of the array without dragging the entire subdoc, huge array and all, out of the db into my app.
Is there some way to do this?
Would a different schema be a better way to handle this?
Depending on how huge "huge" is, you definitely don't want it embedded in another document.
The main reason is that unless you always want the array returned with the document, you probably don't want to store it as part of the document. How you can store it in another collection would depend on exactly how you want to access it.
Reviewing the types of queries you most often perform on your data will usually suggest the best schema - one that will allow you to be efficient about number of queries, the amount of data returned and ease of indexing the data.
If your field is really huge and changes often, just place it in a separate collection.

Resources