I want to store some nested information in a Postgres database, and I am wondering what the best way to do so is.
I have a list of cars for rent, structured like this:
[Brand] > [Model] > [Individual cars for rent of that brand and model], ex.:
[
{
"id": 1,
"name": "Audi",
"models": [
{
"id": 1,
"name": "A1",
"cars": [
{
"id": 1,
"license": "RPY9973",
"mileage": "41053"
},
{
"id": 2,
"license": "RPY3001",
"mileage": "102302"
},
{
"id": 3,
"license": "RPY9852",
"mileage": "10236"
}
]
},
{
"id": 2,
"name": "A3",
"cars": [
{
"id": 1,
"license": "RPY1013",
"mileage": "66952"
},
{
"id": 2,
"license": "RPY3284",
"mileage": "215213"
},
{
"id": 3,
"license": "RPY0126",
"mileage": "19632"
}
]
}
...
]
}
...
]
Currently, having limited experience with databases and storing arrays, I am storing it in a 'brands' table with the following columns:
id (integer) - brand ID
name (text) - brand name
models (text) - contains stringified content of models and cars within them, which are parsed upon reading
In practice, this does the job; however, I would like to know what the most efficient approach would be.
For example, should I split the single table into three tables ('brands', 'models' and 'cars') that reference each other? In that case, brands.models would be an array of unique model IDs used to read data from the 'models' table, and models.cars would be an array of unique car IDs used to read data from the 'cars' table.
Rather than storing it as json, jsonb, or arrays, the most efficient approach is to store it as relational data (data types omitted for brevity):
create table brands(
id,
name,
/* other columns */
PRIMARY KEY (id)
);
create table models(
id,
name,
brand_id REFERENCES brands(id),
/* other columns */
PRIMARY KEY (id)
);
create table cars(
id,
model_id REFERENCES models(id),
mileage,
license,
/* other columns */
PRIMARY KEY (id)
);
You can then fetch and update each entity individually, without having to parse json. Partial updates are also much easier when you only have to focus on a single row, rather than worrying about updating arrays or json. For querying, you would join on the keys. For example, to get the rental cars available for a brand:
SELECT b.id, b.name, m.id, m.name, c.id, c.mileage, c.license
FROM brands b
LEFT JOIN models m
ON m.brand_id = b.id
LEFT JOIN cars c
ON c.model_id = m.id
WHERE b.id = ?
Based on querying / filtering patterns, you would then also want to create indexes on commonly used columns...
CREATE INDEX idx_car_model ON cars(model_id);
CREATE INDEX idx_model_brand ON models(brand_id);
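A partial update is then a single-row statement. For example (a sketch using the schema above; the values are illustrative):
UPDATE cars
SET mileage = 41527
WHERE id = 1;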
The best way to store the nested data in your Postgres database is a json or jsonb field.
The benefits of using json or jsonb are:
significantly faster processing (for jsonb) and support for indexing (which can be a significant advantage),
simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000x).
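For example, a GIN index on a jsonb column supports containment queries. A minimal sketch, assuming a hypothetical rentals table (not the asker's schema):
CREATE TABLE rentals (
    id  serial PRIMARY KEY,
    doc jsonb
);
CREATE INDEX idx_rentals_doc ON rentals USING GIN (doc);
-- The @> (containment) operator can use the GIN index
SELECT id FROM rentals WHERE doc @> '{"name": "Audi"}';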
Related
I'm new to Postgres jsonb operations.
I'm storing some data in Postgres in a jsonb column, which holds flexible metadata as below.
I want to search the different unique metadata (key/value pairs).
id, type, metadata
1, player, {"name": "john", "height": 180, "team": "xyz"}
2, game, {"name": "afl", "members": 10, "team": "xyz"}
The results should be something like below: distinct, ordered ascending. I want this to be efficient, using some indexes.
key     | value
--------+--------
height  | 180
members | 10
name    | afl
name    | john
team    | xyz
My solution below hits the index for the search, but the sorting and the distinct won't use any indexes, as they work on values computed from the jsonb.
CREATE INDEX metadata_jsonb_each_text_idx ON table
USING GIN (jsonb_pretty(metadata) gin_trgm_ops);
select distinct t, t.*
from table u, jsonb_each_text(u.metadata) t
where jsonb_pretty(u.metadata) like '%key%'
order by t.key, t.value
Appreciate any thoughts on this issue.
Thanks!
col-1 holds dep_id (varchar), e.g.:
112
col-2 holds an array of structs:
[
{
"emp_id": 8291828,
"name": "bruce",
},
{
"emp_id": 8291823,
"name": "Rolli",
}
]
I have a use case where I need to flatten this and display the results. For example, when querying for dep_id 112, I need to display each emp_id in a separate row.
For the above data, the query result should look like:
id emp_id
112 8291828
112 8291823
What should be my query format to fetch data?
There are several parts to make this work. First, the JSON data will appear as a VARCHAR, so you need to run json_parse on it to convert it to a JSON type in the engine. Then you can cast JSON types to normal SQL structural types; in your case this is an array of rows (see cast from JSON). Finally, you do a cross join to the array of rows (which is effectively a nested table). This query will give you the results you want:
WITH your_table AS (
SELECT
112 AS dep_id
, '[{"emp_id": 8291828, "name": "bruce"}, {"emp_id": 8291823, "name": "Rolli"}]' AS data
)
SELECT
dep_id
, r.emp_id
, r.name
FROM your_table
CROSS JOIN
UNNEST(cast(json_parse(data) as array(row (emp_id bigint, name varchar)))) nested_data(r)
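For the sample row above, the result should look like this:
dep_id | emp_id  | name
-------+---------+------
   112 | 8291828 | bruce
   112 | 8291823 | Rolli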
Currently, I am using SQL Server 2016 with JSON and I want to join collections together. So far I have created two collections:
CREATE TABLE collect_person(person...)
CREATE TABLE collect_address(address...)
The JSON document in the first collection (collect_person) will look like this:
{
"id" : "P1",
"name" : "Sarah",
"addresses" : {
"addressId" : [
"ADD1",
"ADD2"
]
}
}
The JSON documents in the second collection (collect_address) will look like these:
{
"id" : "ADD1",
"city" : "San Jose",
"state" : "CA"
}
{
"id" : "ADD2",
"city" : "Las Vegas"
"state" : "NV"
}
I want to get the addresses of the person named "Sarah", so the output will be something like:
{
{"city" : "San Jose", "state" : "CA"},
{"city" : "Las Vegas", "state" : "NV"}
}
I do not want to convert JSON to SQL and back again. Is this possible to do in SQL Server 2016 with JSON, and if so, how? Thank you in advance.
I am a little late to the question, but it can be done via CROSS APPLY, and I also used common table expressions. Depending on the table size, I would suggest creating a persisted computed column on the id field of each table (assuming the data won't change and there is a single addressId per record), or adding some other key value that can be indexed and used to limit the records that need to be parsed as JSON. This is a simple example and it hasn't been tested for performance, so "YMMV".
Building Example Tables
DECLARE @collect_person AS TABLE
(Person NVARCHAR(MAX))
DECLARE @collect_address AS TABLE
([Address] NVARCHAR(MAX))
INSERT INTO @collect_person (Person)
SELECT N'{
"id" : "P1",
"name" : "Sarah",
"addresses" : {
"addressId" : [
"ADD1",
"ADD2"
]
}
}'
INSERT INTO @collect_address ([Address])
VALUES
(N'{
"id" : "ADD1",
"city" : "San Jose",
"state" : "CA"
}')
,('{
"id" : "ADD2",
"city" : "Las Vegas",
"state" : "NV"
}')
Querying the Tables
;WITH persons AS (
SELECT --JP.*
JP.id
,JP.name
,JPA.addressId -- Or remove the with clause for JPA and just use JPA.value as addressId
FROM @collect_person
CROSS APPLY OPENJSON([person])
WITH (
id varchar(50) '$.id'
,[name] varchar(50) '$.name'
,addresses nvarchar(max) AS JSON
) as JP
CROSS APPLY OPENJSON(JP.addresses, '$.addressId')
WITH (
addressId varchar(250) '$'
) AS JPA
)
,Addresses AS (
SELECT A.*
FROM @collect_address AS CA
CROSS APPLY OPENJSON([Address])
WITH (
id varchar(50) '$.id'
,city varchar(50) '$.city'
,state varchar(2) '$.state'
) as A
)
SELECT * FROM persons
INNER JOIN Addresses
ON persons.addressId = Addresses.id
Again, this is not the ideal way to do this, but it will work. As stated before, you should probably have an indexed key field on each table to limit the scans and JSON parsing done on the table.
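For example, on the real collect_person table you could add a persisted computed column over the JSON id and index it. A sketch only (the column and index names here are made up):
ALTER TABLE collect_person
    ADD person_id AS JSON_VALUE(person, '$.id') PERSISTED;
CREATE INDEX IX_collect_person_person_id ON collect_person (person_id);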
There is native compilation, but it is new to me and I am not familiar with the pros and cons.
Optimize JSON processing with in-memory OLTP
I'm very new to OrientDB. I'm trying to create a structure to insert and retrieve large amounts of data with nested fields, and I couldn't find a proper solution or guideline.
This is the structure of the table I want to create:
{
UID,
Name,
RecordID,
RecordData: [
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
}
....Too many records....
]
},
{
UID,
Name,
RecordID,
RecordData: [
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
},
{
RAddress,
ItemNo,
Description
}
....Too many records....
]
}
....Too many records....
Now, I want to retrieve the Description field from the table by querying ItemNo and RAddress in bulk.
For example, I have 50K (50,000) pairs of UID or RecordID and ItemNo or RAddress, and based on this data I want to retrieve the Description field. I want to do this in the fastest possible way, so can anyone please suggest a good query for this task?
I have 500M records, and most records contain 10-12 words each.
Can anyone suggest CRUD queries for it?
Thanks in advance.
You might want to create a single record using CONTENT, like this:
INSERT INTO Test CONTENT {"UID": 0,"Name": "Test","RecordID": 0,"RecordData": {"RAddress": ["RAddress1", "RAddress2", "RAddress3"],"ItemNo": [1, 2, 3],"Description": ["Description1", "Description2", "Description3"]}}
That'll get you started with embedded values and JSON. However, if you want to do a bulk insert, you should write a function; there are many ways to do so, but if you want to stay in Studio, go to the Functions tab.
As for the retrieving part:
SELECT RecordData[Description] FROM Test WHERE (RecordData[ItemNo] CONTAINSTEXT "1") AND (RecordData[RAddress] CONTAINSTEXT "RAddress1")
"question_id": 58640
"tags": ["polls", "fun", "quotes"]
"title": "Great programming quotes"
"question_id": 184618
"tags": ["polls", "fun", "comment"]
"title": "What is the best comment in source code you have ever encountered?"
"question_id": 3734102
"tags": ["c++", "linux", "exit-code"]
"title": "Why cant' I return bigger values from main function ?"
"question_id": 2349378
"tags": ["communication", "terminology", "vocabulary"]
"title": "New programming jargon you coined?"
"question_id": 3723817
"tags": ["open-source", "project-management", "failure", "fail"]
"title": "How to make an open source project fail"
"question_id": 3699150
"tags": ["testing", "interview-questions", "job-interview"]
"title": "Interview question please help"
This is just a text extract of some questions that I got using the SO API.
To make this query-able, I want to use SQLite to store the data.
How should I store the tags column?
Since the limit here on SO is five tags, I could use five columns tag1, tag2, ..., but I think something more elegant can be done: something that scales to any number of tags and can also handle basic queries like
select title from table where tag has "c++" and "boost" but not "c"
This is a many-to-many relationship: questions have multiple tags, and tags can appear in multiple questions. This means you have to create three tables: one for the questions, one for the tags, and one for the links between them.
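A minimal schema sketch (the table and column names are assumed to match the query below):
CREATE TABLE question (
    question_id INTEGER PRIMARY KEY,
    title       TEXT
);
CREATE TABLE tag (
    tag_id   INTEGER PRIMARY KEY,
    tag_name TEXT UNIQUE
);
CREATE TABLE question_tag_link (
    question_id INTEGER REFERENCES question(question_id),
    tag_id      INTEGER REFERENCES tag(tag_id),
    PRIMARY KEY (question_id, tag_id)
);
The resulting query would look like this: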
SELECT title FROM question
INNER JOIN question_tag_link USING (question_id)
INNER JOIN tag USING (tag_id)
WHERE tag_name IN('c++', 'boost')
AND NOT EXISTS(
SELECT * FROM question_tag_link l
INNER JOIN tag t1 USING (tag_id)
WHERE t1.tag_name = 'c'
AND l.question_id = question.question_id);
Not so simple, but I think it is the price to pay if you don't want to be limited. If there are fewer than 64 different tags, you could use the SET field type, but you would lose a lot of flexibility (it would be hard to add a new tag).
select distinct a.QuestionTitle
from
(
select q.QuestionID, QuestionTitle, TagName
from QuestionTags as x
join Question as q on q.QuestionID = x.QuestionID
join Tag as t on t.TagID = x.TagID
where TagName in ('c++', 'boost')
) as a
left join
(
select q.QuestionID, QuestionTitle, TagName
from QuestionTags as x
join Question as q on q.QuestionID = x.QuestionID
join Tag as t on t.TagID = x.TagID
where TagName = 'c'
) as b on b.QuestionID = a.QuestionID
where b.QuestionTitle is null
order by a.QuestionTitle ;