Postgres search jsonb with indexes - database

I'm new to Postgres jsonb operations.
I'm storing data in a Postgres jsonb column that holds flexible metadata, as below.
I want to search the distinct metadata key:value pairs.
id, type, metadata
1, player, {"name": "john", "height": 180, "team": "xyz"}
2, game, {"name": "afl", "members": 10, "team": "xyz"}
The results should look like the table below: distinct pairs, ordered ascending. I want this to be efficient, using indexes.
key | value
______________
height 180
members 10
name afl
name john
team xyz
My solution below hits the index for the search, but the sorting and DISTINCT won't use any indexes, since they operate on values computed from the jsonb.
CREATE INDEX metadata_jsonb_each_text_idx ON mytable
USING GIN (jsonb_pretty(metadata) gin_trgm_ops);

select distinct t.key, t.value
from mytable u, jsonb_each_text(u.metadata) t
where jsonb_pretty(u.metadata) like '%key%'
order by t.key, t.value;
Appreciate any thoughts on this issue.
Thanks!
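One direction worth considering (my own sketch, not from the thread; the names metadata_kv and mytable are hypothetical) is to materialize the distinct key/value pairs into a side table kept in sync on writes, so DISTINCT and ORDER BY can be served by an ordinary btree index:

```sql
-- Hypothetical side table of exploded metadata pairs.
CREATE TABLE metadata_kv (
  key   text NOT NULL,
  value text NOT NULL
);

-- A btree index serves DISTINCT, ORDER BY and prefix searches directly.
CREATE UNIQUE INDEX metadata_kv_key_value_idx ON metadata_kv (key, value);

-- Populate (re-run on writes, or maintain via trigger):
INSERT INTO metadata_kv (key, value)
SELECT DISTINCT t.key, t.value
FROM mytable u, jsonb_each_text(u.metadata) t
ON CONFLICT DO NOTHING;

-- The search then becomes a plain indexed scan:
SELECT key, value FROM metadata_kv
WHERE key LIKE 'key%'
ORDER BY key, value;
```

The trade-off is write amplification: every insert/update on the base table has to refresh the side table, but reads no longer touch the jsonb at all.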

Related

Create Postgres JSONB Index on Array Sub-Object for ILIKE operator

I have a table that has a cast jsonb column, which looks like this:
cast: [
{ name: 'Clark Gable', role: 'Rhett Butler' },
{ name: 'Vivien Leigh', role: 'Scarlett' },
]
I'm trying to query name in the jsonb array of objects. This is my query:
SELECT DISTINCT actor as name
FROM "Title",
jsonb_array_elements_text(jsonb_path_query_array("Title".cast,'$[*].name')) actor
WHERE actor ILIKE 'cla%';
Is there a way to index a query like this? I've tried using BTREE, GIN, GIN with gin_trgm_ops with no success.
My attempts:
CREATE INDEX "Title_cast_idx_jsonb_path" ON "Title" USING GIN ("cast" jsonb_path_ops);
CREATE INDEX "Title_cast_idx_on_expression" ON "Title" USING GIN(jsonb_array_elements_text(jsonb_path_query_array("Title".cast, '$[*].name')) gin_trgm_ops);
One of the issues is that jsonb_array_elements_text(jsonb_path_query_array()) returns a set, which can't be indexed. Using array_agg doesn't seem useful either, since I need to extract the name values, not just check for existence.
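One workaround (my own sketch, not from the thread; the function name title_cast_names is hypothetical) is to collapse the names into a single text value through an IMMUTABLE helper function, which can then be indexed with trigrams:

```sql
-- Index expressions must use IMMUTABLE functions, hence the wrapper
-- that flattens all "name" values into one text value.
CREATE OR REPLACE FUNCTION title_cast_names(j jsonb)
RETURNS text
LANGUAGE sql IMMUTABLE AS $$
  SELECT string_agg(elem->>'name', ' ')
  FROM jsonb_array_elements(j) AS elem
$$;

CREATE INDEX "Title_cast_names_trgm_idx" ON "Title"
USING GIN (title_cast_names("cast") gin_trgm_ops);

-- Query through the same expression so the planner can match the index:
SELECT * FROM "Title"
WHERE title_cast_names("cast") ILIKE '%cla%';
```

Caveat: the match happens against the concatenated string, so a hit doesn't tell you which array element matched; if that matters, recheck candidate rows with the original per-element predicate.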

BigQuery ARRAY_TO_STRING based on condition in non-array field

I have a table that I query like this...
select *
from table
where productId = 'abc123'
Which returns 2 rows (even though the productId is unique) because one of the columns (orderName) is an Array...
**productId, productName, created, featureCount, orderName**
abc123, someProductName, 2020-01-01, 12, someOrderName
, , , , someOtherOrderName
I'm not sure whether the missing values in the 2nd row are empty strings or nulls, because of the way the orderName array expands my search results. I now want to run a query like this...
select productName, ARRAY_TO_STRING(orderName,'-')
from table
where productId = 'abc123'
and ifnull(featureCount,0) > 0
But this query returns...
someProductName, someOrderName-someOtherOrderName
i.e. both array values came back even though I specified a condition of featureCount>0.
I'm sure I'm missing something very basic about how Arrays function in BigQuery but from Google's ARRAY_TO_STRING documentation I don't see any way to add a condition to the extracting of ARRAY values. Appreciate any thoughts on the best way to go about this.
From what I understand, this is because you are querying a single row of data that has a column of type ARRAY<STRING>. Since ARRAY_TO_STRING only accepts ARRAY<STRING> values, all the array values are joined into a single cell.
So when you run your script, the output fits your criteria and returns the columns, with the array rendered as additional rows for visibility.
The visualization in the UI should look like the one you mention in your question:
Row | productId | productName     | created    | featureCount | orderName
1   | abc123    | someProductName | 2020-01-01 | 12           | someOrderName
    |           |                 |            |              | someOtherOrderName
Note: In BigQuery this additional row is grayed out; it is part of row 1 but shows as an additional row for visibility. So this output has only 1 row in the table.
And the visualization on a JSON will be:
[
{
"productId": "abc123",
"productName": "someProductName",
"created": "2020-01-01",
"featureCount": "12",
"orderName": [
"someOrderName",
"someOtherOrderName"
]
}
]
I don't think there is specific documentation about how arrays are visualized in the UI, but I can share the docs that explain how to flatten row outputs into a single line:
Working with Arrays
Flattening Arrays
I use the following to replicate your issue:
CREATE OR REPLACE TABLE `project-id.dataset.working_table` (
productId STRING,
productName STRING,
created STRING,
featureCount STRING,
orderName ARRAY<STRING>
);
insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName)
values ('abc123','someProductName','2020-01-01','12',['someOrderName','someOtherOrderName']);
insert into `project-id.dataset.working_table` (productId,productName,created,featureCount,orderName)
values ('abc123X','someProductNameX','2020-01-02','15',['someOrderName','someOtherOrderName','someData']);
output:
Row | productId | productName      | created    | featureCount | orderName
1   | abc123    | someProductName  | 2020-01-01 | 12           | someOrderName
    |           |                  |            |              | someOtherOrderName
2   | abc123X   | someProductNameX | 2020-01-02 | 15           | someOrderName
    |           |                  |            |              | someOtherOrderName
    |           |                  |            |              | someData
Note: Table contains 2 rows.
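To tie this back to the original question: a row-level WHERE keeps or drops the whole row, array included; to filter individual array elements you rebuild the array with a subquery over UNNEST. A hedged sketch against the sample table above (featureCount is a STRING in that DDL, hence the cast; the element-level predicate is just an illustration):

```sql
-- Row-level filter: the whole array comes back when the row qualifies.
SELECT productName, ARRAY_TO_STRING(orderName, '-') AS orders
FROM `project-id.dataset.working_table`
WHERE productId = 'abc123'
  AND IFNULL(SAFE_CAST(featureCount AS INT64), 0) > 0;

-- Element-level filter: rebuild the array before joining it into a string.
SELECT productName,
       ARRAY_TO_STRING(
         ARRAY(SELECT o FROM UNNEST(orderName) AS o
               WHERE o != 'someOtherOrderName'),
         '-') AS orders
FROM `project-id.dataset.working_table`
WHERE productId = 'abc123';
```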

Postgres index creation for jsonb column having JSON Array value

I have an employee table in Postgres with a jsonb column "mobile" that stores a JSON array value:
e_id (integer) | name (char) | mobile (jsonb)
1 | John | [{"mobile": "1234567891", "status": "verified"}, {"mobile": "1265439872", "status": "verified"}]
2 | Ben  | [{"mobile": "6453637238", "status": "verified"}, {"mobile": "4437494900", "status": "verified"}]
I have a search API that queries this table to look up an employee by mobile number.
How can I query the mobile numbers directly?
How should I index the jsonb column to make the query faster?
You can query with the jsonb containment operator @> (note that containment matches the whole value, so use the full number):
SELECT e_id, name
FROM employees
WHERE mobile @> '[{"mobile": "1234567891"}]';
The following index would help:
CREATE INDEX ON employees USING gin (mobile);

Most optimal way to store nested information in a database

I want to store some nested information in a Postgres database and I am wondering what is the most optimal way to do so.
I have a list of cars for rent, structured like this:
[Brand] > [Model] > [Individual cars for rent of that brand and model], ex.:
[
{
"id": 1,
"name": "Audi",
"models": [
{
"id": 1,
"name": "A1",
"cars": [
{
"id": 1,
"license": "RPY9973",
"mileage": "41053"
},
{
"id": 2,
"license": "RPY3001",
"mileage": "102302"
},
{
"id": 3,
"license": "RPY9852",
"mileage": "10236"
}
]
},
{
"id": 2,
"name": "A3",
"cars": [
{
"id": 1,
"license": "RPY1013",
"mileage": "66952"
},
{
"id": 2,
"license": "RPY3284",
"mileage": "215213"
},
{
"id": 3,
"license": "RPY0126",
"mileage": "19632"
}
]
}
...
]
}
...
]
Currently, having limited experience with databases and storing arrays, I am storing it in a 'brands' table with the following columns:
id (integer) - brand ID
name (text) - brand name
models (text) - contains stringified content of models and cars within them, which are parsed upon reading
In practice this does the job; however, I would like to know what the most efficient way would be.
For example, should I split the single table into three tables: 'brands', 'models' and 'cars' and have the tables reference each other (brands.models would be an array of unique model IDs, which I could use to read data from the 'models' table, and models.cars would be an array of unique car IDs, which I could use to read data from the 'cars' table)?
Rather than store it as json, jsonb, or as arrays, the most efficient way to store the data would be to store it as relational data (excluding the data types for brevity):
create table brands(
id,
name,
/* other columns */
PRIMARY KEY (id)
);
create table models(
id,
name,
brand_id REFERENCES brands(id),
/* other columns */
PRIMARY KEY (id)
);
create table cars(
id,
model_id REFERENCES models(id),
mileage,
license,
/* other columns */
PRIMARY KEY (id)
);
You can then fetch and update each entity individually, without having to parse json. Partial updates are also much easier when you only have to focus on a single row, rather than worrying about updating arrays or json. For querying, you would join on the primary and foreign keys. For example, to get the rental cars available for a brand:
select b.id, b.name, m.id, m.name, c.id, c.mileage, c.license
FROM brands b
LEFT JOIN models m
ON m.brand_id = b.id
LEFT JOIN cars c
ON c.model_id = m.id
where b.id = ?
Based on querying / filtering patterns, you would then also want to create indexes on commonly used columns...
CREATE INDEX idx_car_model ON cars(model_id);
CREATE INDEX idx_model_brand ON models(brand_id);
The best solution for storing nested data in your Postgres database is a json or jsonb field.
The benefits of using json or jsonb are:
significantly faster processing, with support for indexing (which can be a significant advantage);
simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000x).
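As a sketch of that alternative (the table and column names here are my own assumptions, not from the answer), a jsonb column with a GIN index supports containment queries directly:

```sql
CREATE TABLE brands_jsonb (
  id     integer PRIMARY KEY,
  name   text,
  models jsonb  -- nested models and cars, as in the example above
);

-- A GIN index accelerates the @> containment operator.
CREATE INDEX brands_models_gin_idx ON brands_jsonb USING GIN (models);

-- e.g. find brands that have a model named 'A1':
SELECT id, name
FROM brands_jsonb
WHERE models @> '[{"name": "A1"}]';
```

The trade-off against the relational design is that partial updates (e.g. bumping one car's mileage) require rewriting the whole jsonb document.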

Array structures querying in presto, hive

col-1 has dep_id (varchar):
112
col-2 has an array of structs:
[
  {
    "emp_id": 8291828,
    "name": "bruce"
  },
  {
    "emp_id": 8291823,
    "name": "Rolli"
  }
]
I have a use case where I need to flatten and display the results. For example, when data is queried for dep_id 112, I need to display each emp_id in a separate row.
For above data when queried my result should look like
id emp_id
112 8291828
112 8291823
What should be my query format to fetch data?
There are several parts to making this work. First, the JSON data will appear as a VARCHAR, so you need to run json_parse on it to convert it to a JSON type in the engine. Then you can cast JSON types to normal SQL structural types; in your case this is an array of rows (see cast from JSON). Finally, you do a cross join to the array of rows (which is effectively a nested table). This query will give you the results you want:
WITH your_table AS (
SELECT
112 AS dep_id
, '[{"emp_id": 8291828, "name": "bruce"}, {"emp_id": 8291823, "name": "Rolli"}]' AS data
)
SELECT
dep_id
, r.emp_id
, r.name
FROM your_table
CROSS JOIN
UNNEST(cast(json_parse(data) as array(row (emp_id bigint, name varchar)))) nested_data(r)
