Join Two Collections in SQL Server 2016 with JSON Data - sql-server

Currently, I am using SQL Server 2016 with JSON and I want to join collections together. So far I created two collections:
CREATE TABLE collect_person(person...)
CREATE TABLE collect_address(address...)
The JSON document in the first collection (collect_person) looks like this:
{
"id" : "P1",
"name" : "Sarah",
"addresses" : {
"addressId" : [
"ADD1",
"ADD2"
]
}
}
The JSON documents in the second collection (collect_address) look like this:
{
"id" : "ADD1",
"city" : "San Jose",
"state" : "CA"
}
{
"id" : "ADD2",
"city" : "Las Vegas",
"state" : "NV"
}
I want to get the addresses of the person named "Sarah", so the output will be something like:
{
{"city" : "San Jose", "state" : "CA"},
{"city" : "Las Vegas", "state" : "NV"}
}
I do not want to convert JSON to SQL and SQL to JSON. Is this possible in SQL Server 2016 with JSON, and if so, how? Thank you in advance.

I am a little late to the question, but it can be done with CROSS APPLY and OPENJSON; I also used common table expressions. Depending on the table size, I would suggest creating a persisted computed column on the id fields of each table (assuming the data won't change and there is a single addressId per record), or adding some other key value that can be indexed and used to limit the records whose JSON has to be parsed. This is a simple example and it hasn't been tested for performance, so YMMV.
Building Example Tables
DECLARE @collect_person AS TABLE
(Person NVARCHAR(MAX))
DECLARE @collect_address AS TABLE
([Address] NVARCHAR(MAX))
INSERT INTO @collect_person (Person)
SELECT N'{
"id" : "P1",
"name" : "Sarah",
"addresses" : {
"addressId" : [
"ADD1",
"ADD2"
]
}
}'
INSERT INTO @collect_address ([Address])
VALUES
(N'{
"id" : "ADD1",
"city" : "San Jose",
"state" : "CA"
}')
,(N'{
"id" : "ADD2",
"city" : "Las Vegas",
"state" : "NV"
}')
Querying the Tables
;WITH persons AS (
SELECT --JP.*
JP.id
,JP.name
,JPA.addressId -- Or remove the WITH clause for JPA and just use JPA.value as addressId
FROM @collect_person
CROSS APPLY OPENJSON([person])
WITH (
id varchar(50) '$.id'
,[name] varchar(50) '$.name'
,addresses nvarchar(max) AS JSON
) as JP
CROSS APPLY OPENJSON(JP.addresses, '$.addressId')
WITH (
addressId varchar(250) '$'
) AS JPA
)
,Addresses AS (
SELECT A.*
FROM @collect_address AS CA
CROSS APPLY OPENJSON([Address])
WITH (
id varchar(50) '$.id'
,city varchar(50) '$.city'
,state varchar(2) '$.state'
) as A
)
SELECT * FROM persons
INNER JOIN Addresses
ON persons.addressId = Addresses.id
Again, this is not the ideal way to do this, but it will work. As stated before, you should probably have an indexed key field on each table to limit the scans and JSON parsing done on the table.
There is also native compilation, but it is new to me and I am not familiar with the pros and cons.
Optimize JSON processing with in-memory OLTP
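For readers who want to see what the join above does step by step, here is a minimal sketch of the same logic in Python rather than T-SQL. It is only an illustration under assumptions: the function name addresses_for and the in-memory lists standing in for the two tables are hypothetical, and the filter on the person's name is added here because the question asks for Sarah's addresses (the query above returns every person).

```python
import json

# Sample documents copied from the question; each string stands in for one table row.
person_rows = [
    '{"id": "P1", "name": "Sarah", "addresses": {"addressId": ["ADD1", "ADD2"]}}',
]
address_rows = [
    '{"id": "ADD1", "city": "San Jose", "state": "CA"}',
    '{"id": "ADD2", "city": "Las Vegas", "state": "NV"}',
]


def addresses_for(name, person_rows, address_rows):
    """Return city/state pairs for every address referenced by persons with this name."""
    # Counterpart of the Addresses CTE: parse each document once and key it by id.
    by_id = {}
    for raw in address_rows:
        doc = json.loads(raw)
        by_id[doc["id"]] = doc

    result = []
    for raw in person_rows:
        person = json.loads(raw)
        if person.get("name") != name:
            continue
        # Counterpart of the second CROSS APPLY: one iteration per array element.
        for address_id in person.get("addresses", {}).get("addressId", []):
            match = by_id.get(address_id)
            if match is not None:  # INNER JOIN semantics: unmatched ids are dropped
                result.append({"city": match.get("city"), "state": match.get("state")})
    return result


print(json.dumps(addresses_for("Sarah", person_rows, address_rows)))
```

Keying the addresses by id plays the role that an indexed key column would play in the database: each lookup avoids re-parsing every address document.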

Related

Create Postgres JSONB Index on Array Sub-Object for ILIKE operator

I have a table with a jsonb column named cast, which looks like this:
cast: [
{ name: 'Clark Gable', role: 'Rhett Butler' },
{ name: 'Vivien Leigh', role: 'Scarlett' },
]
I'm trying to query name in the jsonb array of objects. This is my query:
SELECT DISTINCT actor as name
FROM "Title",
jsonb_array_elements_text(jsonb_path_query_array("Title".cast,'$[*].name')) actor
WHERE actor ILIKE 'cla%';
Is there a way to index a query like this? I've tried using BTREE, GIN, GIN with gin_trgm_ops with no success.
My attempts:
CREATE INDEX "Title_cast_idx_jsonb_path" ON "Title" USING GIN ("cast" jsonb_path_ops);
CREATE INDEX "Title_cast_idx_on_expression" ON "Title" USING GIN(jsonb_array_elements_text(jsonb_path_query_array("Title".cast, '$[*].name')) gin_trgm_ops);
One of the issues is that jsonb_array_elements_text(jsonb_path_query_array()) returns a set, which can't be indexed. Using array_agg doesn't seem useful, since I need to extract the name value, not just check for existence.

How can I split rows into one row for each element in a JSON array stored in a column?

Okay, so, this is hard to explain but I'll give it a shot. Google didn't help me so please update this question if it can help someone else.
Background
I have a table, Persons, with some columns like [ID], [Name] and [PhoneNumbers]. The table is being filled with data from a third party system, so I cannot alter the way we insert data.
The column [PhoneNumbers] contains a JSON array of numbers, like this:
{"phonenumbers":[]}
I am now trying to write a view against that table, with the goal of having one row for each number.
Question
Can I achieve this using T-SQL and its JSON support? I am using SQL Server 2016.
declare @j nvarchar(max) = N'{"phonenumbers":["1a", "2b", "3c", "4", "5", "6", "7", "8", "9", "10x"]}';
select value, *
from openjson(@j, '$.phonenumbers');
declare @t table
(
id int identity,
phonenumbers nvarchar(max)
);
insert into @t(phonenumbers)
values(N'{"phonenumbers":["1a", "2b", "3c", "4d"]}'), (N'{"phonenumbers":["22", "23", "24", "25"]}'), (N'{"phonenumbers":[]}'), (NULL);
select id, j.value, j.[key]+1 as phone_no_ordinal, t.*
from @t as t
outer apply openjson(t.phonenumbers, '$.phonenumbers') as j;
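The row-splitting behaviour of the last query can be sketched in Python as follows. This is an illustration only, not T-SQL: the function name split_phone_numbers and the list of (id, json) tuples standing in for the table are assumptions. It mirrors the point that OUTER APPLY keeps rows whose array is empty or whose column is NULL, filling the applied columns with NULL (None here).

```python
import json

# (id, phonenumbers) pairs mirroring the rows inserted into the table variable above.
rows = [
    (1, '{"phonenumbers":["1a", "2b", "3c", "4d"]}'),
    (2, '{"phonenumbers":["22", "23", "24", "25"]}'),
    (3, '{"phonenumbers":[]}'),
    (4, None),
]


def split_phone_numbers(rows):
    """Return (id, phone_number, ordinal) tuples, one per array element."""
    out = []
    for row_id, raw in rows:
        numbers = json.loads(raw).get("phonenumbers", []) if raw else []
        if not numbers:
            # OUTER APPLY semantics: keep the source row, with NULLs for the applied columns.
            out.append((row_id, None, None))
            continue
        # The array index is zero-based, so the ordinal starts at 1 (like [key]+1 in the query).
        for ordinal, number in enumerate(numbers, start=1):
            out.append((row_id, number, ordinal))
    return out


for row in split_phone_numbers(rows):
    print(row)
```

Dropping the branch that appends the None row would give the behaviour of CROSS APPLY instead, where rows without phone numbers disappear from the result.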

Most optimal way to store nested information in a database

I want to store some nested information in a Postgres database and I am wondering what is the most optimal way to do so.
I have a list of cars for rent, structured like this:
[Brand] > [Model] > [Individual cars for rent of that brand and model], ex.:
[
{
"id": 1,
"name": "Audi",
"models": [
{
"id": 1,
"name": "A1",
"cars": [
{
"id": 1,
"license": "RPY9973",
"mileage": "41053"
},
{
"id": 2,
"license": "RPY3001",
"mileage": "102302"
},
{
"id": 3,
"license": "RPY9852",
"mileage": "10236"
}
]
},
{
"id": 2,
"name": "A3",
"cars": [
{
"id": 1,
"license": "RPY1013",
"mileage": "66952"
},
{
"id": 2,
"license": "RPY3284",
"mileage": "215213"
},
{
"id": 3,
"license": "RPY0126",
"mileage": "19632"
}
]
}
...
]
}
...
]
Currently, having limited experience with databases and storing arrays, I am storing it in a 'brands' table with the following columns:
id (integer) - brand ID
name (text) - brand name
models (text) - contains stringified content of models and cars within them, which are parsed upon reading
In practice, this does the job, however I would like to know what the most efficient way would be.
For example, should I split the single table into three tables: 'brands', 'models' and 'cars' and have the tables reference each other (brands.models would be an array of unique model IDs, which I could use to read data from the 'models' table, and models.cars would be an array of unique car IDs, which I could use to read data from the 'cars' table)?
Rather than store it as json, jsonb, or as arrays, the most efficient way to store the data would be to store it as relational data (excluding the data types for brevity):
create table brands(
id,
name,
/* other columns */
PRIMARY KEY (id)
);
create table models(
id,
name,
brand_id REFERENCES brands(id),
/* other columns */
PRIMARY KEY (id)
);
create table cars(
id,
model_id REFERENCES models(id),
mileage,
license,
/* other columns */
PRIMARY KEY (id)
);
You can then fetch and update each entity individually, without having to parse JSON. Partial updates are also much easier when you only have to touch a single row, rather than rewriting arrays or JSON documents. For querying, you join on the primary and foreign keys. For example, to get the rental cars available for a brand:
select b.id, b.name, m.id, m.name, c.id, c.mileage, c.license
FROM brands b
LEFT JOIN models m
ON m.brand_id = b.id
LEFT JOIN cars c
ON c.model_id = m.id
where b.id = ?
Based on querying / filtering patterns, you would then also want to create indexes on commonly used columns...
CREATE INDEX idx_car_model ON cars(model_id);
CREATE INDEX idx_model_brand ON models(brand_id);
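To try the relational layout end to end, here is a small runnable sketch using Python's built-in sqlite3 module. It is a stand-in under assumptions, not the Postgres setup itself: the column types (INTEGER, TEXT) are guesses because the answer omits them, mileage is stored as a number rather than the string used in the sample JSON, car ids are made unique across models so they can serve as a primary key, and the helper cars_for_brand is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE models (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        brand_id INTEGER NOT NULL REFERENCES brands(id)
    );
    CREATE TABLE cars (
        id INTEGER PRIMARY KEY,
        model_id INTEGER NOT NULL REFERENCES models(id),
        mileage INTEGER,
        license TEXT
    );
    CREATE INDEX idx_model_brand ON models(brand_id);
    CREATE INDEX idx_car_model ON cars(model_id);
""")

# A subset of the sample data from the question.
conn.execute("INSERT INTO brands (id, name) VALUES (1, 'Audi')")
conn.executemany(
    "INSERT INTO models (id, name, brand_id) VALUES (?, ?, ?)",
    [(1, "A1", 1), (2, "A3", 1)],
)
conn.executemany(
    "INSERT INTO cars (id, model_id, mileage, license) VALUES (?, ?, ?, ?)",
    [(1, 1, 41053, "RPY9973"), (2, 1, 102302, "RPY3001"), (3, 2, 66952, "RPY1013")],
)


def cars_for_brand(conn, brand_id):
    """Return (brand, model, license, mileage) rows for one brand."""
    return conn.execute(
        """
        SELECT b.name, m.name, c.license, c.mileage
        FROM brands b
        LEFT JOIN models m ON m.brand_id = b.id
        LEFT JOIN cars c ON c.model_id = m.id
        WHERE b.id = ?
        ORDER BY m.id, c.id
        """,
        (brand_id,),
    ).fetchall()


# A partial update touches exactly one row; nothing has to be parsed or rewritten.
conn.execute("UPDATE cars SET mileage = ? WHERE id = ?", (41100, 1))

for row in cars_for_brand(conn, 1):
    print(row)
```

The single-row UPDATE is the main practical difference from the stringified models column, where changing one car's mileage means reading, parsing, editing, and rewriting the whole document.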
The best solution for storing nested data in your Postgres database is a json or jsonb field.
The benefits of using json or jsonb are:
significantly faster to process, supports indexing (which can be a significant advantage),
simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000x)

Array structures querying in presto, hive

col-1 has dep_id(varchar) -
112
col-2 has array struct
[
{
"emp_id": 8291828,
"name": "bruce"
},
{
"emp_id": 8291823,
"name": "Rolli"
}
]
I have a use case where I need to flatten and display the results. For example, when querying data for dep_id 112, I need to display each emp_id in a separate row.
For above data when queried my result should look like
id emp_id
112 8291828
112 8291823
What should be my query format to fetch data?
There are several parts to make this work. First, the JSON data will appear as a VARCHAR, so you need to run json_parse on it to convert it to a JSON type in the engine. Then you can cast JSON types to normal SQL structural types, and in your case this is an array of rows (see cast from JSON). Finally, you do a cross join to the array of rows (which is effectively a nested table). This query will give you the results you want:
WITH your_table AS (
SELECT
112 AS dep_id
, '[{"emp_id": 8291828, "name": "bruce"}, {"emp_id": 8291823, "name": "Rolli"}]' AS data
)
SELECT
dep_id
, r.emp_id
, r.name
FROM your_table
CROSS JOIN
UNNEST(cast(json_parse(data) as array(row (emp_id bigint, name varchar)))) nested_data(r)
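The three steps described above (parse, cast to typed rows, unnest) can be mirrored in a few lines of Python to make the data flow explicit. This is only an analogy, not Presto code: the function name flatten is hypothetical, and int/str stand in for the bigint/varchar casts.

```python
import json

data = '[{"emp_id": 8291828, "name": "bruce"}, {"emp_id": 8291823, "name": "Rolli"}]'


def flatten(dep_id, data):
    """Return one (dep_id, emp_id, name) tuple per element of the JSON array."""
    # Step 1, like json_parse: turn the VARCHAR into a structured value.
    employees = json.loads(data)
    # Step 2, like CAST(... AS array(row(emp_id bigint, name varchar))): give each element typed fields.
    typed = [(int(e["emp_id"]), str(e["name"])) for e in employees]
    # Step 3, like CROSS JOIN UNNEST: repeat the parent column once per array element.
    return [(dep_id, emp_id, name) for emp_id, name in typed]


for row in flatten(112, data):
    print(row)
```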

json sql with list items

I have JSON stored in SQL Server 2016. When I run a query, some parts of the JSON return a value and other parts don't.
Here is the JSON stored in table dbo.Trades.
The table is defined as follows:
CREATE TABLE [dbo].[Trades](
[_id] [int] IDENTITY(1,1) NOT NULL,
[trade] [nvarchar](max) NULL,
My JSON is formatted as below and stored in the trade column:
{
"direction": "Short",
"Powerbars": 2,
"PowerbarsTime": [
"2016-11-10T20:25:32.481424-05:00",
"2016-11-10T20:44:01.8993031-05:00"
]
}
Since PowerbarsTime is a list in the JSON, I have to provide the index to print it. However, I don't know how many entries there will be in the list. Is there a way to merge the list, which is part of the JSON, and return it as a comma-delimited column?
I have to do this to print
SELECT _id,
JSON_VALUE(trade, '$.Powerbars') AS powerbars,
JSON_VALUE(trade, '$.PowerbarsTime[0]') AS PowerbarsTime1,
JSON_VALUE(trade, '$.PowerbarsTime[1]') AS PowerbarsTime2
FROM dbo.Trades
order by _id desc
I get only the 2 items on the list that I hardcode into the SQL.
How can I merge all the list items under JSON_VALUE(trade, '$.PowerbarsTime') so I can see them all as one comma-delimited column?

Resources