Generalized way to extract JSON from a relational database?

Ok, maybe this is too broad for StackOverflow, but is there a good, generalized way to assemble data in relational tables into hierarchical JSON?
For example, let's say we have a "customers" table and an "orders" table. I want the output to look like this:
{
"customers": [
{
"customerId": 123,
"name": "Bob",
"orders": [
{
"orderId": 456,
"product": "chair",
"price": 100
},
{
"orderId": 789,
"product": "desk",
"price": 200
}
]
},
{
"customerId": 999,
"name": "Fred",
"orders": []
}
]
}
I'd rather not have to write a lot of procedural code to loop through the main table and fetch orders a few at a time and attach them. It'll be painfully slow.
The database I'm using is MS SQL Server, but I'll need to do the same thing with MySQL soon. I'm using Java and JDBC for access. If either of these databases had some magic way of assembling these records server-side it would be ideal.
How do people migrate from relational databases to JSON databases like MongoDB?

Here is a useful set of functions for converting relational data to JSON and XML and from JSON back to tables: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/

SQL Server 2016 is finally catching up and adding support for JSON.
The JSON support still does not match other products such as PostgreSQL, e.g. no JSON-specific data type is included. However, several useful T-SQL language elements were added that make working with JSON a breeze.
E.g. in the following Transact-SQL code a text variable containing a JSON string is defined:
DECLARE @json NVARCHAR(4000)
SET @json =
N'{
"info":{
"type":1,
"address":{
"town":"Bristol",
"county":"Avon",
"country":"England"
},
"tags":["Sport", "Water polo"]
},
"type":"Basic"
}'
You can then extract values and objects from the JSON text using the JSON_VALUE and JSON_QUERY functions:
SELECT
JSON_VALUE(@json, '$.type') as type,
JSON_VALUE(@json, '$.info.address.town') as town,
JSON_QUERY(@json, '$.info.tags') as tags
Furthermore, the OPENJSON function returns the elements of a referenced JSON array as rows:
SELECT value
FROM OPENJSON(@json, '$.info.tags')
Last but not least, there is a FOR JSON clause that can format a SQL result set as JSON text:
SELECT object_id, name
FROM sys.tables
FOR JSON PATH
Some references:
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
https://learn.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server
https://blogs.technet.microsoft.com/dataplatforminsider/2016/01/05/json-in-sql-server-2016-part-1-of-4/
https://www.red-gate.com/simple-talk/sql/learn-sql-server/json-support-in-sql-server-2016/

I think one 'generalized' solution would be as follows:
Create a SELECT query that joins all the required tables and fetches the results as a two-dimensional array (a CSV file, a temporary table, etc.).
If each row of this join is unique, and the columns map one-to-one to the MongoDB schema, then it is just a matter of importing this CSV/table with the mongoimport command and the required parameters.
But a case like the one above, where a given customer ID can have an array of 'orders', needs some computation before mongoimport.
You will have to write a program that can 'vertically merge' the orders for a given customer ID. For a small data set, a simple Java program will work. For larger sets, parallel processing with Spark can do the job.
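The 'vertical merge' step can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the column order (customerId, name, orderId, product, price) and the LEFT JOIN query in the comment are assumptions based on the example in the question, and Python is used purely for brevity. The same single-pass loop should translate directly to Java over a JDBC ResultSet.

```python
import json


def nest_orders(rows):
    """Group flat (customerId, name, orderId, product, price) rows by customer."""
    customers = {}  # dicts keep insertion order, so customers stay in row order
    for customer_id, name, order_id, product, price in rows:
        customer = customers.setdefault(
            customer_id, {"customerId": customer_id, "name": name, "orders": []}
        )
        # A LEFT JOIN yields NULL order columns for customers without orders;
        # skipping them leaves "orders" as an empty list.
        if order_id is not None:
            customer["orders"].append(
                {"orderId": order_id, "product": product, "price": price}
            )
    return {"customers": list(customers.values())}


# Hypothetical result set of (assumed table and column names):
#   SELECT c.customerId, c.name, o.orderId, o.product, o.price
#   FROM customers c LEFT JOIN orders o ON o.customerId = c.customerId
rows = [
    (123, "Bob", 456, "chair", 100),
    (123, "Bob", 789, "desk", 200),
    (999, "Fred", None, None, None),
]
document = nest_orders(rows)
print(json.dumps(document, indent=2))
```

Because everything comes from one joined query and one pass over the rows, this avoids the per-customer round trips the question is worried about.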

SQL Server 2016 now supports reading JSON in much the same way as it has supported XML for many years. Use OPENJSON to query it directly. Note that this release has no dedicated JSON data type, so the JSON text itself is stored in NVARCHAR columns.

There is no generalized way because SQL Server doesn’t support JSON as a native data type. You’ll have to create your own “generalized way” for this.
Check out this article. There are good examples there on how to manipulate sql server data to JSON format.
https://www.simple-talk.com/blogs/2013/03/26/sql-server-json-to-table-and-table-to-json/

Related

Postgres JSONB Query and Index on Nested String Array

I have some troubles wrapping my head around how to formulate queries and provide proper indices for the following situation. I have customer entities represented in JSON like this (only relevant properties are retained):
{
"id": "50000",
"address": [
{
"line": [
"2nd Main Street",
"123 Harris Plaza"
],
"city": "Boston",
"state": "Massachusetts",
"country": "US"
},
{
"line": [
"1st Av."
],
"city": "Jamestown",
"state": "Massachusetts",
"country": "US"
}
]
}
The customers are stored in the following customer table:
CREATE TABLE Customer (
id BIGSERIAL PRIMARY KEY,
resource JSONB
);
I manage to do simple queries on the resource column, e.g. a projection query like this works (retrieve all lower-case address lines for cities starting with "bo"):
SELECT LOWER(jsonb_array_elements_text(jsonb_array_elements(c.resource#>'{address}') #> '{line}')) FROM Customer c, jsonb_array_elements(c.resource #> '{address}') a WHERE LOWER(a->>'city') LIKE 'bo%';
I have trouble doing the following: my goal is to query all customers that have at least one address line beginning with "12". Case insensitivity is a requirement for my use case. The example customer would match my query, as the first address object has an address line starting with "12". Please note that "line" is an Array of JSON Strings, not complex objects. So far the closest thing I could come up with is this:
SELECT c.resource FROM Customer c, jsonb_array_elements(c.resource #> '{address}') a WHERE a->'line' ?| array['123 Harris Plaza'];
Obviously this is not a case-insensitive LIKE query. Any help/pointers on how to formulate both query and accompanying GIN index are greatly appreciated. My first query already selects all address lines as text, so maybe this could be used in a GIN index?
I'm using Postgres 9.5, but am happy to upgrade if this can only be achieved in more recent Postgres versions.

While GIN indexes have machinery to support prefix matching, this machinery is only hooked up for tsvectors. array_ops does not have it hooked up, nor does jsonb_ops or jsonb_path_ops. So unless you want to create new operator classes/families (or normalize your data into separate tables), you will have to shoehorn your data into a tsvector.
Here is a crude way to do that, which doesn't account for the possibility that an address line might contain literal single quotes or perhaps other meaningful characters:
create function addressline_tsvector(jsonb) returns tsvector immutable language SQL as $$
select string_agg('''' || lower(value) || '''', ' ')::tsvector
from jsonb_array_elements($1->'address') a(a),
jsonb_array_elements_text(a->'line')
$$;
create index on customer using gin (addressline_tsvector(resource));
select * from customer where addressline_tsvector(resource) @@ lower('''2nd Main'':*')::tsquery;
Given that your example table only has one row, the index will probably not actually be used unless you set enable_seqscan = off first.
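To sanity-check what such a query should return, it may help to have a plain reference version of the intended predicate: "at least one address line starts with the given prefix, ignoring case". The sketch below is only that reference check on the example document from the question. The function name is hypothetical, and it says nothing about how PostgreSQL evaluates the indexed query.

```python
def has_line_with_prefix(resource, prefix):
    """True if any address line starts with prefix, compared case-insensitively."""
    prefix = prefix.lower()
    return any(
        line.lower().startswith(prefix)
        for address in resource.get("address", [])
        for line in address.get("line", [])
    )


# The example customer from the question.
customer = {
    "id": "50000",
    "address": [
        {
            "line": ["2nd Main Street", "123 Harris Plaza"],
            "city": "Boston",
            "state": "Massachusetts",
            "country": "US",
        },
        {
            "line": ["1st Av."],
            "city": "Jamestown",
            "state": "Massachusetts",
            "country": "US",
        },
    ],
}

print(has_line_with_prefix(customer, "12"))
print(has_line_with_prefix(customer, "2ND MAIN"))
```

A prefix such as "harris" does not match here, because matching is anchored at the start of each whole line rather than at each word.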

How to parse a specific data from a JSON string in SnowFlake?

I am very new to SnowFlake and I am trying to work on a dataset. The column I am interested in has multiple feedbacks combined into one in the JSON format and I want to dig only the relevant key. Here's the snapshot of lets say Column_X:
I'm looking for a way to parse this data so that I get a new column such as "riskIndicator" with the values 27 and 74 as two new rows. I am attempting to parse it with the code below, but that's not working. I had a look at the JavaScript/UDF approach, but it looks complicated for this task.
,get_path(parse_json("riskIndicatorLNInstantID"),'riskCode') as riskIndicator
I will be thankful for any kind of help/suggestion here.
Thank you.

So if the problem you are having is breaking up the json, you will want to use FLATTEN
with data as (
select parse_json('[{"description":"unable to paste json", "riskCode":"27","seq":1},{"description":"typing in json is painful", "riskCode":"74","seq":2}]') as json
)
select d.json
,f.value:riskCode as riskIndicator
from data d
,lateral flatten(input=>d.json) f;
gives:
JSON RISKINDICATOR
[{ "description": "unable to paste j... "27"
[{ "description": "unable to paste j... "74"
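For readers unfamiliar with FLATTEN, the effect of the query above can be mimicked in a few lines of ordinary code: each element of the JSON array becomes its own output row, paired with the original column value. This is only an illustrative sketch of that row expansion, using the same sample JSON as the query. The function name is hypothetical and it is not how Snowflake implements FLATTEN.

```python
import json


def flatten_risk_codes(column_value):
    """Return one (original_json, riskCode) row per element of the JSON array."""
    elements = json.loads(column_value)
    return [(column_value, element.get("riskCode")) for element in elements]


# Same sample data as in the query above.
column_x = (
    '[{"description":"unable to paste json", "riskCode":"27","seq":1},'
    '{"description":"typing in json is painful", "riskCode":"74","seq":2}]'
)

rows = flatten_risk_codes(column_x)
for original, risk_indicator in rows:
    print(original[:30] + "...", risk_indicator)
```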

Lateral flatten can help extract the fields of a JSON object and is a very good alternative to extracting them one by one using their respective names. However, sometimes the JSON object can be nested, and extracting those nested objects normally requires knowing their names.
Docs Reference: https://community.snowflake.com/s/article/Dynamically-extract-multi-level-JSON-object-using-lateral-flatten

Load json file with arrays /structs and flexible schema into Hive table

Need some help loading a json file into a table . Here is an example of some of the json objects within the file:
{"asin": "0002000202", "title": "Black Berry, Sweet Juice: On Being Black and White in Canada", "price": 13.88, "imUrl": "http://ecx.images-amazon.com/images/I/51PQAYJ9EDL.jpg", "related": {"also_bought": ["0393333094"], "buy_after_viewing": ["0393333094", "1554685087"]}, "salesRank": {"Books": 3013713}, "categories": [["Books"]]}
{"asin": "0000041696", "title": "Arithmetic 2 A Beka Abeka 1994 Student Book (Traditional Arithmentic Series)", "price": 6.53, "imUrl": "http://ecx.images-amazon.com/images/I/41cGaan-BrL._SL500_.jpg", "related": {"also_viewed": ["B000KOYDUY", "B004GE1B7W", "B008SXBO88", "B001EH7Y02", "B000W7PN62", "B004H3G1X6", "B004WOEIXA", "B000AXWEEM", "0789478722", "B000MN2C56", "1402709269", "B001HHOKG0", "B000Y9TO1S", "1402711441", "0756609356", "0142400106", "1556616465", "0545021383", "B004LDD18A", "B000HZH18C", "1557996563", "B00CZTVUKI", "B001CXK8Y2", "B000QX6KY6"], "buy_after_viewing": ["B000KOYDUY", "B004GE1B7W", "B000LBXGRC", "0439827655"]}, "salesRank": {"Books": 2554321}, "categories": [["Books"]]}
As you can see, the schema varies among objects: not all attributes are present in every object. There are also structs and arrays.
Here is my create table statement
create table amazon.products_test
(asin string,
title string,
description string,
brand string,
price float,
salesRank struct<category:string, rank:int> ,
imUrl string,
categories array<string>,
related struct<also_bought:string, also_viewed:string, buy_after_viewing:string, bought_together:string>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
My load statement:
load data inpath '/user/amazon/products_test.json'
overwrite into table amazon.products_test;
Here I try and query
hive> select * FROM products_test;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Field name expected
Do I have the right datatypes ?
Is there a better serde ?
Do I need add TBLPROPERTIES or SERDEPROPERTIES ?

I found the answer. As suspected, I needed to use a different SERDE:
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
I had seen some forums suggesting that I might need this SERDE, but I didn't know how to build and add the jars from:
https://github.com/rcongiu/Hive-JSON-Serde
Also, I needed to use a map type, not a struct, for salesRank.
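One way to see why a map fits salesRank better than a struct is to inspect which keys and value types actually occur in the file before writing the table definition. The sketch below does that for JSON-lines input. The two sample lines are abbreviated copies of the records in the question, and the function name is hypothetical.

```python
import json
from collections import defaultdict


def summarize_fields(lines):
    """Collect, per top-level field, the value types seen and any nested keys."""
    types = defaultdict(set)
    nested_keys = defaultdict(set)
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            types[field].add(type(value).__name__)
            if isinstance(value, dict):
                nested_keys[field].update(value)  # adds the nested object's keys
    return dict(types), dict(nested_keys)


# Abbreviated versions of the two sample records from the question.
sample_lines = [
    '{"asin": "0002000202", "price": 13.88, "related": {"also_bought": ["0393333094"], '
    '"buy_after_viewing": ["0393333094", "1554685087"]}, '
    '"salesRank": {"Books": 3013713}, "categories": [["Books"]]}',
    '{"asin": "0000041696", "price": 6.53, "related": {"also_viewed": ["B000KOYDUY"], '
    '"buy_after_viewing": ["B000KOYDUY"]}, '
    '"salesRank": {"Books": 2554321}, "categories": [["Books"]]}',
]

types, nested_keys = summarize_fields(sample_lines)
print(types)
print(nested_keys)
```

In these samples the keys of salesRank are category names such as "Books" rather than fixed field names like category and rank, which is what a map models. The values inside related appear to be arrays rather than strings, and categories is an array of arrays, so those column types in the original CREATE TABLE may need adjusting as well.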

Azure Stream Analytics–Querying JSON Arrays of arrays

I have a problem writing a query to extract a table out of the arrays from a json file:
The problem is how to get the contents of the "dataPackets" array and the arrays nested inside it, and turn them all into a normal SQL table.
One hard issue is "CrashNotification" and "CrashMaxModuleAccelerations": I don't know how to define and use them.
The file looks like this:
{ "imei": { "imei": "351631044527130F", "imeiNotEncoded":
"351631044527130"
},
"dataPackets": [ [ "CrashNotification", { "version": 1, "id": 28 } ], [
"CrashMaxModuleAccelerations", { "version": 1, "module": [ -1243, -626,
14048 ] } ] ]}
I tried to use Get array elements method and other ways but I am never able to access 2nd level arrays like elements of "CrashNotification" of the "dataPackets" or elements of "module" of the array "CrashMaxModuleAccelerations" of the "dataPackets".
I looked also here (Select the first element in a JSON array in Microsoft stream analytics query) and it doesnt work.
I would appreciate any help :)

Based on your schema, here's an example of a query that will extract a table with the following columns: imei, crashNotification_version, crashNotification_id
WITH Datapackets AS
(
SELECT imei.imei as imei,
GetArrayElement(Datapackets, 0) as CrashNotification
FROM input
)
SELECT
imei,
GetRecordPropertyValue (GetArrayElement(CrashNotification, 1), 'version') as crashNotification_version,
GetRecordPropertyValue (GetArrayElement(CrashNotification, 1), 'id') as crashNotification_id
FROM Datapackets
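The shape of the input explains the indexing in the query: each entry of dataPackets is itself a two-element array holding a name at index 0 and a record at index 1. The sketch below walks the sample document from the question in ordinary code to make that structure explicit, including the nested "module" array. It is only an illustration of the data layout, not Stream Analytics syntax, and the helper name is hypothetical.

```python
import json

# The sample event from the question.
event = json.loads("""
{ "imei": { "imei": "351631044527130F", "imeiNotEncoded": "351631044527130" },
  "dataPackets": [
    [ "CrashNotification", { "version": 1, "id": 28 } ],
    [ "CrashMaxModuleAccelerations", { "version": 1, "module": [ -1243, -626, 14048 ] } ]
  ] }
""")


def packets_by_name(event):
    """Turn [[name, record], ...] into {name: record}."""
    return {name: record for name, record in event["dataPackets"]}


packets = packets_by_name(event)
row = {
    "imei": event["imei"]["imei"],
    "crashNotification_version": packets["CrashNotification"]["version"],
    "crashNotification_id": packets["CrashNotification"]["id"],
    # Second-level array: the accelerations of the second packet.
    "module": packets["CrashMaxModuleAccelerations"]["module"],
}
print(row)
```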
Let me know if you have any further question.
Thanks,
JS (Azure Stream Analytics)

We built a HTTP API called Stride for converting streaming JSON data into realtime, incrementally updated tables using only SQL.
All you'd need to do is write raw JSON data to the Stride API's /collect endpoint, define continuous SQL queries via the /process endpoint, and then push or pull data via the /analyze endpoint.
This approach eliminates the need to deal with any underlying data infrastructure and gives you a SQL-based approach to this type of streaming analytics problem.

Parse Oracle query to JSON format for using ANGULARJS + APEX(ORACLE)

I was wondering if it is possible to parse a ORACLE pl/sql query into a JSON format?
The thing is that I want to use ANGULARJS directives with an oracle APEX app.
So, is it possible, or any suggestion? please.

You can use the xmltype to convert the result of an SQL into XML and JSON. See the following article for the solution which will work for Oracle since version 9. You can also download the package itstar_xml_util:
Oracle XML and JSON Goodies
A simple example with the emp table:
declare
l_sql_string varchar2(2000);
l_xml xmltype;
l_json xmltype;
begin
l_sql_string := 'select a.empno, a.ename, a.job from emp a';
-- Create the XML from the SQL
l_xml := itstar_xml_util.sql2xml(l_sql_string);
-- Display the XML
dbms_output.put_line(l_xml.getclobval());
l_json := itstar_xml_util.xml2json(l_xml);
-- Display the JSON
dbms_output.put_line(l_json.getclobval());
end;
The result looks like this:
{"ROWSET": [
{
"EMPNO": 7839,
"ENAME": "KING",
"JOB": "PRESIDENT"
},
{
"EMPNO": 7698,
"ENAME": "BLAKE",
"JOB": "MANAGER"
},
[...]
{
"EMPNO": 7934,
"ENAME": "MILLER",
"JOB": "CLERK"
}
]}

With ORDS 3.0 (and earlier) you can easily use the "RESTful services" part of apex to quickly create JSON based rest services.
I am currently on a project where I am the single back-end developer, alongside 6 AngularJS programmers who work with the JSON I create using PL/SQL, SQL, ORDS 3.0, APEX 5.0 and Glassfish 4.1 web edition. I also use the new APEX_JSON API for crunching JSON data in PL/SQL.
They cannot keep up with me. That is how simple it is to create a fully functional back end that is REST enabled.
Database is 11.2
Example of some simple lookup values REST definition. Accessed as JSON.