Database structure for metaquery

I have a database with the following structure:
+------+------+--------+-------+--------+-------+
| item | type | color | speed | length | width |
+------+------+--------+-------+--------+-------+
| 1 | 1 | 3 | 1 | 2 | 2 |
| 2 | 1 | 5 | 3 | 1 | 1 |
| 3 | 1 | 6 | 3 | 1 | 1 |
| 4 | 2 | 2 | 1 | 3 | 1 |
| 5 | 2 | 2 | 2 | 2 | 1 |
| 6 | 2 | 4 | 2 | 3 | 1 |
| 7 | 2 | 5 | 1 | 1 | 2 |
| 8 | 3 | 1 | 2 | 2 | 2 |
| 9 | 4 | 4 | 3 | 1 | 2 |
| 10 | 4 | 6 | 3 | 3 | 2 |
+------+------+--------+-------+--------+-------+
I would like to efficiently query which combinations of field values are valid. For example, I'd like to query the database for the following:
What values of color are valid if type is 1?
ans: [3, 5, 6]
What values of speed are valid if type is 2 and color is 2?
ans: [1, 2]
What values of type are valid if length is 2 and width is 2?
ans: [1, 3]
The SQL equivalents are:
SELECT DISTINCT `color` FROM `cars` WHERE `type` = 1
SELECT DISTINCT `speed` FROM `cars` WHERE `type` = 2 AND `color` = 2
SELECT DISTINCT `type` FROM `cars` WHERE `length` = 2 AND `width` = 2
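To sanity-check the expected answers against the sample table, here is a minimal in-memory sketch of the three DISTINCT queries in Python. It only models the query semantics (filter on equality, then collect distinct values). ROWS and distinct_values are hypothetical names, and this is not how Cloudant executes queries. Against the table, the third question yields types 1 and 3, because items 1 and 8 are the only rows with length 2 and width 2.

```python
# Sample rows from the question's table.
COLUMNS = ("item", "type", "color", "speed", "length", "width")
DATA = [
    (1, 1, 3, 1, 2, 2),
    (2, 1, 5, 3, 1, 1),
    (3, 1, 6, 3, 1, 1),
    (4, 2, 2, 1, 3, 1),
    (5, 2, 2, 2, 2, 1),
    (6, 2, 4, 2, 3, 1),
    (7, 2, 5, 1, 1, 2),
    (8, 3, 1, 2, 2, 2),
    (9, 4, 4, 3, 1, 2),
    (10, 4, 6, 3, 3, 2),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]


def distinct_values(rows, field, **conditions):
    """Model of: SELECT DISTINCT field FROM rows WHERE k1 = v1 AND k2 = v2 ..."""
    matches = {
        row[field]
        for row in rows
        if all(row[key] == value for key, value in conditions.items())
    }
    return sorted(matches)


print(distinct_values(ROWS, "color", type=1))            # [3, 5, 6]
print(distinct_values(ROWS, "speed", type=2, color=2))   # [1, 2]
print(distinct_values(ROWS, "type", length=2, width=2))  # [1, 3]
```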
I'm planning to use a cloud-based database (Cloudant DBaaS, based on CouchDB). How would this best be implemented, keeping in mind that there may be thousands of items with tens of fields?

I haven't put too much thought into this question, so there may be errors in the approach, but one option is to represent each row with a document:
{
"_id": "1db91338150bfcfe5fcadbd98fe77d56",
"_rev": "1-83daafc1596c2dabd4698742c2d8b0cf",
"item": 1,
"type": 1,
"color": 3,
"speed": 1,
"length": 2,
"width": 2
}
Note that the _id and _rev fields were generated automatically by Cloudant for this example.
You could then create a secondary index on the type field:
function(doc) {
if(doc.type)
emit(doc.type);
}
To search using the type field:
https://accountname.cloudant.com/dashboard.html#database/so/_design/ddoc/_view/col_for_type?key=1&include_docs=true
A secondary index on the type and width fields:
function(doc) {
if( doc.type && doc.width)
emit([doc.type, doc.width]);
}
To search using the type and width fields:
https://accountname.cloudant.com/dashboard.html#database/so/_design/ddoc/_view/speed_for_type_and_width?key=[1,2]&include_docs=true
A secondary index on the length and width fields:
function(doc) {
if (doc.length && doc.width)
emit([doc.length, doc.width]);
}
To search using the length and width fields:
https://accountname.cloudant.com/dashboard.html#/database/so/_design/ddoc/_view/type_for_length_and_width?key=[2,2]&include_docs=true
The complete design document is here:
{
"_id": "_design\/ddoc",
"_rev": "3-c87d7c3cd44dcef35a030e23c1c91711",
"views": {
"col_for_type": {
"map": "function(doc) {\n if(doc.type)\n emit(doc.type);\n}"
},
"speed_for_type_and_width": {
"map": "function(doc) {\n if( doc.type && doc.width)\n emit([doc.type, doc.width]);\n}"
},
"type_for_length_and_width": {
"map": "function(doc) {\n if (doc.length && doc.width)\n emit([doc.length, doc.width]);\n}"
}
},
"language": "javascript"
}
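Because the map functions above emit only a key, the distinct-values step has to happen on the client after fetching the matching rows with include_docs=true. A minimal Python sketch of that step follows. It assumes the JSON response has already been fetched; the response text is a hand-written, hypothetical example in the shape CouchDB/Cloudant uses for view results (a "rows" array whose entries carry "id", "key", "value" and, with include_docs, "doc"), and distinct_field is a hypothetical helper.

```python
import json

# Hypothetical response for GET /{db}/_design/ddoc/_view/col_for_type?key=1&include_docs=true
# (ids are made up; _rev values and unused fields are omitted).
RESPONSE_TEXT = """
{
  "total_rows": 10, "offset": 0,
  "rows": [
    {"id": "a", "key": 1, "value": null, "doc": {"_id": "a", "item": 1, "type": 1, "color": 3}},
    {"id": "b", "key": 1, "value": null, "doc": {"_id": "b", "item": 2, "type": 1, "color": 5}},
    {"id": "c", "key": 1, "value": null, "doc": {"_id": "c", "item": 3, "type": 1, "color": 6}}
  ]
}
"""


def distinct_field(view_response, field):
    """Collect the distinct values of `field` from the docs in a view response."""
    values = set()
    for row in view_response["rows"]:
        doc = row.get("doc") or {}  # "doc" is only present with include_docs=true
        if field in doc:
            values.add(doc[field])
    return sorted(values)


response = json.loads(RESPONSE_TEXT)
print(distinct_field(response, "color"))  # [3, 5, 6]
```

If only the distinct values are needed, an alternative worth testing (not part of the answer above) is a view that emits [doc.type, doc.color] with the built-in _count reduce, queried with group=true and startkey=[1]&endkey=[1,{}]; that returns one row per distinct (type, color) pair instead of every matching document.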

Related

Generate a new dataset from two existing datasets with conditions

I have two datasets with the same columns, and I would like to create a new one in another sheet containing all rows from the first dataset plus specific rows from the second one.
My first dataset looks like this:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 1 | 17/02/2022 | 21/02/2022 |
| 1 | 2 | 19/02/2022 | 24/02/2022 |
| 2 | 1 | 15/02/2022 | 18/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 3 | 1 | 21/02/2022 | 25/02/2022 |
And the second one looks like this:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 2 | 17/02/2022 | 20/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 2 | 3 | 20/02/2022 | 23/02/2022 |
| 3 | 1 | 20/02/2022 | 23/02/2022 |
| 4 | 1 | 21/02/2022 | 24/02/2022 |
| 4 | 2 | 23/02/2022 | 28/02/2022 |
In a new sheet, I would like to take all rows from the first dataset and append the rows from the second one that are absent from the first.
If a combination of "Item Type" and "Item Numb" has already been imported from the first dataset, I don't want that row from the second dataset; if the combination isn't in the first dataset, I want to add the row.
This is the result I need:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 1 | 17/02/2022 | 21/02/2022 |
| 1 | 2 | 19/02/2022 | 24/02/2022 |
| 2 | 1 | 15/02/2022 | 18/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 3 | 1 | 21/02/2022 | 25/02/2022 |
| 2 | 3 | 20/02/2022 | 23/02/2022 |
| 4 | 1 | 21/02/2022 | 24/02/2022 |
| 4 | 2 | 23/02/2022 | 28/02/2022 |
Thanks in advance for your time folks!
try:
=INDEX(ARRAY_CONSTRAIN(QUERY(SORTN(
{Sheet1!A2:D, Sheet1!A2:A&"|"&Sheet1!B2:B;
Sheet2!A2:D, Sheet2!A2:A&"|"&Sheet2!B2:B}, 9^9, 2, 5, 1),
"where Col1 is not null", 0), 9^9, 4)
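The rule the question describes (keep every row of the first dataset, then append rows of the second whose Item Type + Item Numb pair has not been seen yet) can be sketched outside Sheets in Python to check the expected result. merge_keep_first is a hypothetical helper, not a Sheets function. One design difference: SORTN returns its rows sorted by the chosen key column, whereas this sketch keeps the original order shown in the expected table.

```python
def merge_keep_first(first, second, key_len=2):
    """All rows of `first`, then rows of `second` whose key is not present yet.

    The key is the first `key_len` columns (Item Type, Item Numb).
    """
    seen = {tuple(row[:key_len]) for row in first}
    merged = list(first)
    for row in second:
        key = tuple(row[:key_len])
        if key not in seen:  # skip combinations already imported
            seen.add(key)
            merged.append(row)
    return merged


# (Item Type, Item Numb, Start Date, End date) rows from the question.
first = [
    (1, 1, "17/02/2022", "21/02/2022"),
    (1, 2, "19/02/2022", "24/02/2022"),
    (2, 1, "15/02/2022", "18/02/2022"),
    (2, 2, "17/02/2022", "20/02/2022"),
    (3, 1, "21/02/2022", "25/02/2022"),
]
second = [
    (1, 2, "17/02/2022", "20/02/2022"),
    (2, 2, "17/02/2022", "20/02/2022"),
    (2, 3, "20/02/2022", "23/02/2022"),
    (3, 1, "20/02/2022", "23/02/2022"),
    (4, 1, "21/02/2022", "24/02/2022"),
    (4, 2, "23/02/2022", "28/02/2022"),
]

for row in merge_keep_first(first, second):
    print(row)
```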

Database design for One to Many or Many to many for use case?

I have a requirement with four consecutive one-to-many table layers, as follows:
School
SchoolID | SName
-----------------
1 | HSS1
2 | HSS2
3 | HSS3
4 | HSS4
Class
ClassID | CName | SchoolID
-------------------------------
1 | Class1 | 1
2 | Class2 | 1
3 | Class3 | 3
4 | Class4 | 3
Student
StudentID | StName | ClassID
-------------------------------
1 | StudC1 | 1
2 | StudC2 | 1
3 | StudC3 | 2
4 | StudC4 | 2
Bench
BenchID | BName | StudentID
-------------------------------
1 | Bench1 | 1
2 | Bench2 | 1
3 | Bench3 | 1
4 | Bench4 | 2
1 | Bench1 | 3
2 | Bench2 | 3
3 | Bench3 | 4
4 | Bench4 | 4
The idea is to get the response from the database as JSON in this format:
{
"resources": [
{
"SName": "HSS1",
"ClassList": [
{
"CName": "Class1",
"StudentList": [
{
"StName": "StudC1",
"BenchList": [
{
"BName": "Bench1"
},
{
"BName": "Bench2"
},
{
"BName": "Bench3"
}
]
},
{
"StName": "StudC2",
"BenchList": [
{
"BName": "Bench4"
}
]
}
]
},
{
"CName": "Class2",
"StudentList": [
{
"StName": "StudC3"
},
{
"StName": "StudC4"
}
]
}
]
}
]
}
My questions are:
Is it good practice to create nested one-to-many relationship tables in the database when the expected result is JSON built in Java, as above?
Is there a simpler way to create the tables that still produces the same nested response?
The source tables above will be merged into target tables with the same structure using MERGE/UPSERT. Will that be easy?
Would changing to a one-to-many-to-many-to-many relationship help?
Is it okay to get such a JSON response directly from the database if I design the tables this way?
Note: the table names are only samples; assume all four tables are connected one-to-many.
I'd appreciate community suggestions on this approach, and a better design if there is one.
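One common option is to keep the four normalized one-to-many tables and assemble the nested JSON in application code after fetching the rows. The sketch below is in Python for brevity (the question uses Java, but the same group-by-parent-ID step applies there). The in-memory lists stand in for query results, and build_resources and children_by_parent are hypothetical names. Unlike the abbreviated sample above, it also fills in BenchList for StudC3 and StudC4, because the Bench table has rows for them.

```python
import json
from collections import defaultdict

# Sample rows from the question: (id, name, parent_id); schools have no parent.
schools = [(1, "HSS1"), (2, "HSS2"), (3, "HSS3"), (4, "HSS4")]
classes = [(1, "Class1", 1), (2, "Class2", 1), (3, "Class3", 3), (4, "Class4", 3)]
students = [(1, "StudC1", 1), (2, "StudC2", 1), (3, "StudC3", 2), (4, "StudC4", 2)]
benches = [  # (BenchID, BName, StudentID)
    (1, "Bench1", 1), (2, "Bench2", 1), (3, "Bench3", 1), (4, "Bench4", 2),
    (1, "Bench1", 3), (2, "Bench2", 3), (3, "Bench3", 4), (4, "Bench4", 4),
]


def children_by_parent(rows):
    """Group (id, name, parent_id) rows into {parent_id: [(id, name), ...]}."""
    grouped = defaultdict(list)
    for row_id, name, parent_id in rows:
        grouped[parent_id].append((row_id, name))
    return grouped


def build_resources(schools, classes, students, benches):
    """Nest the four flat tables into the School > Class > Student > Bench tree."""
    classes_of = children_by_parent(classes)
    students_of = children_by_parent(students)
    benches_of = children_by_parent(benches)
    resources = []
    for school_id, s_name in schools:
        class_list = []
        for class_id, c_name in classes_of.get(school_id, []):
            student_list = []
            for student_id, st_name in students_of.get(class_id, []):
                bench_list = [{"BName": b} for _, b in benches_of.get(student_id, [])]
                student_list.append({"StName": st_name, "BenchList": bench_list})
            class_list.append({"CName": c_name, "StudentList": student_list})
        resources.append({"SName": s_name, "ClassList": class_list})
    return {"resources": resources}


print(json.dumps(build_resources(schools, classes, students, benches), indent=2))
```

Grouping each child table by its parent ID once keeps the assembly linear in the number of rows, and the tables themselves stay flat, which also keeps MERGE/UPSERT into same-shaped target tables straightforward.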

Sphinx RT index: search in JSON by value

In my RT index I have an attribute named fields, declared with rt_attr_json.
This attribute holds the following structure (a collection of blocks with the same shape):
{
"block_name": "a",
"block_type": 1,
"elements": [
{
...
}
]
}
How can I get all records from Sphinx that have a block with block_type = 1 and non-empty elements in that same block?
I know how to do this if I know the block's position in the array:
where fields[0].block_type=1 and fields[0].elements is not null;
I am not sure if I completely understood the question, but the following works nicely in Sphinx 3.1.1:
mysql> select * from jt;
+------+-------+-----------------------------+
| id | title | j |
+------+-------+-----------------------------+
| 1 | | {"type":1} |
| 2 | | {"type":1,"elements":[]} |
| 3 | | {"type":1,"elements":[123]} |
| 4 | | {"type":2} |
| 5 | | {"type":2,"elements":[123]} |
+------+-------+-----------------------------+
5 rows in set (0.00 sec)
mysql> select * from jt where j.type=1 and j.elements is not null;
+------+-------+-----------------------------+
| id | title | j |
+------+-------+-----------------------------+
| 2 | | {"type":1,"elements":[]} |
| 3 | | {"type":1,"elements":[123]} |
+------+-------+-----------------------------+
2 rows in set (0.00 sec)
Note that earlier versions may behave differently, as NULL handling was substantially reworked (to the point of being semi-rewritten) in the 3.1.1 release.
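The question asks for a block at any position (not only fields[0]) whose type is 1 and whose elements are non-empty. In the output above, row 2 ({"type":1,"elements":[]}) still matches IS NOT NULL, so "not null" and "non-empty" are different conditions. As a reference for checking results, here is a plain-Python sketch of the intended predicate over already-decoded JSON. It does not run inside Sphinx, and DOCS and has_matching_block are hypothetical.

```python
import json

# Hypothetical documents shaped like the question's `fields` attribute (an array of blocks).
DOCS = {
    1: '[{"block_name": "a", "block_type": 1, "elements": [{"x": 1}]}]',
    2: '[{"block_name": "a", "block_type": 1, "elements": []}]',
    3: '[{"block_name": "a", "block_type": 2, "elements": [{"x": 1}]},'
       ' {"block_name": "b", "block_type": 1, "elements": [{"y": 2}]}]',
    4: '[{"block_name": "a", "block_type": 2, "elements": [{"x": 1}]},'
       ' {"block_name": "b", "block_type": 1}]',
}


def has_matching_block(blocks, block_type=1):
    """True if any block, at any index, has the given type and non-empty elements."""
    return any(
        block.get("block_type") == block_type and bool(block.get("elements"))
        for block in blocks
    )


matching = [doc_id for doc_id, text in DOCS.items() if has_matching_block(json.loads(text))]
print(matching)  # [1, 3]
```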

Hive Query: working with String Array

I have a Hive table with the following schema:
hive>desc books;
gen_id int
author array<string>
rating double
genres array<string>
hive>select * from books;
| gen_id | rating | author |genres
+----------------+-------------+---------------+----------
| 1 | 10 | ["A","B"] | ["X","Y"]
| 2 | 20 | ["C","A"] | ["Z","X"]
| 3 | 30 | ["D"] | ["X"]
Is there a SELECT query that returns one row per array element, like this:
| gen_id | rating | SplitData
+-------------+---------------+-------------
| 1 | 10 | "A"
| 1 | 10 | "B"
| 1 | 10 | "X"
| 1 | 10 | "Y"
| 2 | 20 | "C"
| 2 | 20 | "A"
| 2 | 20 | "Z"
| 2 | 20 | "X"
| 3 | 30 | "D"
| 3 | 30 | "X"
Can someone guide me on how to get this result? Thanks in advance for any help.
You need LATERAL VIEW with explode(), e.g.:
-- Explode each array separately and stack the results with UNION ALL.
-- Chaining two LATERAL VIEWs on the same row would pair every author
-- with every genre and repeat each value.
SELECT
gen_id,
rating,
SplitData
FROM (
SELECT gen_id, rating, SplitData
FROM books
LATERAL VIEW explode(books.author) exploded_authors AS SplitData
UNION ALL
SELECT gen_id, rating, SplitData
FROM books
LATERAL VIEW explode(books.genres) exploded_genres AS SplitData
) tab;
I haven't had a chance to test it, but it should show you the general approach. Good luck!
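Chaining two LATERAL VIEW explode() calls on the same row produces the cross product of the two arrays, so each author and genre would appear several times. The small Python model below (hypothetical names, not Hive code) contrasts exploding each array once and stacking the results, which yields the 10 rows in the desired output, with the cross-product-then-explode pattern, which yields 18. Row order here is per book; Hive does not guarantee an order without ORDER BY.

```python
# Rows from the books table: (gen_id, rating, author, genres).
BOOKS = [
    (1, 10, ["A", "B"], ["X", "Y"]),
    (2, 20, ["C", "A"], ["Z", "X"]),
    (3, 30, ["D"], ["X"]),
]


def split_rows(books):
    """Model of explode(author) UNION ALL explode(genres): each element once."""
    out = []
    for gen_id, rating, authors, genres in books:
        for value in authors + genres:
            out.append((gen_id, rating, value))
    return out


def cross_join_then_explode(books):
    """Model of two chained LATERAL VIEW explode() calls, then exploding the pair."""
    out = []
    for gen_id, rating, authors, genres in books:
        for author in authors:
            for genre in genres:
                out.extend([(gen_id, rating, author), (gen_id, rating, genre)])
    return out


print(len(split_rows(BOOKS)))               # 10
print(len(cross_join_then_explode(BOOKS)))  # 18
```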

What's the idiomatic way to split a Smalltalk array at the spot where a series of values changes?

Given an array of domain objects (with the properties subject, trial and run) like this:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
I want to split it into multiple arrays at every point where the value of subject changes.
The above example should result in three arrays:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
+---------+-------+-----+
+---------+-------+-----+
| 2 | 1 | 1 |
| 2 | 2 | 1 |
+---------+-------+-----+
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
What would be the idiomatic Smalltalk (Pharo) way to split the array like this?
SequenceableCollection>>piecesCutWhere:, which takes a binary block, is your friend:
{ 1. 1. 2. 2. 2. 3. 1. 2. } piecesCutWhere: [:left :right | left ~= right]
=> an OrderedCollection #(1 1) #(2 2 2) #(3) #(1) #(2)
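Applied to the question's domain objects, the binary block would compare the subjects of neighbouring elements rather than the elements themselves. For comparison outside Pharo, the same "cut where neighbours differ" idea maps onto Python's itertools.groupby, which groups only consecutive elements with equal keys. Measurement is a hypothetical stand-in for the domain objects.

```python
from itertools import groupby
from typing import NamedTuple


class Measurement(NamedTuple):
    subject: int
    trial: int
    run: int


ROWS = [
    Measurement(1, 1, 1), Measurement(1, 2, 1), Measurement(1, 3, 2), Measurement(1, 4, 2),
    Measurement(2, 1, 1), Measurement(2, 2, 1),
    Measurement(1, 1, 1), Measurement(1, 2, 1),
]


def split_where_subject_changes(rows):
    """Cut the sequence wherever `subject` differs between neighbours.

    groupby merges only consecutive equal keys, so the second run of
    subject 1 becomes its own group, as in the expected result.
    """
    return [list(group) for _, group in groupby(rows, key=lambda m: m.subject)]


pieces = split_where_subject_changes(ROWS)
print([len(p) for p in pieces])  # [4, 2, 2]
```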
