Hive query, better option to self join

Hive query, better option to self join - database

So I am working with a hive table that is set up as so:
id (Int), mapper (String), mapperId (Int)
Basically a single Id can have multiple mapperIds, one per mapper such as an example below:
ID (1) mapper(MAP1) mapperId(123)
ID (1) mapper(MAP2) mapperId(1234)
ID (1) mapper(MAP3) mapperId(12345)
ID (2) mapper(MAP2) mapperId(10)
ID (2) mapper(MAP3) mapperId(12)
I want to return the list of mapperIds associated to each unique ID. So for the above example I would want the below returned as a single row.
1, 123, 1234, 12345
2, null, 10, 12
The mapper Strings are known, so I was thinking of doing a self join for every mapper string I am interested in, but I was wondering if there was a more optimal solution?

If the assumption that the mapper column is distinct with respect to a given ID is correct, you could collect the mapper column and the mapperid column to a Map using brickhouse collect. You can clone the repo from that link and build the jar with Maven.
Query:
add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select id
,id_map['MAP1'] as mapper1
,id_map['MAP2'] as mapper2
,id_map['MAP3'] as mapper3
from (
select id
,collect(mapper, mapperid) as id_map
from some_table
group by id
) x
Output:
| id | mapper1 | mapper2 | mapper3 |
------------------------------------
1 123 1234 12345
2 10 12

Related

Return Parts of an Array in Postgres

I have a column (text) in my Postgres DB (v.10) with a JSON format.
As far as i now it's has an array format.
Here is an fiddle example: Fiddle
If table1 = persons and change_type = create then i only want to return the name and firstname concatenated as one field and clear the rest of the text.
Output should be like this:
id table1 did execution_date change_type attr context_data
1 Persons 1 2021-01-01 Create Name [["+","name","Leon Bill"]]
1 Persons 2 2021-01-01 Update Firt_name [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
1 Users 3 2021-01-01 Create Street [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]

Disassemble json array into SETOF using json_array_elements function, then assemble it back into structure you want.
select m.*
, case
when m.table1 = 'Persons' and m.change_type = 'Create'
then (
select '[["+","name",' || to_json(string_agg(a.value->>2,' ' order by a.value->>1 desc))::text || ']]'
from json_array_elements(m.context_data::json) a
where a.value->>1 in ('name','firstname')
)
else m.context_data
end as context_data
from mutations m
modified fiddle
(Note:
utilization of alphabetical ordering of names of required fields is little bit dirty, explicit order by case could improve readability
resulting json is assembled from string literals as much as possible since you didn't specified if "+" should be taken from any of original array elements
the to_json()::text is just for safety against injection
)

Adding multiple records from a string

I have a string of email addresses. For example, "a#a.com; b#a.com; c#a.com"
My database is:
record | flag1 | flag2 | emailaddresss
--------------------------------------------------------
1 | 0 | 0 | a#a.com
2 | 0 | 0 | b#a.com
3 | 0 | 0 | c#a.com
What I need to do is parse the string, and if the address is not in the database, add it.
Then, return a string of just the record numbers that correspond to the email addresses.
So, if the call is made with "A#a.com; c#a.com; d#a.com", the rountine would add "d#a.com", then return "1, 3,4" corresponding to the records that match the email addresses.
What I am doing now is calling the database once per email address to look it up and confirm it exists (adding if it doesn't exist), then looping thru them again to get the addresses 1 by 1 from my powershell app to collect the record numbers.
There has to be a way to just pass all of the addresses to SQL at the same time, right?
I have it working in powershell.. but slowly..
I'd love a response from SQL as shown above of just the record number for each email address in a single response. That is, "1,2,4" etc.
My powershell code is:
$EmailList2 = $EmailList.split(";")
# lets get the ID # for each eamil address.
foreach($x in $EmailList2)
{
$data = exec-query "select Record from emailaddresses where emailAddress = #email" -parameter #{email=$x.trim()} -conn $connection
if ($($data.Tables.record) -gt 0)
{
$ResponseNumbers = $ResponseNumbers + "$($data.Tables.record), "
}
}
$ResponseNumbers = $($ResponseNumbers+"XX").replace(", XX","")
return $ResponseNumbers

You'd have to do this in 2 steps. Firstly INSERT the new values and then use a SELECT to get the values back. This answer uses delimitedsplit8k (not delimitedsplit8k_LEAD) as you're still using SQL Server 2008. On the note of 2008 I strongly suggest looking at upgrade paths soon as you have about 6 weeks of support left.
You can use the function to split the values and then INSERT/SELECT appropriately:
DECLARE #Emails varchar(8000) = 'a#a.com;b#a.com;c#a.com';
WITH Emails AS(
SELECT DS.Item AS Email
FROM dbo.DelimitedSplit8K(#Emails,';') DS)
INSERT INTO YT (emailaddress) --I don't know what the other columns value should be, so have excluded
SELECT E.Email
FROM dbo.YourTable YT
LEFT JOIN Emails E ON YT.emailaddress = E.Email
WHERE E.Email IS NULL;
SELECT YT.record
FROM dbo.YourTable YT
JOIN dbo.DelimitedSplit8K(#Emails,';') DS ON DS.Item = YT.emailaddress;

Check a value in an array inside a object json in PostgreSQL 9.5

I have an json object containing an array and others properties.
I need to check the first value of the array for each line of my table.
Here is an example of the json
{"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}
So I need for example to get all lines with ObjectType[0] = 'Demand' and objectId1 = 46.
This the the table colums
id | relationName | content
Content column contains the json.

just query them? like:
t=# with table_name(id, rn, content) as (values(1,null,'{"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}'::json))
select * From table_name
where content->'objectType'->>0 = 'Demand' and content->>'objectID1' = '46';
id | rn | content
----+----+-------------------------------------------------------------------
1 | | {"objectID2":342,"objectID1":46,"objectType":["Demand","Entity"]}
(1 row)

Filesystem (file, folder) with display order

I want to design like os file system,
with specific display order (sequence) can be update.
I want file and folder can be same layer,
file doesn't have to inside a folder.
But in below design, if the file not in any folder I don't know how to save the sequence, save in where??
Any suggestion will be apperciate
data example
folder(id:1) top layer: sequence: 0
file(id:1) sequence_in_folder: 0
file(id:2) sequence_in_folder: 1
folder(id:2) top layer: sequence: 1
file(id:3) sequence_in_folder: 0
file(id:4) top layer: sequence: 2 << **sequence save in which table ??**
file(id:5) top layer: sequence: 3 << **sequence save in which table ??**
folder
id sequence parent_folder_id
1 0
2 1
file
id sequence_in_folder folder_id
1 0 1
2 1 1
3 0 2
4 ?????
5 ????
schema
CREATE TABLE IF NOT EXISTS "folder"(
"id" SERIAL NOT NULL,
"sequence" integer NOT NULL,
"parent_folder_id" integer Default NULL,
PRIMARY KEY ("id")
);
CREATE TABLE IF NOT EXISTS "file"(
"id" SERIAL NOT NULL,
"sequence_in_folder" integer Default NULL,
"parent_folder_id" integer NOT NULL,
PRIMARY KEY ("id")
);
UPDATE
base on #Laurenz Albe answer, no need change table design,
just create a root folder.
but how to sorting data order by a field cross/exist in two table?
the sequence exist in folder table and file table, how to sort them together
query
SELECT * FROM folder fo
LEFT JOIN file fi ON fi.parent_folder_id = fo.id
WHERE fo.parent_folder_id = $1 AND fi.parent_folder_id = $1
ORDER BY fo.sequence fi.sequence ?? ;
[1]
data example
folder
id | sequence | parent_folder_id | name
1 | 0 | | root
2 | 0 | 1 |
3 | 2 | 1 |
file
id | sequence | parent_folder_id |
1 | 1 | 1 |
output
folder(id:1, sequence:0 name:root)
folder(id:2, sequence:0)
file(id:1, sequence:1)
folder(id:3 sequence:2)

Two suggestions:
Introduce an “anonymous” top folder that contains all the top level elements.
Rename the sequence column of bookmerk_folder to max_sequence or so to avoid confusion with bookmark.sequence.

Supplemental to Laurenz's answer:
unify your bookmark and folder columns, maybe bookmark_node and require that everything have a parent which is not a bookmark. Something like
CREATE TABLE IF NOT EXISTS fsnode(
"id" SERIAL NOT NULL,
"name" text,
"is_folder" bool,
"parent_is_folder" bool not null,
"sequence" integer NOT NULL,
"parent_folder_id" integer Default NULL,
CHECK (parent_is_folder),
PRIMARY KEY ("id"),
UNIQUE(id, is_folder), # needed for fkey below
FOREIGN KEY (parent_folder_id, parent_is_folder) REFERENCES fsnode (id, is_folder)
);

Modeling a non-primary Key relationship

I am trying to model the following relationship with the intent of designing classes for EF code first.
Program table:
ProgramID - PK
ProgramName
ClusterCode
Sample data
ProgramID ProgramName ClusterCode
--------------------------------------
1 Spring A
2 Fall A
3 Winter B
4 Summer B
Cluster table:
ID
ClusterCode
ClusterDetails
Sample data:
ID ClusterCode ClusterDetails
---------------------------------
1 A 10
2 A 20
3 A 30
4 B 20
5 B 40
I need to join the Program table to the Cluster table so I can get the list of cluster details for each program.
The SQL would be
Select
from Programs P
Join Cluster C On P.ClusterCode = C.ClusterCode
Where P.ProgramID = 'xxx'
Note that for the Program table, ClusteCode is not unique.
For Cluster table, neither ClusterCode nor ClusterDetail is unique.
How would I model this so I can take advantage of navigation properties and code-first?

assuming you have mapped above two tables and make an association between them and you are using C#, you can use a simple join :
List<Sting> clustedDets=new ArrayList<String>();
var q =
from p in ClusterTable
join c in Program on p equals c.ClusterTable
select new { p.ClusterDetails };
foreach (var v in q)
{
clustedDets.Add(v.ClusterDetails);
}

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Hive query, better option to self join - database

Related

Return Parts of an Array in Postgres

Adding multiple records from a string

Check a value in an array inside a object json in PostgreSQL 9.5

Filesystem (file, folder) with display order

Modeling a non-primary Key relationship

Categories

Resources