In BigQuery, how can I turn many columns into a RECORD or array of key-value pairs?
e.g.
Source Table
id  Name  DOB         Sex
1   Fred  01.01.2001  M
Destination Table
Id  Name  Key  Value
1   Fred  DOB  01.01.2001
          Sex  M
I've tried a few things but can't get there. Is there a nice way of doing it?
I'm not sure what exactly the issue could be here, as it is as simple and straightforward as the query below:
select id, Name,
[struct<key string, value string>('DOB', DOB),('Sex', Sex)] info
from `project.dataset.table`
with output
That said, the issue usually comes up when you don't know the column names in advance and want a generic approach. In that case you can use the approach below, which does not reference the column names DOB and Sex:
select id, Name,
array(
select as struct
split(replace(kv, '"', ''),':')[offset(0)] key,
split(replace(kv, '"', ''),':')[offset(1)] value,
from unnest(split(trim(to_json_string((select as struct * except (id, name) from unnest([t]))), '{}'))) kv
) info
from `project.dataset.table` t
with exactly the same output
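To see what the generic query does to each row, here is a rough Python sketch of the same steps: serialize the non-key columns to JSON, trim the braces, split on commas, strip the quotes, then split each piece on the colon. The function name `row_to_key_values` and the sample row are only illustrative, and this is an emulation of the idea, not BigQuery itself.

```python
import json

def row_to_key_values(row, key_columns=("id", "Name")):
    """Mimic the query: JSON-encode the remaining columns, then split the text."""
    rest = {k: v for k, v in row.items() if k not in key_columns}
    # Compact separators give '{"DOB":"01.01.2001","Sex":"M"}', similar to to_json_string.
    text = json.dumps(rest, separators=(",", ":")).strip("{}")
    pairs = []
    for kv in text.split(","):
        parts = kv.replace('"', "").split(":")
        # Like offset(0) and offset(1): only the first two pieces are kept.
        pairs.append({"key": parts[0], "value": parts[1]})
    return pairs

row = {"id": 1, "Name": "Fred", "DOB": "01.01.2001", "Sex": "M"}
info = row_to_key_values(row)
print(info)
```

Because the pairs are recovered by splitting text, a value that itself contains a comma or a colon is cut short: in this sketch a value such as 12:30 comes back as 12. Keep that in mind if your columns can hold such values.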
Here is a solution:
WITH `proj.dataset.tbl` AS
(SELECT '1' AS id, 'Fred' AS Name, '2020-12-07' AS DOB, 'M' as Sex
)
SELECT id, Name,
[struct('DOB' as key, cast (DOB as string) as value ),
struct('Sex' as key, Sex as value)
] as key_values
FROM `proj.dataset.tbl`
Output will be as :-
Related
I am trying to generate a unique login username for each user in my system.
What I have done so far is:
Select p.FirstName+P.LastName + p.PersonId from person As P
Here 1255 is the primary key, so the output looks like JamesHarley1255.
However, I don't want to use the primary key. What are the alternatives? There will be numerous duplicate records too.
Let's say I need a function that generates a unique number between 1 and n every time, so that I can produce output like JamesHarley125.
Try this,
Select p.FirstName+P.LastName +CAST(ROW_NUMBER() OVER (ORDER BY p.FirstName) AS VARCHAR(20))
from person As P
If you want a unique value for each record without using the primary key column, I think you can use the ROW_NUMBER function. The query below should work in this case:
Select p.FirstName+P.LastName + CAST((row_number() OVER ( PARTITION BY p.FirstName ORDER BY p.FirstName)) AS varchar(100)) from person As P
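A small runnable sketch of the ROW_NUMBER idea, using Python's built-in sqlite3 module instead of SQL Server, so `||` replaces `+` for concatenation and window functions need SQLite 3.25 or later. The sample rows are made up, and the extra `PersonId` in the ORDER BY is my addition to make the numbering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (PersonId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1255, "James", "Harley"), (1300, "James", "Harley"), (1412, "Anna", "Bell")],
)

# Number the rows across the whole table; the suffix is the row number, not PersonId.
rows = conn.execute(
    """
    SELECT PersonId,
           FirstName || LastName ||
           CAST(ROW_NUMBER() OVER (ORDER BY FirstName, PersonId) AS TEXT) AS Login
    FROM person
    ORDER BY PersonId
    """
).fetchall()
logins = [login for _, login in rows]
print(logins)
```

Note that row numbers are recomputed every time the query runs, so a login generated this way would have to be stored once rather than regenerated later, otherwise it can change as rows are added.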
You can create a column of incremental integers using the IDENTITY property and concatenate it with the first name and last name, so that each row will be unique. The integer has to be cast to a string before concatenating.
Select p.FirstName+P.LastName + CAST(P.Id AS VARCHAR(20)) from person As P
Here the column P.Id can be declared as below in the table definition. It starts at the value 1 and is incremented in steps of 1. The syntax is IDENTITY(seed, increment).
id int IDENTITY(1,1)
Building off the previous responses, this generates usernames without the numbers where possible, and then only adds numbers to make the usernames unique where actually required. I know this isn't the exact way you are generating your usernames, but this is how I decided to generate mine. I just changed the table names to match yours.
with Usernames(PersonId, Username) as
(
select PersonId,
left(Firstname, 1) + Middlename + Lastname as Username
from Person
),
NumberedUserNames(PersonId, Username, Number) as
(
select PersonId,
Username,
ROW_NUMBER() OVER (PARTITION BY Username ORDER BY Username) as Number
from Usernames
),
UniqueUsernames(PersonId, UniqueUsername) AS
(
select PersonId,
case when Number = 1 then UserName
else UserName + CAST(Number-1 as VarChar(20))
End as UniqueUsername
from NumberedUserNames
)
select PersonId,
UniqueUsername
from UniqueUsernames;
I have two tables, tablea and tableb. Their structure is:
tablea
id serial
name varchar
alg integer[]
tableb
id serial
description varchar
In the tablea.alg field I have multiple substance numbers like {1,17,55,97}, and in the query I want to get names instead of numbers, like:
name | substances
organism1 | substance 1, substance 17, substance 55, substance 97
Can anyone suggest the right query?
An answer to a similar question is on Stack Overflow, but how can I use tables instead of fixed array values?
Thank you...
Use unnest() to get algs in separate rows,
join tableb on the values and aggregate description with string_agg():
select name, string_agg(description, ', ') substances
from (
select name, unnest(alg) alg
from tablea
) a
join tableb on alg = id
group by name;
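If it helps to see the unnest / join / string_agg steps outside the database, here is a rough Python emulation. The function name `substances_by_name` and the sample rows are made up for illustration. This sketch keeps the array order, whereas string_agg without an ORDER BY does not promise any particular order.

```python
def substances_by_name(tablea, tableb):
    """Emulate: unnest(alg), join tableb on alg = id, string_agg per name."""
    descriptions = {row["id"]: row["description"] for row in tableb}
    grouped = {}
    for row in tablea:
        # "unnest": look at one array element at a time;
        # the inner join drops elements with no matching tableb.id.
        found = [descriptions[alg] for alg in row["alg"] if alg in descriptions]
        if found:
            grouped.setdefault(row["name"], []).extend(found)
    # "string_agg": join the descriptions collected for each name.
    return {name: ", ".join(parts) for name, parts in grouped.items()}

tablea = [{"id": 1, "name": "organism1", "alg": [1, 17, 55, 97]}]
tableb = [
    {"id": 1, "description": "substance 1"},
    {"id": 17, "description": "substance 17"},
    {"id": 55, "description": "substance 55"},
    {"id": 97, "description": "substance 97"},
]
result = substances_by_name(tablea, tableb)
print(result)
```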
I have a hive table "Records" with the following structure:
recordid int
addresses array<map<string,string>>
knownnames array<map<string,string>>
The addresses array contains the standard parts of an address (house number, street name, city, state) and may contain multiple such elements (if a record has more than one address). The knownnames array contains first name, middle name, and last name and may contain multiple of each (if a record has AKAs).
How can I query my "records" table for all records that have any address in CA and a lastname of "Smith"?
I've tried exploding both arrays but it looks like hive doesn't like having elements from 2 different arrays in the where clause....
Since you completely changed the question I'm not sure; I'll have to test this.
select recordid, cities, last_names
from (
select recordid, cities
, knownname.last_name as last_names
from (
select recordid, knownnames
, address.city as cities
from db.table
lateral view explode(addresses) exptbl1 as address ) x
lateral view explode(knownnames) exptbl2 as knownname
where cities='CA' ) y
where last_names='Smith'
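To make the logic of the double explode easier to follow, here is a plain Python sketch of the same idea, not Hive code. The map keys `state` and `last_name`, the function name and the sample records are all assumptions on my part, since the question doesn't show the actual keys.

```python
def matching_record_ids(records, state, last_name):
    """Emulate exploding both arrays and filtering on one field from each."""
    matches = []
    for rec in records:
        # Exploding both arrays yields every (address, name) combination of a record.
        for address in rec["addresses"]:
            for name in rec["knownnames"]:
                if address.get("state") == state and name.get("last_name") == last_name:
                    matches.append(rec["recordid"])
    # A record with several matching combinations shows up several times: de-duplicate.
    return sorted(set(matches))

records = [
    {"recordid": 1,
     "addresses": [{"city": "Fresno", "state": "CA"}, {"city": "Reno", "state": "NV"}],
     "knownnames": [{"first_name": "Al", "last_name": "Smith"}]},
    {"recordid": 2,
     "addresses": [{"city": "Austin", "state": "TX"}],
     "knownnames": [{"first_name": "Bo", "last_name": "Smith"}]},
    {"recordid": 3,
     "addresses": [{"city": "Napa", "state": "CA"}],
     "knownnames": [{"first_name": "Cy", "last_name": "Jones"}]},
]
ids = matching_record_ids(records, "CA", "Smith")
print(ids)
```

Only record 1 has both an address in CA and a known name with last name Smith, so it is the only id returned.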
I want to select all records, but have the query only return a single record per Product Name. My table looks similar to:
SellId  ProductName  Comment
1       Cake         dasd
2       Cake         dasdasd
3       Bread        dasdasdd
where the Product Name is not unique. I want the query to return a single record per ProductName with results like:
SellId  ProductName  Comment
1       Cake         dasd
3       Bread        dasdasdd
I have tried this query,
Select distinct ProductName,Comment ,SellId from TBL#Sells
but it is returning multiple records with the same ProductName. My table is not really as simple as this; this is just a sample. What is the solution? Is it clear?
Select ProductName,
min(Comment) , min(SellId) from TBL#Sells
group by ProductName
If you only want one record per ProductName, you of course have to choose which value you want for the other fields.
If you aggregate (using GROUP BY), you can choose an aggregate function, that is, a function that takes a list of values and returns only one. Here I have chosen MIN, the smallest value for each field.
NOTE: Comment and SellId can come from different records, since MIN is taken for each field separately.
Other aggregates you might find useful:
FIRST : first record encountered
LAST : last record encountered
AVG : average
COUNT : number of records
FIRST/LAST (where your database supports them as aggregates) have the advantage that all fields come from the same record.
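The NOTE about MIN mixing records is easy to miss, so here is a small sketch that shows it, run through Python's built-in sqlite3 module. The table is named `Sells` here and the comments are made-up values chosen so that the smallest Comment and the smallest SellId sit on different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sells (SellId INTEGER PRIMARY KEY, ProductName TEXT, Comment TEXT)"
)
source_rows = [(1, "Cake", "zebra"), (2, "Cake", "apple"), (3, "Bread", "plain")]
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", source_rows)

# MIN is evaluated per column, so the two values need not belong to the same row.
grouped = conn.execute(
    """
    SELECT ProductName, MIN(Comment), MIN(SellId)
    FROM Sells
    GROUP BY ProductName
    ORDER BY ProductName
    """
).fetchall()
print(grouped)
```

For Cake the query returns the comment 'apple' (from SellId 2) next to SellId 1, a combination that does not exist in the table. That is fine if you only need some value per column, but not if you need a real row.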
SELECT S.ProductName, S.Comment, S.SellId
FROM
Sells S
JOIN (SELECT MAX(SellId) AS SellId
FROM Sells
GROUP BY ProductName) AS TopSell ON TopSell.SellId = S.SellId
This will get the latest comment as your selected comment assuming that SellId is an auto-incremented identity that goes up.
I know you've got an answer already, but I'd like to offer the approach that performed fastest for me in a similar situation. I'm assuming that SellId is the primary key and an identity column. You'd want an index on ProductName for best performance.
select
Sells.*
from
(
select
distinct ProductName
from
Sells
) x
join
Sells
on
Sells.ProductName = x.ProductName
and Sells.SellId =
(
select
top 1 s2.SellId
from
Sells s2
where
x.ProductName = s2.ProductName
Order By SellId
)
A slower method (but still better than GROUP BY with MIN on a long char column) is this:
select
*
from
(
select
*,ROW_NUMBER() over (PARTITION BY ProductName order by SellId) OccurenceId
from sells
) x
where
OccurenceId = 1
An advantage of this one is that it's much easier to read.
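For anyone who wants to try the ROW_NUMBER variant quickly, here is a sketch that runs it against the sample data from the question using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). This is SQLite rather than SQL Server, so treat it as an illustration of the technique only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sells (SellId INTEGER PRIMARY KEY, ProductName TEXT, Comment TEXT)"
)
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [(1, "Cake", "dasd"), (2, "Cake", "dasdasd"), (3, "Bread", "dasdasdd")],
)

# Number the rows inside each ProductName by SellId and keep only the first one.
first_per_product = conn.execute(
    """
    SELECT SellId, ProductName, Comment
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY SellId) AS OccurrenceId
        FROM Sells
    ) AS x
    WHERE OccurrenceId = 1
    ORDER BY SellId
    """
).fetchall()
print(first_per_product)
```

Unlike MIN on each column, every value here comes from the same row, because whole rows are numbered and then filtered.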
create table Sale
(
SaleId int not null
constraint PK_Sale primary key,
ProductName varchar(100) not null,
Comment varchar(100) not null
)
insert Sale
values
(1, 'Cake', 'dasd'),
(2, 'Cake', 'dasdasd'),
(3, 'Bread', 'dasdasdd')
-- Option #1 with over()
select *
from Sale
where SaleId in
(
select SaleId
from
(
select SaleId, row_number() over(partition by ProductName order by SaleId) RowNumber
from Sale
) tt
where RowNumber = 1
)
order by SaleId
-- Option #2
select *
from Sale
where SaleId in
(
select min(SaleId)
from Sale
group by ProductName
)
order by SaleId
drop table Sale
In SSRS I have a List with, say, a table with two columns: name and number e.g.:
NAME NUMBER
John 123
John 456
John NULL
Name is never null, but number may be. In this case I want the report to include the NULL in each group, like this:
GROUP 1:
John NULL
John 123
GROUP 2:
John NULL
John 456
The SSRS, however, puts the null in a group on its own. How do I accomplish this?
You have told SSRS to group on the NUMBER column, so it will generate a separate group for each value in the NUMBER column and then display those rows. To get what you want, you have to make the data set have the rows you want.
Select Name, Number, cast(Number as varchar(50)) as displayvalue
From mytable
UNION ALL
Select m.Name, m.Number, 'NULL' as displayvalue
From mytable m
Where exists(Select 1 from mytable where Name=m.Name and Number is NULL)
Group by Name, Number
Then group on the Number column but report on the DisplayValue column.