How to find and count unique values across columns? - snowflake-cloud-data-platform

I'm using Snowflake and I'm hoping someone can help me understand how to count the number of unique names across columns for each row, ignoring null values. Here is a sample of the data. As you can see below, I'm looking to count the number of distinct values across the columns Name 1, Name 2, Name 3, and Name 4.
ID | Type | Name 1 | Name 2 | Name 3 | Name 4 | Expected result
1 | animal | cat | Dog | null | Dog | 2
2 | animal | fish | cat | cat | cat | 2
3 | animal | fish | cat | dog | rat | 4
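A minimal setup sketch for this sample data (the table name yourTable and the column names Name1..Name4 are assumptions, chosen to match the query in the answer below):
-- Assumed setup; adjust names and types to your real table.
CREATE OR REPLACE TABLE yourTable (
    ID INT, Type STRING, Name1 STRING, Name2 STRING, Name3 STRING, Name4 STRING
);
INSERT INTO yourTable VALUES
    (1, 'animal', 'cat',  'Dog', NULL,  'Dog'),
    (2, 'animal', 'fish', 'cat', 'cat', 'cat'),
    (3, 'animal', 'fish', 'cat', 'dog', 'rat');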

You could use an unpivot approach:
SELECT ID, Type, COUNT(DISTINCT Name) AS cnt
FROM
(
SELECT ID, Type, Name1 AS Name FROM yourTable UNION ALL
SELECT ID, Type, Name2 FROM yourTable UNION ALL
SELECT ID, Type, Name3 FROM yourTable UNION ALL
SELECT ID, Type, Name4 FROM yourTable
) t
GROUP BY ID, Type;
This approach works by unpivoting the name data to a format where one record has just one ID and one name. Then, we aggregate and take the distinct count. The COUNT() function works well here, because by default it ignores NULL values, which is the behavior you want.
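As an alternative sketch (assuming the same yourTable layout as above, and a Snowflake version where ARRAY_DISTINCT is available), you can also count per row without unpivoting, by packing the columns into an array, dropping NULLs, and counting the distinct elements:
-- Sketch: ARRAY_CONSTRUCT_COMPACT drops NULLs, ARRAY_DISTINCT removes
-- duplicates, ARRAY_SIZE counts what is left, all within a single row.
SELECT ID, Type,
       ARRAY_SIZE(ARRAY_DISTINCT(
           ARRAY_CONSTRUCT_COMPACT(Name1, Name2, Name3, Name4))) AS cnt
FROM yourTable;
Note that string comparisons here are case-sensitive, so 'cat' and 'Cat' would count as two distinct values.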

Related

Create 1 array with 2 fields from 2 csv fields in BigQuery

I am currently trying to work through this and I'm unsure as to how to proceed. I have the below data
| ID | name | value |
|:---- |:------: | -----: |
| One | a,b,c | 10,20,30 |
I would like to turn it into
| ID | properties.name | properties.value |
|:---- |:------: | -----: |
| One | a | 10 |
| | b | 20 |
| | c | 30 |
The below query looked like it was working but instead of having an array it created a nested record with 2 array fields.
SELECT ID
, name
, value
, array (
select as struct
split(name, ',') as name
, split(value, ',') as value
) as properties
FROM `orders`
Consider the below approach:
select id, array(
select as struct name, value
from unnest(split(name)) name with offset
join unnest(split(value)) value with offset
using(offset)
) as properties
from `orders`
If applied to the sample data in your question, the output matches the expected result shown above.
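For reference, a self-contained sketch of the same query, with the single sample row from the question inlined as a CTE (replace the CTE with your real orders table):
-- Sketch: SPLIT defaults to ',' as the delimiter; WITH OFFSET exposes each
-- element's position so the two arrays can be joined element by element.
WITH orders AS (
  SELECT 'One' AS id, 'a,b,c' AS name, '10,20,30' AS value
)
SELECT id, ARRAY(
  SELECT AS STRUCT name, value
  FROM UNNEST(SPLIT(name)) name WITH OFFSET
  JOIN UNNEST(SPLIT(value)) value WITH OFFSET
  USING (offset)
) AS properties
FROM orders;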

Unpivot PostgreSQL large number of columns?

I am trying to unpivot a large dataset with 250 columns. There is a very well-documented solution here: unpivot and PostgreSQL.
However, it requires the column names to be entered manually. I'm looking to do something like:
extract all column names into an array
pass the array through unnest
OR,
extract all column names into an array
loop the array by indexing through
using column name values as an input in the unnest
Apologies for being a noob, I'm new to SQL!
This dataset is good enough for demonstration purposes:
CREATE TEMP TABLE foo (id int, a text, b text, c text);
INSERT INTO foo VALUES (1, 'ant', 'cat', 'chimp'), (2, 'grape', 'mint', 'basil');
SELECT id,
unnest(array['a', 'b', 'c']) AS colname,
unnest(array[a, b, c]) AS thing
-- I would like something like.. unnest(array[column_names]) AS thing
-- where column_names = [a,b,c.. so on]
FROM foo
ORDER BY id;
Expected outcome:
id | colname | thing
1 | a | ant
1 | b | cat
1 | c | chimp
2 | a | grape
2 | b | mint
2 | c | basil
Use JSONB functions, example:
select id, key as colname, value as thing
from foo t
cross join jsonb_each_text(to_jsonb(t) - 'id')
id | colname | thing
----+---------+-------
1 | a | ant
1 | b | cat
1 | c | chimp
2 | a | grape
2 | b | mint
2 | c | basil
(6 rows)
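Since to_jsonb(t) serializes the whole row, this works no matter how many columns the table has. A slightly more explicit sketch of the same idea (assuming the foo table above), with the output columns aliased and a deterministic ordering:
-- Sketch: to_jsonb(t) turns the row into key/value pairs; subtracting 'id'
-- keeps it out of the unpivoted output. jsonb_each_text returns text values.
SELECT t.id, j.key AS colname, j.value AS thing
FROM foo t
CROSS JOIN LATERAL jsonb_each_text(to_jsonb(t) - 'id') AS j(key, value)
ORDER BY t.id, j.key;
Be aware that every value comes back as text, which is usually acceptable for a wide unpivot like this.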

sum column with duplicates in another table

I'm getting the wrong result. I have two tables, Order and Staging.
The Order table has the following structure:
+-------+---------+-------------+---------------+----------+
| PO | cashAmt | ClaimNumber | TransactionID | Supplier |
+-------+---------+-------------+---------------+----------+
| 12345 | 100 | 99876 | abc123 | 0101 |
| 12346 | 50 | 99875 | abc123 | 0102 |
| 12345 | 100 | 99876 | abc123 | 0101 |
+-------+---------+-------------+---------------+----------+
The Staging table has the following structure:
+----------+------------+-------------+---------------+
| PONumber | paymentAmt | ClaimNumber | TransactionID |
+----------+------------+-------------+---------------+
| 12345 | 100 | 99876 | abc123 |
| 12346 | 50 | 99875 | abc123 |
+----------+------------+-------------+---------------+
The query I am executing is:
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM [order] with (nolock)
WHERE TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging with (nolock)
where TransactionID='abc123'
but the sum is getting messed up because there are duplicate rows in one of the tables.
How can I edit the query so that I only get unique rows from the Order table and the sums are correct?
First, ask yourself why there are duplicates in the Order table. There must be a reason why they are there; I would deal with that first.
That issue aside, if the duplicates in the Orders table have a purpose and yet are not to be considered for this particular query, then you should be able to leave out the duplicates by simply changing the query to use DISTINCT on whatever field in the Orders table can reliably identify a duplicate.
select Distinct fieldname, sum(cashAmt)... etc.
Assuming duplicates in your table are OK.
Not sure why you are using NOLOCK; it seems like it shouldn't be included.
You could use a table variable to store the distinct values. You'll need to adjust the data types in the table variable to match your table structure.
I haven't tested the code below but it should look something like this.
DECLARE @OrderTmp TABLE (
cashAmt numeric(10,2)
, ClaimNumber int
, TransactionID varchar(50)
)
INSERT INTO @OrderTmp
select Distinct
cashAmt
,ClaimNumber
,TransactionID
FROM
[order]
WHERE TransactionID='abc123'
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM @OrderTmp
where TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging
where TransactionID='abc123'
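A more compact sketch of the same idea, deduplicating the Order table in a derived table instead of a table variable. This assumes the duplicate rows are exact copies across these three columns, and it uses UNION ALL so both result rows are kept even if the two totals happen to be equal:
-- Sketch: dedupe [order] inline, then total each source separately.
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
from (select distinct cashAmt, ClaimNumber, TransactionID
      from [order]
      where TransactionID = 'abc123') o
union all
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging
where TransactionID = 'abc123'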

Unpivot SQL each row in SQL table to key-value pairs with group IDs

Running SQL Server 2012, I have a table in the following format:
ENC_ID | Name | ED_YN | Seq
-------------------------------------
1234 | John | Y | 1
1234 | Sally | N | 2
2345 | Chris | N | 1
2345 | Sally | N | 2
I would like to unpivot this into an entity-attribute-value list (if that's the right terminology - I am thinking of them as key-value pairs grouped by IDs), with the following format:
ENC_ID | Seq | Key | Value
--------------------------------------
1234 | 1 | Name | John
1234 | 1 | ED_YN | Y
1234 | 2 | Name | Sally
1234 | 2 | ED_YN | N
2345 | 1 | Name | Chris
2345 | 1 | ED_YN | N
2345 | 2 | Name | Sally
2345 | 2 | ED_YN | N
I have seen various answers to this using UNION or UNPIVOT, but these solutions tend to be long and must be closely customized to the table. I'm looking for a solution that can be reused without a great deal of rewriting as this pattern solves a problem I expect to run into frequently (ingesting data from star-schema into Tableau via extracts, where I don't necessarily know the number of value columns).
The closest thing I've found to solve this is this question/answer but I haven't had success altering the solution to add ENC_ID and Seq to each row in the result table.
Thanks.
I would use CROSS APPLY with a table-valued constructor to do this unpivoting:
select ENC_ID, Seq, [Key], [Value]
from yourtable
cross apply
(values ('Name', Name),
('ED_YN', ED_YN)) CS ([Key], [Value])
You can try the query below; it uses the classic UNPIVOT syntax.
Since I am unsure about your column types, I have cast both of them as varchar(100). You can increase the length from 100 if needed.
select enc_id,seq, k [key],v [value] from
(select enc_id,seq,cast(name as varchar(100)) as name, cast(ed_yn as varchar(100)) as ed_yn from r)s
UNPIVOT
( v for k in ([name],[ed_yn])
)up
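Since the goal is a pattern you can reuse on tables whose value columns aren't known in advance, one possible extension (a sketch only, untested against your schema; dbo.YourTable, ENC_ID and Seq are placeholders for your own names) is to build the CROSS APPLY value list dynamically from sys.columns:
-- Sketch: generate one ('ColumnName', CAST(ColumnName AS varchar(100))) pair
-- per non-key column, then run the same CROSS APPLY unpivot dynamically.
DECLARE @cols nvarchar(max), @sql nvarchar(max);
SELECT @cols = STUFF((
    SELECT ', (''' + c.name + ''', CAST(' + QUOTENAME(c.name) + ' AS varchar(100)))'
    FROM sys.columns c
    WHERE c.object_id = OBJECT_ID('dbo.YourTable')
      AND c.name NOT IN ('ENC_ID', 'Seq')      -- keep the grouping columns as-is
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, '');
SET @sql = N'SELECT ENC_ID, Seq, [Key], [Value]
FROM dbo.YourTable
CROSS APPLY (VALUES ' + @cols + N') v([Key], [Value]);';
EXEC sys.sp_executesql @sql;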

SQLite Complex Query Help

I have a very complex query going on in SQLite, and I need a bit of help understanding how to do it.
The following example is of my Database:
Category:
CatID | CatTitle
----------------
1 | XYZ
2 | Sample
Content:
ItemID | ItemCatID | ItemText | ItemText2 | ItemText3 | ItemText4
-----------------------------------------------------------------
1 | 1 | Test | Bla | Sample | MoreContent
2 | 1 | Test2 | BlaBla | Sample2 | Other Content
3 | 2 | Test3 | BlaBla2 | Sample3 | Other Content2
Now, I basically want to run a query on a search string (e.g. "%XYZ%", with the wildcards), and I want to search not only CatTitle, but also ItemText, ItemText2, and ItemText3.
The columns I want to return are: CatID, CatTitle, ItemID, and if possible "ItemFilteredText" (which can be anything from ItemText to ItemText4, depending on which one matched).
Here's the thing: if the query matches CatTitle, then the ItemID it returns should be the FIRST ItemID, NOT the LAST ItemID.
I have the following SQL somewhat working...
select DISTINCT Category.CatID,
Category.CatTitle,
Content.ItemID,
Content.ItemText,
Content.ItemText2,
Content.ItemText3
from Content,Category
where Category.CatID = Content.ItemCatID
AND ( Category.CatTitle like '%XYZ%'
OR Content.ItemText like '%XYZ%'
OR Content.ItemText2 like '%XYZ%'
OR Content.ItemText3 like '%XYZ%')
GROUP BY Category.CatTitle ORDER BY Category.CatID ASC
It returns the data... but grouping by Category.CatTitle means that Content.ItemID returns 2 instead of 1.
Any ideas?
Assuming that you are using integers for Content.ItemID and that the first result is the one with the lower ItemID, modify the query to use min(Content.ItemID):
select DISTINCT Category.CatID,
       Category.CatTitle,
       min(Content.ItemID),
       Content.ItemText,
       Content.ItemText2,
       Content.ItemText3
from Content, Category
where Category.CatID = Content.ItemCatID
  AND (Category.CatTitle like '%XYZ%'
       OR Content.ItemText like '%XYZ%'
       OR Content.ItemText2 like '%XYZ%'
       OR Content.ItemText3 like '%XYZ%')
GROUP BY Category.CatTitle ORDER BY Category.CatID ASC
This should do the job.
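For completeness, a self-contained sketch built from the sample tables above. It relies on a documented SQLite behavior: when min() or max() is the only aggregate, the other selected columns are taken from the same row that supplied the minimum or maximum, so ItemText etc. stay consistent with the returned ItemID.
-- Sketch: sample schema and data from the question.
CREATE TABLE Category (CatID INTEGER, CatTitle TEXT);
CREATE TABLE Content (ItemID INTEGER, ItemCatID INTEGER,
                      ItemText TEXT, ItemText2 TEXT, ItemText3 TEXT, ItemText4 TEXT);
INSERT INTO Category VALUES (1, 'XYZ'), (2, 'Sample');
INSERT INTO Content VALUES
  (1, 1, 'Test',  'Bla',     'Sample',  'MoreContent'),
  (2, 1, 'Test2', 'BlaBla',  'Sample2', 'Other Content'),
  (3, 2, 'Test3', 'BlaBla2', 'Sample3', 'Other Content2');
-- Because MIN() is the only aggregate, the text columns come from the row
-- with the smallest ItemID in each group.
SELECT Category.CatID, Category.CatTitle,
       MIN(Content.ItemID) AS ItemID,
       Content.ItemText, Content.ItemText2, Content.ItemText3
FROM Content
JOIN Category ON Category.CatID = Content.ItemCatID
WHERE Category.CatTitle LIKE '%XYZ%'
   OR Content.ItemText  LIKE '%XYZ%'
   OR Content.ItemText2 LIKE '%XYZ%'
   OR Content.ItemText3 LIKE '%XYZ%'
GROUP BY Category.CatTitle
ORDER BY Category.CatID ASC;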
