Can I index into an array literal in a SELECT statement?

I am reading from a PostgreSQL database (that I do not control) that includes an integer column that acts like an enum, but the enum values are not in the database.
This is not my actual data, but consider an example students table:
id | name  | class
---+-------+------
 1 | Adam  | 1
 2 | Bruce | 1
 3 | Chris | 3
 4 | Dave  | 4
When SELECTing from this table, it is very common to convert the class column to something more human-readable:
SELECT
    id,
    name,
    CASE class
        WHEN 1 THEN 'Freshman'
        WHEN 2 THEN 'Sophomore'
        WHEN 3 THEN 'Junior'
        WHEN 4 THEN 'Senior'
        ELSE 'Unknown'
    END
FROM students
Is there a better way to write this? I tried constructing a literal array and using the column to index into it, but if that is possible, I have not figured out the syntax yet.
These don't work:
SELECT {"fr", "so", "ju", "se"}[class] FROM students
SELECT '{"fr", "so", "ju", "se"}'[class] FROM students

You can do this using the ARRAY keyword to construct the array:
SELECT (ARRAY['fr', 'so', 'ju', 'se'])[class] FROM students
In PostgreSQL, array subscripts begin with 1, not 0, so if the enum-like column uses 0 as one of its values, you may need to shift it to get what you want. For example, if the values for class were 0, 1, 2, and 3, you can add 1 to class:
SELECT (ARRAY['fr', 'so', 'ju', 'se'])[class + 1] FROM students
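Note that in PostgreSQL an out-of-range subscript returns NULL rather than raising an error, so the CASE expression's ELSE 'Unknown' fallback can be recovered with COALESCE. A minimal sketch:
SELECT COALESCE((ARRAY['fr', 'so', 'ju', 'se'])[class], 'Unknown') FROM students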

Make it a CTE and do the usual join:
with class (class, class_name) as (
    values (1, 'Freshman'), (2, 'Sophomore'), (3, 'Junior'), (4, 'Senior')
), student (id, name, class) as (
    values (1, 'Adam', 1), (2, 'Bruce', 1), (3, 'Chris', 3), (4, 'Dave', 4)
)
select id, name, class_name
from student
inner join class using (class);
id | name | class_name
----+-------+------------
1 | Adam | Freshman
2 | Bruce | Freshman
3 | Chris | Junior
4 | Dave | Senior
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-WITH
The WITH clause allows you to specify one or more subqueries that can be referenced by name in the primary query. The subqueries effectively act as temporary tables or views for the duration of the primary query.
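Applied to the real students table, only the lookup needs to be inlined; a LEFT JOIN plus COALESCE preserves the 'Unknown' fallback from the original CASE expression. A sketch, using the hypothetical CTE name class_names:
with class_names (class, class_name) as (
    values (1, 'Freshman'), (2, 'Sophomore'), (3, 'Junior'), (4, 'Senior')
)
select s.id, s.name, coalesce(c.class_name, 'Unknown') as class_name
from students s
left join class_names c using (class);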

Related

How can I do a custom order in Snowflake?

In Snowflake, how do I define a custom sorting order?
ID | Language | Text
---+----------+-----
 0 | ENU      | a
 0 | JPN      | b
 0 | DAN      | c
 1 | ENU      | d
 1 | JPN      | e
 1 | DAN      | f
 2 | etc...
Here I want to return all rows sorted by Language in this order: ENU comes first, then JPN, and lastly DAN.
Is this even possible?
I would like to order by language within each ID, cycling through ENU, JPN, DAN: ENU, JPN, DAN, ENU, JPN, DAN, ENU, JPN, DAN
NOT: ENU, ENU, ENU, JPN, JPN, JPN, DAN, DAN, DAN
I liked the array_position solution from Phil Coulson. It's also possible to use DECODE:
create or replace table mydata ( ID number, Language varchar, Text varchar )
as select * from values
    (0, 'JPN', 'b'),
    (0, 'DAN', 'c'),
    (0, 'ENU', 'a'),
    (1, 'JPN', 'e'),
    (1, 'ENU', 'd'),
    (1, 'DAN', 'f');

select * from mydata
order by ID, DECODE(Language, 'ENU', 0, 'JPN', 1, 'DAN', 2);
+----+----------+------+
| ID | LANGUAGE | TEXT |
+----+----------+------+
| 0 | ENU | a |
| 0 | JPN | b |
| 0 | DAN | c |
| 1 | ENU | d |
| 1 | JPN | e |
| 1 | DAN | f |
+----+----------+------+
You basically need 2 levels of sort. I am using arrays to arrange the languages in the order I want, and then array_position to assign every language an index by which it will be sorted. You can achieve the same using either a case expression or decode. To make sure the languages don't repeat within the same id, we use row_number. You can comment out the row_number() line if that's not a requirement.
with cte (id, lang) as
(select 0,'JPN' union all
select 0,'ENU' union all
select 0,'DAN' union all
select 0,'ENU' union all
select 0,'JPN' union all
select 0,'DAN' union all
select 1,'JPN' union all
select 1,'ENU' union all
select 1,'DAN' union all
select 1,'ENU' union all
select 1,'JPN' union all
select 1,'DAN')
select *
from cte
order by id,
row_number() over (partition by id, array_position(lang::variant,['ENU','JPN','DAN']) order by lang), --in case you want languages to not repeat within each id
array_position(lang::variant,['ENU','JPN','DAN'])
This is not to say other answers are wrong.
But here's yet another option, using ANSI SQL CASE:
SELECT * FROM "Example"
ORDER BY
CASE "Language" WHEN 'ENU' THEN 1
WHEN 'JPN' THEN 2
WHEN 'DAN' THEN 3
ELSE 4 END
,"Language";
Notice the "Language" code is used as a disambiguation for 'other' languages not specified.
It's good defensive programming when dealing with CASE to deal with ELSE.
The ultimate most flexible answer is to have a table with a collation order for languages in it.
Collation order columns are common in many applications.
I've seen them for things like multiple parties to a contract who should appear in a specified order to (of course) the positional order of columns in a table of metadata.
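A minimal sketch of the lookup-table approach, with hypothetical names (language_order, sort_rank); the LEFT JOIN plus COALESCE preserves the CASE expression's ELSE bucket for unlisted languages:
CREATE TABLE language_order (
    language  VARCHAR(3) PRIMARY KEY,
    sort_rank INT NOT NULL
);
INSERT INTO language_order VALUES ('ENU', 1), ('JPN', 2), ('DAN', 3);

SELECT e.*
FROM "Example" e
LEFT JOIN language_order lo ON lo.language = e."Language"
ORDER BY COALESCE(lo.sort_rank, 999), e."Language";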

Is there a way in a SQL language (pref T-SQL) to determine the position of a record in a set of consecutively ordered data for multiple records?

So say, for example, I have 2 tables with large data sets: one containing a list of users with their UserID, and another with their orders, which has an OrderNumber field (a serial primary key incrementing as new orders are created) and a field with the UserID that placed the order.
These might look something like this:
USERS
-----
userid
1
2
3
4
5
ORDERS
-----
ordernumber | userid
1 | 1
2 | 2
3 | 1
4 | 1
5 | 3
6 | 1
7 | 2
Now, I have a list of order numbers, and I need to find each order's position in chronological order (different from a count) within each user's order history. Some users may have multiple orders in the list.
So input would be something like
ordernumber in (4, 5, 6, 7)
and based on all of the data above, it would return something like this, where "accountsorder" is the position of the order in the account's order history:
ordernumber | accountsorder
4 | 3
5 | 1
6 | 4
7 | 2
Let me know if I need to be more specific.
Thanks!
If I understand correctly, you want row_number() and subsequent filtering:
select o.*
from (select o.*,
             row_number() over (partition by userid order by ordernumber) as accountsorder
      from orders o
     ) o
where ordernumber in (4, 5, 6, 7)
order by ordernumber;
If you say "chronological," then I think you definitely need to be referring to a date/time field within the order table. You must never assume any particular "order" for rows in an SQL table: you must ORDER BY something.

Query from multiple XML columns in SQL Server

Environment: SQL Server 2016; abstracted example of "real" data.
From a first query to a table containing XML data I have a SQL result set that has the following columns:
ID (int)
Names (XML)
Times (XML)
Values (XML)
Columns 2-4 contain multiple values in XML format, e.g.
Names:
Row 1: <name>TestR1</name><name>TestR2</name>...
Row 2: <name>TestS1</name><name>TestS2</name>...
Times:
Row 1: <time>0.1</time><time>0.2</time>...
Row 2: <time>-0.1</time><time>-0.2</time>...
Values:
Row 1: <value>1.1</value><value>1.2</value>...
Row 2: <value>-1.1</value><value>-1.2</value>...
The XML in all of the XML columns contains exactly the same number of elements.
What I want now is to create a select that has the following output:
| ID | Name | Time | Value |
+----+--------+------+-------+
| 1 | TestR1 | 0.1 | 1.1 |
| 1 | TestR1 | 0.2 | 1.2 |
| .. | ...... | .... | ..... |
| 2 | TestS1 | -0.1 | -1.1 |
| 2 | TestS2 | -0.2 | -1.2 |
| .. | ...... | .... | ..... |
For a single column CROSS APPLY works fine:
SELECT ID, N.value('.', 'nvarchar(50)') AS ExtractedName
FROM <source>
CROSS APPLY <source>.nodes('/name') AS T(N)
Applying multiple CROSS APPLY statements makes no sense here to me.
I would guess it would work if I created a select for each column, producing individual result sets, and then performed a select over all of those result sets, but that's very likely not the best solution, as I would be duplicating selects for each additional column.
Any suggestions on how to design a query like that would be highly appreciated!
I'd suggest this approach:
First I create a declared table variable and fill it with your sample data to simulate your issue. This is called an MCVE (minimal, complete and verifiable example); please try to provide this yourself in your next question.
DECLARE @tbl TABLE(ID INT, Names XML, Times XML, [Values] XML);
INSERT INTO @tbl VALUES
 (1,'<name>TestR1</name><name>TestR2</name>','<time>0.1</time><time>0.2</time>','<value>1.1</value><value>1.2</value>')
,(2,'<name>TestS1</name><name>TestS2</name>','<time>0.3</time><time>0.4</time>','<value>2.1</value><value>2.2</value>');
--The query
SELECT t.ID
      ,t.Names.value('(/name[sql:column("tally.Nmbr")])[1]','nvarchar(max)') AS [Name]
      ,t.Times.value('(/time[sql:column("tally.Nmbr")])[1]','decimal(10,4)') AS [Time]
      ,t.[Values].value('(/value[sql:column("tally.Nmbr")])[1]','decimal(10,4)') AS [Value]
FROM @tbl t
CROSS APPLY
(
    SELECT TOP(t.Names.value('count(*)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) Nmbr
    FROM master..spt_values
) tally;
The idea in short:
We create a tally on the fly, using APPLY to produce a list of numbers.
The TOP clause limits this list to the count of <name> elements in the given row.
In this case I take master..spt_values as a source of many rows. We do not need its content, just a sufficiently long list to build the tally. That said, if there is a physical numbers table in your database, that would be even better.
Finally we pick the content by the element's position, using sql:column() to introduce the tally's value into the XQuery predicate.
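For the sample rows declared above, the query should return something like:

ID | Name   | Time   | Value
---+--------+--------+-------
 1 | TestR1 | 0.1000 | 1.1000
 1 | TestR2 | 0.2000 | 1.2000
 2 | TestS1 | 0.3000 | 2.1000
 2 | TestS2 | 0.4000 | 2.2000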

Using STRING_SPLIT for 2 columns in a single table

I've started from a table like this
ID | City                                | Sales
1  | London,New York,Paris,Berlin,Madrid | 20,30,,50
2  | Istanbul,Tokyo,Brussels             | 4,5,6
There can be an unlimited amount of cities and/or sales.
I need to get each city and their salesamount their own record. So my result should look something like this:
ID | City | Sales
1 | London | 20
1 | New York | 30
1 | Paris |
1 | Berlin | 50
1 | Madrid |
2 | Istanbul | 4
2 | Tokyo | 5
2 | Brussels | 6
What I got so far is
SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APPLY STRING_SPLIT(Table.City, ',') splitC
CROSS APPLY STRING_SPLIT(Table.Sales, ',') splitS
With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense, I think, because it's trying to split the sales for each city again).
What would be an option to solve this issue? STRING_SPLIT is not necessary; it's just how I started on it.
STRING_SPLIT() is not an option, because (as is mentioned in the documentation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try a JSON-based approach, using OPENJSON() and a string transformation (the comma-separated values are turned into a valid JSON array: London,New York,Paris,Berlin,Madrid becomes ["London","New York","Paris","Berlin","Madrid"]). The result of OPENJSON() with the default schema is a table with columns key, value and type, where the key column holds the 0-based index of each item in the array:
Table:
CREATE TABLE Data (
ID int,
City varchar(1000),
Sales varchar(1000)
)
INSERT INTO Data
(ID, City, Sales)
VALUES
(1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
(2, 'Istanbul,Tokyo,Brussels', '4,5,6')
Statement:
SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
SELECT c.[value] AS City, s.[value] AS Sales
FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s
ON c.[key] = s.[key]
) a
Result:
ID City Sales
1 London 20
1 New York 30
1 Paris
1 Berlin 50
1 Madrid NULL
2 Istanbul 4
2 Tokyo 5
2 Brussels 6
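As a side note: on SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts a third enable_ordinal argument, which adds an ordinal column reflecting each substring's position in the input. With that version available, a sketch much closer to the original attempt becomes safe:
SELECT d.ID, c.value AS City, s.value AS Sales
FROM Data d
CROSS APPLY STRING_SPLIT(d.City, ',', 1) c
OUTER APPLY (
    SELECT value FROM STRING_SPLIT(d.Sales, ',', 1) WHERE ordinal = c.ordinal
) s;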
STRING_SPLIT has no concept of ordinal positions. In fact, the documentation specifically states that it doesn't care about them:
The order of the output may vary, as the order is not guaranteed to match the order of the substrings in the input string.
As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.
Then you can do something like this:
WITH Cities AS(
    SELECT ID,
           DSc.Item,
           DSc.ItemNumber
    FROM dbo.YourTable YT
    CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc),
Sales AS(
    SELECT ID,
           DSs.Item,
           DSs.ItemNumber
    FROM dbo.YourTable YT
    CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
       C.Item AS City,
       S.Item AS Sale
FROM Cities C
     FULL OUTER JOIN Sales S ON C.ID = S.ID
                            AND C.ItemNumber = S.ItemNumber;
Of course, however, the real solution is to fix your design. This type of design is only going to cause you hundreds of problems in the future. Fix it now, not later; the sooner you do it, the more rewards you'll reap.
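A sketch of what the fixed design might look like (hypothetical names), with one row per city/sales pair so no string splitting is ever needed:
CREATE TABLE CitySales (
    ID    int          NOT NULL, -- references the parent row
    Seq   int          NOT NULL, -- position within the original list
    City  varchar(100) NOT NULL,
    Sales int          NULL,     -- NULL where the source had an empty slot
    PRIMARY KEY (ID, Seq)
);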

Find "regional" relationships in SQL data using a query, or SSIS

Edit for clarification: I am compiling data weekly, based on Zip_Code, but some Zip_Codes are redundant. I know I should be able to compile a small amount of data, and derive the redundant zip_codes if I can establish relationships.
I want to define a zip code's region by the unique set of items and values that appear in that zip code, in order to create a "Region Table".
I am looking to find relationships by zip code with certain data. Ultimately, I have tables which include similar values for many zip codes.
I have data similar to:
ItemCode | Value | Zip_Code
---------+-------+---------
1        | 10    | 1
2        | 15    | 1
3        | 5     | 1
1        | 10    | 2
2        | 15    | 2
3        | 5     | 2
1        | 10    | 3
2        | 10    | 3
3        | 15    | 3
Or to simplify the idea, I could even concatenate ItemCode + Value into unique values:
ItemCode+Value | Zip_Code
---------------+---------
A              | 1
B              | 1
C              | 1
A              | 2
B              | 2
C              | 2
A              | 3
D              | 3
E              | 3
As you can see, Zip_Code 1 and 2 have the same distinct ItemCode and Value. Zip_Code 3 however, has different values for certain ItemCodes.
I need to create a table that establishes a relationship between Zip_Codes that contain the same data.
The final table will look something like:
Zip_Code | Region
1 | 1
2 | 1
3 | 2
4 | 2
5 | 1
6 | 3
...etc
This will allow me to collect data only once for each unique Region, and derive the zip_code appropriately.
Things I'm doing now:
I am currently using a query, similar to a join, that compares data between two Zip_Codes, along the lines of:
SELECT a.ItemCode
,a.value
,a.zip_code
,b.ItemCode
,b.value
,b.zip_code
FROM mytable as a, mytable as b -- select from table twice, similar to a join
WHERE a.zip_code = 1 -- left table will have all ItemCode and Value from zip 1
AND b.zip_code = 2 -- right table will have all ItemCode and Value from zip 2
AND a.ItemCode = b.ItemCode -- matches rows on ItemCode
AND a.Value != b.Value
ORDER BY ItemCode
This returns nothing if the two zip codes have exactly the same ItemCode and Value pairs, and returns a slew of differences between the two zip codes if there are any.
This needs to move from a manual process to an automated process however, as I am now working with more than 100 zip_codes.
I do not have much programming experience in specific languages, so tools in SSIS are somewhat limited to me. I have some experience using the Fuzzy tools, and feel like there might be something in Fuzzy Grouping that might shine a light on apparent regions, but can't figure out how to set it up.
Does anyone have any suggestions? I have access to SQLServ and its related tools, and Visual Studio. I am trying to avoid writing a program to automate this, as my c# skills are relatively nooby, but will figure it out if necessary.
Sorry for being so verbose: This is my first Question, and the page I agreed to in order to ask a question suggested to explain in detail, and talk about what I've tried...
Thanks in advance for any help I might receive.
Give this a shot (I used the simplified example, but this can easily be expanded). I think the really interesting part of this code is the recursive CTE...
;with matches as (
    --Find all pairs of zip_codes that have matching values.
    select d1.ZipCode zc1, d2.ZipCode zc2
    from data d1
    join data d2 on d1.Val=d2.Val
    group by d1.ZipCode, d2.ZipCode
    having count(*) = (select count(distinct Val) from data where ZipCode = d1.ZipCode)
), cte as (
    --Trace each zip_code to its "smallest" matching zip_code id.
    select zc1 tempRegionID, zc2 ZipCode
    from matches
    where zc1<=zc2
    UNION ALL
    select c.tempRegionID, m.zc2
    from cte c
    join matches m on c.ZipCode=m.zc1
                  and c.ZipCode!=m.zc2
    where m.zc1<=m.zc2
)
--For each zip_code, use its smallest matching zip_code as its region.
select ZipCode, min(tempRegionID) as regionID
from cte
group by ZipCode
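For the simplified A/B/C data above (zip codes 1, 2, and 3), this should yield regions keyed by the smallest matching zip code, so the region IDs come out as 1 and 3 rather than the question's 1 and 2:

ZipCode | regionID
--------+---------
1       | 1
2       | 1
3       | 3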
Demonstrating that there's a use for everything, though normally it makes me cringe: concatenate the values for each zip code into a single field. Store ZipCode and ConcatenatedValues in a lookup table (PK on the one, UQ on the other). Now you can assess which zip codes are in the same region by grouping on ConcatenatedValues.
Here's a simple function to concatenate text data:
CREATE TYPE dbo.List AS TABLE
(
    Item VARCHAR(1000)
)
GO
CREATE FUNCTION dbo.Implode (@List dbo.List READONLY, @Separator VARCHAR(10) = ',') RETURNS VARCHAR(MAX)
AS BEGIN
    DECLARE @Concat VARCHAR(MAX)
    SELECT @Concat = CASE WHEN Item IS NULL THEN @Concat ELSE COALESCE(@Concat + @Separator, '') + Item END FROM @List
    RETURN @Concat
END
GO
DECLARE @List AS dbo.List
INSERT INTO @List (Item) VALUES ('A'), ('B'), ('C'), ('D')
SELECT dbo.Implode(@List, ',')
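On SQL Server 2017 and later, STRING_AGG can replace the manual concatenation function entirely. A sketch against the simplified data(ZipCode, Val) table used in the first answer; WITHIN GROUP fixes the ordering so equal sets always produce identical strings:
SELECT ZipCode,
       STRING_AGG(Val, ',') WITHIN GROUP (ORDER BY Val) AS ConcatenatedValues
FROM data
GROUP BY ZipCode;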
