SQL Server - concatenate a few column values into one column on join - sql-server

After joining multiple tables I have some results that differ only in one column. Is there an easy way to compact those differences into one row?
For example, let's assume that after the join I have something like this:
id | project | date     | Oranges | Apples | Name
1  | xaxa    | 1.1.2000 | yes     | yes    | Tom
1  | xaxa    | 1.1.2000 | yes     | yes    | Bob
1  | xaxa    | 1.1.2000 | yes     | yes    | Jan
And I would like to have something like this:
id | project | date     | Oranges | Apples | Name
1  | xaxa    | 1.1.2000 | yes     | yes    | Tom, Bob, Jan
Still a beginner here, please be gentle :)

select id, project, date, Oranges, Apples,
    Names = stuff((
        select distinct ', ' + i.Name
        from t as i
        where i.id = t.id
          and i.project = t.project
          and i.date = t.date
          and i.Oranges = t.Oranges
          and i.Apples = t.Apples
        order by ', ' + i.Name
        for xml path (''), type).value('.', 'varchar(max)')
    , 1, 2, '')
from t
group by id, project, date, Oranges, Apples
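As a cross-check of the grouped-concatenation idea, here is a minimal sketch in Python using sqlite3, with SQLite's group_concat standing in for the STUFF(... FOR XML PATH ...) trick (or for STRING_AGG on SQL Server 2017+). The table name t and the data mirror the question; this is an illustration, not the original answer's code.

```python
# Grouped string concatenation: one row per group, names joined with ', '.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, project TEXT, date TEXT, "
             "Oranges TEXT, Apples TEXT, Name TEXT)")
conn.executemany("INSERT INTO t VALUES (?,?,?,?,?,?)", [
    (1, "xaxa", "1.1.2000", "yes", "yes", "Tom"),
    (1, "xaxa", "1.1.2000", "yes", "yes", "Bob"),
    (1, "xaxa", "1.1.2000", "yes", "yes", "Jan"),
])
rows = conn.execute(
    "SELECT id, project, date, Oranges, Apples, group_concat(Name, ', ') "
    "FROM t GROUP BY id, project, date, Oranges, Apples").fetchall()
print(rows)
```

Note that, like STRING_SPLIT, group_concat makes no ordering promise unless you sort explicitly, so the names may come out in any order.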

Related

How can I do a custom order in Snowflake?

In Snowflake, how do I define a custom sorting order?
ID | Language | Text
0  | ENU      | a
0  | JPN      | b
0  | DAN      | c
1  | ENU      | d
1  | JPN      | e
1  | DAN      | f
2  | etc...
Here I want to return all rows sorted by Language in this order: ENU comes first, then JPN, and lastly DAN.
Is this even possible?
I would like to order by Language repeating within each ID, in this order: ENU, JPN, DAN, ENU, JPN, DAN, ENU, JPN, DAN
NOT: ENU, ENU, ENU, JPN, JPN, JPN, DAN, DAN, DAN
I liked Phil Coulson's array_position solution. It's also possible to use DECODE:
create or replace table mydata (ID number, Language varchar, Text varchar)
as select * from values
    (0, 'JPN', 'b'),
    (0, 'DAN', 'c'),
    (0, 'ENU', 'a'),
    (1, 'JPN', 'e'),
    (1, 'ENU', 'd'),
    (1, 'DAN', 'f');

select * from mydata
order by ID, DECODE(Language, 'ENU', 0, 'JPN', 1, 'DAN', 2);
+----+----------+------+
| ID | LANGUAGE | TEXT |
+----+----------+------+
| 0 | ENU | a |
| 0 | JPN | b |
| 0 | DAN | c |
| 1 | ENU | d |
| 1 | JPN | e |
| 1 | DAN | f |
+----+----------+------+
You basically need two levels of sort. I am using an array to arrange the languages in the order I want, and then array_position to assign every language an index by which it will be sorted. You can achieve the same using either a CASE expression or DECODE. To make sure the languages don't repeat within the same id, we use row_number; you can comment out the row_number() line if that's not a requirement.
with cte (id, lang) as
(select 0,'JPN' union all
select 0,'ENU' union all
select 0,'DAN' union all
select 0,'ENU' union all
select 0,'JPN' union all
select 0,'DAN' union all
select 1,'JPN' union all
select 1,'ENU' union all
select 1,'DAN' union all
select 1,'ENU' union all
select 1,'JPN' union all
select 1,'DAN')
select *
from cte
order by id,
row_number() over (partition by id, array_position(lang::variant,['ENU','JPN','DAN']) order by lang), --in case you want languages to not repeat within each id
array_position(lang::variant,['ENU','JPN','DAN'])
This is not to say the other answers are wrong.
But here's yet another, using ANSI SQL CASE:
SELECT * FROM "Example"
ORDER BY
CASE "Language" WHEN 'ENU' THEN 1
WHEN 'JPN' THEN 2
WHEN 'DAN' THEN 3
ELSE 4 END
,"Language";
Notice that "Language" itself is used as a final sort key to disambiguate the 'other' languages not explicitly listed.
It's good defensive programming when dealing with CASE to deal with ELSE.
The ultimate most flexible answer is to have a table with a collation order for languages in it.
Collation order columns are common in many applications.
I've seen them used for everything from multiple parties to a contract who must appear in a specified order, down to (of course) the positional order of columns in a table of metadata.
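All three answers (DECODE, array_position, CASE) boil down to mapping each language to a numeric sort key. A plain-Python sketch of that idea, using the question's data (this is an illustration, not any answer's original code):

```python
# Custom sort order: map each language to a rank, then sort by (id, rank).
rows = [(0, "JPN", "b"), (0, "DAN", "c"), (0, "ENU", "a"),
        (1, "JPN", "e"), (1, "ENU", "d"), (1, "DAN", "f")]
rank = {"ENU": 0, "JPN": 1, "DAN": 2}
# Unknown languages sort last and are then ordered by name,
# mirroring the ELSE branch plus the trailing "Language" sort key.
ordered = sorted(rows, key=lambda r: (r[0], rank.get(r[1], len(rank)), r[1]))
print(ordered)
# [(0, 'ENU', 'a'), (0, 'JPN', 'b'), (0, 'DAN', 'c'),
#  (1, 'ENU', 'd'), (1, 'JPN', 'e'), (1, 'DAN', 'f')]
```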

Using STRING_SPLIT for 2 columns in a single table

I've started from a table like this
ID | City                                | Sales
1  | London,New York,Paris,Berlin,Madrid | 20,30,,50
2  | Istanbul,Tokyo,Brussels             | 4,5,6
There can be an unlimited amount of cities and/or sales.
I need to get each city and its sales amount into its own record. So my result should look something like this:
ID | City | Sales
1 | London | 20
1 | New York | 30
1 | Paris |
1 | Berlin | 50
1 | Madrid |
2 | Istanbul | 4
2 | Tokyo | 5
2 | Brussels | 6
What I got so far is
SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APPLY STRING_SPLIT(Table.City, ',') splitC
CROSS APPLY STRING_SPLIT(Table.Sales, ',') splitS
With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense I think, because it's trying to split the sales for each city again).
What would be an option to solve this issue? STRING_SPLIT is not necessary, it's just how I started on it.
STRING_SPLIT() is not an option, because (as mentioned in the documentation) the output rows might be in any order, and the order is not guaranteed to match the order of the substrings in the input string.
But you may try a JSON-based approach using OPENJSON() and a string transformation: the comma-separated values are turned into a valid JSON array (London,New York,Paris,Berlin,Madrid into ["London","New York","Paris","Berlin","Madrid"]). The result of OPENJSON() with the default schema is a table with columns key, value and type, where the key column holds the 0-based index of each item in the array:
Table:
CREATE TABLE Data (
ID int,
City varchar(1000),
Sales varchar(1000)
)
INSERT INTO Data
(ID, City, Sales)
VALUES
(1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
(2, 'Istanbul,Tokyo,Brussels', '4,5,6')
Statement:
SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
SELECT c.[value] AS City, s.[value] AS Sales
FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s
ON c.[key] = s.[key]
) a
Result:
ID City Sales
1 London 20
1 New York 30
1 Paris
1 Berlin 50
1 Madrid NULL
2 Istanbul 4
2 Tokyo 5
2 Brussels 6
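The OPENJSON approach works because it pairs the n-th city with the n-th sale by index. Outside SQL, the same positional join is just a zip over the two split lists, padding the shorter one; a sketch with the question's data (illustration only, not the answer's code):

```python
# Positional pairing of two comma-separated columns, like the
# LEFT OUTER JOIN on the OPENJSON [key] index.
from itertools import zip_longest

data = [(1, "London,New York,Paris,Berlin,Madrid", "20,30,,50"),
        (2, "Istanbul,Tokyo,Brussels", "4,5,6")]
result = [(id_, city, sale)
          for id_, cities, sales in data
          # zip_longest fills missing sales with None, matching the NULL
          # produced for Madrid by the LEFT OUTER JOIN.
          for city, sale in zip_longest(cities.split(","), sales.split(","))]
print(result)
```

Note that Paris gets an empty string (the ",," in the input) while Madrid gets None, exactly as in the answer's result table.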
STRING_SPLIT has no concept of ordinal positions. In fact, the documentation specifically states that it doesn't care about them:
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.
Then you can do something like this:
WITH Cities AS(
SELECT ID,
DSc.Item,
DSc.ItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc),
Sales AS(
SELECT ID,
DSs.Item,
DSs.ItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
C.Item AS City,
S.Item AS Sale
FROM Cities C
FULL OUTER JOIN Sales S ON C.ItemNumber = S.ItemNumber;
Of course, however, the real solution is to fix your design. This type of design will only cause you hundreds of problems in the future. Fix it now, not later; the sooner you do it, the sooner you'll reap the rewards.

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are one or two rows for a given id, nor am I sure how to use GROUP BY: the revised date could be greater than or less than the original date, so the aggregate functions MAX and MIN don't work. I thought about using CASE somehow, but I also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and displays one rather than the other.
Any suggested approaches would be appreciated.
Thanks!
You can use ROW_NUMBER for this.
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
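The ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) trick works because 'revised' sorts after 'original', so it gets row number 1 whenever it exists. A plain-Python sketch of the same "keep the preferred row per id" logic, using the question's data (illustration only):

```python
# Pick one row per id, preferring the row whose type sorts highest
# ('revised' > 'original' lexicographically).
rows = [(1, "original", "01/01/2017"), (1, "revised", "02/01/2017"),
        (2, "original", "03/01/2017"),
        (3, "original", "10/01/2017"), (3, "revised", "09/01/2017")]
best = {}
for id_, type_, due in rows:
    # Keep the current row if we have no row for this id yet,
    # or if its type beats the stored one.
    if id_ not in best or type_ > best[id_][1]:
        best[id_] = (id_, type_, due)
print(sorted(best.values()))
# [(1, 'revised', '02/01/2017'), (2, 'original', '03/01/2017'),
#  (3, 'revised', '09/01/2017')]
```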
Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
You could use a subquery to take MAX([type]). This works for [type] because we want 'revised' to win over 'original', and 'r' comes after 'o' alphabetically. We can then INNER JOIN back to the same table on the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

SSIS - transform 2 records from table A to 1 record in table B

I have following data in employee Table A:
ID | emp | City_Type | City
1 | 101 | Z | Tokyo
2 | 101 | Y | New York
City_Type can either be Y or Z. Y being the city this person was born in, Z is the city he/she is living now.
I need to put these together in a table 'B' which looks like the following:
ID | emp | Current_City | Birth_City
So in the end, Table B must be filled like this:
ID | emp | Current_City | Birth_City
1 | 101 | Tokyo | New York
(in some cases, one of the 2 can be empty/null)
Any suggestions on how to do this? I haven't been able to find much information on this myself.
I did this exercise (using SQL Server) with PIVOT:
select emp, Z 'Current_City' , Y 'Birth_City' from
(
select emp,City_Type, City from TABLE__A
) x
pivot
(
max(City) FOR City_Type in (Z,Y)
) AS PivotTable
Below is the result achieved, with an example of a NULL value for the field Current_City:
emp Current_City Birth_City
101 Tokyo New York
102 NULL London
I omitted ID; it is not clear from the request whether and how it needs to be derived (a minimum or maximum over emp, a newly calculated value, or one generated by the INSERT into TABLE__B).
This previous query can be used to insert into TABLE__B
INSERT INTO [TABLE__B]
([emp]
,[Current_City]
,[Birth_City])
...
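The pivot step (two Y/Z rows per employee collapsed into one row with Birth_City and Current_City columns) can be sketched in plain Python as a dict-of-dicts, using the question's data plus the NULL example from the answer (illustration only, not the answer's code):

```python
# Pivot: one input row per (emp, City_Type); one output row per emp.
rows = [(101, "Z", "Tokyo"), (101, "Y", "New York"), (102, "Y", "London")]
pivoted = {}
for emp, city_type, city in rows:
    # Default both "columns" to None so a missing type shows up as NULL.
    pivoted.setdefault(emp, {"Z": None, "Y": None})[city_type] = city
table_b = [(emp, cols["Z"], cols["Y"]) for emp, cols in sorted(pivoted.items())]
print(table_b)  # [(101, 'Tokyo', 'New York'), (102, None, 'London')]
```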
First create your TableB and populate [Current_City] and [Birth_City] with nulls, but make sure [emp] is there and it has all the employees you intend to modify.
Then run this SQL modified to fit your database / schema / table names / etc:
update TableB
set Current_City = (select City
from TableA
where TableA.City_Type ='Z'
and TableA.emp = TableB.emp),
Birth_City = (select City
from TableA
where TableA.City_Type ='Y'
and TableA.emp = TableB.emp)
One way would be to use the PIVOT transformation.

SQL Update first instance of record - SQL Server

I have an import process that needs to update the "End Date" of the older record in a table. The import comes from someone else and some of the details I need to match specific user info are missing. What I need to do is update the older record, which, in the example below, would be the UserID = 1, and not update UserID = 4.
I have the following sql to update, but as you can tell, it updates both records:
UPDATE t1
SET t1.[EndDate] = t2.[EndDate]
FROM ExistingUser AS t1, ImportUser AS t2
WHERE (t2.[uName] = t1.[uName]) AND (t1.[EndDate] IS NULL);
Disclaimer: I did not create the database and cannot redesign the tables, so please take pity on me. Thanks!!!
ExistingUser - Table
UserID uName BeginDate EndDate
1 John 01/01/2013
2 Mary 05/01/2014 04/30/2015
3 Bob 12/01/2014
4 John 06/01/2015
ImportUser - Table
uName EndDate
John 05/31/2015
This can be done by building a subquery that numbers the entries by username that do not have the EndDate populated. This subquery would look like this:
SELECT [UserID], [uName],
ROW_NUMBER() OVER (PARTITION BY [uName] ORDER BY [BeginDate]) AS IX
FROM ExistingUser
WHERE [EndDate] IS NULL
Demo: http://www.sqlfiddle.com/#!6/72c49/2
| UserID | uName | IX |
|--------|-------|----|
| 3 | Bob | 1 |
| 1 | John | 1 |
| 4 | John | 2 |
In this subquery, the two records with the same uName of John are numbered in order by BeginDate. Now if we join the original UPDATE query to this subquery searching for records with IX=1, we will only update the earliest rows.
WITH EarliestExistingUser AS (
SELECT [UserID], [uName],
ROW_NUMBER() OVER (PARTITION BY [uName] ORDER BY [BeginDate]) AS IX
FROM ExistingUser
WHERE [EndDate] IS NULL
)
UPDATE t1
SET t1.[EndDate] = t2.[EndDate]
FROM ExistingUser AS t1
JOIN EarliestExistingUser AS t1f ON t1.UserID = t1f.UserID
,ImportUser AS t2
WHERE (t2.[uName] = t1.[uName]) AND (t1.[EndDate] IS NULL)
AND (t1f.IX = 1)
Demo: http://www.sqlfiddle.com/#!6/72c49/3
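The CTE numbers each user's open rows (EndDate IS NULL) by BeginDate, and the UPDATE touches only row 1. In plain Python the same logic is: for each imported name, find the earliest open row and set its end date. A sketch with the question's data, with dates rewritten as ISO strings so they sort correctly (illustration only):

```python
# Update only the earliest open (EndDate is None) row per user name.
existing = [
    {"UserID": 1, "uName": "John", "BeginDate": "2013-01-01", "EndDate": None},
    {"UserID": 2, "uName": "Mary", "BeginDate": "2014-05-01", "EndDate": "2015-04-30"},
    {"UserID": 3, "uName": "Bob",  "BeginDate": "2014-12-01", "EndDate": None},
    {"UserID": 4, "uName": "John", "BeginDate": "2015-06-01", "EndDate": None},
]
imported = {"John": "2015-05-31"}
for name, end in imported.items():
    open_rows = [r for r in existing if r["uName"] == name and r["EndDate"] is None]
    if open_rows:
        # min() by BeginDate plays the role of ROW_NUMBER()... WHERE IX = 1.
        min(open_rows, key=lambda r: r["BeginDate"])["EndDate"] = end
print([(r["UserID"], r["EndDate"]) for r in existing])
# [(1, '2015-05-31'), (2, '2015-04-30'), (3, None), (4, None)]
```

UserID 1 (the older John) is updated; UserID 4 is left alone, matching the desired behaviour.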
