SQL - Selecting Multi Keyword Values From Single Field - sql-server

So I'm building a SQL view to make the data more readable for reporting. I have one table that holds field data (keyword IDs) for questions within a section on a website. Column3 is a multi-keyword field, stored in the database as IDs delimited by CHAR(185) (¹).
Table 1
Column1 | Column2 | Column3
4456 | 2323 | ¹8661¹8662¹
I have a second table that houses keyword ids and their values.
Table 2
Column1 | Column2
4456 | val1
2323 | val2
8661 | val3
8662 | val4
The view joins the tables to display the keyword values, but I'm not sure how to handle the multi-keyword field. I'm looking to format the result like this:
View Table
Column1 | Column2 | Column3
val1 | val2 | val3; val4
Would I need some sort of function to accomplish this or is there another way?

Here's one way to do it using some of SQL Server's XML features. Sorry, the column and table names make this somewhat difficult to read. Here's a fiddle:
with mapped (c1, c2, val) as (
    select t1.column1, t1.column2, t2.column2
    from table1 t1
    -- turn '¹8661¹8662¹' into the XML fragment '<_8661/><_8662/>'
    cross apply (select convert(xml, '<_' + replace(substring(column3, 2, len(column3) - 2), '¹', '/><_') + '/>')) s(c)
    -- one row per element; the element name minus its leading '_' is the keyword id
    cross apply c.nodes('*') x(n)
    join table2 t2 on t2.column1 = n.value('fn:substring(local-name(.), 2)', 'int'))
select
    column1 = c1.column2,
    column2 = c2.column2,
    -- concatenate the mapped values; stuff() removes the leading '; '
    column3 = stuff((select '; ' + val from mapped where c1 = t1.column1 and c2 = t1.column2 for xml path('')), 1, 2, '')
from table1 t1
join table2 c1 on c1.column1 = t1.column1
join table2 c2 on c2.column1 = t1.column2
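The XML trick may be unfamiliar, so here is the same per-row logic as a plain Python sketch (this is not SQL Server code). It strips the outer ¹ delimiters, splits on ¹, looks up each id in Table 2 and joins the values with '; '. The function name `expand_keywords` and the dict standing in for Table 2 are hypothetical.

```python
DELIM = chr(185)  # '¹', the separator used in Column3

def expand_keywords(column3, lookup, sep="; "):
    """Turn a value like '¹8661¹8662¹' into 'val3; val4' using an id -> value lookup."""
    ids = [int(part) for part in column3.strip(DELIM).split(DELIM) if part]
    # Like the inner join in the query, ids with no match in the lookup are skipped.
    return sep.join(lookup[i] for i in ids if i in lookup)

# Table 2 as a dict: keyword id -> keyword value.
table2 = {4456: "val1", 2323: "val2", 8661: "val3", 8662: "val4"}
column1, column2, column3 = 4456, 2323, DELIM + "8661" + DELIM + "8662" + DELIM
print(table2[column1], table2[column2], expand_keywords(column3, table2), sep=" | ")
# prints: val1 | val2 | val3; val4
```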

Simplify multiple joins

I have a Claims table with 70 columns, 16 of which contain diagnosis codes. The codes mean nothing on their own, so I need to pull the description for each code from a separate table.
There has to be a simpler way of pulling these code descriptions:
-- This is the claims table
FROM
[database].[schema].[claimtable] AS claim
-- [StagingDB].[schema].[Diagnosis] table where the codes located
-- [ICD10_CODE] column contains the code
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag1 ON claim.[ICDDiag1] = diag1.[ICD10_CODE]
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag2 ON claim.[ICDDiag2] = diag2.[ICD10_CODE]
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag3 ON claim.[ICDDiag3] = diag3.[ICD10_CODE]
-- and so on, up to ....
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag16 ON claim.[ICDDiag16] = diag16.[ICD10_CODE]
-- reported column will be [code_desc]
-- ie:
-- diag1.[code_desc] AS Diagnosis1
-- diag2.[code_desc] AS Diagnosis2
-- diag3.[code_desc] AS Diagnosis3
-- diag4.[code_desc] AS Diagnosis4
-- etc.
I think what you are doing is already correct for the given scenario.
Another way, from a programming point of view, which you can try and compare performance against:
i) Unpivot the Claim table on those 16 diagnosis code columns.
ii) Join the unpivoted column with [StagingDB].[schema].[Diagnosis].
Another option is to copy the [StagingDB].[schema].[Diagnosis] table into a #temp table instead of hitting the large staging table 16 times.
But data analysis has to be done to decide which way works best.
You can UNPIVOT the claim table and then join it with the Diagnosis table.
TEST SETUP
CREATE TABLE #claimTable (ClaimId INT, Diag1 VARCHAR(10), Diag2 VARCHAR(10))
CREATE TABLE #Diagnosis (code VARCHAR(10), code_Desc VARCHAR(255))
INSERT INTO #claimTable
VALUES (1, 'Fever', 'Cold'), (2, 'Headache', 'toothache');
INSERT INTO #Diagnosis
VALUES ('Fever', 'Fever Desc'), ('cold', 'cold desc'), ('headache', 'headache desc'), ('toothache', 'toothache desc');
Query to Run
;WITH CTE_Claims AS
(SELECT ClaimId,DiagnosisNumeral, code
FROM #claimTable
UNPIVOT
(
code FOR DiagnosisNumeral in ([Diag1],[Diag2])
) as t
)
SELECT c.ClaimId, c.code, d.code_Desc
FROM CTE_Claims AS c
INNER JOIN #Diagnosis as d
on c.code = d.code
ResultSet
+---------+-----------+----------------+
| ClaimId | code | code_Desc |
+---------+-----------+----------------+
| 1 | Fever | Fever Desc |
| 1 | Cold | cold desc |
| 2 | Headache | headache desc |
| 2 | toothache | toothache desc |
+---------+-----------+----------------+
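Here is a hedged, runnable sketch of the same unpivot-then-join idea. It uses SQLite through Python's sqlite3 module, not SQL Server. SQLite has no UNPIVOT operator, so the unpivot is written as a UNION ALL with one SELECT per diagnosis column (with 16 columns you would write 16 branches). COLLATE NOCASE stands in for a case-insensitive SQL Server collation. Only the two diagnosis columns from the test setup are used.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE claimTable (ClaimId INTEGER, Diag1 TEXT, Diag2 TEXT);
    CREATE TABLE Diagnosis (code TEXT, code_Desc TEXT);
    INSERT INTO claimTable VALUES (1, 'Fever', 'Cold'), (2, 'Headache', 'toothache');
    INSERT INTO Diagnosis VALUES ('Fever', 'Fever Desc'), ('cold', 'cold desc'),
                                 ('headache', 'headache desc'), ('toothache', 'toothache desc');
""")

rows = con.execute("""
    WITH claims AS (
        -- manual unpivot: one branch per diagnosis column
        SELECT ClaimId, 'Diag1' AS DiagnosisNumeral, Diag1 AS code FROM claimTable
        UNION ALL
        SELECT ClaimId, 'Diag2', Diag2 FROM claimTable
    )
    SELECT c.ClaimId, c.code, d.code_Desc
    FROM claims AS c
    JOIN Diagnosis AS d
      ON c.code = d.code COLLATE NOCASE  -- case-insensitive match, like a _CI_ collation
    WHERE c.code IS NOT NULL             -- skip empty slots, as UNPIVOT does
    ORDER BY c.ClaimId, c.DiagnosisNumeral
""").fetchall()
con.close()

for row in rows:
    print(row)
```

The single join replaces the 16 separate LEFT JOINs. The trade-off is that the output has one row per claim and code rather than one wide row per claim.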

Postgres - join on array values

Say I have a table with the following schema:
id | name | tags |
1 | xyz | [4, 5] |
Where tags is an array of references to ids in another table called tags.
Is it possible to join these tags onto the row, i.e. replace the id numbers with the values of those rows in the tags table, such as:
id | name | tags |
1 | xyz | [[tag_name, description], [tag_name, description]] |
If not, is this an issue with the design of the schema?
Example tags table:
create table tags(id int primary key, name text, description text);
insert into tags values
(4, 'tag_name_4', 'tag_description_4'),
(5, 'tag_name_5', 'tag_description_5');
You should unnest the tags column, use its elements to join the tags table, and aggregate the columns of the latter. You can aggregate arrays into an array:
select t.id, t.name, array_agg(array[g.name, g.description])
from my_table as t
cross join unnest(tags) as tag
join tags g on g.id = tag
group by t.id;
id | name | array_agg
----+------+-----------------------------------------------------------------
1 | xyz | {{tag_name_4,tag_description_4},{tag_name_5,tag_description_5}}
(1 row)
Or aggregate strings into an array:
select t.id, t.name, array_agg(concat_ws(', ', g.name, g.description))
...
Or maybe strings inside a single string:
select t.id, t.name, string_agg(concat_ws(', ', g.name, g.description), '; ')
...
Or, last but not least, as jsonb:
select t.id, t.name, jsonb_object_agg(g.name, g.description)
from my_table as t
cross join unnest(tags) as tag
join tags g on g.id = tag
group by t.id;
id | name | jsonb_object_agg
----+------+------------------------------------------------------------------------
1 | xyz | {"tag_name_4": "tag_description_4", "tag_name_5": "tag_description_5"}
(1 row)
Live demo: db<>fiddle.
Not sure if this is still helpful for anyone, but unnesting the tags is quite a bit slower than letting Postgres do the work directly from the array. You can rewrite the query as follows. This is generally more performant because g.id = ANY(tags) is a simple primary-key index scan without the expansion step:
SELECT t.id, t.name, ARRAY_AGG(ARRAY[g.name, g.description])
FROM my_table AS t
LEFT JOIN tags AS g
ON g.id = ANY(t.tags)
GROUP BY t.id;
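To show what the unnest/join/aggregate pipeline computes, here is a plain-Python sketch with no Postgres involved. Each array element is looked up in a dict standing in for the tags table. Rows with no matching tags are dropped, as the inner join would drop them. The matches are gathered both as a list of pairs (like array_agg) and as a dict (like jsonb_object_agg). The names `join_tags`, `my_table` and `tags` here are illustrative.

```python
my_table = [{"id": 1, "name": "xyz", "tags": [4, 5]}]
tags = {
    4: ("tag_name_4", "tag_description_4"),
    5: ("tag_name_5", "tag_description_5"),
}

def join_tags(rows, lookup):
    result = []
    for row in rows:
        # unnest(tags) + JOIN tags g ON g.id = tag: keep elements that have a match
        matched = [lookup[tag_id] for tag_id in row["tags"] if tag_id in lookup]
        if not matched:
            continue  # the inner join leaves nothing to group for this id
        result.append({
            "id": row["id"],
            "name": row["name"],
            "pairs": [list(pair) for pair in matched],  # like array_agg(array[name, description])
            "as_object": dict(matched),                 # like jsonb_object_agg(name, description)
        })
    return result

print(join_tags(my_table, tags))
```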

How to check 'AAA' vs 'AAa' in TSQL using SQL_Latin1_General_CP1_CI_AS collation

I have a problem and I just can't seem to get around it.
My database uses the collation SQL_Latin1_General_CP1_CI_AS, which I cannot change. One column, which I imported from another database whose collation is SQL_Latin1_General_CP1_CS_AS, contains both the values 'AAA' and 'AAa'.
Furthermore, by joining on this column I need to retrieve the value of another column from a second table (same collation as my database, SQL_Latin1_General_CP1_CI_AS).
My problem is that, because of this collation, both values ('AAA' and 'AAa') are "seen" as equal, so my join returns a match for both 'AAA' and 'AAa' when it should match only 'AAa'.
Is there a "trick" that would let me filter only the 'AAa' values, i.e. somehow emulate the SQL_Latin1_General_CP1_CS_AS collation?
Regards,
Later edit:
I have two tables, Table1 and Table2. Table1 has the columns column1 (ID) and column2 (currency). Table2 has the columns column1 (currency) and column2 (rate). The currency columns in both tables contain values that differ only in case (e.g. EUR and EUr). I want to retrieve the rate from Table2 only for the rows that match the exact currency. I've tried:
Select t1.id
, t1.currency
, t2.rate
from table1 t1
inner join table2 t2 on t1.currency=t2.currency COLLATE SQL_Latin1_General_CP1_CS_AS
But it's not working: for the IDs that have EUR I also get the rate, although I should only get it for the IDs whose currency is exactly EUr.
You can apply a case-sensitive collation directly in the comparison:
select *
from t
where col collate SQL_Latin1_General_CP1_CS_AS = 'AAa'
rextester demo: http://rextester.com/CZAWR50665
returns AAa from this test setup:
create table t (col varchar(32))
insert into t values
('AAA'),('AAa'),('aAa'),('AaA')
For a join, you could use collate like so:
select *
from t
inner join t as t2
on t.col collate SQL_Latin1_General_CP1_CS_AS = t2.col;
returns
+-----+-----+
| col | col |
+-----+-----+
| AAA | AAA |
| AAa | AAa |
| aAa | aAa |
| AaA | AaA |
+-----+-----+
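SQLite doesn't have the SQL_Latin1_General_* collations. Its built-in NOCASE and BINARY collations play the same roles for ASCII letters, so the "explicit COLLATE in the join" approach can be tried from Python. This sketch assumes a currency/rate layout like the one in the question's edit, with both columns declared case-insensitive. The function name and the sample rates are made up.

```python
import sqlite3

def join_rates(case_sensitive):
    """Join table1 to table2 on currency, optionally forcing a case-sensitive match."""
    con = sqlite3.connect(":memory:")
    # Both columns are declared NOCASE, playing the role of the _CI_ collation.
    con.executescript("""
        CREATE TABLE table1 (id INTEGER, currency TEXT COLLATE NOCASE);
        CREATE TABLE table2 (currency TEXT COLLATE NOCASE, rate REAL);
        INSERT INTO table1 VALUES (1, 'EUR'), (2, 'EUr');
        INSERT INTO table2 VALUES ('EUR', 1.10), ('EUr', 2.20);
    """)
    # An explicit COLLATE on either operand overrides the column collations.
    on = "t1.currency = t2.currency" + (" COLLATE BINARY" if case_sensitive else "")
    rows = con.execute(
        "SELECT t1.id, t1.currency, t2.rate FROM table1 t1 "
        "JOIN table2 t2 ON " + on + " ORDER BY t1.id, t2.rate"
    ).fetchall()
    con.close()
    return rows

print(join_rates(case_sensitive=False))  # every id matches both EUR and EUr
print(join_rates(case_sensitive=True))   # each id matches only its exact spelling
```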
Usually, SQL Server comparisons are not case sensitive; this depends on the collation, as in most other SQL databases (MySQL lets you enable or disable case sensitivity; see "Is SQL syntax case sensitive?"). So if you are using SQL Server, there might be some other issue in the data, such as invalid characters like CHAR(9) or CHAR(10). But if you are sure that the issue is the case and nothing else, try joining the values by converting both columns to either upper or lower case, something like this:
SELECT
*
FROM table1 t1
INNER JOIN table2 t2
ON UPPER(t1.Colname) = UPPER(t2.Colname)

TSQL - View with cross apply and pivot

This is my base table:
docID | rowNumber | Column1 | Column2 | Column3
I use CROSS APPLY and PIVOT to turn the values in Column1 into actual columns, using the values in Column2 and Column3 as the values of the new columns. In my fiddle you can see the base and the transformed SELECT statements.
I have columns like Plant and Color which are numbered, e.g. Plant1, Plant2, Plant3, Color1, Color2 etc.
For each plant across all the plant columns, I want to create a new row with a comma-separated list of colors in a single column.
What I want to achieve is also shown in the screenshot below:
This should become a view to use in Excel. How should I modify the view to get the desired result?
Additional question: the Length column is numeric. Is there any way for a user to switch the decimal separator from within Excel and apply it to this (or every) numeric column so that Excel recognizes the values as numbers?
I used to have an old PHP web query where I passed the separator from a dropdown cell in Excel as a parameter.
Thank you.
First off, man, the way your data is stored is a mess. I'd recommend reading up on good data design and fixing yours if you can. Here's a T-SQL query that gets you the data in the desired format:
WITH CTE_no_nums
AS
(
SELECT docID,
CASE
WHEN PATINDEX('%[0-9]%',column1) > 0
THEN LEFT(column1, PATINDEX('%[0-9]%', column1) - 1) -- text before the first digit, e.g. Plant1 -> Plant
ELSE column1
END AS cols,
COALESCE(column2,column3) AS vals
FROM miscValues
WHERE column2 IS NOT NULL
OR column3 IS NOT NULL
),
CTE_Pivot
AS
(
SELECT docID,partNumber,prio,[length],material
FROM CTE_no_nums
PIVOT
(
MAX(vals) FOR cols IN (partNumber,prio,[length],material)
) pvt
)
SELECT A.docId + ' # ' + B.vals AS [DocID # Plant],
A.docID,
A.partNumber,
A.prio,
B.vals AS Plant,
A.partNumber + '#' + A.material + '#' + A.[length] AS Identification,
A.[length],
LEFT(CA.colors, LEN(CA.colors) - 1) AS colors -- LEFT drops the trailing comma
FROM CTE_Pivot A
INNER JOIN CTE_no_nums B
ON A.docID = B.docID
AND B.cols = 'Plant'
CROSS APPLY ( SELECT vals + ','
FROM CTE_no_nums C
WHERE cols = 'Color'
AND C.docID = A.docID
FOR XML PATH('')
) CA(colors)
Results:
DocID # Plant docID partNumber prio Plant Identification length colors
---------------- ------ ---------- ---- ---------- ------------------ ------- -------------------------
D0001 # PlantB D0001 X001 1 PlantB X001#MA123#10.87 10.87 white,black,blue
D0001 # PlantC D0001 X001 1 PlantC X001#MA123#10.87 10.87 white,black,blue
D0002 # PlantA D0002 X002 2 PlantA X002#MA456#16.43 16.43 black,yellow
D0002 # PlantC D0002 X002 2 PlantC X002#MA456#16.43 16.43 black,yellow
D0002 # PlantD D0002 X002 2 PlantD X002#MA456#16.43 16.43 black,yellow
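The reshaping this query performs can be sketched in plain Python to make the steps explicit. It cuts each Column1 name at its first digit (as the PATINDEX expression does), takes COALESCE(Column2, Column3) as the value, then emits one row per plant carrying the document's comma-separated colors. The sample rows are hypothetical, chosen to reproduce the plant and color columns of the results above. Colors keep their input order here; the FOR XML PATH query has no ORDER BY, so it does not guarantee any order.

```python
import re
from collections import defaultdict

# Hypothetical base rows: (docID, Column1, Column2, Column3).
base = [
    ("D0001", "Plant1", "PlantB", None),
    ("D0001", "Plant2", "PlantC", None),
    ("D0001", "Color1", "white", None),
    ("D0001", "Color2", "black", None),
    ("D0001", "Color3", "blue", None),
    ("D0002", "Plant1", None, "PlantA"),
    ("D0002", "Plant2", "PlantC", None),
    ("D0002", "Plant3", "PlantD", None),
    ("D0002", "Color1", "black", None),
    ("D0002", "Color2", "yellow", None),
]

def plants_with_colors(rows):
    plants, colors = defaultdict(list), defaultdict(list)
    for doc_id, column1, column2, column3 in rows:
        value = column2 if column2 is not None else column3  # COALESCE(column2, column3)
        if value is None:
            continue  # mirrors WHERE column2 IS NOT NULL OR column3 IS NOT NULL
        attr = re.split(r"\d", column1, maxsplit=1)[0]  # cut at the first digit: Plant2 -> Plant
        if attr == "Plant":
            plants[doc_id].append(value)
        elif attr == "Color":
            colors[doc_id].append(value)
    # One output row per (document, plant), each with the document's color list.
    return [(doc_id, plant, ",".join(colors[doc_id]))
            for doc_id, doc_plants in plants.items() for plant in doc_plants]

for row in plants_with_colors(base):
    print(row)
```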

Bulk Replace in SQL-Server

Is it possible to do a bulk replace without a WHILE loop, or what is the best way to do it?
Table-1
+-------+--------+
| name | value |
+-------+--------+
| #1# | one |
| #2# | two |
| #3# | three |
+-------+--------+
Table-2 (updated: a row in Table-2 can contain more than one different token)
+-----------------------+
| col1 |
+-----------------------+
| string #1# string #2# |
| string #2# string #1# |
| string #3# string #2# |
+-----------------------+
I'd like to replace every token in Table-2 with the corresponding value from Table-1's value column.
Expected Result
+-------------------------+
| col1 |
+-------------------------+
| string one string two |
| string two string one |
| string three string two |
+-------------------------+
Current solution with a WHILE loop:
create table #table1 (name nvarchar(50), value nvarchar(50))
insert into #table1 values ('#1#','one'), ('#2#','two'), ('#3#','three')
create table #table2 (col1 nvarchar(50))
insert into #table2 values ('string #1# string #2#'), ('string #2# string #1#'), ('string #3# string #2#')
-- keep looping while any row still contains any token
WHILE EXISTS (SELECT 1 FROM #table2 t2 INNER JOIN #table1 t1 ON CHARINDEX(t1.name, t2.col1) > 0)
BEGIN
    -- each pass replaces one matching token (all its occurrences) in every row that still has one
    UPDATE #table2
    SET col1 = REPLACE(col1, name, value)
    FROM #table1
    WHERE CHARINDEX(name, col1) > 0
END
select * from #table2
Thanks
I assume you're using SQL Server (you've tagged the question with tsql).
I've run this query on SQL Fiddle with SQL Server 2012, and on my PC with SQL Server 2008 R2.
You can proceed this way:
UPDATE table2
SET col1 = REPLACE(col1,
(SELECT name FROM table1 WHERE col1 LIKE '%' + table1.NAME + '%'),
(SELECT value FROM table1 WHERE col1 LIKE '%' + table1.NAME + '%'))
SQL Fiddle
If you only want to show the replaced value without an UPDATE, you can do this:
SELECT REPLACE(T2.col1, T1.name, T1.value)
FROM table1 T1
JOIN table2 T2
ON T2.col1 LIKE '%' + T1.name + '%'
EDIT
After the question/comment edits, my answer is incomplete, because the same row can contain more than one token. I'm thinking... :)
I've thought it over: :D
IMHO, you must loop over your table. Several tokens per row can't be resolved with a single UPDATE statement with a subquery because, as you wrote, the subquery then returns more than one value.
In SQL Server, each REPLACE call replaces only one token (search string), so to replace two different tokens in one step you must nest REPLACE calls. But the number of tokens per row isn't known in advance, so we can't know how many REPLACE calls to nest. One option is a cursor plus a dynamic SQL query built at runtime, so that you do a single UPDATE per row. If you want guidance on using a CURSOR and dynamic SQL, let me know. Good night ;)
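A set-based alternative, swapped in here and not what this answer proposes, is a recursive CTE that applies one token per recursion step. With that approach the number of tokens per row doesn't need to be known in advance. SQL Server also supports recursive CTEs. This sketch, however, runs on SQLite through Python's sqlite3 so it can be executed as-is; the table and column names mirror the question, and `bulk_replace` is a hypothetical helper. One caveat: tokens are applied in name order, so a replacement value that itself contains a token would be rewritten by a later step.

```python
import sqlite3

def bulk_replace(pairs, strings):
    """Apply every (token, value) pair to every string with one recursive CTE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (name TEXT PRIMARY KEY, value TEXT)")
    con.execute("CREATE TABLE table2 (col1 TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", pairs)
    con.executemany("INSERT INTO table2 VALUES (?)", [(s,) for s in strings])
    rows = con.execute("""
        WITH RECURSIVE
        tokens(n, name, value) AS (
            -- number the tokens 1..N (names are unique because of the primary key)
            SELECT (SELECT COUNT(*) FROM table1 b WHERE b.name <= a.name), a.name, a.value
            FROM table1 a
        ),
        steps(id, col1, n) AS (
            SELECT rowid, col1, 0 FROM table2             -- step 0: original text
            UNION ALL
            SELECT s.id, REPLACE(s.col1, t.name, t.value), s.n + 1
            FROM steps s JOIN tokens t ON t.n = s.n + 1   -- step k applies token k
        )
        SELECT col1 FROM steps
        WHERE n = (SELECT COUNT(*) FROM table1)           -- keep the fully replaced text
        ORDER BY id
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(bulk_replace([("#1#", "one"), ("#2#", "two"), ("#3#", "three")],
                   ["string #1# string #2#", "string #2# string #1#", "string #3# string #2#"]))
```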
You can do bulk replacement with this simple piece of code:
update #table2
set col1= left(a.col1,6)+' ' + b.value from #table2 a
join #table1 b on b.name=substring(a.col1,8,3)
select * from #table2
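Basically, it rebuilds the column from fixed positions. It keeps the first six characters of col1 ("string"), appends a space, then appends the value whose name matches the three characters starting at position 8 (e.g. #1#). So it only handles rows with a single token at that position.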
