I'm writing a report to show what features our clients want when building their home.
I want to left join 2 tables, but the way the data is stored makes the join difficult.
Table 1 tbl_Main_Holding has a field called Requirements; the data is stored as a varchar and can hold multiple values, like 1,4,7
1 = "Eco-Build"
4 = "Conservatory"
7 = "Basement"
Table 2 [tbl_Features] has the fields ID (INT) and Description (Varchar)
SELECT * FROM dbo.tbl_Main_Holding AS rm
LEFT JOIN [dbo].[tbl_Features] AS f
ON rm.Requirements = f.id
The join above won't work, as I would need to convert the varchar to INT.
However, that's not my problem. My problem is: how do I show the results for clients that have selected multiple features, and how does this left join work?
I'm using SQL Server 2008, and the data for both tables is stored as described above.
Step 1 is to go and find the person that designed this table structure (even if it is you) then whack them round the head with a stick.
Step 2 is to redesign the tables, a junction table is what is required here, not stuffing multiple integers into a single varchar column. For good measure at the end of step two you should hit the original designer with a stick again.
CREATE TABLE tbl_Main_Holding_Requirements
(
MainHoldingID INT NOT NULL, --FK TO `tbl_main_Holding`
FeatureID INT NOT NULL --FK TO `tbl_Features`
);
Now, each requirement represents a row in this table, rather than a new item on your list, so your join is now simple:
SELECT *
FROM dbo.tbl_Main_Holding AS rm
LEFT JOIN dbo.tbl_Main_Holding_Requirements AS r
ON r.MainHoldingID = rm.ID
LEFT JOIN [dbo].[tbl_Features] AS f
ON f.ID = r.FeatureID;
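If the existing comma-separated data has to be carried over into the new junction table, one possible backfill is sketched below. It assumes that tbl_Main_Holding has an ID key column (as the join above does) and that Requirements holds bare integers separated by commas with no spaces. IDs in a list that have no row in tbl_Features are simply skipped. Treat it as a starting point rather than a tested migration.

```sql
-- One-off backfill: one junction row per (holding, feature) pair found in the list.
INSERT INTO dbo.tbl_Main_Holding_Requirements (MainHoldingID, FeatureID)
SELECT rm.ID, f.ID
FROM dbo.tbl_Main_Holding AS rm
INNER JOIN dbo.tbl_Features AS f
    -- Wrap both sides in commas so that ID 1 does not match inside 11 or 21.
    ON ',' + rm.Requirements + ',' LIKE '%,' + CONVERT(VARCHAR(10), f.ID) + ',%';
```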
If you need to bring this back up to a comma delimited list, then you can do it in the presentation layer, or with SQL-Server's XML Extensions:
SELECT *,
Features = STUFF(f.Features.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM dbo.tbl_Main_Holding AS rm
OUTER APPLY
( SELECT ',' + f.Description -- + rather than CONCAT(), which requires SQL Server 2012 or later
FROM dbo.tbl_Main_Holding_Requirements AS r
INNER JOIN [dbo].[tbl_Features] AS f
ON f.ID = r.FeatureID
WHERE r.MainHoldingID = rm.ID
FOR XML PATH(''), TYPE
) f (Features);
If step two is not possible, then you can get around this using LIKE:
SELECT *
FROM dbo.tbl_Main_Holding AS rm
LEFT JOIN [dbo].[tbl_Features] AS f
ON ',' + rm.Requirements + ',' LIKE '%,' + CONVERT(VARCHAR(10), f.ID) + ',%';
Once again, if the features need to be reduced back to a single row, then you can use XML extensions again:
SELECT *,
Features = STUFF(f.Features.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM dbo.tbl_Main_Holding AS rm
OUTER APPLY
( SELECT ',' + f.Description -- + rather than CONCAT(), which requires SQL Server 2012 or later
FROM [dbo].[tbl_Features] AS f
WHERE ',' + rm.Requirements + ',' LIKE '%,' + CONVERT(VARCHAR(10), f.ID) + ',%'
FOR XML PATH(''), TYPE
) f (Features);
Another option is to split the comma separated values into a list using some kind of Split function, but as the testing in this article shows, if you don't need to access the individual values from the list, it is more efficient to just use LIKE.
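For comparison, here is a rough sketch of the split-based alternative. It uses the built-in STRING_SPLIT function, which only exists from SQL Server 2016 (compatibility level 130) onwards, so on SQL Server 2008 a user-defined split function would have to take its place. The table variables and sample rows are illustrative stand-ins for the real tables.

```sql
-- Sample data mirroring the question; the table variables are illustrative only.
DECLARE @Holding TABLE (ID INT, Requirements VARCHAR(50));
DECLARE @Features TABLE (ID INT, Description VARCHAR(50));
INSERT INTO @Holding VALUES (1, '1,4,7'), (2, '4');
INSERT INTO @Features VALUES (1, 'Eco-Build'), (4, 'Conservatory'), (7, 'Basement');

SELECT h.ID, f.Description
FROM @Holding AS h
CROSS APPLY STRING_SPLIT(h.Requirements, ',') AS s   -- one row per list item
INNER JOIN @Features AS f
    ON f.ID = CAST(s.value AS INT)
ORDER BY h.ID, f.ID;
```

CROSS APPLY drops holdings whose list is NULL; switching to OUTER APPLY plus a LEFT JOIN would keep them, at the cost of NULL feature columns.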
As I wrote in my comment, please read Is storing a delimited list in a database column really that bad?
You really should normalize your database to avoid these things.
Now, assuming you can't change the database schema, there is a simple trick with LIKE that you can use:
SELECT * FROM dbo.tbl_Main_Holding AS rm
LEFT JOIN [dbo].[tbl_Features] AS f
ON ',' + rm.Requirements +',' LIKE '%,' + CAST(f.id as varchar(10)) + ',%'
Note that I've added a comma before and after the rm.Requirements column and also before and after the f.id column.
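To see why the extra commas matter, here is a small self-contained sketch. The table variables are stand-ins for the real tables, and feature 11 ('Garage') is a made-up row added only to show the edge case.

```sql
DECLARE @Holding TABLE (ID INT, Requirements VARCHAR(50));
DECLARE @Features TABLE (ID INT, Description VARCHAR(50));
INSERT INTO @Holding VALUES (1, '1,4,7'), (2, '11');
INSERT INTO @Features VALUES (1, 'Eco-Build'), (4, 'Conservatory'),
                             (7, 'Basement'), (11, 'Garage'); -- 11 is hypothetical

SELECT h.ID, f.Description
FROM @Holding AS h
LEFT JOIN @Features AS f
    -- ',11,' contains ',11,' but not ',1,', so holding 2 matches only Garage.
    ON ',' + h.Requirements + ',' LIKE '%,' + CONVERT(VARCHAR(10), f.ID) + ',%'
ORDER BY h.ID, f.ID;
```

Holding 1 comes back with Eco-Build, Conservatory and Basement, and holding 2 with Garage only. Without the wrapping commas, a pattern such as '%1%' would also match the 1 inside 11.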
Related
I am trying to make a table that shows all the patients checked in to the hospital. I can join client, patient, check-in, and appointment data just fine, but the alerts table has multiple rows which I am trying to aggregate/concatenate/roll up. I tried to create an XML statement, but it doesn't seem to be working. I would like all the alerts for the patient to be a single comma-separated string in one row. Here is what I have:
select DISTINCT
a.ResourceAbbreviation1, a.AppointmentType, a.StatusNum, c.sLastName,
pt.Name, pt.WeightString, pt.AgeShort, pt.Breed, pt.Species, pt.Gender, pt.NewPatient,
(select SUBSTRING((
select ',' + al.stext AS 'data()'
FOR XML PATH('')
), 2, 9999) as cautions),
pt.Classification, p.kPatientId
from dbo.entpatients pt
join alerts al
on al.kpatientid = pt.IDPatient
join dbo.PatientCheckIns P
on pt.IDPatient=p.kPatientId
join dbo.EntAppointments a
on a.IDPatient = p.kPatientId
join dbo.clients c
on c.kID=a.IDClient
where cast (a.StartTime as date) = cast(getdate() as date)
and a.StatusNum=4;
You need to move your alerts table reference inside the subselect as a FROM.
I also suggest using AS text() instead of AS data() (or omitting it entirely) to avoid unwanted spaces, and using STUFF() instead of SUBSTRING() to strip the leading comma. The extra nested SELECT is also unneeded.
select DISTINCT
a.ResourceAbbreviation1, a.AppointmentType, a.StatusNum, c.sLastName,
pt.Name, pt.WeightString, pt.AgeShort, pt.Breed, pt.Species, pt.Gender, pt.NewPatient,
STUFF((
select ',' + al.stext AS 'text()'
from alerts al
where al.kpatientid = pt.IDPatient
FOR XML PATH('')
), 1, 1, '') as cautions,
pt.Classification,
p.kPatientId
from dbo.entpatients pt
join dbo.PatientCheckIns P
on pt.IDPatient=p.kPatientId
join dbo.EntAppointments a
on a.IDPatient = p.kPatientId
join dbo.clients c
on c.kID=a.IDClient
where cast (a.StartTime as date) = cast(getdate() as date)
and a.StatusNum=4;
If there is any chance that your alert text may contain special XML characters (such as <, >, or &) that might get encoded, I recommend a slightly modified form that uses the .value() function to extract the concatenated text.
STUFF((
select ',' + al.stext AS 'text()'
from alerts al
where al.kpatientid = pt.IDPatient
FOR XML PATH(''), TYPE
).value('text()[1]','nvarchar(max)'), 1, 1, '') as cautions,
This avoids seeing encodings like &lt;, &gt;, and &amp; in the results. See this for more.
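The difference is easy to reproduce with a couple of made-up alert texts. The sketch below uses a hypothetical table variable and the '.' path in .value(), which is one common way to pull out the concatenated text.

```sql
DECLARE @alerts TABLE (stext VARCHAR(50));   -- illustrative stand-in for alerts
INSERT INTO @alerts VALUES ('Bites & scratches'), ('Weight < 2kg');

SELECT
    -- Plain FOR XML returns a string with XML entities left in place.
    STUFF((SELECT ',' + stext FROM @alerts FOR XML PATH('')), 1, 1, '') AS encoded,
    -- TYPE plus .value() converts the XML back to ordinary text.
    STUFF((SELECT ',' + stext FROM @alerts FOR XML PATH(''), TYPE
          ).value('.', 'nvarchar(max)'), 1, 1, '') AS decoded;
```

The encoded column contains &amp;amp; and &amp;lt; in place of the ampersand and the less-than sign, while the decoded column shows the original characters.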
If you are using SQL Server 2017 or later, you could also switch to the relatively new STRING_AGG() function. See here.
(
select STRING_AGG(al.stext, ',')
from alerts al
where al.kpatientid = pt.IDPatient
) as cautions,
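As a standalone illustration of the argument order (expression first, separator second), here is a minimal sketch with a hypothetical table variable. It needs SQL Server 2017 or later, and the optional WITHIN GROUP clause is added only to make the order of the list predictable.

```sql
DECLARE @alerts TABLE (kpatientid INT, stext VARCHAR(50));  -- illustrative data
INSERT INTO @alerts VALUES (1, 'Bites'), (1, 'Allergic'), (2, 'Deaf');

SELECT kpatientid,
       STRING_AGG(stext, ',') WITHIN GROUP (ORDER BY stext) AS cautions
FROM @alerts
GROUP BY kpatientid;
-- patient 1 -> 'Allergic,Bites'; patient 2 -> 'Deaf'
```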
I would also review your need for the DISTINCT. In some cases, it is appropriate when you knowingly expect your query to return duplicate rows that you wish to eliminate. For example, if you know you may have multiple visits by the same patient with identical selected data, DISTINCT may be appropriate. However, if you have dropped it in to eliminate duplicates without knowing why, it may be a sign of an under-constrained join or other logic problems that warrant a further look.
I have 3 tables which contain this data:
Table 1:
Table 2:
Table 3:
Output:
I have tried using Pivot but it has to have an aggregate function in it.
SELECT
project_code, project_name, fk_prj_project_id,
[A], [B], [C], [D]
FROM
(SELECT
project_code, project_name, employee_name,
fk_prj_project_id, fk_prj_project_id AS nm,
activity_details
FROM
PRJ_MST_PROJECT AS a
LEFT JOIN
PRJ_TNS_DAILY_SUMMARY AS b ON a.pk_prj_project_id = b.fk_prj_project_id
LEFT JOIN
HRM_EMP_MST_EMPLOYEE AS c ON b.fk_hrm_emp_employee_id = c.pk_hrm_emp_employee_id
WHERE
a.project_status = 0
AND b.transaction_status = 1
AND CONVERT(date, b.transaction_date, 103) = CONVERT(date, '15/04/2021', 103)) x
PIVOT
(MAX(nm)
FOR nm IN ([A], [B], [C], [D])
) p
The problem is you set your PIVOT to look for values of nm in A, B, C, and D, but nm is an alias for fk_prj_project_id, which has possible values of 1, 2, 3, 4, and 5. So there are no A, B, C, or D values to be had. I don't even see a name for the column that holds A, B, C, and D, but whatever column that is needs to be what you put in the "FOR ___ IN" section of your pivot.
Test your query by commenting out the reference to the pivot columns in the SELECT and comment out the word PIVOT and everything after it and re-run your query. You should see some column with values A, B, C, D. If you don't, fix your query so you do. Once you do, that column is what you PIVOT on (put it between FOR and IN in the pivot block).
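As a rough picture of what a working PIVOT over text values looks like, here is a minimal self-contained sketch. The table variable and its rows are invented for illustration, and the column names are only assumed to resemble the real ones.

```sql
DECLARE @Comments TABLE (project_code VARCHAR(20), employee_name VARCHAR(20),
                         activity_details VARCHAR(100));   -- illustrative data
INSERT INTO @Comments VALUES
    ('MOA20171', 'A', 'remark by A'),
    ('MOA20171', 'C', 'remark by C'),
    ('MOA20172', 'C', 'another remark by C');

SELECT project_code, [A], [B], [C]
FROM (SELECT project_code, employee_name, activity_details
      FROM @Comments) AS src
-- employee_name holds the values A, B, C, so it goes between FOR and IN;
-- MAX() is only there because PIVOT insists on an aggregate.
PIVOT (MAX(activity_details) FOR employee_name IN ([A], [B], [C])) AS p
ORDER BY project_code;
```

Each project becomes one row, with NULL in the column of any employee who left no remark.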
Oh, and if you provide data in a usable format, people might run your query and give you directly usable results. It's a lot to ask to have people enter your data before they can help you, so meet them halfway. A link to sqlfiddle is ideal, but even just a bunch of DECLARE @T1 and INSERT INTO @T1 VALUES statements is usually enough to get significantly better help.
EDIT:
Nice job with the Fiddle!
OK, so using your data, we can test out actual queries. For PIVOT to work, we need a column to look up (employee name), a column to aggregate (activity_details), and some columns that will be constant across the rows produced (the project's name and ID). You're working with text, not numbers, so your aggregation can't be mathematical, leaving you with pretty much just MAX or MIN. To make sure you get the right (newest) one, I first built a table of comments and numbered them by how new they were, then I picked just the newest comment for each (project, user) pair. cteCommentsNewest is the result of that.
Now with a clean (and verified) table to pivot, the actual pivot syntax is simple. Well, as simple as Pivot can be, it's inherently pretty confusing IMHO, but structuring it this way keeps the actual PIVOT as clean as possible.
Note that the query appears twice: I tested it as a static query before converting it to dynamic, because a static query is much easier to troubleshoot, and then left it in in case you want to experiment with it. You don't need it for the final solution to work.
Here's the final code, fully tested and producing the specified output:
DECLARE @cols3 AS NVARCHAR(MAX)
DECLARE @query3 AS NVARCHAR(MAX)=''
DECLARE @dt varchar(100)='14/04/2021'
select @cols3 = STUFF((SELECT ',' + QUOTENAME(employee_name)
from dbo.HRM_EMP_MST_EMPLOYEE
order by employee_name
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
--SELECT @cols3 --Test column list for dynamic query
--Test the core functions of pivot before making dynamic
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)
SELECT *
FROM cteCommentsNewest as N --TEST up to this point to see the raw table
PIVOT (MAX(activity_details) FOR employee_name IN (A, B, C) ) as P
--Put the working query, modified for dynamic columns, into a variable
set @query3 = N'
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)SELECT *
FROM cteCommentsNewest as N
PIVOT (MAX(activity_details) FOR employee_name IN (' + @cols3 + ') ) as P
'
exec sp_executesql @query3
which produces the following output
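project_code | project_name | A                               | B    | C
=======================================================================================================
MOA20171     | Project A    | some remark By Employee A on 14 | NULL | some remark By Employee C on 14
MOA20172     | Project B    | NULL                            | NULL | some remark By Employee C on 15
MOA20173     | Project C    | NULL                            | NULL | NULL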
Wondering if you could help me with something. We get agency invoices for workers who worked for different clients on our behalf. These invoices have the worker's surname and forenames in completely random order. My SSIS package imports these invoices into a database, and I am trying to automate a few reports based on this data. I would like to join this invoice data with our data warehouse Employee and Shift tables. The only fields I can join on are Name and Shift date. How can I join the DW Employee table to the imported invoices when the names in the invoice table are all over the place? Is there any SQL function that can help me? I have tried joins using CONCAT and LIKE, but they didn't work. Suggestions would be highly appreciated.
Thanks and Regards
Khurram
This is a real challenge for you. Others have already suggested that you might not want to do this, but I will give it a shot anyway. I think you need to create two parts, the first part and the second part, in both tables. Then you can compare them. The following code might lead you in the right direction:
SELECT LEFT ('John Jackson', CHARINDEX (' ', 'John Jackson') - 1) AS FirstPart,
SUBSTRING ('John Jackson', CHARINDEX (' ', 'John Jackson') + 1, LEN ('John Jackson') - CHARINDEX (' ', 'John Jackson')) AS SecondPart -- start after the space so there is no leading blank
SELECT *
FROM TableA AS A
LEFT JOIN TableB AS B ON A.FirstPart = B.FirstPart AND A.SecondPart = B.SecondPart
UNION ALL
SELECT * FROM TableA AS A
LEFT JOIN TableB AS B2 ON A.FirstPart = B2.SecondPart AND A.SecondPart = B2.FirstPart
Good luck in handling this!
As has already been said, this is a challenge with no good answer outside of a redesign. But, like SQL_M, I figured I'd try it. Assuming that dwEmployee is the source of record, I initially went with something similar to SQL_M's approach with this:
SELECT *
FROM dbo.invoiceTable i
JOIN dwEmployee d
ON
(
d.[DW_personname] LIKE CONCAT('%',SUBSTRING(i.[Name],0,CHARINDEX(' ',i.[name])),' %')
OR d.[DW_personname] LIKE CONCAT('% ',SUBSTRING(i.[Name],0,CHARINDEX(' ',i.[name])),'%')
)
AND
(
d.[DW_personname] LIKE CONCAT('%',SUBSTRING(i.[Name],CHARINDEX(' ',i.[name])+1,LEN(I.[Name]) - CHARINDEX(' ',i.[name])),' %')
OR d.[DW_personname] LIKE CONCAT('% ',SUBSTRING(i.[Name],CHARINDEX(' ',i.[name])+1,LEN(I.[Name]) - CHARINDEX(' ',i.[name])),'%')
)
It works great until it hits a name that has three distinct parts... and then I couldn't be buggered to get it working - I'm leaving it here in case it helps you build on something.
I ended up using a CTE and STRING_SPLIT combination to make potential matches across the tables. You had mentioned that you can join on Shiftdate, but didn't say exactly how you wanted the results to look, and it seemed the name join was the big issue, so I just focused on that. Depending on what version of SQL you're on STRING_SPLIT might not be available to you, in which case you'd have to use a different split function to make this method work. (demo here http://sqlfiddle.com/#!18/4bd31/2/1 )
CREATE TABLE invoiceTable
(
[Invoice_ID] INT, [ShiftDate] DATE, [Ref_Num] INT, [Name] VARCHAR(200)
)
CREATE TABLE dwEmployee
(
[Shiftdate] DATE, [DW_personname] VARCHAR(200), [Timesheetserial] VARCHAR(200)
)
INSERT INTO dbo.invoiceTable
VALUES
(807, '2018-09-02',83789315,'ABCD EFGH'), (195, '2018-09-14',83789315,'EFGH ABCD'), (227, '2018-09-15',83789315,'WXYZ EFGH-ABCD'), (246, '2018-09-16',83789315,'JKLM OPQR'),(1398, '2018-09-19',83789315,'STUV IJKKL WXYZ')
INSERT INTO dbo.dwEmployee
VALUES
( '2018-10-22','EFGH ABCD','Z3746543'), ( '2018-10-29','EFIH ABCD','Z3746550'), ( '2018-10-26','EFGH-ABCD WXYZ','Z3746557'),( '2018-10-26','EFGH-ABCD WXYZ','Z3746557')
--my additional insert for testing three-part name
INSERT INTO dbo.dwEmployee
VALUES
( '2018-10-31','WXYZ STUV IJKKL','Z0000000');
--work
WITH nameSplitter AS
(
SELECT
[Invoice_ID], CAST(NULL AS VARCHAR(200)) AS [Timesheetserial], [Value]
FROM invoiceTable
CROSS APPLY STRING_SPLIT([Name], ' ')
UNION ALL
SELECT
NULL, [Timesheetserial], Value
FROM dwEmployee
CROSS APPLY STRING_SPLIT([DW_personname], ' ')
),
potentialMatches AS
(
SELECT
ns1.[Invoice_ID], ns2.[Timesheetserial]
FROM nameSplitter ns1
JOIN nameSplitter ns2
ON ns2.value = ns1.value
WHERE ns1.[Invoice_ID] IS NOT NULL
AND ns2.[Timesheetserial] IS NOT NULL
GROUP BY ns1.[Invoice_ID], ns2.[Timesheetserial]
HAVING COUNT(ns2.[Timesheetserial]) = (SELECT COUNT([Timesheetserial]) FROM nameSplitter WHERE [Timesheetserial] = ns2.[Timesheetserial] )
)
SELECT i.*, d.*
FROM potentialMatches p
join dbo.invoiceTable i
ON P.[Invoice_ID] = I.[Invoice_ID]
JOIN dwEmployee d
ON p.[Timesheetserial] = d.[Timesheetserial]
If you only have one forename and one surname, you could try something like this:
....
FROM tableA AS A
INNER JOIN tableB AS B
ON A.Name = B.DW_personname OR A.Name = LTRIM(RIGHT(B.DW_personname, LEN(B.DW_personname) - CHARINDEX(' ', B.DW_personname) + 1)) + ' ' + RTRIM(LEFT(B.DW_personname, CHARINDEX(' ', B.DW_personname)))
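A small self-contained sketch of this swap-and-compare idea follows. The table variables and names are made up for the example, and it only handles names made of exactly two parts separated by a single space.

```sql
DECLARE @Invoice TABLE (Name VARCHAR(200));            -- illustrative data
DECLARE @Employee TABLE (DW_personname VARCHAR(200));
INSERT INTO @Invoice VALUES ('ABCD EFGH'), ('EFGH ABCD'), ('JKLM OPQR');
INSERT INTO @Employee VALUES ('EFGH ABCD');

SELECT i.Name, e.DW_personname
FROM @Invoice AS i
INNER JOIN @Employee AS e
    ON i.Name = e.DW_personname
    -- Rebuild the employee name as "second part + space + first part".
    OR i.Name = LTRIM(RIGHT(e.DW_personname,
                            LEN(e.DW_personname) - CHARINDEX(' ', e.DW_personname) + 1))
                + ' ' + RTRIM(LEFT(e.DW_personname, CHARINDEX(' ', e.DW_personname)));
```

Both 'ABCD EFGH' and 'EFGH ABCD' match the single employee row, while 'JKLM OPQR' does not.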
I'm trying to optimize or completely rewrite this query. It currently takes about 1500 ms to run. I know the DISTINCTs are fairly inefficient, as is the UNION, but I'm struggling to figure out exactly where to go from here.
I am thinking that the first select statement might not be needed to return the output of:
[Key | User_ID,(User_ID)]
Note: Program and Program_Scenario both use clustered indexes. I can provide a screenshot of the execution plan if needed.
ALTER FUNCTION [dbo].[Fn_Get_Del_User_ID] (@_CompKey INT)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @UserIDs AS VARCHAR(8000);
SET @UserIDs = '';
SELECT @UserIDs = @UserIDs + ', ' + x.User_ID
FROM
(SELECT DISTINCT (UPPER(p.User_ID)) as User_ID FROM [dbo].[Program] AS p WITH (NOLOCK)
WHERE p.CompKey = @_CompKey
UNION
SELECT DISTINCT (UPPER(ps.User_ID)) as User_ID FROM [dbo].[Program] AS p WITH (NOLOCK)
LEFT OUTER JOIN [dbo].[Program_Scenario] AS ps WITH (NOLOCK) ON p.ProgKey = ps.ProgKey
WHERE p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL) x
RETURN Substring(@UserIDs, 3, 8000);
END
There are two things happening in this query
1. Locating rows in the [Program] table matching the specified CompKey (#_CompKey)
2. Locating rows in the [Program_Scenario] table that have the same ProgKey as the rows located in (1) above.
Finally, non-null UserIDs from both these sets of rows are concatenated into a scalar.
For step 1 to be efficient, you'd need an index on the CompKey column (clustered or non-clustered)
For step 2 to be efficient, you'd need an index on the join key which is ProgKey on the Program_Scenario table (this likely is a non-clustered index as I can't imagine ProgKey to be PK). Likely, SQL would resort to a loop join strategy - i.e., for each row found in [Program] matching the CompKey criteria, it would need to lookup corresponding rows in [Program_Scenario] with same ProgKey. This is a guess though, as there is not sufficient information on the cardinality and distribution of data.
Ensure the above two indexes are present.
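If either index turns out to be missing, definitions along these lines could be a starting point. The index names are made up, and the INCLUDE columns are an assumption aimed at letting the query be answered from the indexes alone; check the existing indexes and the execution plan before creating anything.

```sql
-- Hypothetical index for step 1: find Program rows by CompKey.
CREATE NONCLUSTERED INDEX IX_Program_CompKey
    ON dbo.Program (CompKey)
    INCLUDE (ProgKey, User_ID);

-- Hypothetical index for step 2: look up Program_Scenario rows by ProgKey.
CREATE NONCLUSTERED INDEX IX_Program_Scenario_ProgKey
    ON dbo.Program_Scenario (ProgKey)
    INCLUDE (User_ID);
```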
Also, as others have noted the second left outer join is a bit confusing as an inner join is the right way to deal with it.
Per my interpretation, the inner part of the query can be rewritten this way. This is also the query you'd ideally run and optimize before tackling the string concatenation part. The DISTINCT is dropped, as it is automatic with a UNION. Try this version of the query along with the indexes above, and if it provides the necessary boost, then include the string concatenation or the XML STUFF approach to return a scalar.
SELECT UPPER(p.User_ID) as User_ID
FROM
[dbo].[Program] AS p WITH (NOLOCK)
WHERE
p.CompKey = @_CompKey
UNION
SELECT UPPER(ps.User_ID) as User_ID
FROM
[dbo].[Program] AS p WITH (NOLOCK)
INNER JOIN [dbo].[Program_Scenario] AS ps WITH (NOLOCK) ON p.ProgKey = ps.ProgKey
WHERE
p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL
I am taking a shot in the dark here. I am guessing that the last code you posted is still a scalar function. It also did not have all the logic of your original query. Again, this is a shot in the dark, since there are no table definitions or sample data posted.
This might be how this would look as an inline table valued function.
-- Changing a scalar function into an inline table-valued one requires DROP and CREATE rather than ALTER.
ALTER FUNCTION [dbo].[Fn_Get_Del_User_ID]
(
@_CompKey INT
) RETURNS TABLE AS RETURN
select MyResult = STUFF(
(
SELECT ', ' + x.User_ID
FROM
( -- FOR XML cannot sit directly on a UNION in a subquery, so wrap it in a derived table
SELECT UPPER(p.User_ID) as User_ID
FROM dbo.Program AS p
WHERE p.CompKey = @_CompKey
UNION
SELECT UPPER(ps.User_ID) as User_ID
FROM dbo.Program AS p
LEFT OUTER JOIN dbo.Program_Scenario AS ps ON p.ProgKey = ps.ProgKey
WHERE p.CompKey = @_CompKey
AND ps.User_ID IS NOT NULL
) AS x
for xml path ('')
), 1, 2, '') -- strip the leading ', '
How can I join two tables, where one of the tables has multiple comma separated values in one column that reference an id in another column?
1st table
Name | Course Id
====================
Zishan | 1,2,3
Ellen | 2,3,4
2nd table
course id | course name
=======================
1 | java
2 | C++
3 | oracle
4 | dot net
Maybe this ugliness; I have not checked the results:
select names.name, courses.course_name
from names inner join courses
on ',' + names.course_ids + ',' like '%,' + cast(courses.course_id as nvarchar(20)) + ',%'
First of all, your database structure is not normalized, and it should have been. Since it is already set up this way, here's how to solve the issue.
You'll need a function to split your string first:
CREATE FUNCTION SPLIT_STRING(str VARCHAR(255), delim VARCHAR(12), pos INT) RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1), delim, '');
Then you'll need to create a view in order to make up for your structure:
CREATE VIEW database.viewname AS
SELECT SPLIT_STRING(CourseID, ',', 1) as firstField,
SPLIT_STRING(CourseID, ',', 2) as secondField,
SPLIT_STRING(CourseID, ',', 3) as thirdField
FROM 1stTable;
The third argument is n, the position of the nth item in your list.
Now that you have a view which generates your separated fields, you can make a normal join on your view, just use your view like you would use a table.
SELECT *
FROM yourView
JOIN table2 ON table2.field = yourView.firstField
However since I don't think you'll always have 3 values in your second field from your first table you'll need to tweak it a little more.
Inspiration of my answer from:
SQL query to split column data into rows
and
Equivalent of explode() to work with strings in MySQL
SELECT f.name,s.course_name FROM table1 AS f
INNER JOIN table2 as s ON f.course_id IN (s.course_id)
Use the query below as a solution:
Select * from table_2 t2 INNER JOIN table_1 t1 on t1.Course Id = t2.course id