Concatenation with a complex query - SQL Server - sql-server

So I've got a query with multiple joins and several rows that I want to put on one line. A couple of PIVOT statements solved most of this problem, but I have one last field with multiple rows (User Names) that I want to concatenate into one column.
I've read about COALESCE and got a sample to work, but I don't know how to combine the variable it returns with the other data fields, as it has no key.
I also saw this recommended approach:
SELECT [ID],
STUFF((
SELECT ', ' + CAST([Name] AS VARCHAR(MAX))
FROM #YourTable WHERE (ID = Results.ID)
FOR XML PATH(''),TYPE
/* Use .value to decode XML entities e.g. &gt; &lt; etc*/
).value('.','VARCHAR(MAX)')
,1,2,'') as NameValues
FROM #YourTable Results
GROUP BY ID
But again, I'm not sure how to incorporate this into a complex query.
BTW, the users do not have write access to the DB, so cannot create functions, views, tables or even execute functions. So this limits the options somewhat.
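For what it's worth, a common way to fold that pattern into a larger multi-join query is CROSS APPLY, so the concatenated column rides along with the other fields; and on SQL Server 2017+, STRING_AGG does the same job without the XML round-trip. This is a minimal sketch, assuming a hypothetical #OtherData table alongside the #YourTable from the snippet above; neither version needs write access, both are plain SELECTs:

```sql
-- Sketch: attach the concatenated names to a wider query via CROSS APPLY.
-- #OtherData and its columns are hypothetical stand-ins for the real joins.
SELECT o.ID, o.SomeField, n.NameValues
FROM #OtherData o
CROSS APPLY (
    SELECT STUFF((
        SELECT ', ' + CAST(t.[Name] AS VARCHAR(MAX))
        FROM #YourTable t
        WHERE t.ID = o.ID
        FOR XML PATH(''), TYPE
    ).value('.', 'VARCHAR(MAX)'), 1, 2, '') AS NameValues
) n;

-- SQL Server 2017+: STRING_AGG is simpler and avoids the XML escaping issue.
SELECT o.ID, o.SomeField,
       (SELECT STRING_AGG(CAST(t.[Name] AS VARCHAR(MAX)), ', ')
        FROM #YourTable t
        WHERE t.ID = o.ID) AS NameValues
FROM #OtherData o;
```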

Related

Avoid duplicate values in comma delimited sql query

Hello, I have a comma-delimited query here:
select [Product_Name]
,(select h2.Location_name + ', ' from (select distinct * from [dbo].[Product_list]) h2 where h1.Product_Name = h2.Product_Name
order by h2.Product_Name for xml path ('')) as Location_name
,(select h2.[Store name] + ', ' from [dbo].[Product_list] h2 where h1.Product_Name = h2.Product_Name
order by h2.Product_Name for xml path ('')) as store_name, sum(Quantity) as Total_Quantity from [dbo].[Product_list] h1
group by [Product_Name]
But this query shows duplicated data in the comma-delimited columns. My problem is: how can I show only the distinct values of the column in comma-delimited form? Can anyone please help me?
Well, if you don't SELECT DISTINCT * FROM dbo.Product_list and instead SELECT DISTINCT location_name FROM dbo.Product_list, which is anyway the only column you need, it will return only distinct values.
T-SQL supports the use of the asterisk, or “star” character (*) to
substitute for an explicit column list. This will retrieve all columns
from the source table. While the asterisk is suitable for a quick
test, avoid using it in production work, as changes made to the table
will cause the query to retrieve all current columns in the table’s
current defined order. This could cause bugs or other failures in
reports or applications expecting a known number of columns returned
in a defined order. Furthermore, returning data that is not needed can
slow down your queries and cause performance issues if the source
table contains a large number of rows. By using an explicit column
list in your SELECT clause, you will always achieve the desired
results, providing the columns exist in the table. If a column is
dropped, you will receive an error that will help identify the problem
and fix your query.
Using SELECT DISTINCT will filter out duplicates in the result set.
SELECT DISTINCT specifies that the result set must contain only unique
rows. However, it is important to understand that the DISTINCT option
operates only on the set of columns returned by the SELECT clause. It
does not take into account any other unique columns in the source
table. DISTINCT also operates on all the columns in the SELECT list,
not just the first one.
From Querying Microsoft SQL Server 2012 MCT Manual.
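Concretely, the Location_name subquery from the question becomes something like the following sketch against the same [dbo].[Product_list] table. Note that the derived table must keep both Product_Name and Location_name so the correlation to h1 still works:

```sql
select [Product_Name]
    ,(select h2.Location_name + ', '
      from (select distinct Product_Name, Location_name
            from [dbo].[Product_list]) h2     -- distinct on only the needed columns
      where h1.Product_Name = h2.Product_Name
      order by h2.Location_name
      for xml path ('')) as Location_name
    ,sum(Quantity) as Total_Quantity
from [dbo].[Product_list] h1
group by [Product_Name]
```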

Splitting multiple fields by delimiter

I have to write an SP that can perform partial updates on our databases; the changes are stored in records of the PU table. A Values field contains all values, delimited by a fixed delimiter. A Table field refers to a Schemes table containing the column names for each table, stored in a similar fashion in a Columns field.
Now for my SP I need to split the Values field and Columns field in a temp table with Column/Value pairs, this happens for each record in the PU table.
An example:
Our PU table looks something like this:
CREATE TABLE [dbo].[PU](
[Table] [nvarchar](50) NOT NULL,
[Values] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','John Doe;26');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Jane Doe;22');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mike Johnson;20');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mary Jane;24');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Mathematics');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','English');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Geography');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus A;Schools Road 1;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus B;Schools Road 31;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus C;Schools Road 22;Educationville');
And we have a Schemes table similar to this:
CREATE TABLE [dbo].[Schemes](
[Table] [nvarchar](50) NOT NULL,
[Columns] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Person','[Name];[Age]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Course','[Name]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Campus','[Name];[Address];[City]');
As a result, the first record of the PU table should result in a temp table like:
Column | Value
[Name] | John Doe
[Age] | 26
The 5th will have:
Column | Value
[Name] | Mathematics
Finally, the 8th PU record should result in:
Column | Value
[Name] | Campus A
[Address] | Schools Road 1
[City] | Educationville
You get the idea.
I tried to use the following query to create the temp tables, but alas it fails when there's more than one value in the PU record:
DECLARE @Fields TABLE
(
[Column] INT,
[Value] VARCHAR(MAX)
)
INSERT INTO @Fields
SELECT TOP 1
(SELECT Value FROM STRING_SPLIT([dbo].[Schemes].[Columns], ';')),
(SELECT Value FROM STRING_SPLIT([dbo].[PU].[Values], ';'))
FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
TOP 1 correctly gets the first PU record as each PU record is removed once processed.
The error is:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
In the case of a Person record, the splits are indeed returning 2 values/columns at a time; I just want to store the values in 2 records instead of getting an error.
Any help on rewriting the above query?
Also do note that the data is just generic nonsense. Being able to have 2 fields that both hold delimited values, always equal in count (e.g. a 'Person' in the PU table will always have 2 delimited values in the field), and breaking them up into several column/value rows is the point of the question.
UPDATE: Working implementation
Based on the (accepted) answer of Sean Lange, I was able to work out the following implementation to overcome the issue:
As I need to reuse it, the combine column/value functionality is performed by a new function, declared as such:
CREATE FUNCTION [dbo].[JoinDelimitedColumnValue]
(@splitValues VARCHAR(8000), @splitColumns VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH MyValues AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnValue = x.Item
FROM dbo.DelimitedSplit8K(@splitValues, @pDelimiter) x
)
, ColumnData AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnName = x.Item
FROM dbo.DelimitedSplit8K(@splitColumns, @pDelimiter) x
)
SELECT cd.ColumnName,
v.ColumnValue
FROM MyValues v
JOIN ColumnData cd ON cd.ColumnPosition = v.ColumnPosition
;
In case of the above sample data, I'd call this function with the following SQL:
DECLARE @FieldValues VARCHAR(8000), @FieldColumns VARCHAR(8000)
SELECT TOP 1 @FieldValues=[dbo].[PU].[Values], @FieldColumns=[dbo].[Schemes].[Columns] FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
INSERT INTO @Fields
SELECT [Column] = x.[ColumnName],[Value] = x.[ColumnValue] FROM [dbo].[JoinDelimitedColumnValue](@FieldValues, @FieldColumns, @Delimiter) x
This data structure makes this way more complicated than it should be. You can leverage the splitter from Jeff Moden here. http://www.sqlservercentral.com/articles/Tally+Table/72993/ The main difference of that splitter and all the others is that his returns the ordinal position of each element. Why all the other splitters don't do this is beyond me. For things like this it is needed. You have two sets of delimited data and you must ensure that they are both reassembled in the correct order.
The biggest issue I see is that you don't have anything in your main table to function as an anchor for ordering the results correctly. You need something, even an identity column, to ensure the output rows stay "together". To accomplish this, I just added an identity to the PU table.
alter table PU add RowOrder int identity not null
Now that we have an anchor this is still a little cumbersome for what should be a simple query but it is achievable.
Something like this will now work.
with MyValues as
(
select p.[Table]
, ColumnPosition = x.ItemNumber
, ColumnValue = x.Item
, RowOrder
from PU p
cross apply dbo.DelimitedSplit8K(p.[Values], ';') x
)
, ColumnData as
(
select ColumnName = replace(replace(x.Item, ']', ''), '[', '')
, ColumnPosition = x.ItemNumber
, s.[Table]
from Schemes s
cross apply dbo.DelimitedSplit8K(s.Columns, ';') x
)
select cd.[Table]
, v.ColumnValue
, cd.ColumnName
from MyValues v
join ColumnData cd on cd.[Table] = v.[Table]
and cd.ColumnPosition = v.ColumnPosition
order by v.RowOrder
, v.ColumnPosition
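As an aside for readers on newer versions: SQL Server 2022 added an enable_ordinal argument to STRING_SPLIT, which returns the ordinal position directly and removes the need for a custom splitter in this pattern. A sketch of the same query using it (untested against the exact schema above):

```sql
-- SQL Server 2022+: string_split(value, separator, enable_ordinal)
-- exposes an "ordinal" column, replacing DelimitedSplit8K's ItemNumber.
with MyValues as
(
    select p.[Table], x.ordinal as ColumnPosition, x.value as ColumnValue, p.RowOrder
    from PU p
    cross apply string_split(p.[Values], ';', 1) x
)
, ColumnData as
(
    select s.[Table], x.ordinal as ColumnPosition,
           replace(replace(x.value, ']', ''), '[', '') as ColumnName
    from Schemes s
    cross apply string_split(s.Columns, ';', 1) x
)
select cd.[Table], v.ColumnValue, cd.ColumnName
from MyValues v
join ColumnData cd on cd.[Table] = v.[Table]
                  and cd.ColumnPosition = v.ColumnPosition
order by v.RowOrder, v.ColumnPosition;
```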
I recommend not storing values like this in the first place. I recommend having a key value in the tables, preferably not using Table and Columns as a composite key, and avoiding reserved words. I also don't know which version of SQL Server you are using; I am going to assume a fairly recent version that will support my provided stored procedure.
Here is an overview of the solution:
1) You need to convert both the PU and the Schema table into a table where you will have each "column" value in the list of columns isolated in their own row. If you can store the data in this format rather than the provided format, you will be a little better off.
What I mean is
Table|Columns
Person|Jane Doe;22
needs converted to
Table|Column|OrderInList
Person|Jane Doe|1
Person|22|2
There are multiple ways to do this, but I prefer an XML trick that I picked up. You can find multiple split-string examples online, so I will not focus on that. Use whatever gives you the best performance. Unfortunately, you might not be able to get away from this table-valued function.
Update:
Thanks to Shnugo's performance enhancement comment, I have updated my xml splitter to give you the row number which reduces some of my code. I do the exact same thing to the Schema list.
2) Since the new Schema table and the new PU table now have the order each column appears, the PU table and the schema table can be joined on the "Table" and the OrderInList
CREATE FUNCTION [dbo].[fnSplitStrings_XML]
(
@List NVARCHAR(MAX),
@Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') AS Item,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as RowNumber
FROM
(
SELECT CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.') AS x
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
CREATE Procedure uspGetColumnValues
as
Begin
--Split each value in PU
select p.[Table],p.[Values],a.[Item],CHARINDEX(a.Item,p.[Values]) as LocationInStringForSorting,a.RowNumber
into #PuWithOrder
from PU p
cross apply [fnSplitStrings_XML](p.[Values],';') a --use whatever string split function is working best for you (performance wise)
--Split each value in Schema
select s.[Table],s.[Columns],a.[Item],CHARINDEX(a.Item,s.[Columns]) as LocationInStringForSorting,a.RowNumber
into #SchemaWithOrder
from Schemes s
cross apply [fnSplitStrings_XML](s.[Columns],';') a --use whatever string split function is working best for you (performance wise)
DECLARE @Fields TABLE --If this is an ETL process, maybe make this a permanent table with an auto incrementing Id and reference this table in all steps after this.
(
[Table] NVARCHAR(50),
[Columns] NVARCHAR(MAX),
[Column] VARCHAR(MAX),
[Value] VARCHAR(MAX),
OrderInList int
)
INSERT INTO @Fields([Table],[Columns],[Column],[Value],OrderInList)
Select pu.[Table],pu.[Values] as [Columns],s.Item as [Column],pu.Item as [Value],pu.RowNumber
from #PuWithOrder pu
join #SchemaWithOrder s on pu.[Table]=s.[Table] and pu.RowNumber=s.RowNumber
Select [Table],[Columns],[Column],[Value],OrderInList
from @Fields
order by [Table],[Columns],OrderInList
END
GO
EXEC uspGetColumnValues
GO
Update:
Since your working implementation is a table-valued function, I have another recommendation. The problem I see is that you're using a table-valued function, which ultimately handles one record at a time; you are likely going to be looping through each row. You will get better performance from set-based operations, batching as needed. If this is some sort of ETL process, your team will be better off with a stored procedure that processes the rows in bulk. It might make sense to stage the results into a better table that your team can work with downstream, rather than have them use a potentially slow table-valued function.

Improve performance of query with conditional filtering

Let's say I have a table with 3 million rows; the table has no PK or indexes.
the query is as follows
SELECT SKU, Store, ColumnA, ColumnB, ColumnC
FROM myTable
WHERE (SKU IN (select * from splitString(@skus)) OR @skus IS NULL)
AND (Store IN (select * from splitString(@stores)) OR @stores IS NULL)
Please consider that @skus and @stores are NVARCHAR(MAX) containing a list of ids separated by commas.
SplitString is a function which converts a string in the format '1,2,3' to a single-column table with one row per value.
This pattern allows me to send arguments from the application and filter by sku or by store or both or none.
What can I do to improve performance of this query? - I know Indexes are a good idea, but I don't really know about that stuff, so a guidance to that will be helpful.
Any other ideas?
This type of generic search query tends to be rough on performance.
In addition to the suggestion to use temp tables to store the results of the string parsing, there are a couple other things you could do:
Add indexes
It's usually recommended that each table have a clustered index (although it seems there is still room for debate): Will adding a clustered index to an existing table improve performance?
In addition to that, you will probably also want to add indexes on the fields that you're searching on.
In this case, that might be something like:
SKU (for searches on SKU alone)
Store, SKU (for searches on Store and the combination of both Store and SKU)
Keep in mind that if the query matches too many records, these indexes might not be used.
Also keep in mind that making the indexes cover the query can improve performance:
Why use the INCLUDE clause when creating an index?
Here is a link to Microsoft's documentation on creating indexes:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-index-transact-sql
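For example, indexes supporting both search paths might look like the sketch below. The column choices are illustrative, not prescriptive; the INCLUDE list makes the second index cover the query's SELECT list:

```sql
-- Supports filtering on SKU alone.
CREATE NONCLUSTERED INDEX IX_myTable_SKU
    ON dbo.myTable (SKU);

-- Supports Store alone and Store + SKU; INCLUDE makes it covering,
-- so the query can be answered without key lookups into the base table.
CREATE NONCLUSTERED INDEX IX_myTable_Store_SKU
    ON dbo.myTable (Store, SKU)
    INCLUDE (ColumnA, ColumnB, ColumnC);
```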
Use dynamic SQL to build the query
I need to preface this with a warning. Please be aware of SQL injection, and make sure to code appropriately!
How to cleanse dynamic SQL in SQL Server -- prevent SQL injection
Building a dynamic SQL query allows you to write more streamlined and direct SQL, and thus allows the optimizer to do a better job. This is normally something to be avoided, but I believe it fits this particular situation.
Here is an example (should be adjusted to take SQL injection into account as needed):
DECLARE @sql NVARCHAR(MAX) = N'
SELECT SKU, Store, ColumnA
FROM myTable
WHERE 1 = 1
';
IF @skus IS NOT NULL BEGIN
SET @sql += ' AND SKU IN (' + @skus + ')';
END
IF @stores IS NOT NULL BEGIN
SET @sql += ' AND Store IN (' + @stores + ')';
END
EXEC sp_executesql @sql;
Another thing to avoid is using functions in your Where clause. That will slow a query down.
Try putting this at the beginning of your script, before the first SELECT:
SELECT skus_group INTO #skus_group
FROM (SELECT item AS skus_group FROM
splitstring(@skus, ','))A;
Then replace your WHERE clause:
WHERE SKU IN(Select skus_group FROM #skus_group)
This normally improves performance because it takes advantage of indexes instead of a table scan, but since you're not using any indexes I'm not sure how much performance gain you'll get.
I believe this will work faster:
SELECT SKU, Store, ColumnA, ColumnB, ColumnC FROM myTable WHERE @skus IS NULL AND @stores IS NULL
UNION ALL
SELECT SKU, Store, ColumnA, ColumnB, ColumnC
FROM myTable
INNER JOIN (select colname AS myskus from splitString(@skus))skuses ON skuses.myskus = myTable.SKU
INNER JOIN (select colname AS mystore from splitString(@stores))stores ON stores.mystore = myTable.Store

T-SQL equivalent of IEnumerable.Zip()

In a T-SQL stored procedure, when supplied with two tables each of which has the same number of rows, how can I pair-wise match the rows based on row order rather than a join criteria?
Basically, an equivalent of .NET's IEnumerable.Zip() method?
I'm using SQL Server 2016.
Background
The purpose of the stored procedure is to act as an integration adapter between two other applications. I do not control the source code for either application.
The "client" application contains extensibility objects which can be configured to invoke a stored procedure in an SQL Server database. The configuration options for the extensibility point allow me to name a stored procedure which will be invoked, and provide a statically configured list of named parameters and their associated values, which will be passed to the stored procedure. Only scalar parameters are supported, not table-valued parameters.
The stored procedure needs to collect data from the "server" application (which is exposed through an OLE-DB provider) and transform it into a suitable result set for consumption by the client application.
For maintenance reasons, I want to avoid storing any configuration in the adapter database. I want to write generic, flexible logic in the stored procedure, and pass all necessary configuration information as parameters to that stored procedure.
The configuration information that's needed for the stored procedure is, essentially, equivalent to the following table variable schema:
DECLARE @TableOfServerQueryParameterValues AS TABLE (
tag NVARCHAR(50),
filterexpr NVARCHAR(500)
)
This table can then be used as the left-hand side of JOIN and CROSS APPLY queries in the stored proc which are run against the "server" application interfaces.
The problem I encountered is that I did not know of any way of passing a table of parameter info from the client application, because its extensibility points only include scalar parameter support.
So, I thought I would pass two scalar parameters. One would be a comma-separated list of tag values. The other would be a comma-separated list of filterexpr values.
Inside the stored proc, it's easy to use STRING_SPLIT to convert each of those parameters into a single-column table. But then I needed to match the two columns together into a two-column table, which I could then use as the basis for INNER JOIN or CROSS APPLY to query the server application.
The best solution I've come up with so far is selecting each table into a table variable, using the ROW_NUMBER() function to assign a row number, and then joining the two tables together by matching on the extra ROW_NUMBER column. Is there an easier way to do it than that? It would be nice not to have to declare all the columns in the table variables.
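To make the question concrete, this is roughly the shape of what I mean, zipping the two split parameter lists by row number. A sketch only; note that on SQL Server 2016, STRING_SPLIT does not guarantee output order, which is part of my unease with this approach:

```sql
declare @tags nvarchar(max) = N'tagA,tagB';        -- illustrative values
declare @filters nvarchar(max) = N'expr1,expr2';

with T as (
    select value as tag,
           row_number() over (order by (select null)) as r
    from string_split(@tags, ',')
), F as (
    select value as filterexpr,
           row_number() over (order by (select null)) as r
    from string_split(@filters, ',')
)
-- Pair the i-th tag with the i-th filter expression.
select t.tag, f.filterexpr
from T t
join F f on f.r = t.r;
```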
Your suggestion of using row_number seems sound.
Instead of table variables you can use subqueries or CTEs; there should be little difference overall, though avoiding the table variable reduces the number of passes you need to make & avoids the additional code to maintain.
select a.*, b.* --specify whatever columns you want to return
from (
select *
, row_number() over (order by someArbitraryColumnPreferablyYourClusteredIndex) r
from TableA
) a
full outer join --use a full outer join if your tables may have different numbers of rows & you want
--results from the larger table with nulls from the smaller for the bonus rows;
--otherwise use an inner join to only get matches for both tables
(
select *
, row_number() over (order by someArbitraryColumnPreferablyYourClusteredIndex) r
from TableB
) b
on b.r = a.r
Update
Regarding #PanagiotisKanavos's comment on passing structured data, here's a simple example of how you could convert a value passed as an xml type to table data:
declare @tableA xml = '<TableA>
<row><col1>x</col1><col2>Anne</col2><col3>Droid</col3></row>
<row><col1>y</col1><col2>Si</col2><col3>Borg</col3></row>
<row><col1>z</col1><col2>Roe</col2><col3>Bott</col3></row>
</TableA>'
select row_number() over (order by aRow) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
You may get a performance boost over the above by using the following. This creates a dummy column allowing us to do an order by where we don't care about the order. This should be faster than the above as ordering by 1 will be simpler than sorting based on the xml type.
select row_number() over (order by ignoreMe) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)
cross join (select 1) a(ignoreMe)
If you do care about the order, you can order by the data's fields, as such:
select row_number() over (order by x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') ) r
, x.aRow.value('(./col1/text())[1]' , 'nvarchar(32)') Code
, x.aRow.value('(./col2/text())[1]' , 'nvarchar(32)') GivenName
, x.aRow.value('(./col3/text())[1]' , 'nvarchar(32)') Surname
from @tableA.nodes('/*/*') x(aRow)

How to get comma-separated values in SQL Server?

I have two tables called tblEmployee and tblWorkPlace. The second table consists of multiple rows for each employee. I want to get the result as follows
EmployeeName WorkPlace
Gopi India,Pakistan,...
I.e. if tblWorkPlace consists of multiple rows for an employee, I want the result in a single row, with data being separated by commas.
How will I get this result?
You'll need to have code on the client side for this. It is possible to make this happen in SQL Server, but it's not trivial, the performance is horrible, and this kind of thing does not belong in the database anyway.
You're not very clear on how the tables tblWorkplace and tblEmployee are connected - I'm assuming by means of a many-to-many link table or something.
If that's the case, you can use something like this:
SELECT
e.EmpName,
(SELECT STUFF(
(SELECT ',' + w.WPName
FROM #Workplace w
INNER JOIN #WorkplaceEmployeesLink we ON w.WorkplaceID = we.WorkplaceID
WHERE we.EmpID = e.EmpID
FOR XML PATH('')
), 1, 1, '')
) AS Workplaces
FROM #Employee e
(I've replaced your tables with my own temp tables #Employee and #Workplace etc. for testing)
This gives me an output something like:
EmpName Workplaces
Gopi India,Pakistan
for that one row.
Basically, the internal FOR XML PATH('') selects the list of workplaces for each employee, and prepends each workplace with a ',' so you get something like ,India,Pakistan.
The STUFF function then stuffs an empty string into that resulting string, at position one, for a length of one, essentially wiping out the first comma; thus you get the list of workplaces as desired.
You can assign multiple values to a variable in a select statement. I call this "flattening" the records.
declare @WorkPlace varchar(max)
select @WorkPlace = '' -- it is important that it is NOT null
select @WorkPlace = @WorkPlace + WorkPlace + ', '
from YourTable
where EmployeeName = 'Gopi'
You may want to add some code to remove the final ',' from the string.
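For instance, a sketch of that cleanup; LEN() ignores trailing spaces, so DATALENGTH (one byte per character for varchar) is used to trim the trailing ', ' reliably:

```sql
-- Strip the trailing ', ' (2 characters) if anything was appended.
if @WorkPlace <> ''
    set @WorkPlace = substring(@WorkPlace, 1, datalength(@WorkPlace) - 2);
```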
Here's a link to different ways to get comma separated values.
