We are upgrading an existing data import process written in C#.
As part of the upgrade process, we need to check the results of the import process from the rewrite against the results of the old system.
One of the changes we made was breaking comma-delimited lists into rows in another table. This will enable us to filter results using a simple join.
This is the old schema:
FormNumber MainCategories
1 blue,green,red
2 yellow,red,blue
3 white
Which we normalized to:
FormNumber AttributeId Value
1 1 blue
1 1 green
1 1 red
2 1 yellow
2 1 red
2 1 blue
3 1 white
Now, our next step is to confirm that the results from the two processes are the same. One of these checks is to compare the MainCategories field of the old process with the results from the normalized tables.
This leads us, finally, to the question: how do I build a comma-delimited list from the new schema to compare against the value from the old one?
We have tried the FOR XML PATH solution proposed by @Ritesh here: Concatenate many rows into a single text string?
Here is the adapted sql statement:
Select distinct ST2.FormNumber,
(Select ST1.Value + ',' AS [text()]
From cache.ArtifactAttribute ST1
Where ST1.FormNumber= ST2.FormNumber
ORDER BY ST1.FormNumber
For XML PATH ('')) [Values]
From cache.ArtifactAttribute ST2
The problem is the results are not correct. Even though FormNumber 1 only has three entries in the table, the Values column (the dynamically built delimited string) shows incorrect results. Obviously we are not implementing the sql code correctly.
What are we doing wrong?
Here is a way for you to try:
SELECT DISTINCT A.FormNumber, MainCategories
FROM YourTable A
CROSS APPLY (SELECT STUFF((SELECT ',' + Value
FROM YourTable
WHERE FormNumber = A.FormNumber FOR XML PATH('')),1,1,'') MainCategories) B
One caveat: you can't be sure that the items are concatenated in the same order as in your original list, since no column explicitly defines that order. Here is a working SQL Fiddle with this example.
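Because the concatenation order is not guaranteed, comparing the rebuilt string to the old one can report mismatches that are only ordering differences. One way around that is to go the other direction: split the old list and compare sets. The following is a minimal sketch, assuming SQL Server 2016 or later (STRING_SPLIT needs database compatibility level 130 or higher); the table variables @old and @new are hypothetical stand-ins for the two schemas.

```sql
-- Hypothetical stand-ins for the old and the normalized schema.
DECLARE @old TABLE (FormNumber int, MainCategories varchar(200));
DECLARE @new TABLE (FormNumber int, AttributeId int, Value varchar(32));

INSERT @old VALUES (1, 'blue,green,red'), (2, 'yellow,red,blue'), (3, 'white');
INSERT @new VALUES (1, 1, 'blue'), (1, 1, 'green'), (1, 1, 'red'),
                   (2, 1, 'yellow'), (2, 1, 'red'), (2, 1, 'blue'),
                   (3, 1, 'white');

-- Values in the old lists that are missing from the normalized rows.
SELECT o.FormNumber, s.value AS Value
FROM @old AS o
CROSS APPLY STRING_SPLIT(o.MainCategories, ',') AS s
EXCEPT
SELECT n.FormNumber, n.Value
FROM @new AS n;

-- Values in the normalized rows that are missing from the old lists.
SELECT n.FormNumber, n.Value
FROM @new AS n
EXCEPT
SELECT o.FormNumber, s.value
FROM @old AS o
CROSS APPLY STRING_SPLIT(o.MainCategories, ',') AS s;
```

If both queries return no rows, the two representations hold the same set of values per FormNumber. Note that EXCEPT works on distinct rows, so a value duplicated inside one list would not be detected this way.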
This seems to work fine for me:
DECLARE @s TABLE(FormNumber int, AttributeId int, Value varchar(32));
INSERT @s VALUES
(1,1,'blue'),
(1,1,'green'),
(1,1,'red'),
(2,1,'yellow'),
(2,1,'red'),
(2,1,'blue'),
(3,1,'white');
SELECT ST2.FormNumber, [Values] = STUFF(
(SELECT ',' + ST1.Value AS [text()]
FROM @s ST1
WHERE ST1.FormNumber = ST2.FormNumber
ORDER BY ST1.FormNumber
FOR XML PATH (''),
TYPE).value(N'./text()[1]', N'varchar(max)'), 1, 1, '')
FROM @s ST2 GROUP BY ST2.FormNumber;
Results:
FormNumber Values
1 blue,green,red
2 yellow,red,blue
3 white
Example db<>fiddle
Related
Ok so I have a table with three columns:
Id, Key, Value
I would like to delete all rows where Value is empty (''). Therefore I wrote the query to select before I delete which was:
Select * from [Imaging.ImageTag] where [Value] = ''
all pretty standard so far...
Now here's the strange part. This query returned the two rows shown below (commas separate the columns):
CE7C367C-5C4A-4531-9C8C-8F2A26B1B980, ObjectType, 🎃
F5B2F8A8-C4A8-4799-8824-E5FFEEDAB887, Caption, 🍰
Why are these two rows matching on ''?
Extra Info
I am using SQL Server. The [Value] column is of type NVARCHAR(300), and yes, the table name really is [Imaging.ImageTag].
This is collation dependent.
Matches empty string
SELECT 1 where N'' = N'🍰' COLLATE latin1_general_ci_as
Doesn't match empty string
SELECT 1 WHERE N'' = N'🍰' COLLATE latin1_general_100_ci_as
The 100 collations are more up-to-date (though still not bleeding edge, they have been available since 2008) and you should use more modern collations unless you have some specific reason not to. The BOL entry for 100 collations specifically calls out
Weighting has been added to previously non-weighted characters that
would have compared equally.
It's not an answer to your "why", but in terms of your overall goal, perhaps you should alter your strategy for searching for empty values:
Select * from [Imaging.ImageTag] where LEN([Value]) = 0
As per the comments (thanks Martin Smith for providing some copy/pastable emoji):
SELECT CASE WHEN N'' = N'🍰' then 1 else 0 end --returns 1, no good for checking
SELECT LEN(N'🍰') --returns 2, can be used to check for zero length values?
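One thing to keep in mind with the LEN approach is that LEN ignores trailing spaces, so a value made up only of spaces also reports a length of 0. If only truly empty strings should match, DATALENGTH (which counts bytes) may be a closer fit. A small sketch, offered as an alternative rather than as what the answers above recommend:

```sql
SELECT LEN(N'   ')        AS LenSpaces,    -- 0: LEN ignores trailing spaces
       DATALENGTH(N'   ') AS BytesSpaces,  -- 6: three 2-byte characters
       DATALENGTH(N'')    AS BytesEmpty,   -- 0
       DATALENGTH(N'🍰')  AS BytesCake;    -- 4: one UTF-16 surrogate pair

-- Strict empty-string filter against the table from the question.
-- NULL values are not returned, because DATALENGTH(NULL) is NULL.
SELECT *
FROM [Imaging.ImageTag]
WHERE DATALENGTH([Value]) = 0;
```

Unlike the equality comparison, neither LEN nor DATALENGTH depends on the collation's character weights, which is why they sidestep the emoji problem.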
Complementing the answers above:
When you need to use LIKE in SQL:
WHERE
N'' + COLUMNS like N'%'+ @WordSearch +'%' COLLATE latin1_general_100_ci_as
Google sent me here while I was looking for a way to filter all rows with an emoji in a varchar column.
In case you're looking for something similar:
SELECT mycolumn
FROM mytable
WHERE REGEXP_EXTRACT(mycolumn,'\x{1f600}') <> ''
--sqlserver WHERE SUBSTRING(MyCol, (PATINDEX( '\x{1f600}', MyCol ))) <> ''
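Here \x{1f600} is the code point of the emoji being searched for; you can find the emoji codes here. Note that REGEXP_EXTRACT is not a SQL Server function, and PATINDEX only understands LIKE-style wildcards rather than \x{...} escapes, so the commented SQL Server line is unlikely to work as written.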
I have data in a table column. I run SELECT DISTINCT on that column, and I also wrap it in LTRIM(RTRIM(col_name)), but I still get a duplicate value in the results.
How can we identify why this is happening, and how can we avoid it?
I tried the RTRIM, LTRIM and UPPER functions. Still no help.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Production
Decommissioned
Non-Production
Unsupported Edition
Looks like there's a unicode character in there, somewhere. I copied and pasted the values out initially as a varchar, and did the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Production'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
Note that one of the values is Pre-Produc?tion, meaning that there is a unicode character between the c and t.
So, let's find out what it is:
SELECT 'Pre-Production', N'Pre-Production',
UNICODE(SUBSTRING(N'Pre-Production',11,1));
The UNICODE function returns 8203, which is a zero-width space. I assume you want to remove these, so you can update your data by doing:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
(I also suggest a lookup table for your statuses, with a foreign key, so that this can't happen again.)
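The lookup-table suggestion could look roughly like the sketch below. The table name dbo.ServerStatus, its column names, and the constraint name are all hypothetical; only SQLInventory and serverstatus come from the question, and the dbo schema is assumed. It is meant to be run after the zero-width spaces have been cleaned up.

```sql
-- Hypothetical lookup table; all names here are illustrative.
CREATE TABLE dbo.ServerStatus (
    ServerStatusId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusName     nvarchar(50)      NOT NULL UNIQUE
);

-- Seed the lookup table from the (already cleaned) inventory data.
INSERT dbo.ServerStatus (StatusName)
SELECT DISTINCT LTRIM(RTRIM(serverstatus))
FROM dbo.SQLInventory
WHERE serverstatus IS NOT NULL;

-- Add a nullable column that can only hold known status ids.
ALTER TABLE dbo.SQLInventory
    ADD ServerStatusId int NULL
        CONSTRAINT FK_SQLInventory_ServerStatus
        REFERENCES dbo.ServerStatus (ServerStatusId);
GO

-- Backfill the new column from the existing text values.
UPDATE i
SET ServerStatusId = s.ServerStatusId
FROM dbo.SQLInventory AS i
JOIN dbo.ServerStatus AS s
  ON s.StatusName = LTRIM(RTRIM(i.serverstatus));
```

Once the application writes ServerStatusId instead of free text, a status with a stray invisible character can no longer be inserted without first existing in the lookup table.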
DB<>fiddle
I deal with this type of thing all the time. For stuff like this, PATINDEX together with NGrams8K and PatReplace8K (user-defined functions you have to install, not built-ins) are your best friends.
Putting what you posted in a table variable we can analyze the problem:
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Production'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query will identify items with characters other than A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID txt
----------- -------------------------------------------
10 Pre-Production
To identify the bad character we can use NGrams8k like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID txt position token
------ ----------------- -------------------- ---------
10 Pre-Production 11 ?
PatReplace8K makes cleaning up stuff like this quick and easy. First note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString NewString
------------------ ----------------
Pre-Produc?tion Pre-Production
To fix the problem you can use PatReplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Sorry, this might be a bit out of scope for the community here, but I wanted to get a second opinion.
I have a table with the following structure
Table_1
TYPE ITEM DATE QTYA QTYB QTYC
X AAA 17/08/2015 100 200 300
X AAA 18/08/2015 100 170 240
Y BBB 17/08/2015 100 240 100
I need to use this table as a source for a merge, but the target table is formatted completely differently
Table_2
ITEM QTYA_1 QTYA_2......QTYA_31 QTYB_1 QTYB_2 QTYB_3......QTYB_31 QTYC_1 QTYC_2....QTYC_31
(the number suffix is basically the day of the month)
I can convert Table 1 to the format of Table 2 using a mix of UNION ALL and PIVOT, but the performance isn't that good - particularly since I have to save the information in a temp table first before merging it in (Each 'type' in Table_1 has a different start date and I cannot overwrite previous values in Table_2 starting from a different date - basically I have to merge the table 3 or 4 times with a different item type and different start date)
Here's what I got so far
SELECT TOP 0 * INTO #TEMP_PIVOT_TABLE
INSERT INTO #TEMP_PIVOT_TABLE SELECT * FROM
(SELECT ITEM, 'QTYA_' + CAST(DATEPART(day, [DATE]) AS VARCHAR(2)) AS QUANTITY_TYPE, QTYA AS QUANTITY FROM TABLE_1
UNION ALL
SELECT ITEM, 'QTYB_' + CAST(DATEPART(day, [DATE]) AS VARCHAR(2)) AS QUANTITY_TYPE, QTYB AS QUANTITY FROM TABLE_1
UNION ALL
SELECT ITEM, 'QTYC_' + CAST(DATEPART(day, [DATE]) AS VARCHAR(2)) AS QUANTITY_TYPE, QTYC AS QUANTITY FROM TABLE_1
) A
PIVOT
(SUM(QUANTITY) FOR QUANTITY_TYPE IN ([QTYA_1], [QTYA_2],.....[QTYA_31],[QTYB_1].....[QTYB_31],[QTYC_1].....[QTYC_31])) AS B
----For each different date per item type, construct a string only selecting those days in the month.
Then merge the results from #TEMP_PIVOT_TABLE into TABLE_2 for each item TYPE with dynamic SQL.
1) Is there any better way to do the PIVOT command? The performance of the UNION ALL commands isn't encouraging - particularly since the table I'm reading from has large amounts of data. I'm also simplifying here for the sake of brevity - the actual table needs to map 5 columns or so, each with 31 days
2) Is there a better way to do the MERGE? I dislike using a loop + Dynamic SQL to basically read the same dataset repeatedly just so to merge on different columns, but I can't see a different way. That and building the MERGE command dynamically with so many columns will make it troublesome for future maintenance.
Does anyone have an idea how I can go about this more efficiently?
You can replace the inner UNION ALL query with the following query. It reads the table only once instead of once per quantity column.
To unpivot the data, use CROSS APPLY with a table value constructor:
select ITEM,[Quantity Type],QTY
from yourtable
cross apply
(
values
('QTYA_' + CAST(DATEPART(day,[DATE]) AS VARCHAR(2)),QTYA),
('QTYB_' + CAST(DATEPART(day,[DATE]) AS VARCHAR(2)),QTYB),
('QTYC_' + CAST(DATEPART(day,[DATE]) AS VARCHAR(2)),QTYC)
)
CS ([Quantity Type],QTY)
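To show how the single-pass unpivot plugs back into the PIVOT, here is a small self-contained sketch using the three sample rows from the question. The table variable @t and the alias QuantityType are assumptions made for the example, and only the days present in the sample (17 and 18) are listed in the IN clause; the real query would list all 31 days per quantity.

```sql
-- Hypothetical stand-in for Table_1, loaded with the sample rows.
DECLARE @t TABLE (ITEM varchar(10), [DATE] date, QTYA int, QTYB int, QTYC int);
INSERT @t VALUES ('AAA', '20150817', 100, 200, 300),
                 ('AAA', '20150818', 100, 170, 240),
                 ('BBB', '20150817', 100, 240, 100);

SELECT p.*
FROM (
    -- One scan of the source: each row fans out into three name/value pairs.
    SELECT t.ITEM, cs.QuantityType, cs.QTY
    FROM @t AS t
    CROSS APPLY (VALUES
        ('QTYA_' + CAST(DATEPART(day, t.[DATE]) AS varchar(2)), t.QTYA),
        ('QTYB_' + CAST(DATEPART(day, t.[DATE]) AS varchar(2)), t.QTYB),
        ('QTYC_' + CAST(DATEPART(day, t.[DATE]) AS varchar(2)), t.QTYC)
    ) AS cs (QuantityType, QTY)
) AS src
PIVOT (
    SUM(QTY) FOR QuantityType IN ([QTYA_17], [QTYA_18],
                                  [QTYB_17], [QTYB_18],
                                  [QTYC_17], [QTYC_18])
) AS p;
```

This yields one row per ITEM, with NULL in any day column that has no source row (for example the day-18 columns for BBB).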
The customer wants the list of values returned by my query displayed in a specific order. Simply ordering ascending or descending does not give the customer what they want, and the DBA doesn't want me to hard-code values. Is there a way to do a custom sort without hard-coding values? The values will change every year, so a hard-coded list would have to be maintained and updated every year.
Table Structure:
Columns: CMN_CodesID (unique), Name (which I'd like to display in a custom order)
Something like this:
ORDER BY CASE Name WHEN 'Jack' THEN 1
WHEN 'Apple' THEN 2
WHEN 'Orange' THEN 3
...
END
You could use dynamic SQL in a stored procedure and pass @MyOrderBy into it as a parameter (it is declared inline in this example for illustration).
DECLARE @MyOrderBy VARCHAR(300)
SELECT @MyOrderBy = 'case when myfield = ''Jack'' then 1 when myfield = ''Apple'' then 2 when myfield = ''Orange'' then 3 else 4 End'
DECLARE @sSQL VARCHAR(300)
SELECT @sSQL = 'SELECT * FROM mytable ORDER BY ' + @MyOrderBy
EXEC(@sSQL)
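A different technique from the dynamic SQL above, which avoids hard-coding names in the query at all, is to keep the desired order as data in a small sort-order table and join to it. This is only a sketch: the table variables @Codes and @CodeSortOrder, and the extra 'Pear' row, are hypothetical, and in practice the sort order could just as well be a column on the codes table itself.

```sql
-- Hypothetical stand-ins: the codes table and a maintainable sort-order table.
DECLARE @Codes TABLE (CMN_CodesID int PRIMARY KEY, Name varchar(50) NOT NULL);
DECLARE @CodeSortOrder TABLE (CMN_CodesID int PRIMARY KEY, SortOrder int NOT NULL);

INSERT @Codes VALUES (1, 'Apple'), (2, 'Jack'), (3, 'Orange'), (4, 'Pear');
INSERT @CodeSortOrder VALUES (2, 1), (1, 2), (3, 3);  -- Jack, Apple, Orange

SELECT c.Name
FROM @Codes AS c
LEFT JOIN @CodeSortOrder AS s
  ON s.CMN_CodesID = c.CMN_CodesID
ORDER BY CASE WHEN s.SortOrder IS NULL THEN 1 ELSE 0 END,  -- unranked rows last
         s.SortOrder,
         c.Name;
```

With this sample data the rows come back as Jack, Apple, Orange, Pear. When the values change next year, only the rows in the sort-order table need updating, not the query.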
I've been reading other similar "multiple records into one" posts, but either cannot seem to get any to work, or they don't really apply to what I am trying to do.
Here are my 3 tables. vehicle, vehicle_repair, comments
vehicle columns: vehicle_name and other vehicle related info,vehicle_make, vehicle_model
vehicle_repair columns: vehicle_name, vehicle_repair_type, vehicle_repair_num, etc, etc
comments columns: vehicle_name,vehicle_repair_num, comments_detail
The way the program is written, if I write more than 1 line of comments, it doesn't concatenate them, it makes 1 entry for each line, ie:
comments table:
vehicle_name vehicle_rpr_num comments_detail
--------------------------------------------------------------------
150 1 replaced hose
750 1 replaced belt
750 2 replaced fuel and also saw that the
750 2 timing belt needs to be replaced
750 2 as well
I was trying to use something like:
select
substring((select ' '+comments_detail AS 'data()'
from comments
for xml path('')), 3, 80) as 'comments_detail'
from
comments
I tried to add the join and other tables inside the substring but then the comments_details become all jacked up, like it then combines 20 comments together instead of 1 at a time.
I'd rather start from scratch and see if I can get it working another way.
My issue comes in when I try to link the 3 tables above.
I do not know how to put in the other fields that I need from the vehicle table, ie vehicle_make, vehicle_model
Any ideas? Am I writing my concatenate completely wrong? I am trying to put this into a stored procedure, would a view be better?
SELECT c1.vehicle_name,
c1.vehicle_rpr_num,
STUFF((SELECT ' ' + comments_detail
FROM comments c2
WHERE c2.vehicle_name = c1.vehicle_name
AND c2.vehicle_rpr_num = c1.vehicle_rpr_num
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),
1,1,'')
AS comments
FROM comments c1
GROUP BY c1.vehicle_name, c1.vehicle_rpr_num;
SQLFiddle with the sample comments data.
You were on the right track using FOR XML PATH to concatenate the comments. There are many different ways to concatenate, a good article on the pros/cons of each is here. I'd put this into a view definition to allow for easier joining with other tables.
CREATE TABLE tbl (vehicle_name INT,vehicle_rpr_num INT,comments_detail NVARCHAR(1000))
INSERT INTO tbl
VALUES
(150,1,'replaced hose'),
(750,1,'replaced belt'),
(750,2,'replaced fuel and also saw that the'),
(750,2,'timing belt needs to be replaced'),
(750,2,'as well')
SELECT DISTINCT t.vehicle_name, t.vehicle_rpr_num , STUFF(List.Comments, 1 ,1, '') AS Comments
FROM tbl t
CROSS APPLY (
SELECT ' ' + comments_detail [text()]
FROM tbl
WHERE vehicle_name = t.vehicle_name
AND vehicle_rpr_num = t.vehicle_rpr_num
FOR XML PATH('')
)List(Comments)
Result Set
vehicle_name vehicle_rpr_num Comments
150 1 replaced hose
750 1 replaced belt
750 2 replaced fuel and also saw that the timing belt needs to be replaced as well
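The suggestion above to wrap the concatenation in a view, so the other tables can be joined to it, might look like the following sketch. The view name dbo.vw_repair_comments and the dbo schema are assumptions; the column names follow the sample comments data (vehicle_rpr_num), and vehicle_make and vehicle_model are taken from the question's description of the vehicle table.

```sql
-- Hypothetical view: one row per vehicle and repair, comments concatenated.
CREATE VIEW dbo.vw_repair_comments
AS
SELECT c1.vehicle_name,
       c1.vehicle_rpr_num,
       STUFF((SELECT ' ' + c2.comments_detail
              FROM dbo.comments AS c2
              WHERE c2.vehicle_name = c1.vehicle_name
                AND c2.vehicle_rpr_num = c1.vehicle_rpr_num
              FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),
             1, 1, '') AS comments
FROM dbo.comments AS c1
GROUP BY c1.vehicle_name, c1.vehicle_rpr_num;
GO

-- The view can then be joined like any other table.
SELECT v.vehicle_name, v.vehicle_make, v.vehicle_model,
       rc.vehicle_rpr_num, rc.comments
FROM dbo.vehicle AS v
JOIN dbo.vw_repair_comments AS rc
  ON rc.vehicle_name = v.vehicle_name;
```

As with the queries above, the order in which the comment lines are concatenated is not guaranteed unless the inner SELECT has an ORDER BY on a column that defines it.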