String casing for specific characters in query causing it to fail - sql-server

A lower case a followed by an upper case A (aA) causes our queries to fail when used in WHERE clauses. The error message is:
Invalid column name 'DataAreaId'.
-- OK
select top 100 *
  from inventtable
where DATAAREAID = 'a04' 
-- OK
select top 100 *
  from inventtable
where datAareaid = 'a04' 
-- Fails
select top 100 *
  from inventtable
where dataAreaid = 'a04'
The collation is Danish_Norwegian_CI_AS for server, database, table and column. This is causing a bit of trouble in EF Core, where it is generating queries like this:
SELECT top (100) *
  FROM [InventTable]
WHERE [DataAreaId] = 'a04' 
Which are failing. This isn't that big of a problem, since we implicitly mapped the columns to uppercase during model building, but what is the actual issue here? Can we fix this in MSSQL itself?
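One way to probe this, offered as a sketch and not a confirmed diagnosis: Danish and Norwegian collations can treat the digraph "aa" as the single letter "å", and identifiers are compared using the database collation, so it may be that the mixed-case pair "aA" is not recognized as that digraph and therefore no longer matches the "AA" in the stored column name. The query below compares the four casings of the pair under the question's collation and under Latin1_General_CI_AS (chosen here only as a contrast; it is not mentioned in the question).

```sql
-- Compare each casing of the pair against 'AA' under two collations.
-- Latin1_General_CI_AS is only a contrast collation picked for this sketch.
SELECT v.candidate,
       CASE WHEN v.candidate = N'AA' COLLATE Danish_Norwegian_CI_AS
            THEN 'equal' ELSE 'different' END AS vs_AA_danish_norwegian,
       CASE WHEN v.candidate = N'AA' COLLATE Latin1_General_CI_AS
            THEN 'equal' ELSE 'different' END AS vs_AA_latin1_general
FROM (VALUES (N'AA'), (N'Aa'), (N'aa'), (N'aA')) AS v (candidate);
```

If the "aA" row differs only in the Danish_Norwegian column, the failure comes from the collation's digraph rules and not from case sensitivity as such.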

Related

Switching a case statement to a constant greatly slows query

I have run into an issue with SQL Server 2017 where replacing:
a CASE statement that assigns a numerical value
with a constant numerical value
slows down the query by a factor of 6+.
The rather complicated query has the general form of:
WITH CTE1 AS
(
...
),
CTE2 AS
(
SELECT
--conditions based on below
FROM
(SELECT
--various math,
CASE
--statement assigning values to different runID combinations for samples with matching siteIDs and dates (due to the ON statement below)
ELSE NULL
....
END AS whichCombination
FROM
CTE1 AS value1
JOIN
CTE1 AS value2 ON (value1.siteID = value2.siteID
AND value1.date = value2.date
AND value1.sampleID <> value2.sampleID)
) AS combinations
WHERE combinations.whichCombination IS NOT NULL
)
SELECT various data
FROM dataTable
LEFT JOIN
(stuff from CTE2) AS pairTable ON dataTable.sampleID = pairTable.sampleID
The CASE statement assigns a pair number to different combinations of rows from the self join.
This then is used to select only the combinations that I want.
However, when the CASE statement is replaced with: 1 AS whichCombination (a constant value so no rows are assigned NULL) the query slows dramatically. This also occurs if CASE WHEN 1 = 1 THEN 1 is used.
This makes no sense to me as either way the values are:
numerical
not unique
not an index
The only thing that is unique is that each combination of rows is assigned a unique value.
Is SQL Server somehow using this as an index that speeds things up?
And how would I replicate this behavior without the CASE statement, given that this answer says you cannot create indices for CTEs?
EDIT: Also of note is that the slowdown occurs only if the main select statement (the last 5 lines) is included (i.e. not if CTE2 is run as the main select statement instead of being a CTE)
Best, JD
One workaround would be splitting these CTEs into temp tables; then you could add indexes if needed.
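A minimal sketch of that workaround, assuming hypothetical names and made-up rows (#cte1 stands in for the output of CTE1, and the index name is invented). The intermediate result goes into a temp table, gets an index on the self-join keys, and is then joined to itself. Whether this helps in the real query would need to be checked against the actual execution plans.

```sql
-- Stand-in for the output of CTE1; all names and rows here are hypothetical.
CREATE TABLE #cte1 (siteID int NOT NULL, [date] date NOT NULL, sampleID int NOT NULL);

INSERT INTO #cte1 (siteID, [date], sampleID)
VALUES (1, '2019-01-01', 10),
       (1, '2019-01-01', 11),
       (2, '2019-01-02', 12);

-- Unlike a CTE, a temp table can carry an index on the join keys.
CREATE CLUSTERED INDEX IX_cte1_site_date ON #cte1 (siteID, [date], sampleID);

-- The self join from CTE2, now reading from the indexed temp table.
SELECT value1.sampleID AS sampleID1, value2.sampleID AS sampleID2
FROM #cte1 AS value1
JOIN #cte1 AS value2
  ON value1.siteID = value2.siteID
 AND value1.[date] = value2.[date]
 AND value1.sampleID <> value2.sampleID;

DROP TABLE #cte1;
```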

not able to identify difference between same value

I have data in a table column. I SELECT DISTINCT on that column, and I also wrap it in LTRIM(RTRIM(col_name)) in the SELECT, but I am still getting duplicate values.
How can we identify why it is happening and how we can avoid it?
I tried RTRIM, LTRIM, UPPER function. Still no help.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc​tion
Decommissioned
Non-Production
Unsupported Edition
Looks like there's a unicode character in there, somewhere. I copied and pasted the values out initially as a varchar, and did the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Produc​tion'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
Note that one of the values is Pre-Produc?tion, meaning that there is a Unicode character between the c and t that cannot be represented as varchar.
So, let's find out what it is:
SELECT 'Pre-Produc​tion', N'Pre-Produc​tion',
UNICODE(SUBSTRING(N'Pre-Produc​tion',11,1));
The UNICODE function returns 8203, which is a zero-width space. I assume you want to remove these, so you can update your data by doing:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
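To check whether other hidden characters remain, the same UNICODE/SUBSTRING idea can be applied to every position of a string. This is a rough sketch: the variable @s and the CTE name positions are made up for illustration, and the sample value is rebuilt with NCHAR(8203) so the hidden character is visible in the code.

```sql
-- Sample value containing a zero-width space between the c and the t.
DECLARE @s nvarchar(100) = N'Pre-Produc' + NCHAR(8203) + N'tion';

-- Generate one row per character position, then keep the non-ASCII ones.
WITH positions AS (
    SELECT 1 AS pos
    UNION ALL
    SELECT pos + 1 FROM positions WHERE pos < LEN(@s)
)
SELECT pos,
       UNICODE(SUBSTRING(@s, pos, 1)) AS code_point
FROM positions
WHERE UNICODE(SUBSTRING(@s, pos, 1)) > 127
OPTION (MAXRECURSION 0);
```

For the sample value this returns a single row, position 11 with code point 8203.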
(I also suggest a lookup table for your statuses, with a foreign key, so that this can't happen again.)
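A possible shape for such a lookup table, as a sketch only: the table names dbo.ServerStatusLookup and dbo.SQLInventoryDemo and their columns are hypothetical and would need to be adapted to the real SQLInventory schema.

```sql
-- Hypothetical schema: table and column names are illustrative only.
CREATE TABLE dbo.ServerStatusLookup (
    StatusName nvarchar(50) NOT NULL PRIMARY KEY
);

INSERT INTO dbo.ServerStatusLookup (StatusName)
VALUES (N'Development'), (N'Staging'), (N'Test'),
       (N'Pre-Production'), (N'Production');

-- The inventory table can then only hold statuses present in the lookup table.
CREATE TABLE dbo.SQLInventoryDemo (
    ServerName   sysname      NOT NULL PRIMARY KEY,
    ServerStatus nvarchar(50) NULL
        REFERENCES dbo.ServerStatusLookup (StatusName)
);

INSERT INTO dbo.SQLInventoryDemo (ServerName, ServerStatus)
VALUES (N'srv01', N'Pre-Production');
```

An INSERT or UPDATE whose status does not match a row in the lookup table is rejected by the foreign key; what counts as a match is governed by the column's collation.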
I deal with this type of thing all the time. For stuff like this NGrams8K and PatReplace8k and PATINDEX are your best friends.
Putting what you posted in a table variable we can analyze the problem:
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Produc​tion'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query will identify items with characters other than A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID txt
----------- -------------------------------------------
10 Pre-Produc​tion
To identify the bad character we can use NGrams8k like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID txt position token
------ ----------------- -------------------- ---------
10 Pre-Produc​tion 11 ?
PatReplace8K makes cleaning up stuff like this quite easy and quick. First note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString NewString
------------------ ----------------
Pre-Produc?tion Pre-Production
To fix the problem you can use patreplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;

Query Match is Case Sensitive

I have the following query, looking for salespeople of 'BG'.
SELECT *
FROM [dbo].[JobOrders]
where [salesperson] = 'BG'
When I use 'bg' instead, I do not get results. To my understanding 'BG' or 'bg' would bring back the same results.
Is there a setting that would prevent this?
To avoid this situation you can always compare against the upper-cased column value:
SELECT *
FROM [dbo].[JobOrders]
where upper([salesperson]) = 'BG'
Your salesperson column is almost certainly a foreign key and so would likely be populated with consistently cased values (i.e. they should all be upper case), so:
SELECT *
FROM [dbo].[JobOrders]
WHERE [salesperson] = UPPER('bg')
This would allow an index to still be used for the salesperson column.
You can make it case insensitive by using collation:
SELECT *
FROM [dbo].[JobOrders]
where [salesperson] = 'BG' COLLATE SQL_Latin1_General_CP1_CI_AS
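To see whether a case-sensitive collation on the column is what causes this, the column's collation can be read from the catalog. The query below is a sketch using the table and column names from the question. The commented-out ALTER is hypothetical: varchar(10) NOT NULL is an assumed definition, so the column's real type and nullability would have to be restated, and the statement will fail while indexes or constraints depend on the column.

```sql
-- Show the collation currently applied to the column.
SELECT c.name AS column_name, c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.JobOrders')
  AND c.name = N'salesperson';

-- Hypothetical permanent fix; varchar(10) NOT NULL is an assumption.
-- ALTER TABLE dbo.JobOrders
--     ALTER COLUMN salesperson varchar(10)
--     COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL;
```

A collation name containing _CS_ or ending in _BIN or _BIN2 would explain why 'bg' does not match 'BG'.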

SQL Server 2005 SELECT TOP 1 from VIEW returns LAST row

I have a view that may contain more than one row, looking like this:
[rate] | [vendorID]
8374 1234
6523 4321
5234 9374
In a SPROC, I need to set a param equal to the value of the first column from the first row of the view. Something like this:
DECLARE @rate int;
SET @rate = (select top 1 rate from vendor_view where vendorID = 123)
SELECT @rate
But this ALWAYS returns the LAST row of the view.
In fact, if I simply run the subselect by itself, I only get the last row.
With 3 rows in the view, TOP 2 returns the FIRST and THIRD rows in order. With 4 rows, it's returning the top 3 in order. Yet still top 1 is returning the last.
DERP?!?
This works..
DECLARE @rate int;
CREATE TABLE #temp (vRate int)
INSERT INTO #temp (vRate) (select rate from vendor_view where vendorID = 123)
SET @rate = (select top 1 vRate from #temp)
SELECT @rate
DROP TABLE #temp
.. but can someone tell me why the first behaves so fudgely and how to do what I want? As explained in the comments, there is no meaningful column by which I can do an order by. Can I force the order in which rows are inserted to be the order in which they are returned?
[EDIT] I've also noticed that: select top 1 rate from ([view definition select]) also returns the correct values time and again.[/EDIT]
That is by design.
If you don't specify how the query should be sorted, the database is free to return the records in any order that is convenient. There is no natural order for a table that is used as default sort order.
What the order will actually be depends on how the query is planned, so you can't even rely on the same query giving a consistent result over time, as the database will gather statistics about the data and may change how the query is planned based on that.
To get the record that you expect, you simply have to specify how you want them sorted, for example:
select top 1 rate
from vendor_view
where vendorID = 123
order by rate
I ran into this problem on a query that had worked for years. We upgraded SQL Server and all of a sudden, an unordered select top 1 was not returning the final record in a table. We simply added an order by to the select.
My understanding is that SQL Server will generally return results in clustered index order if no ORDER BY is provided, or in the order of whatever index the engine picks. But this is not a guarantee of a certain order.
If you don't have something to order off of, you need to add it. Either add a date inserted column and default it to GETDATE() or add an identity column. It won't help you historically, but it addresses the issue going forward.
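A small sketch of that suggestion, assuming a hypothetical base table (#vendor_rates, rowSeq and insertedAt are invented names, and the real view's underlying table would be the one to change). The identity column records insertion order, so TOP 1 can be paired with an ORDER BY that means "first inserted":

```sql
-- Hypothetical base table behind the view; names are illustrative.
CREATE TABLE #vendor_rates (
    rowSeq     int IDENTITY(1,1) NOT NULL,
    insertedAt datetime NOT NULL DEFAULT GETDATE(),
    rate       int NOT NULL,
    vendorID   int NOT NULL
);

-- Separate statements, so the identity values follow insertion order.
INSERT INTO #vendor_rates (rate, vendorID) VALUES (8374, 123);
INSERT INTO #vendor_rates (rate, vendorID) VALUES (6523, 123);
INSERT INTO #vendor_rates (rate, vendorID) VALUES (5234, 123);

DECLARE @rate int;
SET @rate = (SELECT TOP 1 rate
             FROM #vendor_rates
             WHERE vendorID = 123
             ORDER BY rowSeq);
SELECT @rate AS first_inserted_rate;  -- 8374, the first row inserted

DROP TABLE #vendor_rates;
```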
While it doesn't necessarily make sense that the results of the query should be consistent, in this particular instance they are so we decided to leave it 'as is'. Ultimately it would be best to add a column, but this was not an option. The application this belongs to is slated to be discontinued sometime soon and the database server will not be upgraded from SQL 2005. I don't necessarily like this outcome, but it is what it is: until it breaks it shall not be fixed. :-x

"Error converting data type varchar to numeric." - What column?

I have a huge INSERT statement with 200 columns and suddenly I get the dreaded Error converting data type varchar to numeric. Is there somewhere I can see the actual column that contains the "varchar" value? I know I can remove one of the columns at a time until the error disappears, but it's very tedious.
Unfortunately, this error is a serious pain and there's no easy way to troubleshoot it. When I've encountered it in the past, I've always just had to comment out groups of columns until I find the culprit.
Another approach might be to use the ISNUMERIC() function in T-SQL to try and find the culprit. Assuming every column in your destination table is numeric (adjust accordingly if it's not), you could try this:
SELECT *
FROM SourceTable
WHERE ISNUMERIC(Column1) = 0
OR ISNUMERIC(Column2) = 0
OR ISNUMERIC(Column3) = 0
OR ISNUMERIC(Column4) = 0
...
This will expose the row that contains your non-numeric value, and should make it pretty clear which column it's in. I know it's tedious, but at least it helps you hunt down the actual value, in addition to the column that's causing trouble.
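On SQL Server 2012 or later, TRY_CONVERT can play the same role and tests the exact target type. The sketch below is an alternative to the ISNUMERIC approach, not part of the original answer: the table variable, its rows, and numeric(18, 4) as the destination type are all assumptions for illustration. Unpivoting with CROSS APPLY reports the offending column name and value directly.

```sql
-- Hypothetical source data; in practice this would be the real source table.
DECLARE @Source TABLE (Col1 varchar(10), Col2 varchar(10));

INSERT INTO @Source (Col1, Col2)
VALUES ('1', '1'), ('2', '2'), ('4A', '4'), ('5', 'x');

-- One output row per value that cannot be converted to the assumed target type.
SELECT v.column_name, v.raw_value
FROM @Source AS s
CROSS APPLY (VALUES ('Col1', s.Col1),
                    ('Col2', s.Col2)) AS v (column_name, raw_value)
WHERE v.raw_value IS NOT NULL
  AND TRY_CONVERT(numeric(18, 4), v.raw_value) IS NULL;
```

With the sample rows this lists Col1 with '4A' and Col2 with 'x'. For a 200-column insert, the VALUES list would need one entry per column being converted.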
You don't specify SQL Server Version or number of rows.
For SQL Server 2005 and later, adding the OUTPUT clause to the INSERT might help identify the rogue row: it outputs the inserted rows until it encounters an error, so the next row is the one with the problem.
DECLARE @Source TABLE
(
Col1 VARCHAR(10),
Col2 VARCHAR(10)
)
INSERT INTO @Source
SELECT '1','1' UNION ALL
SELECT '2','2' UNION ALL
SELECT '3','3' UNION ALL
SELECT '4A','4' UNION ALL
SELECT '5','5'
DECLARE @Destination TABLE
(
Col1 INT,
Col2 VARCHAR(10)
)
INSERT INTO @Destination
OUTPUT inserted.*
SELECT *
FROM @Source
Returns
(5 row(s) affected)
Col1 Col2
----------- ----------
1 1
2 2
3 3
Msg 245, Level 16, State 1, Line 23
Conversion failed when converting the varchar value '4A' to data type int.
Well, this is just a hunch, but what about inserting the data into a temporary table and then using the GUI to migrate the data to the other table? If it still generates an error, you should at least be able to get more feedback on that non-numerical column...
If it doesn't work, consider trying this.
Cheers!
