Force T-SQL query to be case sensitive in MS SQL Server

I have a table that originates in an old legacy system that was case sensitive, in particular a status column where 's' = 'Schedule import' and 'S' = 'Schedule management'. This table eventually makes its way into a SQL Server 2000 database which I can query against. My query is relatively simple, just going for counts...
Select trans_type, count(1) from mytable group by trans_type
This groups the counts for 'S' together with the 's' counts. Is there any way to force a query to be case sensitive? I have access to both SQL Server 2000 and 2005 environments to run this, but I have limited admin capability on the server (so I can't set server attributes)... I guess I could move the data to my local machine and set something up there, where I have full access to server options, but I would prefer a T-SQL solution.

select trans_type collate SQL_Latin1_General_CP1_CS_AS, count(*)
from mytable
group by trans_type collate SQL_Latin1_General_CP1_CS_AS
You can do this with =, LIKE, and other operators as well. Note that you must modify the select list too: you are no longer grouping by trans_type, you are now grouping by trans_type collate SQL_Latin1_General_CP1_CS_AS. Kind of a gotcha.
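As a minimal sketch of the same COLLATE clause used with = in a WHERE clause (table and column names come from the question; the literal 's' is only an illustration):

```sql
-- Counts only the lowercase 's' rows. 'S' rows are excluded because the
-- comparison is forced to a case-sensitive collation.
select count(*) as lower_s_count
from mytable
where trans_type collate SQL_Latin1_General_CP1_CS_AS = 's'
```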

Can you introduce a trans_type_ascii column holding the ASCII value of trans_type and group on that instead? Or any other column (for example isUpperCase) that distinguishes them.
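A rough sketch of that idea, assuming trans_type holds a single character. Instead of adding a real column, this variation groups directly on the ASCII() expression; the aliases trans_type_ascii and row_count are only illustrative:

```sql
-- ASCII() returns the code of the first character, so 's' (115) and 'S' (83)
-- fall into separate groups even under a case-insensitive collation.
select ascii(trans_type) as trans_type_ascii, count(*) as row_count
from mytable
group by ascii(trans_type)
```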

Related

SQL Server : finding substring using PATINDEX function

I'm writing different queries in SQL Server.
I have 2 tables, Employees and Departments.
Table Employees consists of EMPLOYEE_ID, ENAME, ID_DEP - department id. Table Departments consists of ID_DEP, DNAME.
The task is to show Employee.ENAME and his Department.DNAME where Department.DNAME has word Sales inside. I have to use functions SUBSTRING and PATINDEX.
Here is my code, but I think that it looks quite strange and it's meaningless. Nevertheless I need to use both functions in this task.
SELECT e.ENAME, d.DNAME
FROM EMPLOYEE e
JOIN DEPARTMENTS d ON d.ID_DEP = e.ID_DEP
WHERE UPPER(SUBSTRING(d.DNAME, (PATINDEX('%SALES%', d.DNAME)), 5)) = 'SALES'
Any ideas what should I change while continuing using these two functions?
The answer is just below. By the way, using the row constructor VALUES is an excellent means of building a simple demo of what you want.
The queries below provide several possible answers to your ambiguous question. Why do you need to use these functions? Is it homework that specifies this? If your SQL Server database was installed with a case-insensitive collation, or the column 'name' was set to such a collation, UPPER makes no difference to the match, however it is used. The most you can get from UPPER is to make the data appear uppercase in the result, or to turn the data uppercase if you update the column. PATINDEX and LIKE will perform a case-insensitive match. This is so useful that most people configure their server with a case-insensitive collation. To override the default comparison behavior, which follows the column/database collation, specify a COLLATE clause, as in the outer apply of Test2.
Here are the queries. Watch the results, they show what I said.
select *
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
outer apply (Select Test1='match' Where Substring(name, patindex('%SALES%', name), 5) = 'SALES') as test1
outer apply (Select Test2='match' Where name COLLATE Latin1_General_CS_AS like '%SALES%' ) as test2 -- CS_AS means case sensitive, accent sensitive
outer apply (Select Test3='match' Where name like '%SALES%') as test3
select * -- really want an upper case match ?
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
Where name COLLATE Latin1_General_CS_AS like '%SALES%'
select * -- demo of patindex
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
outer apply (Select ReallyUpperMatch=name Where patindex('%SALES%', name COLLATE Latin1_General_CS_AS)>0 ) as ReallyUpperMatch -- CS_AS means case sensitive
outer apply (Select ciMatch=name Where name like '%SALES%' ) as ciMatch
outer apply (Select MakeItLookUpper=UPPER(ciMatch) ) MakeItLookUpper

CREATE VIEW in SQL Server using UNION ALL

Please consider the below example:
CREATE VIEW VW_YearlySales
AS
SELECT 2011 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2011
UNION ALL
SELECT 2012 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2012
UNION ALL
SELECT 2013 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2013
GO
The InvoiceTable2013 doesn't exist actually and I don't want to create it right now, it will be created automatically when recording the first invoice for year 2013.
Can anyone help me on how to specify a condition that will verify the existence of the table before doing the UNION ALL?
Many thanks for your help.
As others have correctly said, you can't achieve this with a view, because the select statement has to reference a concrete set of tables - and if any of them don't exist, the query will fail to execute.
It seems to me like your problem is more fundamental. Clearly there should conceptually be exactly one InvoiceTable, with rows for different dates. Separating this out into different logical tables by year is presumably something that's been done for optimisation (unless the columns are different, which I very much doubt).
In this case, partitioning seems like the way to remedy this problem (partitioning large tables by year/quarter/month is the canonical example). This would let you have a single InvoiceTable logically, yet specify that SQL Server should store the data behind the scenes as if it were different tables split out by year. You get the best of both worlds - an accurate model, and fast performance - and this makes your view definition simple.
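A minimal sketch of what that could look like on SQL Server 2005 or later. The column names InvoiceID and InvoiceDate, the alias TotalAmount, and the partition object names are assumptions for illustration, not taken from the question:

```sql
-- Hypothetical schema: InvoiceID, InvoiceDate and the pf/ps names are assumptions.
CREATE PARTITION FUNCTION pfInvoiceYear (datetime)
AS RANGE RIGHT FOR VALUES ('20120101', '20130101');  -- before 2012 | 2012 | 2013 and later
GO
CREATE PARTITION SCHEME psInvoiceYear
AS PARTITION pfInvoiceYear ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.InvoiceTable
(
    InvoiceID   int      NOT NULL,
    InvoiceDate datetime NOT NULL,
    ProductID   int      NOT NULL,
    Amount      money    NOT NULL
) ON psInvoiceYear (InvoiceDate);
GO
-- The view no longer needs one SELECT per year.
CREATE VIEW dbo.VW_YearlySales
AS
SELECT YEAR(InvoiceDate) AS YearNo, ProductID, SUM(Amount) AS TotalAmount
FROM dbo.InvoiceTable
GROUP BY YEAR(InvoiceDate), ProductID;
GO
```

With this layout, invoices for a new year are simply inserted into the same table, so nothing in the view has to change when 2013 starts.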
No, as far as I know this is not possible in a view; you have to use a stored procedure. In a stored procedure you can check whether the table exists and change your SQL accordingly.
EDIT:
CREATE PROCEDURE GetYearlySales
AS
-- GROUP BY ProductID is needed because ProductID is selected alongside SUM(Amount)
IF (EXISTS (SELECT *
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_NAME = 'InvoiceTable2013'))
BEGIN
    SELECT 2011 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2011 GROUP BY ProductID
    UNION ALL
    SELECT 2012 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2012 GROUP BY ProductID
    UNION ALL
    SELECT 2013 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2013 GROUP BY ProductID
END
ELSE
BEGIN
    SELECT 2011 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2011 GROUP BY ProductID
    UNION ALL
    SELECT 2012 AS YearNo, ProductID, SUM(Amount) FROM InvoiceTable2012 GROUP BY ProductID
END
It looks like you want a table for every year and want the stored procedure to pick up new tables without being modified. This is slightly risky: you will have to maintain the naming convention all the time. In that case, query INFORMATION_SCHEMA.TABLES for table_name like 'InvoiceTable%', store the matching names in a table, then loop through them, appending the fixed SQL for each one. Finally, execute the dynamic SQL, as is done here: http://www.vishalseth.com/post/2008/07/10/Dynamic-SQL-sp_executesql.aspx
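The loop described above might be sketched as follows. This is only an illustration: it assumes SQL Server 2005 or later (for nvarchar(max)), that every yearly table lives in the default schema and is named InvoiceTable followed by exactly four digits, and the alias TotalAmount is made up for the example:

```sql
DECLARE @sql nvarchar(max), @tableName sysname;
SET @sql = N'';

-- Walk over every table that follows the InvoiceTableYYYY naming convention.
DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE'
      AND TABLE_NAME LIKE 'InvoiceTable[0-9][0-9][0-9][0-9]'
    ORDER BY TABLE_NAME;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @tableName;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Separate the per-year SELECTs with UNION ALL.
    IF @sql <> N'' SET @sql = @sql + N' UNION ALL ';
    -- The last four characters of the table name are the year.
    SET @sql = @sql
        + N'SELECT ' + RIGHT(@tableName, 4) + N' AS YearNo, ProductID, SUM(Amount) AS TotalAmount FROM '
        + QUOTENAME(@tableName) + N' GROUP BY ProductID';
    FETCH NEXT FROM table_cursor INTO @tableName;
END
CLOSE table_cursor;
DEALLOCATE table_cursor;

-- Run the assembled statement only if at least one table was found.
IF @sql <> N'' EXEC sp_executesql @sql;
```

The strict LIKE pattern and QUOTENAME keep unexpected table names out of the generated statement, which matters because the names are concatenated into the SQL text.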

Wrong case in subquery column name causes incorrect results, but no error

Using SQL Server Management Studio, I am getting some undesired results (looks like a bug to me..?)
If I use (FIELD rather than field for the other_table):
SELECT * FROM main_table WHERE field IN (SELECT FIELD FROM other_table)
I get all results from main_table.
Using the correct case:
SELECT * FROM main_table WHERE field IN (SELECT field FROM other_table)
I get the expected results where field appears in other.
Running the subquery on its own:
SELECT FIELD FROM other_table
I get an invalid column name error.
Surely I should get this error in the first case?
Is this related to collation?
The DB is binary collation.
The server is case insensitive however.
It seems to me like the server component is saying "this code is OK" and not allowing the DB to say the field is the wrong name..?
What are my options for a solution?
Let's illustrate what is happening using something that doesn't depend on case sensitivity:
USE tempdb;
GO
CREATE TABLE dbo.main_table(column1 INT);
CREATE TABLE dbo.other_table(column2 INT);
INSERT dbo.main_table SELECT 1 UNION ALL SELECT 2;
INSERT dbo.other_table SELECT 1 UNION ALL SELECT 3;
SELECT column1 FROM dbo.main_table
WHERE column1 IN (SELECT column1 FROM dbo.other_table);
Results:
column1
-------
1
2
Why doesn't that raise an error? SQL Server is looking at your queries and seeing that the column1 inside can't possibly be in other_table, so it is extrapolating and "using" the column1 that exists in the outer referenced table (just like you could reference a column that only exists in the outer table without a table reference). Think about this variation:
SELECT [column1] FROM dbo.main_table
WHERE EXISTS (SELECT [column1] FROM dbo.other_table WHERE [column2] = [column1]);
Results:
column1
-------
1
Again SQL Server knows that column1 in the where clause also doesn't exist in the locally referenced table, but it tries to find it in the outer scope. So in an imaginary world you might consider the query to actually be saying:
SELECT m.[column1] FROM dbo.main_table AS m
WHERE EXISTS (SELECT m.[column1] FROM dbo.other_table AS o WHERE o.[column2] = m.[column1]);
(Which is not how I typed it, but if I do type it that way, it still works.)
It doesn't make logical sense in some of the cases but this is the way the query engine does it and the rule has to be applied consistently. In your case (no pun intended), you have an extra complication: case sensitivity. SQL Server didn't find FIELD in your subquery, but it did find it in the outer query. So a couple of lessons:
Always prefix your column references with the table name or alias (and always prefix your table references with the schema).
Always create and reference your tables, columns and other entities using consistent case. Especially when using a binary or case-sensitive collation.
Very interesting find. The unspoken rule is that you should always alias the tables in your subqueries and use those aliases to be explicit about which table each column comes from. Subqueries are allowed to reference a field from the outer query, which is the cause of your issue. In your scenario I agree that the default should either be the inner query's field list or a column ambiguity error. Regardless, the method below is always preferable:
select * from main_table a where a.field in
(select x.field from other_table x)
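To see why the aliases help, here is a small sketch that reuses the dbo.main_table(column1) and dbo.other_table(column2) demo tables created in tempdb earlier in this thread; the aliases m and o are arbitrary:

```sql
-- Because o.column1 is explicitly qualified, SQL Server can no longer fall
-- back to the outer table. other_table has no column1, so the query is
-- rejected with an "Invalid column name" error instead of returning every row.
SELECT m.column1
FROM dbo.main_table AS m
WHERE m.column1 IN (SELECT o.column1 FROM dbo.other_table AS o);
```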

SQL Server Pagination w/o row_number() or nested subqueries?

I have been fighting with this all weekend and am out of ideas. In order to have pages in my search results on my website, I need to return a subset of rows from a SQL Server 2005 Express database (i.e. start at row 20 and give me the next 20 records). In MySQL you would use the "LIMIT" keyword to choose which row to start at and how many rows to return.
In SQL Server I found ROW_NUMBER()/OVER, but when I try to use it it says "Over not supported". I am thinking this is because I am using SQL Server 2005 Express (free version). Can anyone verify if this is true or if there is some other reason an OVER clause would not be supported?
Then I found the old school version similar to:
SELECT TOP X * FROM TABLE WHERE ID NOT IN (SELECT TOP Y ID FROM TABLE ORDER BY ID) ORDER BY ID where X=number per page and Y=which record to start on.
However, my queries are a lot more complex with many outer joins and sometimes ordering by something other than what is in the main table. For example, if someone chooses to order by how many videos a user has posted, the query might need to look like this:
SELECT TOP 50 iUserID, iVideoCount FROM MyTable LEFT OUTER JOIN (SELECT count(iVideoID) AS iVideoCount, iUserID FROM VideoTable GROUP BY iUserID) as TempVidTable ON MyTable.iUserID = TempVidTable.iUserID WHERE iUserID NOT IN (SELECT TOP 100 iUserID, iVideoCount FROM MyTable LEFT OUTER JOIN (SELECT count(iVideoID) AS iVideoCount, iUserID FROM VideoTable GROUP BY iUserID) as TempVidTable ON MyTable.iUserID = TempVidTable.iUserID ORDER BY iVideoCount) ORDER BY iVideoCount
The issue is in the subquery SELECT line: TOP 100 iUserID, iVideoCount
To use the "NOT IN" clause it seems I can only have 1 column in the subquery ("SELECT TOP 100 iUserID FROM ..."). But when I don't include iVideoCount in that subquery SELECT statement then the ORDER BY iVideoCount in the subquery doesn't order correctly so my subquery is ordered differently than my parent query, making this whole thing useless. There are about 5 more tables linked in with outer joins that can play a part in the ordering.
I am at a loss! The two methods above are the only ways I can find to get SQL Server to return a subset of rows. I am about ready to return the whole result and loop through each record in PHP but only display the ones I want. That is such an inefficient way to do things that it is really my last resort.
Any ideas on how I can make SQL Server mimic MySQL's LIMIT clause in the above scenario?
SQL Server 2005 can use ROW_NUMBER() for paging, and SQL Server 2012 improves paging support with ORDER BY ... OFFSET and FETCH NEXT. If you cannot use either of these, you need to:
Create a temp table with an identity column.
Insert the data into the temp table with an ORDER BY clause.
Use the temp table's identity column value just like the ROW_NUMBER() value.
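The steps above can be sketched roughly like this, using the table and column names from the question. The temp table name #Paged, the RowNum column, and the aliases m and v are made up for the example, and the page shown skips 100 rows and returns the next 50:

```sql
-- 1. Temp table with an identity column that will act as the row number.
CREATE TABLE #Paged
(
    RowNum      int IDENTITY(1, 1) PRIMARY KEY,
    iUserID     int NOT NULL,
    iVideoCount int NULL
);

-- 2. Insert in the desired order; identity values follow the ORDER BY.
--    iUserID is added as a tie-breaker so the order is repeatable.
INSERT INTO #Paged (iUserID, iVideoCount)
SELECT m.iUserID, v.iVideoCount
FROM MyTable AS m
LEFT OUTER JOIN (SELECT iUserID, COUNT(iVideoID) AS iVideoCount
                 FROM VideoTable
                 GROUP BY iUserID) AS v
    ON m.iUserID = v.iUserID
ORDER BY v.iVideoCount, m.iUserID;

-- 3. Use the identity column like ROW_NUMBER(): rows 101 to 150.
SELECT iUserID, iVideoCount
FROM #Paged
WHERE RowNum BETWEEN 101 AND 150
ORDER BY RowNum;

DROP TABLE #Paged;
```

Because the ordering is applied once during the insert, any number of joined tables can take part in the ORDER BY without having to repeat the whole query inside a NOT IN subquery.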
I hope it helps.

How to set collation for a connection in SQL Server?

How can i set the collation SQL Server will use for the duration of that connection?
Not until i connect to SQL Server do i know what collation i want to use.
e.g. a browser with language fr-IT has connected to the web-site. Any queries i run on that connection i want to follow the French language, Italy variant collation.
i envision a hypothetical connection level property, similar to SET ANSI_NULLS OFF, but for collation¹:
SET COLLATION_ORDER 'French_CI_AS'
SELECT TOP 100 * FROM Orders
ORDER BY ProjectName
and later
SELECT * FROM Orders
WHERE CustomerID = 3277
AND ProjectName LIKE '%l''ecole%'
and later
UPDATE Quotes
SET IsCompleted = 1
WHERE QuoteName = 'Cour de l''école'
At the same time, when a chinese customer connects:
SET COLLATION_ORDER Chinese_PRC_CI_AI_KS_WS
SELECT TOP 100 * FROM Orders
ORDER BY ProjectName
or
SELECT * FROM Orders
WHERE CustomerID = 3277
AND ProjectName LIKE '學校'
or
UPDATE Quotes
SET IsCompleted = 1
WHERE QuoteName = '學校的操場'
Now i could alter every SELECT statement in the system to allow me to pass in a collation:
SELECT TOP 100 * FROM Orders
WHERE CustomerID = 3278
ORDER BY ProjectName COLLATE French_CI_AS
But you cannot pass a collation order as a parameter to a stored procedure:
CREATE PROCEDURE dbo.GetCommonOrders
@CustomerID int, @CollationOrder varchar(50)
AS
SELECT TOP 100 * FROM Orders
WHERE CustomerID = @CustomerID
ORDER BY ProjectName COLLATE @CollationOrder
And the COLLATE clause can't help me when performing an UPDATE or a SELECT.
Note: All string columns in the database are already nchar, nvarchar or ntext. i am not talking about the default collation applied to a server, database, table, or column for non-unicode columns (i.e. char, varchar, text). i am talking about the collation used by SQL Server when comparing and sorting strings.
How can i specify per-connection collation?
See also
Similar question, but for ADO.net and connection strings
Similar question, but for ASP.net MVC2 and MySQL
1 hypothetical sql that exhibits locale issues
As marc_s commented, the collation is a property of a database or a column, and not of a connection.
However, you can override the collation on statement level using the COLLATE keyword.
Using your examples:
SELECT * FROM Orders
WHERE CustomerID = 3277
AND ProjectName COLLATE Chinese_PRC_CI_AI_KS_WS LIKE N'學校'
UPDATE Quotes
SET IsCompleted = 1
WHERE QuoteName COLLATE Chinese_PRC_CI_AI_KS_WS = N'學校的操場'
Still, I cannot find any documentation on using COLLATE with a dynamic collation name, which leaves dynamic SQL and EXEC as the only possible solution. See this social.MSDN entry for an example.
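A minimal sketch of the dynamic SQL route, assuming SQL Server 2005 or later. It reuses the Orders query from the question; the variable names and the validation step are illustrative and not taken from the linked entry:

```sql
DECLARE @CustomerID int, @CollationOrder sysname, @sql nvarchar(max);
SET @CustomerID = 3278;
SET @CollationOrder = N'French_CI_AS';

-- A collation name cannot be passed as a parameter, so it has to be
-- concatenated into the statement text. Check it against the server's list
-- of collations first so that arbitrary text cannot be injected.
IF NOT EXISTS (SELECT 1 FROM sys.fn_helpcollations() WHERE name = @CollationOrder)
BEGIN
    RAISERROR(N'Unknown collation: %s', 16, 1, @CollationOrder);
    RETURN;
END

SET @sql = N'SELECT TOP 100 * FROM Orders'
         + N' WHERE CustomerID = @CustomerID'
         + N' ORDER BY ProjectName COLLATE ' + @CollationOrder + N';';

-- Ordinary values such as the customer id still go in as real parameters.
EXEC sp_executesql @sql, N'@CustomerID int', @CustomerID = @CustomerID;
```

The same pattern could be wrapped in a stored procedure that takes the collation name as an ordinary string parameter, which is the closest available substitute for the per-connection setting the question asks for.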

Resources