Why is Latin1_General_CS_AS not case sensitive? - sql-server

For LIKE queries, the Latin1_General_CS_AS collation is not case-sensitive. According to a bug report to Microsoft, this was listed as "By Design".
However, the Latin1_General_Bin collation is also case-sensitive and works exactly as expected for LIKE queries.
You can see the difference in this simple query:
SELECT
MyColumn AS Latin1_General_Bin
FROM MyTable
WHERE MyColumn LIKE '%[a-z]%' COLLATE Latin1_General_Bin;
SELECT
MyColumn AS Latin1_General_CS_AS
FROM MyTable
WHERE MyColumn LIKE '%[a-z]%' COLLATE Latin1_General_CS_AS;
SQL Fiddle Demo.
My questions are:
Why would this be considered "By Design" to be case-insensitive in LIKE?
If this really is better, why is it a different behavior between the two case sensitive collations _Bin and _CS_AS?
I was going to standardize on Latin1_General_CS_AS for any case-sensitive databases going forward, but this seems like a subtle query bug waiting to happen.

It is not a regular expression. The range [a-z] just means >= 'a' AND <= 'z' in the collation's sort order.
Under Latin1_General_CS_AS, that includes all letters except capital Z.
Under SQL_Latin1_General_CP1_CS_AS, all letters except capital A fall within that range.
If that is still not clear, compare the sort orders of the following values under the three different collations:
SELECT *
FROM (VALUES ('A'),('B'),('Y'),('Z'), ('a'),('b'),('y'),('z')) V(C)
ORDER BY C COLLATE Latin1_General_Bin
You can see that the binary collation sorts all the upper-case letters together; the other two do not.
+--------------------+----------------------+------------------------------+
| Latin1_General_Bin | Latin1_General_CS_AS | SQL_Latin1_General_CP1_CS_AS |
+--------------------+----------------------+------------------------------+
| A                  | a                    | A                            |
| B                  | A                    | a                            |
| Y                  | b                    | B                            |
| Z                  | B                    | b                            |
| a                  | y                    | Y                            |
| b                  | Y                    | y                            |
| y                  | z                    | Z                            |
| z                  | Z                    | z                            |
+--------------------+----------------------+------------------------------+
This is documented in BOL (Books Online):
In range searches, the characters included in the range may vary
depending on the sorting rules of the collation.
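To make the range rule concrete, here is a minimal sketch in Python rather than T-SQL. It does not call SQL Server; it only models each collation as the letter ordering shown in the comparison table above, and the helper name `letters_in_range` is made up for illustration. A letter "matches" [a-z] when it sorts at or after 'a' and at or before 'z' in that ordering.

```python
# Letter orderings copied from the comparison table (A, B, Y, Z, a, b, y, z only).
# This is a model of the sort orders, not a call into SQL Server.
SORT_ORDERS = {
    "Latin1_General_Bin": ["A", "B", "Y", "Z", "a", "b", "y", "z"],
    "Latin1_General_CS_AS": ["a", "A", "b", "B", "y", "Y", "z", "Z"],
    "SQL_Latin1_General_CP1_CS_AS": ["A", "a", "B", "b", "Y", "y", "Z", "z"],
}


def letters_in_range(collation, low, high):
    """Return the letters that sort between low and high (inclusive)."""
    order = SORT_ORDERS[collation]
    lo, hi = order.index(low), order.index(high)
    return [c for c in order if lo <= order.index(c) <= hi]


# Which of the sample letters would a range like [a-z] cover under each ordering?
matches = {name: letters_in_range(name, "a", "z") for name in SORT_ORDERS}
for name, letters in matches.items():
    print(name, letters)
```

Only the binary ordering restricts the range to lower-case letters; the other two pull in upper-case letters that sort between 'a' and 'z'.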

Related

SQL Server select from multiple tables stored in a column or list

Because it is too complicated to solve this problem without real data, I will try to add some:
| tables 1 | table 2 | ... | table n
---------------------------------------------------------------------------------------
columns_name: | name | B | C | D | name | B | C | D | ... | name | B | C | D
---------------------------------------------------------------------------------------
column_content:| John | ... | Ben | ... | ... | John| ...
The objective is to extract the rows in the N tables where name = 'John'.
We already have a table called [table_names] with the n table names stored in the column [column_table_name].
Now we want to do something like this:
SELECT [name]
FROM (SELECT [table_name]
FROM INFORMATION_SCHEMA.TABLES)
WHERE [name] = 'John'
Table names are dynamic and thus unknown until we run the information_schema.tables query.
This final query is giving me an error. Any clue about how to use multiple stored table names in a subquery?
You need to alias your subquery in order to reference it. Also, the outer query can only reference columns that the subquery returns, so [name] should be [table_name]:
SELECT [table_name]
FROM (SELECT [table_name]
FROM INFORMATION_SCHEMA.TABLES) AS X
WHERE [table_name] = 'John'

Microsoft SQL Server max of character and numeric values in same column

I am taking the max of a column that contains both numeric and varchar values (i.e. '2008', 'n/a'). What is considered the max? The string or numeric value?
I am working in Microsoft SQL Server.
Thanks!
The numeric value is actually stored as a string, so it is compared as a string.
MAX finds the highest value in the collating sequence.
For the collating sequence of ASCII characters, see the link below.
https://www.ibm.com/support/knowledgecenter/SSQ2R2_9.5.1/com.ibm.ent.cbl.zos.doc/PGandLR/ref/rlebcasc.html
For character columns, MAX finds the highest value in the collating sequence.
- max() docs
It will be the same value as if you order by col desc
Here are some values thrown into a column and sorted descending:
+------------+
| col        |
+------------+
| Z          |
| na         |
| n/a/       |
| 9999999999 |
| 30         |
| 2008       |
| 00000000   |
+------------+
The max() would be the first value in the list above: Z.
rextester demo: http://rextester.com/IXXX76837
The exact order will depend on the collation of your column (most default to Latin1_General_CI_AS).
Here is a demo that shows you the sort order for each character for some different collations (latin general / latin binary)
rextester demo: http://rextester.com/WLJ38844
create table one
(
Col1 varchar(200)
)
insert into one(Col1)
values('2008'),('n/a'),('aaaa'),('bbb'),('zzzz')
select max(Col1) from one
--zzzz
With only the values from the question ('2008' and 'n/a'), n/a would be the max.
It is better to filter out the known 'n/a' values and cast the remaining values to something meaningful before applying the max or min functions.
select max(cast(column as int))
from tablename
where column != 'n/a'
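The difference between a string maximum and a numeric maximum can be sketched in Python rather than T-SQL. One assumption to keep in mind: Python compares strings by code point, which is closest to a binary collation and not identical to Latin1_General_CI_AS, so treat this as an illustration of character-by-character comparison, not a reproduction of SQL Server's ordering.

```python
# Sample values from the answers above; all of them are strings.
values = ["2008", "n/a", "aaaa", "bbb", "zzzz"]

# String comparison goes character by character, so letters beat digits here.
string_max = max(values)

# With only the two values from the question, 'n/a' wins over '2008'.
question_max = max(["2008", "n/a"])

# Digits compared as text: '9' sorts after '30' because '9' > '3'.
text_digit_max = max(["9", "30", "2008"])

# Filter out the non-numeric markers first, then compare as integers.
numeric_max = max(int(v) for v in values if v.isdigit())

print(string_max, question_max, text_digit_max, numeric_max)
```

The last line mirrors the idea of excluding 'n/a' and casting before taking the maximum.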

SQL Server Indexes - Column Order

Going off the diagram here: I'm confused about columns 1 and 3.
I am working on a data warehouse table, and there are two columns that are used as a key that gets you the primary key.
The first column is the source system. There are three possible values, let's say IBM, SQL, and ORACLE. The second part of the composite key is the transaction ID, which could be numeric or varchar. There is no third column, other than the secret key, which would be generated by IDENTITY(1,1) as the record gets loaded. So, going by the graph below, I imagine passing in a query like this:
Select a.Patient,
b.Source System,
b.TransactionID
from Patient A
right join Transactions B
on A.sourceSystem = B.sourceSystem and
a.transactionID = B.transactionID
where SourceSystem = "SQL"
The graph leads me to think that column 1 in the index should be the SourceSystem, since it would immediately narrow the drill-down into the next level of the index to a third of the values. But when I showed this graph to a coworker, they interpreted column 1 as the transactionID and column 2 as the source system.
Cols
1 2 3
-------------
| | 1 | |
| A |---| |
| | 2 | |
|---|---| |
| | | |
| | 1 | 9 |
| B | | |
| |---| |
| | 2 | |
| |---| |
| | 3 | |
|---|---| |
First, you should qualify all column names in a query. Second, a left join usually makes more sense than a right join (the semantics are: keep all rows in the first table). Finally, if you have proper foreign key relationships, then you probably don't need an outer join at all.
Let's consider this query:
Select p.Patient, t.SourceSystem, t.TransactionID
from Patient p join
Transactions t
on t.sourceSystem = p.sourceSystem and
t.transactionID = p.transactionID
where t.SourceSystem = 'SQL';
The correct index for this query is Transactions(SourceSystem, TransactionId).
Notes:
Outer joins affect the choice of indexes. Basically if one of the tables has to be scanned anyway, then an index might be less useful.
t.SourceSystem = 'SQL' and p.SourceSystem = 'SQL' would probably optimize differently.
Does the patient really have a transaction id? That seems strange.
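Why SourceSystem works as the leading column of Transactions(SourceSystem, TransactionId) can be sketched in Python rather than T-SQL. This is only a rough model: the index is represented as a list sorted by (SourceSystem, TransactionID), the rows are made up, and the helper names `seek_source_system` and `seek_exact` are hypothetical. A real B-tree index is more involved, but the ordering idea is the same.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows; sorting by the tuple mimics a composite index
# keyed on (SourceSystem, TransactionID).
index = sorted([
    ("SQL", "T3"), ("IBM", "T1"), ("ORACLE", "T2"),
    ("SQL", "T1"), ("IBM", "T7"), ("SQL", "T2"),
])


def seek_source_system(idx, source_system):
    """Find all entries for one source system with two binary searches."""
    lo = bisect_left(idx, (source_system,))
    # chr(0x10FFFF) is the largest code point, so it acts as an upper bound.
    hi = bisect_right(idx, (source_system, chr(0x10FFFF)))
    return idx[lo:hi]


def seek_exact(idx, source_system, transaction_id):
    """Check for one (source system, transaction id) pair with one binary search."""
    key = (source_system, transaction_id)
    pos = bisect_left(idx, key)
    return pos < len(idx) and idx[pos] == key


sql_rows = seek_source_system(index, "SQL")
print(sql_rows)
print(seek_exact(index, "IBM", "T7"))
```

Because entries for the same source system are contiguous in this ordering, a filter on SourceSystem alone and a lookup on both columns can each use a seek instead of a scan.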

Data Type conversion error in Pivot Table

I have a database that stores our company's positions and "requirements", i.e. each position needs to have undergone a building induction, and so on. There's a program that allows the users to see and manage all this, but of course someone wants an export for a client, and that's not really possible with the current setup. I'm thinking a quick pivot table will get the job done, though.
I have the following tables:
---------------------------
| Positions |
---------------------------
| PositionID | int |
| PositionName | nvarchar |
---------------------------
------------------------------
| Requirements |
------------------------------
| RequirementID | int |
| RequirementName | nvarchar |
| RequirementType | bit |
------------------------------
-------------------------
| Position Requirements |
-------------------------
| Position_ID | int |
| Requirement_ID | int |
-------------------------
What I would like to do is pull out the data for a specific Position or Positions, i.e. SELECT * FROM Positions WHERE PositionName LIKE '%Manager%';
These Positions would form the leftmost column of the PivotTable.
For the top row of the PivotTable, I would like to have each RequirementName.
The internal data would be the RequirementType field (i.e. '0' or '1', maybe 'Any' / 'All').
I've read and read and read, but I can never quite get my head around the concept of pivot tables, so this is my current attempt:
SELECT *
FROM Requirements
PIVOT (MAX(RequirementType) FOR RequirementName IN ([Requirement], [Names], [Go], [Here])) AS pivtable
WHERE [Requirement], [Names], [Go], [Here] IN (
SELECT RequirementName FROM Requirements WHERE RequirementID IN (
SELECT Requirement_ID FROM PositionRequirements WHERE Position_ID IN (
SELECT PositionID FROM Positions WHERE PositionName LIKE '%Manager%')));
Your PIVOT query is not arranged properly. Try this one:
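SELECT * FROM
(
SELECT
B.PositionName,
-- MAX does not accept bit operands, so cast the bit column to int first
CAST(C.RequirementType AS int) AS RequirementType,
C.RequirementName
FROM [Position Requirements] A
LEFT JOIN Positions B ON A.Position_ID=B.PositionID
LEFT JOIN Requirements C ON A.Requirement_ID=C.RequirementID
WHERE B.PositionName LIKE '%Manager%'
) AS src -- TABLE is a reserved keyword and cannot be used as an alias
PIVOT(MAX(RequirementType) FOR RequirementName IN ([Requirement],[Names],[Go],[Here])) AS pvt;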
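What the pivot does to the joined rows can be sketched in Python rather than T-SQL: join the link table to both lookup tables, keep the manager positions, and spread RequirementName across the columns with MAX(RequirementType) as the cell value. All sample data below is hypothetical, and `in` is case-sensitive, whereas LIKE follows the column's collation.

```python
# Hypothetical sample data shaped like the three tables in the question.
positions = {1: "Site Manager", 2: "Clerk"}                            # PositionID -> PositionName
requirements = {10: ("Building Induction", 1), 11: ("First Aid", 0)}   # RequirementID -> (name, type)
position_requirements = [(1, 10), (1, 11), (2, 10)]                    # (Position_ID, Requirement_ID)

pivot = {}
for position_id, requirement_id in position_requirements:
    position_name = positions[position_id]
    # Rough stand-in for: WHERE PositionName LIKE '%Manager%'
    if "Manager" not in position_name:
        continue
    requirement_name, requirement_type = requirements[requirement_id]
    row = pivot.setdefault(position_name, {})
    # Equivalent of MAX(RequirementType) per (position, requirement) cell.
    row[requirement_name] = max(row.get(requirement_name, requirement_type), requirement_type)

for position_name, row in pivot.items():
    print(position_name, row)
```

Each key of the outer dictionary corresponds to one output row, and each key of the inner dictionary corresponds to one pivoted column.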

Find primary key from one table in comma separated list

I've been given the task at work of creating a report based on a very poorly designed table structure.
Consider the following two tables. They contain techniques that each person likes to perform at each gym. Keep in mind that a unique person may show up on multiple rows in the PERSONNEL table:
PERSONNEL
+-----+-----+-------+--------+-----------+
| ID | PID | Name | Gym | Technique |
+-----+-----+-------+--------+-----------+
| 1 | 122 | Bob | GymA | 2,3,4 |
+-----+-----+-------+--------+-----------+
| 2 | 131 | Mary | GymA | 1,2,4 |
+-----+-----+-------+--------+-----------+
| 3 | 122 | Bob | GymB | 1,2,3 |
+-----+-----+-------+--------+-----------+
TECHNIQUES
+-----+------------+
| ID | Technique |
+-----+------------+
| 1 | Running |
+-----+------------+
| 2 | Walking |
+-----+------------+
| 3 | Hopping |
+-----+------------+
| 4 | Skipping |
+-----+------------+
What I am having trouble coming up with is a MSSQL query that will reliably give me a listing of every person in the table that is performing a certain technique.
For instance, let's say that I want a listing of every person that likes skipping. The desired results would be:
PREFERS_SKIPPING
+-----+-------+--------+
| PID | Name | Gym |
+-----+-------+--------+
| 122 | Bob | GymA |
+-----+-------+--------+
| 131 | Mary | GymA |
+-----+-------+--------+
Likewise hopping:
PREFERS_HOPPING
+-----+-------+--------+
| PID | Name | Gym |
+-----+-------+--------+
| 122 | Bob | GymA |
+-----+-------+--------+
| 122 | Bob | GymB |
+-----+-------+--------+
I can break out the strings easily in ColdFusion, but that isn't an option due to the size of the PERSONNEL table. Can anyone help?
I think this query looks cleaner:
SELECT p.*,
t.Technique as ParsedTechnique
FROM Personnel p
JOIN Techniques t
ON CHARINDEX((','+CAST(t.id as varchar(10))+','), (','+p.technique+',')) > 0
WHERE t.id ='1';
You can just change the WHERE t.id = to whatever TechniqueId you need.
Fiddle Here
Using this function:
Create FUNCTION F_SplitAsIntTable
(
@txt varchar(max)
)
RETURNS
@tab TABLE
(
ID int
)
AS
BEGIN
declare @i int
declare @s varchar(20)
Set @i = CHARINDEX(',',@txt)
While @i>1
begin
set @s = LEFT(@txt,@i-1)
insert into @tab (id) values (@s)
Set @txt=RIGHT(@txt,Len(@txt)-@i)
Set @i = CHARINDEX(',',@txt)
end
insert into @tab (id) values (@txt)
RETURN
END
You can query like this:
declare @a Table (id int,Name varchar(10),Kind Varchar(100))
insert into @a values (1,'test','1,2,3,4'),(2,'test2','1,2,3,5'),(3,'test3','3,5')
Select a.ID,Name
from @a a
cross apply F_SplitAsIntTable(a.Kind) b
where b.ID=2
One problem you have to guard against is "1" matching "10" and "11". To avoid it, make sure that every value, including the first and last, is surrounded by the separator (in this case a comma).
Here is a method using like that should work effectively (although performance will not be so great):
SELECT p.*, t.Technique as ParsedTechnique
FROM Personnel p join
Techniques t
on ','+p.technique+',' like '%,'+cast(t.id as varchar(255))+',%'
WHERE t.id = 1;
If performance is an issue, then fix your data structure and include a PersonTechniques table so you can do a proper join.
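The delimiter trick can be sketched in Python rather than T-SQL, using the PERSONNEL rows from the question. The function name `prefers` and the extra row with the list '10,11' are made up for illustration. Wrapping both the list and the id in commas means an id only matches a whole list element.

```python
# Rows from the PERSONNEL table in the question: (PID, Name, Gym, Technique list).
personnel = [
    (122, "Bob", "GymA", "2,3,4"),
    (131, "Mary", "GymA", "1,2,4"),
    (122, "Bob", "GymB", "1,2,3"),
]


def prefers(technique_id, rows):
    """Return (PID, Name, Gym) for rows whose list contains technique_id."""
    needle = f",{technique_id},"
    # Pad the stored list with commas so the first and last ids are delimited too.
    return [(pid, name, gym) for pid, name, gym, csv in rows if needle in f",{csv},"]


skipping = prefers(4, personnel)
hopping = prefers(3, personnel)

# Hypothetical row: id 1 must not match inside '10' or '11'.
no_false_match = prefers(1, [(999, "Example", "GymC", "10,11")])

print(skipping)
print(hopping)
print(no_false_match)
```

The first two results match the PREFERS_SKIPPING and PREFERS_HOPPING tables from the question.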
The first comment under the question provided the link to the answer. Here's what I ended up going with:
WHERE
p.Technique LIKE '%,29,%' --middle
OR
p.Technique LIKE '29,%' --start
OR
p.Technique LIKE '%,29' --end
OR
p.Technique = '29' --single (good point by Cheran S in comment)
At first glance I thought it wouldn't work, but the careful placement of the commas and % wildcards keeps it from matching ids like 129.

Resources