tsql LIKE wildcard comparison finds what = does not - sql-server

This:
select casenumber from [case] where casenumber = '21-CR-1019'
Returns:
Nothing
While this:
select casenumber from [case] where casenumber like '%21-CR-1019%'
Returns:
21-CR-1019
The table column definition is varchar(20). I don't think that's a factor.
I checked the data source and didn't find a padded string. Nor does there appear to be a padded string in the value, since the length returned matches the actual string length:
This:
select casenumber, len(casenumber) as StringLength from [case] where casenumber like '%21-CR-1019%'
Returns:
casenumber StringLength
2021-CR-1019 12
This is SQL Server Standard Edition, 10.50.6560.0 (SQL Server 2008R2).
Does anyone know if there's an explanation for this behavior? I've never, ever seen it.
Does the mix of characters and integers somehow confuse the ability of tsql to infer a data type? Is this a "feature"?

Good Morning 504more!
Yeah, sometimes when it comes to years, we think 2021 but accidentally type only 21.
So in your first query, it's looking for exactly 21-CR-1019, so it doesn't find it.
In your second one, you have the wildcards, and it finds a match because 2021-CR-1019 contains 21-CR-1019.
Once you said that the length came back as 12, I knew that your first search was for only 10 characters.
I'm glad I could help you out with that :)
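For anyone who wants to see the two comparisons side by side, here is a minimal sketch. The table variable @cases and its single row are made up for illustration; only the case number values come from the question.

```sql
-- Hypothetical stand-in for the real table: one row holding the stored value.
DECLARE @cases TABLE (casenumber varchar(20));
INSERT INTO @cases (casenumber) VALUES ('2021-CR-1019');

-- Equality compares the whole value, so the 10-character literal matches nothing.
SELECT casenumber
FROM @cases
WHERE casenumber = '21-CR-1019';

-- The leading % allows any prefix (here '20'), so the row is returned.
SELECT casenumber, LEN(casenumber) AS StringLength
FROM @cases
WHERE casenumber LIKE '%21-CR-1019%';

-- Comparing lengths exposes the mismatch: 10 for the literal, 12 for the stored value.
SELECT LEN('21-CR-1019') AS SearchLength, LEN('2021-CR-1019') AS StoredLength;
```

Checking LEN of the search literal against LEN of the stored value is a quick way to spot this kind of mismatch before suspecting padding or data types.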

Related

Word popularity leaderboard in SQL Server based message-board

In a SQL server database, I have a table Messages with the following columns:
Id INT IDENTITY(1,1)
Detail VARCHAR(5000)
DatetimeEntered DATETIME
PersonEntered VARCHAR(25)
Messages are pretty basic, and only allow alphanumeric characters and a handful of special characters, which are as follows:
`¬!"£$%^&*()-_=+[{]};:'@#~\|,<.>/?
Ignoring the bulk of the special characters bar the apostrophe, what I need is a way to list each word along with how many times the word occurs in the Detail column, which I can then filter by PersonEntered and DatetimeEntered.
Example output:
Word Frequency
-----------------
a 11280
the 10102
and 8845
when 2024
don't 2013
.
.
.
It doesn't need to be particularly clever. It is perfectly fine if dont and don't are treated as separate words.
I'm having trouble splitting out the words into a temporary table called #Words.
Once I have a temporary table, I would apply the following query:
SELECT
Word,
COUNT(*) AS WordCount
FROM #Words
GROUP BY Word
ORDER BY COUNT(*) DESC
Please help.
Personally, I would strip out almost all the special characters, and then use a splitter on the space character. Of your permitted characters, only ' is going to appear in a word; anything else is going to be grammatical.
You haven't posted which version of SQL Server you're using, so I'm going to use SQL Server 2017 syntax. If you don't have the latest version, you'll need to replace TRANSLATE with a nested REPLACE (so REPLACE(REPLACE(REPLACE(REPLACE(... REPLACE(M.Detail, '¬',' '),...),'/',' '),'?',' ')) and find a string splitter (for example, Jeff Moden's DelimitedSplit8K).
USE Sandbox;
GO
CREATE TABLE [Messages] (Detail varchar(5000));
INSERT INTO [Messages]
VALUES ('Personally, I would strip out almost all the special characters, and then use a splitter on the space character. Of your permitted characters, only `''` is going to appear in a word; anything else is going to be grammatical. You haven''t posted what version of SQL you''re using, so I''ve going to use SQL Server 2017 syntax. If you don''t have the latest version, you''ll need to replace `TRANSLATE` with a nested `REPLACE` (So `REPLACE(REPLACE(REPLACE(REPLACE(... REPLACE(M.Detail, ''¬'','' ''),...),''/'','' ''),''?'','' '')`, and find a string splitter (for example, Jeff Moden''s [DelimitedSplit8K](http://www.sqlservercentral.com/articles/Tally+Table/72993/)).'),
('As a note, this is going to perform **AWFULLY**. SQL Server is not designed for this type of work. I also imagine you''ll get some odd results and it''ll include numbers in there. Things like dates are going to get split out,, numbers like `9,000,000` would be treated as the words `9` and `000`, and hyperlinks will be separated.')
GO
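DECLARE @Strip varchar(50) = '`¬!"£$%^&*()-_=+[{]};:@#~\|,<.>/?';
-- TRANSLATE requires its 2nd and 3rd arguments to be the same length,
-- so build one space per character being stripped.
WITH Replacements AS(
SELECT TRANSLATE(Detail, @Strip, REPLICATE(' ', LEN(@Strip))) AS StrippedDetail
FROM [Messages] M)
SELECT SS.[value], COUNT(*) AS WordCount
FROM Replacements R
CROSS APPLY string_split(R.StrippedDetail,' ') SS
WHERE LEN(SS.[value]) > 0
GROUP BY SS.[value]
ORDER BY WordCount DESC;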
GO
DROP TABLE [Messages];
As a note, this is going to perform AWFULLY. SQL Server is not designed for this type of work. I also imagine you'll get some odd results and it'll include numbers in there. Things like dates are going to get split out, numbers like 9,000,000 would be treated as the words 9 and 000, and hyperlinks will be separated.
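To tie this back to the filters the question asks for, here is a rough sketch of the same approach with PersonEntered and DatetimeEntered conditions added. It assumes SQL Server 2017 or later (for TRANSLATE and STRING_SPLIT). The @Messages table variable, its sample rows, the name 'alice', and the date range are all invented for illustration.

```sql
-- Hypothetical sample data shaped like the Messages table in the question.
DECLARE @Messages TABLE (
    Detail          varchar(5000),
    DatetimeEntered datetime,
    PersonEntered   varchar(25)
);
INSERT INTO @Messages (Detail, DatetimeEntered, PersonEntered) VALUES
    ('The cat sat. The dog didn''t!', '20180105', 'alice'),
    ('A cat, the end?',               '20180110', 'alice'),
    ('The other person',              '20180107', 'bob');

-- Characters to blank out; the apostrophe is deliberately left alone.
DECLARE @Strip varchar(50) = '`¬!"£$%^&*()-_=+[{]};:@#~\|,<.>/?';

SELECT SS.[value] AS Word, COUNT(*) AS Frequency
FROM @Messages M
-- Replace every stripped character with a space, then split on spaces.
CROSS APPLY STRING_SPLIT(
    TRANSLATE(M.Detail, @Strip, REPLICATE(' ', LEN(@Strip))), ' ') SS
WHERE M.PersonEntered = 'alice'            -- assumed filter value
  AND M.DatetimeEntered >= '20180101'      -- assumed date range, half-open
  AND M.DatetimeEntered <  '20180201'
  AND SS.[value] <> ''                     -- drop empty tokens from repeated spaces
GROUP BY SS.[value]
ORDER BY Frequency DESC, Word;
```

Filtering on the message columns before the split keeps the splitter from running over rows that would be thrown away anyway. Under a case-insensitive collation, The and the are counted as the same word.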

How to take apart information between hyphens in SQL Server

How would I take apart a column that contains string:
92873-987dsfkj80-2002-04-11
20392-208kj48384-2008-01-04
Data would look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-2002-04-11
Bibiboo No 20392-208kj48384-2008-01-04
Want it to look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-20020411
Bibiboo No 20392-208kj48384-20080104
whereby I would like to concat the dates in the end as 20020411 and 20080104. From the right side, the information is the same always. From the left it is not, otherwise I could have concatenated it. It is not an import issue.
As mentioned in the comments already, storing data like this is a bad idea. However, you can obtain the dates from those strings by using a RIGHT function like so:
SELECT RIGHT('20392-208kj48384-2008-01-04', 10)
Output:
2008-01-04
Depending on the SQL Server version you are using, you can use STRING_SPLIT, which requires COMPATIBILITY_LEVEL 130. You can also build your own user-defined function to split the contents of a field and manipulate it as you need; you can find some useful examples of split functions in this thread:
Split function equivalent in T-SQL?
Assuming I'm correct and the date part is always on the right side of the string, you can simply use RIGHT and CAST to get the date (assuming, again, that the date is represented as yyyy-mm-dd):
SELECT CAST(RIGHT(YourColumn, 10) As Date)
FROM YourTable
However, Panagiotis is correct in his comment - You shouldn't store data like that. Each column in the database should hold only a single point of data, be it string, number or date.
Update following your comment and the updated question:
SELECT LEFT(YourColumn, LEN(YourColumn) - 10) + REPLACE(RIGHT(YourColumn, 10), '-', '')
FROM YourTable
will return the desired results.
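Here is a small sketch that runs the expressions above against the two sample rows from the question. The table variable @Files is a made-up stand-in for the real table, and it relies on the assumption stated above that the last 10 characters are always a yyyy-mm-dd date.

```sql
-- Hypothetical stand-in table; Key is a reserved word, so it is bracketed.
DECLARE @Files TABLE ([Filename] varchar(50), [Key] varchar(50));
INSERT INTO @Files ([Filename], [Key]) VALUES
    ('Abidabo', '92873-987dsfkj80-2002-04-11'),
    ('Bibiboo', '20392-208kj48384-2008-01-04');

SELECT [Filename],
       -- Everything except the last 10 characters, plus the date with hyphens removed.
       LEFT([Key], LEN([Key]) - 10) + REPLACE(RIGHT([Key], 10), '-', '') AS CompactKey,
       -- The same trailing 10 characters read as a real date value.
       CAST(RIGHT([Key], 10) AS date) AS KeyDate
FROM @Files;
-- CompactKey: 92873-987dsfkj80-20020411 and 20392-208kj48384-20080104
```

Working from the right-hand side sidesteps the variable number of hyphens in the left part of the key.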

SQL would using between statement improve this?

I want to find out using a select statement what columns in a table share similar information.
Example: Classes table with ClassID, ClassName, ClassCode, ClassDescription columns.
This was part of my SQL class that I already turned in. The question asked "What classes are part of the English department?"
I used this Select statement:
SELECT *
FROM Classes
WHERE ClassName LIKE 'English%' OR ClassCode LIKE 'ENG%'
Granted we have only input one actual English course in this database, the end result was it executed fine and displayed everything for just the English class. Which I thought was a success since we did populate other non English courses in the database.
Anyways, I was told I should have used a BETWEEN statement.
I am just sitting here thinking they would both do what I needed them to do right?
I'm using SQL Server 2014
No, BETWEEN would probably be a bad idea here. BETWEEN doesn't allow wildcards and doesn't do any pattern matching in any RDBMS I've used. So you'd have to say BETWEEN 'ENG' AND 'English'. Except that doesn't return things like 'English I' (which would be after 'English' in a sorted list).
It would also potentially include something like 'Engineering' or 'Engaging Artistry', but that's a weakness of your existing query, too, since LIKE 'ENG%' matches those.
If you happen to be using a case-sensitive collation you add a whole new dimension of complexity. Your BETWEEN statement gets even more confusing. Just know that capital letters generally come before lower case letters, so 'ENGRAVING I' would be included but 'Engraving I' would not. Additionally, 'eng' would not be included. Note that case-insensitive collation is the default.
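A quick sketch of the difference, for anyone who wants to try it. The table variable and the four class names are invented, and the column is pinned to a case-insensitive collation (Latin1_General_CI_AS) so the result does not depend on server defaults.

```sql
-- Hypothetical sample classes; the collation is set explicitly as an assumption.
DECLARE @Classes TABLE (ClassName varchar(50) COLLATE Latin1_General_CI_AS);
INSERT INTO @Classes (ClassName) VALUES
    ('English'), ('English I'), ('Engineering'), ('Biology');

-- BETWEEN is a sort-order range test, not a pattern match.
-- Returns English and Engineering, but misses English I.
SELECT ClassName
FROM @Classes
WHERE ClassName BETWEEN 'ENG' AND 'English';

-- LIKE with a trailing wildcard is a prefix match.
-- Returns English and English I.
SELECT ClassName
FROM @Classes
WHERE ClassName LIKE 'English%';
```

The BETWEEN version both drops a row you want and picks up one you don't, which is the core of the argument above.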
Also, what's the difference when searching for null values in one table and one column:
column_name =''
or
column_name IS NULL
You're not understanding the difference between an empty string and null.
An empty string is explicit. It says "This field has a known value and it is a string of zero length."
A null string is imprecise. It means "unknown". It could mean "This value wasn't asked for," or "This value was not available," or "This value has not yet been determined," or "This value does not make sense for this record."
"What is this person's middle name?"
"He doesn't have one. See, his birth certificate has no middle name listed." --> Empty string
"I don't know. He never told me and I don't have any birth or identity record." --> NULL
Note that Oracle, due to backwards compatibility, treats empty strings as NULLs. This is explicitly against ANSI SQL, but since Oracle is that old and that's how it's always worked that's how it will continue to work.
Another way to look at it is the example I tend to use with numbers. The difference between 0 and NULL is the difference between having a bank account with $0 balance and not having a bank account at all.
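In query terms, the two predicates pick out different rows. A minimal sketch, using an invented @People table and assuming the default ANSI_NULLS ON setting:

```sql
-- Hypothetical data: one empty middle name, one unknown, one present.
DECLARE @People TABLE (Name varchar(20), MiddleName varchar(20) NULL);
INSERT INTO @People (Name, MiddleName) VALUES
    ('Ann', ''),      -- known to have no middle name
    ('Bob', NULL),    -- middle name unknown
    ('Cy',  'Lee');

-- Matches only the empty string: returns Ann.
SELECT Name FROM @People WHERE MiddleName = '';

-- Matches only NULL: returns Bob.
SELECT Name FROM @People WHERE MiddleName IS NULL;

-- Comparing to NULL with = evaluates to unknown, so no rows come back.
SELECT Name FROM @People WHERE MiddleName = NULL;
```

If you need both cases at once, combine the two predicates with OR rather than expecting either one to cover the other.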
Nothing definite can be said unless we see the table and its data, though don't use BETWEEN.
Secondly, first find which of the columns is NOT NULL by design. Say, for example, ClassName cannot be null; then there is no use in ClassCode LIKE 'ENG%', and ClassName LIKE 'English%' alone is enough. The same holds the other way round.
Thirdly, you should use the same parameter for both columns, for example:
ClassName LIKE 'English%' OR ClassCode LIKE 'English%'
See the difference:
Select * FROM Classes
Where ClassName LIKE '%English%'

SQL LIKE Operator doesn't work with Asian Languages (SQL Server 2008)

Dear Friends,
I've run into a problem I never expected. It seems simple, but I can't find a solution to it.
I have a SQL Server database column of type NVarchar that is filled with standard Persian characters. When I run a very simple query on it that uses the LIKE operator, the result set is empty, although I know the search term is present in the table. Here is a very simple example query that doesn't behave correctly:
SELECT * FROM T_Contacts WHERE C_ContactName LIKE '%ف%'
ف is a Persian character, and the ContactName column contains multiple entries which contain that character.
Please tell me how I should rewrite the expression or what change I should apply. Note that my database's collation is SQL_Latin1_General_CP1_CI_AS.
Thank you very much
Also, if those values are stored as NVARCHAR (which I hope they are!!), you should always use the N'..' prefix for any string literals to make sure you don't get any unwanted conversions back to non-Unicode VARCHAR.
So you should be searching:
SELECT * FROM T_Contacts
WHERE C_ContactName COLLATE Persian_100_CI_AS LIKE N'%ف%'
Shouldn't it be:
SELECT * FROM T_Contacts WHERE C_ContactName LIKE N'%ف%'
i.e., with the N in front of the comparison string, so it's treated as an nvarchar?
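A small sketch of what the N prefix changes. The @Contacts table variable and its two rows are invented; the behaviour of the first query assumes a database whose default code page cannot represent Persian characters, as with the SQL_Latin1_General_CP1_CI_AS collation mentioned in the question.

```sql
-- Hypothetical contacts; the N prefix keeps the inserted values as Unicode.
DECLARE @Contacts TABLE (C_ContactName nvarchar(50));
INSERT INTO @Contacts (C_ContactName) VALUES (N'فرهاد'), (N'Sara');

-- Without N the literal is varchar. On a Latin1 code page the Persian
-- character cannot be represented and is converted to '?', so the pattern
-- no longer contains the character being searched for.
SELECT C_ContactName FROM @Contacts WHERE C_ContactName LIKE '%ف%';

-- With N the literal stays nvarchar and the character survives intact,
-- so the row containing ف is found.
SELECT C_ContactName FROM @Contacts WHERE C_ContactName LIKE N'%ف%';
```

The same prefix is needed on INSERT and UPDATE literals, otherwise the data is already damaged before any search runs.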

Function to find the Exact match in Microsoft SQL Server

What is the way to find the exactly matching substring in the given string in Microsoft SQL server?
For example, in the string '0000020354', I want to find '20354'. Of course it has to be an exact match. I tried to use CHARINDEX(@providerId, external_prv_id) > -1, but the problem with CHARINDEX is that it gives me the index as soon as it finds the first match.
Basically I am looking for function like indexOf("") in Microsoft SQL SERVER.
Assuming @ProviderId is a VARCHAR
You could just use LIKE:
SELECT Id FROM TableName WHERE [Column] LIKE '%' + @ProviderId + '%'
Which will return rows where Column contains 2034.
And if you don't want to use LIKE, you can use PATINDEX:
SELECT Id FROM TableName WHERE PATINDEX('%' + @ProviderId + '%', [Column]) > 0
Which returns the starting position of any match that it finds.
What's the data you're storing? It sounds like another storage type (e.g. a separate table) might be more suitable.
Ahh, 2034 was a typo. What I don't understand from your question is that you say you need the exact match. If CHARINDEX returns non-zero for '20354' you know that it's matched '20354'. If you don't know what @providerId is, return that in your query along with the result of CHARINDEX. Similarly, if you want external_prv_id, include that, e.g.:
SELECT external_prv_id, CHARINDEX(@providerId, external_prv_id)
FROM YourTable -- placeholder table name
WHERE CHARINDEX(@providerId, external_prv_id) > 0
(Note that CHARINDEX returning 0 means it was not found.)
If you actually mean that '20354' could include wildcards, you need PATINDEX.
The LIKE %VAL% stuff will be overly broad, e.g. the database contains 00000012345 and you search for 1234 you'll pull this row, which is what the OP does not intend (if I'm understanding the "EXACT" part correctly).
What you want is a regular expression that does something like: any number of zeroes followed by the match and end of line.
From this question we know how to trim leading zeroes:
Better techniques for trimming leading zeros in SQL Server?
SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col))
So, combine that with your query, and you can do something like the following:
WHERE SUBSTRING(external_prv_id, PATINDEX('%[^0]%', external_prv_id+'.'), LEN(external_prv_id)) = '12345'
Of course, the better (best?) solution would be to store them as INTEGERS so you get full indexability and don't have to muck with all of this crap. If you REALLY need to store the exact string then you have a couple of options:
store the normalized integer results in another column and use that for all internal queries
always store an integer but then pad with zeros upon query (my vote)
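To make the "overly broad" point concrete, here is a sketch comparing the LIKE approach with the trim-leading-zeros comparison from above. The @Providers table variable and its three values are invented for illustration; external_prv_id and @providerId are the names from the question.

```sql
DECLARE @providerId varchar(20) = '20354';

-- Hypothetical sample ids: only the first is 20354 padded with zeros.
DECLARE @Providers TABLE (external_prv_id varchar(20));
INSERT INTO @Providers (external_prv_id) VALUES
    ('0000020354'), ('0000120354'), ('0000002035');

-- Substring match: returns 0000020354 and also 0000120354.
SELECT external_prv_id
FROM @Providers
WHERE external_prv_id LIKE '%' + @providerId + '%';

-- Strip leading zeros, then compare for equality: returns only 0000020354.
-- The appended '.' guarantees PATINDEX finds a non-zero character
-- even when the id consists entirely of zeros.
SELECT external_prv_id
FROM @Providers
WHERE SUBSTRING(external_prv_id,
                PATINDEX('%[^0]%', external_prv_id + '.'),
                LEN(external_prv_id)) = @providerId;
```

Neither query can use an index seek on external_prv_id, which is part of why storing the normalized value separately is the better long-term option.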
