mssql search a varchar field for invalid typo - sql-server

I have a field with names in it. They can be last name, first name middle name/initial
Basically I want to find all names that aren't normal spellings so I can tell someone to fix their names in the system.
I don't want to select and find this guy
O'Leary-Smith, Timothy L.
But I would want to find this guy
<>[]}{##$%^&*()/?=+_!|";:~`1234567890
I can just keep coming up with special characters to search for but then I'm just making this huge query and having to bracket wildcards... it's like 50+ lines long just to say one thing.
Is there something (not some custom function)
that lets me say
where name not like
A-Z
a-z
,
.
'
-
possibly something that is
where name contains anything but these ascii characters

Hopefully this is a one-off fix-up. You can use a negated character class:
where patindex('%[^ A-Za-z,.''-]%', name) > 0
Although more letters than A-Z can appear in names ...

If it's just odd characters you're looking for:
WHERE name like '%[^A-Za-z]%'
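Inside the brackets, ^ acts as a NOT operator, so this matches any name containing a character other than A-Z or a-z. Note that it also flags spaces, commas, periods, apostrophes, and hyphens, so add those to the set if they are allowed.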
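A minimal sketch of the negated character class on sample data. The table variable @people and its rows are made up for illustration. The second column forces a binary collation, which is an assumption about what you may want: under many non-binary collations a range such as A-Z follows the collation's sort order, so accented letters can fall inside it.

```sql
-- Hypothetical sample data; only the first row should pass as "normal".
DECLARE @people TABLE (name nvarchar(100));
INSERT INTO @people (name) VALUES
    (N'O''Leary-Smith, Timothy L.'),
    (N'Sm1th, J@ne'),
    (N'Müller, Jörg');

SELECT name,
       -- Position of the first disallowed character (0 = none), default collation.
       PATINDEX('%[^ A-Za-z,.''-]%', name) AS bad_pos_default,
       -- Same test with code-point ranges, so only ASCII letters count as A-Z / a-z.
       PATINDEX('%[^ A-Za-z,.''-]%', name COLLATE Latin1_General_BIN2) AS bad_pos_binary
FROM @people;
```

Filtering on bad_pos_binary > 0 flags the second and third rows. Whether the default-collation column also flags 'Müller, Jörg' depends on the database collation.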

Related

Replacing Unicode characters using TRANSLATE function

A customer asked to create a custom character mapper function from specific names to ASCII in their SQL database.
Here is a simplified fragment that works (shortened for brevity):
select TRANSLATE(N'àáâãäåāąæậạả',
N'àáâãäåāąæậạả',
N'aaaaaaaaaaaa');
While analyzing the results on the customer's dataset, I noticed one more unmapped symbol, ă. So I added it to the mapper as follows:
select TRANSLATE(N'àáâãäåāąæậạảă',
N'àáâãäåāąæậạảă',
N'aaaaaaaaaaaaa');
Unexpectedly, it started failing with the message:
The second and third arguments of the TRANSLATE built-in function must contain an equal number of characters.
Obviously, TRANSLATE thinks that ă is special and consists of more than one character. Actually, even Notepad thinks the same (copy ă and try to delete it using Backspace key - something unusual will happen. Delete key works normally, though).
Then I thought - if TRANSLATE considers it a two-char symbol, let's add a two char mapping then:
select TRANSLATE(N'àáâãäåāąæậạảă',
N'àáâãäåāąæậạảă',
N'aaaaaaaaaaaaaa');
No errors this time, yay. But the input string was not processed correctly, ă was not replaced with a.
What is the correct (case-sensitive) way to replace such "double symbols"? Can it be done using TRANSLATE at all? I don't want to add a bunch of REPLACE for every such symbol I find.
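One way to check the "two characters" suspicion above is to build both forms of ă explicitly and compare their lengths. This is only a diagnostic sketch: it assumes the problem symbol is a plain 'a' followed by the combining breve (U+0306), which the question does not confirm, and the variable names are made up.

```sql
-- Build both forms from code points so the example does not depend on text encoding.
DECLARE @precomposed nvarchar(10) = NCHAR(259);         -- U+0103, a single code point
DECLARE @decomposed  nvarchar(10) = N'a' + NCHAR(774);  -- 'a' + U+0306 combining breve

SELECT LEN(@precomposed) AS precomposed_len,  -- 1
       LEN(@decomposed)  AS decomposed_len;   -- 2

-- Strip the combining mark; the binary collation makes REPLACE compare by code point.
SELECT REPLACE(@decomposed COLLATE Latin1_General_BIN2, NCHAR(774), N'') AS stripped;
```

If the data turns out to be decomposed, removing each combining mark once (one REPLACE per mark, not per letter) before calling TRANSLATE may be enough. Treat that as a suggestion to test against real data, not a confirmed fix.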

SQL Pattern matching not giving the correct output

I am trying to find a certain set of characters in a column of a table. I tried the pattern that seemed most logical to me (below), but it doesn't do the job. What I want to match is something like '["5"]': square brackets, quotation marks, any integer, quotation marks, square brackets. The output I get is empty, and I can't understand why. I would also like to update the records that do not follow this pattern so that they do. Does anyone have a solution for this?
To give you some context, here is the test table:
I want to retrieve only the last three records.
Here is what I have tried:
SELECT ToJsonTestValue
FROM Test
WHERE ToJsonTestValue LIKE '["%"]'
and
UPDATE dbo.Test
SET ToJsonTestValue = '["'+ToJsonTestValue+'"]'
WHERE ToJsonTestValue LIKE '#';
You have a couple of problems here. Firstly, the pattern contains square brackets, which need escaping. Secondly, you use %, which is a multi-character wildcard, but it appears that you want a single character. It also appears that the character can only be a digit, so you might want to be more specific. Either of these should give you the result you want:
--Using single character wildcard:
SELECT *
FROM (VALUES('["1"]'),('["["1"]"]'))V(S)
WHERE V.S LIKE '[[]"_"[\]]' ESCAPE '\';
--Specifically requiring int:
SELECT *
FROM (VALUES('["1"]'),('["["1"]"]'))V(S)
WHERE V.S LIKE '[[]"[0-9]"[\]]' ESCAPE '\';
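The answer above covers the SELECT but not the UPDATE part of the question. A possible sketch for that part follows, assuming the rows to fix hold a bare integer such as 5 or 12. The table variable and its rows are hypothetical, since the original test table is not shown.

```sql
-- Hypothetical stand-in for dbo.Test.
DECLARE @Test TABLE (ToJsonTestValue varchar(50));
INSERT INTO @Test (ToJsonTestValue) VALUES ('5'), ('12'), ('["7"]'), ('abc');

-- Wrap only values made up entirely of digits:
-- NOT LIKE '%[^0-9]%' means "contains no character outside 0-9".
UPDATE @Test
SET ToJsonTestValue = '["' + ToJsonTestValue + '"]'
WHERE ToJsonTestValue <> ''
  AND ToJsonTestValue NOT LIKE '%[^0-9]%';

SELECT ToJsonTestValue FROM @Test;
```

Rows that already follow the pattern, or that contain anything other than digits, are left untouched, so the statement can be re-run safely.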

Full text index doesn't work at single word?

I have a full text index on many columns of the customer table, one of which columns is fname.
The following query:
select * from customer where fname like 'In%' and code='1409584557891'
returns the row I need; this customer has an fname of 'In'. But if I add this to the end:
and contains((customer.fname) , N'"In*"')
an empty result set is returned. Why?
Also, there is another column named lname. If I add the equivalent contains predicate with the column and its value altered, it works!
There is a good chance "In" is a noise word. I also believe that if you do a full-text search for something too short, like the letter 'a', it is simply considered a noise word. See if 'a' or 'I' gives you anything.
Here is a link that can provide information on changing the noise words around if that is the case.
https://www.mssqltips.com/sqlservertip/1491/sql-server-full-text-search-noise-words-and-thesaurus-configurations/
You may also be able to simply turn off noise or 'stop' words:
https://dba.stackexchange.com/questions/135062/sql-server-no-search-results-caused-by-noise-words
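A short sketch of how the noise-word theory could be checked and worked around. It assumes the index uses the English system stoplist (language_id 1033), which the question does not state.

```sql
-- Is 'in' on the system stoplist for English? (1033 is the English LCID; adjust as needed.)
SELECT stopword, language_id
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
  AND stopword = N'in';

-- If it is, one option is to turn the stoplist off for the full-text index
-- on the customer table from the question.
ALTER FULLTEXT INDEX ON customer SET STOPLIST = OFF;
```

Changing the stoplist can trigger a repopulation of the index and makes it larger, so it is worth trying on a test copy first.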

Sql Server - Encoding issue, replace strange characters

After importing some data into a SQL Server 2014 database, I realized that in some fields German characters (ü, ß, ä, ö, etc.) had been replaced with strange characters. For example:
München should be München
ChiemgaustraÃe should be Chiemgaustraße
Königstr should be Königstr
I would like to replace these characters with the right German letter. Ex.
ü -> ü
à - > ß
ö -> ö
However, when I run queries like the following to try to identify which rows have these characters, the queries return 0 rows.
select address
from Directory
where street like N'%ChiemgaustraÃe 50%'
select address
from Directory
where street like N'%ü%'
Is there a query I can run to identify and replace these characters?
I must clarify that most of the data was imported correctly, in fact I believe the strange characters were already part of the original data.
Also, I think I can export the data to a text file, replace the characters and re-import, but I was wondering if there is a way to do it directly in sql.
Thanks in advance for the help.
I couldn't get it fixed using only SQL.
FutbolFa's suggestion worked for the most part, but there were a couple of symbols, in particular "Ã", that weren't picked up by any query I tried. I ended up exporting the data to a text file and replacing the symbols there. Then I just re-imported the data.
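For reference, a sketch of what an in-database repair could look like for the two-character sequences. It assumes the data is UTF-8 that was mis-read one byte per character, and it uses a binary collation so REPLACE compares by code point. The table variable and its rows are illustrative only.

```sql
-- Hypothetical rows built from code points: 195 = 'Ã', 188 = '¼', 182 = '¶'.
DECLARE @Directory TABLE (street nvarchar(100));
INSERT INTO @Directory (street) VALUES
    (N'M' + NCHAR(195) + NCHAR(188) + N'nchen'),   -- mis-decoded 'München'
    (N'K' + NCHAR(195) + NCHAR(182) + N'nigstr');  -- mis-decoded 'Königstr'

SELECT street AS original,
       REPLACE(
           REPLACE(street COLLATE Latin1_General_BIN2,
                   NCHAR(195) + NCHAR(188), NCHAR(252)),  -- 'Ã¼' -> 'ü'
           NCHAR(195) + NCHAR(182), NCHAR(246))           -- 'Ã¶' -> 'ö'
       AS repaired
FROM @Directory;
```

The second byte of a mis-decoded ß may have ended up as a non-printing character, which could be why "Ã" on its own was hard to match. Inspecting the stored values with UNICODE(SUBSTRING(street, n, 1)) shows which code points are actually there.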

Guidance for MS SQL Delete query

In my SQL Server database, a table has a primary key whose values come in two formats, like '0000100001' and 'C100001'.
I want to delete all records whose key starts with '0', but not the records whose key starts with 'C'.
I tried the built-in function SUBSTRING('primary_key',1,1)='0', but it did not help.
Thank You..
SUBSTRING('primary_key',1,1)='0'
tests whether the string literal 'primary_key' starts with the character 0 (it doesn't, so zero rows are returned). Get rid of the single quotes to reference the column. (NB: If your column is not actually called primary_key, you will need to reference its actual name, of course!)
Or alternatively you can use WHERE primary_key LIKE '0%' which can use the index to locate the rows so is more efficient.
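A minimal sketch of the LIKE-based delete, using a made-up table variable in place of the real table:

```sql
-- Hypothetical stand-in for the real table.
DECLARE @t TABLE (primary_key varchar(20) PRIMARY KEY);
INSERT INTO @t (primary_key) VALUES ('0000100001'), ('0000100002'), ('C100001');

-- Preview the rows that would be removed before deleting anything.
SELECT primary_key FROM @t WHERE primary_key LIKE '0%';

-- Delete keys starting with '0'; keys starting with 'C' cannot match this pattern.
DELETE FROM @t WHERE primary_key LIKE '0%';

SELECT primary_key FROM @t;  -- only 'C100001' remains
```

Running the SELECT with the same WHERE clause first is a cheap way to confirm that the pattern hits only the intended rows.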
I don't know MS SQL, but in MySQL it would be something like this:
DELETE FROM your_table WHERE primary_key LIKE '0%' AND primary_key NOT LIKE 'C%'
You can use the LIKE operator to search for occurrences of a string or a wildcard pattern (LIKE patterns are not full regular expressions). The % wildcard can go in front of, behind, or on both sides of the text you are looking for.
For example:
LIKE 'C%' would match anything starting with C
LIKE '%C' would match anything ending in C
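LIKE '[A-Z]%' would match anything starting with a letter from A to Z (capital letters only under a binary collation; under a case-insensitive collation lowercase letters match too)
LIKE '%LOL%' would match anything that contains LOL (in caps only if the collation is case-sensitive).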
Further reading at
http://msdn.microsoft.com/en-us/library/ms179859.aspx
