Remove guids from ntext column - sql-server

I'm trying to update a large amount of data in a customer database, but I've run into a problem.
The column (ntext) I need to update contains a mix of regular text/comments and GUIDs.
I only need to update the cells that do NOT contain a GUID.
Searching for ways to determine whether some text is a uniqueidentifier/GUID in SQL turned up several solutions, such as some regex-like patterns, but for some reason they did not remove all GUID entries from the SELECT results. (I tried some of the solutions from here: How to check if a string is a uniqueidentifier?)
Can someone tell me how to filter out all kinds of GUID-like entries in the ntext column?
Any help would be much appreciated.
EDIT:
Example of a GUID that is filtered out correctly:
4cfb5539-1656-4447-87f7-ea7c4ea94e96
Example of a GUID that is still in the list:
f5f284a0-c1c5-4c71-95b6-1eaa3ed38222
They're the same length, and I don't see any hidden characters or spaces (I tried trimming, with no difference).
EDIT 2:
The SQL statement I used:
SELECT * from TABLE
where VALUE like
REPLACE(REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]'),' ','')
EDIT 3:
Another statement, which removes all spaces as a first step:
SELECT * from TABLE
where REPLACE(Convert(nvarchar(max),VALUE), ' ', '') not like
REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]')

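WHERE column like '%-%-%-%-%'
This filter finds every row whose "column" value contains at least four hyphens. That includes every GUID, but also any ordinary text with four or more hyphens, so treat it as a rough first pass.
For a stricter check you could use PATINDEX with the 8-4-4-4-12 [0-9a-fA-F] pattern wrapped in %...%; it returns the position of the first GUID-shaped substring, or 0 if there is none.
If the strict pattern still misses a GUID such as f5f284a0-c1c5-4c71-95b6-1eaa3ed38222, it may be worth checking the column's collation: under some collations (for example Danish_Norwegian_CI_AS) the letter pair "aa" can be compared as a single character, and that GUID contains "aa" while the one that works does not.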
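To compare the loose and strict filters outside SQL Server, here is a minimal Python sketch (not T-SQL) that models the '%-%-%-%-%' test, the whole-value 8-4-4-4-12 pattern from the question, and a PATINDEX-style "contains" test. The helper names and sample comment rows are hypothetical, and Python's regex engine does not reproduce SQL Server collation rules, so it cannot show the "aa" effect.

```python
import re

# 8-4-4-4-12 hexadecimal digits: the shape encoded by
# REPLACE('00000000-0000-0000-0000-000000000000', '0', '[0-9a-fA-F]').
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def is_guid(value: str) -> bool:
    """Whole (trimmed) value is GUID-shaped, like the LIKE pattern without %."""
    return GUID_RE.fullmatch(value.strip()) is not None

def contains_guid(value: str) -> bool:
    """A GUID-shaped substring appears anywhere, like PATINDEX(...) > 0."""
    return GUID_RE.search(value) is not None

def loose_match(value: str) -> bool:
    """Model of LIKE '%-%-%-%-%': at least four hyphens anywhere."""
    return value.count("-") >= 4

# Hypothetical sample cells: the two GUIDs from the question plus two comments.
rows = [
    "4cfb5539-1656-4447-87f7-ea7c4ea94e96",
    "f5f284a0-c1c5-4c71-95b6-1eaa3ed38222",
    "call back - order 42 - urgent - Mon - Tue",
    "plain comment",
]

# Cells to update: those that are not a GUID.
to_update = [cell for cell in rows if not is_guid(cell)]
print(to_update)
```

Both example GUIDs pass the strict test here, while the third row shows how the loose hyphen filter would wrongly treat a comment as a GUID.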

Related

How to match a substring exactly in a string in SQL server?

I have a column workId in my table which has values like:
W1/2009/12345, G2/2018/2345
Now a user wants to get this particular id, G2/2018/2345. I am using the LIKE operator in my query as below:
select * from u_table as s where s.workId like '%2345%'
It gives me both of the workIds mentioned above. I tried the following query:
select * from u_table as s where s.workId like '%2345%' and s.workId not like '_2345'
This query also gives me the same result.
Could anyone please provide the correct query? Thanks!
Why not use the existing delimiters to match with your criteria?
select *
from u_table
where concat('/', workId, '/') like concat('%/', '2345', '/%');
Ideally, of course, your 3 separate values would be 3 separate columns; delimiting multiple values in a single column goes against first normal form and prevents the optimizer from performing an efficient index seek, forcing a scan of all rows every time, which hurts performance and concurrency.
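The delimiter trick can be checked with a small runnable sketch. This one uses Python's built-in sqlite3 module rather than SQL Server, so it concatenates with || instead of CONCAT(); the table and values are the ones from the question, and find_work_ids is a hypothetical helper name.

```python
import sqlite3

def find_work_ids(conn, segment):
    # Wrap both the stored value and the search term in '/', so the term only
    # matches a complete '/'-delimited segment, never part of one.
    sql = ("SELECT workId FROM u_table "
           "WHERE '/' || workId || '/' LIKE '%/' || ? || '/%' "
           "ORDER BY workId")
    return [row[0] for row in conn.execute(sql, (segment,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u_table (workId TEXT)")
conn.executemany("INSERT INTO u_table VALUES (?)",
                 [("W1/2009/12345",), ("G2/2018/2345",)])

print(find_work_ids(conn, "2345"))   # ['G2/2018/2345']
print(find_work_ids(conn, "12345"))  # ['W1/2009/12345']
```

Passing the segment as a bound parameter keeps the query safe from quoting problems, although a % or _ inside the segment would still act as a wildcard.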

Word popularity leaderboard in SQL Server based message-board

In a SQL Server database, I have a table Messages with the following columns:
Id INT IDENTITY(1,1)
Detail VARCHAR(5000)
DatetimeEntered DATETIME
PersonEntered VARCHAR(25)
Messages are pretty basic, and only allow alphanumeric characters and a handful of special characters, which are as follows:
`¬!"£$%^&*()-_=+[{]};:'@#~\|,<.>/?
Ignoring the bulk of the special characters bar the apostrophe, what I need is a way to list each word along with how many times the word occurs in the Detail column, which I can then filter by PersonEntered and DatetimeEntered.
Example output:
Word     Frequency
------------------
a        11280
the      10102
and      8845
when     2024
don't    2013
...
It doesn't need to be particularly clever. It is perfectly fine if dont and don't are treated as separate words.
I'm having trouble splitting out the words into a temporary table called #Words.
Once I have a temporary table, I would apply the following query:
SELECT
Word,
COUNT(*) AS WordCount -- count rows per word; SUM() cannot add up text values
FROM #Words
GROUP BY Word
ORDER BY COUNT(*) DESC
Please help.
Personally, I would strip out almost all the special characters, and then use a splitter on the space character. Of your permitted characters, only ' is going to appear in a word; anything else is going to be grammatical.
You haven't posted which version of SQL Server you're using, so I'm going to use SQL Server 2017 syntax (STRING_SPLIT is available from SQL Server 2016, TRANSLATE from 2017). If you're on an earlier version, you'll need to replace TRANSLATE with a nested REPLACE (so REPLACE(REPLACE(REPLACE(REPLACE(... REPLACE(M.Detail, '¬',' '),...),'/',' '),'?',' '), and find a string splitter (for example, Jeff Moden's DelimitedSplit8K).
USE Sandbox;
GO
CREATE TABLE [Messages] (Detail varchar(5000));
INSERT INTO [Messages]
VALUES ('Personally, I would strip out almost all the special characters, and then use a splitter on the space character. Of your permitted characters, only `''` is going to appear in a word; anything else is going to be grammatical. You haven''t posted what version of SQL you''re using, so I''ve going to use SQL Server 2017 syntax. If you don''t have the latest version, you''ll need to replace `TRANSLATE` with a nested `REPLACE` (So `REPLACE(REPLACE(REPLACE(REPLACE(... REPLACE(M.Detail, ''¬'','' ''),...),''/'','' ''),''?'','' '')`, and find a string splitter (for example, Jeff Moden''s [DelimitedSplit8K](http://www.sqlservercentral.com/articles/Tally+Table/72993/)).'),
('As a note, this is going to perform **AWFULLY**. SQL Server is not designed for this type of work. I also imagine you''ll get some odd results and it''ll include numbers in there. Things like dates are going to get split out,, numbers like `9,000,000` would be treated as the words `9` and `000`, and hyperlinks will be separated.')
GO
WITH Replacements AS(
SELECT TRANSLATE(Detail, '`¬!"£$%^&*()-_=+[{]};:@#~\|,<.>/?', REPLICATE(' ', 33)) AS StrippedDetail -- TRANSLATE needs the 2nd and 3rd arguments to be the same length (33 characters)
FROM [Messages] M)
SELECT SS.[value], COUNT(*) AS WordCount
FROM Replacements R
CROSS APPLY string_split(R.StrippedDetail,' ') SS
WHERE LEN(SS.[value]) > 0
GROUP BY SS.[value]
ORDER BY WordCount DESC;
GO
DROP TABLE [Messages];
As a note, this is going to perform AWFULLY. SQL Server is not designed for this type of work. I also imagine you'll get some odd results and it'll include numbers in there. Things like dates are going to get split out, numbers like 9,000,000 would be treated as the words 9 and 000, and hyperlinks will be separated.
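For comparison, the same translate-then-split-then-count idea fits in a few lines of Python, where this kind of text processing is more natural than in SQL Server. This is a minimal sketch, not the answer's T-SQL: the separator set assumes the question's list minus the apostrophe (with @# read for the garbled ##), the sample messages are made up, and the lower() call assumes you want the case-insensitive grouping a typical _CI_ collation would give.

```python
from collections import Counter

# Separator characters: the question's list minus the apostrophe, so that
# words such as "don't" survive intact.
SEPARATORS = '`¬!"£$%^&*()-_=+[{]};:@#~\\|,<.>/?'
# Equivalent of TRANSLATE(Detail, SEPARATORS, <one space per character>).
TO_SPACES = str.maketrans(SEPARATORS, " " * len(SEPARATORS))

def word_counts(details):
    """Count words across all messages, case-insensitively."""
    counts = Counter()
    for detail in details:
        # split() with no argument skips the empty strings that STRING_SPLIT
        # returns for runs of spaces (the LEN(...) > 0 filter in the T-SQL).
        counts.update(detail.translate(TO_SPACES).lower().split())
    return counts

messages = ["Don't panic, don't!", "The cat and the hat."]
counts = word_counts(messages)
print(counts.most_common(2))
```

Here both "don't" and "the" come out with a count of 2; filtering by PersonEntered or DatetimeEntered would simply mean choosing which messages to pass in.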

SQLite query in populated table not returning anything?

I have created a database called AllWords.db in SQLite that contains a list of all English words (count: 172820). When I issue a select-all query, it returns all 172820 words. Also, when I print the count of the table words like this:
SELECT COUNT(*) FROM words;
the output is 172820, so the database clearly has all the words in it. However, when I try to check whether a word exists (the only thing I'll want to do with this database), it doesn't print anything:
SELECT * FROM words WHERE word="stuff";
returns nothing.
The database is a single table with the only column being 'words', which has all the words as rows. Any help would be greatly appreciated, thanks.
Just to be sure you're searching for a word that is actually in your database, look at your table with
select * from words limit 10
house
stuff
tree
...
and then select with one of the words you see
select * from words where word = 'stuff'
Edit: fixed the where clause according to @MichaelEakins
Edit 2: Unfortunately, there's no difference between single and double quotes in this case; see SQL Fiddle
Answering my own question because I figured out what was wrong. To populate the table, I had written a Python program to parse a file called words.txt (all words, separated by newlines) into SQLite. My problem was that the query turned into:
INSERT INTO WORDS VALUES('englishWord\n')
That stored every word with a trailing newline, so word = 'stuff' never matched. I fixed that and it started to work; thanks to @ScoPi for the hint about using LIKE, which helped me figure out that there was a stray newline character.

Find columns that match in two tables

I need to query two tables of companies: the first table contains the full company names, and the second contains the same names, but incomplete. The idea is to find the entries that are similar. I have included pictures of the reference data and the SQL code I'm using.
The result I want is like this
The closest way I found to do so:
SELECT DISTINCT
RTRIM(a.NombreEmpresaBD_A) as NombreReal,
b.EmpresaDB_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE 'VoIP%' AND b.EmpresaDB_B LIKE 'VoIP%'
The problem with the above code is that it only returns the record specified in the WHERE clause, and if I use LIKE '%' instead, it returns the Cartesian product of the two tables. The RDBMS is Microsoft SQL Server. I would greatly appreciate any proposed solution.
Use the short name with '%' appended as the argument of the LIKE expression:
Edit, now that we know this is SQL Server:
SELECT a.NombreEmpresaBD_A as NombreReal
,b.NombreEmpresaBD_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE (b.NombreEmpresaBD_B + '%');
According to your screenshot you had the column name wrong!
String concatenation in T-SQL uses the + operator.
The query above matches a case such as
'Computex S.A' LIKE 'Computex%'
but not:
'Voip Service Mexico' LIKE 'VoipService%'
For that you would have to strip blanks first or use more powerful pattern matching functions.
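Stripping blanks on both sides before the prefix comparison handles the 'Voip Service Mexico' case. Here is a minimal sketch using Python's sqlite3 module as a runnable stand-in for SQL Server: SQLite concatenates with || where T-SQL uses +, and its LIKE is case-insensitive for ASCII, whereas in SQL Server that depends on the collation. The 'Acme Ltd' row is a made-up non-matching example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmpresaDB_A (NombreEmpresaBD_A TEXT)")
conn.execute("CREATE TABLE EmpresaDB_B (NombreEmpresaBD_B TEXT)")
conn.executemany("INSERT INTO EmpresaDB_A VALUES (?)",
                 [("Computex S.A",), ("Voip Service Mexico",), ("Acme Ltd",)])
conn.executemany("INSERT INTO EmpresaDB_B VALUES (?)",
                 [("Computex",), ("VoipService",)])

# Remove blanks from both names, then use the short name as a LIKE prefix.
rows = conn.execute("""
    SELECT a.NombreEmpresaBD_A, b.NombreEmpresaBD_B
    FROM EmpresaDB_A a
    JOIN EmpresaDB_B b
      ON REPLACE(a.NombreEmpresaBD_A, ' ', '')
         LIKE REPLACE(b.NombreEmpresaBD_B, ' ', '') || '%'
    ORDER BY a.NombreEmpresaBD_A
""").fetchall()
print(rows)  # [('Computex S.A', 'Computex'), ('Voip Service Mexico', 'VoipService')]
```

Because the join condition applies REPLACE to both columns, no index can be used, so on large tables this will scan both of them.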
I have created a demo for you on data.SE.
Look up pattern matching or the LIKE operator in the manual.
I would suggest adding a foreign key between the tables to link the data. Then you can query one table and join the second to get the matching rows.

Apostrophes and SQL Server FT search

I have set up full-text (FT) search in SQL Server 2005, but I can't seem to find a way to match the keyword "Lias" to a record containing "Lia's". Basically, I want to allow people to search without the apostrophe.
I have been on and off this problem for quite some time now so any help will really be a blessing.
EDIT 2: I just realised this doesn't actually solve your problem; please ignore it and see the other answer! The code below returns results when a user has inserted an apostrophe that shouldn't be there, such as "abandoned it's cargo".
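I don't have FT installed locally and have not tested this; you can use CONTAINS to test for both the original term and a copy with the apostrophe stripped. CONTAINS takes a column plus a search condition given as a literal or variable, so compute the stripped copy first (column_name, @search and @stripped are placeholders), e.g.:
SELECT *
FROM table
WHERE CONTAINS(column_name, @search)
OR CONTAINS(column_name, @stripped) -- SET @stripped = REPLACE(@search, '''', '') beforehand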
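EDIT: You can search for phrases by putting them in double quotes inside the single-quoted search string, e.g.
SELECT *
FROM table
WHERE CONTAINS(column_name, @phrase) -- e.g. SET @phrase = '"this phrase"'
OR CONTAINS(column_name, @strippedPhrase) -- SET @strippedPhrase = REPLACE(@phrase, '''', '')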
See the MSDN documentation for CONTAINS. It actually indicates that punctuation is ignored anyway, but again I haven't tested this; it may be worth simply trying CONTAINS(column_name, 'value') on its own.
I haven't used FT, but in doing queries on varchar columns, and looking for surnames such as O'Reilly, I've used:
surname like Replace( @search, '''', '') + '%' or
Replace( surname,'''','') like @search + '%'
This allows for the apostrophe to be in either the database value or the search term. It's also obviously going to perform like a dog with a large table.
The alternative (probably not a good one either) would be to save a second copy of the data, stripped of non-alpha characters, and search against that copy as well. So the original would contain Lia's and the second copy Lias, at the cost of doubling the storage, etc.
Another attempt:
SELECT surname
FROM table
WHERE surname LIKE '%value%'
OR REPLACE(surname,'''','') LIKE '%value%'
This works for me (without FT enabled), i.e. I get the same results when searching for O'Connor or OConnor.
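Stripping the apostrophe from both the stored value and the search term, as in the O'Reilly approach above, makes "Lias" find "Lia's" and "O'Connor" find "OConnor". This is a minimal sketch using Python's sqlite3 module instead of SQL Server full-text search; the people table and search_surname helper are hypothetical names, and like the answers above it uses a leading wildcard, so it scans the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (surname TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("O'Connor",), ("OConnor",), ("Lia's",), ("Smith",)])

def search_surname(conn, term):
    # Remove apostrophes from both sides ('''' is a one-character string
    # holding an apostrophe), then do a substring match.
    sql = """
        SELECT surname FROM people
        WHERE REPLACE(surname, '''', '') LIKE '%' || REPLACE(?, '''', '') || '%'
        ORDER BY surname
    """
    return [row[0] for row in conn.execute(sql, (term,))]

print(search_surname(conn, "Lias"))      # ["Lia's"]
print(search_surname(conn, "O'Connor"))  # ["O'Connor", 'OConnor']
```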
