Can SQL Server index a text string by delimiter? - sql-server

I need to store content keyed by strings, so a database table of key/value pairs, essentially. The keys, however, will be of a hierarchical format, like this:
foo.bar.baz
They'll have multiple categories, delimited by dots. The above value is in a category called "baz" which is in a parent category called "bar" which is in a parent category called "foo."
How can I index this in such a way that it's rapidly searchable for different permutations of the key/dot combo? For example, I want to be able to very quickly find everything that starts with
foo
Or
foo.bar
Yes, I could do a LIKE query, but I never need to find anything like:
fo
So that seems like a waste to me.
Is there any way that SQL would index all permutations of a string delimited by the dots? So, in the above case we have:
foo
foo.bar
foo.bar.baz
Is there any type of index that would facilitate searching like that?
Edit
I will never need to search backwards or from the middle. My searches will always begin from the front of the string:
foo.bar
Never:
bar.baz

SQL Server can't really index substrings, no. If you only ever want to search on the first string, this will work fine, and will perform an index seek (depending on other query semantics of course):
WHERE col LIKE 'foo.%';
-- or
WHERE col LIKE 'foo.bar.%';
However when you start needing to search for bar or baz following any leading string, you will need to search on the substring:
WHERE col LIKE '%.bar.%';
-- or
WHERE PATINDEX('%.bar.%', col) > 0;
This won't work well with regular B-tree indexes, and I don't think Full-Text Search will be much help either, because of the special characters (periods) - but you should try it out if this is a requirement.
In general, storing data this way smells wrong to me. It seems to me that you should either have separate columns instead of jamming all the data into one column, or use a more relational EAV design.
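A minimal sketch of the prefix search described in this answer, assuming a hypothetical table dbo.Content whose key column is the primary key (the table, column names and sample rows are illustrative and not from the question). The category row itself needs an equality test next to the LIKE, because the pattern 'foo.%' requires the trailing dot:

```sql
-- Hypothetical table and sample data, for illustration only.
CREATE TABLE dbo.Content
(
    ContentKey   varchar(200)  NOT NULL PRIMARY KEY,  -- e.g. 'foo.bar.baz'
    ContentValue nvarchar(max) NULL
);

INSERT INTO dbo.Content (ContentKey, ContentValue)
VALUES ('foo',         N'a'),
       ('foo.bar',     N'b'),
       ('foo.bar.baz', N'c'),
       ('food.bar',    N'd');

-- Everything in category foo, including the category row itself.
-- Both predicates have a fixed leading string, so each one is a
-- candidate for an index seek on the primary key.
SELECT ContentKey, ContentValue
FROM dbo.Content
WHERE ContentKey = 'foo'
   OR ContentKey LIKE 'foo.%';
-- Returns foo, foo.bar and foo.bar.baz, but not food.bar.
```

Keeping the delimiter inside the pattern is what excludes food.bar, which a plain LIKE 'foo%' would also return.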

This looks like a job for a CTE!
create table TableA(
id int identity,
parentid int null,
name varchar(50)
)
For a fixed two levels it's easy:
select t2.name, t1.name
from tableA t1
join tableA t2 on t2.id = t1.parentid
where t2.name = 'father'
To find that kind of hierarchical value in the most general case, you will need some kind of recursion over the self-joined table, using a CTE.
http://msdn.microsoft.com/pt-br/library/ms175972.aspx
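As a rough sketch of what such a recursive CTE could look like for the TableA layout in this answer: the starting name 'foo', the CTE name Descendants and the computed path column are assumptions added for illustration.

```sql
-- Walks down from a root category and builds the dotted path of every descendant.
WITH Descendants AS
(
    -- Anchor member: the root category to start from (assumed to be 'foo').
    SELECT id, parentid, name,
           CAST(name AS varchar(4000)) AS path
    FROM TableA
    WHERE parentid IS NULL
      AND name = 'foo'

    UNION ALL

    -- Recursive member: children of the rows found so far.
    -- The CAST keeps the column type identical to the anchor member.
    SELECT c.id, c.parentid, c.name,
           CAST(d.path + '.' + c.name AS varchar(4000))
    FROM TableA AS c
    JOIN Descendants AS d
      ON c.parentid = d.id
)
SELECT id, path
FROM Descendants;
```

With this design the dotted key is derived at query time rather than stored, so a prefix search becomes a walk down the tree from one node.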

Related

Query performance

I have a table that has the following schema:
ID,firstName,MiddleName,LastName,FML,[some other columns]
The FML column is created by concatenating firstName, a space character, MiddleName, a space character and LastName. I want to search for a person when FML is known. Therefore my query is
SELECT * from tbl where FML LIKE @Param
But I want to optimize this query. I'm thinking of splitting the input string into firstName, MiddleName and LastName strings and writing the query like this:
SELECT * FROM tbl where firstName like @FN and MiddleName like @MN and LastName like @LN
Also, will the query
SELECT smth from tbl where Val='test'
be better in terms of performance than
Select smth from tbl where Val like 'test'
Thank you.
If you mean =, then use =. If you mean like, then use like. But once you add wildcards to like, the performance will decrease.
By separating and filtering on separate fields, you lose flexibility, but increase the ability to be more specific in your search. So it's not optimising, per se, as the functionality is different.
Imagine you have two records, Jack Roberts, and Robert Jack
Your first query allows you to find them both if your query is '%Robert%', whereas the second allows you to find them with separate queries.
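Yes, the '=' operator gives the best performance. Note that LIKE 'test' without wildcards matches only the value test itself; it is LIKE '%test%' that searches for every Val containing test, and that form cannot use an index seek.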
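A sketch of the separate-column search discussed in this thread, assuming the three name columns are varchar(50) (the question does not give their types) and using a hypothetical index name and sample values:

```sql
-- Hypothetical index name; tbl and its columns come from the question.
CREATE INDEX IX_tbl_Names ON tbl (LastName, firstName, MiddleName);

-- Parameter types are assumed. They should match the column types so that
-- no implicit conversion gets in the way of an index seek.
DECLARE @FN varchar(50) = 'John',
        @MN varchar(50) = 'Quincy',
        @LN varchar(50) = 'Adams';

-- Equality on all three columns lets the optimizer seek on the index above.
SELECT *
FROM tbl
WHERE LastName   = @LN
  AND firstName  = @FN
  AND MiddleName = @MN;
```

This only helps when the caller really knows the exact name parts; a search like '%Robert%' still needs the LIKE form.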

Guidance for MS SQL Delete query

In my SQL Server database there is a table whose primary key values are in formats like '0000100001' and 'C100001'.
I want to delete all records whose key starts with '0', but not the records whose key starts with 'C'.
I tried the built-in function SUBSTRING('primary_key',1,1)='0' but it did not help.
Thank you.
SUBSTRING('primary_key',1,1)='0'
tests whether the string literal "primary_key" starts with the character 0 (which it doesn't, so the query will return zero rows). Get rid of the single quotes to reference the column. (NB: If your column is not actually called primary_key you will need to reference its actual name of course!)
Or alternatively you can use WHERE primary_key LIKE '0%' which can use the index to locate the rows so is more efficient.
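One cautious way to apply this, sketched with the placeholder names your_table and primary_key (substitute the real table and column): preview the rows first, then delete inside a transaction so the change can still be rolled back.

```sql
BEGIN TRANSACTION;

-- Preview the rows that are about to be removed.
SELECT primary_key
FROM your_table
WHERE primary_key LIKE '0%';

-- Keys starting with 'C' never match '0%', so they are left alone.
DELETE FROM your_table
WHERE primary_key LIKE '0%';

-- Check the affected row count, then COMMIT, or ROLLBACK if it looks wrong.
COMMIT TRANSACTION;
```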
I don't know MS SQL, but in MySQL it would be something like this:
DELETE FROM your_table WHERE primary_key LIKE '0%' AND primary_key NOT LIKE 'C%'
You can use the LIKE operator to search for occurrences of a string or a simple wildcard pattern (LIKE patterns are not full regular expressions). It can take wildcards such as the % sign in front of, behind, or on both sides of the pattern you are looking for.
For example:
LIKE 'C%' would match anything starting with C
LIKE '%C' would match anything ending in C
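LIKE '[A-Z]%' would match anything starting with a capital letter (strictly so only under a binary collation; with many other collations the range also matches lowercase letters)
LIKE '%LOL%' would match anything that has the word LOL in it (in caps only if the collation is case-sensitive).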
Further reading at
http://msdn.microsoft.com/en-us/library/ms179859.aspx

Find columns that match in two tables

I need to query two tables of companies. The first table holds the full company names; the second table also holds names, but they are incomplete. The idea is to find the rows whose names are similar. I put pictures of the reference and the SQL code I'm using.
The result I want is like this
The closest way I found to do so:
SELECT DISTINCT
RTRIM(a.NombreEmpresaBD_A) as NombreReal,
b.EmpresaDB_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE 'VoIP%' AND b.EmpresaDB_B LIKE 'VoIP%'
The problem with the above code is that it only returns the records specified in the WHERE clause, and if I use LIKE '%' instead it returns the Cartesian product of the two tables. The RDBMS is Microsoft SQL Server. I would greatly appreciate any proposed solution.
Use the short name plus appended '%' as argument in the LIKE expression:
Edit with info that we deal with SQL Server:
SELECT a.NombreEmpresaBD_A as NombreReal
,b.NombreEmpresaBD_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE (b.NombreEmpresaBD_B + '%');
According to your screenshot you had the column name wrong!
String concatenation in T-SQL with + operator.
Above query finds a case where
'Computex S.A' LIKE 'Computex%'
but not:
'Voip Service Mexico' LIKE 'VoipService%'
For that you would have to strip blanks first or use more powerful pattern matching functions.
I have created a demo for you on data.SE.
Look up pattern matching or the LIKE operator in the manual.
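The blank-stripping idea mentioned in this answer could be sketched with REPLACE, using the table and column names from the query above. This is an untested variation, not part of the original demo:

```sql
-- Remove spaces on both sides before comparing, so that
-- 'Voip Service Mexico' can match the short name 'VoipService'.
SELECT a.NombreEmpresaBD_A AS NombreReal,
       b.NombreEmpresaBD_B AS NombreIncompleto
FROM EmpresaDB_A AS a
JOIN EmpresaDB_B AS b
  ON REPLACE(a.NombreEmpresaBD_A, ' ', '')
     LIKE REPLACE(b.NombreEmpresaBD_B, ' ', '') + '%';
```

Two caveats: wrapping the columns in REPLACE rules out an index seek, and any %, _ or [ characters inside the short names would be treated as wildcards.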
I would suggest adding a foreign key between the tables linking the data. Then you can just search for the one table and join the second to get the other results.

Is there any way to put an invisible character at beginning of a string to change its sort order?

Is there any way to put a non-printing or unobtrusive character at the beginning of a string of data in SQL Server, so that when an ORDER BY is performed, the string is sorted after the letter z alphabetically?
I have used a space at the beginning of the string to get the string at the top of the sorted list, but I am looking to do something similar to put a string at the end of the list.
I would rather not put another field such as "SortOrder" in the table to use to order the sort, and I would rather not have to sort the list in my code.
Added: Yes I know this is a bad idea, thanks to all for mentioning it, but still, I am curious if what I am asking can be done
Since no one is venturing to answer your question properly, here's my answer
Given: You are already adding <space> to some other data to make them appear top
Solution: Add CHAR(160) to make it appear at the bottom. This is in reality also a space, but is designed for computer systems to not treat it as a word break (hence the name).
http://en.wikipedia.org/wiki/Non-breaking_space
Your requirements:
Without adding another field such as "SortOrder" to the table
Without sorting the list in your code
I think this fits!
create table my(id int,data varchar(100))
insert my
select 1,'Banana' union all
select 2,Char(160) + 'mustappearlast' union all
select 3,' ' +N'mustappearfirst' union all
select 4,'apple' union all
select 5,'pear'
select *
from my
order by ASCII(lower(data)), data
(ok I cheated, I had to add ASCII(lower( but this is closest to your requirements than all the other answers so far)
You should use another column in the database to help specify the ordering rather than modifying the string:
SELECT *
FROM yourtable
ORDER BY sortorder, yourstring
Where your data might look like this:
yourstring  sortorder
foo         0
bar         0
baz         1
qux         1
quux        2
If you can't modify the table you might be able to put the sortorder column into a different table and join to get it:
SELECT *
FROM yourtable AS T1
JOIN yourtablesorting AS T2
ON T1.id = T2.T1_id
ORDER BY T2.sortorder, T1.yourstring
Alternative solution:
If you really can't modify the database at all, not even adding a new table then you could add any character you like at the start of the string and remove it during the select:
SELECT RIGHT(yourstring, LEN(yourstring) - 1)
FROM yourtable
ORDER BY yourstring
Could you include something like:
"<SORT1>This is my string"
"<SORT2>I'd like this to go second"
And remove them later? I think using invisible characters is fragile and hacky.
You could put a sort order in the query and use unions (no guarantees on performance).
select 1 as SortOrder, *
from table
where ... --first tier
union
select 2, *
from table
where ... --second tier
order by SortOrder
In my opinion, an invisible character for this purpose is a bad idea because it pollutes the data. I would do exactly what you would rather not do and add a new column.
To modify the idea slightly, you could implement it not as a sort order but as a grouping order that defaults to 0, where a negative integer puts the group at the top of the list and a positive integer at the bottom, and then "order by sort_priority, foo"
I'm with everyone else that the ideal way to do this is by adding an additional column for sort order.
But if you don't want to add another column, and you already use a space for those items you want to appear at the top of the list, how do you feel about using a pipe (|) for items at the bottom of the list?
In Unicode code-point order, the pipe and both curly brackets ({, }) come after z, so any of those three characters may work for you. The actual sort order depends on the column's collation, though, so check it against yours before relying on this.
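Whether these characters really sort after z depends on the collation in effect. A quick way to check is to sort a few sample values, once under an explicit binary collation and once under the database default (the sample strings here are made up):

```sql
-- Binary collation: rows are ordered by character code.
SELECT v.s
FROM (VALUES ('apple'), ('zebra'), ('|last'), ('{last'), (' first')) AS v(s)
ORDER BY v.s COLLATE Latin1_General_BIN;
-- Order: ' first', 'apple', 'zebra', '{last', '|last'

-- Same rows under the current database default collation, for comparison.
SELECT v.s
FROM (VALUES ('apple'), ('zebra'), ('|last'), ('{last'), (' first')) AS v(s)
ORDER BY v.s;
```

If the second result differs from the first, the prefix trick only works when the ORDER BY names a binary collation explicitly.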

Full text search vs LIKE

My question is about using full-text search. As far as I know, LIKE queries that begin with % never use an index:
SELECT * from customer where name like '%username%'
If I use full-text search for this query, can I get better performance? Can SQL Server use the advantages of a full-text index for queries like '%username%'?
Short answer
There is no efficient way to perform infix searches in SQL Server, neither using LIKE on an indexed column nor with a fulltext index.
Long answer
In the general case, there is no fulltext equivalent to the LIKE operator. While LIKE works on a string of characters and can perform arbitrary wildcard matches against anything inside the target, by design fulltext operates upon whole words/terms only. (This is a slight simplification but it will do for the purpose of this answer.)
SQL Server fulltext does support a subset of LIKE with the prefix term operator. From the docs (http://msdn.microsoft.com/en-us/library/ms187787.aspx):
SELECT Name
FROM Production.Product
WHERE CONTAINS(Name, ' "Chain*" ');
would return products named chainsaw, chainmail, etc. Functionally, this doesn't gain you anything over the standard LIKE operator (LIKE 'Chain%'), and as long as the column is indexed, using LIKE for a prefixed search should give acceptable performance.
The LIKE operator allows you to put the wildcard anywhere, for instance LIKE '%chain', and as you mentioned this prevents an index from being used. But with fulltext, the asterisk can only appear at the end of a query term, so this is of no help to you.
Using LIKE, it is possible to perform efficient postfix searches by creating a new column, setting its value to the reverse of your target column, and indexing it. You can then query as follows:
SELECT Name
FROM Production.Product
WHERE Name_Reversed LIKE 'niahc%'; /* "chain" backwards */
which returns products with their names ending with "chain".
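One way to maintain such a reversed column is a computed column, so it never drifts out of sync with Name. This is a sketch of that variation rather than the approach spelled out above; the index name is hypothetical.

```sql
-- Computed column that always holds the reversed name.
ALTER TABLE Production.Product
    ADD Name_Reversed AS REVERSE(Name) PERSISTED;
GO

-- Index the reversed value so that LIKE 'niahc%' can seek on it.
CREATE INDEX IX_Product_Name_Reversed
    ON Production.Product (Name_Reversed);
GO
```

GO is the batch separator understood by SSMS and sqlcmd, not a T-SQL statement; other clients need the two statements sent separately.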
I suppose you could then combine the prefix and reversed postfix hack:
SELECT Name
FROM Production.Product
WHERE Name LIKE 'chain%'
AND Name_Reversed LIKE 'niahc%';
which matches only names that both begin and end with "chain". That is not a true infix search (it is not equivalent to LIKE '%chain%'), it's not particularly pretty, and I've never tested whether the query optimizer would even use both indexes in its plan.
You have to understand how an index works. An index is just like the one in a printed edition of an encyclopedia.
If you use:
SELECT * from customer where name like 'username%'
the index, full-text or not, should work. But
SELECT * from customer where name like '%username%'
will never work with an index, and it will be a time-consuming query.
From what I know about fulltext indexes, I'll make the following extrapolations:
Upon indexing, it parses the text, searching for words (some RDBMSs, like MySQL, only consider words longer than 3 chars), and places the words in the index.
When you search in the fulltext index, you search for words, which then link to the row.
If I'm right about the first two (for MSSQL), then it will only work if you search for WORDS, with lengths of 4 or more characters. It won't find 'armchair' if you look for 'chair'.
Assuming all that is correct, I'll go ahead and make the following statement: The fulltext index is in fact an index, which makes search faster. It is large, and has fewer search possibilities than LIKE would have, but it's way faster.
More info:
http://www.developer.com/db/article.php/3446891
http://en.wikipedia.org/wiki/Full_text_search
LIKE and CONTAINS are very different.
Take the following data values:
'john smith'
'sam smith'
'john fuller'
like 's%' returns:
'sam smith'
like '%s%' returns:
'john smith'
'sam smith'
contains 's' returns nothing.
contains 'john' returns:
'john smith'
'john fuller'
contains 's*' returns:
'john smith'
'sam smith'
contains '*s' returns the same as contains 's*': the initial asterisk is ignored, which is a bit of a pain, but then the index is of words, not characters.
You can use:
SELECT * from customer where CONTAINS(name, 'username')
OR
SELECT * from customer where FREETEXT(name, 'username')
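Both predicates only work once the column has a full-text index. A minimal setup sketch, where ft_catalog and PK_customer are hypothetical names (KEY INDEX must name an existing unique, single-column, non-nullable index on the table):

```sql
-- Create a catalog and make it the default for new full-text indexes.
CREATE FULLTEXT CATALOG ft_catalog AS DEFAULT;

-- Index the name column; PK_customer is assumed to be the table's primary key index.
CREATE FULLTEXT INDEX ON customer (name)
    KEY INDEX PK_customer
    ON ft_catalog;

-- Prefix term: matches rows containing a word that starts with "user".
-- The index is populated in the background, so results may lag right after creation.
SELECT *
FROM customer
WHERE CONTAINS(name, '"user*"');
```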
https://stackoverflow.com/users/289319/mike-chamberlain, you are quite right, as you suggest, that this is not enough when searching for 'chain': WHERE Name LIKE 'chain%' AND Name_Reversed LIKE 'niahc%' is not equivalent to LIKE '%chain%'.
