SQL CONTAINSTABLE with multiple columns: which column contains the result? - sql-server

I have received a spec to add a relevance score to search results, based on which column the result is in. As an example, I have a product table with, amongst other fields, keywords, productNames and brands.
I currently find matching products by joining to:
JOIN CONTAINSTABLE(Products, (keywords, productNames, brands), '"NIKE*"')
This finds the records containing the search term, but I need to weight the results by column, e.g. keywords scores 1, productNames scores 2, brands scores 4, etc. I can then add the scores together to give the relevance of each result, i.e. if "Nike" is in all three columns it scores 7, if it is only in brands it scores 4, and so on.
To do this I need to know which columns CONTAINSTABLE matched on, but I haven't found any details about that.
I've looked at the ISABOUT option, but that's for weighting multiple search terms in a single column.
At the moment I have a CASE expression:
CASE WHEN CONTAINS (Keywords, '"Nike*"') THEN 1 ELSE 0 END +
CASE WHEN CONTAINS (productNames, '"Nike*"') THEN 2 ELSE 0 END +
CASE WHEN CONTAINS (brands, '"Nike*"') THEN 4 ELSE 0 END
AS Relevance
This does work, but it seems very wasteful, since CONTAINSTABLE must already be doing this work.
If anyone has any ideas then they'll be gratefully received.
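Because the weights in the question are powers of two (1, 2, 4), the summed Relevance value doubles as a bitmask that records exactly which columns matched. The Python sketch below only illustrates that arithmetic outside the database. `prefix_match` is a hypothetical, rough stand-in for CONTAINS(col, '"Nike*"') (a case-insensitive word-prefix test); it does not reproduce SQL Server's full-text word breaking.

```python
import re

# Column weights from the question; powers of two so each column owns one bit.
WEIGHTS = {"keywords": 1, "productNames": 2, "brands": 4}


def prefix_match(text, prefix):
    """Hypothetical stand-in for CONTAINS(col, '"prefix*"'):
    True if any word in text starts with prefix (case-insensitive)."""
    words = re.findall(r"\w+", text or "")
    return any(w.lower().startswith(prefix.lower()) for w in words)


def relevance(row, prefix):
    """Sum the weights of the matching columns, like the CASE ... END + ... expression."""
    return sum(w for col, w in WEIGHTS.items() if prefix_match(row.get(col, ""), prefix))


def matched_columns(score):
    """Decode a score back into the columns that contributed to it."""
    return [col for col, w in WEIGHTS.items() if score & w]


row = {"keywords": "running shoes nike", "productNames": "Air Max", "brands": "Nike"}
score = relevance(row, "nike")
print(score, matched_columns(score))  # 5 ['keywords', 'brands']
```

In T-SQL the same decoding would be a bitwise AND on the Relevance value (e.g. Relevance & 4), but that only reads the result after the fact; it does not remove the extra CONTAINS calls.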

Related

Solr facet offset by term/prefix rather than index

I'm generating facet counts for a multivalued field and sorting them by index in order to see them in alphabetical order. Given a particular facet prefix, I would like to jump to its place in the facet count list and show the facet counts surrounding it. For example, if my prefix is "wha" then I would want the following returned (four before and four after):
weld 1
welsh 5
west 4
wetland 1
whale 99
wheat 123
wheel 1
whey 9
There are millions of values in the field, so I can't just ask for them all. I need to be able to jump to that location or apply some kind of filter to the facet counts themselves. I've tried using facet.offset, but I basically have to do a binary search to find the appropriate offset, which is too slow.
I could probably get close enough if I could specify a range for a facet prefix, for example facet.prefix=[we TO wk], or even multiple prefixes such as facet.prefix=we,wf,wg,wh,wi,wj,wk.
I'm currently using other non-Solr solutions to accomplish this, but I would like to use Solr 6.6 in order to take advantage of filter queries.
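The desired behaviour (land on the first term at or after the prefix, then show a few neighbours on each side) is a binary search over an index-sorted term list. This Python sketch only illustrates that lookup on an in-memory list of (term, count) pairs; it is not a Solr feature, and the extra sample terms ("web", "wedge", "while") are made up to pad the list on both sides.

```python
from bisect import bisect_left


def facet_window(terms, prefix, before=4, after=4):
    """terms: (term, count) pairs sorted by term, as with facet.sort=index.
    Returns up to `before` pairs preceding the first term >= prefix,
    followed by up to `after` pairs starting at that term."""
    keys = [term for term, _ in terms]  # rebuilt here for clarity; precompute for large lists
    i = bisect_left(keys, prefix)       # O(log n) jump to the prefix's position
    return terms[max(0, i - before): i + after]


# Sample data: the question's eight terms plus made-up neighbours.
terms = [("web", 2), ("wedge", 3), ("weld", 1), ("welsh", 5), ("west", 4),
         ("wetland", 1), ("whale", 99), ("wheat", 123), ("wheel", 1),
         ("whey", 9), ("while", 7)]
for term, count in facet_window(terms, "wha"):
    print(term, count)
```

With millions of values the sorted keys would need to live somewhere that supports this kind of seek (which is what the question is asking Solr to provide); the sketch only pins down the expected result.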

fulltext search over data with underscore

I have an indexed table where one of the indexed columns can contain data with an underscore.
ID Name
1 01_A3L
2 02_A3L
3 03_A3L
4 05_A3L
5 some name
6 another name
7 a name
However, when I search this table with the following query, I don't get any results:
SELECT * FROM MyAmazingTable WHERE( CONTAINS(*,'"a3l*"'))
What is the reason for this? And how can I make sure I do get results I expect (all records that end with A3L)?
Kees C Bakker is 100% correct, but if you just want the results you need without all of those steps, the quick and dirty way is to change your search to a LIKE:
Select * from MyAmazingTable where Name like '%A3L'
Here the % matches whatever comes before, so the condition only checks that the last three characters are A3L.
That will give you the results you are looking for.
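A quick way to see what that LIKE pattern matches, using Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption-laden sketch: it exercises only LIKE, not full-text search, and SQLite's LIKE is case-insensitive for ASCII whereas SQL Server's behaviour depends on the column collation. Two caveats also hold in SQL Server: a pattern with a leading % generally cannot use an index seek, so it scans, and _ is itself a single-character wildcard in LIKE, so a pattern such as '%_A3L' would need the underscore escaped, e.g. '%[_]A3L'.

```python
import sqlite3

# In-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyAmazingTable (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO MyAmazingTable (ID, Name) VALUES (?, ?)",
    [(1, "01_A3L"), (2, "02_A3L"), (3, "03_A3L"), (4, "05_A3L"),
     (5, "some name"), (6, "another name"), (7, "a name")],
)

# % matches any run of characters, so this keeps names ending in A3L.
rows = conn.execute(
    "SELECT ID, Name FROM MyAmazingTable WHERE Name LIKE '%A3L' ORDER BY ID"
).fetchall()
print(rows)  # [(1, '01_A3L'), (2, '02_A3L'), (3, '03_A3L'), (4, '05_A3L')]
```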

Pushing term 1 results into term 3 fields for new students joining mid-year

I have a table that contains test marks from different terms: ca1_percent, sa1_percent, ca2_percent and sa2_percent. These four fields reside in the Results table, which contains the results from all the terms.
I used a self-relationship on the match field overall_percent_match, which is calculated as year & " " & subject & " " & _kf_studentID. This relationship lets me obtain the test results from past terms of the same year. For example, my term 3 results will contain the results from term 1 and term 2 for each subject. Everything works fine unless a new student joins partway through the year. If he joins in term 3, his ca2 result (done in term 3) falls into his ca1_percent column, which is supposed to contain term 1 results, just as the first record does for every other student.
The image shows what I mean.
I could not figure out the solution. Can anyone help me?
This StackOverflow link contains more details of my earlier work related to this problem.
The underlying problem, per your prior query, is that you're pulling the values through:
GetNthRecord(SA1_Results_Match::mark_percent,2)
This statement assumes the existence of an N=1, N=2 and N=3. To make this work properly you could do any of the following:
Ensure that your Results table always has records from the prior semester, even if the student joins later in the semester. You could keep using GetNthRecord this way, but you will always need to ensure that the records are in order.
Use an ExecuteSQL statement to gather only the correct semester's results for the correct summary field.
Make four separate relationships, with separate Table Occurrences, to define ca1, sa1, ca2 and sa2 each separately. This looks like what you started out trying to do in the prior question.
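The difference between the first option and the other two is positional versus keyed lookup. This Python sketch (not FileMaker code) mimics GetNthRecord with a hypothetical `nth_record` helper and contrasts it with picking a record by its term. It assumes each Results record carries a term number, which the question implies but does not name.

```python
# Hypothetical Results records for one student, subject and year: (term, mark_percent).
full_year = [(1, 70), (2, 75), (3, 80)]
joined_term_3 = [(3, 65)]


def nth_record(records, n):
    """Positional lookup like GetNthRecord (1-based); None if there is no Nth record."""
    return records[n - 1][1] if 1 <= n <= len(records) else None


def mark_for_term(records, term):
    """Keyed lookup: the mark whose term matches, wherever it sits in the list."""
    return next((mark for t, mark in records if t == term), None)


# Positional: the late joiner's term 3 mark lands in the term 1 slot.
print(nth_record(joined_term_3, 1))     # 65
# Keyed: term 1 is correctly empty and term 3 is found.
print(mark_for_term(joined_term_3, 1))  # None
print(mark_for_term(joined_term_3, 3))  # 65
```

For students present all year both lookups agree, which is why the positional version only breaks for mid-year joiners.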

Solr: Searching a term in multiple, indexed fields and returning top 'N' hits from each search field

I have two indexed fields in my Solr schema:
Employee Name
Manager Name
Both are plain strings.
My question is: given a search term, I want to display the top 5 suggested completions from Manager Names and the next 5 from Employee Names.
I can use copy fields, but sometimes all of the top 10 results come from Employee Names.
I have a hunch that boosting could help, but I could not figure out how.
Boosting can't control how the results are distributed, e.g. guarantee 5 from each field in the top 10.
You could look at Field Collapsing (result grouping), where you group by role (Manager and Employee) and limit each group to 5 results.
You would then get 2 groups back with 5 results each.
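A hedged sketch of what such a grouped request might look like, built with Python's urllib. group, group.field and group.limit are Solr's result-grouping parameters; the field names name_suggest and role are hypothetical and assume each document stores its role (manager or employee) in a single-valued field. The second function only simulates locally the shape of what grouping returns: the first five hits per role, in score order.

```python
from collections import defaultdict
from urllib.parse import urlencode


def grouped_query(term, per_group=5):
    """Build a hypothetical Solr query string that groups hits by role."""
    return urlencode({
        "q": f"name_suggest:{term}*",  # hypothetical combined (copy) field
        "group": "true",
        "group.field": "role",         # hypothetical single-valued role field
        "group.limit": per_group,      # documents returned per group
    })


def top_n_per_group(hits, n=5):
    """Local simulation: hits are (name, role) pairs already sorted by score."""
    groups = defaultdict(list)
    for name, role in hits:
        if len(groups[role]) < n:
            groups[role].append(name)
    return dict(groups)


print(grouped_query("jo"))
```

The point of grouping is that a run of high-scoring employee hits can no longer crowd the managers out, because each role gets its own capped list.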

Need some help (search algorithm)

I need some help with this issue:
As input I have a string that looks like "Blue cat green eyes 2342342", or it could be "Cat blue eyes green 23242", or any other permutation of the words.
In my DB table I have some data. One of the columns is called, say, keyWords.
Here is an example of this table:
My task is to find the record whose KEYWORDS column matches some of the words in the input string.
For example, for the strings "Blue cat green eyes 2342342", "Cat blue eyes green 23242" and "Cat 23242 eyes blue green", the result must be "blue cat" (the first row of my table).
The only way I can imagine solving this task is:
1. Take each word from the string in turn.
2. Search for that word with LIKE '%word%' in the table column.
3. If it is not found, the word is not a keyword and we have no interest in it.
4. If it is found exactly once, great! That is the record we are looking for.
5. If there is more than one result:
   a. Take each of the words from the string that have not been tested yet, in turn.
   b. Search for that word with LIKE '%word%' within the results from step 2.
   c. And so on.
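Read literally, the steps above narrow a candidate set one word at a time. This Python sketch is one interpretation (the question leaves the final "and so on" open, so here a word that matches nothing among the current candidates is simply skipped); Python's `in` on a lower-cased string stands in for LIKE '%word%', and the two sample rows are hypothetical.

```python
def narrow_down(rows, text):
    """rows: {id: keywords}. Returns the ids left after narrowing word by word."""
    candidates = list(rows)  # before the first hit, every row is a candidate
    found_any = False
    for word in text.lower().split():
        # Substring test, like LIKE '%word%', restricted to the current candidates.
        hits = [rid for rid in candidates if word in rows[rid].lower()]
        if not hits:
            continue  # not a keyword here; ignore it
        candidates, found_any = hits, True
        if len(candidates) == 1:
            break  # exactly one match: done
    return candidates if found_any else []


rows = {1: "blue cat", 2: "black cat"}  # hypothetical KEYWORDS column
print(narrow_down(rows, "Cat 23242 eyes blue green"))
```

Every word can trigger another scan of the candidates, each one a full-table scan until something matches, so the cost grows with both the table size and the number of words, which is the slowdown the question anticipates.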
A diagram of this algorithm is here.
But this algorithm looks like it will be very slow if the table has a lot of records and the input string consists of a large number of words.
So my question is: are there any special algorithms that can help solve this task?
You could add another table, such as:
ID KeywordID Word
1 1 blue
2 2 blue
3 1 cat
and transform the string
"Blue cat green eyes 2342342"
into a series of indexes and counts:
SELECT KeywordID, COUNT(*) FROM ancillary WHERE Word IN ('blue','cat','green','eyes'...) GROUP BY KeywordID
This would perform a series of exact matches and return, say,
KeywordID Count
1 2
2 1
Then you know that keyword group 1 has two words, so a count of 2 means all of them matched: KeywordID 1 is satisfied. Group 2 also has two words (black, cat), but only one was found, so there is a match but it is not complete.
If you also record the keyword list size together with the KeywordID, all rows with the same ID will have the same KeywordSize, and you can GROUP BY it too:
KeywordID KeywordSize Count
1 2 2
2 2 1
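You can even SELECT COUNT(*)/KeywordSize AS match ... ORDER BY match to have keyword matches sorted by relevance (if both values are integers, multiply by 100 first or cast to a decimal, since many databases truncate integer division).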
Of course, once you have KeywordID, you can find it in the keywords table.
Implementation
Suppose you want to add the keyword list "black angry cat" to your existing table.
First you split this keyword list into words, getting "black", "angry" and "cat".
You insert the keyword list into the table you already have, as usual, and retrieve the ID of the newly created row; let's say it is 1701.
Now you insert the words into a new table that we call "ancillary". This table only contains the keyword row ID of your primary table, the single word, and the size of the word list from which that word comes.
We know we are inserting 3 words in all, for table row 1701, so size=3 and we insert these tuples:
(1701, 3, 'black')
(1701, 3, 'cat')
(1701, 3, 'angry')
(These rows will receive a unique ID of their own, but that does not concern us.)
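The insertion step can be sketched in Python as a function that turns one keyword list into the (KeywordID, ListSize, Word) tuples stored in ancillary. The function name is hypothetical, and dropping duplicate words is an added assumption, so that a list such as "cat cat" does not inflate ListSize.

```python
def explode_keywords(keyword_id, keyword_list):
    """Return (KeywordID, ListSize, Word) tuples for the ancillary table.
    Duplicate words are dropped (an assumption) so ListSize counts distinct words."""
    words = list(dict.fromkeys(keyword_list.split()))  # whitespace split, first-seen order
    return [(keyword_id, len(words), word) for word in words]


print(explode_keywords(1701, "black angry cat"))
# [(1701, 3, 'black'), (1701, 3, 'angry'), (1701, 3, 'cat')]
```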
Some time later, we receive the sentence
'Schroedinger cat is black and angry'
We could first run the query against a list of stop words (null words) to be removed, such as "is" and "and". But this is not necessary.
Then we could run as many queries as there are words, and thereby discover that no rows anywhere contained "Schroedinger" and we can drop it. But this, too, is not necessary.
Finally we build the real query against ancillary:
SELECT KeywordID, COUNT(*) AS total, COUNT(*)*100/ListSize AS match
FROM ancillary WHERE Word IN ('Schroedinger','cat','is','black','and','angry')
GROUP BY KeywordID, ListSize;
The WHERE will return, say, these rows:
(1234, 'black') -- from 'black cat'
(1234, 'cat') -- from 'black cat'
(1423, 'angry') -- from 'angry birds'
(1701, 'cat') -- from 'black angry cat'
(1701, 'angry') -- from 'black angry cat'
(1701, 'black') -- from 'black angry cat'
(1999, 'cat') -- from 'nice white cat'
So the GROUP BY will return each KeywordID with its count (total) and match percentage:
1423 1 50%
1701 3 100%
1234 2 100%
1999 1 33%
Now you can sort by match ratio descending, and then by the number of matched words descending (since matching 100% of 3 words is better than matching 100% of 2, and matching 2 of 3 words is better than matching 1 of 2):
1701 3 100% -- our best match
1234 2 100% -- runner-up
1423 1 50%
1999 1 33%
You can also retrieve the rows of your first table in one query, with the match ratio added:
SELECT mytable.*, total, match FROM
mytable JOIN (
SELECT KeywordID, COUNT(*) AS total, COUNT(*)*100/ListSize AS match
FROM ancillary WHERE Word IN ('Schroedinger','cat','is','black','and','angry')
GROUP BY KeywordID, ListSize
) AS ancil ON (mytable.KeywordID = ancil.KeywordID)
ORDER BY match DESC, total DESC;
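To check the ranking end to end, here is a sketch that loads the four example keyword lists into an in-memory SQLite database through Python's sqlite3 module (SQLite standing in for whatever database you actually use) and runs the grouped query with the percentage computed as COUNT(*)*100/ListSize and ListSize included in the GROUP BY. The alias is match_pct here only to avoid any clash with SQLite's MATCH operator; the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ancillary (KeywordID INTEGER, ListSize INTEGER, Word TEXT)")
conn.execute("CREATE INDEX idx_ancillary_word ON ancillary (Word)")  # serves the exact-match lookup

# Keyword lists from the example, keyed by their illustrative row IDs.
lists = {1234: "black cat", 1423: "angry birds", 1701: "black angry cat", 1999: "nice white cat"}
for kid, text in lists.items():
    words = text.split()
    conn.executemany("INSERT INTO ancillary VALUES (?, ?, ?)",
                     [(kid, len(words), w) for w in words])

sentence = "Schroedinger cat is black and angry".split()
placeholders = ",".join("?" * len(sentence))  # one ? per word
query = f"""
    SELECT KeywordID, COUNT(*) AS total, COUNT(*) * 100 / ListSize AS match_pct
    FROM ancillary WHERE Word IN ({placeholders})
    GROUP BY KeywordID, ListSize
    ORDER BY match_pct DESC, total DESC
"""
ranking = conn.execute(query, sentence).fetchall()
for row in ranking:
    print(row)
```

SQLite divides integers with truncation, so 'nice white cat' comes out as 33 rather than 33.3, matching the percentages listed above.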
The main cost is the exact match against "ancillary", which must be indexed on the Word column.
You might want to look at a full-text search engine such as Sphinx: http://sphinxsearch.com/
Alternatively, write a stored procedure that splits the search string into keywords using a specified separator and looks up the CHARINDEX of each keyword in your DB column (the details depend on your database management system).
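The stored-procedure idea boils down to splitting the search string on a separator and finding each keyword's position in the column value. This Python sketch mimics T-SQL CHARINDEX semantics (1-based position, 0 when not found) with str.find; the helper names are hypothetical, and lower-casing the column value is an assumption standing in for the case-insensitive collations SQL Server commonly uses.

```python
def charindex(needle, haystack):
    """Mimic T-SQL CHARINDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle) + 1


def keyword_positions(search, column_value, separator=" "):
    """Split the search string and report where each keyword occurs in column_value."""
    value = column_value.lower()
    return {kw: charindex(kw.lower(), value) for kw in search.split(separator) if kw}


positions = keyword_positions("Blue cat green", "blue cat")
print(positions)  # {'Blue': 1, 'cat': 6, 'green': 0}
```

Like the LIKE-based approach in the question, this still examines every row, so it trades setup effort for scan cost compared with the indexed ancillary table.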
