How to build a search engine for a website using SQL Server - sql-server

I need some help creating a simple search engine for a website. The basic idea is that the user enters a string in the search bar, which is compared against the key_word column in the database to get the results.
Let's say I have the following table in the SQL Server database:
|----|----------|----------------------|
| ID | URL | key_word |
|----|----------|----------------------|
| 1 | url1.com | cat short red NYC |
| 2 | url2.com | tall blue LA |
| 3 | url3.com | skinny NYC green |
| 4 | url4.com | cat black get |
|----|----------|----------------------|
Now, let's say the user wants to search for the string "get red cat from NYC" in the search bar. I want to search for this in the database column 'key_word'.
String key = "get red cat from NYC"
What I have tried:
So far I have the query below. It works when the user searches for only one word, but with the string 'key' it returns 0 results. I need some ideas to improve this query.
SELECT *
FROM [SearchTable]
WHERE [key_word] LIKE '%' + @key + '%'; -- @key holds the search string
What I want:
I want to change this SQL Server query so that it returns ID = 1, 3, 4.
In other words, I want to take this string:
String key = "get red cat from NYC"
and first search the database for the word "get". It doesn't show up, so go to the next word. The next word is "red", which shows up in ID = 1. The next word is "cat", which shows up in ID = 1, 4. The next word is "from", which doesn't show up in any rows. The next word is "NYC", which shows up in ID = 1, 3.
Put all the IDs together and you get IDs = 1, 1, 4, 1, 3.
Then I want to sort them so that ID = 1 shows up at the top, and ID = 3, 4 can be at the bottom since they are tied.
I was hoping to do this with only one SQL query, because if I keep connecting to the database the speed will go down too. So I was thinking of some SQL Server functions?

You need a string splitter for this. See this article for some functions:
DECLARE @key VARCHAR(MAX) = 'get red cat from NYC'
SELECT t.ID
FROM tbl t
CROSS APPLY dbo.SplitStrings_XML(t.key_word, ' ') tx
INNER JOIN (
    SELECT Item
    FROM dbo.SplitStrings_XML(@key, ' ')
) k
    ON k.Item = tx.Item
GROUP BY t.ID
ORDER BY COUNT(*) DESC
SQL Fiddle
Here is the SplitStrings_XML function:
CREATE FUNCTION dbo.SplitStrings_XML
(
    @List NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
    FROM
    (
        SELECT x = CONVERT(XML, '<i>'
            + REPLACE(@List, @Delimiter, '</i><i>')
            + '</i>').query('.')
    ) AS a CROSS APPLY x.nodes('i') AS y(i)
);
The above function will not work if your string contains characters that are illegal in XML, such as >, <, and &. You can use another splitter, but the idea stays the same.
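To make the ranking idea concrete, here is a minimal Python sketch of what the query above computes: split both strings on spaces, count the words they share, and sort by that count. It is only a model of the logic, not a replacement for the SQL. The function name `rank_rows` is hypothetical, lower-casing is an assumption standing in for a case-insensitive collation, and using sets ignores repeated words (the SQL join would count them more than once).

```python
from collections import Counter

# Sample rows copied from the question's table: (ID, key_word).
ROWS = [
    (1, "cat short red NYC"),
    (2, "tall blue LA"),
    (3, "skinny NYC green"),
    (4, "cat black get"),
]


def rank_rows(search, rows):
    """Return (ID, match_count) pairs, best match first."""
    # Lower-casing mimics a case-insensitive collation (an assumption).
    terms = set(search.lower().split())
    counts = Counter()
    for row_id, key_word in rows:
        hits = len(terms & set(key_word.lower().split()))
        if hits:
            counts[row_id] = hits
    # Sort by count descending, then by ID so ties come out in a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


print(rank_rows("get red cat from NYC", ROWS))
```

With the sample data, the row with ID 4 also contains the word "get", so it matches two words rather than one.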

Related

SQL Server: Performance issue: OR statement substitute in WHERE clause

I want to select only the records from table Stock based on the column PostingDate.
The PostingDate should be after the InitDate in another table called InitClient. However, there are currently 2 clients in both tables (client 1 and client 2), that both have a different InitDate.
With the code below I currently get exactly what I need, based on the sample data included underneath. However, two problems arise. First, with millions of records the query takes far too long (hours). Second, it isn't dynamic at all: it has to be changed every time a new client is added.
A potential option to address the performance issue would be to write two separate queries, one for Client 1 and one for Client 2, with a UNION in between. Unfortunately, this isn't dynamic enough either, since multiple clients are possible.
SELECT
Material
,Stock
,Stock.PostingDate
,Stock.Client
FROM Stock
LEFT JOIN (SELECT InitDate FROM InitClient where Client = 1) C1 ON 1=1
LEFT JOIN (SELECT InitDate FROM InitClient where Client = 2) C2 ON 1=1
WHERE
(
(Stock.Client = 1 AND Stock.PostingDate > C1.InitDate) OR
(Stock.Client = 2 AND Stock.PostingDate > C2.InitDate)
)
Sample dataset:
CREATE TABLE InitClient
(
Client varchar(300),
InitDate date
);
INSERT INTO InitClient (Client,InitDate)
VALUES
('1', '5/1/2021'),
('2', '1/31/2021');
SELECT * FROM InitClient
CREATE TABLE Stock
(
Material varchar(300),
PostingDate varchar(300),
Stock varchar(300),
Client varchar(300)
);
INSERT INTO Stock (Material,PostingDate,Stock,Client)
VALUES
('322', '1/1/2021', '5', '1'),
('101', '2/1/2021', '5', '2'),
('322', '3/2/2021', '10', '1'),
('101', '4/13/2021', '5', '1'),
('400', '5/11/2021', '170', '2'),
('401', '6/20/2021', '200', '1'),
('322', '7/20/2021', '160', '2'),
('400', '8/9/2021', '93', '2');
SELECT * FROM Stock
Desired result, but then with a substitute for the OR statement to ramp up the performance:
| Material | PostingDate | Stock | Client |
|----------|-------------|-------|--------|
| 322 | 1/1/2021 | 5 | 1 |
| 101 | 2/1/2021 | 5 | 2 |
| 322 | 3/2/2021 | 10 | 1 |
| 101 | 4/13/2021 | 5 | 1 |
| 400 | 5/11/2021 | 170 | 2 |
| 401 | 6/20/2021 | 200 | 1 |
| 322 | 7/20/2021 | 160 | 2 |
| 400 | 8/9/2021 | 93 | 2 |
Any suggestions on whether there is a possible substitute in the above code that keeps performance while making it dynamic?
You can optimize this query quite a bit.
Firstly, those two LEFT JOINs are basically just semi-joins, because you don't actually return any results from them. So we can turn them into a single EXISTS.
You will also get an implicit conversion to int, because Client is varchar and 1 and 2 are int literals. So change those to '1' and '2', or change the column type.
PostingDate is also varchar; that should really be date.
SELECT
s.Material
,s.Stock
,s.PostingDate
,s.Client
FROM Stock s
WHERE s.Client IN ('1','2')
AND EXISTS (SELECT 1
FROM InitClient c
WHERE s.PostingDate > c.InitDate
AND c.Client = s.Client
);
Next you want to look at indexing. For this query (not accounting for any other queries being run), you probably want the following indexes (remove the INCLUDE for a clustered index)
InitClient (Client, InitDate)
Stock (Client) INCLUDE (PostingDate, Material, Stock)
It is possible that even with these indexes you still get a scan on Stock, because IN functions like an OR. This does not always happen, so it's worth checking. If it does, you can rewrite the query to use UNION ALL instead:
SELECT
s.Material
,s.Stock
,s.PostingDate
,s.Client
FROM (
SELECT *
FROM Stock s
WHERE s.Client = '1'
UNION ALL
SELECT *
FROM Stock s
WHERE s.Client = '2'
) s
WHERE EXISTS (SELECT 1
FROM InitClient c
WHERE s.PostingDate > c.InitDate
AND c.Client = s.Client
);
db<>fiddle
There is nothing wrong in expecting your query to be dynamic. However, in order to make it more performant, you may need to reach a compromise between two conflicting expectations. I will present here a few ways to optimize your query. Some of them involve drastic changes, but eventually it is you or your client who decides how this needs to be improved. Also, some of the improvements might be ineffective, so do not take anything for granted; test everything. Without further ado, let's see the suggestions.
The query
First I would try to change the query a little, maybe something like this could help you
SELECT
Material
,Stock
,Stock.PostingDate
,C1.InitDate
,C2.InitDate
,Stock.Client
FROM Stock
LEFT JOIN InitClient C1 ON C1.Client = '1'
LEFT JOIN InitClient C2 ON C2.Client = '2'
WHERE
(
(Stock.Client = 1 AND Stock.PostingDate > C1.InitDate) OR
(Stock.Client = 2 AND Stock.PostingDate > C2.InitDate)
)
Sometimes a simple step of getting rid of subselects does the trick
The indexes
You may want to speed up your process by creating indexes, for example on Stock.PostingDate.
Helper table
You can create a helper table where you store the Stock records' relevant data, so you perform the slow query ONCE in a while, maybe once in a week, or each time a new client enters the stage and store the results in the helper table. Once the prerequisite calculation is done, you will be able to query only the helper table with its few records, reaching lightning fast behavior. So, the idea is to execute the slow query rarely, cache/store the results and reuse them instead of calculating it every time.
A new column
You could create a column in your Stock table named InitDate and fill that with data for each record periodically. It will take a long while at the first execution, but then you will be able to query only the Stock table without joins and subselects.

Sql query for multiple names in a single search engine

Can anyone help me to find out the SQL query for following scenario.
I have a search box in which I want to search for multiple names separated by spaces.
For example, "David Jones" should give me both David's details and Jones's details.
select
emp.cid as empcid,
emp.name,
emp.employeeno,
info.employeeUniqueId,
info.agentId,
info.empBankCode,
info.accountNumber,
info.ibanAccNo
from tblemployee emp,
fk_tblUserEmployeeList f,
empinfo info
where
info.employee = emp.cid
and emp.cid = f.employeeid
and f.userId = 1
and
(
name like '%david%'
or emp.employeeno like '%david%'
or info.employeeUniqueId like '%david%'
or info.agentId like '%david%'
or info.empBankCode like '%david%'
or info.accountNumber like '%david%'
)
I want to include Jones in the search box as well. How should the LIKE conditions change?
This seems like a case for full-text search. After setting up full-text indices on your tblemployee, fk_tblUserEmployeeList, and empinfo tables, your query would look something like this:
SELECT
emp.cid AS empcid,
emp.name,
emp.employeeno,
info.employeeUniqueID,
info.agentID,
info.empBankCode,
info.accountNumber,
info.ibanAccNo
FROM dbo.tblemployee emp
INNER JOIN dbo.fk_tblUserEmployeeList f ON
f.employeeid = emp.cid
INNER JOIN dbo.empinfo info ON
info.employee = emp.cid
WHERE
f.userID = 1
AND
( FREETEXT(Emp.*, 'david jones')
OR FREETEXT(info.*, 'david jones')
)
gives you this data:
+--------+-------+------------+------------------+---------+-------------+---------------+-----------+
| empcid | name | employeeno | employeeUniqueID | agentID | empBankCode | accountNumber | ibanAccNo |
+--------+-------+------------+------------------+---------+-------------+---------------+-----------+
| 1 | David | NULL | david | david | david | david | david |
| 2 | Jones | NULL | jones | jones | jones | jones | jones |
+--------+-------+------------+------------------+---------+-------------+---------------+-----------+
Note that I changed your query to use the modern industry-standard join style.
Keep in mind that, to create a full-text index on a table, the table must have a single-column unique index. If one of your tables has a multi-column primary key, you'll have to add a column (see this question for more information).
A couple of notes about your naming conventions:
There's no need to preface table names with tbl (especially since you're not doing so consistently). There are loads of people telling you not to do this: See this answer as an example.
fk_tblUserEmployeeList is a bad table name: The prefixes fk and tbl don't add any information. What kind of information is stored in this table? I would suggest a more descriptive name (with no prefixes).
Now, if you don't want to go the route of using a full-text index, you can parse the input client-side before sending to SQL Server. You can split the search input on a space, and then construct the SQL accordingly.
declare @SearchString varchar(200) = 'David Jones', @Word varchar(100)
declare @Words table (Word varchar(100))
-- Parse the SearchString to extract all words
while len(@SearchString) > 0 begin
    if charindex(' ', @SearchString) > 0 begin
        -- Take the text before the first space, then remove it (and the space) from the front.
        -- STUFF is used instead of REPLACE so that other words containing @Word are left intact.
        select @Word = rtrim(ltrim(left(@SearchString, charindex(' ', @SearchString) - 1)))
        select @SearchString = ltrim(stuff(@SearchString, 1, charindex(' ', @SearchString), ''))
    end
    else begin
        select @Word = @SearchString,
               @SearchString = ''
    end
    if @Word != ''
        insert into @Words select @Word
end
-- Return Results
select t.*
from MyTable t
join #Words w on
' ' + t.MyColumn + ' ' like '%[^a-z]' + w.Word + '[^a-z]%'
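If you do the splitting in the application instead, the idea might look like the following Python sketch. It splits the input on whitespace and builds one parameterized LIKE group per word. The function name `build_search_sql` and the column names passed to it are hypothetical, `MyTable` is the placeholder table from the query above, and the `?` placeholders assume a driver that uses question-mark parameters (pyodbc does). Column names must come from your own code, never from user input, and wildcard characters typed by the user are not escaped here.

```python
def build_search_sql(search, columns):
    """Return (sql, params) with one LIKE group per search word."""
    words = search.split()
    if not words or not columns:
        raise ValueError("need at least one word and one column")
    # One OR-group per word: the word may appear in any of the columns.
    per_word = "(" + " OR ".join(f"{col} LIKE ?" for col in columns) + ")"
    # A row qualifies if it matches any of the words.
    where = " OR ".join([per_word] * len(words))
    # Parameters follow the placeholder order: word by word, column by column.
    params = [f"%{word}%" for word in words for _ in columns]
    return f"SELECT t.* FROM MyTable t WHERE {where}", params


sql, params = build_search_sql("David Jones", ["t.name", "t.employeeno"])
print(sql)
print(params)
```

The search words travel as parameters rather than being concatenated into the SQL text, which avoids SQL injection through the search box.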

SQLServer JSON column data to a temporary table

I am using SQL Server 2016. The column in question contains JSON. It always stores data in below format;
{"question1":"123","question2":"123","reference-id":"Z6SIPLGKE56"}
So, multiple rows will have same structure with different values.
Is there a way I can retrieve it back as a table, or put it into a temporary table? The final output would look like this:
question1 | question2 | reference-id|....
123 | 123 | Z6SIPLGKE56
456 | 456 | Z6SWFLGKE56
The end result I am looking at is export the results to a CSV. I can do this outside of the SQL Server, but was wondering whether it's possible with built-in features of SQL Server(With current searches I did, seems like the available functions such as openjson etc.. doesn't allow you to do this in one pass).
UPDATE 1 - Since more details were requested by commenters
This is a survey application. So, users can design their own surveys. The structure is stored as json. As a start let's assume each survey has same set of questions. (ex:- Survey 1 has 5 questions where as Survey 2 has 10 questions)
Now, let's say two users fill the survey 1. Sample data if visualized in json is as follows:
from user 1:
{"forms-survey-client-reference-id":"RYRT4ZU1ZO","question1":"ans1","question2":"ans2"....}
from user 2
{"forms-survey-client-reference-id":"RYRT4ZU1FE","question1":"asdf","question2":"dfhdsf"....}
So the CSV output for this survey has to be: (ignore the column order)
question1 | question2 | reference-id|....
asdf | dfhdsf | RYRT4ZU1FE
ans1 | ans2 | RYRT4ZU1ZO
Now consider survey 2 has the following structure after submitting data from:
User 1
{"forms-survey-client-reference-id":"RYRT4ZU1ZO","question1":"ans1","question2":"opt1,opt2,opt3"....}
User 2
{"forms-survey-client-reference-id":"RYRT4ABCZO","question1":"ans1","question2":"opt1,opt2"....}
Notice that for question 2, users have selected multiple answers (checkboxes), and they are stored as a plain comma-separated string (User 1 has selected 3 items and User 2 has selected 2 items).
The CSV output for above should be:
question1 | question2 | reference-id|....
ans1 | opt1,opt2,opt3 | RYRT4ZU1ZO
ans1 | opt1,opt2 | RYRT4ABCZO
Assuming that this is your JSON structure you can use the following
DECLARE @json NVARCHAR(4000) = '{"question1":"123","question2":"123","reference-id":"Z6SIPLGKE56"}'
SELECT *
FROM
(
    SELECT [key] JsonKey, value JsonValue
    FROM OPENJSON (@json)
) X
PIVOT
(
    MAX(JsonValue) FOR JsonKey IN ([question1], [question2], [reference-id])
) P
If the structure is not going to be similar you'll need to create dynamic pivot
you can also do this:
DECLARE @json NVARCHAR(4000) = '{"question1":"123","question2":"123","reference-id":"Z6SIPLGKE56"}'
SELECT *
FROM OPENJSON (@json)
WITH ([question1] INT '$."question1"',
      [question2] INT '$."question2"',
      [reference-id] varchar(100) '$."reference-id"')
One method is with OPENJSON and CROSS APPLY:
DECLARE @JsonTable TABLE(json nvarchar(MAX));
INSERT INTO @JsonTable VALUES
(N'{"question1":"123","question2":"123","reference-id":"Z6SIPLGKE56"}')
, (N'{"question1":"456","question2":"456","reference-id":"Z6SIPLGKE57"}');
SELECT
question1
, question2
, reference_id
FROM @JsonTable
CROSS APPLY OPENJSON(json)
WITH (
question1 int '$.question1'
, question2 int '$.question2'
, reference_id varchar(20) '$."reference-id"'
);
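Since the end goal is a CSV file and the question mentions that the export could also happen outside SQL Server, here is a rough Python sketch of that route. It assumes the JSON strings have already been read from the column (for example with any database driver) and that each one is a flat object; the function name `json_rows_to_csv` is hypothetical. Column names are collected from the data itself, so surveys with different question sets do not need a hard-coded column list.

```python
import csv
import io
import json


def json_rows_to_csv(json_rows):
    """Convert a list of flat JSON object strings into CSV text."""
    records = [json.loads(row) for row in json_rows]
    # Collect column names in first-seen order so rows with
    # different key sets still line up under one header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval="" leaves a cell empty when a row lacks that key.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [
    '{"forms-survey-client-reference-id":"RYRT4ZU1ZO","question1":"ans1","question2":"opt1,opt2,opt3"}',
    '{"forms-survey-client-reference-id":"RYRT4ABCZO","question1":"ans1","question2":"opt1,opt2"}',
]
print(json_rows_to_csv(rows))
```

The csv module quotes the multi-select answers because they contain commas, so each one stays in a single column when the file is opened.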

Split Single Column into multiple and Load it to a Table or a View

I'm using SQL Server 2008. I have a source table with a few columns (A, B) containing string data to split into a multiple columns. I do have function that does the split already written.
The data from the Source table (the source table format cannot be modified) is used in a View being created. But I need to have my View have already split data for Column A and B from the Source table. So, my view will have extra columns that are not in the Source table.
Then the View populated with the Source table is used to Merge with the Other Table.
There are two questions here:
Can I split column A and B from the Source table when creating a View, but do not change the Source Table?
How to use my existing User Defined Function in the View "Select" statement to accomplish this task?
Idea in short:
String to split is also shown in the example in the commented out section. Pretty much have Destination table, vStandardizedData View, SP that uses the View data to Merge to tblStandardizedData table. So, in my Source column I have column A and B that I need to split before loading to tblStandardizedData table.
There are five objects that I'm working on:
Source File
Destination Table
vStandardizedData View
tblStandardizedData table
Stored procedure that does the merge (Update and Insert) from the vStandardizedData View.
Note: all 5 objects are listed in the order they are supposed to be created and loaded.
Separately from this there is an existing UDFunction that can split the string which I was told to use
Example of the string in column A (column B has the same format data) to be split:
6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park
Desired result:
User-defined function returns a table variable:
CREATE FUNCTION [Schema].[udfStringDelimeterfromTable]
(
@sInputList VARCHAR(MAX) -- List of delimited items
, @Delimiter CHAR(1) = ',' -- delimiter that separates items
)
RETURNS @List TABLE (Item VARCHAR(MAX)) WITH SCHEMABINDING
/*
* Returns a table of strings that have been split by a delimiter.
* Similar to the Visual Basic (or VBA) SPLIT function. The
* strings are trimmed before being returned. Null items are not
* returned so if there are multiple separators between items,
* only the non-null items are returned.
* Space is not a valid delimiter.
*
* Example:
SELECT * FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456, efh,,hi', ',')
*
* Test:
DECLARE @Count INT, @Delim CHAR(10), @Input VARCHAR(128)
SELECT @Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable]('abcd,123, 456', ',')
PRINT 'TEST 1 3 lines:' + CASE WHEN @Count=3
THEN 'Worked' ELSE 'ERROR' END
SELECT @DELIM=CHAR(10)
, @INPUT = 'Line 1' + @delim + 'line 2' + @Delim
SELECT @Count = Count(*)
FROM [Schema].[udfStringDelimeterfromTable](@Input, @Delim)
PRINT 'TEST 2 LF :' + CASE WHEN @Count=2
THEN 'Worked' ELSE 'ERROR' END
What I'd ask you, is to read this: How to create a Minimal, Complete, and Verifiable example.
In general: if you use your UDF, you'll get the data back as rows. It would be best if your UDF returned each item together with a running number. Otherwise you'll first need to use ROW_NUMBER() OVER(...) to create a part number, so that you can build your target column names via string concatenation. Then use PIVOT to get the columns side by side.
An easier approach could be a string split via XML like in this answer
A quick proof of concept to show the principles:
DECLARE @tbl TABLE(ID INT,YourValues VARCHAR(100));
INSERT INTO @tbl VALUES
(1,'6667 Mission Street, 4567 7rd Street, 65 Sully Pond Park')
,(2,'Other addr1, one more addr, and another one, and even one more');
WITH Casted AS
(
    SELECT *
    ,CAST('<x>' + REPLACE(YourValues,',','</x><x>') + '</x>' AS XML) AS AsXml
    FROM @tbl
)
SELECT *
,LTRIM(RTRIM(AsXml.value('/x[1]','nvarchar(max)'))) AS Address1
,LTRIM(RTRIM(AsXml.value('/x[2]','nvarchar(max)'))) AS Address2
,LTRIM(RTRIM(AsXml.value('/x[3]','nvarchar(max)'))) AS Address3
,LTRIM(RTRIM(AsXml.value('/x[4]','nvarchar(max)'))) AS Address4
,LTRIM(RTRIM(AsXml.value('/x[5]','nvarchar(max)'))) AS Address5
FROM Casted
If your values might include forbidden characters (especially <,> and &) you can find an approach to deal with this in the linked answer.
The result
+----+---------------------+-----------------+--------------------+-------------------+----------+
| ID | Address1 | Address2 | Address3 | Address4 | Address5 |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 1 | 6667 Mission Street | 4567 7rd Street | 65 Sully Pond Park | NULL | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+
| 2 | Other addr1 | one more addr | and another one | and even one more | NULL |
+----+---------------------+-----------------+--------------------+-------------------+----------+

Join tables by column names, convert string to column name

I have a table which stores one row per survey.
Each survey has about 70 questions, and each column represents one question.
SurveyID Q1, Q2 Q3 .....
1 Yes Good Bad ......
I want to pivot this so it reads
SurveyID Question Answer
1 Q1 Yes
1 Q2 Good
1 Q3 Bad
... ... .....
I use CROSS APPLY to achieve this:
SELECT t.[SurveyID]
, x.question
, x.Answer
FROM tbl t
CROSS APPLY
(
select 1 as QuestionNumber, 'Q1' as Question , t.Q1 As Answer union all
select 2 as QuestionNumber, 'Q2' as Question , t.Q2 As Answer union all
select 3 as QuestionNumber, 'Q3' as Question , t.Q3 As Answer) x
This works, but I don't want to do this 70 times, so I have this select statement:
select ORDINAL_POSITION
, COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'mytable'
This gives me the list of columns and the position of each column in the table.
So I hope I can somehow join the 2nd statement with the 1st statement by column name. However, I would be comparing the content of a column with a column header here. Is it doable? Is there another way of achieving this?
Hope you can guide me please?
Thank you
Instead of Cross Apply you should use UNPIVOT for this query....
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE Test_Table(SurveyID INT, Q1 VARCHAR(10)
, Q2 VARCHAR(10), Q3 VARCHAR(10), Q4 VARCHAR(10))
INSERT INTO Test_Table VALUES
(1 , 'Yes', 'Good' , 'Bad', 'Bad')
,(2 , 'Bad', 'Bad' , 'Yes' , 'Good')
Query 1:
SELECT SurveyID
,Questions
,Answers
FROM Test_Table t
UNPIVOT ( Answers FOR Questions IN (Q1,Q2,Q3,Q4))up
Results:
| SurveyID | Questions | Answers |
|----------|-----------|---------|
| 1 | Q1 | Yes |
| 1 | Q2 | Good |
| 1 | Q3 | Bad |
| 1 | Q4 | Bad |
| 2 | Q1 | Bad |
| 2 | Q2 | Bad |
| 2 | Q3 | Yes |
| 2 | Q4 | Good |
If you need to perform this kind of operation to lots of similar tables that have differing numbers of columns, an UNPIVOT approach alone can be tiresome because you have to manually change the list of columns (Q1,Q2,Q3,etc) each time.
The CROSS APPLY based query in the question also suffers from similar drawbacks.
The solution to this, as you've guessed, involves using meta-information maintained by the server to tell you the list of columns you need to operate on. However, rather than requiring some kind of join as you suspect, what is needed is Dynamic SQL, that is, a SQL query that creates another SQL query on-the-fly.
This is done essentially by concatenating string (varchar) information in the SELECT part of the query, including values from columns which are available in your FROM (and join) clauses.
With Dynamic SQL (DSQL) approaches, you often use system metatables as your starting point. INFORMATION_SCHEMA exists in some SQL Server versions, but you're better off using the Object Catalog Views for this.
A prototype DSQL solution to generate the code for your CROSS APPLY approach would look something like this:
-- Create a variable to hold the created SQL code
-- First, add the static code at the start:
declare @SQL varchar(max) =
' SELECT t.[SurveyID]
, x.question
, x.Answer
FROM tbl t
CROSS APPLY
(
'
-- This syntax will add to the variable for every row in the query results; it's a little like looping over all the rows.
select @SQL +=
'select ' + cast(C.column_id as varchar)
+ ' as QuestionNumber, ''' + C.name
+ ''' as Question , t.' + C.name
+ ' As Answer union all
'
from sys.columns C
inner join sys.tables T on C.object_id=T.object_id
where T.name = 'MySurveyTable'
and C.name <> 'SurveyID' -- the key column is not a question
-- Remove final "union all", add closing bracket and alias
set @SQL = left(@SQL,len(@SQL)-10) + ') x'
print @SQL
-- To also execute (run) the dynamically-generated SQL
-- and get your desired row-based output all at the same time,
-- use the EXECUTE keyword (EXEC for short).
-- The parentheses are required when executing a string variable.
exec (@SQL)
A similar approach could be used to dynamically write SQL for the UNPIVOT approach.
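As a rough illustration of that last point, the sketch below builds the UNPIVOT statement from a list of column names. It is written in Python only to show the string-building step; in T-SQL the list would come from sys.columns as in the prototype above. The function name `build_unpivot_sql` is hypothetical, and the table and column names are taken from the Test_Table example earlier in this thread.

```python
def build_unpivot_sql(table, id_column, question_columns):
    """Build an UNPIVOT statement listing every question column."""
    if not question_columns:
        raise ValueError("need at least one question column")

    def quote(name):
        # Bracket-quote identifiers, doubling any closing bracket in the name.
        return "[" + name.replace("]", "]]") + "]"

    column_list = ", ".join(quote(c) for c in question_columns)
    return (
        f"SELECT {quote(id_column)}, Questions, Answers\n"
        f"FROM {quote(table)} t\n"
        f"UNPIVOT (Answers FOR Questions IN ({column_list})) up"
    )


print(build_unpivot_sql("Test_Table", "SurveyID", ["Q1", "Q2", "Q3", "Q4"]))
```

Joining the column names with a separator avoids the "remove the trailing union all" step that the concatenation approach needs.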
