SQL Server Full Text Search - Weighting Certain Columns Over Others

SQL Server Full Text Search - Weighting Certain Columns Over Others - sql-server

If I have the following full text search query:
SELECT *
FROM dbo.Product
INNER JOIN CONTAINSTABLE(Product, (Name, Description, ProductType), 'model') ct
ON ct.[Key] = Product.ProductID
Is it possible to weigh the columns that are being searched?
For example, I care more about the word model appearing in the Name column than I do the
Description or ProductType columns.
Of course if the word is in all 3 columns then I would expect it to rank higher than if it was just in the name column. Is there any way to have a row rank higher if it just appears in Name vs just in Description/ProductType?

You can do something like the following query. Here, WeightedRank is computed by multiplying the rank of the individual matches. NOTE: unfortunately I don't have Northwind installed so I couldn't test this, so look at it more like pseudocode and let me know if it doesn't work.
declare #searchTerm varchar(50) = 'model';
SELECT 100 * coalesce(ct1.RANK, 0) +
10 * coalesce(ct2.RANK, 0) +
1 * coalesce(ct3.RANK, 0) as WeightedRank,
*
FROM dbo.Product
LEFT JOIN
CONTAINSTABLE(Product, Name, #searchTerm) ct1 ON ct1.[Key] = Product.ProductID
LEFT JOIN
CONTAINSTABLE(Product, Description, #searchTerm) ct2 ON ct2.[Key] = Product.ProductID
LEFT JOIN
CONTAINSTABLE(Product, ProductType, #searchTerm) ct3 ON ct3.[Key] = Product.ProductID
order by WeightedRank desc

Listing 3-25. Sample Column Rank-Multiplier Search of Pro Full-Text Search in SQL Server 2008
SELECT *
FROM (
SELECT Commentary_ID
,SUM([Rank]) AS Rank
FROM (
SELECT bc.Commentary_ID
,c.[RANK] * 10 AS [Rank]
FROM FREETEXTTABLE(dbo.Contributor_Birth_Place, *, N'England') c
INNER JOIN dbo.Contributor_Book cb ON c.[KEY] = cb.Contributor_ID
INNER JOIN dbo.Book_Commentary bc ON cb.Book_ID = bc.Book_ID
UNION ALL
SELECT c.[KEY]
,c.[RANK] * 5
FROM FREETEXTTABLE(dbo.Commentary, Commentary, N'England') c
UNION ALL
SELECT ac.[KEY]
,ac.[RANK]
FROM FREETEXTTABLE(dbo.Commentary, Article_Content, N'England') ac
) s
GROUP BY Commentary_ID
) s1
INNER JOIN dbo.Commentary c1 ON c1.Commentary_ID = s1.Commentary_ID
ORDER BY [Rank] DESC;

Similar to Henry's solution but simplified, tested and using the details the question provided.
NB: I ran performance tests on both the union and left join styles and found the below to require far less logical reads on the union style below with my datasets YMMV.
declare #searchTerm varchar(50) = 'model';
declare #nameWeight int = 100;
declare #descriptionWeight int = 10;
declare #productTypeWeight int = 1;
SELECT ranksGroupedByProductID.*, outerProduct.*
FROM (SELECT [key],
Sum([rank]) AS WeightedRank
FROM (
-- Each column that needs to be weighted separately
-- should be added here and unioned with the other queries
SELECT [key],
[rank] * #nameWeight as [rank]
FROM Containstable(dbo.Product, [Name], #searchTerm)
UNION ALL
SELECT [key],
[rank] * #descriptionWeight as [rank]
FROM Containstable(dbo.Product, [Description], #searchTerm)
UNION ALL
SELECT [key],
[rank] * #productTypeWeight as [rank]
FROM Containstable(dbo.Product, [ProductType], #searchTerm)
) innerSearch
-- Grouping by key allows us to sum each ProductID's ranks for all the columns
GROUP BY [key]) ranksGroupedByProductID
-- This join is just to get the full Product table columns
-- and is optional if you only need the ordered ProductIDs
INNER JOIN dbo.Product outerProduct
ON outerProduct.ProductID = ranksGroupedByProductID.[key]
ORDER BY WeightedRank DESC;

Related

Return only rows from the joined table with the latest date

By running the following query I realized that I have duplicates on the column QueryExecutionId.
SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
wfi.workflowdefinitionid AS FlowId,
qe.publishing_date AS [Date],
c.typename AS [Type],
c.name As Name
INTO #Send
FROM
[QueryExecutions] qe
JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN [WorkflowInstances] wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;
E.g. When I test with one of these QueryExecutionIds, I can two results
select * from ##Send
where QueryExecutionId = 169237
We realized the reason is that these two rows have a different FlowId (second returned value in the first query). After discussing this issue, we decided to take the record with a FlowId that has the latest date. This date is a column called lastexecutiontime that sits in the third joined table [WorkflowInstances] which is also the table where FlowId comes from.
How do I only get unique values of QueryExecutionId with the latest value of WorkflowInstances.lastexecution time and remove the duplicates?

You can use a derived table with first_value partitioned by workflowinstanceid ordered by lastexecutiontime desc:
SELECT DISTINCT qe.QueryExecutionid AS QueryExecutionId,
wfi.FlowId,
qe.publishing_date AS [Date],
c.typename AS [Type],
c.name As Name
INTO #Send
FROM
[QueryExecutions] qe
JOIN [Campaign] c ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN
(
SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
FROM [WorkflowInstances]
) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
WHERE qe.[customer_idhash] IS NOT NULL;

Please note that your distinct query is pertaining to the selected variables,
eg. Data 1 (QueryExecutionId = 169237 and typename = test 1)
    Data 2 (QueryExecutionId = 169237 and typename = test 2)
The above 2 data are considered as distinct
Try partition by and selection the [seq] = 1 (the below code are partition by their date)
SELECT *
into #Send
FROM
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY [QueryExecutionid] ORDER BY [Date] DESC) [Seq]
FROM
(
SELECT qe.QueryExecutionid AS QueryExecutionId,
wfi.FlowId,
qe.publishing_date AS [Date], --should not have any null values
qe.[customer_idhash]
c.typename AS [Type],
c.name As Name
FROM [QueryExecutions] qe
JOIN [Campaign] c
ON qe.target_campaign_id = c.campaignid
LEFT JOIN [WorkflowInstanceCampaignActivities] wfica
ON wfica.queryexecutionresultid = qe.executionresultid
LEFT JOIN
(
SELECT DISTINCT workflowinstanceid, FIRST_VALUE(workflowdefinitionid) OVER(PARTITION BY workflowinstanceid ORDER BY lastexecutiontime DESC) As FlowId
FROM [WorkflowInstances]
) wfi ON wfica.workflowinstanceid = wfi.workflowinstanceid
) a
WHERE [customer_idhash] IS NOT NULL
) b
WHERE [Seq] = 1
ORDER BY [QueryExecutionid]

SQL SERVER - Left Join does not work. Because? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
Improve this question
This works perfectly in MS Access.
Why not in MS SQL Server?
Can you help me solve it?
Here's how my query is
select *
from tblPROestoque
where idproduto = 8183
order by identrada desc
select top(1) *
from tblPROestoque
where idproduto = 8183
order by identrada desc
select *
from tblPROproduto pr
left join (select top(1) idproduto, valcusto
from tblproestoque
order by identrada desc) tmp on tmp.idproduto = pr.idproduto
where pr.idproduto = 8183

I suspect the problem here is your understanding of how a subquery works. We have your query:
SELECT *
FROM tblPROproduto pr
LEFT JOIN (SELECT TOP (1)
idproduto,
valcusto
FROM tblproestoque
ORDER BY identrada DESC) tmp ON tmp.idproduto = pr.idproduto
WHERE pr.idproduto = 8183;
We can separate this into 2 different parts:
SELECT TOP (1)
idproduto,
valcusto
FROM tblproestoque
ORDER BY identrada DESC;
and then:
SELECT *
FROM tblPROproduto pr
LEFT JOIN tmp ON tmp.idproduto = pr.idproduto
WHERE pr.idproduto = 8183;
This might explain to you why what you have isn't working. I'm guessing you are assuming that the ON clause on tmp is derived BEFORE the SELECT of the subquery. This isn't the case. The subquery will be derived, and then the ON. Thus the value of tmp will be whatever is rerturned in the query above.
I suspect what you want is:
SELECT *
FROM tblPROproduto pr
OUTER APPLY (SELECT TOP (1)
ca.idproduto,
ca.valcusto
FROM tblproestoque ca
WHERE ca.idproduto = pr.idproduto
ORDER BY ca.identrada DESC) tmp
WHERE pr.idproduto = 8183;
Edit: Added some sample data and explanations for the OP, to help their udnerstanding:
USE Sandbox;
GO
CREATE TABLE Product (ID int IDENTITY(1,1),
Sku varchar(10),
ProductName varchar(25));
CREATE TABLE ProductOrder (ID int IDENTITY(1,1),
ProductID int,
OrderDate date,
NumberOrdered int);
INSERT INTO dbo.Product (Sku,
ProductName)
VALUES ('65432462','Lawn Mower'),
('98742347','Helicopter'),
('89465735','BBQ');
INSERT INTO dbo.ProductOrder (ProductID,
OrderDate,
NumberOrdered)
VALUES (1,'20180101',7),
(1,'20180708',19),
(2,'20180501',12),
(3,'20180804',27);
GO
SELECT *
FROM dbo.Product;
SELECT *
FROM dbo.ProductOrder;
GO
--Use the example the OP has in their post:
SELECT *
FROM dbo.Product P
LEFT JOIN (SELECT TOP 1 *
FROM dbo.ProductOrder
ORDER BY OrderDate DESC) PO ON PO.ProductID = P.ID
WHERE P.ID = 2;
--This returns NULLs for all the latter columns.
--Why?
--Inspect the subquery:
SELECT TOP 1 *
FROM dbo.ProductOrder
ORDER BY OrderDate DESC;
--Product ID 3? 3 != 2 so the ON clause fails:
--Demonstrate
SELECT *
FROM dbo.Product P
CROSS JOIN (SELECT TOP 1 * --CROSS JOIN joins all rows (creates a cartesian product)
FROM dbo.ProductOrder
ORDER BY OrderDate DESC) PO
WHERE P.ID = 2;
--The solution, use OUTER APPLY:
SELECT *
FROM dbo.Product P
OUTER APPLY (SELECT TOP 1 *
FROM dbo.ProductOrder oa
WHERE oa.ProductID = P.ID --WHERE clause, this is like your ON
ORDER BY oa.OrderDate DESC) PO
WHERE P.ID = 2;
GO
DROP TABLE dbo.ProductOrder;
DROP TABLE dbo.Product;

#RafaelBueno, you basically need an outer apply. This is different in MSAcess.
See modified below:
Let me know if it works.
select *
from tblPROproduto pr
outer apply (select top(1) idproduto, valcusto
from tblproestoque tmp
where tmp.idproduto = pr.idproduto
order by identrada desc) tmp
where pr.idproduto = 8183

SQL Server 2014 Consolidate Tables avoiding duplicates

I have 36 Sales tables each referred to one store:
st1.dbo.Sales
st2.dbo.Sales
...
st35.dbo.Sales
st36.dbo.Sales
Each record has the following key columns:
UserName, PostalCode, Location, Country, InvoiceAmount, ItemsCount, StoreID
Here is SQLFiddle
I need to copy into Customers table all Username (and their details) that are not already present into Customers
in case of duplicated it is required to use the fields of record where InvoiceAmount is MAX
I tried to build a query but looks too complicated and it is also wrong because in CROSS APPLY should consider the full list of Sales Tables
INSERT INTO Customers (.....)
SELECT distinct
d.UserName,
w.postalCode,
w.location,
W.country,
max(w.invoiceamount) invoiceamount,
max(w.itemscount) itemscount,
w.storeID
FROM
(SELECT * FROM st1.dbo.Sales
UNION
SELECT * FROM st2.dbo.Sales
UNION
...
SELECT * FROM st36.dbo.Sales) d
LEFT JOIN
G.dbo.Customers s ON d.Username = s.UserName
CROSS APPLY
(SELECT TOP (1) *
FROM s.dbo.[Sales]
WHERE d.Username=w.Username
ORDER BY InvoiceAmount DESC) w
WHERE
s.UserName IS NULL
AND d.username IS NOT NULL
GROUP BY
d.UserName, w.postalCode, w.location,
w.country, w.storeID
Can somebody please give some hints?

As a basic SQL query, I'd create a row_number in the inner subquery and then join to customers and then isolated the max invoice number for each customer not in the customer table.
INSERT INTO Customers (.....)
SELECT w.UserName,
w.postalCode,
w.location,
w.country,
w.invoiceamount,
w.itemscount,
w.storeID
FROM (select d.*,
row_number() over(partition by d.Username order by d.invoiceamount desc) rownumber
from (SELECT *
FROM st1.dbo.Sales
UNION
SELECT *
FROM st2.dbo.Sales
UNION
...
SELECT *
FROM st36.dbo.Sales
) d
LEFT JOIN G.dbo.Customers s
ON d.Username = s.UserName
WHERE s.UserName IS NULL
AND d.username IS NOT NULL
) w
where w.rownumber = 1

Using your fiddle this will select distinct usernames rows with max invoiceamount
with d as(
SELECT * FROM Sales
UNION
SELECT * FROM Sales2
)
select *
from ( select *,
rn = row_number() over(partition by Username order by invoiceamount desc)
from d) dd
where rn=1;

step 1 - use cte .
select username , invoiceamount ,itemscount from Sales
UNION all
select user name , invoiceamount ,itemscount from Sales
.....
...
step 2
next cte use group by and get max invoiceamount ,itemscount for user of last result set.
,cte2 as (
select user name , max (invoiceamount) as invoiceamount ,max(itemscount) as itemscount from cte)
step3
use left join with user table and find missing record and itemscount invoiceamount

Getting most recent date from multiple SQL columns

The suggested answer, in this post, works great for two columns.
I have about 50 different date columns, where I need to be able to report on the most recent interaction, regardless of table.
In this case, I am bringing the columns in to a view, since they are coming from different tables in two different databases...
CREATE VIEW vMyView
AS
SELECT
comp_name AS Customer
, Comp_UpdatedDate AS Last_Change
, CmLi_UpdatedDate AS Last_Communication
, Case_UpdatedDate AS Last_Case
, AdLi_UpdatedDate AS Address_Change
FROM Company
LEFT JOIN Comm_Link on Comp_CompanyId = CmLi_Comm_CompanyId
LEFT JOIN Cases ON Comp_CompanyId = Case_PrimaryCompanyId
LEFT JOIN Address_Link on Comp_CompanyId = AdLi_CompanyID
...
My question is, how I would easily account for the many possibilities of one column being greater than the others?
Using only the two first columns, as per the example above, works great. But considering that one row could have column 3 as the highest value, another row could have column 14 etc...
SELECT Customer, MAX(CASE WHEN (Last_Change IS NULL OR Last_Communication> Last_Change)
THEN Last_Communication ELSE Last_Change
END) AS MaxDate
FROM vMyView
GROUP BY Customer
So, how can I easily grab the highest value for each row in any of the 50(ish) columns?
I am using SQL Server 2008 R2, but I also need this to work in versions 2012 and 2014.
Any help would be greatly appreciated.
EDIT:
I just discovered that the second database is storing the dates in NUMERIC fields, rather than DATETIME. (Stupid! I know!)
So I get the error:
The type of column "ARCUS" conflicts with the type of other columns specified in the UNPIVOT list.
I tried to resolve this with a CAST to make it DATETIME, but that only resulted in more errors.
;WITH X AS
(
SELECT Customer
,Value [Date]
,ColumnName [Entity]
,BusinessEmail
,ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Value DESC) rn
FROM (
SELECT comp_name AS Customer
, Pers_EmailAddress AS BusinessEmail
, Comp_UpdatedDate AS Company
, CmLi_UpdatedDate AS Communication
, Case_UpdatedDate AS [Case]
, AdLi_UpdatedDate AS [Address]
, PLink_UpdatedDate AS Phone
, ELink_UpdatedDate AS Email
, Pers_UpdatedDate AS Person
, oppo_updateddate as Opportunity
, samdat.dbo.ARCUS.AUDTDATE AS ARCUS
FROM vCompanyPE
LEFT JOIN Comm_Link on Comp_CompanyId = CmLi_Comm_CompanyId
LEFT JOIN Cases ON Comp_CompanyId = Case_PrimaryCompanyId
LEFT JOIN Address_Link on Comp_CompanyId = AdLi_CompanyID
LEFT JOIN PhoneLink on Comp_CompanyId = PLink_RecordID
LEFT JOIN EmailLink on Comp_CompanyId = ELink_RecordID
LEFT JOIN vPersonPE on Comp_CompanyId = Pers_CompanyId
LEFT JOIN Opportunity on Comp_CompanyId = Oppo_PrimaryCompanyId
LEFT JOIN Orders on Oppo_OpportunityId = Orde_opportunityid
LEFT JOIN SAMDAT.DBO.ARCUS on IDCUST = Comp_IdCust
COLLATE Latin1_General_CI_AS
WHERE Comp_IdCust IS NOT NULL
AND Comp_deleted IS NULL
) t
UNPIVOT (Value FOR ColumnName IN
(
Company
,Communication
,[Case]
,[Address]
,Phone
,Email
,Person
,Opportunity
,ARCUS
)
)up
)
SELECT Customer
, BusinessEmail
,[Date]
,[Entity]
FROM X
WHERE rn = 1 AND [DATE] >= DATEADD(year,-2,GETDATE()) and BusinessEmail is not null

You could use CROSS APPLY to manually pivot your fields, then use MAX()
SELECT
vMyView.*,
greatest.val
FROM
vMyView
CROSS APPLY
(
SELECT
MAX(val) AS val
FROM
(
SELECT vMyView.field01 AS val
UNION ALL SELECT vMyView.field02 AS val
...
UNION ALL SELECT vMyView.field50 AS val
)
AS manual_pivot
)
AS greatest
The inner most query will pivot each field in to a new row, then the MAX() re-aggregate them back in to a single row. (Also skipping NULLs, so you don't need to explicitly cater for them.)

;WITH X AS
(
SELECT Customer
,Value [Date]
,ColumnName [CommunicationType]
,ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Value DESC) rn
FROM (
SELECT comp_name AS Customer
, Comp_UpdatedDate AS Last_Change
, CmLi_UpdatedDate AS Last_Communication
, Case_UpdatedDate AS Last_Case
, AdLi_UpdatedDate AS Address_Change
FROM Company
LEFT JOIN Comm_Link on Comp_CompanyId = CmLi_Comm_CompanyId
LEFT JOIN Cases ON Comp_CompanyId = Case_PrimaryCompanyId
LEFT JOIN Address_Link on Comp_CompanyId = AdLi_CompanyID
) t
UNPIVOT (Value FOR ColumnName IN (Last_Change,Last_Communication,
Last_Case,Address_Change))up
)
SELECT Customer
,[Date]
,[CommunicationType]
FROM X
WHERE rn = 1

select top 1 with a group by

I have two columns:
namecode name
050125 chris
050125 tof
050125 tof
050130 chris
050131 tof
I want to group by namecode, and return only the name with the most number of occurrences. In this instance, the result would be
050125 tof
050130 chris
050131 tof
This is with SQL Server 2000

I usually use ROW_NUMBER() to achieve this. Not sure how it performs against various data sets, but we haven't had any performance issues as a result of using ROW_NUMBER.
The PARTITION BY clause specifies which value to "group" the row numbers by, and the ORDER BY clause specifies how the records within each "group" should be sorted. So partition the data set by NameCode, and get all records with a Row Number of 1 (that is, the first record in each partition, ordered by the ORDER BY clause).
SELECT
i.NameCode,
i.Name
FROM
(
SELECT
RowNumber = ROW_NUMBER() OVER (PARTITION BY t.NameCode ORDER BY t.Name),
t.NameCode,
t.Name
FROM
MyTable t
) i
WHERE
i.RowNumber = 1;

select distinct namecode
, (
select top 1 name from
(
select namecode, name, count(*)
from myTable i
where i.namecode = o.namecode
group by namecode, name
order by count(*) desc
) x
) as name
from myTable o

SELECT max_table.namecode, count_table2.name
FROM
(SELECT namecode, MAX(count_name) AS max_count
FROM
(SELECT namecode, name, COUNT(name) AS count_name
FROM mytable
GROUP BY namecode, name) AS count_table1
GROUP BY namecode) AS max_table
INNER JOIN
(SELECT namecode, COUNT(name) AS count_name, name
FROM mytable
GROUP BY namecode, name) count_table2
ON max_table.namecode = count_table2.namecode AND
count_table2.count_name = max_table.max_count

I did not try but this should work,
select top 1 t2.* from (
select namecode, count(*) count from temp
group by namecode) t1 join temp t2 on t1.namecode = t2.namecode
order by t1.count desc

Here are to examples that you could use but the temp table use is more efficient than the view, but was done on a small data sample. You would want to check your own statistics.
--Creating A View
GO
CREATE VIEW StateStoreSales AS
SELECT t.state,t.stor_id,t.stor_name,SUM(s.qty) 'TotalSales'
,ROW_NUMBER() OVER (PARTITION BY t.state ORDER BY SUM(s.qty) DESC) AS 'Rank'
FROM [dbo].[sales] s
JOIN [dbo].[stores] t ON (s.stor_id = t.stor_id)
GROUP BY t.state,t.stor_id,t.stor_name
GO
SELECT * FROM StateStoreSales
WHERE Rank <= 1
ORDER BY TotalSales Desc
DROP VIEW StateStoreSales
---Using a Temp Table
SELECT t.state,t.stor_id,t.stor_name,SUM(s.qty) 'TotalSales'
,ROW_NUMBER() OVER (PARTITION BY t.state ORDER BY SUM(s.qty) DESC) AS 'Rank' INTO #TEMP
FROM [dbo].[sales] s
JOIN [dbo].[stores] t ON (s.stor_id = t.stor_id)
GROUP BY t.state,t.stor_id,t.stor_name
SELECT * FROM #TEMP
WHERE Rank <= 1
ORDER BY TotalSales Desc
DROP TABLE #TEMP

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

SQL Server Full Text Search - Weighting Certain Columns Over Others - sql-server

Related

Return only rows from the joined table with the latest date

SQL SERVER - Left Join does not work. Because? [closed]

SQL Server 2014 Consolidate Tables avoiding duplicates

Getting most recent date from multiple SQL columns

select top 1 with a group by

Categories

Resources