How can I order query results by word occurrence count - sql-server

I have a product table and want to search the Tag column, with the results sorted by how many of the search words occur in each row.
ID | Tag
---------------------------------------
1 | LG television
2 | BOSCH vacuum cleaner 55 mm
3 | SONY home theater 55 watt
---------------------------------------
String to search: LG 55 vacuum theater home
Desired results:
1. SONY home theater 55 watt (contains three words: 55,theater,home)
2. BOSCH vacuum cleaner 55 mm (contains two words: 55,vacuum)
3. LG television (contains one word: LG)
There is a solution in "Find string according to words count" that uses LIKE, but it is very slow.
I want to implement it with FULLTEXT search.
UPDATE: I tried the solution below, but the results are wrong:
SELECT ft.[Rank], p.Tag
FROM tblProducts AS p
INNER JOIN FREETEXTTABLE(tblProducts, Tag, 'LG 55 vacuum theater home') AS ft
ON ft.[Key] = p.ProductID
ORDER BY ft.[Rank] DESC;
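To pin down the ordering being asked for, here is a minimal sketch in Python. It is not a full-text solution and does not use SQL Server at all; it only counts, for each Tag, how many distinct search words it contains and sorts by that count. The function name `rank_by_word_hits` and the whitespace-based word splitting are assumptions made for illustration.

```python
def rank_by_word_hits(rows, search):
    """Return (hit_count, id, tag) tuples, most matching words first."""
    wanted = set(search.lower().split())
    scored = []
    for row_id, tag in rows:
        # Count each search word at most once per tag.
        hits = wanted & set(tag.lower().split())
        scored.append((len(hits), row_id, tag))
    # Highest hit count first; ties keep their original order (stable sort).
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Sample rows copied from the question.
products = [
    (1, "LG television"),
    (2, "BOSCH vacuum cleaner 55 mm"),
    (3, "SONY home theater 55 watt"),
]
ranked = rank_by_word_hits(products, "LG 55 vacuum theater home")
for hits, row_id, tag in ranked:
    print(hits, tag)
```

Any full-text query would need to reproduce this ordering on the sample rows to count as correct.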

Related

Recursive CTE help - how do you code this non-hierarchal sequence?

I'm trying to write a recursive CTE for a table that does not have a hierarchy, meaning that there is no NULL in the family of IDs that are related.
For example, the table looks like this:
AccountID Account_RelationshipID
--------------------------------
1 2
2 4
4 6
6 11
11 1
15 17
17 19
19 15
So 1 relates to 2. 2 relates to 4. 4 relates to 6. 6 relates to 11. And then 11 loops back to ID of 1.
Then there is a new family. 15 relates to 17. 17 relates to 19. and then 19 goes back to 15.
There is also a separate Account_Detail table that has the account date:
AccountID AccountName AccountDate
-------------------------------------
1 Dave 1/1/2012
2 Dave 1/1/2013
4 Dave 1/1/2014
6 Dave 1/1/2015
11 Dave 1/1/2016
15 Paul 7/1/2015
17 Paul 7/1/2016
19 Paul 7/1/2017
I tried writing this as my code:
WITH C AS
(
SELECT
AR.AccountID,
AR.Account_RelationshipID,
AD.AccountDate
FROM
Account_Relationship AR
INNER JOIN
Account_Detail AD ON AD.AccountID = AR.AccountID
UNION ALL
SELECT
AR2.AccountID,
AR2.Account_RelationshipID,
AD.AccountDate
FROM
Account_Relationship AR2
INNER JOIN
Account_Detail AD2 ON AD2.Account_ID = AR2.Account_ID
INNER JOIN
C ON C.AccountID = AR2.Account_relationshipID
WHERE
AD.AccountDate < AD2.AccountDate
)
Obviously this code is totally wrong. This is as far as I've gotten. This code will loop infinitely.
I was thinking I could break the loop by adding a function that states when the AccountDate of the next AccountID in the loop is less than the AccountDate of the last AccountID, to break the loop and go to the next one.
Also, how do you get it to go to the next "family" of accountIDs in the loops (Paul in this case)? All the videos and tutorials I've seen about Recursive CTEs just teach how to do it for one family - usually with a hierarchical structure that breaks at NULL as well.
Help!!
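The date-based stopping idea described above can be sketched as follows. This is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, it stores dates as ISO strings so they compare correctly as text, and the column names `FamilyID` and `Step` are invented for the example. The anchor picks every account whose predecessor in the loop has a later date (the oldest account of each family), which is how all families are started at once; the recursive part follows the relationship only while the next date is later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account_Relationship (AccountID INTEGER, Account_RelationshipID INTEGER);
CREATE TABLE Account_Detail (AccountID INTEGER, AccountName TEXT, AccountDate TEXT);
INSERT INTO Account_Relationship VALUES
    (1, 2), (2, 4), (4, 6), (6, 11), (11, 1), (15, 17), (17, 19), (19, 15);
INSERT INTO Account_Detail VALUES
    (1, 'Dave', '2012-01-01'), (2, 'Dave', '2013-01-01'), (4, 'Dave', '2014-01-01'),
    (6, 'Dave', '2015-01-01'), (11, 'Dave', '2016-01-01'),
    (15, 'Paul', '2015-07-01'), (17, 'Paul', '2016-07-01'), (19, 'Paul', '2017-07-01');
""")

query = """
WITH RECURSIVE C (FamilyID, AccountID, AccountDate, Step) AS (
    -- Anchor: accounts pointed at by a newer account, i.e. the start of each loop.
    SELECT AD.AccountID, AD.AccountID, AD.AccountDate, 1
    FROM Account_Relationship AR
    JOIN Account_Detail AD ON AD.AccountID = AR.Account_RelationshipID
    JOIN Account_Detail PREV ON PREV.AccountID = AR.AccountID
    WHERE PREV.AccountDate > AD.AccountDate
    UNION ALL
    -- Recursive step: follow the relationship only while the date increases,
    -- so the walk stops before it wraps around to the start of the loop.
    SELECT C.FamilyID, AD2.AccountID, AD2.AccountDate, C.Step + 1
    FROM C
    JOIN Account_Relationship AR2 ON AR2.AccountID = C.AccountID
    JOIN Account_Detail AD2 ON AD2.AccountID = AR2.Account_RelationshipID
    WHERE AD2.AccountDate > C.AccountDate
)
SELECT FamilyID, AccountID, Step FROM C ORDER BY FamilyID, Step
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The sketch relies on the dates increasing strictly along each loop, as they do in the sample data; ties or out-of-order dates would need a different stop condition.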

Need help using ORDER BY to sort building floors

I have selected this building floor data:
6
5
4
3
2
1
UG
GM
G
LG
5B
5A
B1
B2
For this sorting I use this ORDER BY:
order by
(case when ISNUMERIC(floorNo) = 1 then CAST(floorNo AS Int) end) desc ,
(case when ISNUMERIC(left(floorNo,1)) = 0 and ISNUMERIC(substring(floorNo,2,1)) = 1 then floorNo end) asc,
(case when ISNUMERIC(floorNo) = 0 and left(floorNo,1) <>'L' then floorNo end) desc
But I want the result to look like this:
6
5B
5A
5
4
3
2
1
UG
GM
G
LG
B1
B2
Can anyone help me solve it?
If you make a complicated enough set of CASE expressions, you would eventually be able to handle all the possibilities, but it is likely to run very slowly if you have a lot of data.
If I had to do this, I would probably make a separate lookup table (FloorOrder) with two columns: the floor code and an order column (integer). Create a script to populate the lookup table with all the various possibilities: pick a maximum number of floors, basements, and subfloors per floor, and generate all of the combinations with some loops. Then add all the various floors near the ground floor. Make sure the order numbers are spread out enough that you can easily add other codes in between when somebody comes up with a new option (because they will). Something like this subset:
Code Order
2 2000
1C 1300
1B 1200
1A 1100
1 1000
UG 800
GM 500
G 0
LG -300
B1 -1000
It doesn't really matter what the order codes are, as long as they sort the list in the right order, can be easily generated when creating the table, and leave space for fitting things into the gaps. Whenever somebody comes up with a new weird floor code (some I've seen near me are things like M for Mezzanine, UM for Upper Mezzanine, etc.), add new records to the FloorOrder table to fit them in. Make sure your table has an index on the floor codes.
To use it, join to the FloorOrder table and sort by the Order column.
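A minimal sketch of the lookup-table approach, run on SQLite through Python's sqlite3 module rather than SQL Server. The table name `Floors`, the column name `SortOrder` (used instead of `Order`, which is a reserved word) and the exact order values for codes not listed above (5A, 5B, B2 and so on) are assumptions that simply continue the pattern of the subset shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Floors (floorNo TEXT);
-- The primary key gives the lookup table an index on the floor code.
CREATE TABLE FloorOrder (Code TEXT PRIMARY KEY, SortOrder INTEGER NOT NULL);
""")

# Floor data from the question, in its original order.
floors = ["6", "5", "4", "3", "2", "1", "UG", "GM", "G", "LG", "5B", "5A", "B1", "B2"]
conn.executemany("INSERT INTO Floors VALUES (?)", [(f,) for f in floors])

# Hand-placed codes near the ground floor, then generated codes with gaps.
order_rows = {"UG": 800, "GM": 500, "G": 0, "LG": -300}
for n in range(1, 7):
    order_rows[str(n)] = n * 1000                     # 1 -> 1000, 2 -> 2000, ...
    for i, letter in enumerate("ABC", start=1):
        order_rows[f"{n}{letter}"] = n * 1000 + i * 100  # 1A -> 1100, 1B -> 1200, ...
for n in range(1, 3):
    order_rows[f"B{n}"] = -n * 1000                   # B1 -> -1000, B2 -> -2000
conn.executemany("INSERT INTO FloorOrder VALUES (?, ?)", list(order_rows.items()))

sorted_floors = [
    row[0]
    for row in conn.execute(
        "SELECT f.floorNo FROM Floors f "
        "JOIN FloorOrder o ON o.Code = f.floorNo "
        "ORDER BY o.SortOrder DESC"
    )
]
print(sorted_floors)
```

Because an inner join drops floor codes that are missing from the lookup table, a LEFT JOIN plus a check for unmatched codes may be safer in practice.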

Postgres, find the min value of array column and get its position

Let us assume the following table: test
ids | l_ids    | distance
----+----------+----------------
53  | {150,40} | {1.235, 2.265}
22  | {20,520} | {0.158, 0.568}
The positions of the two arrays (l_ids, distance) are dependent, meaning that
150 corresponds to 1.235,
40 corresponds to 2.265 and so on.
I want to get the min distance and the respective l_id. Therefore the result should be like this:
ids | l_ids | distance
----+-------+---------
53  | 150   | 1.235
22  | 20    | 0.158
By running this:
select l_ids, min(dist) as min_distance
from test, unnest(test.distance) dist
group by 1;
the result is:
l_ids    | min_distance
---------+-------------
{150,40} | 1.235
{20,520} | 0.158
I want to get the position of the minimum value in the array of distance in order to get the respective id from the array of l_ids.
Any guidance? Thank you in advance.
SELECT DISTINCT ON (ctid) l_ids, distance
FROM (SELECT ctid,
unnest(l_ids) AS l_ids,
unnest(distance) AS distance
FROM test
) q
ORDER BY ctid, distance;
l_ids | distance
-------+----------
150 | 1.235
20 | 0.158
(2 rows)
If ids is a unique identifier and you don't have repeated values in the distance array, this should work:
SELECT a.* from
(SELECT ids, UNNEST(l_ids) AS l_ids, UNNEST (distance) AS distance FROM test) a
INNER JOIN (SELECT ids, MIN(distance) as mind FROM (
SELECT ids, UNNEST (distance) AS distance FROM test
) t
GROUP BY ids ) t
ON
a.ids = t.ids and a.distance = t.mind
From 9.4, you can use UNNEST()'s special ROWS FROM()-like syntax:
select distinct on (ids) ids, l_id, dist min_distance
from test, unnest(l_ids, test.distance) d(l_id, dist)
order by ids, dist
Otherwise, this is a typical greatest-n-per-group query.
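For readers who want to check the expected output independently of any PostgreSQL version, the underlying operation (find the position of the smallest distance, then read the l_id at the same position) can be sketched in plain Python. The function name `closest` is hypothetical, and on ties it returns the first minimum.

```python
def closest(l_ids, distance):
    """Return (l_id, distance) at the position of the smallest distance."""
    if not distance or len(l_ids) != len(distance):
        raise ValueError("arrays must be non-empty and of equal length")
    # Index of the minimum value; the first one wins on ties.
    pos = min(range(len(distance)), key=lambda i: distance[i])
    return l_ids[pos], distance[pos]


# Rows of the test table from the question: ids -> (l_ids, distance).
test_rows = {
    53: ([150, 40], [1.235, 2.265]),
    22: ([20, 520], [0.158, 0.568]),
}
result = {ids: closest(l_ids, dist) for ids, (l_ids, dist) in test_rows.items()}
print(result)
```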

Optimize MDX query

I have two requirements for my query.
First: to have a product list sorted by my measure, so that products with higher sales appear first.
ProductCode Sales
----------- ------------
123 18
332 17
245 16
656 15
Second: to have a cumulative sum over my presorted product list.
ProductCode Sales ACC
----------- ------------ ----
123 18 18
332 17 35
245 16 51
656 15 66
I wrote the MDX below to achieve this:
WITH
SET SortedProducts AS
Order([DIMProduct].[ProductCode].[ProductCode].AllMEMBERS, [Measures].[Sales], BDESC)
MEMBER [Measures].[ACC] AS
Sum
(
Head
(
[SortedProducts],Rank([DIMProduct].[ProductCode].CurrentMember,[SortedProducts])
)
,[Measures].[Sales]
)
SELECT
{[Measures].[Sales] ,[Measures].[ACC]}
ON COLUMNS,
SortedProducts
ON ROWS
FROM [Model]
But it takes about 3 minutes to run. Any suggestions on how to optimize my code, or is this normal?
I have 9635 products in total.
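As a reference for what the ACC column should contain, here is a small Python sketch of the same calculation done in a single pass. It is not MDX and says nothing definite about how the engine evaluates the query; the helper `prefix_reads` only illustrates, under the assumption that each row's Sum over Head is evaluated independently, how many Sales values would be read for a given number of products.

```python
from itertools import accumulate

# Sample figures from the question: ProductCode -> Sales.
sales = {"123": 18, "332": 17, "245": 16, "656": 15}

# Sort by sales, highest first, then build the running total in one pass.
ranked = sorted(sales.items(), key=lambda kv: kv[1], reverse=True)
running = list(accumulate(amount for _, amount in ranked))
report = [(code, amount, acc) for (code, amount), acc in zip(ranked, running)]
for line in report:
    print(line)


def prefix_reads(n):
    """Values read if row k sums its first k members independently: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


print(prefix_reads(9635))
```

If the engine really does re-sum each prefix, the work grows quadratically with the number of products, which would be consistent with a long run time.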
If you do a quick search on Google, you will find different ways to achieve this (there are many answers here as well).
That said, I would try this different way of calculating your running total:
MEMBER [Measures].[SortedRank] AS Rank([Product].[Product].CurrentMember, [SortedProducts])
MEMBER [Measures].[ACC2] AS SUM(TopCount([SortedProducts], [Measures].[SortedRank]) ,[Measures].[Internet Sales Amount])
I don't know whether TopCount will perform faster than Head in your case; for example, on my test machine your query against the AdventureWorks cube takes the same time with either Head or TopCount.
Hope this helps

How to store bidirectional relationships

I am writing some code to find duplicate customer details in a database. I'll be using Levenshtein distance.
However, I am not sure how to store the relationships. I use databases all the time but have never come across this situation and wondered if someone could point me in the right direction.
What confuses me is how to store the bidirectional nature of the relationship.
I've started to put some examples below, but wondered if there is a best practice for storing this type of data.
Example data
id, address
001, 5 Main Street
002, 5 Main St.
003, 5 Main Str
004, 6 High Street
005, 7 Low Street
006, 7 Low St
Suggestion 1
customer_id1, customer_id2, relationship_strength
001, 002, 0.74
001, 003, 0.77
002, 003, 0.76
005, 006, 0.77
I'm not happy with this approach, as it implies a one-way relationship from customer_id1 to customer_id2, unless of course I include all relationships both ways, but that would double the processing time and the size of the tables.
E.g. I would also need to include: 002, 001, 0.74
Suggestion 2
customer_id, grouping_id
001, 1
002, 1
003, 1
005, 2
006, 2
The way to deal with symmetric relations in a relational system is as follows:
Choose a canonical form in which the symmetric pairs are stored, e.g. customer_id1 < customer_id2.
Define a view SYMM_TBL as SELECT id1, id2, ... FROM ... UNION SELECT id2 AS id1, id1 AS id2, ... FROM ...
Decent systems ought not to punish you on performance when querying this view.
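The canonical-form-plus-view idea can be sketched as follows, using SQLite through Python's sqlite3 module. The table name `customer_match`, the view name `symm_match` and the helper `add_match` are hypothetical; the sample pairs are taken from Suggestion 1 in the question. UNION ALL is used in place of UNION, which gives the same rows here because the CHECK constraint rules out pairs with equal ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_match (
    customer_id1 INTEGER NOT NULL,
    customer_id2 INTEGER NOT NULL,
    relationship_strength REAL NOT NULL,
    PRIMARY KEY (customer_id1, customer_id2),
    CHECK (customer_id1 < customer_id2)   -- canonical form: smaller id first
);
-- Each stored pair is exposed in both directions.
CREATE VIEW symm_match AS
    SELECT customer_id1, customer_id2, relationship_strength FROM customer_match
    UNION ALL
    SELECT customer_id2, customer_id1, relationship_strength FROM customer_match;
""")


def add_match(a, b, strength):
    """Store a pair once, normalised so the smaller id comes first."""
    low, high = sorted((a, b))
    conn.execute("INSERT INTO customer_match VALUES (?, ?, ?)", (low, high, strength))


# Pairs may arrive in either order; they are normalised on insert.
for a, b, s in [(1, 2, 0.74), (3, 1, 0.77), (2, 3, 0.76), (5, 6, 0.77)]:
    add_match(a, b, s)

# All partners of customer 2, regardless of which column they were stored in.
partners_of_2 = conn.execute(
    "SELECT customer_id2, relationship_strength FROM symm_match "
    "WHERE customer_id1 = ? ORDER BY customer_id2",
    (2,),
).fetchall()
print(partners_of_2)
```

The base table holds each pair once, so the scoring process and storage are not doubled; only queries against the view see both directions.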
What we have here is a graph in which each node has a relationship (edit distance) with every other node. This is not in the normal range of data models. It is also not a permanent feature of your database (assuming you resolve the business processes which led to the duplicate data), so it isn't worth sweating over the solution which best fits relational theory. What we need is a practical solution.
Think of it as a matrix. If we go for the optimum processing we won't execute the duplicate scorings. So we score Address 1 against all the other Addresses, we score Address 2 against all the other Addresses except Address 1, we score Address 3 against all the other Addresses except Addresses 1 and 2, etc. And what we end up with is a bit like a football league table:
        addr
        1    2    3    4    5
addr
1       -   95   95   80   76
2       -    -  100   75   72
3       -    -    -   75   72
4       -    -    -    -   83
5       -    -    -    -    -
This data can best be stored in suggestion 1, a table of ID1, ID2, SCORE. Although we do need to pivot the data to get the output looking like that :)
In a proper league table there are two sets of scores - Home and Away - so the table is symmetrical. But that doesn't apply here, as the edit distance for 1 > 2 is the same as 2 > 1. However, it would make querying the results more straightforward if the result set included the mirrored scores. That is, for records (1,5,76), (2,5,72), etc we generate records (5,1,76), (5,2,72). This could be done at the end of the scoring process.
        addr
        1    2    3    4    5
addr
1       -   95   95   80   76
2      95    -  100   75   72
3      95  100    -   75   72
4      80   75   75    -   83
5      76   72   72   83    -
Of course, this is mainly a presentational thing, so it only needs to be done for display purposes, e.g. exporting the data to a spreadsheet. We can still get all the scores for, say, Address 5 in a readable fashion without mirroring the scores, using a simple SQL statement:
select case when id1 = 5 then id1 else id2 end as id1
, case when id1 = 5 then id2 else id1 end as id2
, score
from your_table
where id1 = 5
or id2 = 5
/
As always it depends on what you want to do with the data once you've calculated it.
Assuming it's simply to identify or locate duplicates then your suggestion 1 is what I'd use, i.e. a second table that simply stores the pairs and the strengths. My only suggestion is to make the strengths a scaled integer rather than a decimal.
