Finding the best match with "fuzzy" ranking logic


I need help with grouping results of the below temp table using a 'rank' column.
The temp table (MS SQL) is as follows:
student_address | school_address | student_st| school_st| district | districtID | rank
---------------------------------------------------------------------------------------
123 some street | 12 apple way | CT | CT | 322 | 322 | 0.2
123 some street | 33 pear street| CT | NJ | 039 | 039 | 0.1
333 another st. | NULL | VT | NULL | 111 | 111 | 0.0
I populated the #temp table as such:
SELECT st.student_address, sc.school_address, st.student_st, sc.district, st.districtID, '0.0' as rank
FROM students st
LEFT OUTER JOIN schools sc
ON st.[District ID] = sc.District
ORDER BY st.[District ID] asc;
I followed the results of my temp table by a series of updates that changed the 'rank' column based on certain rules (e.g. no match between school and student = 0.0, only a district match = 0.1, a district match & a state match = 0.2 and so on). The end result is that highly ranked rows are more likely to show the student's actual school vs. lesser ranked rows.
Where I need help is the final query. I essentially want to return all student info (all rows from the original students table) and the most likely corresponding school (determined by rank).
Something like (pseudo code)
select student_address, student_st, student_etc, school_address
from #temp
where rank = max(rank)
group by student_address
I know the above isn't correct SQL, but I hope it gives you an idea what I am trying to achieve?
Thanks for any guidance.

You can try this out:
select student_address, student_st, student_etc, school_address,RANK
from #temp t1
group by student_address, student_st, student_etc, school_address,RANK having
RANK=(select MAX(RANK) from #temp t2 where t1.student_address=t2.student_address)

I think you're close. Probably need to use a subquery like:
SELECT student_address, student_st, student_etc, school_address
FROM #temp t1
WHERE rank = (SELECT MAX(rank) FROM #temp t2 WHERE t2.student_address = t1.student_address)
...though I'm missing where student_street is coming from. The above, however looks like the pattern you're looking for.
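
A MAX-based filter returns every tied row when several schools share a student's top rank. If exactly one row per student is wanted, a ROW_NUMBER() variant may help. This is only a sketch: it assumes student_address identifies a student, and student_st stands in for the remaining student columns.

```sql
-- Sketch only: assumes the #temp table from the question and that
-- student_address identifies one student.
WITH ranked AS (
    SELECT student_address,
           student_st,
           school_address,
           [rank],
           ROW_NUMBER() OVER (
               PARTITION BY student_address   -- one numbering sequence per student
               ORDER BY [rank] DESC           -- highest rank gets rn = 1
           ) AS rn
    FROM #temp
)
SELECT student_address, student_st, school_address, [rank]
FROM ranked
WHERE rn = 1;
```

Students without any matching school still appear, with a NULL school_address. Ties are broken arbitrarily unless another column is added to the ORDER BY.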

Related

Using STRING_SPLIT for 2 columns in a single table

I've started from a table like this
ID | City | Sales
1 | London,New York,Paris,Berlin,Madrid| 20,30,,50
2 | Istanbul,Tokyo,Brussels | 4,5,6
There can be an unlimited amount of cities and/or sales.
I need to get each city and its sales amount in its own record. So my result should look something like this:
ID | City | Sales
1 | London | 20
1 | New York | 30
1 | Paris |
1 | Berlin | 50
1 | Madrid |
2 | Istanbul | 4
2 | Tokyo | 5
2 | Brussels | 6
What I got so far is
SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APPLY STRING_SPLIT(Table.City,',') splitC
CROSS APPLY STRING_SPLIT(Table.Sales,',') splitS
With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense I think, because it's trying to split the sales for each city again).
What would be an option to solve this issue? STRING_SPLIT is not necessary, it's just how I started on it.

STRING_SPLIT() is not an option, because (as mentioned in the documentation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try with a JSON-based approach, using OPENJSON() and string transformation (comma-separated values are transformed into a valid JSON array - London,New York,Paris,Berlin,Madrid into ["London","New York","Paris","Berlin","Madrid"]). The result from the OPENJSON() with default schema is a table with columns key, value and type and the key column is the 0-based index of each item in this array:
Table:
CREATE TABLE Data (
ID int,
City varchar(1000),
Sales varchar(1000)
)
INSERT INTO Data
(ID, City, Sales)
VALUES
(1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
(2, 'Istanbul,Tokyo,Brussels', '4,5,6')
Statement:
SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
SELECT c.[value] AS City, s.[value] AS Sales
FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s
ON c.[key] = s.[key]
) a
Result:
ID City Sales
1 London 20
1 New York 30
1 Paris
1 Berlin 50
1 Madrid NULL
2 Istanbul 4
2 Tokyo 5
2 Brussels 6
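
On SQL Server 2022 or Azure SQL Database, STRING_SPLIT accepts an optional third argument, enable_ordinal, which adds an ordinal column holding the 1-based position of each substring. Assuming such a version is available, a sketch against the Data table above could pair cities and sales by position without the JSON conversion:

```sql
-- Sketch only: requires STRING_SPLIT's enable_ordinal argument
-- (SQL Server 2022 / Azure SQL Database). Uses the Data table defined above.
SELECT d.ID,
       c.[value] AS City,
       s.Sales
FROM Data AS d
CROSS APPLY STRING_SPLIT(d.City, ',', 1) AS c
OUTER APPLY (
    -- Pick the sales value at the same position as the city, if there is one.
    SELECT ss.[value] AS Sales
    FROM STRING_SPLIT(d.Sales, ',', 1) AS ss
    WHERE ss.ordinal = c.ordinal
) AS s
ORDER BY d.ID, c.ordinal;
```

A city with no sales entry at its position gets NULL, and an empty entry between two commas comes back as an empty string.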

STRING_SPLIT has no concept of ordinal positions. In fact, the documentation specifically states that it doesn't care about them:
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.
Then you can do something like this:
WITH Cities AS(
SELECT YT.ID,
DSc.Item,
DSc.ItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc),
Sales AS(
SELECT YT.ID,
DSs.Item,
DSs.ItemNumber
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
C.Item AS City,
S.Item AS Sale
FROM Cities C
FULL OUTER JOIN Sales S ON C.ID = S.ID AND C.ItemNumber = S.ItemNumber;
Of course, the real solution is to fix your design. This type of design is only going to cause you hundreds of problems in the future. Fix it now, not later; the earlier you do it, the sooner you'll reap the rewards.
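
As a sketch of what a fixed design could look like, each city and its sales figure would get a row of its own. The table name CitySales and the Position column are hypothetical, chosen only for illustration:

```sql
-- Hypothetical normalized layout: one row per (ID, city) instead of
-- two comma-separated lists. Names are illustrative, not from the question.
CREATE TABLE CitySales (
    ID       int          NOT NULL,
    Position int          NOT NULL,  -- order of the city within the original list
    City     varchar(100) NOT NULL,
    Sales    int          NULL,      -- NULL where no sales figure exists
    CONSTRAINT PK_CitySales PRIMARY KEY (ID, Position)
);

INSERT INTO CitySales (ID, Position, City, Sales)
VALUES (1, 1, 'London',   20),
       (1, 2, 'New York', 30),
       (1, 3, 'Paris',    NULL),
       (1, 4, 'Berlin',   50),
       (1, 5, 'Madrid',   NULL);

-- The desired output then needs no string splitting at all.
SELECT ID, City, Sales
FROM CitySales
ORDER BY ID, Position;
```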

Joining two tables and need to have MAX aggregate function in ON clause

This is my code! I want to pass a part ID and a purchase order ID to my report and have it return all the related information for that combination. Importantly, if several rows share the same purchase order ID and part ID, the query should return the one with the highest transaction ID. The following code does not return what I expect. Could you please help me?
SELECT MAX(INVENTORY_TRANS.TRANSACTION_ID), INVENTORY_TRANS.PART_ID
, INVENTORY_TRANS.PURC_ORDER_ID, TRACE_INV_TRANS.QTY, TRACE_INV_TRANS.CREATE_DATE, TRACE_INV_TRANS.TRACE_ID
FROM INVENTORY_TRANS
JOIN TRACE_INV_TRANS ON INVENTORY_TRANS.TRANSACTION_ID = TRACE_INV_TRANS.TRANSACTION_ID
WHERE INVENTORY_TRANS.PART_ID = #PartID
AND INVENTORY_TRANS.PURC_ORDER_ID = #PurchaseOrderID
GROUP BY TRACE_INV_TRANS.QTY, TRACE_INV_TRANS.CREATE_DATE, TRACE_INV_TRANS.TRACE_ID, INVENTORY_TRANS.PART_ID
, INVENTORY_TRANS.PURC_ORDER_ID
The sample of the trace_inventory_trans table is:
part_id | trace_id | transaction_id | qty | create_date
x | 1 | 10 | |
x | 2 | 11 | |
x | 3 | 12 | |
The sample of the inventory_trans table is:
transaction_id | part_id | purc_order_id
11 | x | p20
12 | x | p20
I want the row with the highest transaction ID, which is transaction 12, but the query shows me transaction 11.

I would use a sub-query to find the MAX value, then join that result to the other table.
The ORDER BY + TOP (1) returns the MAX value for transaction_id.
SELECT
inv.transaction_id
,inv.part_id
,inv.purc_order_id
,tr.qty
,tr.create_date
,tr.trace_id
FROM
(
SELECT TOP (1)
transaction_id,
part_id,
purc_order_id
FROM
INVENTORY_TRANS
WHERE
part_id = #PartID
AND
purc_order_id = #PurchaseOrderID
ORDER BY
transaction_id DESC
) AS inv
JOIN
TRACE_INV_TRANS AS tr
ON inv.transaction_id = tr.transaction_id;
Results:
+----------------+---------+---------------+------+-------------+----------+
| transaction_id | part_id | purc_order_id | qty | create_date | trace_id |
+----------------+---------+---------------+------+-------------+----------+
| 12 | x | p20 | NULL | NULL | 3 |
+----------------+---------+---------------+------+-------------+----------+

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY, because the revised date could be greater than or less than the original date, so the aggregate functions MAX and MIN don't work. I thought about using CASE somehow, but I also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and displays one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!

You can use ROW_NUMBER() for this.
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1

Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
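
A variant of the same idea, sketched with NOT EXISTS instead of a row count: keep every revised row, plus any original row whose id has no revised counterpart. It assumes the same mytable as above.

```sql
-- Sketch only: same table and columns as the query above.
SELECT m1.*
FROM mytable AS m1
WHERE m1.[type] = 'revised'
   OR NOT EXISTS (
        -- True only when this id has no revised row at all.
        SELECT 1
        FROM mytable AS m2
        WHERE m2.id = m1.id
          AND m2.[type] = 'revised'
   );
```

This version does not depend on there being at most two rows per id.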

You could use a subquery to take the MAX([type]). In this case it works for [type] since alphabetically we want revised first, then original and "r" comes after "o" in the alphabet. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

Updating 1 table from another using wheres

Trying to update one column, from another table with the highest Date.
Table 1 Example:
PartNumber | Cost
1000 | .10
1001 | .20
Table 2 Example:
PartNumber | Cost | Date
1000 | .10 | 2017-01-01
1000 | .50 | 2017-02-01
1001 | .20 | 2017-01-01
1002 | .50 | 2017-02-02
I would like to update table 1 with the most recent values from table 2, which would be .50 for each. The query I use to update this worked just fine until I realized I was not grabbing the correct Cost, because there were multiple rows per part. I now want to grab the most recently dated revision.
My query:
UPDATE dex_mfgx..insp_master
SET dex_mfgx..insp_master.costperpart = t2.sct_cst_tot
FROM dex_mfgx..insp_master AS t1
INNER JOIN qad_repl..sct_det_sql AS t2
ON t1.partnum = t2.sct_part
WHERE t1.partnum = t2.sct_part and t2.sct_cst_date = MAX(t2.sct_cst_date) ;
My Error:
Msg 147, Level 15, State 1, Line 6
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
Not having much luck with HAVING or GROUP BY, although I haven't used them much.
Does anyone have an idea that would help?

I think I understand what you are trying to solve now. Thanks to Lamak for setting me straight as I was way off base originally.
Something like this I think is what you are looking for.
with TotalCosts as
(
SELECT t2.sct_cst_tot
, t1.partnum
, RowNum = ROW_NUMBER() over(partition by t1.partnum order by t2.sct_cst_date desc)
FROM dex_mfgx..insp_master AS t1
INNER JOIN qad_repl..sct_det_sql AS t2 ON t1.partnum = t2.sct_part
)
update t1
set costperpart = tc.sct_cst_tot
from dex_mfgx..insp_master AS t1
join TotalCosts tc on tc.partnum = t1.partnum
where tc.RowNum = 1
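
The error in the question arises because MAX() sits directly in the WHERE clause. As an alternative sketch that stays close to the original query, the aggregate can be moved into a correlated subquery. The table and column names are the ones from the question; the alias t3 is added here.

```sql
-- Sketch only: keeps the original join and filters to the latest cost row per part.
UPDATE t1
SET costperpart = t2.sct_cst_tot
FROM dex_mfgx..insp_master AS t1
INNER JOIN qad_repl..sct_det_sql AS t2
    ON t1.partnum = t2.sct_part
WHERE t2.sct_cst_date = (
    -- Latest cost date recorded for this part.
    SELECT MAX(t3.sct_cst_date)
    FROM qad_repl..sct_det_sql AS t3
    WHERE t3.sct_part = t2.sct_part
);
```

If two rows share the latest date for a part, which of their costs ends up in insp_master is not deterministic; the ROW_NUMBER() version above has the same caveat unless a tie-breaker is added to its ORDER BY.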

SSIS - transform 2 records from table A to 1 record in table B

I have following data in employee Table A:
ID | emp | City_Type | City
1 | 101 | Z | Tokyo
2 | 101 | Y | New York
City_Type can either be Y or Z. Y being the city this person was born in, Z is the city he/she is living now.
I need to put these together in a table 'B' which looks like the following:
ID | emp | Current_City | Birth_City
So in the end, Table B must be filled like this:
ID | emp | Current_City | Birth_City
1 | 101 | Tokyo | New York
(in some cases, one of the 2 can be empty/null)
Any suggestions on how to do this? I haven't been able to find much information on this myself.

I did this exercise (using sql-server) with PIVOT TABLE:
select emp, Z 'Current_City' , Y 'Birth_City' from
(
select emp,City_Type, City from TABLE__A
) x
pivot
(
max(City) FOR City_Type in (Z,Y)
) AS PivotTable
Below is the result, with an example of a NULL value in the Current_City field:
emp Current_City Birth_City
101 Tokyo New York
102 NULL London
I omitted ID because the request does not make clear whether or how it should be derived (for example the minimum or maximum ID per emp, or a new value generated by the INSERT into TABLE__B).
The previous query can be used to insert into TABLE__B:
INSERT INTO [TABLE__B]
([emp]
,[Current_City]
,[Birth_City])
...

First create your TableB and populate [Current_City] and [Birth_City] with nulls, but make sure [emp] is there and it has all the employees you intend to modify.
Then run this SQL modified to fit your database / schema / table names / etc:
update TableB
set Current_City = (select City
from TableA
where TableA.City_Type ='Z'
and TableA.emp = TableB.emp),
Birth_City = (select City
from TableA
where TableA.City_Type ='Y'
and TableA.emp = TableB.emp)
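
As a set-based alternative to the two correlated subqueries, conditional aggregation can produce both columns in one pass over TableA. This is a sketch that assumes at most one Y row and one Z row per emp:

```sql
-- Sketch only: collapses the Y and Z rows of each employee into a single row.
SELECT emp,
       MAX(CASE WHEN City_Type = 'Z' THEN City END) AS Current_City,  -- city lived in now
       MAX(CASE WHEN City_Type = 'Y' THEN City END) AS Birth_City     -- city of birth
FROM TableA
GROUP BY emp;
```

An employee missing one of the two rows gets NULL in the corresponding column.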

One way would be to use the PIVOT transformation.
