Cascading Hierarchical Join Conditions - sql-server

I'm currently putting together a schema that will be responsible for storing products, prices and margins.
The crux of the problem I'm having is how best to handle multiple scenarios.
Definitions - All these are fields in the Link (Intersection) table
Product - A widget
Margin - A data structure that represents how to alter the purchase price to determine the retail price (complex enough to require a separate table)
Supplier - Someone who supplies us with a Product
Authority - Someone the supplier is beholden to
Client - Someone we will retail to
ClientGroup - A collection of Clients
Some of these are optional. There will always be a Product-Margin mapping.
The other fields exist to define more specific relationships.
The rules will be applied with a hierarchy.
Examples:
1. Product "Foo" has a Margin of 10% (applies to all clients)
2. For ClientGroup "Group A", Foo has a Margin of 8%
3. For Client "Bob's Burgers", who is a member of "Group A", Foo has a Margin of 6%
That would be covered by 3 rows, with the following fields populated (unpopulated fields are null):
1. Product-Margin
2. ClientGroup-Margin
3. Client-Margin
Rule 3 is the most specific, and so would take precedence.
Is this link table the best way to store these hierarchical relationships?
If not, what is?
What is the best way of structuring a query to take advantage of this? I've written a query using temp tables and conditional logic, but I can't help but think I'm square-pegging SQL and there's a better way of structuring the query.
I'd like to keep as much of the logic as possible in SQL and out of the business logic.
In other words, the app can call a stored procedure, passing in the Product and Client plus, optionally, Authority and/or Supplier, and receive the appropriate Margin.

I think in your examples 2 and 3 product should also be populated, otherwise that margin is applied to all products for the client or client group.
The query to get results could be something like this:
SELECT TOP 1 Margin
FROM <table>
WHERE Product = @Product
  AND COALESCE(Client,'') = COALESCE(@Client,Client,'')
  AND COALESCE(ClientGroup,'') = COALESCE(@ClientGroup,ClientGroup,'')
ORDER BY Client DESC, ClientGroup DESC
@ marks the parameters passed to the stored procedure. I don't know if your solution will require joins instead, but you could change the WHERE conditions to joins.
This assumes Product is always passed as a parameter and the others are optional (you can add Supplier and Authority in the same way).
ORDER BY Client DESC means rows where Client is not null appear on top; if the Client column is null for all rows, the same logic then applies to ClientGroup.
Or you can use the order by method suggested in the comment by James B
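A minimal runnable sketch of the "most specific rule wins" lookup, using Python's built-in sqlite3 as a stand-in for SQL Server (so LIMIT 1 replaces TOP 1). The table name MarginLink, its columns and the helper lookup_margin are assumptions made for illustration. It also uses the "matches the parameter OR is null" form of the filter rather than COALESCE, so that the general rows stay in the candidate set when a client is passed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MarginLink (
    Product     TEXT NOT NULL,
    ClientGroup TEXT,           -- NULL = rule applies to every group
    Client      TEXT,           -- NULL = rule applies to every client
    Margin      REAL NOT NULL
);
-- The three example rules from the question
INSERT INTO MarginLink VALUES
    ('Foo', NULL,      NULL,             10.0),
    ('Foo', 'Group A', NULL,              8.0),
    ('Foo', NULL,      'Bob''s Burgers',  6.0);
""")

def lookup_margin(product, client=None, client_group=None):
    """Return the margin of the most specific matching rule, or None."""
    row = conn.execute(
        """
        SELECT Margin
        FROM MarginLink
        WHERE Product = :product
          AND (Client = :client OR Client IS NULL)
          AND (ClientGroup = :grp OR ClientGroup IS NULL)
        -- Client-level rules first, then group-level, then the product default
        ORDER BY (Client IS NOT NULL) DESC, (ClientGroup IS NOT NULL) DESC
        LIMIT 1
        """,
        {"product": product, "client": client, "grp": client_group},
    ).fetchone()
    return row[0] if row else None

bob = lookup_margin("Foo", "Bob's Burgers", "Group A")
other_member = lookup_margin("Foo", "Some Other Client", "Group A")
default = lookup_margin("Foo")
```

Here bob picks up the client rule, other_member falls back to the group rule, and default gets the product-wide margin.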

Thanks for the input folks, the solution for the hierarchical behaviour that I wanted looks something like this:
SELECT TOP 1 *
FROM MarginLink
WHERE
(
    (-- Client selection
        (ClientId = @clientId)
        OR
        ClientGroupId = (SELECT ClientGroupId FROM ClientGroupClient WHERE ClientId = @clientId)
    )
    AND
    (-- Product selection
        (@productID BETWEEN ProductIdFrom AND ProductIdTo)
        OR
        (ProductTypeId = @productTypeID)
        OR
        (ProductIdFrom IS NULL AND ProductIdTo IS NULL)
    )
    AND
    (-- Supplier
        (SupplierId = @supplierId)
        OR
        (MarginLink.SupplierId IS NULL)
    )
    AND
    (-- Authority
        (AuthorityId = @authorityId)
        OR
        (MarginLink.AuthorityId IS NULL)
    )
)
ORDER BY ClientId DESC, ClientGroupId DESC, ProductIdFrom DESC, ProductIdTo DESC, ProductTypeId DESC, SupplierId DESC, AuthorityId DESC

Related

SQL Server, 2 tables, one is input, second is database with items, find closest match

This is my first Stack Overflow question; I hope someone can help me out with this. I am completely lost and a newbie at SQL.
I have two tables (which I overly simplified for this question), the first one has the customer info and the car tire that they need. The second one is simply filled with a tire id, and all of the information for the tires. I am trying to input only the customer ID and return the one closest tire that matches the input along with the values of both the selected tire and the customer's tire. The matches also need to be prioritized in that order (size most important, width next most important, ratio is least important). Any suggestions on how to do this or where to start? Is there anything I can look at to help me solve this problem? I have been trying many different procedures, and some nested selects, but nothing is getting me close. Thank you.
customertable (custno, custsize, custwidth, custratio)
1,17,255,50
2,16,235,50
etc...
tirecollection (tireid, tiresize, tirewidth, tireratio)
1,15,225,40
2,16,225,50
3,17,250,55
4,17,235,30
5,18,255,40
etc...
This is not a 100% complete solution, but may work towards coming up with a solution. The approach here is combining the tyre dimensions into one value and then ranking them within a tyre size partition. You could then pass in the customer tyre dimensions to get the closest match.
with CTE
as
(
    select *, TyreSize + TyreWidth as [TyreDimensions]
    from tblTyres
)
select TC.CustId, C.TyreId, C.TyreSize, C.TyreWidth, C.[TyreDimensions],
    rank() over(partition by C.TyreSize order by C.[TyreDimensions]) as [RNK]
from tblTyreCustomer as TC
join CTE as C
    on TC.CustTyreSize = C.TyreSize
Assuming you're running SQL Server 2008 or later, this should work (this assumes you want to get a result for a single customer on a case-by-case basis):
CREATE FUNCTION udf.GetClosestTireMatch
(
    @CustomerNo int
)
RETURNS TABLE
AS RETURN
SELECT custno, tireid, tiresize, tirewidth, tireratio
FROM (
    -- Rank every tire by its distance from the customer's tire: size first, then width, then ratio.
    -- The ABS() expressions are repeated here because column aliases cannot be referenced inside OVER().
    SELECT ROW_NUMBER() OVER (ORDER BY ABS(c.custsize - t.tiresize), ABS(c.custwidth - t.tirewidth), ABS(c.custratio - t.tireratio)) AS rownum
        , c.custno, c.custsize, c.custwidth, c.custratio, t.tireid, t.tiresize, t.tirewidth, t.tireratio
    FROM (SELECT * FROM customertable WHERE custno = @CustomerNo) c
    CROSS JOIN tirecollection t
) sub
WHERE rownum = 1
GO
Then you run the function with:
SELECT * FROM udf.GetClosestTireMatch(5)
(where 5 is the customer number you're querying).
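The prioritised closest-match idea can be tried on the sample rows from the question with the sketch below. It uses Python's built-in sqlite3 rather than SQL Server, so ORDER BY with LIMIT 1 stands in for ROW_NUMBER() with rownum = 1, and the helper name closest_tire is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customertable (custno INTEGER PRIMARY KEY, custsize INT, custwidth INT, custratio INT);
CREATE TABLE tirecollection (tireid INTEGER PRIMARY KEY, tiresize INT, tirewidth INT, tireratio INT);
-- Sample rows copied from the question
INSERT INTO customertable VALUES (1,17,255,50), (2,16,235,50);
INSERT INTO tirecollection VALUES
    (1,15,225,40), (2,16,225,50), (3,17,250,55), (4,17,235,30), (5,18,255,40);
""")

def closest_tire(custno):
    """Return (tireid, size, width, ratio) of the best match, or None."""
    return conn.execute(
        """
        SELECT t.tireid, t.tiresize, t.tirewidth, t.tireratio
        FROM customertable AS c
        CROSS JOIN tirecollection AS t
        WHERE c.custno = ?
        -- Size difference decides first, width breaks ties, ratio breaks remaining ties
        ORDER BY ABS(c.custsize  - t.tiresize),
                 ABS(c.custwidth - t.tirewidth),
                 ABS(c.custratio - t.tireratio)
        LIMIT 1
        """,
        (custno,),
    ).fetchone()

match_1 = closest_tire(1)
match_2 = closest_tire(2)
```

For customer 1 two tires share the exact size 17, so the width difference (5 versus 20) decides in favour of tire 3.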

SQL Server - Update All Records, Per Group, With Result of SubQuery

If anyone could even just help me phrase this question better I'd appreciate it.
I have a SQL Server table, let's call it cars, which contains entries representing items and information about their owners including car_id, owner_accountNumber, owner_numCars.
We're using a system that sorts 'importantness of owner' based on number of cars owned, and relies on the owner_numCars column to do so. I'd rather not adjust this, if reasonably possible.
Is there a way I can update owner_numCars per owner_accountNumber using a stored procedure? Maybe some other more efficient way I can accomplish every owner_numCars containing the count of entries per owner_accountNumber?
Right now the only way I can think to do this is to (from the C# application):
SELECT owner_accountNumber, COUNT(*)
FROM mytable
GROUP BY owner_accountNumber;
and then foreach row returned by that query
UPDATE mytable
SET owner_numCars = <count result>
WHERE owner_accountNumber = <accountNumber result>
But this seems wildly inefficient compared to having the server handle the logic and updates.
Edit - Thanks for all the help. I know this isn't really a well set up database, but it's what I have to work with. I appreciate everyone's input and advice.
This solution takes into account that you want to keep the owner_numCars column in the CARs table and that the column should always be accurate in real time.
I'm defining table CARS as a table with attributes about cars, including each car's current owner. The number of cars owned by the current owner is de-normalized into this table. Say I, LAS, own three cars; then there are three entries in table CARS, as such:
car_id owner_accountNumber owner_numCars
1 LAS1 3
2 LAS1 3
3 LAS1 3
For owner_numCars to be used as an importance factor in a live interface, you'd need to update owner_numCars for every car every time LAS1 sells or buys a car or is removed from or added to a row.
Note you need to update CARS for both the old and new owners. If Sam buys car1, both Sam's and LAS' totals need to be updated.
You can use this procedure to update the rows. This SP is very context sensitive. It needs to be called after rows have been deleted or inserted for the deleted or inserted owner. When an owner is updated, it needs to be called for both the old and new owners.
To update in real time as cars change owners:
create procedure update_car_count
    @p_acct nvarchar(50) -- use your actual datatype here
AS
update CARS
set owner_numCars = (select count(*) from CARS where owner_accountNumber = @p_acct)
where owner_accountNumber = @p_acct;
GO
To update all account owners:
create procedure update_car_count_all
AS
update C
set owner_numCars = (select count(*) from CARS where owner_accountNumber = C.owner_accountNumber)
from CARS C
GO
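As a quick way to check the set-based update, here is a sketch that runs the same correlated-subquery idea against an in-memory SQLite database through Python's sqlite3. SQLite has no UPDATE ... FROM alias form in older versions, so the outer table is referenced by name instead; the sample rows, including the SAM1 account, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (
    car_id              INTEGER PRIMARY KEY,
    owner_accountNumber TEXT NOT NULL,
    owner_numCars       INTEGER
);
-- Hypothetical sample data: LAS1 owns three cars, SAM1 owns one
INSERT INTO cars (car_id, owner_accountNumber) VALUES
    (1, 'LAS1'), (2, 'LAS1'), (3, 'LAS1'), (4, 'SAM1');
""")

# One statement refreshes every row; no per-owner loop in the application.
conn.execute("""
UPDATE cars
SET owner_numCars = (
    SELECT COUNT(*)
    FROM cars AS c2
    WHERE c2.owner_accountNumber = cars.owner_accountNumber
)
""")

rows = conn.execute(
    "SELECT car_id, owner_accountNumber, owner_numCars FROM cars ORDER BY car_id"
).fetchall()
```

Every row of an owner ends up carrying that owner's total, which is the shape the existing sorting logic expects.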
I think what you need is a View. If you don't know, a View is a virtual table that displays/calculates data from a real table and is continuously updated as the table data updates. So if you want to see your table with owner_numCars added you could do:
SELECT a.*, b.owner_numCars
from mytable as a
inner join
(SELECT owner_accountNumber, COUNT(*) as owner_numCars
FROM mytable
GROUP BY owner_accountNumber) as b
on a.owner_accountNumber = b.owner_accountNumber
You'd want to remove the owner_numCars column from the real table since you don't need to actually store that data on each row. If you can't remove it you can replace a.* with an explicit list of all the fields except owner_numCars.
You don't want to run SQL to update this value. What if it doesn't run for a long time? What if someone loads a lot of data and then runs the score and finds a guy that has 100 cars counts as a zero b/c the update didn't run. Data should only live in 1 place, updating has it living in 2. You want a view that pulls this value from the tables as it is needed.
CREATE VIEW vOwnersInfo
AS
SELECT o.*,
ISNULL(c.Cnt,0) AS Cnt
FROM OWNERS o
LEFT JOIN
(SELECT OwnerId,
COUNT(1) AS Cnt
FROM Cars
GROUP BY OwnerId) AS c
ON o.OwnerId = c.OwnerId
There are a lot of ways of doing this. Here is one way using the COUNT() OVER window function and an updatable Common Table Expression [CTE]. That way you won't have to worry about relating data back, ids etc.
;WITH cteCarCounts AS (
SELECT
owner_accountNumber
,owner_numCars
,NewNumberOfCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
)
UPDATE cteCarCounts
SET owner_numCars = NewNumberOfCars
However, from a design perspective I would raise the question of whether this value (owner_numCars) should be on this table or on what I assume would be the owner table.
Rominus did make a good point about using a view if you want the data to always reflect the current value. You could also do it with a table-valued function, which could be more performant than a view. But if you are simply showing it then you could simply do something like this:
SELECT
owner_accountNumber
,owner_numCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
By adding a where clause to either the CTE or the SELECT statement you will effectively limit your dataset and the solution should remain fast. E.g.
WHERE owner_accountNumber = @owner_accountNumber

Is it possible in SQL Server to create a self-maintaining table with self-references

I'm using Azure's SQL Database & MS SQL Server Management Studio and I'm wondering if it's possible to create a self-referencing table that maintains itself.
I have three tables: Race, Runner, Names. The Race table includes the following columns:
Race_ID (PK)
Race_Date
Race_Distance
Number_of_Runners
The second table is Runner. Runner contains the following columns:
Runner_Id (PK)
Race_ID (Foreign Key)
Name_ID
Finish_Position
Prior_Race_ID
The Names Table includes the following columns:
Full Name
Name_ID
The column of interest is Prior_Race_ID in the Runner table. I'd like to automatically populate this field via a Trigger or Stored Procedure, but I'm not sure if it's possible to do so or how to go about it. The goal would be to be able to get all of a runner's races very quickly and easily by traversing the Prior_Race_ID field.
Can anyone point me to a good resource or reference that explains if and how this is achievable? Also, if there is a preferred approach to achieving my objective, please do share that.
Thanks for your input.
Okay, so we want, for each Competitor (better name than Names?), to find their two most recent races. You'd write a query like this:
SELECT
    * --TODO - Specific columns
FROM
    (SELECT
        *, --TODO - Specific columns
        ROW_NUMBER() OVER (PARTITION BY n.Name_ID ORDER BY r.Race_Date DESC) rn
    FROM
        Names n
            inner join
        Runner rs
            on
                n.Name_ID = rs.Name_ID
            inner join
        Race r
            on
                rs.Race_ID = r.Race_ID
    ) t
WHERE
    t.rn in (1,2)
That should produce two rows per competitor. If needed, you can then PIVOT this data if you want a single row per competitor, but I'd usually leave that up to the presentation layer, rather than do it in SQL.
And so, no, I wouldn't even have a Prior_Race_ID column. As a general rule, don't store data that can be calculated - that just introduces opportunities for that data to be incorrect compared to the base data.
Run the following SQL (the DISTINCT here guards against a runner having more than one race on the same day):
update runner r1
set r1.prior_race_id =
(
select distinct race.race_id from runner, race where runner.race_id = race.race_id and runner.runner_id = r1.runner_id group by runner.runner_id having race.race_date = max(race.race_date)
)
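For comparison, a sketch of filling Prior_Race_ID with a correlated subquery, run through Python's sqlite3 so it can be tried without a server. It assumes "prior race" means the same Name_ID's latest race strictly before the current row's race date; the sample rows are invented, and in T-SQL the inner LIMIT 1 would be written as TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Race (Race_ID INTEGER PRIMARY KEY, Race_Date TEXT);
CREATE TABLE Runner (
    Runner_Id     INTEGER PRIMARY KEY,
    Race_ID       INTEGER REFERENCES Race(Race_ID),
    Name_ID       INTEGER,
    Prior_Race_ID INTEGER
);
-- Hypothetical data: competitor 100 ran three races, competitor 200 ran one
INSERT INTO Race VALUES (10, '2020-01-01'), (11, '2020-02-01'), (12, '2020-03-01');
INSERT INTO Runner (Runner_Id, Race_ID, Name_ID) VALUES
    (1, 10, 100), (2, 11, 100), (3, 12, 100), (4, 11, 200);

UPDATE Runner
SET Prior_Race_ID = (
    SELECT prev.Race_ID
    FROM Runner AS prev
    JOIN Race AS prev_race ON prev_race.Race_ID = prev.Race_ID
    WHERE prev.Name_ID = Runner.Name_ID
      -- only races earlier than the race on the row being updated
      AND prev_race.Race_Date < (SELECT cur.Race_Date
                                 FROM Race AS cur
                                 WHERE cur.Race_ID = Runner.Race_ID)
    ORDER BY prev_race.Race_Date DESC
    LIMIT 1
);
""")

rows = conn.execute(
    "SELECT Runner_Id, Prior_Race_ID FROM Runner ORDER BY Runner_Id"
).fetchall()
```

A competitor's first race keeps a NULL Prior_Race_ID, which marks the end of the chain when traversing it.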

Filtering a complex SQL Query

Unit - hmy, scode, hProperty
InsurancePolicy - hmy, hUnit, dtEffective, sStatus
Select MAX(i2.dtEffective) as maxdate, u.hMy, MAX(i2.hmy) as InsuranceId,
i2.sStatus
from unit u
left join InsurancePolicy i2 on i2.hUnit = u.hMy
and i2.sStatus in ('Active', 'Cancelled', 'Expired')
where u.hProperty = 2
Group By u.hmy, i2.sStatus
order by u.hmy
This query will return values for the Insurance Policy with the latest Effective Date (Max(dtEffective)). I added Max(i2.hmy) so if there was more than one Insurance Policy for the latest Effective Date, it will return the one with the highest ID (i2.hmy) in the database.
Suppose there was a Unit that had 3 Insurance Policies attached with the same latest effective date and all have different sStatus'.
The result would look like this:
maxdate UnitID InsuranceID sStatus
1/23/12 2949 1938 'Active'
1/23/12 2949 2343 'Cancelled'
1/23/12 2949 4323 'Expired'
How do I filter the results so that, if there are multiple Insurance Policies with different statuses for the same unit and the same date, we choose the Insurance Policy with the 'Active' status first; if one doesn't exist, choose 'Cancelled'; and if that doesn't exist, choose 'Expired'?
This seems to be a matter of proper ranking of InsurancePolicy's rows and then joining Unit to the set of the former's top-ranked rows:
;WITH ranked AS (
    SELECT
        *,
        rnk = ROW_NUMBER() OVER (
            PARTITION BY hUnit
            ORDER BY dtEffective DESC, sStatus, hmy DESC
        )
    FROM InsurancePolicy
)
SELECT
    i2.dtEffective AS maxdate,
    u.hMy,
    i2.hmy AS InsuranceId,
    i2.sStatus
FROM Unit u
LEFT JOIN ranked i2 ON i2.hUnit = u.hMy AND i2.rnk = 1
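The query above gets the Active, Cancelled, Expired order because those three values happen to sort alphabetically. A sketch of the same ranking with the priority spelled out as a CASE expression follows, run on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). Unit 3000 and policies 5000 to 5002 are made-up rows added to exercise the fallback to 'Cancelled'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InsurancePolicy (
    hmy INTEGER PRIMARY KEY, hUnit INT, dtEffective TEXT, sStatus TEXT
);
INSERT INTO InsurancePolicy VALUES
    (1938, 2949, '2012-01-23', 'Active'),
    (2343, 2949, '2012-01-23', 'Cancelled'),
    (4323, 2949, '2012-01-23', 'Expired'),
    -- hypothetical unit with no Active policy on its latest date
    (5000, 3000, '2012-02-01', 'Expired'),
    (5001, 3000, '2012-02-01', 'Cancelled'),
    (5002, 3000, '2011-06-01', 'Active');
""")

QUERY = """
WITH ranked AS (
    SELECT hmy, hUnit, dtEffective, sStatus,
           ROW_NUMBER() OVER (
               PARTITION BY hUnit
               ORDER BY dtEffective DESC,              -- latest date first
                        CASE sStatus WHEN 'Active'    THEN 1
                                     WHEN 'Cancelled' THEN 2
                                     WHEN 'Expired'   THEN 3
                                     ELSE 4 END,       -- explicit status priority
                        hmy DESC                       -- highest id as final tie-break
           ) AS rnk
    FROM InsurancePolicy
)
SELECT hUnit, hmy, dtEffective, sStatus
FROM ranked
WHERE rnk = 1
ORDER BY hUnit
"""

rows = conn.execute(QUERY).fetchall()
```

Writing the priority as a CASE keeps the result stable if a status is later renamed or a new one is added.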
You could make this work with one SQL statement, but it will be nearly unreadable to your everyday T-SQL developer. I would suggest breaking this query up into a few steps.
First, I would declare a table variable and place all the records that require no manipulation into this table (i.e. Units that do not have multiple statuses for the same date = good records).
Then, get a list of your records that need work done on them (multiple statuses on the same date for the same UnitID) and place them in a second table variable. I would create a "rank" column within this table variable using a CASE expression, as illustrated here:
Pseudocode: CASE WHEN sStatus = 'Active' THEN 1 WHEN sStatus = 'Cancelled' THEN 2 WHEN sStatus = 'Expired' THEN 3 END
Then delete the records ranked 2 or 3 for any unit and date that also has a record ranked 1.
Then delete the records ranked 3 for any unit and date that also has a record ranked 2.
Finally, merge this updated table variable with your table variable containing your "good" records.
It is easy to get sucked into trying to do too much within one SQL statement. Break up the tasks to make it easier for you to develop and more manageable in the future. If you have to edit this SQL in a few years time you will be thanking yourself, not to mention any other developers that may have to take over your code.

Combining SQL results using LINQ

I have a database of company registrants (for a search/directory functionality). We've added a new table to hold "enhanced" information for registrants that pay for this feature (such as advertisements, additional images, logos etc.). Right now the new table just holds the registrant's unique identifier and a few additional fields (paths to images etc.). A user can search for registrants with specific criteria, and the enhanced listings should appear at the top of the list. The results should not show any registrant twice (so if a registrant has an enhanced listing they should only appear in the top "enhanced listing" area). How can I accomplish this?
Left outer join from the old table to the new table.
Prepend to your query's "order by" "case when new_table.id is null then 1 else 0 end"
So if you had this:
select foo, bar from old_table
order by bar, foo;
You'd have this:
select a.foo, a.bar from old_table a
left join new_table b on (a.customer_id = b.customer_id)
order by
    case when b.customer_id is null then 1 else 0 end,
    a.bar, a.foo;
Edit: I left out the "left" from the outer join in the code.
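A small runnable sketch of this ordering, using Python's sqlite3 with invented table and column names (registrant, enhanced, logo_path). It assumes the enhanced table holds at most one row per registrant, which is what keeps the LEFT JOIN from listing anyone twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE registrant (id INTEGER PRIMARY KEY, name TEXT);
-- PRIMARY KEY on registrant_id enforces at most one enhanced row per registrant
CREATE TABLE enhanced (registrant_id INTEGER PRIMARY KEY, logo_path TEXT);
INSERT INTO registrant VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Crest');
INSERT INTO enhanced VALUES (2, '/img/bolt.png');
""")

rows = conn.execute("""
SELECT r.id, r.name, e.registrant_id IS NOT NULL AS is_enhanced
FROM registrant AS r
LEFT JOIN enhanced AS e ON e.registrant_id = r.id
ORDER BY
    CASE WHEN e.registrant_id IS NULL THEN 1 ELSE 0 END,  -- enhanced listings first
    r.name                                                -- then the normal sort order
""").fetchall()
```

Each registrant appears exactly once, with the paying one moved to the top and the rest left in name order.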
If you are using LINQtoSQL and the designer-generated entities, you should have an entity set of related information on your registrant entity -- assuming you have set up the proper foreign key relationship. If you added this later you may need to add this by hand (see here) or delete/re-add your entities to the designer for it to pick up the new relationship. Then your query would be something like:
var registrants = db.Registrants.Where( ... selection criteria here ... );
registrants = registrants.OrderByDescending( r => r.EnhancedData.Count() )
.ThenBy( r => r.Name ); // or normal sort order
Presumably count will be either 0 or 1 so this should put the ones with enhanced data at the top of your result.

Resources