TSQL: Relational Division with Remainder (RDWR) - sql-server

I have a mapping table, CandidatesSkills, which holds the mapping between candidates and the skills they possess. I have another table, JobRequirements, that maps jobs to the skills required for each job.
A candidate can apply to a job if he possesses ALL the required skills for that job. A candidate can have extra skills. Given a CandidateID, I want to find all the jobs that candidate can apply to.
I think this is Relational Division with Remainder in SQL, and there is an article here that explains the exact issue. (Note: the article tries to find all candidates who have ALL skills for a given job. My problem is exactly the opposite: I am trying to find all jobs that match a given candidate's skills.)
Candidate's Skills
Job to required skills mapping
Based on the dataset, the query below should return JobID 2, 3 and 5.
Here is my SQL (based on Peter Larsson's (PESO) solution for RDNR/RDWR):
DECLARE @CandidateID INT = 1
SELECT JobID
FROM
(
SELECT jr.JobID
,cnt=SUM(CASE WHEN jr.SkillID = c.SkillID THEN 1 ELSE 0 END)
,Items=COUNT(*)
FROM dbo.JobRequirements AS jr
CROSS JOIN dbo.CandidatesSkills AS c
WHERE c.CandidateID = @CandidateID
GROUP BY jr.JobID, jr.SkillID
) d
GROUP BY JobID
HAVING SUM(cnt) = MIN(Items)
AND MIN(cnt) >= 0;
However, the query does not return anything. I am trying to find out what is wrong with it.
Here is the SQL Fiddle

Something like:
DECLARE @CandidateID INT = 1;
with cj as
(
select cs.CandidateId,
jr.JobId,
count(*) over (partition by jr.JobId, cs.CandidateId) skillsPossessed,
(select count(*) from JobRequirements where JobId = jr.JobId) skillsRequired
from CandidatesSkills cs
join JobRequirements jr
on cs.SkillId = jr.SkillId
where cs.CandidateId = @CandidateID
)
select distinct cj.CandidateId, cj.JobId
from cj
where cj.skillsPossessed = cj.skillsRequired

In this case, you are doing relational division with multiple divisors. In other words, you are dividing each set of JobRequirements, one per JobID, by the CandidatesSkills of that candidate.
In this case, a LEFT JOIN solution is much simpler:
DECLARE @CandidateID INT = 1;
SELECT jr.JobID
,Skills = COUNT(c.SkillID)
,Requirements = COUNT(*)
FROM dbo.JobRequirements AS jr
LEFT JOIN dbo.CandidatesSkills AS c ON c.SkillID = jr.SkillID
AND c.CandidateID = @CandidateID
GROUP BY jr.JobID
HAVING COUNT(*) = COUNT(c.SkillID);
What this does is left-join the candidate's skills to the requirements. We then simply count up all the Requirements for the JobID, and ensure it is equal to the number of matches.
Another way to write this is
HAVING COUNT(CASE WHEN c.SkillID IS NULL THEN 1 END) = 0;
In other words: the number of non-matches should be zero.
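To see the LEFT JOIN counting trick in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, and the rows are made up for illustration because the question's original sample data is not shown; the helper name jobs_for_candidate is likewise hypothetical.

```python
import sqlite3

# Hypothetical sample data: the question's original tables were not preserved.
CANDIDATE_SKILLS = [(1, 10), (1, 20), (1, 30), (2, 10)]  # (CandidateID, SkillID)
JOB_REQUIREMENTS = [                                     # (JobID, SkillID)
    (1, 10), (1, 40),  # job 1 needs skill 40, which candidate 1 lacks
    (2, 10), (2, 20),  # job 2 is fully covered by candidate 1
    (3, 30),           # job 3 is fully covered by candidate 1
]


def jobs_for_candidate(candidate_id):
    """Return the JobIDs whose every required skill the candidate has."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CandidatesSkills (CandidateID INT, SkillID INT)")
    conn.execute("CREATE TABLE JobRequirements (JobID INT, SkillID INT)")
    conn.executemany("INSERT INTO CandidatesSkills VALUES (?, ?)", CANDIDATE_SKILLS)
    conn.executemany("INSERT INTO JobRequirements VALUES (?, ?)", JOB_REQUIREMENTS)
    rows = conn.execute(
        """
        SELECT jr.JobID
        FROM JobRequirements AS jr
        LEFT JOIN CandidatesSkills AS c
               ON c.SkillID = jr.SkillID
              AND c.CandidateID = ?
        GROUP BY jr.JobID
        -- COUNT(*) counts requirements, COUNT(c.SkillID) counts matched ones
        HAVING COUNT(*) = COUNT(c.SkillID)
        ORDER BY jr.JobID
        """,
        (candidate_id,),
    ).fetchall()
    conn.close()
    return [job_id for (job_id,) in rows]


print(jobs_for_candidate(1))  # [2, 3]
print(jobs_for_candidate(2))  # []
```

Candidate 1 is missing skill 40, so job 1 drops out; candidate 2 has only skill 10, which does not fully cover any job.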

Related

Interview question help on relatively basic JOIN and subqueries

I was asked to:
Print the following sequence of columns for each plant that only blooms in one type of weather.
WEATHER_TYPE
PLANT_NAME
Schema
PLANTS (table name)
PLANT_NAME, string, The name of the plant. This is the primary key.
PLANT_SPECIES, string, The species of the plant.
SEED_DATE, date, The date the seed was planted.
WEATHER (table name)
PLANT_SPECIES, string, The species of the plant.
WEATHER_TYPE, string, The type of weather in which the plant will bloom.
I wrote the script below and tested it against sample input and achieved a desired result. I don't know if this is what is considered a 'printed' result.
Seeking understanding on what I might have missed. How might I make this script 'more efficient' and/or 'better' and/or 'more robust'?
SELECT WEATHER.WEATHER_TYPE, a.PLANT_NAME
FROM (SELECT b.PLANT_NAME, b.PLANT_SPECIES
FROM (SELECT PLANTS.PLANT_NAME, PLANTS.PLANT_SPECIES, PLANTS.SEED_DATE, WEATHER.WEATHER_TYPE
FROM PLANTS JOIN WEATHER
ON PLANTS.PLANT_SPECIES = WEATHER.PLANT_SPECIES) b
GROUP BY b.PLANT_NAME, b.PLANT_SPECIES
HAVING count(*) = 1) a JOIN WEATHER
ON a.PLANT_SPECIES = WEATHER.PLANT_SPECIES
I achieved the expected result in a SQL Server Management Studio window, but not sure if it's the 'printed' result the question-askers are looking for.
I personally consider CTEs easier to read and to debug, compared to nested "Table Expressions", as you have done. I would have done something like:
with
x as (
select p.plant_name, p.plant_species
from plants p
join weather w on w.plant_species = p.plant_species
group by p.plant_name, p.plant_species
having count(*) = 1
)
select x.plant_name, w.weather_type
from x
join weather w on w.plant_species = x.plant_species
I have to agree with The Impaler in regards to the readability and ease of debugging nested table expressions. As another option to the CTE (which is really the better choice), if you really want to nest things without overthinking it you can use a correlated subquery. It's easier to read, though as your result set grows you'll lose efficiency.
SELECT w.weather_type, p.plant_name
FROM plants p
JOIN weather w
ON w.plant_species = p.plant_species
WHERE (SELECT COUNT(1) FROM dbo.weather WHERE plant_species = w.plant_species) = 1
or with grouping...
SELECT w.weather_type, p.plant_name
FROM plants p
JOIN weather w
ON w.plant_species = p.plant_species
WHERE w.plant_species IN (SELECT plant_species FROM dbo.weather GROUP BY plant_species HAVING COUNT(1) = 1)
SELECT w.weather_type, p.plant_name
FROM plants p
JOIN weather w
ON w.plant_species = p.plant_species
WHERE w.weather_type = 'Sunny';
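As a quick sanity check of the "only one weather type" filter discussed in this thread, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The rows are made up for illustration: roses bloom in two weather types, ferns in one, so only the ferns should come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PLANTS (PLANT_NAME TEXT PRIMARY KEY, PLANT_SPECIES TEXT, SEED_DATE TEXT);
    CREATE TABLE WEATHER (PLANT_SPECIES TEXT, WEATHER_TYPE TEXT);
    -- Made-up rows, not from the original question.
    INSERT INTO PLANTS VALUES ('Rose A', 'rose', '2020-01-01'),
                              ('Fern A', 'fern', '2020-02-01'),
                              ('Fern B', 'fern', '2020-03-01');
    INSERT INTO WEATHER VALUES ('rose', 'Sunny'), ('rose', 'Rainy'), ('fern', 'Rainy');
""")

# Keep only species that appear exactly once in WEATHER, then list their plants.
single_weather_plants = conn.execute("""
    SELECT w.WEATHER_TYPE, p.PLANT_NAME
    FROM PLANTS p
    JOIN WEATHER w ON w.PLANT_SPECIES = p.PLANT_SPECIES
    WHERE w.PLANT_SPECIES IN (
        SELECT PLANT_SPECIES
        FROM WEATHER
        GROUP BY PLANT_SPECIES
        HAVING COUNT(*) = 1)
    ORDER BY p.PLANT_NAME
""").fetchall()
conn.close()

print(single_weather_plants)  # [('Rainy', 'Fern A'), ('Rainy', 'Fern B')]
```

Note that a hard-coded filter on a single weather type would not give the same answer in general, since it ignores whether the species also blooms in other weather.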

SQL multiple count

Could you explain how I can get this particular table result?
My four queries, each of which gets one column separately, are below.
I am not sure of the method here: do I nest the last three queries into the first, or do I use a union between the queries?
Bearing in mind that the information in each one doesn't really match, I assume UNION or UNION ALL isn't going to be useful.
Would a derived table be a better method? Sorry, my SQL skills are fairly basic.
I also need to retain the ability to 'tweak' the where clauses, as my admin may decide to exclude certain records later (you IT folks will be used to that!).
So the ability to alter the where clauses would be good in a solution.
Just to make it more annoying for ya ;-)
Query table would need to look a little like this
Company Department Total_B Total_R Total_Ret RushJobs
ACME LSD 2 100 24 3
The four queries that work separately to get each column above are here. I have left in the respective Group By and where clauses. Incidentally, I_Department maps to just Department in the case of the 2nd query.
-- Total B count query from B
Select Company,Department, count(*) as Total_B from B
Group by Company,Department
Order BY Company;
--Select h count from h table
Select count(*) as Total_R, I_Department from H
where L ='re-box'
Group By I_Department
-- Select r count
Select Company,Department,Count (B_Number) AS Total_Ret
from P Inner Join B ON P.Record_Number = B.B_Number
where P.Request_Date >= 'SOMEDATE' and P.Request_Date <= 'SOMEDATERANGE'
Group By Company,Department
-- Select Rush Jobs
Select Company,Department,Count (*) as RushJobs
from Res
Inner Join B on Res.Item_Number = B.B_Number
where Res.Setup_Date >= 'Somedate' and Res.Setup_Date<= 'somedaterange'
and Res.Res_Priority = '1'
Group By Company,Department
So final table
Company Department Total_B Total_R Total_Ret RushJobs
ACME LSD 100 2 4 1
One approach would be to use a common table expression (CTE), aka a WITH statement.
This allows each query to remain independent, so you can easily twerk (I was going to correct that typo but it was just too funny) the where clauses for each, and it combines the results at the end into one result set with the four count columns.
-- Total B count query from B
-- (the CTEs get their own names so they do not clash with table B)
With BCount as (
Select Company,Department, count(*) as Total_B from B
Group by Company,Department),
HCount as (
--Select h count from h table
Select count(*) as Total_R, I_Department from H
where L ='re-box'
Group By I_Department),
RetCount as (-- Select r count
Select Company,Department,Count (B_Number) AS Total_Ret
from P Inner Join B ON P.Record_Number = B.B_Number
where P.Request_Date >= 'SOMEDATE' and P.Request_Date <= 'SOMEDATERANGE'
Group By Company,Department),
RushCount as (-- Select Rush Jobs
Select Company,Department,Count (*) as RushJobs
from Res
Inner Join B on Res.Item_Number = B.B_Number
where Res.Setup_Date >= 'Somedate' and Res.Setup_Date<= 'somedaterange'
and Res.Res_Priority = '1'
Group By Company,Department)
SELECT coalesce(BC.Company, R.Company, RJ.Company) AS Company
, coalesce(BC.Department, HC.I_Department, R.Department, RJ.Department) AS Department
, BC.Total_B, HC.Total_R, R.Total_Ret, RJ.RushJobs
FROM BCount BC
-- the H query has no Company column, so it is matched on department only
FULL OUTER JOIN HCount HC
on BC.Department = HC.I_Department
FULL OUTER JOIN RetCount R
on BC.Company = R.Company
and BC.Department = R.Department
FULL OUTER JOIN RushCount RJ
on BC.Company = RJ.Company
and BC.Department = RJ.Department
ORDER BY Company;
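To make the "one CTE per count, joined at the end" idea concrete, here is a cut-down sketch using Python's sqlite3 module as a stand-in for SQL Server. It keeps only two of the four counts, the rows are made up, and it uses a LEFT JOIN from the base count rather than FULL OUTER JOIN (an assumption made for portability, since older SQLite versions lack full outer joins). The CTE names BCount and RushCount are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (Company TEXT, Department TEXT, B_Number INT);
    CREATE TABLE Res (Item_Number INT, Res_Priority TEXT);
    -- Made-up rows, not from the original question.
    INSERT INTO B VALUES ('ACME', 'LSD', 1), ('ACME', 'LSD', 2), ('ACME', 'HR', 3);
    INSERT INTO Res VALUES (1, '1'), (2, '1'), (3, '2');
""")

# Each CTE keeps its own WHERE clause, so filters can be tweaked independently.
combined_counts = conn.execute("""
    WITH BCount AS (
        SELECT Company, Department, COUNT(*) AS Total_B
        FROM B
        GROUP BY Company, Department),
    RushCount AS (
        SELECT B.Company, B.Department, COUNT(*) AS RushJobs
        FROM Res
        JOIN B ON Res.Item_Number = B.B_Number
        WHERE Res.Res_Priority = '1'
        GROUP BY B.Company, B.Department)
    SELECT bc.Company, bc.Department, bc.Total_B,
           COALESCE(rc.RushJobs, 0) AS RushJobs
    FROM BCount bc
    LEFT JOIN RushCount rc
           ON rc.Company = bc.Company
          AND rc.Department = bc.Department
    ORDER BY bc.Company, bc.Department
""").fetchall()
conn.close()

print(combined_counts)  # [('ACME', 'HR', 1, 0), ('ACME', 'LSD', 2, 2)]
```

The HR department has no priority '1' jobs, so its RushJobs count falls back to 0 through COALESCE instead of disappearing from the result.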

Winrate of opponent vs another opponent

I am trying to find the most efficient way to build a counter warehouse.
I have shortened the tables to just the information needed for the question.
Match Table
MatchId, version, type
MatchParticipants
matchId, playerid, characterid, teamid (only team1 or team2), winner (1 or 0)
10 of these rows per match
Characters
characterid, name
So I have thought of doing a cross join of characters on characters, which gives me all the possible opponents for a given character. Then I would have to do a massive subquery to look into the matches table for matches where the ids were on opposite teams.
Any ideas? Essentially, this is what I see the warehouse table looking like:
character1.characterid, character1.name,
character2.characterid, character2.name,
winrate of character1 over character2
I think that this query should meet your expectations:
SELECT x.[first] AS CharacterId,
c.name AS CharacterName,
x.[second] AS OpponentId,
c1.name AS OpponentName,
1.0 * wins / NULLIF(loses, 0) AS ratio -- avoids integer division and divide-by-zero
FROM
(
SELECT m.CharacterId [first],
m1.CharacterId [second],
SUM(m.winner) AS wins,
SUM(m1.winner) AS loses
FROM MatchParticipants m
INNER JOIN MatchParticipants m1 ON m.MatchId = m1.MatchId AND m.TeamId <> m1.TeamId
GROUP BY
m.CharacterId,
m1.CharacterId
) x
LEFT JOIN Characters c ON x.[first] = c.characterid
LEFT JOIN Characters c1 ON x.[second] = c1.characterid
Is this what you're looking for?
SELECT
*,
100.0 * wins / NULLIF(matches, 0) AS winRate
FROM
(SELECT
teamid,
playerid,
characterid AS opponentID,
COUNT(matchId) AS matches,
SUM(winner) AS wins
FROM
MatchParticipants
GROUP BY
teamid,
playerid,
characterid) w
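For the head-to-head win rate the question describes, here is a minimal sketch of the self-join approach, run on SQLite through Python as a stand-in for SQL Server. The match rows are made up, and the table is reduced to the columns the calculation needs; with real 5-versus-5 data each character is paired with every opposing character in the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MatchParticipants (matchId INT, characterid INT, teamid INT, winner INT)"
)
# Made-up rows: character 1 beats character 2 in two of three matches
# and beats character 3 once.
conn.executemany(
    "INSERT INTO MatchParticipants VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 1), (1, 2, 2, 0),
        (2, 1, 1, 0), (2, 2, 2, 1),
        (3, 1, 1, 1), (3, 2, 2, 0),
        (4, 1, 1, 1), (4, 3, 2, 0),
    ],
)

# Pair every participant with each participant on the other team of the same
# match, then aggregate per (character, opponent) pair.
rows = conn.execute("""
    SELECT m.characterid, o.characterid,
           COUNT(*) AS matches,
           SUM(m.winner) AS wins,
           1.0 * SUM(m.winner) / COUNT(*) AS winrate
    FROM MatchParticipants m
    JOIN MatchParticipants o
      ON o.matchId = m.matchId
     AND o.teamid <> m.teamid
    GROUP BY m.characterid, o.characterid
    ORDER BY m.characterid, o.characterid
""").fetchall()
conn.close()

# Map (character, opponent) -> win rate of character against opponent.
winrates = {(char, opp): rate for char, opp, _matches, _wins, rate in rows}
print(winrates)
```

Dividing wins by matches played against that opponent (rather than by losses) keeps the value between 0 and 1 and avoids dividing by zero for undefeated pairings.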

Getting highest count using SQL Server

I'm currently developing a voting system. I wrote a query to get all votes, but it displays every vote getter rather than only the one with the highest vote count. What should I add to my query to get what I need? Here's the code.
SELECT DISTINCT
b.idnumber, b.candidate_name, semester,
(SELECT COUNT(rslt_ccandidateid) FROM rslt_mstr
WHERE rslt_ccandidateid = idnumber) AS 'numberOfVotes',
b.position, b.program, b.position_categ, b.party_name,
b.school, b.yearLevel, a.hierarchy
FROM
cddt_mstr b
INNER JOIN
Position_mstr a ON a.scposition_name = b.position
WHERE
b.POSITION_CATEG = 'SUPREME COUNCIL CANDIDATES'
AND semester = '2ND SEMESTER A.Y. 2012-2013'
ORDER BY
a.hierarchy, 'numberOfVotes' DESC
Something like the below might work. I am assuming that rslt_ccandidateid is NOT the vote value and that there is some other column, which I am calling [vote_column_name], that contains the vote value. I am also assuming that the vote value is an integer column. Lastly, I am assuming you don't want the sum of all votes, just the highest of all votes.
SELECT DISTINCT b.idnumber, b.candidate_name, semester, (
SELECT MAX([vote_column_name])
FROM rslt_mstr
WHERE rslt_ccandidateid = idnumber
GROUP BY rslt_ccandidateid
) AS 'numberOfVotes', b.position, b.program, b.position_categ, b.party_name, b.school, b.yearLevel, a.hierarchy
FROM cddt_mstr b
INNER JOIN Position_mstr a ON a.scposition_name = b.position
WHERE b.POSITION_CATEG = 'SUPREME COUNCIL CANDIDATES'
AND semester = '2ND SEMESTER A.Y. 2012-2013'
ORDER BY a.hierarchy, 'numberOfVotes' DESC
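If the goal is instead to show only the candidate with the most votes for each position (one reading of the question, and an assumption here), a sketch along these lines may help. It runs on SQLite through Python as a stand-in for SQL Server, with made-up rows and the tables cut down to the columns that matter; each row of rslt_mstr is assumed to be one vote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cddt_mstr (idnumber INT, candidate_name TEXT, position TEXT);
    CREATE TABLE rslt_mstr (rslt_ccandidateid INT);
    -- Made-up rows: Ana gets 2 votes, Ben 1, Cy 1.
    INSERT INTO cddt_mstr VALUES (1, 'Ana', 'President'),
                                 (2, 'Ben', 'President'),
                                 (3, 'Cy',  'Secretary');
    INSERT INTO rslt_mstr VALUES (1), (1), (2), (3);
""")

# Tally votes per candidate, then keep the candidates whose tally equals the
# maximum tally for their position (ties would all be returned).
top_per_position = conn.execute("""
    WITH tally AS (
        SELECT c.idnumber, c.candidate_name, c.position,
               COUNT(v.rslt_ccandidateid) AS numberOfVotes
        FROM cddt_mstr c
        LEFT JOIN rslt_mstr v ON v.rslt_ccandidateid = c.idnumber
        GROUP BY c.idnumber, c.candidate_name, c.position)
    SELECT t.position, t.candidate_name, t.numberOfVotes
    FROM tally t
    WHERE t.numberOfVotes = (SELECT MAX(t2.numberOfVotes)
                             FROM tally t2
                             WHERE t2.position = t.position)
    ORDER BY t.position, t.candidate_name
""").fetchall()
conn.close()

print(top_per_position)  # [('President', 'Ana', 2), ('Secretary', 'Cy', 1)]
```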

How to SELECT DISTINCT Info with TOP 1 Info and an Order By FROM the Top 1 Info

I have 2 tables, that look like:
CustomerInfo(CustomerID, CustomerName)
CustomerReviews(ReviewID, CustomerID, Review, Score)
I want to search reviews for a string and return CustomerInfo.CustomerID and CustomerInfo.CustomerName. However, I only want to show distinct CustomerID and CustomerName along with just one of their CustomerReviews.Reviews and CustomerReviews.Score. I also want to order by the CustomerReviews.Score.
I can't figure out how to do this, since a customer can leave multiple reviews, but I only want a list of customers with their highest scored review.
Any ideas?
This is the greatest-n-per-group problem that has come up dozens of times on Stack Overflow.
Here's a solution that works with a window function:
WITH CustomerCTE AS (
SELECT i.CustomerID, i.CustomerName, r.Review, r.Score, ROW_NUMBER() OVER (PARTITION BY i.CustomerID ORDER BY r.Score DESC) AS RN
FROM CustomerInfo i
INNER JOIN CustomerReviews r ON i.CustomerID = r.CustomerID
WHERE CONTAINS(r.Review, '"search"')
)
SELECT * FROM CustomerCTE WHERE RN = 1
ORDER BY Score;
And here's a solution that works more broadly with RDBMS brands that don't support window functions:
SELECT i.*, r1.*
FROM CustomerInfo i
INNER JOIN CustomerReviews r1 ON i.CustomerID = r1.CustomerID
AND CONTAINS(r1.Review, '"search"')
LEFT OUTER JOIN CustomerReviews r2 ON i.CustomerID = r2.CustomerID
AND CONTAINS(r2.Review, '"search"')
AND (r1.Score < r2.Score OR (r1.Score = r2.Score AND r1.ReviewID < r2.ReviewID))
WHERE r2.CustomerID IS NULL
ORDER BY r1.Score;
I'm showing the CONTAINS() function because you should be using the fulltext search facility in SQL Server, not using LIKE with wildcards.
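Here is a small runnable sketch of the self-join (anti-join) version of greatest-n-per-group, using Python's sqlite3 module as a stand-in. SQLite has no CONTAINS(), so LIKE is substituted purely for the demo, and the rows are made up. For each candidate review r1, the LEFT JOIN looks for a better matching review r2 by the same customer; r1 is kept only when no such r2 exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerInfo (CustomerID INT, CustomerName TEXT);
    CREATE TABLE CustomerReviews (ReviewID INT, CustomerID INT, Review TEXT, Score INT);
    -- Made-up rows: review 3 has the top score but does not match the search.
    INSERT INTO CustomerInfo VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO CustomerReviews VALUES
        (1, 1, 'great search tool', 3),
        (2, 1, 'search is ok',      5),
        (3, 1, 'no match here',     9),
        (4, 2, 'search works',      4);
""")

best_matching_reviews = conn.execute("""
    SELECT i.CustomerID, i.CustomerName, r1.Review, r1.Score
    FROM CustomerInfo i
    JOIN CustomerReviews r1
      ON r1.CustomerID = i.CustomerID
     AND r1.Review LIKE '%search%'
    LEFT JOIN CustomerReviews r2
      ON r2.CustomerID = i.CustomerID
     AND r2.Review LIKE '%search%'
     AND (r1.Score < r2.Score
          OR (r1.Score = r2.Score AND r1.ReviewID < r2.ReviewID))
    WHERE r2.ReviewID IS NULL      -- no better matching review exists
    ORDER BY r1.Score DESC
""").fetchall()
conn.close()

print(best_matching_reviews)
# [(1, 'Ann', 'search is ok', 5), (2, 'Bob', 'search works', 4)]
```

The search filter has to be applied to both r1 and r2; otherwise a higher-scored review that does not match the search would knock out every matching review for that customer.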
I voted for Bill Karwin's answer, but I thought I'd throw out another option.
It uses a correlated subquery, which can often incur performance problems with large data sets, so use with caution. I think the only upside is that the query is easier to immediately understand.
select *
from [CustomerReviews] r
where [ReviewID] =
(
select top 1 [ReviewID]
from [CustomerReviews] rInner
where rInner.CustomerID = r.CustomerID
order by Score desc
)
order by Score desc
I didn't add the string search filter, but that can be easily added.
I think this should do it
select ci.CustomerID, ci.CustomerName, cr.Review, cr.Score
from CustomerInfo ci cross apply
(select top 1 *
from CustomerReviews r
where r.CustomerID = ci.CustomerID
and r.Review like '%search%'
order by r.Score desc) cr
order by cr.Score

Resources