Query to show which games which friends don't have - database

I have three tables set up in Access. I want to make a query that shows which games someone doesn't have in common with me.
I tried using an unmatched query, but that didn't work since each person has at least one game in common with me.
I guess I'm unsure how to handle this. The GameTimePlayed table basically has the opposite of the information I want to query, so is it possible to query that and add a "Not" conditional to "GameName" or something?
This is for a final project for class, and isn't due for about another month. I don't expect anyone to answer this for me, but even just a point in the right direction would be greatly appreciated. Everything I've tried to find so far is basically about unmatched queries, which did not work for me.
--EDIT TO PROVIDE MORE INFO--
I have all of the games in FavoriteGames. However, not all of my friends (PersonID) have all of my FavoriteGames. I'd like a query to show a record of FirstName, LastName, GameName, for each PersonID, for each GameName that he/she does not have.
Expected Behavior Example: PersonID 10 only has one GameName in common with me. The query should return five records for PersonID 10
(every game except Rocket League).
Sample Data:
tbl_FavoriteGames
tbl_FriendsWithGame
tbl_GameTimePlayed
GameName is the Primary Key for tbl_FavoriteGames
PersonID is the Primary Key for tbl_FriendsWithGame
PersonID, GameName Foreign Keys form a Composite Primary Key for tbl_GameTimePlayed
This is the closest I have gotten so far (still way off though) in that it removes the specified GameName:
SELECT *
FROM tbl_GameTimePlayed
WHERE NOT EXISTS
(
SELECT *
FROM tbl_FriendsWithGame
WHERE tbl_GameTimePlayed.PersonID = tbl_FriendsWithGame.PersonID
AND tbl_GameTimePlayed.GameName = tbl_FavoriteGames.GameName
);
It prompts me to enter a GameName (no idea why). When I enter a GameName, it returns all records that don't have that specific GameName.
This returns 6 games for each person, whether or not the person actually has that game. Could be useful since it contains the people/games that aren't in common.
SELECT PersonID, GameName
FROM tbl_FriendsWithGame, tbl_FavoriteGames
WHERE EXISTS (SELECT PersonID FROM tbl_GameTimePlayed WHERE GameName = tbl_GameTimePlayed.GameName);
I tried "WHERE NOT EXISTS" and that returned 0 results.
--SECOND EDIT: SOLVED!!--
I took a fresh look at the problem today, and figured it out! I used the code mentioned above to query (qry_AllPeopleAllGames) a list of all of the games, for all of the people (so 6 entries per person):
SELECT PersonID, GameName
FROM tbl_FriendsWithGame, tbl_FavoriteGames
WHERE EXISTS (SELECT PersonID FROM tbl_GameTimePlayed WHERE GameName = tbl_GameTimePlayed.GameName);
Then, I made another query that compared the qry_AllPeopleAllGames list to my tbl_GameTimePlayed (which is the list of people, games they actually own, and hours played) and spit out a list of FirstName & LastInitial and GameName that don't exist in the real list:
SELECT [tbl_FriendsWithGame]![FirstName] & " " & [tbl_FriendsWithGame]![LastInitial] AS FullName, GameName
FROM qry_AllPeopleAllGames INNER JOIN tbl_FriendsWithGame ON qry_AllPeopleAllGames.PersonID = tbl_FriendsWithGame.PersonID
WHERE ((NOT Exists (SELECT PersonID, GameName
FROM tbl_GameTimePlayed
WHERE qry_AllPeopleAllGames.PersonID = tbl_GameTimePlayed.PersonID AND qry_AllPeopleAllGames.GameName = tbl_GameTimePlayed.GameName
)));
NOTE: The first part of the SELECT is not needed; I just used it for easier viewing in my actual query results (showing first name/last initial in one field).
I'm really excited that I figured this out! I'm sure there are better/more efficient ways to do this, and if you want to share, please let me know!

I included this in my initial post, but I'll post this as the answer as well.
I took a fresh look at the problem today, and figured it out! Last night, while testing random possible solutions, I accidentally made a query that lists all of the games for all of the people (so 6 entries per person). Today, I used it as part of the solution, qry_AllPeopleAllGames:
SELECT PersonID, GameName
FROM tbl_FriendsWithGame, tbl_FavoriteGames
WHERE EXISTS (SELECT PersonID FROM tbl_GameTimePlayed WHERE GameName = tbl_GameTimePlayed.GameName);
Then, I made another query that compared the qry_AllPeopleAllGames list to my tbl_GameTimePlayed, which is the real list of people/games/hours played.
It returns the FirstName&LastInitial and the GameName for each PersonID/GameName combo that doesn't appear in the tbl_GameTimePlayed table. Here is the code:
SELECT [tbl_FriendsWithGame]![FirstName] & " " & [tbl_FriendsWithGame]![LastInitial] AS FullName, GameName
FROM qry_AllPeopleAllGames INNER JOIN tbl_FriendsWithGame ON qry_AllPeopleAllGames.PersonID = tbl_FriendsWithGame.PersonID
WHERE ((NOT Exists (SELECT PersonID, GameName
FROM tbl_GameTimePlayed
WHERE qry_AllPeopleAllGames.PersonID = tbl_GameTimePlayed.PersonID AND qry_AllPeopleAllGames.GameName = tbl_GameTimePlayed.GameName
)));
NOTE: The first part of the SELECT is not needed, I just used it for easier viewing in my actual query results (showing first name/last initial in one field).
I'm really excited that I figured this out! I'm sure there are better/more efficient ways to do this, and if you want to share, please let me know!

You need a dataset of all possible pairs of friends/games in order to determine which games each friend does not have. Do you have a tbl_Friends? Consider:
Query1:
SELECT tblFriends.ID, tbl_FavoriteGames.ID FROM tblFriends, tbl_FavoriteGames;
That is a Cartesian query: without a JOIN clause, every record of each table is paired with every record of the other table.
Query2:
SELECT Query1.tblFriends.ID, Query1.tbl_FavoriteGames.ID
FROM tbl_FriendsWithGame RIGHT JOIN Query1 ON (tbl_FriendsWithGame.GameID = Query1.tbl_FavoriteGames.ID) AND (tbl_FriendsWithGame.FriendID = Query1.tblFriends.ID) WHERE tbl_FriendsWithGame.GameID IS NULL;
Or if you don't have tbl_Friends
SELECT DISTINCT tbl_FriendsWithGame.FriendID, tbl_FavoriteGames.ID
FROM tbl_FavoriteGames, tbl_FriendsWithGame;
Then adjust Query2.
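To make the "all pairs, then remove the pairs that exist" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Access here, the sample rows and first names are made up, and the table layout follows the question (tbl_FriendsWithGame holds people, tbl_GameTimePlayed holds who owns what), so treat it as an illustration rather than a drop-in Access query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_FavoriteGames (GameName TEXT PRIMARY KEY);
CREATE TABLE tbl_FriendsWithGame (PersonID INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE tbl_GameTimePlayed (
    PersonID INTEGER,
    GameName TEXT,
    PRIMARY KEY (PersonID, GameName)
);
-- Made-up sample data (assumption, not the asker's real rows)
INSERT INTO tbl_FavoriteGames VALUES ('Rocket League'), ('Portal'), ('Terraria');
INSERT INTO tbl_FriendsWithGame VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO tbl_GameTimePlayed VALUES
    (10, 'Rocket League'), (11, 'Portal'), (11, 'Terraria');
""")

# Step 1: CROSS JOIN builds every person/game pair.
# Step 2: LEFT JOIN to the ownership table and keep only the unmatched pairs.
missing = conn.execute("""
SELECT f.FirstName, g.GameName
FROM tbl_FriendsWithGame AS f
CROSS JOIN tbl_FavoriteGames AS g
LEFT JOIN tbl_GameTimePlayed AS p
       ON p.PersonID = f.PersonID AND p.GameName = g.GameName
WHERE p.PersonID IS NULL
ORDER BY f.PersonID, g.GameName
""").fetchall()

print(missing)
```

With this sample data, Ann (who only owns Rocket League) comes back with the two games she lacks, and Bob comes back with Rocket League only.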

Related

Why doesn't this "not in" clause do what I expect?

I'm writing what I think should be a simple query using the "not in" operator, and it's not doing what I expect.
Background:
I have two tables, Contact and Company.
Contact includes columns ContactID (person's identity) and CompanyID (which company they work for)
CompanyID values are expected to be equivalent to the CompanyIDs in the Company table
I want to write a query that checks how many people from the Contact table that have an "invalid" CompanyID (i.e., listed as working for a Company that isn't in the Company table)
I have a working query that does this:
select
count(ContactID)
from
Contact left join Company on Contact.CompanyID = Company.CompanyID
where
Company.CompanyID is null;
This query returns the value 2725538, which I believe to be the correct answer (I've done some simple "show me the top 10 rows" debugging, and it appears to be counting the right rows).
I wrote a second query which I expected to return the same result:
select
count(ContactID)
from
Contact
where
CompanyID not in
(select
CompanyID
from
Company)
However, this query instead returns 0.
To help me debug this, I checked two additional queries.
First, I tried commenting out the WHERE clause, which should give me all of the ContactIDs, regardless of whether they work for an invalid company:
select
count(ContactID)
from
Contact
This query returns 29722995.
Second, I tried removing the NOT from my query, which should give me the inverse of what I'm looking for (i.e., it should count the Contacts who work for valid companies):
select
count(ContactID)
from
Contact
where
CompanyID in
(select
CompanyID
from
Company)
This query returns 26997457.
Notably, these two numbers differ by exactly 2725538, the number returned by the first, working query. This is what I would expect if my second query was working. The total number of Contacts, minus the number of Contacts whose CompanyIDs are in the Company table, should equal the number of Contacts whose CompanyIDs are not in the Company table, shouldn't it?
So, why is the "not in" version of the query returning 0 instead of the correct answer?
The most likely cause is a NULL CompanyID in the Company table. NOT IN returns no rows when the subquery result contains a NULL, because a comparison with NULL is never true (it evaluates to UNKNOWN).
Try the following:
select
count(ContactID)
from
Contact
where
CompanyID not in
(select
ISNULL(CompanyID,'')
from
Company)
you can see the example in db<>fiddle here.
Please find more details HERE.
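The NULL behaviour is easy to reproduce on a tiny dataset. The sketch below uses Python's sqlite3 module with made-up rows (SQLite is an assumption here; the question does not say which engine is in use, but the three-valued logic for NOT IN is standard SQL). It compares plain NOT IN, NOT IN with NULLs filtered out of the subquery, and NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (CompanyID INTEGER);
CREATE TABLE Contact (ContactID INTEGER PRIMARY KEY, CompanyID INTEGER);
-- One Company row has a NULL CompanyID; Contact 102 points at a missing company.
INSERT INTO Company VALUES (1), (2), (NULL);
INSERT INTO Contact VALUES (100, 1), (101, 2), (102, 99);
""")

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

# 99 NOT IN (1, 2, NULL) evaluates to UNKNOWN, so the row is filtered out.
not_in = count("""
    SELECT COUNT(ContactID) FROM Contact
    WHERE CompanyID NOT IN (SELECT CompanyID FROM Company)
""")

# Removing NULLs from the subquery restores the expected result.
not_in_filtered = count("""
    SELECT COUNT(ContactID) FROM Contact
    WHERE CompanyID NOT IN
        (SELECT CompanyID FROM Company WHERE CompanyID IS NOT NULL)
""")

# NOT EXISTS is not affected by NULLs on the Company side.
not_exists = count("""
    SELECT COUNT(ContactID) FROM Contact AS c
    WHERE NOT EXISTS
        (SELECT 1 FROM Company AS co WHERE co.CompanyID = c.CompanyID)
""")

print(not_in, not_in_filtered, not_exists)
```

Only the plain NOT IN version returns 0; the other two count the one contact with an invalid company.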

How to get rid of duplicates with T-SQL

Hi I have a login table that has some duplicated username.
Yes I know I should have put a constraint on it, but it's a bit too late for that now!
So essentially what I want to do is to first identify the duplicates. I can't just delete them since I can't be too sure which account is the correct one. The accounts have the same username and both of them have roughly the same information with a few small variances.
Is there any way to efficiently script it so that I can add "_duplicate" to only one of the accounts per duplicate?
You can use ROW_NUMBER with a PARTITION BY in the OVER() clause to find the duplicates and an updateable CTE to change the values accordingly:
DECLARE @dummyTable TABLE(ID INT IDENTITY, UserName VARCHAR(100));
INSERT INTO @dummyTable VALUES('Peter'),('Tom'),('Jane'),('Victoria')
,('Peter') ,('Jane')
,('Peter');
WITH UpdateableCTE AS
(
SELECT t.UserName AS OldValue
,t.UserName + CASE WHEN ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY ID)=1 THEN '' ELSE '_duplicate' END AS NewValue
FROM @dummyTable AS t
)
UPDATE UpdateableCTE SET OldValue = NewValue;
SELECT * FROM @dummyTable;
The result
ID UserName
1 Peter
2 Tom
3 Jane
4 Victoria
5 Peter_duplicate
6 Jane_duplicate
7 Peter_duplicate
You might include ROW_NUMBER() as another column to find each duplicate's ordinal. If you have a sort clause that numbers the earliest (or most current) row with 1, it should be easy to find and correct the duplicates.
Once you've cleaned this mess, you should ensure not to get new dups. But you know this already :-D
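If you want to try the ROW_NUMBER idea without a SQL Server instance, here is a rough equivalent using Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later), and the table name logins is made up. SQLite has no updateable CTEs, so this sketch swaps in an UPDATE with an IN subquery; the ranking logic is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (ID INTEGER PRIMARY KEY, UserName TEXT);
INSERT INTO logins (UserName) VALUES
    ('Peter'), ('Tom'), ('Jane'), ('Victoria'), ('Peter'), ('Jane'), ('Peter');
""")

# Rank the rows inside each UserName group by ID; every row after the
# first (rn > 1) is treated as a duplicate and gets the suffix.
conn.execute("""
UPDATE logins
SET UserName = UserName || '_duplicate'
WHERE ID IN (
    SELECT ID
    FROM (
        SELECT ID,
               ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY ID) AS rn
        FROM logins
    ) AS ranked
    WHERE rn > 1
)
""")

rows = conn.execute("SELECT ID, UserName FROM logins ORDER BY ID").fetchall()
for row in rows:
    print(row)
```

The lowest ID in each group keeps the original name, which matches the result table shown above.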
There is no easy way to get rid of this nightmare. Some manual actions required.
First identify duplicates.
select * from dbo.users
where username in
(select username from dbo.users
group by username
having count(*) > 1)
Next, identify "useless" users (for example, those who registered but never placed an order).
Rerun the query above. From this list, find duplicates that are the same person (by email, for example) and combine them into a single record. If they did something useful previously (for example, placed orders), first reassign those orders to the user that survives, then remove the others.
Continue with other criteria until you get rid of all the duplicates.
Then set a unique constraint on the username field. It is also a good idea to set a unique constraint on the email field.
Again, it is not easy and not automatic.
In this case, where the duplicates and the original names have some variance, it is practically impossible to select the non-duplicate rows, since you don't know which is real and which is the duplicate.
I think the best thing to do is to correct your data and then fix the source of these slightly varying duplicates.

Multi join issue

EDIT: Thanks for all the input, and sorry for the late reply. I was away over the weekend without internet access. I realized from the answers that I needed to provide more information so people could understand the problem more thoroughly, so here it comes:
I am migrating an old database design to a new design. The old one is a mess and very confusing (I wasn't involved in the old design). I've attached a picture of the relevant part of the old design below:
The table called Item will exist in the new design as well, and it has all the columns that I need in the new design except one, and this is where my problem begins. I need the column I named 'neededProp' to be associated with each newly migrated row from Item (by "associated" I mean as a column in the new Item table).
So for a particular eid in table Environment there can be n entries in table Item. The "corresponding" set exists in table Room. The only way to know which rows in Item and Room are associated is with the help of the columns "itemId" and "objectId" in the respective tables. For example, for a particular eid there might be 100 entries in Item and Room, and their "itemId" and "objectId" can be values from 1 to 100, so that column is only unique for a particular eid (or baseSeq, as it is called in table BaseFile).
Basically, the tables Environment and BaseFile resemble each other, and the tables Item and Room resemble each other. The difference is that some tables lack some columns and others may have some extra. I have no idea why it was designed like this in the first place.
My question is whether someone can help me create a query to find the proper "neededProp" for each row in the Item table, so I can get that data into the new design.
OLD PART: This might be a trivial question, but I can't get it to work as I want. I want to join a few tables as in the SQL statement below. If I start like this and run this query
select * from Environment e
join items ei on e.eid = ei.eid
I get like 400000 rows which is what I want. However if I add one more line so it looks like this:
select * from Environment e
join items ei on e.eid= ei.eid
left join Room r on e.roomnr = r.roomobjectnr
I get an insane amount of rows so there must be some multiplication going on. I want to get the same amount of rows ( like 400000 in this case ) even after joining the third table. Is that possible somehow? Maybe like creating a temporary view with the first 2 rows.
I am using MSSQL server.
So without knowing what data you have in your second query it's very difficult to say exactly how to write this out, and you're likely having a problem where there's an additional column that you are joining to in Rooms that perhaps you have forgotten such as something indicating a facility or hallway perhaps where you have multiple 'Room 1' entries as an example.
However, to answer your question regarding another way to write this out without using a temp table I've crufted up the below as an example of using a common table expression which will only return one record per source row.
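;WITH cte_EnvironmentItems AS (
-- If both tables share column names (such as eid), list the columns explicitly;
-- a CTE cannot expose two columns with the same name
SELECT *
FROM Environment E
INNER JOIN Items I ON I.eid = E.eid
), cte_RankedRoom AS (
SELECT *
-- Number the rows within each roomobjectnr so every room keeps exactly one row (RN = 1)
,ROW_NUMBER() OVER (PARTITION BY R.roomobjectnr ORDER BY R.UpdateDate DESC) [RN]
FROM Room R
)
SELECT *
FROM cte_EnvironmentItems E
LEFT JOIN cte_RankedRoom R ON E.roomnr = R.roomobjectnr
AND R.RN = 1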
By the way, do you need any columns from the Room table? If not:
select * from Environment e
join items ei on e.eid= ei.eid
where e.roomnr in (select r.roomobjectnr from Room r )
Otherwise:
select * from Environment e
join items ei on e.eid= ei.eid
left join (select distinct roomobjectnr from Room) r on e.roomnr = r.roomobjectnr
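The row multiplication is easy to see on a toy dataset. The sketch below uses Python's sqlite3 module with made-up rows (SQLite instead of MSSQL, and only the columns named in the question), and counts the rows for the two-table join, the naive three-table join, and the join against a de-duplicated Room list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Environment (eid INTEGER PRIMARY KEY, roomnr INTEGER);
CREATE TABLE Items (itemId INTEGER, eid INTEGER);
CREATE TABLE Room (roomobjectnr INTEGER, objectId INTEGER);
-- Made-up data: roomobjectnr 7 appears three times in Room.
INSERT INTO Environment VALUES (1, 7), (2, 8);
INSERT INTO Items VALUES (1, 1), (2, 1), (1, 2);
INSERT INTO Room VALUES (7, 1), (7, 2), (7, 3), (8, 1);
""")

def count(sql):
    """Return the number of rows a query produces."""
    return len(conn.execute(sql).fetchall())

base = count("""
    SELECT * FROM Environment e
    JOIN Items ei ON e.eid = ei.eid
""")

# Each Environment/Items row is repeated once per matching Room row.
naive = count("""
    SELECT * FROM Environment e
    JOIN Items ei ON e.eid = ei.eid
    LEFT JOIN Room r ON e.roomnr = r.roomobjectnr
""")

# Joining against one row per roomobjectnr keeps the original row count.
deduplicated = count("""
    SELECT * FROM Environment e
    JOIN Items ei ON e.eid = ei.eid
    LEFT JOIN (SELECT DISTINCT roomobjectnr FROM Room) r
           ON e.roomnr = r.roomobjectnr
""")

print(base, naive, deduplicated)
```

Here the two-table join gives 3 rows, the naive join gives 7, and the de-duplicated join is back to 3. If you need actual Room columns, the join condition has to identify a single Room row, which in the asker's schema probably means matching objectId to itemId as well.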

Database tables: One-to-many of different types

Due to non-disclosure at my work, I have created an analogy of the situation. Please try to focus on the problem and not "Why don't you rename this table, merge those tables etc.", because the actual problem is much more complex.
Here's the deal:
Let's say I have an "Employee Pay Rise" record that has to be approved.
There is a table with single "Users".
There are tables that group Users together, for example, "Managers", "Executives", "Payroll", "Finance". These groupings are different types with different properties.
When creating a "PayRise" record, the user who is creating the record also selects both a number of these groups (managers, executives etc) and/or single users who can 'approve' the pay rise.
What is the best way to relate a single "EmployeePayRise" record to 0 or more user records, and 0 or more of each of the groupings?
I would assume that the users are linked to the groups? If so in this case I would just link the employeePayRise record to one user that it applies to and the user that can approve. So basically you'd have two columns representing this. The EmployeePayRise.employeeId and EmployeePayRise.approvalById columns. If you need to get to groups, you'd join the EmployeePayRise.employeeId = Employee.id records. Keep it simple without over-complicating your design.
My first thought was to create a table that relates individual approvers to pay rise rows.
create table pay_rise_approvers (
pay_rise_id integer not null references some_other_pay_rise_table (pay_rise_id),
pay_rise_approver_id integer not null references users (user_id),
primary key (pay_rise_id, pay_rise_approver_id)
);
You can't have good foreign keys that reference managers sometimes, and reference payroll some other times. Users seems the logical target for the foreign key.
If the person creating the pay rise rows (not shown) chooses managers, then the user interface is responsible for inserting one row per manager into this table. That part's easy.
A person that appears in more than one group might be a problem. I can imagine a vice-president appearing in both "Executive" and "Finance" groups. I don't think that's particularly hard to handle, but it does require some forethought. Suppose the person who entered the data changed her mind, and decided to remove all the executives from the table. Should an executive who's also in finance be removed?
Another problem is that there's a pretty good chance that not every user should be allowed to approve a pay rise. I'd give some thought to that before implementing any solution.
I know it looks ugly, but I think sometimes the solution can be to store the table name in the table and use a union query:
create table approve_pay_rise (
rise_proposal varchar2(10) -- foreign key to payrise table
, approver varchar2(10) -- key of record in table named in other_table
, other_table varchar2(15) );
insert into approve_pay_rise values ('prop000001', 'e0009999', 'USERS');
insert into approve_pay_rise values ('prop000001', 'm0002200', 'MANAGERS');
Then, in code, use either a case statement, repeated statements for each other_table value (select ... where other_table = '' .. select ... where other_table = ''), or a union select.
I have to admit I shudder when I encounter it, and I'll now go wash my hands after typing a recommendation to do it, but it works.
Sounds like you might need two tables ("ApprovalUsers" and "ApprovalGroups"). The SELECT statement(s) would be a UNION of the UserIDs from "ApprovalUsers" and the UserIDs from any groups of users listed in "ApprovalGroups" for the PayRiseId.
SELECT UserID
INTO #TempApprovers
FROM ApprovalUsers
WHERE PayRiseId = 12345
IF EXISTS (SELECT GroupName FROM ApprovalGroups WHERE GroupName = 'Executives' AND PayRiseId = 12345)
BEGIN
-- #TempApprovers already exists at this point, so append with INSERT rather than SELECT ... INTO
INSERT INTO #TempApprovers (UserID)
SELECT UserID
FROM Executives
END
....
EDIT: this would/could duplicate UserIds, so you would probably want to GROUP BY UserID (i.e. SELECT UserID FROM #TempApprovers GROUP BY UserID)
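As a rough illustration of the two-table idea, here is a sketch using Python's sqlite3 module. It swaps the temp table for a single UNION query, which also removes duplicate UserIDs on its own. The column layout, the sample rows, and the choice of one membership table per group (Executives, Finance) are assumptions for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ApprovalUsers (PayRiseId INTEGER, UserID INTEGER);
CREATE TABLE ApprovalGroups (PayRiseId INTEGER, GroupName TEXT);
CREATE TABLE Executives (UserID INTEGER);
CREATE TABLE Finance (UserID INTEGER);
-- Made-up data: user 2 is both a direct approver and an executive.
INSERT INTO ApprovalUsers VALUES (12345, 1), (12345, 2), (99999, 7);
INSERT INTO ApprovalGroups VALUES (12345, 'Executives');
INSERT INTO Executives VALUES (2), (3);
INSERT INTO Finance VALUES (3), (4);
""")

# UNION (without ALL) removes duplicates, so user 2 is listed once.
# A group's members are included only if that group was selected
# for the pay rise in ApprovalGroups.
sql = """
SELECT UserID FROM ApprovalUsers WHERE PayRiseId = :pid
UNION
SELECT e.UserID FROM Executives AS e
WHERE EXISTS (SELECT 1 FROM ApprovalGroups AS g
              WHERE g.PayRiseId = :pid AND g.GroupName = 'Executives')
UNION
SELECT f.UserID FROM Finance AS f
WHERE EXISTS (SELECT 1 FROM ApprovalGroups AS g
              WHERE g.PayRiseId = :pid AND g.GroupName = 'Finance')
ORDER BY UserID
"""
approvers = [row[0] for row in conn.execute(sql, {"pid": 12345})]
print(approvers)
```

For pay rise 12345 this returns users 1, 2 and 3: Finance was not selected, so user 4 is left out.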

SQL Server insert if not exists best practice [closed]

I have a Competitions results table which holds team member's names and their ranking on one hand.
On the other hand I need to maintain a table of unique competitors names:
CREATE TABLE Competitors (cName nvarchar(64) primary key)
Now I have some 200,000 results in the 1st table and when the competitors table is empty I can perform this:
INSERT INTO Competitors SELECT DISTINCT Name FROM CompResults
And the query only takes some 5 seconds to insert about 11,000 names.
So far this is not a critical application so I can consider truncate the Competitors table once a month, when I receive the new competition results with some 10,000 rows.
But what is the best practice when new results are added, with new AND existing competitors? I don't want to truncate existing competitors table
I need to perform INSERT statement for new competitors only and do nothing if they exists.
Semantically you are asking "insert Competitors where doesn't already exist":
INSERT Competitors (cName)
SELECT DISTINCT Name
FROM CompResults cr
WHERE
NOT EXISTS (SELECT * FROM Competitors c
WHERE cr.Name = c.cName)
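A quick way to convince yourself that this pattern is safe to rerun is to execute it twice and compare row counts. The sketch below does that with Python's sqlite3 module (SQLite standing in for SQL Server; the sample names and the Ranking column are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Competitors (cName TEXT PRIMARY KEY);
CREATE TABLE CompResults (Name TEXT, Ranking INTEGER);
-- Made-up data: Alice already exists, Bob appears twice in the results.
INSERT INTO Competitors VALUES ('Alice');
INSERT INTO CompResults VALUES ('Alice', 1), ('Bob', 2), ('Bob', 5), ('Carol', 3);
""")

INSERT_NEW = """
INSERT INTO Competitors (cName)
SELECT DISTINCT cr.Name
FROM CompResults AS cr
WHERE NOT EXISTS (SELECT 1 FROM Competitors AS c WHERE c.cName = cr.Name)
"""

def competitor_count():
    return conn.execute("SELECT COUNT(*) FROM Competitors").fetchone()[0]

conn.execute(INSERT_NEW)
after_first = competitor_count()

# Running the same statement again finds nothing new to insert,
# so it neither adds rows nor violates the primary key.
conn.execute(INSERT_NEW)
after_second = competitor_count()

names = [r[0] for r in conn.execute("SELECT cName FROM Competitors ORDER BY cName")]
print(after_first, after_second, names)
```

The first run adds Bob and Carol; the second run changes nothing.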
Another option is to left join your Results table with your existing competitors table and find the new competitors by filtering the distinct records that don't match in the join:
INSERT Competitors (cName)
SELECT DISTINCT cr.Name
FROM CompResults cr left join
Competitors c on cr.Name = c.cName
where c.cName is null
The newer MERGE syntax also offers a compact, elegant and efficient way to do that:
MERGE INTO Competitors AS Target
USING (SELECT DISTINCT Name FROM CompResults) AS Source ON Target.cName = Source.Name
WHEN NOT MATCHED THEN
INSERT (cName) VALUES (Source.Name);
Don't know why anyone else hasn't said this yet;
NORMALISE.
You've got a table that models competitions? Competitions are made up of Competitors? You need a distinct list of Competitors in one or more Competitions......
You should have the following tables.....
CREATE TABLE Competitor (
[CompetitorID] INT IDENTITY(1,1) PRIMARY KEY
, [CompetitorName] NVARCHAR(255)
)
CREATE TABLE Competition (
[CompetitionID] INT IDENTITY(1,1) PRIMARY KEY
, [CompetitionName] NVARCHAR(255)
)
CREATE TABLE CompetitionCompetitors (
[CompetitionID] INT
, [CompetitorID] INT
, [Score] INT
, PRIMARY KEY (
[CompetitionID]
, [CompetitorID]
)
)
With Constraints on CompetitionCompetitors.CompetitionID and CompetitorID pointing at the other tables.
With this kind of table structure -- your keys are all simple INTS -- there doesn't seem to be a good NATURAL KEY that would fit the model so I think a SURROGATE KEY is a good fit here.
So if you had this, then to get the distinct list of competitors in a particular competition you can issue a query like this:
DECLARE @CompetitionName VARCHAR(50) SET @CompetitionName = 'London Marathon'
SELECT
p.[CompetitorName] AS [CompetitorName]
FROM
Competitor AS p
WHERE
EXISTS (
SELECT 1
FROM
CompetitionCompetitors AS cc
JOIN Competition AS c ON c.[CompetitionID] = cc.[CompetitionID]
WHERE
cc.[CompetitorID] = p.[CompetitorID]
AND c.[CompetitionName] = @CompetitionName
)
And if you wanted the score for each competition a competitor is in:
SELECT
p.[CompetitorName]
, c.[CompetitionName]
, cc.[Score]
FROM
Competitor AS p
JOIN CompetitionCompetitors AS cc ON cc.[CompetitorID] = p.[CompetitorID]
JOIN Competition AS c ON c.[CompetitionID] = cc.[CompetitionID]
And when you have a new competition with new competitors then you simply check which ones already exist in the Competitors table. If they already exist then you don't insert into Competitor for those Competitors and do insert for the new ones.
Then you insert the new Competition in Competition and finally you just make all the links in CompetitionCompetitors.
You will need to join the tables together and get a list of unique competitors that don't already exist in Competitors.
This will insert unique records.
INSERT Competitors (cName)
SELECT DISTINCT Name
FROM CompResults cr LEFT JOIN Competitors c ON cr.Name = c.cName
WHERE c.cName IS NULL
There may come a time when this insert needs to be done quickly without being able to wait for the selection of unique names. In that case, you could insert the unique names into a temporary table, and then use that temporary table to insert into your real table. This works well because all the processing happens at the time you are inserting into a temporary table, so it doesn't affect your real table. Then when you have all the processing finished, you do a quick insert into the real table. I might even wrap the last part, where you insert into the real table, inside a transaction.
The answers above which talk about normalizing are great! But what if you find yourself in a position like me where you're not allowed to touch the database schema or structure as it stands? Eg, the DBA's are 'gods' and all suggested revisions go to /dev/null?
In that respect, I feel like this has been answered with this Stack Overflow posting too in regards to all the users above giving code samples.
I'm reposting the code from INSERT VALUES WHERE NOT EXISTS which helped me the most since I can't alter any underlying database tables:
INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData)
SELECT Id, guidd, TimeAdded, ExtraData
FROM #table2
WHERE NOT EXISTS (Select Id, guidd From #table1 WHERE #table1.id = #table2.id)
-----------------------------------
MERGE #table1 as [Target]
USING (select Id, guidd, TimeAdded, ExtraData from #table2) as [Source]
(id, guidd, TimeAdded, ExtraData)
on [Target].id =[Source].id
WHEN NOT MATCHED THEN
INSERT (id, guidd, TimeAdded, ExtraData)
VALUES ([Source].id, [Source].guidd, [Source].TimeAdded, [Source].ExtraData);
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT id, guidd, TimeAdded, ExtraData from #table2
EXCEPT
SELECT id, guidd, TimeAdded, ExtraData from #table1
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT #table2.id, #table2.guidd, #table2.TimeAdded, #table2.ExtraData
FROM #table2
LEFT JOIN #table1 on #table1.id = #table2.id
WHERE #table1.id is null
The above code uses different fields than what you have, but you get the general gist with the various techniques.
Note that as per the original answer on Stack Overflow, this code was copied from here.
Anyway my point is "best practice" often comes down to what you can and can't do as well as theory.
If you're able to normalize and generate indexes/keys -- great!
If not, and you have to resort to code hacks like me, hopefully the above helps.
Good luck!
Normalizing your operational tables as suggested by Transact Charlie, is a good idea, and will save many headaches and problems over time - but there are such things as interface tables, which support integration with external systems, and reporting tables, which support things like analytical processing; and those types of tables should not necessarily be normalized - in fact, very often it is much, much more convenient and performant for them to not be.
In this case, I think Transact Charlie's proposal for your operational tables is a good one.
But I would add an index (not necessarily unique) to CompetitorName in the Competitors table to support efficient joins on CompetitorName for the purposes of integration (loading of data from external sources), and I would put an interface table into the mix: CompetitionResults.
CompetitionResults should contain whatever data your competition results have in it. The point of an interface table like this one is to make it as quick and easy as possible to truncate and reload it from an Excel sheet or a CSV file, or whatever form you have that data in.
That interface table should not be considered part of the normalized set of operational tables. Then you can join with CompetitionResults as suggested by Richard, to insert records into Competitors that don't already exist, and update the ones that do (for example if you actually have more information about competitors, like their phone number or email address).
One thing I would note - in reality, Competitor Name, it seems to me, is very unlikely to be unique in your data. In 200,000 competitors, you may very well have 2 or more David Smiths, for example. So I would recommend that you collect more information from competitors, such as their phone number or an email address, or something which is more likely to be unique.
Your operational table, Competitors, should just have one column for each data item that contributes to a composite natural key; for example it should have one column for a primary email address. But the interface table should have a slot for old and new values for a primary email address, so that the old value can be use to look up the record in Competitors and update that part of it to the new value.
So CompetitionResults should have some "old" and "new" fields - oldEmail, newEmail, oldPhone, newPhone, etc. That way you can form a composite key, in Competitors, from CompetitorName, Email, and Phone.
Then when you have some competition results, you can truncate and reload your CompetitionResults table from your excel sheet or whatever you have, and run a single, efficient insert to insert all the new competitors into the Competitors table, and single, efficient update to update all the information about the existing competitors from the CompetitionResults. And you can do a single insert to insert new rows into the CompetitionCompetitors table. These things can be done in a ProcessCompetitionResults stored procedure, which could be executed after loading the CompetitionResults table.
That's a sort of rudimentary description of what I've seen done over and over in the real world with Oracle Applications, SAP, PeopleSoft, and a laundry list of other enterprise software suites.
One last comment I'd make is one I've made before on SO: If you create a foreign key that ensures that a Competitor exists in the Competitors table before you can add a row with that Competitor in it to CompetitionCompetitors, make sure that foreign key is set to cascade updates and deletes. That way if you need to delete a competitor, you can do it and all the rows associated with that competitor will get automatically deleted. Otherwise, by default, the foreign key will require you to delete all the related rows out of CompetitionCompetitors before it will let you delete a Competitor.
(Some people think non-cascading foreign keys are a good safety precaution, but my experience is that they're just a freaking pain in the butt that are more often than not simply a result of an oversight and they create a bunch of make work for DBA's. Dealing with people accidentally deleting stuff is why you have things like "are you sure" dialogs and various types of regular backups and redundant data sources. It's far, far more common to actually want to delete a competitor, whose data is all messed up for example, than it is to accidentally delete one and then go "Oh no! I didn't mean to do that! And now I don't have their competition results! Aaaahh!" The latter is certainly common enough, so, you do need to be prepared for it, but the former is far more common, so the easiest and best way to prepare for the former, imo, is to just make foreign keys cascade updates and deletes.)
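For anyone who hasn't seen a cascading foreign key in action, here is a small sketch using Python's sqlite3 module. SQLite is an assumption here (it needs PRAGMA foreign_keys = ON, which SQL Server does not), the sample rows are made up, and the table names follow the normalized schema suggested earlier in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Competitor (
    CompetitorID INTEGER PRIMARY KEY,
    CompetitorName TEXT
);
CREATE TABLE CompetitionCompetitors (
    CompetitionID INTEGER,
    CompetitorID INTEGER
        REFERENCES Competitor (CompetitorID)
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    Score INTEGER,
    PRIMARY KEY (CompetitionID, CompetitorID)
);
-- Made-up data: competitor 1 has results in two competitions.
INSERT INTO Competitor VALUES (1, 'David Smith'), (2, 'Jane Jones');
INSERT INTO CompetitionCompetitors VALUES (1, 1, 90), (2, 1, 75), (1, 2, 88);
""")

# Deleting the parent row also removes its rows in CompetitionCompetitors.
conn.execute("DELETE FROM Competitor WHERE CompetitorID = 1")

remaining = conn.execute("""
    SELECT CompetitionID, CompetitorID, Score
    FROM CompetitionCompetitors
    ORDER BY CompetitionID, CompetitorID
""").fetchall()
print(remaining)
```

Without the ON DELETE CASCADE clause, the same DELETE would be rejected until the two child rows were removed first.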
Ok, this was asked 7 years ago, but I think the best solution here is to forego the new table entirely and just do this as a custom view. That way you're not duplicating data, there's no worry about unique data, and it doesn't touch the actual database structure. Something like this:
CREATE VIEW vw_competitions
AS
SELECT
Id,
CompetitionName,
CompetitionType,
OtherField1,
OtherField2 --add the fields you want viewed from the Competition table
FROM Competitions
GO
Other items can be added here like joins on other tables, WHERE clauses, etc. This is most likely the most elegant solution to this problem, as you now can just query the view:
SELECT *
FROM vw_competitions
...and add any WHERE, IN, or EXISTS clauses to the view query.
Additionally, if you have multiple columns to insert and want to check whether they already exist, use the following code:
Insert Into [Competitors] (cName, cCity, cState)
Select cName, cCity, cState from
(
select new.* from
(
select distinct cName, cCity, cState
from [Competitors] p, [City] c, [State] s
) new
left join
(
select distinct cName, cCity, cState
from [Competitors]
) existing
on new.cName = existing.cName and new.cCity = existing.cCity and new.cState = existing.cState
where existing.cName is null or existing.cCity is null or existing.cState is null
) toInsert

Resources