select a column named by another column in another table - sql-server

I am implementing a database that will back a role playing game. There are two relevant tables: character and weapon (plus a third table representing a standard many-to-many relationship; plus the level of each specific instance of a weapon). A character has multiple attributes (strength, agility, magic etc.) and each weapon has a base damage, a level (defined in the many-to-many association), and receives a bonus from the associated attribute of the character wielding said weapon (strength for clubs, agility for ranged weapons etc.). The effectiveness of a weapon must be derived from the three tables. The catch is that which column of the character table applies is dependent on the specific weapon being used.
My current paradigm is to perform two select queries, one to retrieve the name of the associated attribute (varchar) from the weapon table and then one - with the previously returned value substituted in - for the value of that attribute from the wielding character. I would like to replace this with a pure sql solution.
I have searched around the nets and found two other questions:
Pivot on Multiple Columns using Tablefunc and PostgreSQL Crosstab Query, but neither does quite what I'm looking for. I also found the Postgres internal datatype oid [https://www.postgresql.org/docs/9.1/static/datatype-oid.html] and was able to locate the oid of a specific column, but could not find the syntax for querying the value of the column with that oid.
Table schemas:
create table character (
id int primary key,
agility int,
strength int,
magic int,
...);
create table weapon (
id int primary key,
damage int,
associated_attribute varchar(32), --this can be another type if it'd help
...);
create table weapon_character_m2m (
id int primary key,
weapon int, --foreign key to weapon.id
character int, --foreign key to character.id
level int);
In my mind, this should be query-able with something like this (ideally resulting in the effective damage of each weapon currently in the player's possession.):
select m2m.level as level,
weapon.associated_attribute as base_attr_name,
character.??? as base_attr,
weapon.damage as base_damage,
base_damage * base_attr * level as effective_attr -- this is the column I care about, others are for clarity via alias
from weapon_character_m2m as m2m
join weapon on weapon.id=m2m.weapon
join character on character.id=m2m.character
where m2m.character=$d; -- stored proc parameter or the like
Most online resources I've found end up suggesting the database be redesigned. This is an option, but I really don't want to have a different table for each attribute to which a weapon might associate (in practice there are nearly 20 attributes that might be associated with weapon classes).
I have heard that this is possible in MSSQL by Foreign Key'ing into an internal system table, but I have no experience with MSSQL, let alone enough to attempt something like that (and I couldn't find a working sample on the internets). I would consider migrating to MSSQL (or any other sql engine) if anyone can provide a working example.

It sounds like you can just use the CASE statement. I know it is in MS SQL...not sure about PostgreSQL.
Something like (on my phone so just estimating the code 🙂):
Select other fields,
case weapon.associated_attribute
when 'agility' then character.agility
when 'strength' then character.strength
when ...
else 0 --unhandled associate_attribute
end as base_attr
from ...
The caveat here is that you will want your character attributes to be the same type which it looks like you do.
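To make the CASE idea concrete, here is a minimal sketch of the whole query. It runs against SQLite through Python's sqlite3 module purely so that it is self-contained; that choice, the sample rows, and the column alias effective_damage are my assumptions, not part of the question. The CASE expression itself is standard SQL, so it should carry over to PostgreSQL or SQL Server with little change.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real engine,
# and the sample rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "character" (id INTEGER PRIMARY KEY, agility INT, strength INT, magic INT);
CREATE TABLE weapon (id INTEGER PRIMARY KEY, damage INT, associated_attribute VARCHAR(32));
CREATE TABLE weapon_character_m2m (id INTEGER PRIMARY KEY, weapon INT, "character" INT, level INT);
INSERT INTO "character" VALUES (1, 10, 15, 20);
INSERT INTO weapon VALUES (1, 20, 'agility'), (2, 30, 'strength');
INSERT INTO weapon_character_m2m VALUES (1, 1, 1, 4), (2, 2, 1, 5);
""")

# The CASE expression picks the character column named by the weapon row.
effective_damage = conn.execute("""
SELECT w.id,
       w.damage * m2m.level *
       CASE w.associated_attribute
            WHEN 'agility'  THEN c.agility
            WHEN 'strength' THEN c.strength
            WHEN 'magic'    THEN c.magic
            ELSE 0  -- unhandled associated_attribute
       END AS effective_damage
FROM weapon_character_m2m AS m2m
JOIN weapon AS w ON w.id = m2m.weapon
JOIN "character" AS c ON c.id = m2m."character"
WHERE m2m."character" = ?
ORDER BY w.id
""", (1,)).fetchall()

print(effective_damage)  # [(1, 800), (2, 2250)]
```

The price of this approach is that every new attribute needs another WHEN branch, which is the trade-off against redesigning the schema.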
EDIT
I worked towards a view based on your feedback and realized that a view would use an unpivot rather than the case statement above but that you could use a function to do it using the CASE structure above. There are many flavours :-) MS SQL also has table-valued functions that you could use to return one attribute type for all characters. Here is the code I was playing with. It contains both a view and a function. You can choose which seems more appropriate.
create table character (
id int primary key,
agility int,
strength int,
magic int,
--...
);
insert character values (1,10,15,20),(2,11,12,13);
create table attribute_type (
attribute_id int primary key,
attribute_name varchar(20)
);
insert attribute_type values (1,'Agility'),(2,'Strength'),(3,'Magic');
create table weapon (
id int primary key,
damage int,
--associated_attribute varchar(32), --this can be another type if it'd help
attribute_id int
--...
);
insert weapon values (1,20,1),(2,30,2);
create table weapon_character_m2m (
id int primary key,
weapon int, --foreign key to weapon.id
character int, --foreign key to character.id
level int);
insert weapon_character_m2m values (1,1,1,4),(2,2,2,5);
go
create view vw_character_attributes
as
select c.id, a.attribute_id, c.attribute_value
from (
select id, attribute_name, attribute_value
from (select id, agility, strength, magic from character) p --pivoted data
unpivot (attribute_value for attribute_name in (agility, strength, magic)) u --unpivoted data
) c
join attribute_type a on a.attribute_name = c.attribute_name
;
go
create function fn_get_character_attribute (@character_id int, @attribute_id int)
returns int
as
begin
declare @attr int;
select @attr =
case @attribute_id
when 1 then c.agility
when 2 then c.strength
when 3 then c.magic
--when ...
else 0 --unhandled attribute_id
end
from character c
where c.id = @character_id;
return @attr;
end
go
select * from vw_character_attributes;
select m2m.level as level,
at.attribute_name as base_attr_name,
ca.attribute_value as base_attr,
dbo.fn_get_character_attribute(m2m.character, weapon.attribute_id) as function_value,
weapon.damage as base_damage,
weapon.damage * ca.attribute_value * level as effective_attr -- this is the column I care about, others are for clarity via alias
from weapon_character_m2m as m2m
join weapon on weapon.id=m2m.weapon
join vw_character_attributes ca on ca.id=m2m.character and ca.attribute_id = weapon.attribute_id
join attribute_type at on at.attribute_id = weapon.attribute_id
--where m2m.character=$d; -- stored proc parameter or the like
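If your engine has no UNPIVOT operator, the same view can be approximated with UNION ALL. The sketch below is an assumption-laden illustration rather than a drop-in replacement: it uses SQLite via Python's sqlite3, hard-codes the attribute ids 1 to 3 from attribute_type above, and reuses the two sample characters. One difference to be aware of is that SQL Server's UNPIVOT drops NULL values while UNION ALL keeps them.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; sample rows match the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "character" (id INTEGER PRIMARY KEY, agility INT, strength INT, magic INT);
INSERT INTO "character" VALUES (1, 10, 15, 20), (2, 11, 12, 13);

-- Same attribute ids as attribute_type above: 1 = Agility, 2 = Strength, 3 = Magic.
-- Each SELECT turns one attribute column into (id, attribute_id, attribute_value) rows.
CREATE VIEW vw_character_attributes AS
    SELECT id, 1 AS attribute_id, agility AS attribute_value FROM "character"
    UNION ALL
    SELECT id, 2, strength FROM "character"
    UNION ALL
    SELECT id, 3, magic FROM "character";
""")

rows = conn.execute(
    "SELECT id, attribute_id, attribute_value FROM vw_character_attributes "
    "ORDER BY id, attribute_id"
).fetchall()
print(rows)
```

Joining weapon to this view on attribute_id then works the same way as in the query above.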

Related

Creating a unique ID using first 3 character of last name and a sequence number

I have an employee table in SQL Server, and I want every employee to have a unique ID made of the first 3 letters of their last name plus a sequence number.
I don't remember at all how to do this; I haven't used SQL in a year and have forgotten pretty much everything.
Can anyone refresh my memory on how to do this? Google has been of no help on this matter. Thanks
Firstly, I suggest that the numeric portion of your identifier be unique in and of itself, in case the employee gets married and changes their last name. The prefix can still appear to the left of it, but should not be necessary to be unique.
If you agree with this design, then you can simply use a numeric identity column on the Employee table and combine that with the last name when retrieving the data, using a computed column. I suggest you seed the identity with a value that has enough digits to keep your identifier lengths consistent, so for example to support 90,000 employees you can use a seed of 10,000 which ensures all identifiers are 8 characters long (three letters of the name plus five numeric).
Simple example:
CREATE TABLE Employee
(
EmployeeNo int IDENTITY(10000,1) PRIMARY KEY,
LastName VarChar(64),
EmployeeID AS SUBSTRING(UPPER(LastName), 1, 3) + RIGHT('0000' + CONVERT(char(5), EmployeeNo), 5)
)
INSERT Employee (LastName) VALUES ('Smith')
SELECT * FROM Employee
Results:
EmployeeNo LastName EmployeeID
10000 Smith SMI10000
For the purposes of your SQL and table design, your tables should all use EmployeeNo as foreign key, since it is compact and unique. Apply the three-letter prefix during data retrieval and only for customer-facing purposes.
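To show why keeping the number as the real key pays off, here is a small sketch of the same idea outside SQL Server. The assumptions: SQLite via Python's sqlite3, a view (EmployeeWithID, a name I made up) instead of a computed column, and a second employee 'Jones' added for illustration. The point is that a name change alters only the prefix, never the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmployeeNo INTEGER PRIMARY KEY, LastName VARCHAR(64));
-- Seed the numbering by giving the first row an explicit number; SQLite then
-- assigns largest-existing-value + 1 to rows inserted without one.
INSERT INTO Employee (EmployeeNo, LastName) VALUES (10000, 'Smith');
INSERT INTO Employee (LastName) VALUES ('Jones');

-- The display ID is derived at read time, so it is never stored or referenced.
CREATE VIEW EmployeeWithID AS
    SELECT EmployeeNo, LastName,
           upper(substr(LastName, 1, 3)) || EmployeeNo AS EmployeeID
    FROM Employee;
""")

employee_ids = conn.execute(
    "SELECT EmployeeNo, EmployeeID FROM EmployeeWithID ORDER BY EmployeeNo"
).fetchall()

# A rename changes LastName only; EmployeeNo (the real key) stays put.
conn.execute("UPDATE Employee SET LastName = 'Taylor' WHERE EmployeeNo = 10001")
renamed = conn.execute(
    "SELECT EmployeeID FROM EmployeeWithID WHERE EmployeeNo = 10001"
).fetchone()[0]

print(employee_ids, renamed)
```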
@John Wu is right. However, if you don't want to rely on the employee number, you can use the NEWID() function, which always generates a unique value (a GUID). Below is the code.
CREATE TABLE EmployeeDetails
(
EmployeeCode int IDENTITY(1,1),
FirstName varchar(50),
LastName Varchar(50),
Empid as left(Lastname,3) + convert(varchar(500), newid())
)
INSERT EmployeeDetails VALUES ('Atul', 'Jain')

Cursor based approach vs set based approach

I need to optimize a slow-running stored procedure by converting it from a cursor-based approach to a set-based approach.
In principle, I have to compare records from a transient table (up to 300 records) against records from a "master table" (approx. half a million records and steadily growing). The matching is to be performed by comparing 20 varchar(11) columns of the two records. If at least 6 of these columns between the two records match (i.e. same data) that is considered to be a "sufficient match" and a record is to be created into a match table storing the IDs of the transient record and the master record, the total number of matches and the total number of mismatches.
Note that the number of mismatches is not equal to the balance of 20 minus the number-of-matches. That's because if any of the columns in either of the two records contains a null, it is not counted as a match nor a mismatch; it is simply ignored. Thus the need to capture the two counts (business requirement).
The current implementation uses an outer FAST_FORWARD cursor for the master table and an inner FAST_FORWARD cursor for the transient table. Within the inner cursor it has the following simple comparison logic applied to the 20 columns:
IF @newResults.data1 IS NOT NULL AND @results.data1 IS NOT NULL
BEGIN
IF @newResults.data1 = @results.data1
SET @matchCount = @matchCount + 1
ELSE
SET @mismatchCount = @mismatchCount + 1
END
Then, if the total number of matching columns (i.e. @matchCount) is >= 6, a "match record" is written to a "match table" capturing the primary keys of the two records and the number of matches and mismatches.
What I'm hoping to achieve: rather than looping through the two nested cursors and processing one record at a time, use a set-based implementation to do the above. One simple solution I could think of would be to do an:
INSERT INTO MatchingResults (ResultID, NewResultID, matchCount, mismatchCount)
SELECT (...) WHERE (...)
...and put the whole matching enchilada in the SELECT statement. But, this is the difficult part... Would anyone be able to give me some pointers here? Or suggest a better performing solution? Many thanks!
Updated with table structures:
--
-- Transient table:
--
table NewResults
(
NewResultID int identity(1,1),
Data1 varchar(11),
Data2 varchar(11),
...
Data20 varchar(11),
SampleDate datetime
)
--
-- Master table:
--
table Results
(
ResultID int identity(1,1),
Data1 varchar(11),
Data2 varchar(11),
...
Data20 varchar(11),
SampleDate datetime
)
--
-- Match table:
--
table MatchingResults
(
ResultID int,
NewResultID int,
MatchCount int,
MismatchCount int
)
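One possible shape for the set-based version of the comparison described above is a CROSS JOIN of the two tables with one CASE expression per column, summed into the two counts. The sketch below is scaled down under stated assumptions: 3 data columns instead of 20, a threshold of 2 instead of 6, invented sample rows, and SQLite via Python's sqlite3 so it runs anywhere. Python is only used here to generate the repetitive per-column expressions; the generated SQL is plain CASE/CROSS JOIN and should translate to T-SQL directly. Whether it actually beats the cursors on half a million rows is something only a test on the real data can tell.

```python
import sqlite3

N_COLS = 3       # assumption: the real tables have 20 data columns
MIN_MATCHES = 2  # assumption: the real threshold is 6

cols = [f"Data{i}" for i in range(1, N_COLS + 1)]
col_defs = ", ".join(f"{c} VARCHAR(11)" for c in cols)
# A comparison involving NULL is neither true nor false, so the CASE falls
# through to ELSE 0 and the column counts as neither match nor mismatch.
match_expr = " + ".join(f"CASE WHEN n.{c} = r.{c} THEN 1 ELSE 0 END" for c in cols)
mismatch_expr = " + ".join(f"CASE WHEN n.{c} <> r.{c} THEN 1 ELSE 0 END" for c in cols)

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE NewResults (NewResultID INTEGER PRIMARY KEY, {col_defs});
CREATE TABLE Results (ResultID INTEGER PRIMARY KEY, {col_defs});
CREATE TABLE MatchingResults (ResultID INT, NewResultID INT, MatchCount INT, MismatchCount INT);
INSERT INTO Results VALUES (1, 'a', 'b', 'c'), (2, 'a', 'x', NULL), (3, 'q', 'q', 'q');
INSERT INTO NewResults VALUES (1, 'a', 'b', NULL), (2, 'a', 'x', 'c');
""")

# Every master row is paired with every transient row in a single statement;
# only pairs reaching the threshold are written to the match table.
conn.execute(f"""
INSERT INTO MatchingResults (ResultID, NewResultID, MatchCount, MismatchCount)
SELECT ResultID, NewResultID, MatchCount, MismatchCount
FROM (
    SELECT r.ResultID, n.NewResultID,
           {match_expr} AS MatchCount,
           {mismatch_expr} AS MismatchCount
    FROM Results AS r
    CROSS JOIN NewResults AS n
) AS pairs
WHERE MatchCount >= {MIN_MATCHES}
""")

matches = conn.execute(
    "SELECT * FROM MatchingResults ORDER BY ResultID, NewResultID"
).fetchall()
print(matches)  # [(1, 1, 2, 0), (1, 2, 2, 1), (2, 2, 2, 0)]
```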

How to implement many-to-many-to-many database relationship?

I am building a SQLite database and am not sure how to proceed with this scenario.
I'll use a real-world example to explain what I need:
I have a list of products that are sold by many stores in various states. Not every Store sells a particular Product at all, and those that do may only sell it in one State or another. Most stores sell a product in most states, but not all.
For example, let's say I am trying to buy a vacuum cleaner in Hawaii. Joe's Hardware sells vacuums in 18 states, but not in Hawaii. Walmart sells vacuums in Hawaii, but not microwaves. Burger King does not sell vacuums at all, but will give me a Whopper anywhere in the US.
So if I am in Hawaii and search for a vacuum, I should only get Walmart as a result. While other stores may sell vacuums, and may sell in Hawaii, they don't do both but Walmart does.
How do I efficiently create this type of relationship in a relational database (specifically, I am currently using SQLite, but need to be able to convert to MySQL in the future).
Obviously, I would need tables for Product, Store, and State, but I am at a loss on how to create and query the appropriate join tables...
If I, for example, query a certain Product, how would I determine which Store would sell it in a particular State, keeping in mind that Walmart may not sell vacuums in Hawaii, but they do sell tea there?
I understand the basics of 1:1, 1:n, and M:n relationships in RD, but I am not sure how to handle this complexity where there is a many-to-many-to-many situation.
If you could show some SQL statements (or DDL) that demonstrates this, I would be very grateful. Thank you!
An accepted and common way is to use a table that has one column referencing the product and another referencing the store. There are many names for such a table: reference table, associative table and mapping table, to name a few.
You want these to be efficient, so try to reference by a number, which of course has to uniquely identify what it is referencing. In SQLite, a table by default has a special, normally hidden, column that is such a unique number. It's the rowid, and it is typically the most efficient way of accessing rows, as SQLite has been designed with this common usage in mind.
SQLite allows you to create one column per table that is an alias of the rowid: you simply declare the column as INTEGER PRIMARY KEY, and typically you'd name the column id.
So, utilising these, the reference table would have a column for the product's id and another for the store's id, catering for every combination of product/store.
As an example, three tables are created (stores, products and a reference/mapping table), the first two being populated, using :-
CREATE TABLE IF NOT EXISTS _products(id INTEGER PRIMARY KEY, productname TEXT, productcost REAL);
CREATE TABLE IF NOT EXISTS _stores (id INTEGER PRIMARY KEY, storename TEXT);
CREATE TABLE IF NOT EXISTS _product_store_relationships (storereference INTEGER, productreference INTEGER);
INSERT INTO _products (productname,productcost) VALUES
('thingummy',25.30),
('Sky Hook',56.90),
('Tartan Paint',100.34),
('Spirit Level Bubbles - Large', 10.43),
('Spirit Level bubbles - Small',7.77)
;
INSERT INTO _stores (storename) VALUES
('Acme'),
('Shops-R-Them'),
('Harrods'),
('X-Mart')
;
At this point _products and _stores are populated, while _product_store_relationships is still empty.
Placing products into stores (for example) could be done using :-
-- Build some relationships/references/mappings
-- (columns are named explicitly so that each pair reads as product, store, matching the comments)
INSERT INTO _product_store_relationships (productreference, storereference) VALUES
(2,2), -- Sky Hooks are in Shops-R-Them
(2,4), -- Sky Hooks in X-Mart
(1,3), -- thingummys in Harrods
(1,1), -- and Acme
(1,2), -- and Shops-R-Them
(4,4), -- Spirit Level Bubbles Large in X-Mart
(5,4), -- Spirit Level Bubbles Small in X-Mart
(3,3) -- Tartan Paint in Harrods
;
A query such as the following would list the products in stores sorted by store and then product :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
ORDER BY storename, productname
;
This query lists only the products whose name contains an s or S (LIKE is case-insensitive for ASCII characters in SQLite by default), with the output sorted by productcost in ascending order, then storename, then productname :-
SELECT storename, productname, productcost FROM _stores
JOIN _product_store_relationships ON _stores.id = storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE productname LIKE '%s%'
ORDER BY productcost,storename, productname
;
Expanding the above to consider states requires two new tables: _states and _store_state_references.
Strictly, a reference table is only needed if a store can be in more than one state, for example if you consider a chain of stores to be a single store; a reference table copes with both cases.
The SQL could be :-
CREATE TABLE IF NOT EXISTS _states (id INTEGER PRIMARY KEY, statename TEXT);
INSERT INTO _states (statename) VALUES
('Texas'),
('Ohio'),
('Alabama'),
('Queensland'),
('New South Wales')
;
CREATE TABLE IF NOT EXISTS _store_state_references (storereference, statereference);
INSERT INTO _store_state_references VALUES
(1,1),
(2,5),
(3,1),
(4,3)
;
If the following query were run :-
SELECT storename,productname,productcost,statename
FROM _stores
JOIN _store_state_references ON _stores.id = _store_state_references.storereference
JOIN _states ON _store_state_references.statereference =_states.id
JOIN _product_store_relationships ON _stores.id = _product_store_relationships.storereference
JOIN _products ON _product_store_relationships.productreference = _products.id
WHERE statename = 'Texas' AND productname = 'Sky Hook'
;
Without the WHERE clause, the same query lists every product/store/state combination.
The following would make Shops-R-Them have a presence in all states :-
INSERT INTO _store_state_references VALUES
(2,1),(2,2),(2,3),(2,4)
;
Re-running the Sky Hook in Texas query now returns Shops-R-Them among its results.
Note This just covers the basics of the topic.
You will need to create a combined mapping table, tbl_product_states_stores, which stores the mapping between products, states and stores. Its columns will be id, product_id, state_id and stores_id.
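A minimal sketch of that combined mapping table, using the vacuum-in-Hawaii example from the question. The lookup table names (product, store, state), the sample rows and the helper stores_selling are my own assumptions for illustration; it runs on SQLite through Python's sqlite3, and the DDL is kept plain enough that moving it to MySQL later should need only minor changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE state   (id INTEGER PRIMARY KEY, name TEXT);
-- One row per (product, state, store) combination that is actually on offer.
CREATE TABLE tbl_product_states_stores (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    state_id   INTEGER NOT NULL REFERENCES state(id),
    stores_id  INTEGER NOT NULL REFERENCES store(id),
    UNIQUE (product_id, state_id, stores_id)
);
INSERT INTO product VALUES (1, 'vacuum'), (2, 'tea');
INSERT INTO store VALUES (1, 'Walmart'), (2, 'Joe''s Hardware');
INSERT INTO state VALUES (1, 'Hawaii'), (2, 'Ohio');
INSERT INTO tbl_product_states_stores (product_id, state_id, stores_id) VALUES
    (1, 1, 1),  -- Walmart sells vacuums in Hawaii
    (2, 1, 1),  -- Walmart sells tea in Hawaii
    (1, 2, 2);  -- the hardware store sells vacuums in Ohio only
""")

def stores_selling(product, state):
    """Return the names of stores that sell `product` in `state`."""
    rows = conn.execute("""
        SELECT s.name
        FROM tbl_product_states_stores AS pss
        JOIN product AS p  ON p.id  = pss.product_id
        JOIN state   AS st ON st.id = pss.state_id
        JOIN store   AS s  ON s.id  = pss.stores_id
        WHERE p.name = ? AND st.name = ?
        ORDER BY s.name
    """, (product, state)).fetchall()
    return [name for (name,) in rows]

print(stores_selling("vacuum", "Hawaii"))  # ['Walmart']
```

Because a row exists only for combinations that are really sold, a store that sells vacuums elsewhere and sells other things in Hawaii never shows up for vacuum plus Hawaii.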

TSQL: getting next available ID

Using SQL Server 2008, I have three tables: table A, table B and table C.
All have an ID column, but for tables A and B the ID column is an identity integer, while for table C the ID column is a varchar type.
Currently a stored procedure takes a name parameter and, following certain logic, inserts into table A or table B, gets the identity, prefixes it with 'A' or 'B', then inserts into table C.
The problem is that the table C ID column may already contain the value, i.e. if the identity from table A is 2, 'A2', 'A3' and 'A5' might already exist in the ID column of table C. How do I write a T-SQL query to identify the next available value in table C and then ensure table A/B is updated accordingly?
[Update]
These are the current steps:
1. depending on the input parameter, insert into table A or table B
2. initialize seed value = @@IDENTITY
3. calculate the ID value to insert into table C by prefixing the seed value with 'A' or 'B'
4. look for a matching record in table C by the ID value from step 3; if no record is found, insert it, else increase the seed value by 1 and repeat step 3
The issue is that in a certain value range a huge block of values may already exist in the table C ID column, e.g. A3000 to A500000, and the database query is extremely slow if it follows the existing logic. I need to figure out logic that smartly gets the minimum available number (without the prefix).
It is hard to describe; I hope this makes more sense. I truly appreciate any help on this. Thanks in advance!
This should do the trick. It is a simple self-contained example that will work in SSMS. I even put the values out of order just in case. You would just put your table where @Data is and replace 'Id' with your identifier field.
declare @Data Table ( Id varchar(3) );
insert into @Data values ('A5'),('A2'),('B1'),('A3'),('B2'),('A4'),('A1'),('A6');
With a as
(
Select
ID
, cast(right(Id, len(Id)-1) as int) as Pos
, left(Id, 1) as TableFrom
from @Data
)
select
TableFrom
, max(Pos) + 1 as NextNumberUp
from a
group by TableFrom
EDIT: If you want to not worry about production data you could add this last part amending what I wrote:
Select
TableFrom
, max(Pos) as LastPos
into #Temp
from a
group by TableFrom
select TableFrom, LastPos + 1
from #Temp
Regardless if this was production environment you are going to have to hit part of it at some time to get data. If the datasets are not too large and just varchar(256) or less and only 5 million rows or less you could dump that entire column from tableC to a temp table. Honestly query performance versus imports change vastly from system to system.
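Note that max(Pos) + 1 skips over any gaps. If what you want is the smallest free number at or above the seed, as the question describes, one set-based way is to treat the seed itself and "one past each used number" as candidates and take the smallest candidate that is not taken. The sketch below is an illustration under assumptions: SQLite via Python's sqlite3, a table named TableC with made-up IDs, and a helper next_free_number that I invented. In T-SQL the string functions differ (SUBSTRING, + for concatenation), and casting part of the ID on every row will not use an index, so storing the numeric part in its own column may be worth considering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableC (ID VARCHAR(20) PRIMARY KEY);
INSERT INTO TableC VALUES ('A2'), ('A3'), ('A5'), ('B1');
""")

def next_free_number(prefix, seed):
    """Smallest number >= seed such that prefix + number is not yet in TableC."""
    # Candidates: the seed itself, or one past any used number at/above the seed.
    # The smallest candidate that is not already taken is the answer.
    row = conn.execute("""
        SELECT MIN(c.n)
        FROM (
            SELECT ? AS n
            UNION
            SELECT CAST(substr(ID, 2) AS INTEGER) + 1
            FROM TableC
            WHERE substr(ID, 1, 1) = ?
              AND CAST(substr(ID, 2) AS INTEGER) >= ?
        ) AS c
        WHERE NOT EXISTS (SELECT 1 FROM TableC AS t WHERE t.ID = ? || c.n)
    """, (seed, prefix, seed, prefix)).fetchone()
    return row[0]

print(next_free_number("A", 2))  # 4, because A2 and A3 are taken
```

In a real stored procedure the lookup and the insert would also need to happen in one transaction, otherwise two callers could pick the same number.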
Following your design there shouldn't be any duplicates in Table C considering that A and B are unique.
A | B | C
1 | 1 | A1
2 | 2 | A2
  |   | B1
  |   | B2

SQL Server insert if not exists best practice [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I have a Competitions results table which holds team member's names and their ranking on one hand.
On the other hand I need to maintain a table of unique competitors names:
CREATE TABLE Competitors (cName nvarchar(64) primary key)
Now I have some 200,000 results in the 1st table and when the competitors table is empty I can perform this:
INSERT INTO Competitors SELECT DISTINCT Name FROM CompResults
And the query only takes some 5 seconds to insert about 11,000 names.
So far this is not a critical application, so I can consider truncating the Competitors table once a month, when I receive the new competition results with some 10,000 rows.
But what is the best practice when new results are added, with new AND existing competitors? I don't want to truncate the existing Competitors table.
I need to perform an INSERT statement for new competitors only and do nothing if they already exist.
Semantically you are asking "insert Competitors where doesn't already exist":
INSERT Competitors (cName)
SELECT DISTINCT Name
FROM CompResults cr
WHERE
NOT EXISTS (SELECT * FROM Competitors c
WHERE cr.Name = c.cName)
Another option is to left join your results table with your existing Competitors table and find the new competitors by keeping the distinct records that don't match in the join:
INSERT Competitors (cName)
SELECT DISTINCT cr.Name
FROM CompResults cr left join
Competitors c on cr.Name = c.cName
where c.cName is null
The newer MERGE syntax also offers a compact, elegant and efficient way to do that:
MERGE INTO Competitors AS Target
USING (SELECT DISTINCT Name FROM CompResults) AS Source ON Target.cName = Source.Name
WHEN NOT MATCHED THEN
INSERT (cName) VALUES (Source.Name);
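A quick way to convince yourself that the NOT EXISTS form is safe to re-run is to execute it twice and count rows. This is only a sketch: it uses SQLite through Python's sqlite3 (MERGE is not available there, so only the NOT EXISTS variant is shown), and the Ranking column and sample names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Competitors (cName NVARCHAR(64) PRIMARY KEY);
CREATE TABLE CompResults (Name NVARCHAR(64), Ranking INT);
INSERT INTO Competitors VALUES ('Ann');
INSERT INTO CompResults VALUES ('Ann', 1), ('Bob', 2), ('Bob', 5), ('Cy', 3);
""")

# Insert only the names that are not already present in Competitors.
INSERT_NEW = """
INSERT INTO Competitors (cName)
SELECT DISTINCT cr.Name
FROM CompResults AS cr
WHERE NOT EXISTS (SELECT 1 FROM Competitors AS c WHERE c.cName = cr.Name)
"""

def competitor_count():
    return conn.execute("SELECT COUNT(*) FROM Competitors").fetchone()[0]

conn.execute(INSERT_NEW)
count_after_first = competitor_count()   # 'Bob' and 'Cy' were added to 'Ann'
conn.execute(INSERT_NEW)
count_after_second = competitor_count()  # second run finds nothing new
names = [n for (n,) in conn.execute("SELECT cName FROM Competitors ORDER BY cName")]

print(count_after_first, count_after_second, names)
```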
Don't know why anyone else hasn't said this yet;
NORMALISE.
You've got a table that models competitions? Competitions are made up of Competitors? You need a distinct list of Competitors in one or more Competitions......
You should have the following tables.....
CREATE TABLE Competitor (
[CompetitorID] INT IDENTITY(1,1) PRIMARY KEY
, [CompetitorName] NVARCHAR(255)
)
CREATE TABLE Competition (
[CompetitionID] INT IDENTITY(1,1) PRIMARY KEY
, [CompetitionName] NVARCHAR(255)
)
CREATE TABLE CompetitionCompetitors (
[CompetitionID] INT
, [CompetitorID] INT
, [Score] INT
, PRIMARY KEY (
[CompetitionID]
, [CompetitorID]
)
)
With Constraints on CompetitionCompetitors.CompetitionID and CompetitorID pointing at the other tables.
With this kind of table structure -- your keys are all simple INTS -- there doesn't seem to be a good NATURAL KEY that would fit the model so I think a SURROGATE KEY is a good fit here.
So if you had this, then to get the distinct list of competitors in a particular competition you can issue a query like this:
DECLARE @CompetitionName VARCHAR(50) SET @CompetitionName = 'London Marathon'
SELECT
p.[CompetitorName] AS [CompetitorName]
FROM
Competitor AS p
WHERE
EXISTS (
SELECT 1
FROM
CompetitionCompetitors AS cc
JOIN Competition AS c ON c.[CompetitionID] = cc.[CompetitionID]
WHERE
cc.[CompetitorID] = p.[CompetitorID]
AND c.[CompetitionName] = @CompetitionName
)
And if you wanted the score for each competition a competitor is in:
SELECT
p.[CompetitorName]
, c.[CompetitionName]
, cc.[Score]
FROM
Competitor AS p
JOIN CompetitionCompetitors AS cc ON cc.[CompetitorID] = p.[CompetitorID]
JOIN Competition AS c ON c.[CompetitionID] = cc.[CompetitionID]
And when you have a new competition with new competitors then you simply check which ones already exist in the Competitors table. If they already exist then you don't insert into Competitor for those Competitors and do insert for the new ones.
Then you insert the new Competition in Competition and finally you just make all the links in CompetitionCompetitors.
You will need to join the tables together and get a list of unique competitors that don't already exist in Competitors.
This will insert unique records.
INSERT Competitors (cName)
SELECT DISTINCT cr.Name
FROM CompResults cr LEFT JOIN Competitors c ON cr.Name = c.cName
WHERE c.cName IS NULL
There may come a time when this insert needs to be done quickly without being able to wait for the selection of unique names. In that case, you could insert the unique names into a temporary table, and then use that temporary table to insert into your real table. This works well because all the processing happens at the time you are inserting into a temporary table, so it doesn't affect your real table. Then when you have all the processing finished, you do a quick insert into the real table. I might even wrap the last part, where you insert into the real table, inside a transaction.
The answers above which talk about normalizing are great! But what if you find yourself in a position like me where you're not allowed to touch the database schema or structure as it stands? Eg, the DBA's are 'gods' and all suggested revisions go to /dev/null?
In that respect, I feel like this has been answered with this Stack Overflow posting too in regards to all the users above giving code samples.
I'm reposting the code from INSERT VALUES WHERE NOT EXISTS which helped me the most since I can't alter any underlying database tables:
INSERT INTO #table1 (Id, guidd, TimeAdded, ExtraData)
SELECT Id, guidd, TimeAdded, ExtraData
FROM #table2
WHERE NOT EXISTS (Select Id, guidd From #table1 WHERE #table1.id = #table2.id)
-----------------------------------
MERGE #table1 as [Target]
USING (select Id, guidd, TimeAdded, ExtraData from #table2) as [Source]
(id, guidd, TimeAdded, ExtraData)
on [Target].id =[Source].id
WHEN NOT MATCHED THEN
INSERT (id, guidd, TimeAdded, ExtraData)
VALUES ([Source].id, [Source].guidd, [Source].TimeAdded, [Source].ExtraData);
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT id, guidd, TimeAdded, ExtraData from #table2
EXCEPT
SELECT id, guidd, TimeAdded, ExtraData from #table1
------------------------------
INSERT INTO #table1 (id, guidd, TimeAdded, ExtraData)
SELECT #table2.id, #table2.guidd, #table2.TimeAdded, #table2.ExtraData
FROM #table2
LEFT JOIN #table1 on #table1.id = #table2.id
WHERE #table1.id is null
The above code uses different fields than what you have, but you get the general gist with the various techniques.
Note that as per the original answer on Stack Overflow, this code was copied from here.
Anyway my point is "best practice" often comes down to what you can and can't do as well as theory.
If you're able to normalize and generate indexes/keys -- great!
If not, and you have to resort to code hacks like me, hopefully the above helps.
Good luck!
Normalizing your operational tables as suggested by Transact Charlie, is a good idea, and will save many headaches and problems over time - but there are such things as interface tables, which support integration with external systems, and reporting tables, which support things like analytical processing; and those types of tables should not necessarily be normalized - in fact, very often it is much, much more convenient and performant for them to not be.
In this case, I think Transact Charlie's proposal for your operational tables is a good one.
But I would add an index (not necessarily unique) to CompetitorName in the Competitors table to support efficient joins on CompetitorName for the purposes of integration (loading of data from external sources), and I would put an interface table into the mix: CompetitionResults.
CompetitionResults should contain whatever data your competition results have in it. The point of an interface table like this one is to make it as quick and easy as possible to truncate and reload it from an Excel sheet or a CSV file, or whatever form you have that data in.
That interface table should not be considered part of the normalized set of operational tables. Then you can join with CompetitionResults as suggested by Richard, to insert records into Competitors that don't already exist, and update the ones that do (for example if you actually have more information about competitors, like their phone number or email address).
One thing I would note - in reality, Competitor Name, it seems to me, is very unlikely to be unique in your data. In 200,000 competitors, you may very well have 2 or more David Smiths, for example. So I would recommend that you collect more information from competitors, such as their phone number or an email address, or something which is more likely to be unique.
Your operational table, Competitors, should just have one column for each data item that contributes to a composite natural key; for example it should have one column for a primary email address. But the interface table should have a slot for old and new values for a primary email address, so that the old value can be used to look up the record in Competitors and update that part of it to the new value.
So CompetitionResults should have some "old" and "new" fields - oldEmail, newEmail, oldPhone, newPhone, etc. That way you can form a composite key, in Competitors, from CompetitorName, Email, and Phone.
Then when you have some competition results, you can truncate and reload your CompetitionResults table from your excel sheet or whatever you have, and run a single, efficient insert to insert all the new competitors into the Competitors table, and single, efficient update to update all the information about the existing competitors from the CompetitionResults. And you can do a single insert to insert new rows into the CompetitionCompetitors table. These things can be done in a ProcessCompetitionResults stored procedure, which could be executed after loading the CompetitionResults table.
That's a sort of rudimentary description of what I've seen done over and over in the real world with Oracle Applications, SAP, PeopleSoft, and a laundry list of other enterprise software suites.
One last comment I'd make is one I've made before on SO: If you create a foreign key that ensures that a Competitor exists in the Competitors table before you can add a row with that Competitor in it to CompetitionCompetitors, make sure that foreign key is set to cascade updates and deletes. That way if you need to delete a competitor, you can do it and all the rows associated with that competitor will get automatically deleted. Otherwise, by default, the foreign key will require you to delete all the related rows out of CompetitionCompetitors before it will let you delete a Competitor.
(Some people think non-cascading foreign keys are a good safety precaution, but my experience is that they're just a freaking pain in the butt that are more often than not simply a result of an oversight and they create a bunch of make work for DBA's. Dealing with people accidentally deleting stuff is why you have things like "are you sure" dialogs and various types of regular backups and redundant data sources. It's far, far more common to actually want to delete a competitor, whose data is all messed up for example, than it is to accidentally delete one and then go "Oh no! I didn't mean to do that! And now I don't have their competition results! Aaaahh!" The latter is certainly common enough, so, you do need to be prepared for it, but the former is far more common, so the easiest and best way to prepare for the former, imo, is to just make foreign keys cascade updates and deletes.)
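For anyone who has not seen a cascading foreign key in action, here is a minimal sketch of the behaviour described above. Assumptions: SQLite via Python's sqlite3 (where foreign key enforcement has to be switched on per connection), a trimmed-down version of the Competitor and CompetitionCompetitors tables, and made-up rows. SQL Server uses the same ON UPDATE CASCADE / ON DELETE CASCADE clauses on the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Competitor (CompetitorID INTEGER PRIMARY KEY, CompetitorName TEXT);
CREATE TABLE CompetitionCompetitors (
    CompetitionID INT,
    CompetitorID  INT REFERENCES Competitor(CompetitorID)
                      ON UPDATE CASCADE ON DELETE CASCADE,
    Score INT,
    PRIMARY KEY (CompetitionID, CompetitorID)
);
INSERT INTO Competitor VALUES (1, 'David Smith'), (2, 'Ann Lee');
INSERT INTO CompetitionCompetitors VALUES (10, 1, 95), (10, 2, 88), (11, 1, 70);
""")

# Deleting the competitor also removes that competitor's result rows.
conn.execute("DELETE FROM Competitor WHERE CompetitorID = 1")
remaining = conn.execute(
    "SELECT CompetitionID, CompetitorID, Score FROM CompetitionCompetitors"
).fetchall()

print(remaining)  # [(10, 2, 88)]
```

Without the CASCADE clauses the DELETE would be rejected until the two result rows for competitor 1 had been removed by hand.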
Ok, this was asked 7 years ago, but I think the best solution here is to forego the new table entirely and just do this as a custom view. That way you're not duplicating data, there's no worry about unique data, and it doesn't touch the actual database structure. Something like this:
CREATE VIEW vw_competitions
AS
SELECT
Id,
CompetitionName,
CompetitionType,
OtherField1,
OtherField2 --add the fields you want viewed from the Competitions table
FROM Competitions
GO
Other items can be added here like joins on other tables, WHERE clauses, etc. This is most likely the most elegant solution to this problem, as you now can just query the view:
SELECT *
FROM vw_competitions
...and add any WHERE, IN, or EXISTS clauses to the view query.
Additionally, if you have multiple columns to insert and want to check whether they exist or not, use the following code:
Insert Into [Competitors] (cName, cCity, cState)
Select cName, cCity, cState from
(
select new.* from
(
select distinct cName, cCity, cState
from [Competitors] s, [City] c, [State] st
) new
left join
(
select distinct cName, cCity, cState
from [Competitors] s
) existing
on new.cName = existing.cName and new.cCity = existing.cCity and new.cState = existing.cState
where existing.cName is null or existing.cCity is null or existing.cState is null
) missing
