I am trying to remove old data from a SQL Server database, given a list of IDs, and I'm trying to figure out how to make it run faster. Currently, deleting a list of 250 IDs takes around 1 hour. These IDs are attached to our 'root' objects (example below). Each of these has foreign key constraints.
Products
| productID | description | price |
+-----------------+-------------------+-------------+
| 1 | item 1 | 5.00 |
| 2 | item 2 | 5.00 |
| 3 | item 3 | 5.00 |
| ... | ... | ... |
Sales
| saleID | productID |
+-----------------+-------------------+
| 4 | 1 |
| 5 | 2 |
| 6 | 3 |
| ... | ... |
Taxes
| taxID | saleID |
+-----------------+-------------------+
| 7 | 4 |
| 8 | 5 |
| 9 | 6 |
| ... | ... |
Currently, we are just passing a list of product IDs and cascading through manually, such as:
CREATE TABLE #ProductIDsRemoval (id int);
INSERT INTO #ProductIDsRemoval VALUES (1)
DELETE t
FROM dbo.Taxes t
INNER JOIN dbo.Sales s ON (s.saleID = t.saleID)
INNER JOIN #ProductIDsRemoval p ON (s.productID = p.id)
DELETE s
FROM dbo.Sales s
INNER JOIN #ProductIDsRemoval p ON (s.productID = p.id)
DELETE p
FROM dbo.Products p
INNER JOIN #ProductIDsRemoval p2 ON (p.productID = p2.id)
This works fine; however, my issue is that my table structure has ~70 tables, each with at least a couple thousand rows to remove, if not a couple million. Currently, my query takes anywhere from 1 to 6 hours to run, depending on the number of base IDs we're removing. (My structure doesn't actually use Products/Taxes/Sales, but it's a decent analogy. We're aiming to remove ~750 base IDs, for which we estimate a runtime of 3-5 hours.)
I've seen other Stack Overflow answers saying to drop all constraints, re-add them with ON DELETE CASCADE, run the delete, and then restore the original constraints, but this also takes quite a long time, as I would need to: 1. Drop the constraints. 2. Re-add them with ON DELETE CASCADE. 3. Run my query. 4. Drop the constraints again. 5. Re-add them without ON DELETE CASCADE.
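As a rough sketch, that drop/re-add cycle would look like this for a single foreign key. The constraint name FK_Sales_Products is hypothetical (it is not part of my real schema), and the same pair of statements would have to be repeated for every foreign key in the chain, such as the one from Taxes to Sales.

```sql
-- Hypothetical constraint name; the real names can be looked up in sys.foreign_keys.
ALTER TABLE dbo.Sales DROP CONSTRAINT FK_Sales_Products;

-- Re-add the same key, this time with cascading deletes.
ALTER TABLE dbo.Sales WITH CHECK
    ADD CONSTRAINT FK_Sales_Products
    FOREIGN KEY (productID) REFERENCES dbo.Products (productID)
    ON DELETE CASCADE;

-- Once every level cascades, a single delete on the root table removes the children too.
DELETE p
FROM dbo.Products AS p
INNER JOIN #ProductIDsRemoval AS r ON r.id = p.productID;
```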
I've also been looking at possibly just selecting everything I need into temp tables, truncating all of the other tables, and then re-inserting all of my values back and re-setting the indexes based on the last item I added, but again I would need to edit all foreign keys, which I would prefer to not do.
I have a master table named Master_Table and the columns and values in the master table are below:
| ID | Database | Schema | Table_name | Common_col | Value_ID |
+-------+------------+--------+-------------+------------+----------+
| 1 | Database_1 | Test1 | Test_Table1 | Test_ID | 1 |
| 2 | Database_2 | Test2 | Test_Table2 | Test_ID | 1 |
| 3 | Database_3 | Test3 | Test_Table3 | Test_ID2 | 2 |
I have another table, Value_Table, which consists of the values that need to be deleted.
| Value_ID | Common_col | Value |
+----------+------------+--------+
| 1 | Test_ID | 110 |
| 1 | Test_ID | 111 |
| 1 | Test_ID | 115 |
| 2 | Test_ID2 | 999 |
I need to build a query that generates a SQL statement to delete those values from the table named in Master_Table, whose database and schema are given in the same row. The column to filter on is given in the Common_col column of Master_Table, and the values to delete are in the Value column of Value_Table.
The result of my query should be a statement like the one below:
DELETE FROM Database_1.Test1.Test_Table1 WHERE Test_ID=110;
or
DELETE FROM Database_1.Test1.Test_Table1 WHERE Test_ID in (110,111,115);
These queries should run inside a loop so that I can delete all the rows from all the databases and tables listed in the master table.
Queries don't really create queries.
One way to do what you're saying, which could be useful if this is a one-time or very occasional thing, is to use SSMS to generate the query statements, then copy them to the clipboard, paste them into a query window, and execute them there.
SELECT 'DELETE FROM Database_1.Test1.Test_Table1 WHERE '
+ common_col
+ ' = '
+ convert(VARCHAR(10),value)
FROM Value_Table
This probably isn't what you want; it sounds more like you want to automate cleanup or something.
You can turn this into one big query if you don't mind repeating yourself a little:
DELETE T1
FROM Database_1.Test1.Test_Table1 T1
INNER JOIN Database_1.Test1.ValueTable VT ON
(VT.common_col = 'Test_ID' and T1.Test_ID=VT.Value) OR
(VT.common_col = 'Test_ID2' and T1.Test_ID2=VT.Value)
You can also use dynamic SQL combined with the first part ... but I hate dynamic SQL so I'm not going to put it in my answer.
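For completeness, here is a minimal sketch of what the dynamic SQL route might look like. It assumes SQL Server 2017 or later (for STRING_AGG), that Master_Table and Value_Table live in the current database, and that Value is an integer column; none of that is confirmed by the question. It only prints each generated statement, and the EXEC line is left commented out so the output can be reviewed first.

```sql
DECLARE @sql nvarchar(max);

-- One DELETE ... IN (...) statement per Master_Table row.
DECLARE stmt_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT N'DELETE FROM '
           + QUOTENAME(m.[Database]) + N'.'
           + QUOTENAME(m.[Schema]) + N'.'
           + QUOTENAME(m.Table_name)
           + N' WHERE ' + QUOTENAME(m.Common_col)
           + N' IN (' + STRING_AGG(CONVERT(nvarchar(max), v.[Value]), N',') + N');'
    FROM Master_Table AS m
    INNER JOIN Value_Table AS v
        ON v.Value_ID = m.Value_ID
       AND v.Common_col = m.Common_col
    GROUP BY m.[Database], m.[Schema], m.Table_name, m.Common_col;

OPEN stmt_cursor;
FETCH NEXT FROM stmt_cursor INTO @sql;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @sql;                      -- review the generated statement
    -- EXEC sys.sp_executesql @sql;  -- uncomment to actually run the delete
    FETCH NEXT FROM stmt_cursor INTO @sql;
END

CLOSE stmt_cursor;
DEALLOCATE stmt_cursor;
```

QUOTENAME is used on every identifier so that a stray value in Master_Table cannot inject arbitrary SQL into the generated statement.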
Hi Stack Overflow database design experts!
I'm facing a design problem in my database, and I've not found any similar issue on Stack Overflow, hence this question.
I have an image table, containing image data and its primary key. In my design, each image can be referenced multiple times across multiple tables.
Here is a representation of the database:
-------------------- -------------------------------------------
| image | | table1 |
|--------------------| |-------------------------------------------|
| id_image | data | | id_table1 | id_image | data |
|----------|---------| |-----------|----------|--------------------|
| 1 | Image 1 | | 1 | 1 | References image 1 |
| 2 | Image 2 | | 2 | 3 | References image 3 |
| 3 | Image 3 | -------------------------------------------
--------------------
-------------------------------------------
| table2 |
|-------------------------------------------|
| id_table2 | id_image | data |
|-----------|----------|--------------------|
| 1 | 2 | References image 2 |
| 2 | 2 | References image 2 |
| 3 | 3 | References image 3 |
-------------------------------------------
Here are the table details:
image table
id_image auto-incremented primary key
data image data
table1 table
id_table1 auto-incremented primary key
id_image foreign key referencing image.id_image
data table1 data
table2 table
id_table2 auto-incremented primary key
id_image foreign key referencing image.id_image
data table2 data
I want my database to behave as follows:
If I delete the table1 row with id_table1 = 1, the image row with id_image = 1 must be deleted (no other references to this image)
If I then delete the table2 row with id_table2 = 1, no image should be deleted (because the image with id_image = 2 is still referenced by the table2 row with id_table2 = 2)
If I then delete the table2 row with id_table2 = 2, the image row with id_image = 2 must be deleted (no other references to this image)
If I then delete the table1 row with id_table1 = 2, no image should be deleted (because the image with id_image = 3 is still referenced by the table2 row with id_table2 = 3)
If I then delete the table2 row with id_table2 = 3, the image row with id_image = 3 must be deleted (no other references to this image)
I've already tried some cascading deletes, by inverting the foreign keys (i.e. the image table containing id_table1 and id_table2 foreign keys), but if an image is referenced in 2 other tables, removing one referencing table entry also removes the image, which I do not want to happen.
I've also tried to define triggers, but this approach is more complex than I thought: each time I have to check among all foreign keys to id_image to see if there is another reference to the image to delete. This sample contains 2 foreign keys, but in the database I'm designing there will be more than 10...
I feel like there is a simple solution to this simple problem, anyone here to help me?
Thanks!
Right away because of your first requirement:
If I delete the table1 row with id_table1 = 1, the image row with
id_image = 1 must be deleted (no other references to this image)
I can tell you that you can only accomplish this with a TRIGGER. The reason is that you want to automatically delete from the parent table when a row is deleted from the child table.
The reverse (Delete child when parent is deleted) can be done with Cascading Foreign Keys, but not this.
You will need to put Triggers on both child tables to enforce the logic you want.
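A minimal sketch of one such trigger, assuming SQL Server (T-SQL) syntax and the table and column names from the question; the trigger name is made up. The trigger on table2 would be identical apart from its name and the table it is attached to.

```sql
CREATE TRIGGER trg_table1_delete_orphan_images ON dbo.table1
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove only the images that the deleted rows pointed at
    -- and that no remaining row in either child table references.
    DELETE i
    FROM dbo.image AS i
    WHERE i.id_image IN (SELECT d.id_image FROM deleted AS d)
      AND NOT EXISTS (SELECT 1 FROM dbo.table1 AS t1 WHERE t1.id_image = i.id_image)
      AND NOT EXISTS (SELECT 1 FROM dbo.table2 AS t2 WHERE t2.id_image = i.id_image);
END;
```

Every additional referencing table needs its own NOT EXISTS check in every trigger, which is the complexity the question is worried about.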
I came up with a better design yesterday. It still uses triggers (as Tab Alleman said), but those are much simpler to define:
-------------------- ---------------------------
| image | | image_proxy |
|--------------------| |---------------------------|
| id_image | data | | id_image_proxy | id_image |
|----------|---------| |----------------|----------|
| 1 | Image 1 | | 1 | 1 |
| 2 | Image 2 | | 2 | 3 |
| 3 | Image 3 | | 3 | 2 |
-------------------- | 4 | 2 |
| 5 | 3 |
---------------------------
-------------------------------------------------
| table1 |
|-------------------------------------------------|
| id_table1 | id_image_proxy | data |
|-----------|----------------|--------------------|
| 1 | 1 | References image 1 |
| 3 | 2 | References image 3 |
-------------------------------------------------
-------------------------------------------------
| table2 |
|-------------------------------------------------|
| id_table2 | id_image_proxy | data |
|-----------|----------------|--------------------|
| 1 | 3 | References image 2 |
| 2 | 4 | References image 2 |
| 3 | 5 | References image 3 |
-------------------------------------------------
As you can see in the schema above, I've introduced a new table: image_proxy:
id_image_proxy auto-incremented primary key
id_image foreign key referencing image.id_image
Also, table1 and table2 now reference an image_proxy entry instead of an image entry.
With this design, the triggers are now:
After deleting entries in table1, delete corresponding entries in image_proxy.
After deleting entries in table2, delete corresponding entries in image_proxy.
After deleting entries in image_proxy, delete entries in image that are not referenced anymore in image_proxy.
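The three triggers listed above could be sketched as follows, assuming SQL Server (T-SQL) syntax; the trigger names are hypothetical. Only the table1 trigger is shown for the child tables, since the table2 one is the same with the table name swapped. The chain relies on triggers being allowed to fire other triggers, which is the server default in SQL Server.

```sql
-- Child table: deleting a row removes its proxy entry.
CREATE TRIGGER trg_table1_delete_proxy ON dbo.table1
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DELETE p
    FROM dbo.image_proxy AS p
    INNER JOIN deleted AS d ON d.id_image_proxy = p.id_image_proxy;
END;
GO  -- batch separator for SSMS/sqlcmd; CREATE TRIGGER must start its own batch

-- Proxy table: deleting a proxy removes the image if no other proxy points at it.
CREATE TRIGGER trg_image_proxy_delete_image ON dbo.image_proxy
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DELETE i
    FROM dbo.image AS i
    WHERE i.id_image IN (SELECT d.id_image FROM deleted AS d)
      AND NOT EXISTS (SELECT 1 FROM dbo.image_proxy AS p WHERE p.id_image = i.id_image);
END;
```

The benefit of the proxy table shows up in the last trigger: it checks a single table, no matter how many tables end up referencing images.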
I don't know if this design is the best for this issue, nor if trigger usage is safe. That's why I'll keep an eye on this post in case there is a better answer or a relevant comment!
I have 2 tables, ShareButton and SharePage.
ShareButton table:
+----+---------------+---------------+
| ID | Name | TotalShare |
+----+---------------+---------------+
| 1 | Facebook | 0 |
| 2 | Twitter | 0 |
+----+---------------+---------------+
SharePage table:
+----+--------------------+-------+---------------+
| ID | URL | Share | ShareButtonID |
+----+--------------------+-------+---------------+
| 1 | www.abc.xyz/page1 | 3 | 1 |
| 2 | www.abc.xyz/page1 | 14 | 2 |
| 3 | www.abc.xyz/page2 | 6 | 1 |
| 4 | www.abc.xyz/page2 | 10 | 2 |
+----+--------------------+-------+---------------+
After a record in the SharePage table is inserted or updated, the TotalShare column of ShareButton should be updated, along these lines:
update ShareButton
set TotalShare = (select sum(Share) from SharePage where ShareButtonID = <ShareButtonID of the updated/inserted record>)
where ID = <ShareButtonID of the updated/inserted record>
Thanks for reading!
Let me start my answer by saying I agree with Mureinik. Unless you have a really bad performance hit getting the sum of shares using a simple group by query, I wouldn't recommend saving that sum in the ShareButton table.
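The "simple group by query" I have in mind is something like the sketch below, which uses only the tables and columns from the question. The LEFT JOIN and COALESCE are my additions so that a button with no pages still shows up with a total of 0.

```sql
SELECT b.ID,
       b.Name,
       COALESCE(SUM(p.Share), 0) AS TotalShare
FROM ShareButton AS b
LEFT JOIN SharePage AS p ON p.ShareButtonID = b.ID
GROUP BY b.ID, b.Name;
```

With the sample data in the question, this returns 9 for Facebook (3 + 6) and 24 for Twitter (14 + 10).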
If you really want a trigger to calculate it, I guess the simplest way to do it is this:
CREATE TRIGGER trSharePage_Changed ON SharePage
FOR UPDATE, INSERT, DELETE
AS
UPDATE buttons
SET TotalShare = SumOfShares
FROM ShareButton buttons
INNER JOIN
(
SELECT ShareButtonID, SUM(Share) As SumOfShares
FROM SharePage
GROUP BY ShareButtonID
) pages ON buttons.ID = pages.ShareButtonID
Note that this trigger fires after any insert, update, or delete statement on the SharePage table completes. Since it's an after trigger that recalculates every total, you don't need to deal with the inserted and deleted tables at all.
Going off the diagram here: I'm confused about columns 1 and 3.
I am working on a data warehouse table, and there are two columns that are used together as a key that gets you to the primary key.
The first column is the source system; there are three possible values, let's say IBM, SQL, and ORACLE. The second part of the composite key is the transaction ID, which could be numeric or varchar. There is no third column, other than the secret key, which would be generated by IDENTITY(1,1) as the record gets loaded. So, looking at the graph below, I imagine passing in a query like:
Select a.Patient,
b.SourceSystem,
b.TransactionID
from Patient A
right join Transactions B
on A.sourceSystem = B.sourceSystem and
a.transactionID = B.transactionID
where SourceSystem = 'SQL'
The graph leads me to think that column 1 in the index should be SourceSystem, since that would immediately cut the drill-down into the next level of the index to a third. But when I showed this graph to a coworker, they interpreted column 1 as the transactionID and column 2 as the source system.
Cols
1 2 3
-------------
| | 1 | |
| A |---| |
| | 2 | |
|---|---| |
| | | |
| | 1 | 9 |
| B | | |
| |---| |
| | 2 | |
| |---| |
| | 3 | |
|---|---| |
First, you should qualify all column names in a query. Second, a left join usually makes more sense than a right join (the semantics are "keep all rows in the first table"). Finally, if you have proper foreign key relationships, then you probably don't need an outer join at all.
Let's consider this query:
Select p.Patient, t.SourceSystem, t.TransactionID
from Patient p join
Transactions t
on t.sourceSystem = p.sourceSystem and
t.transactionID = p.transactionID
where t.SourceSystem = 'SQL';
The correct index for this query is Transactions(SourceSystem, TransactionId).
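As a sketch, that index could be created like this; the index name is just a placeholder.

```sql
-- SourceSystem comes first because the query filters on it with an equality;
-- TransactionID then supports the join within each source system.
CREATE INDEX IX_Transactions_SourceSystem_TransactionID
    ON Transactions (SourceSystem, TransactionID);
```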
Notes:
Outer joins affect the choice of indexes. Basically if one of the tables has to be scanned anyway, then an index might be less useful.
t.SourceSystem = 'SQL' and p.SourceSystem = 'SQL' would probably optimize differently.
Does the patient really have a transaction id? That seems strange.
I'm having a bit of trouble getting an outer join to work: I've had them work as I expected in MS Access in the past, but getting a similar thing happening in SQL Server is giving me issues.
I have a table of scores that apply to each student like:
+-------------+------------+-------+
| StudentID | StandardID | Score |
+-------------+------------+-------+
| 100 | 1011 | 1 |
| 100 | 1012 | 2 |
| 101 | 1011 | 3 |
Each student may have many scores, and each score is related to one Standard. Additionally, each student may belong to one or more groups, which are contained within another table, groups:
+-------------+------------+
| StudentID | GroupID |
+-------------+------------+
| 100 | 83 |
| 101 | 83 |
What I want to do is extract the score information and filter it by group: this dataset will then be matched up by StudentID to the correct record elsewhere. However, for each retrieved dataset for any given student, there needs to be exactly the same number of rows: one for each standard. Ideally this (for the above data):
StudentID = 100
+------------+-------------+------------+-------+
| StandardID | StudentID | GroupID | Score |
+------------+-------------+------------+-------+
| 1011 | 100 | 83 | 1 |
| 1012 | 100 | 83 | 2 |
StudentID = 101
+------------+-------------+------------+-------+
| StandardID | StudentID | GroupID | Score |
+------------+-------------+------------+-------+
| 1011 | 101 | 83 | 3 |
| 1012 | 101 | 83 | NULL | <--Can't get this to happen
I can pull up the list that I want, but there are no NULL rows in there. As a further example, if I have 4 scores for one student but only 1 score for another, I still need 4 rows to be returned by the query, with NULLs in place of the scores they don't have.
This is what I have tried so far (a bit more verbose, but in essence):
SELECT Standards.StandardID, scores.StudentID, scores.TestDate, scores.Score,
scores.Assessment
FROM scores RIGHT OUTER JOIN
(SELECT scores_1.StandardID
FROM scores AS scores_1 INNER JOIN studentGroups
ON scores_1.StudentID = studentGroups.StudentID
WHERE (studentGroups.GroupID = 83)
GROUP BY scores_1.StandardID) AS Standards
ON scores.StandardID = Standards.StandardID
WHERE scores.StudentID = 100
Any help would be amazing!
Can you provide the database structure? To return the same number of rows for every student, you need to create a temp table with the distinct StandardIDs and then use an outer join against it.
Please provide the table structure so we can give a more appropriate answer.
I use scores and groups as the two tables described above. You used many more names, so I got (and maybe still am) a bit confused. However, this should work:
select AllStandards.StandardID,
groups.StudentID,
groups.GroupID,
Scores.Score
from (select distinct StandardID from scores) AllStandards
left join (
scores
join groups
on groups.StudentID = scores.StudentID
)
on AllStandards.StandardID = scores.StandardID
where groups.StudentID=100
I first create a list of all available StandardIDs and then left join it to all students and scores to get the list.
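One caveat, offered as a sketch rather than a tested fix: because the WHERE clause filters on groups.StudentID, rows where the left join found no matching score are discarded, so a missing standard still produces no NULL row. A variant that first pairs every student in the group with every standard, and only then left joins the scores, would look like this (same scores and groups tables as above):

```sql
select AllStandards.StandardID,
       groups.StudentID,
       groups.GroupID,
       scores.Score
from (select distinct StandardID from scores) AllStandards
cross join groups              -- every student/group row paired with every standard
left join scores
       on scores.StandardID = AllStandards.StandardID
      and scores.StudentID  = groups.StudentID
where groups.GroupID = 83
  and groups.StudentID = 101
```

With the sample data, student 101 gets two rows: StandardID 1011 with a Score of 3, and StandardID 1012 with a NULL Score. The filter is safe here because it is applied to groups, which is not on the nullable side of the left join.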