I am trying to write a trigger on a table to prevent the insertion of two rows with the same Name that are not flagged as IsDeleted. But the first part of the selection already contains the inserted row, so the condition is always true. I thought that using the FOR keyword would cause the trigger to run before the INSERT, but in this case the inserted row is already in the table. Am I wrong, or is this how all FOR triggers work?
ALTER TRIGGER TriggerName
ON MyTable
FOR INSERT, UPDATE
AS
BEGIN
    IF EXISTS (SELECT [Name] FROM MyTable WHERE IsDeleted = 0 AND [Name] IN (SELECT [Name] FROM INSERTED))
    BEGIN
        RAISERROR ('ERROR Description', 16, 1);
        ROLLBACK;
    END
END
FOR runs after the data is changed; INSTEAD OF is what I think you are after.
EDIT: As stated by others, INSTEAD OF runs instead of the statement you are executing, so you need to perform the insert yourself when the data is valid, rather than stopping the insert when it is invalid.
Read this question for a much more detailed explanation of the types of Triggers.
SQL Server "AFTER INSERT" trigger doesn't see the just-inserted row
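As a hedged sketch of that approach, reusing the table and column names from the question: an INSTEAD OF INSERT version of the trigger might look roughly like this. It handles only INSERT for simplicity, and it assumes MyTable has no identity column so that INSERT ... SELECT * works; adjust the column list otherwise.
ALTER TRIGGER TriggerName
ON MyTable
INSTEAD OF INSERT
AS
BEGIN
    -- At this point the new rows exist only in the inserted pseudo-table,
    -- not yet in MyTable, so the duplicate check behaves as intended.
    IF EXISTS (SELECT 1
               FROM MyTable t
               JOIN INSERTED i ON i.[Name] = t.[Name]
               WHERE t.IsDeleted = 0)
    BEGIN
        RAISERROR ('ERROR Description', 16, 1);
        ROLLBACK;
    END
    ELSE
    BEGIN
        -- The trigger replaces the original statement, so the insert
        -- has to be performed explicitly here.
        INSERT INTO MyTable
        SELECT * FROM INSERTED;
    END
END
Note that duplicates within a single multi-row INSERT are still not checked by this sketch.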
FOR is the same as AFTER. If you want to "simulate" a BEFORE trigger, use INSTEAD OF. Caveat: it's not exactly what you would expect from a proper BEFORE trigger; if you fail to provide the necessary INSTEAD action, your inserted/updated data can be lost/ignored.
SQL Server doesn't have BEFORE triggers.
For SQL Server, FOR runs AFTER the SQL which triggered it.
From http://msdn.microsoft.com/en-us/library/ms189799.aspx:
FOR | AFTER
AFTER specifies that the DML trigger is fired only when all operations specified in the triggering SQL statement have executed successfully. All referential cascade actions and constraint checks also must succeed before this trigger fires.
AFTER is the default when FOR is the only keyword specified.
AFTER triggers cannot be defined on views.
I've actually run into a similar problem lately, and found a cool way to handle it. I had a table which could have several rows for one id, but only ONE of them could be marked as primary.
In SQL Server 2008, you'll be able to create a filtered (partial) unique index, something like this:
create unique index IX on MyTable(name) where isDeleted = 0;
However, you can accomplish it with a little more work in SQL Server 2005. The trick is to make a view showing only the rows which aren't deleted, and then create a unique clustered index on it:
create view MyTableNotDeleted_vw
with schema_binding /* Must be schema bound to create an indexed view */
as
select name
from dbo.MyTable /* Have to use dbo. for schema bound views */
where isDeleted = 0;
GO
create unique clustered index IX on MyTableNotDeleted_vw ( name );
This will effectively create a unique constraint only affecting rows that haven't yet been deleted, and will probably perform better than a custom trigger!
I have a Microsoft SQL Server 2012 database with multiple tables.
All tables contain the same two columns, DataRowModified (type datetime) and DataRowLastAuthor (type nvarchar(MAX)). And no, I can't put all those columns into a separate table; it's a requirement that each table directly contains those columns.
I wrote the trigger below for the table Events to automatically update the values of those two columns whenever a row gets updated:
CREATE TRIGGER [dbo].[Trigger_Events_UpdateMetadata]
ON [dbo].[Events]
FOR UPDATE
AS
BEGIN
UPDATE [dbo].[Events]
SET [DataRowModified] = GETDATE(),
[DataRowLastAuthor] = ORIGINAL_LOGIN()
WHERE [Id] IN (SELECT [Id] FROM INSERTED)
END
Now my question is whether I have to copy (and rename) this trigger for every table I have to use it with, or can I somehow write a global trigger that works on all (or a specified set of) tables? It has to know in which table/row the update happened though, because it has to modify it.
What would be the easiest way to implement an automatically maintained LastAuthor and LastModificationDate column into many tables as described?
A trigger in SQL Server is always bound to a single table - you cannot have "global" triggers or triggers attached to multiple tables at once.
If you need a trigger on your 50 tables, you need to write 50 triggers, one for every table. There is no way around this.
The only way to avoid this would be to update those columns in the data access layer of your application, so that the values are already present when you save your row of data. Tools like Entity Framework allow such "bulk operations" on multiple entities, e.g. to update a last-modified date and last-modifying user on every entity being saved.
No. But multiple triggers can invoke the same stored procedure.
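As a hedged sketch of that idea (the procedure name, the table type, and the dynamic SQL below are assumptions, not an existing feature): one shared procedure stamps the audit columns via dynamic SQL, and each table keeps only a thin trigger that forwards the keys of its updated rows.
-- One-time setup: a table type used to pass the affected keys
CREATE TYPE dbo.IdList AS TABLE (Id int NOT NULL);
GO

-- Shared procedure: stamps the audit columns on any table that has
-- an int Id key plus the DataRowModified / DataRowLastAuthor columns
CREATE PROCEDURE dbo.SetRowMetadata
    @TableName sysname,
    @Ids dbo.IdList READONLY
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'UPDATE t
          SET DataRowModified = GETDATE(),
              DataRowLastAuthor = ORIGINAL_LOGIN()
          FROM dbo.' + QUOTENAME(@TableName) + N' AS t
          JOIN @Ids AS i ON i.Id = t.Id;';
    EXEC sp_executesql @sql, N'@Ids dbo.IdList READONLY', @Ids = @Ids;
END
GO

-- Each table still keeps its own (small) trigger; this would replace the one above
CREATE TRIGGER dbo.Trigger_Events_UpdateMetadata
ON dbo.Events
FOR UPDATE
AS
BEGIN
    DECLARE @ids dbo.IdList;
    INSERT INTO @ids (Id) SELECT Id FROM inserted;
    EXEC dbo.SetRowMetadata @TableName = N'Events', @Ids = @ids;
END
GO
You still end up with one trigger per table, but the logic lives in a single place.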
I'm feeding data into a SQL Server database, and 1 out of every 1000 records is a duplicate due to matters outside my control. It's an exact duplicate: the entire record, the unique identifier, everything.
I know this can be solved with an 'update' rather than an insert step, or an 'on error, update' instead of insert, perhaps.
But is there a quick and easy way to make SQL Server ignore these duplicates? I haven't created an index/unique constraint yet, but if I do, I don't want a duplicate key error breaking or interrupting the ETL/data flow process. I just want SQL Server to keep executing the insert query. Is there a way to do this?
Just add a WHERE NOT EXISTS check to the statement you're executing -
INSERT INTO table (unique_identifier_column, other_column)
SELECT '123', 'blah'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE unique_identifier_column = '123')
Just to be clear for anyone else hitting this issue: for the best performance, at the price that the duplicate rows are silently dropped rather than inserted, one should define the primary key (or a unique index) on the table with IGNORE_DUP_KEY = ON.
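For illustration, a hedged sketch with made-up table and column names; with IGNORE_DUP_KEY = ON, rows that would violate the key are discarded with a warning and the rest of the INSERT still succeeds:
CREATE TABLE dbo.StagingTarget
(
    RecordId uniqueidentifier NOT NULL,
    Payload  nvarchar(100)    NULL,
    CONSTRAINT PK_StagingTarget PRIMARY KEY (RecordId)
        WITH (IGNORE_DUP_KEY = ON)
);

-- The second row below is dropped with a "Duplicate key was ignored." warning,
-- and the statement still completes.
INSERT INTO dbo.StagingTarget (RecordId, Payload)
VALUES ('6F9619FF-8B86-D011-B42D-00C04FC964FF', 'first'),
       ('6F9619FF-8B86-D011-B42D-00C04FC964FF', 'second');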
If you're looking for a duplicate record on every field, just use DISTINCT in your select:
Insert into DestinationTable
Select Distinct *
From SourceTable
EDIT:
I misinterpreted your question. You're trying to find a low impact way to prevent adding a record that already exists in your DestinationTable.
If you want your inserts to remain fast, one way to do it is to add an identity column to your table as the primary key. Let the duplicate records get added, then run a maintenance routine during down or slow time that checks all records added since the last check and deletes any duplicates it finds (see the sketch below). Otherwise, there is no easy way: you will have to check on every insert.
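For example, a hedged sketch of such a cleanup, where RowId is the assumed identity column and BusinessKey stands in for the column(s) that define a duplicate:
-- Delete every duplicate except the earliest-inserted copy of each key
DELETE dup
FROM dbo.DestinationTable AS dup
JOIN dbo.DestinationTable AS keeper
    ON  keeper.BusinessKey = dup.BusinessKey
    AND keeper.RowId < dup.RowId;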
For a trigger that is tracking UPDATEs to a table, two pseudo-tables may be referenced: deleted and inserted. Is there a way to cross-reference the two without using an INNER JOIN on their primary key?
I am trying to maintain referential integrity without foreign keys (don't ask), so I'm using triggers. I want UPDATEs to the primary key in table A to be reflected in the "foreign key" of look-up table B, and for this to happen when an UPDATE affects multiple records in table A.
All the UPDATE trigger examples that I've seen hinge on joining the inserted and deleted tables to track changes, and they use the updated table's ID field (primary key) to set up the join. But if that ID field (a GUID) is the changed field in a record (or set of records), is there a good way to track those changes so that I can enforce them in the corresponding look-up table?
I've just had this issue (or rather, a similar one), myself, hence the resurrection...
My eventual approach was to simply disallow updates to the PK field precisely because it would break the trigger. Thankfully, I had no business case to support updating the primary key column (these were surrogate IDs, anyway), so I could get away with it.
SQL Server offers the UPDATE() function, for use within triggers, to check for this edge case:
CREATE TRIGGER your_trigger
ON your_table
INSTEAD OF UPDATE
AS BEGIN
    IF UPDATE(pk1) BEGIN
        ROLLBACK

        DECLARE @proc SYSNAME, @table SYSNAME

        SELECT TOP 1
             @proc = OBJECT_NAME(@@PROCID)
            ,@table = OBJECT_NAME(parent_id)
        FROM sys.triggers
        WHERE object_id = @@PROCID

        RAISERROR ('Trigger %s prevents UPDATE of table %s due to locked primary key', 16, -1, @proc, @table) WITH NOWAIT
    END
    ELSE UPDATE t SET
         col1 = i.col1
        ,col2 = i.col2
        ,col3 = i.col3
    FROM your_table t
    INNER JOIN inserted i ON t.pk1 = i.pk1
END
GO
(Note that the above is untested, and probably contains all manner of issues with regards to XACT_STATE or TRIGGER_NESTLEVEL -- it's just there to demonstrate the principle)
It gets a bit messy, though, so I would definitely consider code generation for this, to handle changes to the table during development (maybe even done by a DDL trigger on CREATE/ALTER table).
If you have a composite primary key, you can use IF UPDATE(pk1) OR UPDATE(pk2)... or do some bitwise work with the COLUMNS_UPDATED function, which will give you a bitmask based on the column ordinal (but I'm not going to cover that here -- see MSDN/BOL).
The other (simpler) option is to DENY UPDATE ON your_table(pk) TO public, but remember that any member of the sysadmin role (and probably dbo) will not honour this.
I'm with @Aaron: without a primary key you're stuck. If you have DDL privileges to add a trigger, can't you add an auto-increment PK column while you're at it? If you'd like, it doesn't even need to be the PK.
I'm inserting a large number of rows into an empty table with a primary key constraint on one column.
If there is a duplicate key error, is there any way to find out the value of the key (or row) that caused the error?
Validating the data prior to the insert is sadly not something I can do right now.
Using SQL 2008.
Thanks!
Doing the count(*)/GROUP BY thing is something I'm trying to avoid; this is an insert of hundreds of millions of rows from hundreds of different DBs (some of which are on remote servers), and I don't have the time or space to do the insert twice.
The data is supposed to be unique from the providers, but unfortunately their validation doesn't seem to work correctly 100% of the time and I'm trying to at least see where it's failing so I can help them troubleshoot.
Thank you!
There's not a way of doing it that won't slow your process down, but here's one way that will make it easier. You can add an instead-of trigger on that table for inserts and updates. The trigger will check each record before inserting it and make sure it won't cause a primary key violation. You can even create a second table to catch violations, and have a different primary key (like an identity field) on that one, and the trigger will insert the rows into your error-catching table.
Here's an example of how the trigger can work:
CREATE TRIGGER mytrigger ON sometable
INSTEAD OF INSERT
AS BEGIN
INSERT INTO sometable SELECT * FROM inserted WHERE ISNUMERIC(somefield) = 1;
INSERT INTO sometableRejects SELECT * FROM inserted WHERE ISNUMERIC(somefield) = 0;
END
In that example, I'm checking a field to make sure it's numeric before I insert the data into the table. You'll need to modify that code to check for primary key violations instead; for example, you might join the INSERTED table to your own existing table and only insert rows where you don't find a match, as in the sketch below.
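A hedged sketch of that idea, assuming a single-column primary key called Id and a rejects table with the same column list (adjust the SELECT lists if either table has an identity column):
CREATE TRIGGER mytrigger ON sometable
INSTEAD OF INSERT
AS BEGIN
    -- Rows that would violate the primary key go to the rejects table first
    INSERT INTO sometableRejects
    SELECT i.*
    FROM inserted AS i
    WHERE EXISTS (SELECT 1 FROM sometable AS t WHERE t.Id = i.Id);

    -- The remaining rows go into the real table
    INSERT INTO sometable
    SELECT i.*
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM sometable AS t WHERE t.Id = i.Id);

    -- Duplicates *within* a single inserted batch still need separate handling.
END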
The solution would depend on how often this happens. If it's <10% of the time then I would do the following:
Insert the data
If error then do Bravax's revised solution (remove constraint, insert, find dup, report and kill dup, enable constraint).
This means it's only costing you on the few times an error occurs.
If this is happening more often then I'd look at sending the boys over to see the providers :-)
Revised:
Since you don't want to insert twice, could you:
Drop the primary key constraint.
Insert all data into the table
Find any duplicates, and remove them
Then re-add the primary key constraint
Previous reply:
Insert the data into a duplicate of the table without the primary key constraint.
Then run a query on it to determine the rows which have duplicate values for the primary key column:
select count(*), <Primary Key>
from table
group by <Primary Key>
having count(*) > 1
Use SSIS to import the data and have it check for this as part of the data flow. That is the best way to handle it. SSIS can send the bad records to a table (which you can later send to the vendor to help them clean up their act) and process the good ones.
I can't believe that SSIS does not easily address this "reality", because, let's face it, oftentimes you need and want to be able to:
See if a record exists with a certain unique or primary key
If it does not, insert it
If it does, either ignore it or update it.
I don't understand how they would let a product out the door without this capability built-in in an easy-to-use manner. Like, say, set an attribute of a component to automatically check this.
To add a NOT NULL Column to a table with many records, a DEFAULT constraint needs to be applied. This constraint causes the entire ALTER TABLE command to take a long time to run if the table is very large. This is because:
Assumptions:
The DEFAULT constraint modifies existing records. This means that the db needs to increase the size of each record, which causes it to shift records on full data-pages to other data-pages and that takes time.
The DEFAULT update executes as an atomic transaction. This means that the transaction log will need to be grown so that a roll-back can be executed if necessary.
The transaction log keeps track of the entire record. Therefore, even though only a single field is modified, the space needed by the log will be based on the size of the entire record multiplied by the # of existing records. This means that adding a column to a table with small records will be faster than adding a column to a table with large records even if the total # of records are the same for both tables.
Possible solutions:
Suck it up and wait for the process to complete. Just make sure to set the timeout period to be very long. The problem with this is that it may take hours or days to do depending on the # of records.
Add the column but allow NULL. Afterward, run an UPDATE query to set the DEFAULT value for existing rows. Do not update everything in one statement; update batches of records at a time or you'll end up with the same problem as solution #1. The problem with this approach is that you end up with a column that allows NULL when you know that this is an unnecessary option. I believe that there are some best practice documents out there that say you should not have columns that allow NULL unless it's necessary.
Create a new table with the same schema. Add the column to that schema. Transfer the data over from the original table. Drop the original table and rename the new table. I'm not certain how this is any better than #1.
Questions:
Are my assumptions correct?
Are these my only solutions? If so, which one is the best? If not, what else could I do?
I ran into this problem at work as well, and my solution is along the lines of #2.
Here are my steps (I am using SQL Server 2005):
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn varchar(40) DEFAULT('')
2) Add a NOT NULL constraint with the NOCHECK option. NOCHECK means the constraint is not enforced against existing values:
ALTER TABLE MyTable WITH NOCHECK
ADD CONSTRAINT MyColumn_NOTNULL CHECK (MyColumn IS NOT NULL)
3) Update the values incrementally in table:
GO
UPDATE TOP(3000) MyTable SET MyColumn = '' WHERE MyColumn IS NULL
GO 1000
The UPDATE statement will only update a maximum of 3000 records at a time, which allows saving a chunk of data at a time. I have to use "MyColumn IS NULL" because my table does not have a sequential primary key.
GO 1000 will execute the previous batch 1000 times, which covers 3 million records; if you need more, just increase this number. Once no NULL values remain, the remaining iterations simply update 0 records.
Here's what I would try:
Do a full backup of the database.
Add the new column, allowing nulls - don't set a default.
Set SIMPLE recovery, which lets the transaction log be truncated (reused) as each batch is committed and checkpointed, instead of growing for the whole operation.
The SQL is: ALTER DATABASE XXX SET RECOVERY SIMPLE
Run the update in batches as you discussed above, committing after each one.
Reset the new column to no longer allow nulls.
Go back to the normal FULL recovery.
The SQL is: ALTER DATABASE XXX SET RECOVERY FULL
Backup the database again.
The use of the SIMPLE recovery model doesn't stop logging, but it significantly reduces its impact: the server can discard the log records for each batch once they are no longer needed for crash recovery, instead of keeping them all until a log backup.
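A minimal hedged sketch of that sequence, with placeholder database, table, and column names:
ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- add the column as NULLable, with no default
ALTER TABLE dbo.BigTable ADD NewCol int NULL;

-- backfill in batches; each batch commits on its own so the log stays small
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) dbo.BigTable
    SET NewCol = 0
    WHERE NewCol IS NULL;
    SET @rows = @@ROWCOUNT;
END

-- tighten the column once everything is populated
ALTER TABLE dbo.BigTable ALTER COLUMN NewCol int NOT NULL;

ALTER DATABASE MyDb SET RECOVERY FULL;
-- take a full backup here to restart the log chain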
You could:
Start a transaction.
Grab a write lock on your original table so no one writes to it.
Create a shadow table with the new schema.
Transfer all the data from the original table.
Execute sp_rename to rename the old table out.
Execute sp_rename to rename the new table in.
Finally, you commit the transaction.
The advantage of this approach is that your readers will be able to access the table during the long process and that you can perform any kind of schema change in the background.
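Roughly, as a hedged sketch (the table definition, column names, and lock hints are assumptions; adapt them to the real schema):
BEGIN TRANSACTION;

-- shadow table with the new schema, including the new NOT NULL column
CREATE TABLE dbo.MyTable_shadow
(
    Id     int          NOT NULL PRIMARY KEY,
    Name   nvarchar(50) NOT NULL,
    NewCol int          NOT NULL CONSTRAINT DF_MyTable_NewCol DEFAULT (0)
);

-- TABLOCK + HOLDLOCK takes a shared table lock on the original and keeps it
-- until the transaction commits: readers still work, writers are blocked
INSERT INTO dbo.MyTable_shadow (Id, Name, NewCol)
SELECT Id, Name, 0
FROM dbo.MyTable WITH (TABLOCK, HOLDLOCK);

EXEC sp_rename 'dbo.MyTable', 'MyTable_old';
EXEC sp_rename 'dbo.MyTable_shadow', 'MyTable';

COMMIT TRANSACTION;
-- remaining indexes, constraints, and permissions still have to be recreated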
Just to update this with the latest information.
In SQL Server 2012 this can now be carried out as an online operation in the following circumstances
Enterprise Edition only
The default must be a runtime constant
For the second requirement examples might be a literal constant or a function such as GETDATE() that evaluates to the same value for all rows. A default of NEWID() would not qualify and would still end up updating all rows there and then.
For defaults that qualify, SQL Server evaluates them and stores the result as the default value in the column metadata, so this is independent of the default constraint that is created (which can even be dropped if no longer required). This is viewable in sys.system_internals_partition_columns. The value doesn't get written out to the rows until the next time they happen to get updated.
More details about this here: online non-null with values column add in sql server 2012
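As an illustrative sketch (table and constraint names assumed), on Enterprise Edition of SQL Server 2012 or later:
-- GETDATE() evaluates to a runtime constant, so this is a metadata-only change
ALTER TABLE dbo.BigTable ADD CreatedAt datetime NOT NULL
    CONSTRAINT DF_BigTable_CreatedAt DEFAULT (GETDATE());

-- NEWID() is not a runtime constant, so this one still rewrites every row
ALTER TABLE dbo.BigTable ADD RowGuid uniqueidentifier NOT NULL
    CONSTRAINT DF_BigTable_RowGuid DEFAULT (NEWID());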
Admittedly, this is an old question. My colleague recently told me that he was able to do it in one single ALTER TABLE statement on a table with 13.6M rows. It finished within a second in SQL Server 2012. I was able to confirm the same on a table with 8M rows. Has something changed in later versions of SQL Server?
Alter table mytable add mycolumn char(1) not null default('N');
I think this depends on the SQL flavor you are using, but what if you took option 2, and at the very end altered the column to NOT NULL with the default value?
Would it be fast, since it sees all the values are not null?
If you want the column in the same table, you'll just have to do it. Now, option 3 is potentially the best for this because you can still have the database "live" while this operation is going on. If you use option 1, the table is locked while the operation happens and then you're really stuck.
If you don't really care if the column is in the table, then I suppose a segmented approach is the next best. Though, I really try to avoid that (to the point that I don't do it) because then like Charles Bretana says, you'll have to make sure and find all the places that update/insert that table and modify those. Ugh!
I had a similar problem, and went for your option #2.
It takes 20 minutes this way, as opposed to 32 hours the other way!!! Huge difference, thanks for the tip.
I wrote a full blog entry about it, but here's the important sql:
Alter table MyTable
Add MyNewColumn char(10) null default '?';
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 0 and 1000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 1000000 and 2000000
go
update MyTable set MyNewColumn='?' where MyPrimaryKey between 2000000 and 3000000
go
..etc..
Alter table MyTable
Alter column MyNewColumn char(10) not null;
And the blog entry if you're interested:
http://splinter.com.au/adding-a-column-to-a-massive-sql-server-table
I had a similar problem and went with a modified #3 approach. In my case the database was in SIMPLE recovery mode and the table to which the column was to be added was not referenced by any FK constraints.
Instead of creating a new table with the same schema and copying the contents of the original table, I used the SELECT…INTO syntax.
According to Microsoft (http://technet.microsoft.com/en-us/library/ms188029(v=sql.105).aspx)
The amount of logging for SELECT...INTO depends on the recovery model in effect for the database. Under the simple recovery model or bulk-logged recovery model, bulk operations are minimally logged. With minimal logging, using the SELECT…INTO statement can be more efficient than creating a table and then populating the table with an INSERT statement. For more information, see Operations That Can Be Minimally Logged.
The sequence of steps:
1. Move the data from the old table to the new one while adding the new column with a default:
SELECT table.*, CAST('default' AS nvarchar(256)) AS new_column
INTO table_copy
FROM table
2. Drop the old table:
DROP TABLE table
3. Rename the newly created table:
EXEC sp_rename 'table_copy', 'table'
4. Create the necessary constraints and indexes on the new table.
In my case the table had more than 100 million rows and this approach completed faster than approach #2 and log space growth was minimal.
1) Add the column to the table with a default value:
ALTER TABLE MyTable ADD MyColumn int default 0
2) Update the values incrementally in the table (same effect as accepted answer). Adjust the number of records being updated to your environment, to avoid blocking other users/processes.
declare @rowcount int = 1
while (@rowcount > 0)
begin
    UPDATE TOP(10000) MyTable SET MyColumn = 0 WHERE MyColumn IS NULL
    set @rowcount = @@ROWCOUNT
end
3) Alter the column definition to require not null. Run the following at a moment when the table is not in use (or schedule a few minutes of downtime). I have successfully used this for tables with millions of records.
ALTER TABLE MyTable ALTER COLUMN MyColumn int NOT NULL
I would use a CURSOR instead of a single UPDATE. The cursor updates the matching records one by one; it takes time but does not lock the table.
If you want to avoid locks, use WAITFOR between updates.
Also, I am not sure that a DEFAULT constraint alone changes existing rows; it is probably the NOT NULL constraint used together with DEFAULT that causes the case described by the author. If it does change them, add it at the end.
So the pseudocode would look like this:
-- without the NOT NULL constraint -- we will add it at the end
ALTER TABLE table ADD new_column INT DEFAULT 0

DECLARE fillNullColumn CURSOR LOCAL FAST_FORWARD FOR
    SELECT
        key
    FROM
        table WITH (NOLOCK)
    WHERE
        new_column IS NULL

OPEN fillNullColumn

DECLARE @key INT

FETCH NEXT FROM fillNullColumn INTO @key

WHILE @@FETCH_STATUS = 0 BEGIN
    UPDATE
        table WITH (ROWLOCK)
    SET
        new_column = 0 -- default value
    WHERE
        key = @key

    WAITFOR DELAY '00:00:05' -- wait 5 seconds; keep in mind this means updating only 12 rows per minute

    FETCH NEXT FROM fillNullColumn INTO @key
END

CLOSE fillNullColumn
DEALLOCATE fillNullColumn

ALTER TABLE table ALTER COLUMN new_column INT NOT NULL
There may still be some syntax details to adjust for your real table and column names, but I hope this helps to solve your problem.
Good luck!
Vertically segment the table. This means you will have two tables with the same primary key and exactly the same number of records: the one you already have, and another that holds just the key and the new NOT NULL column (with its default value).
Modify all insert, update, and delete code so it keeps the two tables in sync. If you want, you can create a view that joins the two tables together into a single logical combination that appears like a single table for client SELECT statements; a sketch of such a view follows.
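For example, a hedged sketch with hypothetical names, where MyTable_Extra holds only the key and the new column:
CREATE VIEW dbo.MyTable_Combined
AS
SELECT t.*, x.NewCol
FROM dbo.MyTable AS t
JOIN dbo.MyTable_Extra AS x ON x.Id = t.Id;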