I am using SQL Server 2008. I have two tables with the same schema, and I created a view that unions the contents of the two tables to provide a single "table" view for external access.
One of the tables is read-only, and the other is subject to bulk insert/delete operations (on that table I will use bulk insert at some interval to insert several thousand rows, and run another SQL Job to remove several million rows daily).
My question is: while the second table is under a bulk insert/delete operation, will the physical table be locked so that external users' access to the union view of the two tables is also blocked? (I am wondering whether lock escalation applies in this scenario: row locks eventually escalate to a table lock, which in turn blocks access through the view?)
While the second table is under a bulk insert/delete operation, will the physical table be locked so that external users' access to the union view of the two tables is also blocked?
Yes, with the caveat that if the optimiser can find a way to execute the query that does not involve accessing the bulk insert table, then access will not be blocked.
If you are looking to optimise bulk loading times make sure you have a read of this blog post.
EDIT
What is the actual problem you are experiencing? Do you really need to be using this view everywhere? (For example, are there places that only need data from one table but are querying it via the view?)
If you want your view to be "online" all the time, consider snapshot isolation, or, if you are loading full sets into the bulk table (e.g. the full content is replaced daily), load the data into a separate table and sp_rename it in (inside a transaction).
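For example, a minimal sketch of the rename swap, assuming a fully loaded staging table; BulkTable and BulkTable_Staging are hypothetical names:

BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.BulkTable', 'BulkTable_Old';
    EXEC sp_rename 'dbo.BulkTable_Staging', 'BulkTable';
COMMIT TRANSACTION;
-- the old table can then be truncated and reused as the next staging table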
Most likely yes. It depends on lock escalation.
Some ways to work around this (not all options; rough sketches follow the list):
Use the WITH (NOLOCK) table hint so the query neither takes shared locks nor honours existing locks. If used on the view, it applies to both underlying tables.
Use WITH (READPAST) if you don't mind skipping locked rows in the BCP table
Change the lock granularity for the BCP table. Use sp_tableoption and set "table lock on bulk load" = false.
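Rough sketches of the three workarounds above; MyView and BcpTable are placeholder names:

-- 1. Read without taking or honouring locks (accepts dirty reads); on a view
--    the hint propagates to the underlying tables:
SELECT * FROM dbo.MyView WITH (NOLOCK);

-- 2. Skip rows that are currently locked instead of waiting for them:
SELECT * FROM dbo.BcpTable WITH (READPAST);

-- 3. Stop bulk loads from taking a table-level lock:
EXEC sp_tableoption 'dbo.BcpTable', 'table lock on bulk load', 'false';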
Edit: Now I've had coffee...
If you need to query the bulk table during load/delete operations, get accurate results, and not suffer performance hits, I suggest you consider SNAPSHOT isolation.
Edit 2: SNAPSHOT isolation
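As a rough sketch (MyDb and MyUnionView are hypothetical names), snapshot isolation is enabled at the database level and readers then opt in:

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- each reading session then opts in:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM dbo.MyUnionView;

-- or enable READ_COMMITTED_SNAPSHOT so the default READ COMMITTED level
-- uses row versioning without any client-side changes:
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;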
Environment: Oracle 12C
I have a table with about 10 columns, including a few CLOB and date columns. This is a very busy table for an ETL process, as described below:
Flat files are loaded into the table first, then the rows are updated and processed. The inserts and updates happen in batches, and millions of records are inserted and updated.
There is also a delete process that removes old data based on a date field. It runs as a PL/SQL procedure and deletes from the table in a loop, fetching only the first n records at a time based on the date field.
I do not want the delete process to interfere with the regular inserts/updates. What is the best practice for coding the delete so that it has minimal impact on the regular insert/update process?
I could also partition the table and delete in parallel, since each partition uses its own rollback segment, but I am looking for a simpler way to tune the delete process.
Any suggestions on using a special rollback segment or other tuning tips ?
The first thing you should look at is decoupling the various ETL processes so that you don't have to do all of them together or in a particular sequence, thereby removing the dependency between the INSERTs/UPDATEs and the DELETEs. While the insert/update could be managed in a single MERGE block in your ETL, the delete can happen later by simply marking the rows to be deleted, i.e. a soft delete. You could do this with a flag column in your table, and use that flag in your application and queries to filter those rows out; a sketch follows.
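A minimal sketch of the soft delete, assuming a staging table stage_tab with a load_date column (both hypothetical names):

ALTER TABLE stage_tab ADD (delete_flag CHAR(1) DEFAULT 'N');

-- the "delete" step only marks rows, keeping it off the critical path:
UPDATE stage_tab
   SET delete_flag = 'Y'
 WHERE load_date < ADD_MONTHS(SYSDATE, -6);

-- application queries filter the flagged rows out:
SELECT *
  FROM stage_tab
 WHERE delete_flag = 'N';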
By doing the delete later, the critical path of your ETL should shrink. Partitioning the data by date range should definitely help you maintain the data and keep the transactions efficient if it's date driven. Also, look for any row-by-row ("slow-by-slow") processing and convert it to bulk operations, and avoid context switching between SQL and PL/SQL as much as possible.
If you partition the table by date range, then you can look into DROP/TRUNCATE PARTITION, which discards the rows stored in a partition as a DDL statement. This cannot be rolled back. It executes quickly and uses few system resources (undo and redo). You can read more about it in the documentation.
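For example, assuming stage_tab is range-partitioned by the date column into monthly partitions (the partition name is hypothetical):

ALTER TABLE stage_tab DROP PARTITION p_2015_01 UPDATE INDEXES;

-- or keep the partition structure and just discard its rows:
ALTER TABLE stage_tab TRUNCATE PARTITION p_2015_01 UPDATE INDEXES;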
My source table, called Event, sits in a different database and has millions of rows. Each event can have an action of DELETE, UPDATE, or NEW.
We have a Java process that goes through these events in the order they were created, applies all sorts of rules, and then inserts the results into multiple tables for lookup, analysis, etc.
I am using JdbcTemplate and batchUpdate to delete and upsert into the Postgres DB in sequential order right now, but I'd like to be able to run this in parallel too. Each batch is 1,000 entities to be inserted/upserted or deleted.
However, even when running in a sequential manner, Postgres somehow blocks queries, and I don't know much about why.
Here is some of the code:
entityService.deleteBatch(deletedEntities);
indexingService.deleteBatch(deletedEntities);
...
entityService.updateBatch(allActiveEntities);
indexingService.updateBatch(....);
Each of these services does inserts/deletes into different tables, but they all run in one transaction.
The following query
SELECT
activity.pid,
activity.usename,
activity.query,
blocking.pid AS blocking_id,
blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));
returns
Query being blocked: "insert INTO ENTITY (reference, seq, data) VALUES($1, $2, $3) ON CONFLICT ON CONSTRAINT ENTITY_c DO UPDATE SET data = $4",
Blocking query: delete from ENTITY_INDEX where reference = $1
There are no foreign constraints between these tables. And we do have indexes so that we can run queries for our processing as part of the process.
Why would a completely different table block operations on the other tables? And how can we go about resolving this?
Your query is misleading.
What it shows as “blocking query” is really the last statement that ran in the blocking transaction.
It was probably a previous statement in the same transaction that caused entity (or rather a row in it) to be locked.
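To dig further, a sketch of two diagnostic queries might help (12345 stands in for the blocked backend's pid); they show whether the blocking session is sitting "idle in transaction" and which relations it still holds locks on:

SELECT a.pid, a.state, a.xact_start, a.query AS last_statement
FROM pg_stat_activity AS a
WHERE a.pid = ANY (pg_blocking_pids(12345));

SELECT l.pid, l.locktype, l.relation::regclass AS locked_relation, l.mode, l.granted
FROM pg_locks AS l
WHERE l.pid = ANY (pg_blocking_pids(12345));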
I have a running system where data is inserted periodically into an MS SQL database, and a web application is used to display this data to users.
During the data insert, users should be able to keep using the database; unfortunately I can't redesign the whole system right now. Every 2 hours 40k-80k records are inserted.
Right now the process looks like this:
Temp table is created
Data is inserted into it using plain INSERT statements (parameterized queries or stored procedures should improve the speed).
Data is pumped from temp table to destination table using INSERT INTO MyTable(...) SELECT ... FROM #TempTable
I think this approach is very inefficient. I can see that the insert phase can be improved (bulk insert?), but what about transferring the data from the temp table to the destination?
This is what we did a few times. Rename your table to TableName_A. Create a view that selects from that table. Create a second table exactly like the first one (TableName_B) and populate it with the data from the first one. Now set up your import process to populate the table that is not being referenced by the view, then alter the view to reference that table instead. Total downtime to users: a few seconds. Then repopulate the first table. It is actually easier if you can truncate and repopulate the table, because then you don't need that last step, but that may not be possible if your input data is not a complete refresh. A sketch of the swap is below.
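A sketch of the swap, assuming the view is called MyTable (the view name is hypothetical; the two tables follow the naming above):

-- the view initially reads from TableName_A:
CREATE VIEW dbo.MyTable AS SELECT * FROM dbo.TableName_A;
GO
-- load the fresh data into TableName_B, then repoint the view in one quick
-- metadata change:
ALTER VIEW dbo.MyTable AS SELECT * FROM dbo.TableName_B;
GO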
You cannot avoid locking when inserting into the table. Even with BULK INSERT this is not possible.
But clients that want to access this table during the concurrent INSERT operations can do so by changing the transaction isolation level to READ UNCOMMITTED or by executing the SELECT command with the WITH (NOLOCK) hint.
The INSERT command will still lock the table/rows, but the SELECT command will then ignore these locks and also read uncommitted entries.
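For example (MyTable is a placeholder name), a client could either switch its session or hint a single statement:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.MyTable;

-- or, per query:
SELECT COUNT(*) FROM dbo.MyTable WITH (NOLOCK);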
I have reports that perform some time consuming data calculations for each user in my database, and the result is 10 to 20 calculated new records for each user. To improve report responsiveness, a nightly job was created to run the calculations and dump the results to a snapshot table in the database. It only runs for active users.
So with 50k users, 30k of which are active, the job "updates" 300k to 600k records in the large snapshot table. The method it currently uses is to delete all previous records for a given user and then insert the new set. There is no PK on the table; only a business key is used to group the sets of data.
So my question is, when removing and adding up to 600k records every night, are there techniques to optimize the table to handle this? For instance, since the data can be recreated on demand, is there a way to disable logging for the table as these changes are made?
UPDATE:
One issue is that I cannot do this in batch, because of the way the script works: it examines one user at a time, deletes that user's previous 10-20 records, and inserts a new set of 10-20 records, over and over. I am worried that the transaction log will run out of space or that other performance issues could occur. I would like to configure the table not to worry about data preservation or other things that could slow it down. I cannot drop the indexes and so on, because people are accessing the table concurrently while it is being updated.
It's also worth noting that indexing could potentially speed up this bulk update rather than slow it down, because UPDATE and DELETE statements still need to be able to locate the affected rows in the first place, and without appropriate indexes it will resort to table scans.
I would, at the very least, consider a non-clustered index on the column(s) that identify the user, and (assuming you are using 2008) consider the MERGE statement, which can definitely avoid the shortcomings of the mass DELETE/INSERT method currently employed.
According to The Data Loading Performance Guide (MSDN), MERGE is minimally logged for inserts with the use of a trace flag.
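A hedged sketch of what that MERGE could look like; dbo.Snapshot, #NewCalcs and the column names are hypothetical stand-ins for your snapshot table and the freshly calculated rows for the users in the current batch:

MERGE dbo.Snapshot AS target
USING #NewCalcs AS source
    ON target.UserId = source.UserId
   AND target.MetricKey = source.MetricKey
WHEN MATCHED THEN
    UPDATE SET target.Value = source.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (UserId, MetricKey, Value)
    VALUES (source.UserId, source.MetricKey, source.Value);
-- rows that no longer apply for a user can be handled with a
-- WHEN NOT MATCHED BY SOURCE ... THEN DELETE clause restricted to that
-- user's rows, instead of a blanket DELETE/INSERT.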
I won't say too much more until I know which version of SQL Server you are using.
This is called Bulk Insert. You have to drop all indexes in the destination table and send the insert commands in large packs (hundreds of INSERT statements) separated by ;.
Another way is to use the BULK INSERT statement http://msdn.microsoft.com/en-us/library/ms188365.aspx
but it involves dumping the data to a file.
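A rough example (the table name, file path and options are placeholders):

BULK INSERT dbo.MyTable
FROM 'C:\loads\mytable.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK,
    BATCHSIZE = 10000
);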
See also: Bulk Insert Sql Server millions of record
It really depends upon many things:
speed of your machine
size of the records being processed
network speed
etc.
Generally it is quicker to add records to a "heap" or an un-indexed table. So dropping all of your indexes and re-creating them after the load may improve your performance.
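A small sketch of that approach (index and table names are hypothetical); note that only nonclustered indexes should be disabled, since disabling the clustered index makes the table unreadable:

ALTER INDEX IX_Snapshot_UserId ON dbo.Snapshot DISABLE;
-- ... run the nightly load ...
ALTER INDEX IX_Snapshot_UserId ON dbo.Snapshot REBUILD;
-- ALTER INDEX ALL ON dbo.Snapshot REBUILD rebuilds every index in one statement.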
Partitioning the table may see performance benefits if you partition by active and inactive users (although the data set may be a little small for this)
Make sure you measure how much time each tweak adds to or removes from your load, and work from there.
I am moving a system from a VB/Access app to SQL Server. One common pattern in the Access database is to use tables to hold data while it is being calculated, and then use that data for a report.
eg.
delete from treporttable
insert into treporttable (.... this thing and that thing)
update treporttable set x = x * price where (...etc)
and then report runs from treporttable
I have heard that SQL Server does not like it when all records are deleted from a table, as it creates huge logs, etc. I tried temp SQL tables, but they don't persist long enough for the report, which runs in a different process, to read from them.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day and have a large number of records created in the report tables.
Can anyone tell me if there is a best practice for this, or whether my information about the logs is incorrect and this code will be fine in SQL Server?
If you do not need to log the deletion activity you can use the truncate table command.
From books online:
TRUNCATE TABLE is functionally identical to DELETE statement with no WHERE clause: both remove all rows in the table. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE.
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx
delete from sometable
Is going to allow you to roll back the change. So if your table is very large, this can use a lot of transaction log space and take a long time.
However, if you have no fear of failure then:
truncate table sometable
Will perform nearly instantly, with minimal transaction log usage. There is no rollback though.
To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
i
(0 row(s) affected)
i
1
(1 row(s) affected)
You could also DROP the table, and recreate it...if there are no relationships.
Both the [DROP TABLE] and [TRUNCATE TABLE] statements are transactionally safe in SQL Server and can be rolled back inside an explicit transaction (as the example above shows).
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!
The answer depends on the recovery model of your database. If you are in full recovery mode, then you have transaction logs that could become very large when you delete a lot of data. However, if you're backing up transaction logs on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any key seeds, because TRUNCATE will reseed the table.
EDIT: Note that even if the recovery model is set to Simple, your transaction log will still grow during a mass delete; it is just truncated afterwards (without releasing the space). The point is that DELETE is fully logged inside a transaction, even if only temporarily.
Consider using temporary tables. Their names start with # and they are deleted when nobody refers to them. Example:
create table #myreport (
  id int identity,   -- identity needs an integer type
  col1 int,          -- placeholder column; replace with your real columns
  ...
)
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is to use TRUNCATE TABLE instead of DELETE. Truncation is minimally logged (page deallocations rather than individual rows), so it will barely grow the log file.
I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? Adding a JOB_ID column (or something like that) would let you clear the relevant entries in this table without clobbering the data being used by another process; see the sketch below.
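A small sketch of the JOB_ID idea; the column and source names are hypothetical:

DECLARE @job_id int = 42;   -- e.g. taken from a job-control table

-- populate this run's rows under its own job id:
INSERT INTO dbo.treporttable (job_id, item, amount)
SELECT @job_id, s.item, s.amount * s.price
FROM dbo.SourceData AS s;

-- the report reads only its own job's rows:
SELECT * FROM dbo.treporttable WHERE job_id = @job_id;

-- cleanup touches only this job, leaving other processes' data alone:
DELETE FROM dbo.treporttable WHERE job_id = @job_id;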
Actually, tables such as treporttable do not need to be recovered to a point in time. As such, they can live in a separate database with the simple recovery model, which eases the logging burden.
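For instance (ReportScratch is a hypothetical database name):

ALTER DATABASE ReportScratch SET RECOVERY SIMPLE;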
There are a number of ways to handle this. First, you can move the creation of the data into the running of the report itself. I feel this is the best way to handle it; you can then use temp tables to temporarily stage your data, and no one will have concurrency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, it could take some time to do this, so you may need another short-term solution as well.
Second, you could move all your reporting tables to a different database that is set to the simple recovery model and truncate them before running the queries that populate them. This is closest to your current process, but it could be an issue if multiple users try to run the same report.
Third, you could set up a job to populate the tables (still in a separate database set to simple recovery) once a day (truncating them at that time). Then anyone running a report that day will see the same data and there will be no concurrency issues, though the data will not be up to the minute. You could also set up a reporting data warehouse, but that is probably overkill in your case.