My inserts in SQL Server are slow for few tables - sql-server

I have an application which inserts data into tables. I found that for 3 tables the insert of data is really slow. The schema, no. Of records etc are same in stage and my prod environment. But, the slowness is observed in prod.
I checked that index fragmentation is low and stats are up-to-date.
Schema and constraints are the same in stage and prod. But inserts are performing well in stage, but slow in prod.
It's a client's prod so I don't have access to the machine. What should I be checking now? I am planning to observe IO/ disk is good. Any ideas on what queries I should be given to the client to get to the root cause of the slowness?

Any ideas on what queries I shd be given to client to get to root cause of the slowness.
Typical cause of "slow inserts in prod" is that the log volume has higher latency and you're not batching inserts into transactions. Without a transaction, each insert statement requires a physical log flush.
The Session Wait Stats should allow you (or the DBA) to pinpoint the cause of the slowness. Waits to flush the log file are tracked as WRITELOG waits. This can also be caused by having a remote synchronous replica using Mirroring or Availibility Groups, which have their own wait types.

Related

Disable transactions on SQL Server

I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger to stores all changes on another database (on the same server) on one unique table 'tbSysMasterLog'. Yes the log of the application its stored on another database.
Problem is, before any Insert/update/delete command on the application database, a transaction its started, and therefore, the table of the log database is locked until the transaction is committed or rolled back. So anyone else who tries to write in any another table of the application will be locked.
So...is there any way possible to disable transactions on a particular database or on a particular table?
You cannot turn off the log. Everything gets logged. You can set to "Simple" which will limit the amount of data saved after the records are committed.
" the table of the log database is locked": why that?
Normally you log changes by inserting records. The insert of records should not lock the complete table, normally there should not be any contention in insertion.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should look at the indices defined on log, perhaps you can avoid some of them.
It sounds from the question that you have a create transaction at the start of your triggers, and that you are logging to the other database prior to the commit transaction.
Normally you do not need to have explicit transactions in SQL server.
If you do need explicit transactions. You could put the data to be logged into variables. Commit the transaction and then insert it into your log table.
Normally inserts are fast and can happen in parallel with out locking. There are certain things like identity columns that require order, but this is very lightweight structure they can be avoided by generating guids so inserts are non blocking, but for something like your log table a primary key identity column would give you a clear sequence that is probably helpful in working out the order.
Obviously if you log after the transaction, this may not be in the same order as the transactions occurred due to the different times that transactions take to commit.
We normally log into individual tables with a similar name to the master table e.g. FooHistory or AuditFoo
There are other options a very lightweight method is to use a trace, this is what is used for performance tuning and will give you a copy of every statement run on the database (including triggers), and you can log this to a different database server. It is a good idea to log to different server if you are doing a trace on a heavily used servers since the volume of data is massive if you are doing a trace across say 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table, ( better performance), and script up starting stopping and loading traces.
The load on the server that is getting the trace log is minimal and I have never had a locking problem on the server receiving the trace, so I am pretty sure that you are doing something to cause the locks.

Difference between Stream Replication and logical replication

Could anybody tell me more about difference between physical replication and logical replication in PostgreSQL?
TL;DR: Logical replication sends row-by-row changes, physical replication sends disk block changes. Logical replication is better for some tasks, physical replication for others.
Note that in PostgreSQL 12 (current at time of update) logical replication is stable and reliable, but quite limited. Use physical replication if you are asking this question.
Streaming replication can be logical replication. It's all a bit complicated.
WAL-shipping vs streaming
There are two main ways to send data from master to replica in PostgreSQL:
WAL-shipping or continuous archiving, where individual write-ahead-log files are copied from pg_xlog by the archive_command running on the master to some other location. A restore_command configured in the replica's recovery.conf runs on the replica to fetch the archives so the replica can replay the WAL.
This is what's used for point-in-time replication (PITR), which is used as a method of continuous backup.
No direct network connection is required to the master server. Replication can have long delays, especially without an archive_timeout set. WAL shipping cannot be used for synchronous replication.
streaming replication, where each change is sent to one or more replica servers directly over a TCP/IP connection as it happens. The replicas must have a direct network connection the master configured in their recovery.conf's primary_conninfo option.
Streaming replication has little or no delay so long as the replica is fast enough to keep up. It can be used for synchronous replication. You cannot use streaming replication for PITR1 so it's not much use for continuous backup. If you drop a table on the master, oops, it's dropped on the replicas too.
Thus, the two methods have different purposes. However, both of them transport physical WAL archives from primary to replica; they differ only in the timing, and whether the WAL segments get archived somewhere else along the way.
You can and usually should combine the two methods, using streaming replication usually, but with archive_command enabled. Then on the replica, set a restore_command to allow the replica to fall back to restore from WAL archives if there are direct connectivity issues between primary and replica.
Asynchronous vs synchronous streaming
On top of that, there's synchronous and asynchronous streaming replication:
In asynchronous streaming replication the replica(s) are allowed to fall behind the master in time when the master is faster/busier. If the master crashes you might lose data that wasn't replicated yet.
If the asynchronous replica falls too far behind the master, the master might throw away information the replica needs if max_wal_size (was previously called wal_keep_segments) is too low and no slot is used, meaning you have to re-create the replica from scratch. Or the master's pg_wal(waspg_xlog) might fill up and stop the master from working until disk space is freed if max_wal_size is too high or a slot is used.
In synchronous replication the master doesn't finish committing until a replica has confirmed it received the transaction2. You never lose data if the master crashes and you have to fail over to a replica. The master will never throw away data the replica needs or fill up its xlog and run out of disk space because of replica delays. In exchange it can cause the master to slow down or even stop working if replicas have problems, and it always has some performance impact on the master due to network latency.
When there are multiple replicas, only one is synchronous at a time. See synchronous_standby_names.
You can't have synchronous log shipping.
You can actually combine log shipping and asynchronous replication to protect against having to recreate a replica if it falls too far behind, without risking affecting the master. This is an ideal configuration for many deployments, combined with monitoring how far the replica is behind the master to ensure it's within acceptable disaster recovery limits.
Logical vs physical
On top of that we have logical vs physical streaming replication, as introduced in PostgreSQL 9.4:
In physical streaming replication changes are sent at nearly disk block level, like "at offset 14 of disk page 18 of relation 12311, wrote tuple with hex value 0x2342beef1222....".
Physical replication sends everything: the contents of every database in the PostgreSQL install, all tables in every database. It sends index entries, it sends the whole new table data when you VACUUM FULL, it sends data for transactions that rolled back, etc. So it generates a lot of "noise" and sends a lot of excess data. It also requires the replica to be completely identical, so you cannot do anything that'd require a transaction, like creating temp or unlogged tables. Querying the replica delays replication, so long queries on the replica need to be cancelled.
In exchange, it's simple and efficient to apply the changes on the replica, and the replica is reliably exactly the same as the master. DDL is replicated transparently, just like everything else, so it requires no special handling. It can also stream big transactions as they happen, so there is little delay between commit on the master and commit on the replica even for big changes.
Physical replication is mature, well tested, and widely adopted.
logical streaming replication, new in 9.4, sends changes at a higher level, and much more selectively.
It replicates only one database at a time. It sends only row changes and only for committed transactions, and it doesn't have to send vacuum data, index changes, etc. It can selectively send data only for some tables within a database. This makes logical replication much more bandwidth-efficient.
Operating at a higher level also means that you can do transactions on the replica databases. You can create temporary and unlogged tables. Even normal tables, if you want. You can use foreign data wrappers, views, create functions, whatever you like. There's no need to cancel queries if they run too long either.
Logical replication can also be used to build multi-master replication in PostgreSQL, which is not possible using physical replication.
In exchange, though, it can't (currently) stream big transactions as they happen. It has to wait until they commit. So there can be a long delay between a big transaction committing on the master and being applied to the replica.
It replays transactions strictly in commit order, so small fast transactions can get stuck behind a big transaction and be delayed quite a while.
DDL isn't handled automatically. You have to keep the table definitions in sync between master and replica yourself, or the application using logical replication has to have its own facilities to do this. It can be complicated to get this right.
The apply process its self is more complicated than "write some bytes where I'm told to" as well. It also takes more resources on the replica than physical replication does.
Current logical replication implementations are not mature or widely adopted, or particularly easy to use.
Too many options, tell me what to do
Phew. Complicated, huh? And I haven't even got into the details of delayed replication, slots, max_wal_size, timelines, how promotion works, Postgres-XL, BDR and multimaster, etc.
So what should you do?
There's no single right answer. Otherwise PostgreSQL would only support that one way. But there are a few common use cases:
For backup and disaster recovery use pgbarman to make base backups and retain WAL for you, providing easy to manage continuous backup. You should still take periodic pg_dump backups as extra insurance.
For high availability with zero data loss risk use streaming synchronous replication.
For high availability with low data loss risk and better performance you should use asynchronous streaming replication. Either have WAL archiving enabled for fallback or use a replication slot. Monitor how far the replica is behind the master using external tools like Icinga.
References
continuous archiving and PITR
high availability, load balancing and replication
replication settings
recovery.conf
pgbarman
repmgr
wiki: replication, clustering and connection pooling

What is the proper way to run a long query against an active database?

We are using SQL Server 2012 EE but currently do not have the option to run queries on a R/O mirror though that is my long term goal, though am concerned I may run into the below issue in that scenario as well since the mirror would also be updating data I am querying.
I have a view that joins across several tables from two databases and is used for invoicing from existing data. Three of these tables are also actively updated by ongoing transactions. Running a report that used this view did not used to be a problem but now our database is getting much larger and I have run into some timeout problems. First the query was timing out so I set command timeout to 0 and reran the query which pegged all 4 CPUs 100% for 90 minutes and then I killed it. There were no problems with active transactions during that time. I reviewed the query and found a field I was joining on that was not indexed so created an index on that field, reran the report, which then finished in three minutes and all the CPUs were busy but not at all pegged out. Same data amount queried both times. I figured problem solved. Of course later, my boss ran a similar query, perhaps with some more data but probably not a lot more, and our live transactions started timing out 100% while his query was running. I did not get a chance to see the CPU usage during that time.
So my questions are two:
Given I have to use the live and active database, what is the proper way to run a long R/O query so that active transactions can still continue? I am considering NO LOCK but am hoping there is a better standard practice.
And what might cause sqlserver to peg out 4 CPUs with 100% busy and not cause live transaction timeouts, yet when my boss ran his query, after I added the index and my query ran much better, the live update transactions start timing out 100%?
I know this is not a lot of info to go on. I'm not very familiar with sql profiling and performance monitoring yet this behavior seems rather odd and am hoping a best practice would be the correct workaround.
The default behavior of SELECT queries in the READ_COMMITTED transaction isolation level is to acquire shared locks during query execution to provide the requested data consistency (read committed data only). These locks are typically row-level and released quickly during query execution immediately after each row is read. There are also less granular intent locks at the page and table level prevent concurrent updates to data as it is being read. Depending on the particulars of the execution plan, there may even be shared locks held at the table level for the duration of the query, which will prevent updates to the table during query execution and result in readers blocking writers.
Setting the READ_COMMITTED_SNAPSHOT database option causes SQL Server to use row versioning instead of locking to provide the same read consistency. A row version store is maintained in tempdb so that when a row requested by the query has changed since the query began, the most recent committed row version is returned instead. This row-versioning behavior avoids locking and effectively provides a statement-level snapshot of the database at the time the query began. Readers do not block writers and writers do not block readers. Do not confuse the READ_COMMITTED_SNAPSHOT database option with the SNAPSHOT isolation level (a common mistake).
The downside of setting the READ_COMMITTED_SNAPSHOT is additional resource usage. An additional 14 bytes of storage overhead for each row is incurred once the database option is enabled. Updates and deletes will generate row versions in tempdb. These versions require tempdb space for the duration of the longest running query and there is overhead in maintained the version store. Also consider whether you have existing applications that depend on readers-block-writers locking behavior. Despite this overhead, the concurrency benefits may yield better overall performance depending on your workload, while providing read integrity. See http://technet.microsoft.com/en-us/library/ms188277.aspx for more information.
Actually I decided to create a snapshot at the beginning of each month for reporting to run against. Then delete when no longer needed for reporting. This seems to work fine. I could do something similar with a database restore but slightly more work. This allows not needing a second SQL EE license, and lets me run reports w/o locking tables for live transactions.

Can a large transaction log cause cpu hikes to occur

I have a client with a very large database on Sql Server 2005. The total space allocated to the db is 15Gb with roughly 5Gb to the db and 10 Gb to the transaction log. Just recently a web application that is connecting to that db is timing out.
I have traced the actions on the web page and examined the queries that execute whilst these web operation are performed. There is nothing untoward in the execution plan.
The query itself used multiple joins but completes very quickly. However, the db server's CPU hikes to 100% for a few seconds. The issue occurs when several simultaneous users are working on the system (when I say multiple .. read about 5). Under this timeouts start to occur.
I suppose my question is, can a large transaction log cause issues with CPU performance? There is about 12Gb of free space on the disk currently. The configuration is a little out of my hands but the db and log are both on the same physical disk.
I appreciate that the log file is massive and needs attending to, but I'm just looking for a heads up as to whether this may cause CPU spikes (ie trying to find the correlation). The timeouts are a recent thing and this app has been responsive for a few years (ie its a recent manifestation).
Many Thanks,
It's hard to say exactly given the lack of data, but the spikes are commonly observed on transaction log checkpoint.
A checkpoint is a procedure of applying data sequentially appended and stored in the transaction log to the actual datafiles.
This involves lots of I/O, including CPU operations, and may be a reason of the CPU activity spikes.
Normally, a checkpoint occurs when a transaction log is 70% full or when SQL Server decides that a recovery procedure (reapplying the log) would take longer than 1 minute.
Your first priority should be to address the transaction log size. Is the DB being backed up correctly, and how frequently. Address theses issues and then see if the CPU spikes go away.
CHECKPOINT is the process of reading your transaction log and applying the changes to the DB file, if the transaction log is HUGE then it makes sense it could affect it?
You could try extending the autogrowth: Kimberley Tripp suggests upwards of 500MB autogrowth for transaction logs measured in GBs:
http://www.sqlskills.com/blogs/kimberly/post/8-Steps-to-better-Transaction-Log-throughput.aspx
(see point 7)
While I wouldn't be surprised if having a log that size wasn't causing a problem, there are other things it could be as well. Have the statistics been updated lately? Are the spikes happening when some automated job is running, is there a clear time pattern to when you have the spikes - then look at what else is running? Did you load a new version of anything on the server about the time the spikes started happeining?
In any event, the transaction log needs to be fixed. The reason it is so large is that it is not being backed up (or not backed up frequently enough). It is not enough to back up the database, you must also back up the log. We back ours up every 15 minutes but ours is a highly transactional system and we cannot afford to lose data.

SQL Server transactional replication for very large tables

I have set up transactional replication between two SQL Servers on different ends of a relatively slow VPN connection. The setup is your standard "load snapshot immediately" kind of thing where the first thing it does after initializing the subscription is to drop and recreate all tables on the subscriber side and then start doing a BCP of all the data. The problem is that there are a few tables with several million rows in them, and the process either a) takes a REALLY long time or b) just flat out fails. The messages I keep getting when I look in Replication Monitor are:
The process is running and is waiting for a response from the server.
Query timeout expired
Initializing
It then tries to restart the bulk loading process (skipping any BCP files that it has already loaded).
I am currently stuck where it just keeps doing this over and over again. It's been running for a couple days now.
My questions are:
Is there something I could do to improve this situation given that the network connection is so slow? Maybe some setting or something? I don't mind waiting a long time as long as the process doesn't keep timing out.
Is there a better way to do this? Perhaps make a backup, zip it, copy it over and then restore? If so, how would the replication process know where to pick up when it starts applying the transactions, since updates will be occurring between the time I make the backup and get it restored and running on the other side.
Yes.
You can apply the initial snapshot manually.
It's been a while for me, but the link (into BOL) has alternatives to setting up the subscriber.
Edit: From BOL How-tos, Initialize a Transactional Subscriber from a Backup
In SQL 2005, you have a "compact snapshot" option, that allow you to reduce the total size of the snapshot. When applied over a network, snapshot items "travel" compacted to the suscriber, where they are then expanded.
I think you can easily figure the potential speed gain by comparing sizes of standard and compacted snapshots.
By the way, there is a (quite) similar question here for merge replication, but I think that at the snapshot level there is no difference.

Resources