Will running a keys-only query in a transaction slow down reads of that entity elsewhere in the application?

In my database model I have an Entity that is read very frequently but written very infrequently. The write only ever happens inside a transaction.
I have an endpoint that runs a keys-only query to check if that Entity exists inside of a transaction. This endpoint is frequently called.
My question is: will non-transactional reads of my Entity elsewhere in my application be slowed down because of the endpoint above? I read that frequently running a transaction that reads/writes a certain entity may make other reads of that entity slower. In my case, though, I am only doing a keys-only query and checking if the key exists to determine whether the entity exists.
Summary:
Entity is read very frequently (not in transaction)
Entity is written very infrequently (in a transaction)
Entity is frequently queried for by a keys-only query but not written to (in a transaction)

If your keys-only query is inside a read-only transaction (https://cloud.google.com/datastore/docs/concepts/transactions#read-only_transactions), then your check will not slow down your non-transactional reads.
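For illustration, here is a minimal sketch of that pattern using the Python google-cloud-datastore client. The kind name "Config", the key name "singleton", and the choice of this client library are assumptions for the example, not something from the question; newer versions of the client accept read_only=True.

    from google.cloud import datastore

    client = datastore.Client()
    key = client.key("Config", "singleton")   # hypothetical kind/key

    # Keys-only existence check inside a read-only transaction; a read-only
    # transaction takes no write locks, so it cannot block or slow other readers.
    with client.transaction(read_only=True):
        query = client.query(kind="Config")
        query.keys_only()
        query.key_filter(key, "=")
        exists = next(iter(query.fetch(limit=1)), None) is not None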

Related

Disable transactions on SQL Server

I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger that stores all changes in another database (on the same server), in one single table, 'tbSysMasterLog'. Yes, the application's log is stored in another database.
The problem is that before any insert/update/delete command on the application database a transaction is started, and therefore the table in the log database is locked until the transaction is committed or rolled back. So anyone else who tries to write to any other table of the application will be blocked.
So...is there any way possible to disable transactions on a particular database or on a particular table?
You cannot turn off the transaction log. Everything gets logged. You can set the recovery model to "Simple", which will limit the amount of log data kept after the records are committed.
" the table of the log database is locked": why that?
Normally you log changes by inserting records. Inserting records should not lock the complete table; normally there should not be any contention on inserts.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should also look at the indices defined on the log table; perhaps you can avoid some of them.
It sounds from the question as though you begin a transaction at the start of your triggers and are logging to the other database before the transaction is committed.
Normally you do not need explicit transactions in SQL Server.
If you do need explicit transactions, you could put the data to be logged into variables, commit the transaction, and then insert it into your log table.
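The same idea can also be sketched at the application level rather than in the trigger. This is only a rough illustration, assuming pyodbc and hypothetical DSNs and table/column names (only tbSysMasterLog comes from the question): do the business work and commit it first, then write the audit row in its own short transaction.

    import pyodbc  # assumed ODBC driver; any DB-API connection works similarly

    app = pyodbc.connect("DSN=AppDb")   # hypothetical application database
    log = pyodbc.connect("DSN=LogDb")   # hypothetical log database

    # 1. Do the business work in its own transaction.
    cur = app.cursor()
    cur.execute("UPDATE dbo.Orders SET Status = ? WHERE OrderId = ?", ("Shipped", 42))
    change = ("Orders", 42, "Status -> Shipped")   # capture what to log in variables
    app.commit()                                   # business transaction ends here

    # 2. Only then write the audit row, in a separate short transaction, so the
    #    shared log table is never held by a long-running business transaction.
    log.cursor().execute(
        "INSERT INTO dbo.tbSysMasterLog (TableName, RowId, ChangeDescription) "
        "VALUES (?, ?, ?)",
        change,
    )
    log.commit()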
Normally inserts are fast and can happen in parallel without locking. There are certain things, like identity columns, that require ordering, but they are a very lightweight structure; they can be avoided by generating GUIDs so that inserts are non-blocking. For something like your log table, though, an identity primary key would give you a clear sequence that is probably helpful in working out the order of events.
Obviously, if you log after the transaction, the log entries may not be in the same order in which the transactions occurred, due to the different times transactions take to commit.
We normally log into individual tables with a name similar to the master table's, e.g. FooHistory or AuditFoo.
There are other options. A very lightweight method is to use a trace; this is what is used for performance tuning, and it will give you a copy of every statement run on the database (including those from triggers), and you can log it to a different database server. It is a good idea to log to a different server if you are tracing a heavily used server, since the volume of data is massive if you trace, say, 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table (better performance), and script up the starting, stopping, and loading of traces.
The load on the server that is getting the trace log is minimal and I have never had a locking problem on the server receiving the trace, so I am pretty sure that you are doing something to cause the locks.

Payment Transactions vs Database Transactions

So in payments processing you have the concept of a payment transaction. As payments or messages come in from various internal and external interfaces, payment transactions get created, probably in some main base transaction table in some relational database (at least for the sake of this question). Then the transactions have some state that changes through a series of edits until one of several final states (paid or unpaid, approved or declined, etc.) is reached.
When dealing with databases you of course have the database transaction, and my question really is: are there any rules of thumb about processing payment transactions within database transactions? The payment transaction is often an aggregate root for many other tables holding information on the customer, cardholder, merchant, or velocity settings that participate in that transaction.
I could see a rule saying, "never process more than one payment transaction in a database transaction". But I could also see a database transaction being correct when performing batch type operations, when you must consider the whole batch of transactions successful or failed, so you have the option of rollback.
The normal design pattern is to stick to the following rule: Use database transactions to transition the database from one valid state to another.
In the context of payment transactions that probably means that adding a transaction is one db transaction. Then, each processing step on the transaction (like validation, fulfillment, ...) would be another db transaction.
I could see a rule saying, "never process more than one payment transaction in a database transaction"
You can put multiple logical transactions into one physical transaction for performance reasons or for architectural reasons. That's not necessarily a problem. You need to make sure that work is not lost, though, because a failure will abort the entire batch.
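As a rough sketch of the "one valid state transition per database transaction" rule above, assuming a DB-API style connection and hypothetical table/column names (payments, id, amount, state), each processing step on a payment gets its own short database transaction:

    # Parameter placeholders (?) follow the qmark paramstyle; adjust to your driver.

    def create_payment(conn, payment_id, amount):
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO payments (id, amount, state) VALUES (?, ?, ?)",
            (payment_id, amount, "RECEIVED"),
        )
        conn.commit()   # db transaction #1: the payment now exists in a valid state

    def validate_payment(conn, payment_id):
        cur = conn.cursor()
        cur.execute(
            "UPDATE payments SET state = ? WHERE id = ? AND state = ?",
            ("VALIDATED", payment_id, "RECEIVED"),
        )
        conn.commit()   # db transaction #2: one valid state to another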

Is there a delay before other transactions see a commit if using asynchronous commit in Postgresql?

I'm looking into optimizing throughput in a Java application that frequently (100+ transactions/second) updates data in a Postgresql database. Since I don't mind losing a few transactions if the database crashes, I think that using asynchronous commit could be a good fit.
My only concern is that I don't want a delay after commit until other transactions/queries see my commit. I am using the default isolation level "Read committed".
So my question is: Does using asynchronous commit in Postgresql in any way mean that there will be delays before other transactions see my committed data or before they proceed if they have been waiting for my transaction to complete? (As mentioned before, I don't care if there is a delay before my data is persisted to disk.)
It would seem that this is the behavior you're looking for.
http://www.postgresql.org/docs/current/static/wal-async-commit.html
Selecting asynchronous commit mode means that the server returns success as soon as the transaction is logically completed, before the WAL records it generated have actually made their way to disk. This can provide a significant boost in throughput for small transactions.
The WAL is used to provide on-disk data integrity and has nothing to do with the state of the tables on a running server; it only matters if the server crashes. Since the documentation specifically says "as soon as the transaction is logically completed", it is indicating that asynchronous commit does not affect when other transactions see your committed data.
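A minimal sketch of what that looks like in practice, using psycopg2 (the connection string and the counters table are made up for illustration; the SET statements themselves are plain SQL and work the same from JDBC):

    import psycopg2  # assumed driver

    conn = psycopg2.connect("dbname=app")   # hypothetical connection string
    conn.autocommit = True                  # manage transactions explicitly below

    with conn.cursor() as cur:
        # Per transaction: only this commit is asynchronous. Other sessions see
        # the committed row immediately; only durability on disk is delayed.
        cur.execute("BEGIN")
        cur.execute("SET LOCAL synchronous_commit TO OFF")
        cur.execute("UPDATE counters SET value = value + 1 WHERE name = %s", ("hits",))
        cur.execute("COMMIT")

        # Or per session: every commit on this connection is asynchronous.
        cur.execute("SET synchronous_commit TO OFF")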

Advice for minimizing locking on an append only table in MS SQL Server?

I'm writing some logging/auditing code that will be running in production (not just when errors are thrown or while developing). After reading Coding Horror's experiences with deadlocking and logging, I decided I should seek advice. (Jeff's solution of "not logging" won't work for me; this is legally mandated security auditing.)
Is there a suitable isolation level for minimizing contention and deadlocking? Are there any query hints I can add to the insert statement or the stored procedure?
I care deeply about the transactional integrity of everything except the audit table. The idea is that so much will be logged that if a few entries fail, it's not a problem. If the logging holds up some other transaction, that would be bad.
I can log to a database or a file, although logging to a file is less attractive because I need to be able to display the results somehow. Logging to a file would (almost) guarantee the logging wouldn't interfere with other code though.
A normal (i.e. READ COMMITTED) insert already takes 'minimal' locks. Insert-intensive applications will not deadlock on the insert, no matter how the inserts are mixed with other operations. At worst, an intensive insert system may cause page latch contention on the hot spot where the inserts occur, but not deadlocks.
To cause deadlocks as described by Jeff there has to be more at play, like any one of the following:
The system is using a higher isolation level (they had it coming then and well deserve it)
They were reading from the log table during the transaction (so it is no longer 'append-only')
The deadlock chain involved application-layer locks (i.e. .NET lock statements in the log4net framework), resulting in undetectable deadlocks (i.e. application hangs). Given that solving the problem involved looking at process dumps, I guess this is the scenario they were hitting.
So as long as you do insert-only logging in READ COMMITTED isolation level transactions, you are safe. If you expect the same problem I suspect SO had (i.e. deadlocks involving application-layer locks), then no amount of database wizardry can save you, as the problem can still manifest itself even if you log in a separate transaction or on a separate connection.
If you don't care about consistency on your logging table, why not perform all the logging from a separate thread?
I probably would not wait for transactions to complete before logging, since the log can be pivotal in diagnosing long running transactions. Also, this enables you to see all the work a transaction that rolled back did.
Grab the stack trace and all of your logging data in the logging thread, chuck it on a queue when there are new logging messages, and flush them to the db in a single transaction (see the sketch after the list below).
Steps to minimize locking:
(KEY) perform all appends to the logging table outside of the main thread/connection/transaction.
Ensure your logging table has a monotonically increasing clustered index (e.g. int identity) that increases each time you append a log message. This ensures the pages being inserted into are usually in memory and avoids the performance hits you get with heap tables.
Perform multiple appends to the log in a transaction (10 inserts in one transaction are faster than 10 inserts outside a transaction and usually acquire/release fewer locks).
Give it a break. Only perform logging to your db every N milliseconds. Batch up bits of work.
If you need to report on stuff historically, you can consider partitioning your logging table. Example: You could create a new logging table every month, and at the same time have a log VIEW that is a UNION ALL of all the older logging tables. Perform the reporting against the most appropriate source.
You will get better performance by flushing multiple logging messages in a single (smallish) transaction, and have the advantage that if 10 threads are doing work and logging stuff, only a single thread is flushing stuff to the logging table. This pipelining actually makes stuff scale better.
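A bare-bones sketch of that background logging thread, assuming pyodbc and a hypothetical AuditDb DSN and dbo.AuditLog table (none of which come from the question):

    import queue
    import threading
    import pyodbc  # assumed ODBC driver

    log_queue: "queue.Queue[tuple]" = queue.Queue()

    def audit(event_type: str, detail: str) -> None:
        """Called from worker threads; never touches the database directly."""
        log_queue.put((event_type, detail))

    def flush_loop() -> None:
        conn = pyodbc.connect("DSN=AuditDb")
        while True:
            # Block for the first message, then drain whatever else is queued.
            batch = [log_queue.get()]
            while len(batch) < 100:
                try:
                    batch.append(log_queue.get_nowait())
                except queue.Empty:
                    break
            cur = conn.cursor()
            cur.executemany(
                "INSERT INTO dbo.AuditLog (EventType, Detail) VALUES (?, ?)", batch
            )
            conn.commit()   # one smallish transaction per batch of messages

    threading.Thread(target=flush_loop, daemon=True).start()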
Since you don't care about the transactional integrity of the audit table, you can obviously perform logging outside of the transaction (i.e. after it completes). That will minimise impact on the transaction.
Also, if you want to minimize locking, you should try to ensure that as much of your query workload as possible is covered by non-clustered indexes. (In SQL Server 2005 and above, the INCLUDE clause in non-clustered indexes can make a big difference.)
One easy way to prevent your logging from having locking issues with your 'regular' database is to not use the same database. Just create another database for your logging. As a bonus, the rapid growth of your logging database won't result in fragmentation in your main DB. Personally, I usually prefer to log to a file -- but then again, I'm used to doing heavy text manipulation in my editor, VIM. Logging to a separate DB should help avoid deadlocking issues.
Just make sure that if you try writing your own database appender for the logging framework you use, you are very careful about your locks (which I'm guessing is what tripped up Jeff in the blog post you referenced). Properly written (see several of the comments in Jeff's post), you shouldn't have locking issues with your logging framework unless it does something odd.

SQL Server 2005/8 Replication Transaction ID

I have a scenario where I'm using transactional replication to replicate multiple SQL Server 2005 databases (same instance) into a single remote database (different instance on a separate physical machine).
I am then performing some processing on the replicated data for reporting purposes. I'm using table-level triggers to identify changes, which kick off my post-processing code.
Up to this point everything is fine.
However, what I'd like to know is: where rows in certain tables are created, updated, or deleted in the same transaction, is it possible to identify some sort of transaction ID from replication (or anywhere else), so that I don't perform the same post-processing multiple times for a single transaction?
Basic example: I have a TUser table and a TAddress table. If I were to create a row in both in a single transaction, they would be replicated across in a single transaction too. However, two triggers would fire in the replicated database, which at present causes my post-processing code to run twice. What I'd really like to identify is that these two changes arrived in the replicated database in the same transaction.
Is this possible in any way? Does an identifier as I've describe exist and is it accessible?
The short answer is no, there is nothing of the sort that you can rely on. The long answer, in summary, is that yes, such a thing exists, but using it for anything is not recommended.
Given that replication is transactionally consistent, one approach you could consider would be pushing an identifier for the primary record (in this case TUser, since a TAddress is related to a TUser) onto a queue (ideally using something like Service Broker, or potentially a user-defined queue table) and then performing the post-processing by popping data off the queue and processing it separately.
Another possibility would be simply batch processing every 'x' amount of time, by polling for new/updated records from the primary tables and post-processing in that manner. You'd need to track IDs, rowversions, or timestamps of some sort that you've processed for each primary table as metadata, and pull anything that hasn't yet been processed during each batch run.
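A bare-bones sketch of that batch-polling idea, assuming pyodbc and entirely hypothetical table/column names (dbo.PostProcessState as the high-water-mark table, ChangeVersion as an increasing version column on TUser):

    import time
    import pyodbc  # assumed ODBC driver

    conn = pyodbc.connect("DSN=ReportDb")   # hypothetical reporting database

    def post_process(user_id: int) -> None:
        ...  # your existing post-processing, keyed by the primary record

    def process_batch() -> None:
        cur = conn.cursor()
        # High-water mark of what has already been post-processed.
        cur.execute(
            "SELECT LastVersion FROM dbo.PostProcessState WHERE TableName = ?", ("TUser",)
        )
        last = cur.fetchone()[0]

        # Pull every user changed since the last run; related TAddress rows are
        # handled via their parent user, so each change set is processed once.
        cur.execute(
            "SELECT UserId, ChangeVersion FROM dbo.TUser "
            "WHERE ChangeVersion > ? ORDER BY ChangeVersion",
            (last,),
        )
        for user_id, version in cur.fetchall():
            post_process(user_id)
            last = max(last, version)

        cur.execute(
            "UPDATE dbo.PostProcessState SET LastVersion = ? WHERE TableName = ?",
            (last, "TUser"),
        )
        conn.commit()

    while True:          # poll every 'x' amount of time
        process_batch()
        time.sleep(30)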
Just a few thoughts, hope that helps.
