How do databases such as MySQL, PostgreSQL handle transactions when there are a lot of transactions come to? How do they identity transaction's order?
Databases process transactions as they come in concurrently. The only order is whatever random order the requests happen to arrive in on the server.
The database server can always prioritize requests however it wants. But its goal is maximizing throughout while respecting isolation levels, it doesn’t care about prioritizing one kind of application business logic over another.
For the database clients to impose an order they would have to control the timing in which they submit the transactions to the database. If you want A to precede B, have B check if the results of A are present, and if not then rollback (and provide a way for B to retry). Or put A and B together in a stored procedure. For batch jobs, some scheduling of when the different jobs get executed is normal, where it’s specified some batch job will occur at a given time of day.
Related
I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger to stores all changes on another database (on the same server) on one unique table 'tbSysMasterLog'. Yes the log of the application its stored on another database.
Problem is, before any Insert/update/delete command on the application database, a transaction its started, and therefore, the table of the log database is locked until the transaction is committed or rolled back. So anyone else who tries to write in any another table of the application will be locked.
So...is there any way possible to disable transactions on a particular database or on a particular table?
You cannot turn off the log. Everything gets logged. You can set to "Simple" which will limit the amount of data saved after the records are committed.
" the table of the log database is locked": why that?
Normally you log changes by inserting records. The insert of records should not lock the complete table, normally there should not be any contention in insertion.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should look at the indices defined on log, perhaps you can avoid some of them.
It sounds from the question that you have a create transaction at the start of your triggers, and that you are logging to the other database prior to the commit transaction.
Normally you do not need to have explicit transactions in SQL server.
If you do need explicit transactions. You could put the data to be logged into variables. Commit the transaction and then insert it into your log table.
Normally inserts are fast and can happen in parallel with out locking. There are certain things like identity columns that require order, but this is very lightweight structure they can be avoided by generating guids so inserts are non blocking, but for something like your log table a primary key identity column would give you a clear sequence that is probably helpful in working out the order.
Obviously if you log after the transaction, this may not be in the same order as the transactions occurred due to the different times that transactions take to commit.
We normally log into individual tables with a similar name to the master table e.g. FooHistory or AuditFoo
There are other options a very lightweight method is to use a trace, this is what is used for performance tuning and will give you a copy of every statement run on the database (including triggers), and you can log this to a different database server. It is a good idea to log to different server if you are doing a trace on a heavily used servers since the volume of data is massive if you are doing a trace across say 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table, ( better performance), and script up starting stopping and loading traces.
The load on the server that is getting the trace log is minimal and I have never had a locking problem on the server receiving the trace, so I am pretty sure that you are doing something to cause the locks.
I am creating a database medical system and then I came to a point where I am trying to create a notification feature and i will use SQL jobs in it, where the SQL job responsibility is to check some tables and the entities that will find it need to be notified for a change in certain data will put their ids in an entity called Notification and a trigger will be called for the app to check that table and send the notificiation.
what I want to ask is how many SQL jobs can a sql server handle ?
Does the number of running SQL jobs in background affect the performance of my application or the database performance in a way or another ?
NOTE: the SQL job will run every 10 seconds
I couldn't find any useful information online.
thanks in advance.
This question really doesn't have enough background to get a definitive answer. What are the considerations?
Do the queries in your ten-second job actually complete in ten seconds, even when your DBMS is under its peak transactional workload? Obviously, if the job routinely doesn't complete in ten seconds, you'll get jobs piling up.
Do the queries in your job lock up tables and/or indexes so the transactional load can't run efficiently? (You should use SET ISOLATION LEVEL READ UNCOMMITTED; as much as you can so database reads won't lock things unnecessarily.)
Do the queries in your job do a lot of rows' worth of inserts and updates, and so swamp the SQL Server transaction logs?
How big is your server? (CPU cores? RAM? IO capacity?) How big is your database?
If your project succeeds and you get many users, will your answers to the above questions remain the same? (Hint: no.)
You should spend some time on the execution plans for the queries in your job, and try to make them as efficient as possible. Add the necessary indexes. If necessary refactor the queries to make them more efficient. SSMS will show you the execution plans and suggest appropriate indexes.
If your job is doing things like deleting expired rows, you may want to build the expiration in your data model. For example, suppose your job does
DELETE FROM readings WHERE expiration_date >= GETDATE()
and your application does this, relying on your job to avoid getting expired readings.
SELECT something FROM readings
You can refactor your application query to say
SELECT something FROM readings WHERE expiration_date < GETDATE()
and then run your job overnight, at a quiet time, rather than every ten seconds.
A ten-second job is not the greatest idea in the world. If you can rework your application so it will function correctly with a ten-second, ten-minute, or twelve-hour job, you'll have a more resilient production system. At any rate if something goes wrong with the job when your system is very busy you'll have more than ten seconds to fix it.
Here is the situation:
There is a transaction intensive database - used for both routine transactions and reports.
I was wondering if I could isolate these two operations and 2 independent databases, so reports could run off of one database and all the transactions could occur in another one. This would improve performance for the OLTP SQL database.
I have gone over a few options like, Mirroring, Log shipping, Replication, Snapshots, Clustering - but would like to discuss the best possible strategy for the desired result.
Please advise the best solution to implement this strategy, or any other thoughts/suggestion you may have.
I am thinking this is a classic textbook case of separation of frontend and backend database.
For the projects and people I have worked with, there was a strong agreement that the two should be separated.
In one case, there were three tiers of databases:
Frontend transactions Middle summary
repository for reference by frontend transactions
Backend information repository
The frontend transaction speed was so critical, even that layer was dissected into multiple databases, one database per manufacturing area. The transactions were performed by equipment requiring very fast response.
Data from the frontend databases were used, together with customer and management -oriented databases to construct records for the backend reporting repository at an hourly frequency, because management needed short information latency for their operational and engineering decisions. If we could perform the information-compilation at 15 minute intervals, we would have done it. Depending on project, that backend repository could either be Oracle or Sybase IQ.
However, the frontend transactions performed by equipment needed to refer to some meta information. Response time required by the equipment could not run the risk of being interrupted by someone running a huge adhoc query on the backend repository, which was frequent.
So, a middle layer bridging database was created, which consists of nightly abstracts of information from the backend repository.
Schema designed with commonality-keys
Schema design is very important, to optimise the response and performance of all the databases. You have to ensure your database records are commonality-key-indexed and discrete-time-indexed.
For a manufacturing plant filled with robots and equipment, divided into manufacturing areas, each area has a frontend transaction database. Each area database needs to have a commonality-key dispatcher. When
a piece of equipment needed to perform a batch of operations, the beginOp event requests for a discrete-key from the dispatcher. An operation cycle may take seconds, or days, or weeks. Every time a piece of equipment needed to perform a transaction on its state of operation, it includes that commonality-key. An operation could have sub-operations and sub-sub-operations, etc - but each of such operation is required to obtain a commonality-key from its area dispatcher.
The commonality-key dispatcher is simply the beginOp table in the database with an auto-increment key. Any equipment sharing a same begun operation, it is able to infer/obtain that commonality-key from the table due to meticulous process sequencing strategy.
For areas where we could ensure that no two operations on the whole floor could start at the same 100 millisecond, there was no need for a dispatcher because we could simply use the date-time of a beginOp event, where the datetime function of the database server is the natural/spontaneous key dispatcher.
The reason for this discussion on commonality-key is because the transaction response required is so quick, you do not want pieces of equipment to have to communicate with each other unnecessarily just to tell each other they are recording events of the same operation. The robots and equipment simply perform the transactions with the commonality-key they are holding.
The hourly compilation of information for insertion into the backend repository conveniently uses the composite-key of commonality+area, to construct the hierarchy of events.
Frontend piping database
OK, this is really extreme. In some areas, the transactions were so frequent, that we had a FIFO database. We introduced a fourth tier database. For optimal transaction response, we had to keep a database size below 1GB. A transaction-piping process existed to empty old transactions into the fourth tier databases. I found that it was easier (and better response) to create a pool of new databases, so that every time its size reaches 1GB, it is moved out and immediately replaced with a new database from the pool - leaving the machines performing the hourly compilation to join up the databases. So that left us with depending on an existing metadata database to house the commonality-key dispatcher table with some meta-data tables.
In retrospect, one might think the commonality-key dispatcher table and metadata tables could have been housed in the middle tier bridging database, but because the database management processes were automated and cookie-cut, it was cleaner to create a new process than to modify the process managing the mid-tier bridging database. Those management routines were used across the world, so you cannot willy-nilly change them without causing havoc to the financial performance of the company or offending the respective data layer architects maintaining them.
It took a lot of organisational skills for the managers to pull all these together. So transactional data design is not just simply a technical skill but process planning skills involving a whole lot of people head-butting each other until you get it right.
What you ask for is totally standard - OLAP and OLTP do not mix in heavy load scenarios.
You use SQL Server. Look into SSAS (SQL Server Analytical Processing) for something to build cubes (different approach than SQL) that you can then report against.
If you do not wnat that, then mirriroring is the next best solution - you can put a mirror online in read only mode for reporting, and it gives you, also, a backup to activate if the main server fails ;) Always good.
CLustering is a non-issue - it will allow you to move the database to another node, but it does not solve the performance issue at all. Log file shipping, replication - good, though I would go with mirroring, read only copy for reporting, loading the data into SSAS.
We have a read/write Cluster which replicates (using transactional replication) to "read only" servers (not physically read only , the web app just performs reads on them). We do the same for reporting and this scales pretty good.
We have multiple sites, 32+ servers and a couple of reporting servers in this configuration with very high volume of inserts, updates and reads.
We primarily use reporting services for internal reporting. Reporting doesnt effect our core business , which I guess is your main concern.
I have two SQL Server 2005 instances that are geographically separated. Important databases are replicated from the primary location to the secondary using transactional replication.
I'm looking for a way that I can monitor this replication and be alerted immediately if it fails.
We've had occasions in the past where the network connection between the two instances has gone down for a period of time. Because replication couldn't occur and we didn't know, the transaction log blew out and filled the disk causing an outage on the primary database as well.
My google searching some time ago led to us monitoring the MSrepl_errors table and alerting when there were any entries but this simply doesn't work. The last time replication failed (last night hence the question), errors only hit that table when it was restarted.
Does anyone else monitor replication and how do you do it?
Just a little bit of extra information:
It seems that last night the problem was that the Log Reader Agent died and didn't start up again. I believe this agent is responsible for reading the transaction log and putting records in the distribution database so they can be replicated on the secondary site.
As this agent runs inside SQL Server, we can't simply make sure a process is running in Windows.
We have emails sent to us for Merge Replication failures. I have not used Transactional Replication but I imagine you can set up similar alerts.
The easiest way is to set it up through Replication Monitor.
Go to Replication Monitor and select a particular publication. Then select the Warnings and Agents tab and then configure the particular alert you want to use. In our case it is Replication: Agent Failure.
For this alert, we have the Response set up to Execute a Job that sends an email. The job can also do some work to include details of what failed, etc.
This works well enough for alerting us to the problem so that we can fix it right away.
You could run a regular check that data changes are taking place, though this could be complex depending on your application.
If you have some form of audit train table that is very regularly updated (i.e. our main product has a base audit table that lists all actions that result in data being updated or deleted) then you could query that table on both servers and make sure the result you get back is the same. Something like:
SELECT CHECKSUM_AGG(*)
FROM audit_base
WHERE action_timestamp BETWEEN <time1> AND BETWEEN <time2>
where and are round values to allow for different delays in contacting the databases. For instance, if you are checking at ten past the hour you might check items from the start the last hour to the start of this hour. You now have two small values that you can transmit somewhere and compare. If they are different then something has most likely gone wrong in the replication process - have what-ever pocess does the check/comparison send you a mail and an SMS so you know to check and fix any problem that needs attention.
By using SELECT CHECKSUM_AGG(*) the amount of data for each table is very very small so the bandwidth use of the checks will be insignificant. You just need to make sure your checks are not too expensive in the load that apply to the servers, and that you don't check data that might be part of open replication transactions so might be expected to be different at that moment (hence checking the audit trail a few minutes back in time instead of now in my example) otherwise you'll get too many false alarms.
Depending on your database structure the above might be impractical. For tables that are not insert-only (no updates or deletes) within the timeframe of your check (like an audit-trail as above), working out what can safely be compared while avoiding false alarms is likely to be both complex and expensive if not actually impossible to do reliably.
You could manufacture a rolling insert-only table if you do not already have one, by having a small table (containing just an indexed timestamp column) to which you add one row regularly - this data serves no purpose other than to exist so you can check updates to the table are getting replicated. You can delete data older than your checking window, so the table shouldn't grow large. Only testing one table does not prove that all the other tables are replicating (or any other tables for that matter), but finding an error in this one table would be a good "canery" check (if this table isn't updating in the replica, then the others probably aren't either).
This sort of check has the advantage of being independent of the replication process - you are not waiting for the replication process to record exceptions in logs, you are instead proactively testing some of the actual data.
I have a scenario where I'm using transactional replication to replicate multiple SQL Server 2005 databases (same instance) into a single remote database (different instance on a separate physical machine).
I am then performing some processing on the replicated data for reporting purposes. I'm using table level triggers to identify changes which actions my post processing code.
Up to this point everything is fine.
However, what I'd like to know is, where certain tables are created, updated or deleted in the same transaction, is it possible to identify some sort of transaction ID from replication (or anywhere) so then I don't perform the same post processing multiple times for a single transaction.
Basic Example: I have a TUser Table and TAddress table. If I was to create both in a single transaction, they would be replicated across in a single transaction too. However, there would be two triggers fired in the replicated database - which at present causes my post processing code to be run twice. What I'd really like to identify is that these two changes arrived in the replicated in the same transaction.
Is this possible in any way? Does an identifier as I've describe exist and is it accessible?
Short answer is no, there is nothing of the sort that you can rely on. Long answer in summary would be that yes it exists, but it would not be recommended in any way to be used for anything.
Given that replication is transactionally consistent, one approach you could consider would be pushing an identifier for the primary record (in this case TUser, since an TAddress is related to TUser) onto a queue (using something like Service Broker ideally or potentially a user-defined queue) and then perform the post-processing by popping data off the queue and processing separately.
Another possibility would be simply batch processing every 'x' amount of time by polling for new/updated records from the primary tables and post-processing in that manner - you'd need to track id's, rowversions, or timestamps of some sort that you've processed for each primary table as meta-data and pull anything that hasn't yet been processed during each batch run.
Just a few thoughts, hope that helps.