I have developed a report viewer in .NET Winforms (it just runs queries and displays results).
This works against a reporting database. However, the above is a small subset of a much larger application, which gets data from another database. It looks like this:
Monitored system has a change in state (e.g. latency increases) => Event is recorded into SQL Server database (call this database A) as a transaction => This fires a trigger to write the same event into the reporting database.
I am not sure about the differences between the two databases, they may be tuned for different goals or there may be some financial or even political reason for the two databases.
Anyway, the term was mentioned that the reporting database is "transactionally dependent" on the main database. What exactly does this mean? The reporting database depends entirely on the transactions of database A? This made me think of some questions:
1) How could I handle the situation that the reporting database has no disk space, but database A is still firing triggers to the reporting database? Would it be good to queue
2) Linked to the above, would it work if I queue the trigger writes (and their data) that cannot be applied to the reporting db (not sure how, but conceptually...)? Even then, this makes the system not real time.
Are there any other dangers/issues with exception handling in a setup like this?
Thanks
Such dependencies are actually very bad in production. For one, triggers that update (remote) databases are a sure way to kill performance. But more important is the issue of availability. The applications that depend on database A are now tied to the availability of database B: if database B is unavailable, the trigger cannot do its work, it will fail, and the application will hit errors. So right now the administrator(s) of database B are on the hook for the operations of the applications using database A.
There are many approaches to this issue; the simplest is to deploy transactional replication from a publication in database A with a subscription in database B. This isolates the two databases from a transactional point of view, allowing applications that depend on database A to go ahead unhindered when database B is unavailable, or just slow.
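To make the decoupling idea concrete, here is a minimal sketch. It is not transactional replication itself: it swaps in a hand-rolled "outbox" queue and uses SQLite (Python's sqlite3 module) as a stand-in for both SQL Server databases. The table names (events, outbox, report_events) and the helper functions are hypothetical. The point is only that the trigger writes locally, and a separate forwarding step delivers the rows when the reporting side is available.

```python
import sqlite3

# Source database (stand-in for database A). The trigger only writes to a
# local outbox table, so it never touches the reporting database directly.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, detail TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY, detail TEXT);
CREATE TRIGGER queue_event AFTER INSERT ON events
BEGIN
    INSERT INTO outbox (id, detail) VALUES (NEW.id, NEW.detail);
END;
""")

def record_event(detail):
    """Record an event in the source database (hypothetical helper)."""
    with source:
        source.execute("INSERT INTO events (detail) VALUES (?)", (detail,))

def forward(target):
    """Deliver queued rows to the reporting connection.

    target is None when the reporting database is unavailable; in that
    case nothing is sent and the rows simply stay queued.
    """
    if target is None:
        return 0
    rows = source.execute("SELECT id, detail FROM outbox ORDER BY id").fetchall()
    with target:
        # INSERT OR IGNORE keeps a retry after a crash from duplicating rows.
        target.executemany(
            "INSERT OR IGNORE INTO report_events (id, detail) VALUES (?, ?)", rows)
    with source:
        source.executemany("DELETE FROM outbox WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)

# Events are recorded while the reporting database is down.
record_event("latency increased")
record_event("latency back to normal")
sent_while_down = forward(None)
pending = source.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

# The reporting database comes back; the queue drains.
reporting = sqlite3.connect(":memory:")
reporting.execute("CREATE TABLE report_events (id INTEGER PRIMARY KEY, detail TEXT)")
sent_after_recovery = forward(reporting)
remaining = source.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The trade-off is the one raised in the question: the reporting side lags behind by however long the queue takes to drain, so it is near real time rather than real time.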
If the system has to be real time, then triggers are the only way. Note that triggers are fully synchronous: the operation on the reporting database has to complete successfully, or the trigger will fail. Because the trigger runs as part of the original statement, that statement on the transaction database will most likely fail too. The error may or may not be caught, but either way the change to that table in the transaction database will not occur.
There are valid reasons for this scenario, but it really creates a dependency of the transaction database on the reporting database, since if the reporting database is down, the transaction database effectively becomes read-only or worse.
That's not really what you want.
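A small sketch of that failure mode, using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The table names and the CHECK constraint are made up for illustration; the constraint simply simulates the reporting write failing (for example, because the reporting database is out of disk space).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, detail TEXT);
-- Stand-in for the reporting database. The CHECK constraint is an
-- artificial way to make the reporting write fail for one input.
CREATE TABLE report_events (
    id INTEGER,
    detail TEXT,
    CHECK (detail <> 'disk full')
);
CREATE TRIGGER copy_event AFTER INSERT ON events
BEGIN
    INSERT INTO report_events (id, detail) VALUES (NEW.id, NEW.detail);
END;
""")

def record_event(detail):
    """Insert into the transactional table; return False if the statement failed."""
    try:
        with conn:
            conn.execute("INSERT INTO events (detail) VALUES (?)", (detail,))
        return True
    except sqlite3.IntegrityError:
        # The failure happened inside the trigger, but it takes the
        # original INSERT down with it.
        return False

ok = record_event("latency increased")
failed = record_event("disk full")
source_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

After both calls, only the first event exists in the transactional table: the second insert was undone because the write performed by the trigger failed.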
You can look at replication if your databases have the same structure. Typically, when I think of a reporting database, I'm thinking of something with a different structure which is optimized for reporting, not just another copy of the data isolated for performance reasons (which is fine, but that is basically throwing hardware at the problem to stop reporting users from hurting transaction users).
TL;DR: Is it possible to basically create a fast, temporary, "fork" of a database (like a snapshot transaction) without any locks given that I know for a fact that the changes will never be committed and always be rolled back.
Details:
I'm currently working with SQL Server and am trying to implement a feature where the user can try all sorts of stuff (in the application) that is never persisted in the database.
My first instinct was to (mis)use snapshot transactions for that to basically "fork" the database into a short lived (under 15min) user-specific context. The rest of the application wouldn't even have to know that all the actions the user performs will later be thrown away (I currently persist the connection across requests - it's a web application).
Problem is that there are situations where the snapshot transaction locks and waits for other transactions to complete. My guess is that this happens because SQL Server has to make sure it can merge the data if one of the open transactions commits, but in my case I know for a fact that I will never commit the changes from this transaction and will always throw the data away (note that not everything happens in this transaction; there are other things that a user can do that happen on a different connection and are persisted).
Are there other ideas that don't involve cloning the database (too large/slow) or updating/changing the schema of all tables (I'd like to avoid "poisoning" the schema with the implementation detail of the "try out" feature)?
No. SQL Server has copy-on-write Database Snapshots, but the snapshots are read-only. So where a SNAPSHOT transaction acquires regular exclusive locks when it modifies the database, a Database Snapshot would just give you an error.
There are storage technologies that can create a writable copy-on-write storage snapshot, like NetApp. You would run a command to create a new LUN that is a snapshot of an existing LUN, present it to your server as a disk, mount its volume in a folder or drive letter, and attach the files you find there as a database. This is often done for cloning across environments to refresh dev/test with prod data without having to copy all the data. But it seems like way too much infrastructure work for your use case.
Need some sanity check.
Imagine having 1 SQL Server instance, a beefy system (i.e 48GB of RAM and tons of storage). Obviously there comes a point where it gets hammered in a situation where there are lots of jobs running.
These jobs/DB are part of an external piece of software and cannot be controlled or modified by us directly.
Now, when these jobs run, besides the queries probably being inefficient, they bring the DB down: everything becomes very slow, so any "regular" users get slow responses.
The immediate thing I can think of is replication of some kind where maybe, the "secondary" DB would be the one where these jobs point to and do their hammering, still leaving the primary available and active but would receive any updates from secondary for data consistency/integrity.
Would this be the right thing to do? Ultimately I want the load to be elsewhere but have the primary be aware of updates and update itself without bringing it down or being very slow.
What is this called in MS SQL Server? Does such a thing exist? The jobs will be doing a read-write FYI.
There are numerous approaches to this, all of which are native to SQL Server, but I think you should look into Transactional Replication:
https://learn.microsoft.com/en-us/sql/relational-databases/replication/transactional/transactional-replication?view=sql-server-ver16
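It effectively creates a read-only replica by reading committed changes from the publisher's transaction log and delivering them to the subscriber (this is a separate feature from log shipping). For reporting purposes, it is practically real time.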
From the documentation:
"By default, Subscribers to transactional publications should be treated as read-only, because changes are not propagated back to the Publisher. However, transactional replication does offer options that allow updates at the Subscriber."
Your scenario likely has nuances I don't know about, but you can use various flavors of SQL Replication, custom triggers, linked servers, 3-part queries, etc. to fill in the holes.
I joined a project a while ago, which consists of a few web servers and a few backend servers.
They all do CRUD things on one database.
Unfortunately, a few tables have been running into deadlocks for a while now. We can see the victim statements via SQL Server Management Studio and its Extended Events feature.
Primary keys and all the necessary indexes are already set. We even rebuilt them; a lot of them had fragmentation over 50%.
Thing is, there is this one table we would like to switch to the SNAPSHOT isolation level. I know this won't solve the deadlock situation completely, since I read that write statements might still block each other.
One table contains logs (logins of users, tasks started and ended on the backends, yadda yadda...), the other one contains all the processes, so the backends are selecting, inserting and updating (like setting the "running" field from 0 to 1 and vice versa). While the first one, being used for logging, might be a good fit for the snapshot level, I doubt it would be recommended for the process table, as far as I understand how snapshot isolation works. And I am also aware that rollbacks of transactions will block the tables during the rollback process anyway.
Even the sysobjects table gets blocked sometimes when a table has to be dropped. And I must mention that the database is ridiculously large, with many, many tables.
What I would like to know is, if you guys ever switched from whatever isolation level to snapshot and what challenges you had to face, or even if you changed your mind when it came to deadlock prevention and tried a different approach, like hardware upgrade, etc...
I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too much linked to the database and to be able to use newer DAL (Entity Framework 6 Code First) I decided to start from a new empty database, and I only added the tables and columns I need (different names than the production one).
Now I need some way to update the new database with the production database regularly (would be best if it is -almost- immediate).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to only using SQL Server for the job (I can develop and run some custom application if needed).
I already made some searches and had those (part-of) solutions :
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table names / column names will be problematic. So I can use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate this database to my new one (it may keep my production database from being stressed too much?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
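For the custom-job option, a minimal sketch of an incremental sync driven by LastEditDate might look like the following. It uses SQLite via Python's sqlite3 module as a stand-in for both databases. The table and column names (Customer, client, client_id, full_name) and the integer timestamps are assumptions for illustration, and LastEditDate is set by hand here rather than by a trigger.

```python
import sqlite3

# Stand-in for the production database.
prod = sqlite3.connect(":memory:")
prod.execute(
    "CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT, LastEditDate INTEGER)")

# Stand-in for the new reporting database, with different names.
report = sqlite3.connect(":memory:")
report.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, full_name TEXT)")

def sync(watermark):
    """Copy rows edited after `watermark`; return the new watermark."""
    rows = prod.execute(
        "SELECT Id, Name, LastEditDate FROM Customer "
        "WHERE LastEditDate > ? ORDER BY LastEditDate",
        (watermark,),
    ).fetchall()
    with report:
        # INSERT OR REPLACE acts as an upsert keyed on client_id.
        report.executemany(
            "INSERT OR REPLACE INTO client (client_id, full_name) VALUES (?, ?)",
            [(r[0], r[1]) for r in rows],
        )
    return max([watermark] + [r[2] for r in rows])

with prod:
    prod.executemany(
        "INSERT INTO Customer (Id, Name, LastEditDate) VALUES (?, ?, ?)",
        [(1, "Ann", 100), (2, "Bob", 105)],
    )
first_watermark = sync(0)

# A later edit in production is picked up by the next run.
with prod:
    prod.execute("UPDATE Customer SET Name = 'Robert', LastEditDate = 110 WHERE Id = 2")
second_watermark = sync(first_watermark)
synced = report.execute(
    "SELECT client_id, full_name FROM client ORDER BY client_id").fetchall()
```

Two caveats with this approach: deleted rows are never seen by a LastEditDate filter, and rows sharing the exact watermark timestamp can be missed with a strict `>` comparison, so a real job would need to handle both.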
Do you have any advice, or maybe some other solutions that I didn't foresee?
Thanks
I think that transactional replication is better than using triggers.
Triggers would use too many resources on the source server/database, because they fire on every DML transaction.
Transactional replication could be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source db is...
There is one more thing that you could try: DB mirroring. It depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
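A quick sketch of the view idea, run against SQLite through Python's sqlite3 module only so that it is self-contained. The names Customer, client, client_id and full_name are hypothetical; the same CREATE VIEW pattern applies in SQL Server.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Replicated table, keeping the production names.
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Customer (Id, Name) VALUES (1, 'Ann');

-- The view exposes the names the new application wants and documents
-- where each column comes from.
CREATE VIEW client AS
    SELECT Id AS client_id, Name AS full_name
    FROM Customer;
""")

cursor = db.execute("SELECT client_id, full_name FROM client")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

The application (or its ORM mapping) then reads from the view, while replication keeps writing to the table with the original names.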
I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.
I need to audit all database activity, regardless of whether it came from the application or from someone issuing SQL via other means. So the auditing must be done at the database level. The database in question is Oracle. I looked at doing it via triggers and also via something called Fine Grained Auditing that Oracle provides. In both cases, we turned on auditing on specific tables and specific columns. However, we found that performance really sucks when we use either of these methods.
Since auditing is an absolute must due to regulations placed around data privacy, I am wondering what is best way to do this without significant performance degradations. If someone has Oracle specific experience with this, it will be helpful but if not just general practices around database activity auditing will be okay as well.
I'm not sure if it's a mature enough approach for a production
system, but I had quite a lot of success with monitoring database
traffic using a network traffic sniffer.
Send the raw data between the application and database off to another
machine and decode and analyse it there.
I used PostgreSQL, and decoding the traffic and turning it into
a stream of database operations that could be logged was relatively
straightforward. I imagine it'd work on any database where the packet
format is documented though.
The main point was that it put no extra load on the database itself.
Also, it was passive monitoring, it recorded all activity, but
couldn't block any operations, so might not be quite what you're looking for.
There is no need to "roll your own". Just turn on auditing:
Set the database parameter AUDIT_TRAIL = DB.
Start the instance.
Login with SQLPlus.
Enter the statement:
    audit all;
This turns on auditing for many critical DDL operations, but DML and some other DDL statements are still not audited.
To enable auditing on these other activities, try statements like these:
    audit alter table; -- DDL audit
    audit select table, update table, insert table, delete table; -- DML audit
Note: All "as sysdba" activity is ALWAYS audited to the O/S. In Windows, this means the Windows event log. In UNIX, this is usually $ORACLE_HOME/rdbms/audit.
Check out the Oracle 10g R2 Audit Chapter of the Database SQL Reference.
The database audit trail can be viewed in the SYS.DBA_AUDIT_TRAIL view.
It should be pointed out that the internal Oracle auditing will be high-performance by definition. It is designed to be exactly that, and it is very hard to imagine anything else rivaling it for performance. Also, there is a high degree of "fine-grained" control of Oracle auditing. You can get it just as precise as you want it. Finally, the SYS.AUD$ table along with its indexes can be moved to a separate tablespace to prevent filling up the SYSTEM tablespace.
Kind regards,
Opus
If you want to record copies of changed records on a target system you can do this with Golden Gate Software and not incur much in the way of source side resource drain. Also you don't have to make any changes to the source database to implement this solution.
Golden Gate scrapes the redo logs for transactions referring to a list of tables you are interested in. These changes are written to a 'Trail File' and can be applied to a different schema on the same database, or shipped to a target system and applied there (ideal for reducing load on your source system).
Once you get the trail file to the target system, there are some configuration options you can set to perform auditing, and if needed you can invoke 2 Golden Gate functions to get info about the transaction:
1) Set the INSERTALLRECORDS Replication parameter to insert a new record in the target table for every change operation made to the source table. Beware this can eat up a lot of space, but if you need comprehensive auditing this is probably expected.
2) If you don't already have a CHANGED_BY_USERID and CHANGED_DATE attached to your records, you can use the Golden Gate functions on the target side to get this info for the current transaction. Check out the following functions in the GG Reference Guide:
GGHEADER("USERID")
GGHEADER("TIMESTAMP")
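To illustrate what "insert a new record for every change operation" means in practice, here is a tool-agnostic sketch in Python. It is not Golden Gate configuration or its API; the Change record and both apply functions are hypothetical, and only contrast a normal apply (which keeps the current state) with an audit-style apply (which keeps one row per operation, including who made the change and when).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Change:
    """One captured change operation (hypothetical shape)."""
    op: str                 # "INSERT", "UPDATE" or "DELETE"
    key: int
    value: Optional[str]    # None for deletes
    user: str
    timestamp: str

def apply_current(changes):
    """Normal apply: the target ends up holding only the latest state."""
    state = {}
    for c in changes:
        if c.op == "DELETE":
            state.pop(c.key, None)
        else:
            state[c.key] = c.value
    return state

def apply_audit(changes):
    """Audit-style apply: every operation becomes a new row, nothing is overwritten."""
    return [(c.op, c.key, c.value, c.user, c.timestamp) for c in changes]

changes = [
    Change("INSERT", 1, "a", "alice", "2024-01-01T10:00:00"),
    Change("UPDATE", 1, "b", "bob", "2024-01-01T10:05:00"),
    Change("INSERT", 2, "c", "alice", "2024-01-01T10:06:00"),
    Change("DELETE", 2, None, "bob", "2024-01-01T10:07:00"),
]
current = apply_current(changes)
audit_rows = apply_audit(changes)
```

Here the current-state target holds a single row, while the audit target holds four, which is why the space warning above matters.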
So no, it's not free (it requires licensing through Oracle), and it will require some effort to spin up, but probably a lot less effort/cost than implementing and maintaining a custom solution of your own, and you have the added benefit of shipping the data to a remote system so you can guarantee minimal impact on your source database.
If you are using Oracle, there is a feature called CDC (Change Data Capture), which is a more performance-efficient solution for audit-type requirements.