Need to show updated data only after all tables have been updated - sql-server

I have three tables: Table1, Table2, Table3. Every day, data is updated in these tables. Table1 takes 30 minutes to update completely, Table2 takes 45 minutes and Table3 takes 1 hour. I have to show updated data to the user only after the update process for all tables has completed.
What would be the possible way to achieve this?

Your most likely approach here is to use triggers on the tables to create logs. But there are some warnings that go along with that.
Firstly, triggers are a kind of hidden functionality; they commonly get overlooked, and in the future it's likely that someone will be scratching their head wondering where the log data came from. They aren't bad practice, but that's human nature.
Secondly, triggers add overhead to the process. Every insert/update/delete on the initial tables will fire a trigger that writes to the logs, and those writes take resources.
Thirdly, depending on how you decide to build your logs, there will be the overhead of comparing the "inserted" and "deleted" tables behind each initial write.
Things to consider are: Do you want to keep the whole previous record and whole new record? Do you want to only store changes? How much space is this log going to take in the db? Do you want these triggers to only be active during the process or 24/7?
Triggers can cause issues but are likely to be the best option provided you are careful.
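As a rough sketch of the logging approach (the log table, trigger name and columns below are invented for illustration; only Table1 comes from the question), each of the three tables could get a trigger that writes one row per statement to a log table, and the application reads the updated data only once the log shows all three loads have finished for the day:

    CREATE TABLE dbo.TableLoadLog
    (
        TableName    sysname      NOT NULL,
        InsertedRows int          NOT NULL,
        DeletedRows  int          NOT NULL,
        LoggedAt     datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    CREATE TRIGGER dbo.trg_Table1_Log
    ON dbo.Table1
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Fires once per statement; inserted/deleted expose the affected rows.
        INSERT INTO dbo.TableLoadLog (TableName, InsertedRows, DeletedRows)
        SELECT 'Table1',
               (SELECT COUNT(*) FROM inserted),
               (SELECT COUNT(*) FROM deleted);
    END;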

Related

Updating database keys where one table's keys refer to another's

I have two tables in DynamoDB. One has data about homes, one has data about businesses. The homes table has a list of the closest businesses to it, with walking times to each of them. That is, the homes table has a list of IDs which refer to items in the businesses table. Since businesses are constantly opening and closing, both these tables need to be updated frequently.
The problem I'm facing is that, when either one of the tables is updated, the other table will have incorrect data until it is updated itself. To make this clearer: let's say one business closes and another one opens. I could update the businesses table first to remove the old business and add the new one, but the homes table would then still refer to the now-removed business. Similarly, if I updated the homes table first to refer to the new business, the businesses table would not yet have this new business' data yet. Whichever table I update first, there will always be a period of time where the two tables are not in synch.
What's the best way to deal with this problem? One way I've considered is to do all the updates to a secondary database and then swap it with my primary database, but I'm wondering if there's a better way.
Thanks!
Dynamo only offers atomic operations on the item level, not transaction level, but you can have something similar to an atomic transaction by enforcing some rules in your application.
Let's say you need to run a transaction with two operations:
Delete Business(id=123) from the table.
Update Home(id=456) to remove association with Business(id=123) from the home.businesses array.
Here's what you can do to mimic a transaction:
Generate a timestamp for locking the items
Let's say our current timestamp is 1234567890. Using a timestamp will allow you to clean up failed transactions (I'll explain later).
Lock the two items
Update both Business-123 and Home-456 and set an attribute lock=1234567890.
Do not change any other attributes yet on this update operation!
Use a ConditionExpression (check the Developer Guide and API) to verify that attribute_not_exists(lock) before updating. This way you're sure there's no other process using the same items.
Handle update lock responses
Check whether both updates, to Home and to Business, succeeded. If yes to both, it means you can proceed with the actual changes you need to make: delete Business-123 and update Home-456, removing the Business association.
For extra care, also use a ConditionExpression in both updates again, but now ensuring that lock == 1234567890. This way you're extra sure no other process overwrote your lock.
If both updates succeed again, you can consider the two items updated and consistent to be read by other processes. To do this, run a third update removing the lock attribute from both items.
When one of the operations fails, you may retry, say, X times. If it fails all X times, make sure the process cleans up the other lock that succeeded previously.
Enforce the transaction lock throughout your code
Always use a ConditionExpression in any part of your code that may update/delete Home and Business items. This is crucial for the solution to work.
When reading Home and Business items, you'll need to do this (this may not be necessary in all reads, you'll decide if you need to ensure consistency from start to finish while working with an item read from DB):
Retrieve the item you want to read
Generate a lock timestamp
Update the item with lock=timestamp using a ConditionExpression
If the update succeeds, continue using the item normally; if not, wait one or two seconds and try again;
When you're done, update the item removing the lock
Regularly clean up failed transactions
Every minute or so, run a background process to look for potentially failed transactions. If your processes take at most 60 seconds to finish and there's an item with a lock value older than, say, 5 minutes (remember the lock value is the time the transaction started), it's safe to say that this transaction failed at some point and whatever process was running it didn't properly clean up the locks.
This background job ensures that no items stay locked for eternity.
Beware that this implementation does not assure a real atomic and consistent transaction in the sense traditional ACID DBs do. If this is mission-critical for you (e.g. you're dealing with financial transactions), do not attempt to implement it. Since you said you're OK with atomicity being broken on rare failure occasions, you may live with it happily. ;)
Hope this helps!

Rolling back specific, older SQL transaction

We have a fairly large stored proc that merges two people found in our system with similar names. It deletes and updates on many different tables (about 10 or so, all very different tables). It's all wrapped in a transaction and rolls back if it fails, of course. This may be a dumb question, but is it possible to somehow store and roll back just this specific transaction at a later time without having to create and insert into many "history" tables that keep track of exactly what happened? I don't want to restore the whole database, just the results of a stored procedure's specific transaction, and at a later date.
It sounds like you may want to investigate Change Data Capture.
It will still be capturing data as it changes, and if you're only doing it for one execution or for a very small amount of data, other methods may be better.
Once a transaction has been committed, it's not possible to roll back just that one transaction at a later date. You're "committed" in quite a literal sense. You can obviously restore from a backup and roll back every transaction since a particular point, but that's probably not what you are looking for.
So audit tables are about your only option. As another answer pointed out, you can use Change Data Capture, but unless you have forked out the big money for Enterprise Edition, that isn't an option for you. If you're just interested in undoing this particular type of change, it's probably easiest to add some code to your procedure that does the merge to store the data records necessary to re-split them, and to create a procedure that does the actual split. But keep in mind that you must handle any changes to the "merged" data that might break your ability to perform the split. This is why SQL can't do it for you automatically... it doesn't know how you might want to handle any changes to the data that might occur after your original transaction.
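As a hedged sketch of that idea (every table and column name here is invented, not from the question): the merge procedure copies the rows it is about to change into a history table keyed by a MergeId, and a companion procedure uses that history to re-split later:

    -- History of pre-merge rows; dbo.Person and its columns stand in for the real schema.
    CREATE TABLE dbo.PersonMergeHistory
    (
        MergeId   uniqueidentifier NOT NULL,
        PersonId  int              NOT NULL,
        FirstName nvarchar(100)    NULL,
        LastName  nvarchar(100)    NULL,
        MergedAt  datetime2(0)     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Inside the merge proc, before any UPDATE or DELETE runs:
    DECLARE @MergeId uniqueidentifier = NEWID();
    DECLARE @SourcePersonId int = 1, @TargetPersonId int = 2;  -- stand-ins for the proc's parameters

    INSERT INTO dbo.PersonMergeHistory (MergeId, PersonId, FirstName, LastName)
    SELECT @MergeId, PersonId, FirstName, LastName
    FROM dbo.Person
    WHERE PersonId IN (@SourcePersonId, @TargetPersonId);

    -- A companion "unmerge" procedure would read dbo.PersonMergeHistory by MergeId and
    -- reinstate the rows, subject to whatever has changed since the merge.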

How can I store temporary, self-destroying data into a t-sql table?

Suppose I have a table which contains relevant information. However, the data is only relevant for, let's say, 30 minutes.
After that it's just database junk, so I need to get rid of it asap.
If I wanted, I could clean this table periodically, setting an expiration date time for each record individually and deleting expired records through a job or something. This is my #1 option, and it's what will be done unless someone convince me otherwise.
But I think this solution may be problematic. What if someone stops the job from running and no one notices? I'm looking for something like a built-in way to insert temporary data into a table. Or a table that has "volatile" data itself, in a way that it automagically removes data after x amount of time after its insertion.
And last but not least, if there's no built-in way to do that, could I be able to implement this functionality in SQL server 2008 (or 2012, we will be migrating soon) myself? If so, could someone give me directions as to what to look for to implement something like it?
(Sorry if the formatting ends up bad, first time using a smartphone to post on SO)
As another answer indicated, TRUNCATE TABLE is a fast way to remove the contents of a table, but it's aggressive; it will completely empty the table. Also, there are restrictions on its use; among others, it can't be used on tables which "are referenced by a FOREIGN KEY constraint".
Any more targeted removal of rows will require a DELETE statement with a WHERE clause. Having an index on relevant criteria fields (such as the insertion date) will improve performance of the deletion and might be a good idea (depending on its effect on INSERT and UPDATE statements).
You will need something to "trigger" the DELETE statement (or TRUNCATE statement). As you've suggested, a SQL Server Agent job is an obvious choice, but you are worried about the job being disabled or removed. Any solution will be vulnerable to someone removing your work, but there are more obscure ways to trigger an activity than a job. You could embed the deletion into the insertion process-- either in whatever stored procedure or application code you have, or as an actual table trigger. Both of those methods increase the time required for an INSERT and, because they are not handled out of band by the SQL Server Agent, will require your users to wait slightly longer. If you have the right indexes and the table is reasonably-sized, that might be an acceptable trade-off.
There isn't any other capability that I'm aware of for SQL Server to just start deleting data. There isn't automatic data retention policy enforcement.
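A minimal sketch of the scheduled-cleanup approach described above, with invented table and column names; the DELETE would be the body of the Agent job step (or could be appended to the insert path):

    CREATE TABLE dbo.VolatileData
    (
        Id         int IDENTITY(1,1) PRIMARY KEY,
        Payload    nvarchar(400) NULL,
        InsertedAt datetime2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Index the expiry criterion so the delete does not scan the whole table.
    CREATE INDEX IX_VolatileData_InsertedAt ON dbo.VolatileData (InsertedAt);

    -- Body of the Agent job step (or appended to the insert path):
    DELETE FROM dbo.VolatileData
    WHERE InsertedAt < DATEADD(MINUTE, -30, SYSUTCDATETIME());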
See @Yuriy's comment; it's relevant.
If you really need to implement it DB side....
TRUNCATE TABLE is a fast way to get rid of records.
If all you need is ONE table that you just fill with data, use, and dispose of as soon as possible, you can consider truncating a (permanent) "CACHE_TEMP" table.
The scenario becomes more complicated if you are running concurrent threads/jobs and each is handling its own data.
If that data only exists for a single "job"/context, you can consider using #TEMP tables. They are volatile by nature and may be what you are looking for.
You could also use table variables; they are a bit more volatile than temporary tables, but it depends on details you haven't posted, so I cannot say which is really better.
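For illustration, the two volatile options mentioned above look roughly like this (column definitions are invented); a #temp table lives until the session ends, a table variable only until the batch or procedure that declared it ends:

    -- Local temporary table: visible to this session, dropped when the session ends.
    CREATE TABLE #WorkingSet (Id int NOT NULL, Amount decimal(18, 2) NOT NULL);
    INSERT INTO #WorkingSet (Id, Amount) VALUES (1, 10.00);

    -- Table variable: scoped to the batch or procedure it is declared in.
    DECLARE @WorkingSet TABLE (Id int NOT NULL, Amount decimal(18, 2) NOT NULL);
    INSERT INTO @WorkingSet (Id, Amount) VALUES (1, 10.00);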

Trying to design a column that should sum values from another table

Sorry if the title is not very clear. I will try to explain:
I have two tables: table A and table B. The relation between them is one (table A) to many (table B), so it's something like a master-detail situation.
I have a column 'Amount' in table B that is obviously decimal and a column 'TotalAmount' in table A.
I'm trying to figure out how to keep the value in table A up to date. My suggestion is to make a view based on table A with an aggregate query summing the Amounts from table B. Of course with the proper indexes...
But my team-mate suggests updating the value in table A every time we change something in table B from our application.
I wonder, what would be the best solution here? Maybe a third variant?
Some clarification... We expect these tables to be the fastest-growing tables in our database, and table B will grow much, much faster than table A. The most frequent operation on table B will be insert, and almost nothing else. The most frequent operation on table A will be select, but not only that.
I see a number of options:
Use an insert trigger on table B and do the updates to table A there, as your team-mate suggests. This will keep table A as up to date as possible.
Have a scheduled job that updates table A every x minutes (x = whatever makes sense for your application).
When updating table B, do an update on table A in your application logic. This may not work out if you update table B in many places.
If you have a single place in your app where you insert new rows into table B, then the simplest solution is to send an UPDATE A SET TotalAmount = TotalAmount + ? WHERE ID = ? and pass the values you just used for the insert into B. Make sure you wrap both queries (the insert and the update) in a transaction so that either both happen or neither does.
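A minimal sketch of that insert-plus-update transaction, with the ? placeholders replaced by variables and an invented foreign-key column name (A_Id):

    DECLARE @MasterId int = 42, @Amount decimal(18, 2) = 99.50;  -- values the app would pass in

    BEGIN TRANSACTION;
        INSERT INTO B (A_Id, Amount) VALUES (@MasterId, @Amount);   -- A_Id is an invented FK column
        UPDATE A SET TotalAmount = TotalAmount + @Amount WHERE ID = @MasterId;
    COMMIT TRANSACTION;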
If that's not simple, then your next option is a database trigger. Read the docs for your database how to create them. Basically a trigger is a small piece of code that gets executed when something happens in the DB (in your case, when someone inserts data in table B).
The view is another option, but it may cause performance problems during selects which you'll find hard to resolve. Try a "materialized view" or a "computed column" instead (but these can add overhead when you insert/remove rows).
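In SQL Server the "materialized view" option is an indexed view; a rough sketch with invented column names (Amount must be declared NOT NULL, and COUNT_BIG(*) is required when the view aggregates):

    CREATE VIEW dbo.v_TotalAmount
    WITH SCHEMABINDING
    AS
    SELECT A_Id,
           SUM(Amount)  AS TotalAmount,
           COUNT_BIG(*) AS RowCnt      -- required when an indexed view aggregates
    FROM dbo.B
    GROUP BY A_Id;
    GO

    -- The unique clustered index is what actually materializes the view.
    CREATE UNIQUE CLUSTERED INDEX IX_v_TotalAmount ON dbo.v_TotalAmount (A_Id);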
If this value is going to change a lot, you're better off using a view: it is definitely the safer implementation. But even better would be using triggers (if your database supports them).
I would guess your mate suggests updating the value on each insert because he thinks that you will need the value quite often and it might lead to a slow-down recalculating the value each time. If that is the case:
Your database should take care of the caching, so this probably won't be an issue.
If it is, nonetheless, you can add that feature at a later stage - this way you can make sure your application works otherwise and will have a much easier time debugging that cache column.
I would definitely recommend using a trigger over using application logic, as that ensures that the database keeps the value up-to-date, rather than relying on all callers. However, from a design point of view, I would be wary of storing generated data in the same table as non-generated data -- I believe it's important to maintain a clear separation, so people don't get confused between what data they should be maintaining and what will be maintained for them.
However, in general, prefer views to triggers -- that way you don't have to worry about maintaining the value at all. Profile to determine whether performance is an issue. In Postgres, I believe you could even create an index on the computed values, so the database wouldn't have to look at the detail table.
The third way, recalculating periodically, will be much slower than triggers and probably slower than a view. That it's not appropriate for your use anyway is the icing on the cake :).

Do triggers decrease the performance? Inserted and deleted tables?

Suppose I have stored procedures which perform insert/update/delete operations on a table.
Depending upon some criteria, I want to perform some additional operations.
Should I create a trigger, or do the operation in the stored procedure itself?
Does using triggers decrease the performance?
Do these two tables, viz. inserted and deleted, exist persistently, or are they created dynamically?
If they are created dynamically, is there a performance issue?
If they are persistent tables then where are they?
Also, if they exist, can I access the inserted and deleted tables in stored procedures?
Will it be less performant than doing the same thing in a stored proc? Probably not, but as with all performance questions the only way to really know is to test both approaches with a realistic data set (if you have a 2,000,000-record table, don't test with a table of 100 records!).
That said, the choice between a trigger and another method depends entirely on the need for the action in question to happen no matter how the data is updated, deleted, or inserted. If this is a business rule that must always happen no matter what, a trigger is the best place for it or you will eventually have data integrity problems. Data in databases is frequently changed from sources other than the GUI.
When writing a trigger though there are several things you should be aware of. First, the trigger fires once for each batch, so whether you inserted one record or 100,000 records the trigger only fires once. You cannot assume ever that only one record will be affected. Nor can you assume that it will always only be a small record set. This is why it is critical to write all triggers as if you are going to insert, update or delete a million rows. That means set-based logic and no cursors or while loops if at all possible. Do not take a stored proc written to handle one record and call it in a cursor in a trigger.
Also, do not send emails from a trigger; you do not want to stop all inserts, updates, or deletes if the email server is down.
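As an illustration of the set-based logic mentioned above (all table and column names below are invented), one UPDATE joined to the inserted pseudo-table handles one row or a million rows without a cursor:

    CREATE TRIGGER dbo.trg_OrderDetail_Totals
    ON dbo.OrderDetail
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- No cursor: aggregate the inserted pseudo-table and update every affected parent in one statement.
        UPDATE o
        SET    o.OrderTotal = o.OrderTotal + i.LineTotal
        FROM   dbo.Orders AS o
        JOIN  (SELECT OrderId, SUM(LineTotal) AS LineTotal
               FROM inserted
               GROUP BY OrderId) AS i
            ON i.OrderId = o.OrderId;
    END;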
Yes, a table with a trigger will not perform as well as it would without it. Logic dictates that doing something is more expensive than doing nothing.
I think your question would be more meaningful if you asked in terms of whether it is more performant than some other approach that you haven't specified.
Ultimately, I'd select the tool that is most appropriate for the job and only worry about performance if there is a problem, not before you have even implemented a solution.
The inserted and deleted tables are only available within the trigger, so accessing them from stored procedures is a no-go.
It decreases performance on the query by definition: the query is then doing something it otherwise wasn't going to do.
The other way to look at it is this: if you were going to manually be doing whatever the trigger is doing anyway then they increase performance by saving a round trip.
Take it a step further: that advantage disappears if you use a stored procedure and you're running within one server roundtrip anyway.
So it depends on how you look at it.
Performance on what? The trigger will perform an update on the DB after the event, so the user of your system won't even know it's going on. It happens in the background.
Your question is phrased in a manner quite difficult to understand.
If your operation is important and must never be missed, then you have 2 choices:
Execute your operation immediately after Update/Delete with durability
Delay the operation by making it loosely coupled with durability.
We also faced the same issue: our production MSSQL 2016 DB is > 1 TB with > 500 tables, and we need to send changes (insert, update, delete) of a few columns from 20 important tables to a 3rd party. The number of business processes that update those few columns in the 20 important tables was > 200, and it's a tedious task to modify them because it's a legacy application. Our existing processes must work without any dependency on data sharing. Data sharing order is important: FIFO must be maintained.
E.g. a user's mobile no. is 123-456-789, it changes to 123-456-123 and changes again to 123-456-456;
the order of sending is 123-456-789 --> 123-456-123 --> 123-456-456. A subsequent request can only be sent if the response to the previous request was successful.
We created 20 new tables with only the columns we want. We compare the main table and the new table (MainTable1 JOIN MainTale_LessCol1) using a checksum of all columns and a TimeStamp column to identify changes.
Changes are logged in the APIrequest tables and updated back into MainTale_LessCol1. This logic runs in a scheduled job every 15 minutes.
A separate process picks from APIrequest and sends the data to the 3rd party.
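A rough sketch of that 15-minute comparison step, with placeholder column names standing in for the real schema (the original also folds a TimeStamp column into the comparison, and an IDENTITY column on APIrequest would preserve the FIFO order):

    -- Queue rows whose compared columns differ between the main table and the narrow copy.
    INSERT INTO dbo.APIrequest (SourceId, Col1, Col2, QueuedAt)
    SELECT m.Id, m.Col1, m.Col2, SYSUTCDATETIME()
    FROM   dbo.MainTable1 AS m
    JOIN   dbo.MainTale_LessCol1 AS c ON c.Id = m.Id
    WHERE  CHECKSUM(m.Col1, m.Col2) <> CHECKSUM(c.Col1, c.Col2);

    -- Refresh the narrow copy so the same change is not queued again on the next run.
    UPDATE c
    SET    c.Col1 = m.Col1, c.Col2 = m.Col2
    FROM   dbo.MainTale_LessCol1 AS c
    JOIN   dbo.MainTable1 AS m ON m.Id = c.Id
    WHERE  CHECKSUM(m.Col1, m.Col2) <> CHECKSUM(c.Col1, c.Col2);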
We explored:
Triggers
CDC (Change Data Capture)
200+ Process Changes
Since our deadlines were strict, cumulative changes on those 20 tables were > 1000/sec, and our system was already at peak capacity, our current design works for us.
You can try CDC and share your experience.
