Efficient way to delete records every 10 mins - sql-server

Problem at hand
We need to delete a few thousand records every 10 minutes from a SQL Server database table. This is part of the cleanup of older records.
Solutions under consideration
There's .Net Service running for some other functionality. Same service can be used with a timer to execute SQL delete command on db.
SQL server job
Trigger
Key consideration for providing solution
Ours is a web product that gets deployed at different client locations. We want minimal operational overhead, because the people doing deployments have very limited technical skills, and we also want the product to require little to no configuration.
Performance is very important, as this runs on a live transactional database.

This sounds like exactly the sort of work that a SQL Server job was intended to provide; database maintenance.
A scheduled job can execute a basic T-SQL statement that will delete the records you don't want any more, on whatever schedule you want it to run on. The job creation can be scripted to be part of your standard deployment scripts, which should negate the deployment costs.
Additionally, by utilizing an established part of SQL Server, you capitalize on the knowledge of other database administrators that will understand SQL jobs and be able to manage them.
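As a rough sketch of what such a deployment script might look like, the following creates an Agent job that runs a cleanup statement every 10 minutes. The job name, schedule name, database (AppDb), table (dbo.EventLog), column (CreatedAt) and the 7-day retention are all placeholders made up for illustration; the msdb procedures are the standard SQL Server Agent ones.

```sql
USE msdb;
GO
-- Hypothetical names throughout: adjust job, database, table and retention.
EXEC dbo.sp_add_job
    @job_name = N'Purge old records';

EXEC dbo.sp_add_jobstep
    @job_name      = N'Purge old records',
    @step_name     = N'Delete expired rows',
    @subsystem     = N'TSQL',
    @database_name = N'AppDb',
    @command       = N'DELETE FROM dbo.EventLog WHERE CreatedAt < DATEADD(DAY, -7, GETDATE());';

-- freq_type 4 = daily; freq_subday_type 4 = minutes; interval 10 = every 10 minutes.
EXEC dbo.sp_add_schedule
    @schedule_name        = N'Every 10 minutes',
    @freq_type            = 4,
    @freq_interval        = 1,
    @freq_subday_type     = 4,
    @freq_subday_interval = 10;

EXEC dbo.sp_attach_schedule
    @job_name      = N'Purge old records',
    @schedule_name = N'Every 10 minutes';

-- Target the local server so the Agent actually runs the job.
EXEC dbo.sp_add_jobserver
    @job_name = N'Purge old records';
GO
```

Because the whole job is plain T-SQL, it can sit next to the schema scripts and needs no manual configuration at the client site.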

I would not use a trigger; I would stick with SQL Server DTS or SSIS. Obviously you will need some way to identify the old rows, so I would use an indexed timestamp column. If that's not required, just fire off a TRUNCATE once nightly.

The efficiency of the delete comes from indexes; it has nothing to do with how the timer is triggered. It is very important that the 'old' records be easily identifiable by a range scan. If the DELETE has to scan the whole table to find these 'old' records, it will block all other activity. Usually in such cases the table is clustered by the datetime value first, and unique primary keys are delegated to a non-clustered index, if needed.
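A minimal sketch of that layout, assuming a hypothetical dbo.EventLog table with a CreatedAt column and a 7-day retention. The batching loop is my own addition rather than part of the answer: it keeps each transaction short so the delete holds its locks only briefly.

```sql
-- Hypothetical table: clustered on the datetime, primary key kept non-clustered.
CREATE TABLE dbo.EventLog (
    EventId   BIGINT IDENTITY(1,1) NOT NULL,
    CreatedAt DATETIME2(3)         NOT NULL,
    Payload   NVARCHAR(400)        NULL,
    CONSTRAINT PK_EventLog PRIMARY KEY NONCLUSTERED (EventId)
);
CREATE CLUSTERED INDEX CIX_EventLog_CreatedAt ON dbo.EventLog (CreatedAt);
GO

-- Old rows sit at one end of the clustered index, so this is a range scan.
DECLARE @cutoff DATETIME2(3) = DATEADD(DAY, -7, SYSUTCDATETIME());
DECLARE @rows   INT = 1;

WHILE @rows > 0
BEGIN
    -- Assumed batch size; tune it against the live workload.
    DELETE TOP (1000) FROM dbo.EventLog
    WHERE CreatedAt < @cutoff;

    SET @rows = @@ROWCOUNT;
END;
```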
Now how to pop the timer, you really have three alternatives:
SQL Agent job
Conversation Timers
Application timer
A SQL Agent job is the best option for 10-minute intervals. The only drawback is that it does not work on SQL Express deployments. If that is a concern, then conversation timers and activated procedures are a viable alternative.
The last option has the disadvantage that the application must be running for the timer to trigger deletion. If this is not a concern (i.e. if the application is not running, it doesn't matter that the records are not deleted), then it is OK. Note that ASP.NET applications are a very bad host for such timers, because of the way IIS and ASP.NET may choose to recycle app pools and put them to sleep.
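For the conversation-timer alternative, a sketch along these lines may help. It assumes Service Broker is enabled in the database, and every object name (CleanupQueue, CleanupService, CleanupActivated, dbo.EventLog, CreatedAt) is a placeholder. A dialog is opened from the service to itself purely to carry a timer, and the activated procedure re-arms the timer each time it fires.

```sql
-- Assumes Service Broker is enabled for this database.
CREATE QUEUE dbo.CleanupQueue;
CREATE SERVICE CleanupService ON QUEUE dbo.CleanupQueue ([DEFAULT]);
GO
CREATE PROCEDURE dbo.CleanupActivated
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @h UNIQUEIDENTIFIER, @type SYSNAME;

    RECEIVE TOP (1)
        @h    = conversation_handle,
        @type = message_type_name
    FROM dbo.CleanupQueue;

    IF @type = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
    BEGIN
        -- Hypothetical cleanup statement.
        DELETE FROM dbo.EventLog
        WHERE CreatedAt < DATEADD(DAY, -7, SYSUTCDATETIME());

        -- Re-arm the timer: fire again in 600 seconds (10 minutes).
        BEGIN CONVERSATION TIMER (@h) TIMEOUT = 600;
    END
    ELSE IF @type IN (N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog',
                      N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
    BEGIN
        END CONVERSATION @h;
    END
END;
GO
ALTER QUEUE dbo.CleanupQueue
    WITH ACTIVATION (
        STATUS            = ON,
        PROCEDURE_NAME    = dbo.CleanupActivated,
        MAX_QUEUE_READERS = 1,
        EXECUTE AS OWNER);
GO
-- One-time kick-off: open a dialog and start the first timer.
DECLARE @h UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE CleanupService
    TO SERVICE N'CleanupService'
    WITH ENCRYPTION = OFF;
BEGIN CONVERSATION TIMER (@h) TIMEOUT = 600;
```

This is only an outline: a production version would wrap the RECEIVE and DELETE in a transaction with error handling.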

Related

How many SQL jobs a sql server can handle?

I am creating a medical database system and have reached the point of building a notification feature that will use SQL jobs. The job's responsibility is to check some tables; for any entities that need to be notified of a change in certain data, it puts their IDs in an entity called Notification, and a trigger then prompts the app to check that table and send the notification.
What I want to ask is: how many SQL jobs can a SQL Server handle?
Does the number of SQL jobs running in the background affect the performance of my application or of the database in one way or another?
NOTE: the SQL job will run every 10 seconds.
I couldn't find any useful information online.
Thanks in advance.
This question really doesn't have enough background to get a definitive answer. What are the considerations?
Do the queries in your ten-second job actually complete in ten seconds, even when your DBMS is under its peak transactional workload? Obviously, if the job routinely doesn't complete in ten seconds, you'll get jobs piling up.
Do the queries in your job lock up tables and/or indexes so the transactional load can't run efficiently? (You should use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; as much as you can so database reads won't lock things unnecessarily.)
Do the queries in your job do a lot of rows' worth of inserts and updates, and so swamp the SQL Server transaction logs?
How big is your server? (CPU cores? RAM? IO capacity?) How big is your database?
If your project succeeds and you get many users, will your answers to the above questions remain the same? (Hint: no.)
You should spend some time on the execution plans for the queries in your job, and try to make them as efficient as possible. Add the necessary indexes. If necessary refactor the queries to make them more efficient. SSMS will show you the execution plans and suggest appropriate indexes.
If your job is doing things like deleting expired rows, you may want to build the expiration in your data model. For example, suppose your job does
DELETE FROM readings WHERE expiration_date <= GETDATE()
and your application does this, relying on your job to avoid getting expired readings.
SELECT something FROM readings
You can refactor your application query to say
SELECT something FROM readings WHERE expiration_date > GETDATE()
and then run your job overnight, at a quiet time, rather than every ten seconds.
A ten-second job is not the greatest idea in the world. If you can rework your application so it will function correctly with a ten-second, ten-minute, or twelve-hour job, you'll have a more resilient production system. At any rate if something goes wrong with the job when your system is very busy you'll have more than ten seconds to fix it.
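One way to build the expiration into the data model, sketched here under assumptions, is a view that hides expired rows so no application query depends on how recently the cleanup ran. The view name current_readings and the reading_id column are hypothetical; readings, something and expiration_date come from the example above.

```sql
-- Hypothetical view: the application reads from this instead of the table.
CREATE VIEW dbo.current_readings
AS
SELECT reading_id, something, expiration_date
FROM dbo.readings
WHERE expiration_date > GETDATE();  -- only rows that have not expired yet
GO

-- Application query, correct whether the purge ran ten seconds or twelve hours ago.
SELECT something FROM dbo.current_readings;
```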

How to set Azure SQL to rebuild indexes automatically?

With on-premises SQL Server databases, it is normal to have a maintenance plan that rebuilds the indexes once in a while, at a time when the database is not being used much.
How can I set it up in Azure SQL DB?
P.S: I tried it before, but since I couldn't find any options for that, I thought maybe they are doing it automatically until I've read this post and tried:
SELECT
DB_NAME() AS DBName
,OBJECT_NAME(ps.object_id) AS TableName
,i.name AS IndexName
,ips.index_type_desc
,ips.avg_fragmentation_in_percent
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
ON ps.object_id = i.object_id
AND ps.index_id = i.index_id
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), ps.object_id, ps.index_id, null, 'LIMITED') ips
ORDER BY ps.object_id, ps.index_id
And found out that I have indexes that need maintaining
Update: Note that the engineering team has published updated guidance that codifies some of the suggestions in this answer in a more "official" Microsoft location, as some customers asked for that: SQL Server/DB Index Guidance. Thanks, Conor
original answer:
I'll point out that most people don't need to consider rebuilding indexes in SQL Azure at all. Yes, B+ Tree indexes can become fragmented, and yes this can cause some space overhead and some CPU overhead compared to having perfectly tuned indexes. So, there are some scenarios where we do work with customers to rebuild indexes. (The primary scenario is when the customer may run out of space, currently, as disk space is somewhat limited in SQL Azure due to the current architecture). So, I will encourage you to step back and consider that using the SQL Server model for managing databases is not "wrong" but it may or may not be worth your effort.
(If you do end up needing to rebuild an index, you are welcome to use the models posted here by the other posters - they are generally fine models to script tasks. Note that SQL Azure Managed Instance also supports SQL Agent which you can also use to create jobs to script maintenance operations if you so choose).
Here are some details that may help you decide if you may be a candidate for index rebuilds:
The link you referenced is from a post in 2013. The architecture for SQL Azure was completely redone after that post. Specifically, the hardware architecture moved from a model that was based on local spinning disks to one based on local SSDs (in most cases). So, the guidance in the original post is out of date.
You can have cases in the current architecture where you can run out of space with a fragmented index. You have options to rebuild the index or to move to a larger reservation size for awhile (which will cost more money) that supports a larger disk space allocation. [Since the local SSD space on the machines is limited, reservation sizes are roughly linked to proportions of the machine. As we get newer hardware with larger/more drives, you have more scale-up options].
SSD fragmentation impact is relatively low compared to rotating disks since the cost of a random IO is not really any higher than a sequential one. The CPU overhead of walking a few more B+ Tree intermediate pages is modest. I've usually seen an overhead of perhaps 5-20% max in the average case (which may or may not justify regular rebuilds which have a much bigger workload impact when rebuilding)
If you are using query store (which is on by default in SQL Azure), you can evaluate whether a specific index rebuild helps your performance visibly or not. You can do this as a test to see if your workload improves before bothering to take the time to build and manage index rebuild operations yourself.
Please note that there is currently no intra-database resource governance within SQL Azure for user workloads. So, if you start an index rebuild, you may end up consuming lots of resources and impacting your main workload. You can try to time things to be done off-hours, of course, but for applications with lots of customers around the world this may not be possible.
Additionally, I will note that many customers have index rebuild jobs "because they want stats to be updated". It is not necessary to rebuild an index just to rebuild the stats. In recent SQL Server and SQL Azure, the algorithm for stats update was made more aggressive on larger tables, and the model for how we estimate cardinality in cases where customers are querying recently inserted data (since the last stats update) has been changed in later compatibility levels. So, it is often the case that the customer doesn't even need to do any manual stats update at all.
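If fresh statistics are the only goal, they can be refreshed directly without touching the index structure. A short sketch, with dbo.MyTable and IX_MyTable as placeholder names:

```sql
-- Hypothetical table and index names.
UPDATE STATISTICS dbo.MyTable;                -- all statistics on the table, default sampling
UPDATE STATISTICS dbo.MyTable IX_MyTable;     -- statistics for one index only
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;  -- scan every row instead of sampling
```

A full scan gives the most accurate statistics at the cost of reading the whole table, so it is worth timing on a large table before scheduling it.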
Finally, I will note that the impact of stats being out of date was historically that you'd get plan choice regressions. For repeated queries, a lot of the impact of this was mitigated by the introduction of the automatic tuning feature over query store (which forces prior plans if it notices a large regression in query performance compared to the prior plan).
The official recommendation that I give customers is to not bother with index rebuilds unless they have a tier-1 app where they've demonstrated real need (benefits outweigh the costs) or where they are a SaaS ISV where they are trying to tune a workload over many databases/customers in elastic pools or in a multi-tenant database design so they can reduce their COGS or avoid running out of disk space (as mentioned earlier) on a very big database. In the largest customers we have on the platform, we sometimes see value in doing index operations manually with the customer, but we often do not need to have a regular job where we do this kind of operation "just in case". The intent from the SQL team is that you don't need to bother with this at all and you can just focus on your app instead. There are always things that we can add or improve into our automatic mechanisms, of course, so I completely allow for the possibility that an individual customer database may have a need for such actions. I've not seen any myself beyond the cases I mentioned, and even those are rarely an issue.
I hope this gives you some context to understand why this isn't being done in the platform yet - it just hasn't been an issue for the vast majority of customer databases we have today in our service compared to other pressing needs. We revisit the list of things we need to build each planning cycle, of course, and we do look at opportunities like this regularly.
Good luck - whatever your outcome here, I hope this helps you make the right choice.
Sincerely,
Conor Cunningham
Architect, SQL
You can use Azure Automation to schedule index maintenance tasks, as explained here: Rebuilding SQL Database indexes using Azure Automation
Below are the steps:
1) Provision an Automation Account if you don’t have any, by going to https://portal.azure.com and select New > Management > Automation Account
2) After creating the Automation Account, open the details and now click on Runbooks > Browse Gallery
3) Type the word “indexes” in the search box, and the runbook “Indexes tables in an Azure database if they have a high fragmentation” appears:
4) Note that the author of the runbook is the SC Automation Product Team at Microsoft. Click on Import:
5) After importing the runbook, now let’s add the database credentials to the assets. Click on Assets > Credentials and then on “Add a credential…” button.
6) Set a Credential name (that will be used later on the runbook), the database user name and password:
7) Now click again on Runbooks and then select the “Update-SQLIndexRunbook” from the list, and click on the “Edit…” button. You will be able to see the PowerShell script that will be executed:
8) If you want to test the script, just click on the “Test Pane” button, and the test window opens. Introduce the required parameters and click on Start to execute the index rebuild. If any error occurs, the error is logged on the results window. Note that depending on the database and the other parameters, this can take a long time to complete:
9) Now go back to the editor, and click on the “Publish” button to enable the runbook. If we click on “Start”, a window appears asking for the parameters. But as we want to schedule this task, we will click on the “Schedule” button instead:
10) Click on the Schedule link to create a new Schedule for the runbook. I have specified once a week, but that will depend on your workload and how your indexes increase their fragmentation over time. You will need to tweak the schedule based on your needs and by executing the initial queries between executions:
11) Now introduce the parameters and run settings:
NOTE: you can play with having different schedules with different settings, i.e. having a specific schedule for a specific table.
With that, you have finished. Remember to change the Logging settings as desired:
Azure Automation is good, and its pricing is negligible.
Some other options you have are:
1. Create an Execute SQL task and schedule it through SQL Agent. The Execute SQL task should contain the index rebuild code along with the stats rebuild.
2. You can also create a linked server to SQL Azure and create a SQL Agent job. To create a linked server to Azure, see this SO link: I need to add a linked server to a MS Azure SQL Server
As @TheGamiswar suggested, add a linked server, then create a stored procedure like this:
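-- Create this procedure while connected to the Azure SQL database itself:
-- CREATE PROCEDURE does not accept a server or database prefix on the name.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_RebuildReorganizIndexes]
AS
BEGIN
    -- REBUILD recreates the index; ONLINE = ON keeps the table available meanwhile.
    ALTER INDEX PK_MyTable ON MyTable REBUILD WITH (STATISTICS_NORECOMPUTE = ON, ONLINE = ON);
    ALTER INDEX IX_MyTable ON MyTable REBUILD WITH (STATISTICS_NORECOMPUTE = ON, ONLINE = ON); -- Nonclustered index
    -- REORGANIZE is the lighter alternative; an index normally needs only one of the two.
    ALTER INDEX PK_MyTable ON MyTable REORGANIZE;
    ALTER INDEX IX_MyTable ON MyTable REORGANIZE;
END
GO
-- The Agent job step can then call it through the linked server
-- (the linked server needs RPC Out enabled):
-- EXEC [LinkedServerName].[RemoteDB].[dbo].[sp_RebuildReorganizIndexes];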
Then on your linked server use "SQL Server Agent" to create a new job and a schedule:
For details please see https://learn.microsoft.com/en-us/sql/ssms/agent/create-a-job?view=sql-server-2017
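The procedure above hard-codes index names. A common variation, sketched here, generates one statement per index from the measured fragmentation instead. The 10% and 30% thresholds and the 1000-page minimum are assumptions to tune, not official limits, and the query assumes non-partitioned indexes and an engine that supports STRING_AGG (SQL Server 2017 or later, or Azure SQL Database).

```sql
DECLARE @sql NVARCHAR(MAX);

-- Build one ALTER INDEX statement per fragmented index.
SELECT @sql = STRING_AGG(
    CAST(N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
         + CASE WHEN ips.avg_fragmentation_in_percent >= 30
                THEN N' REBUILD WITH (ONLINE = ON);'  -- heavy fragmentation
                ELSE N' REORGANIZE;'                  -- moderate fragmentation
           END AS NVARCHAR(MAX)),
    NCHAR(10))
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
JOIN sys.objects AS o ON o.object_id = i.object_id
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE ips.index_id > 0                               -- skip heaps
  AND ips.alloc_unit_type_desc = N'IN_ROW_DATA'
  AND ips.page_count >= 1000                         -- ignore tiny indexes
  AND ips.avg_fragmentation_in_percent >= 10
  AND o.is_ms_shipped = 0;

-- Review the generated statements first, then run them.
SELECT @sql AS generated_statements;
-- EXEC sys.sp_executesql @sql;
```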

Replicating a SQL Server database for read access

I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too much linked to the database and to be able to use newer DAL (Entity Framework 6 Code First) I decided to start from a new empty database, and I only added the tables and columns I need (different names than the production one).
Now I need some way to update the new database with the production database regularly (would be best if it is -almost- immediate).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to only using SQL Server for the job (I can develop and run some custom application if needed).
I already made some searches and had those (part-of) solutions :
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table and column names will be problematic. So I can use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate this database to my new one (though it may keep my production database from being stressed too much?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
Do you have some advice, or maybe some other solutions that I didn't foresee?
Thanks
I think that transactional replication is better than using triggers.
Triggers would use too many resources on the source server/database, because they fire for every DML transaction.
Transactional replication could be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source DB is...
There is one more thing that you could try: DB mirroring. It depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
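As an illustration of the view idea, suppose the replicated table keeps its production name and the reporting application wants friendlier ones. All table and column names below are invented for the example.

```sql
-- Hypothetical: dbo.TBL_CUST is the replicated table with its production names.
CREATE VIEW dbo.Customer
AS
SELECT
    CUST_ID   AS CustomerId,
    CUST_NM   AS CustomerName,
    LST_EDIT  AS LastEditDate
FROM dbo.TBL_CUST;
GO

-- The reporting application (e.g. an Entity Framework model) maps to the view.
SELECT CustomerId, CustomerName FROM dbo.Customer;
```

The view definition then doubles as the documentation of where each reporting column comes from.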
I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1000 tops), and it didn't put too much pressure on the current database. Thanks for your advice.

Azure SQL Database trigger to insert audit info into Azure Table

I am working on a database auditing solution and was thinking of having SQL Server triggers take care of changes and inserting them into an auditing table. Since this is a SQL Azure Database and will be fairly large I am concerned about the cost of a growing database due to auditing.
In order to cut down on the costs needed for auditing purposes, I am considering storing the audit table (or tables) in Azure Tables instead of Azure SQL databases. So the question becomes, how to get the SQL Server trigger to get the changed data into Azure Tables?
The only thing I can come up with is to have an audit table (or tables) in SQL Databases so the trigger can insert the rows locally, and then have a Worker Role every X seconds pull any rows from that and move them to Azure Tables and delete from the SQL Database table so it doesn't grow large.
Is there a better way to do this integration? Can I somehow put a message in a queue from a trigger?
Azure SQL Database (formerly SQL Azure) doesn't support CLR (hence no EXTERNAL NAME trigger parameter) so there's no way for your triggers to do anything outside of T-SQL. If you want audit content to go to a table, you could take the approach you came up with (temporarily write to SQL table, then move content periodically to Table). There are other approaches you could take (and this would be opinion/subjective, frowned upon here), but going with the queue concept for a minute, since you asked about queues, and illustrating what you could do with Azure Queues:
You could use an Azure queue to specify an item to insert/update in your SQL database. The queue processing code could then be responsible for performing the update and writing to the Azure table. Since the queue messages must be explicitly deleted after processing, you could simply repeat the queue message processing if something failed during execution (e.g. you write to SQL but fail before writing to table storage). The message eventually becomes visible for reading again, if you don't delete it before its timeout value. As long as your operations are idempotent, you'd be ok with this pattern.
A cheaper solution than using worker roles would be to use a combination of Azure Scheduled Tasks (you can enable them for free to run every 15 min within Mobile Apps) and Azure Web Sites. Basically the way it would work is to run this scheduled job every 15 min which would make an HTTP call to some code you have running within your Azure Web Site. This code would do the same work you had outlined for your worker role.
Alternatively, use SQL Server System-Versioned temporal tables to automatically handle the writing of audited record (i.e., changes) to corresponding history tables.
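A minimal sketch of a system-versioned temporal table, using a made-up dbo.Orders table. Note that the history table still lives in the SQL database, so on its own this does not move audit data to Azure Tables.

```sql
-- Hypothetical table; the engine maintains dbo.OrdersHistory automatically.
CREATE TABLE dbo.Orders (
    OrderId   INT          NOT NULL PRIMARY KEY CLUSTERED,
    Status    NVARCHAR(20) NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));
GO

-- Every UPDATE or DELETE writes the previous row version to the history table.
INSERT INTO dbo.Orders (OrderId, Status) VALUES (1, N'New');
UPDATE dbo.Orders SET Status = N'Shipped' WHERE OrderId = 1;

-- Current and historical versions of one row.
SELECT OrderId, Status, ValidFrom, ValidTo
FROM dbo.Orders FOR SYSTEM_TIME ALL
WHERE OrderId = 1
ORDER BY ValidFrom;
```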

Using Offline Indexing in SQL Server

I've written a .Net application which has an SQL Server 2008 R2 database with relatively small number of tables, but in some tables there might be some 100,000,000 records! For improving performance of SELECTs, I've created necessary indexes and it works well. But, as everyone knows, indexes need to be rebuilt when they are fragmented.
We have installed an SQL Server 2008 R2 Express on one of customer PCs plus my Winforms application. Three more PCs connect to this database over regular LAN, and everything seems fine.
Now, the problem is that, I want to rebuild indexes, for example every time a user starts using my program on ANY of the machines. Well, I can execute several ALTER INDEXes, but as stated in MS docs, OFFLINE indexing will lock the tables for period of indexing. Which means other users will lose access to tables when a user starts the program! I know there is an ONLINE option, but it doesn't work in Express edition of SQL Server.
In other environments with a real server running all the time, I would create an Agent Job which rebuilt indexes over night.
How can I solve this problem?
Without a normal 24/7 server running, it's difficult to do such maintenance automatically without disturbing users. I don't think putting that job at application startup is a good idea: it can be started many times at once for no real reason, it slows down startup significantly if the tables are big, and it keeps everyone else out, as you say.
I would opt for 2 choices:
Set up a job on the "server" to do the rebuild on either SQL Server startup or computer startup. It will slow down the initialization of that PC when the user first powers it on, but once done, it should work OK, and most likely with similar results to the nightly job.
Add an option in the application to launch the reindexing job manually when the user wants to do it, warning that it will take some time and that nobody else can use the application during the process. While this provides maximum flexibility, it relies on the user doing so when they start noticing delays.
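Since Express has no SQL Agent, one possible way to implement the first option is a startup stored procedure, which SQL Server runs each time the instance starts. This is a sketch under assumptions: AppDb and dbo.BigTable are placeholder names, and the procedure has to live in master and take no parameters.

```sql
USE master;
GO
-- Hypothetical procedure; runs automatically when the instance starts.
CREATE PROCEDURE dbo.RebuildIndexesOnStartup
AS
BEGIN
    SET NOCOUNT ON;
    -- Offline rebuild: acceptable here because no clients are connected yet.
    ALTER INDEX ALL ON AppDb.dbo.BigTable REBUILD;
END;
GO
-- Mark the procedure for automatic execution at instance startup.
EXEC sys.sp_procoption
    @ProcName    = N'RebuildIndexesOnStartup',
    @OptionName  = 'startup',
    @OptionValue = 'on';
```

The same procedure could also be called from the manual "reindex now" option in the application, so both choices share one piece of maintenance code.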

Resources