Hangfire Job is not removed after ExpireAt - sql-server

I am using the default SQL Server storage for Hangfire. Since the rate of job creation per day can be quite high (hundreds of thousands), the database also grows quite fast.
Initially I thought this would still be fine, because Succeeded jobs are supposed to be removed after one day. To my surprise, even after several days, the jobs are still not removed from the Hangfire.Job table.
When I ran this query on 30 July 2015:
select min(ExpireAt) from Hangfire.Job
where StateName = 'Succeeded'
I got 2015-07-21 13:28:27.543, which is about 9 days ago.
I can delete the jobs manually, or create a batch job that periodically cleans them up, but is there any scenario in which Succeeded jobs are retained beyond their ExpireAt?
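For anyone comparing against their own data, a quick way to see how many rows are already past their expiration but still present is a query like the one below (just a sketch; it uses only the Hangfire.Job columns shown above and assumes ExpireAt is stored in UTC, which is Hangfire's usual convention).
select StateName, count(*) as ExpiredButStillPresent
from Hangfire.Job
where ExpireAt < getutcdate()
group by StateName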

Related

Is it possible to execute DMV queries between certain time periods?

We have a DMV query that executes every 10 minutes and inserts usage statistics, like SESSION_CURRENT_DATABASE, SESSION_LAST_COMMAND_START_TIME, etc., and it has supposedly been running fine for the last 2 years.
Today we were notified by the data ingestion team that the last records shown were from 6/10, so we found out the job had been stuck for 14 days and had not collected any new statistics since. We immediately restarted the job and it has been executing successfully since this morning, but we have essentially lost the data for this 14-day period. Is there a way for us to execute this DMV query against $SYSTEM.DISCOVER_SESSIONS for the 6/10-6/24 window to recover these past 14 days of data?
Or is all hope lost?
DMV query:
SELECT [SESSION_ID]
,[SESSION_SPID]
,[SESSION_CONNECTION_ID]
,[SESSION_USER_NAME]
,[SESSION_CURRENT_DATABASE]
,[SESSION_USED_MEMORY]
,[SESSION_PROPERTIES]
,[SESSION_START_TIME]
,[SESSION_ELAPSED_TIME_MS]
,[SESSION_LAST_COMMAND_START_TIME]
,[SESSION_LAST_COMMAND_END_TIME]
,[SESSION_LAST_COMMAND_ELAPSED_TIME_MS]
,[SESSION_IDLE_TIME_MS]
,[SESSION_CPU_TIME_MS]
,[SESSION_LAST_COMMAND_CPU_TIME_MS]
,[SESSION_READS]
,[SESSION_WRITES]
,[SESSION_READ_KB]
,[SESSION_WRITE_KB]
,[SESSION_COMMAND_COUNT]
FROM $SYSTEM.DISCOVER_SESSIONS
I wouldn't say it's "gone" unless the instance has been restarted or the database has been detached. For example, the DMV for procedure usage should still have data in it, but you won't be able to specifically recreate what it looked like 10 days ago.
You can get a rough idea by looking back through the 2 years of data you already have and seeing whether usage is spiky or consistent. Then grab a snapshot of the DMV today and extrapolate it back 14 days to get a rough idea of what the usage was like.
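To make that extrapolation concrete, a sketch like the following averages the existing history by weekday and hour and uses that profile to estimate the missing two weeks. The table and column names (dbo.SessionUsageHistory, CollectionTime) are hypothetical stand-ins for wherever the 10-minute job writes its rows.
-- build a typical-usage profile from the history you already have
select datepart(weekday, CollectionTime) as UsageWeekday,
       datepart(hour, CollectionTime) as UsageHour,
       avg(cast(SESSION_CPU_TIME_MS as bigint)) as AvgCpuMs,
       avg(cast(SESSION_READS as bigint)) as AvgReads
from dbo.SessionUsageHistory
where CollectionTime >= dateadd(year, -1, getdate())
group by datepart(weekday, CollectionTime), datepart(hour, CollectionTime)
order by UsageWeekday, UsageHour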

Most efficient method to expire records after 10 minutes from creation in sql server

I want to implement a system in SQL Server that will automatically delete each record 10 minutes after it is created.
Since this expiry time is different for every row, using a scheduled job seems like it would be very inefficient for this task.
What is the most efficient way to achieve this?
There is also a timing problem with a job-based approach: if the job runs once every ten minutes and a row is inserted at 9:01, that row should be removed at 9:11, but it would not actually be deleted until the job runs again at 9:20.
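One common pattern for this kind of requirement (only a sketch, not the only option) is to store the creation time, index it, and run a lightweight cleanup job every minute, so the deletion lag shrinks from up to ten minutes to roughly one. The table, column, and index names below are illustrative.
-- illustrative table with an indexed creation timestamp
create table dbo.ShortLivedRecords (
    Id bigint identity primary key,
    Payload nvarchar(max) null,
    CreatedAt datetime2 not null default sysutcdatetime()
)
create index IX_ShortLivedRecords_CreatedAt on dbo.ShortLivedRecords (CreatedAt)

-- scheduled every minute: delete expired rows in small batches to limit locking and log growth
delete top (5000) from dbo.ShortLivedRecords
where CreatedAt < dateadd(minute, -10, sysutcdatetime())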

Performance problems temporarily fixed by sp_updatestats, despite daily sp_updatestats execution

I see a similar question here from 2013, but it was not answered so I am posting my version.
We are using SQL Server 2008 (SP4) - 10.0.6000.29 (X64) and have a database that is about 70GB in size with about 350 tables. On a daily basis there are only a small number of updates, though a couple of times a year we load a fair amount of data into it. There are several Windows services that constantly query the database but rarely update it. There are also several websites and desktop applications that use it (again, with minimal daily updates).
The problem we have is that every once in a while a query that hits certain records will take much longer than normal. The following is a bogus example:
This query against 2 tables with less than 600 total records might take 30+ seconds:
select *
from our_program_access bpa
join our_user u on u.user_id = bpa.user_id
where u.user_id = 50 and program_name = 'SomeApp'
But when you change the user_id value to another user record, it takes less than one second:
select *
from our_program_access bpa
join our_user u on u.user_id = bpa.user_id
where u.user_id = 51 and program_name = 'SomeApp'
The real queries that are being used are a little more complex, but the idea is the same: search ID 50 takes 30+ seconds, search ID 51 takes < 1 second, but both return only 1 record out of about 600 total.
We have found that the issue seems related to statistics. When this problem occurs, we run sp_updatestats and all the queries become equally fast again. So we started running sp_updatestats in a maintenance plan every night, but the problem still pops up. We also tried turning AUTO_UPDATE_STATISTICS_ASYNC on, but the problem eventually came back.
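For reference, the statements mentioned in this question look like the following. The FULLSCAN variants at the end are not from the original post; they are a commonly used, more targeted alternative, and the database name and dbo schema are placeholders.
exec sp_updatestats  -- database-wide, sampled statistics update (the nightly maintenance step)
alter database [YourDatabase] set AUTO_UPDATE_STATISTICS_ASYNC on  -- the asynchronous auto-update option
update statistics dbo.our_program_access with fullscan  -- targeted full-scan update for the tables in the example
update statistics dbo.our_user with fullscan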
While the database is large, it doesn't really undergo tremendous changes, though it does face constant queries from different services.
There are several other databases on the same server such as a mail log, SharePoint, and web filtering. Overall, performance is very good until we run into this problem.
Does it make sense that a database that undergoes relatively small daily changes would need sp_updatestats run so frequently? What else can we do to resolve this issue?

Primavera P6 database has grown to a very large size

I'm not a P6 admin, nor am I a (SQL Server) DBA. I'm just a Winforms developer (with T-SQL) who has agreed to do a little research for the scheduling group.
I believe the version they're running is 8.2, desktop (non-Citrix). Backend is SQL Server. The backend has grown to 36gb and nightly backups are periodically filling drives to their limits.
REFRDEL holds 135 million records, dating back to some time in 2012.
UDFVALUE holds 26 million records
All other tables have reasonable numbers of records.
Can someone clue us in as to which of the several cleanup-oriented stored procedures to run (if any), or offer some sane advice so that we can get the backend down to a manageable size, please? Something that would not violate best practices and is considered very safe, please.
When you look at the data in the database there is a column named delete_session_id. Do you see any rows with a value of -99? If so, there is an unresolved issue on this. If not, proceed with the following to get the cleanup jobs running again...
If you are using SQL Server (Full Editions), perform the following steps to resolve the issue:
Verify that the SQL Server Agent service is started on the server and has a startup type of automatic.
Logs for this service can be found (by default) at:
C:\Program Files\Microsoft SQL Server\\LOG\SQLAGENT.x
This log includes information on when the service was stopped/started
If the SQL Agent is started, you can then check what jobs exist on the SQL Server database by issuing the following command as SA through SQL Query Analyzer (2000) or through Microsoft SQL Server Management Studio:
select * from msdb.dbo.sysjobs
If the Primavera background processes (SYMON and DAMON) are not listed, or the SQL Agent was not started, then these background processes can be reinitialized by running the following commands as SA user against the Project Management database:
exec initialize_background_procs
exec system_monitor
exec data_monitor
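Regarding the delete_session_id = -99 check at the top of this answer, a quick way to look is a simple count; here is a sketch against the task table (the same table referenced later in this thread):
select count(*) as rows_with_minus_99
from task
where delete_session_id = -99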
A bit late coming to this, but thought the following may be useful to some.
We noticed REFRDEL had grown to a large size and after some investigation discovered the following ...
DAMON runs the following procedures to perform clean-up:
BGPLOG_CLEANUP
REFRDEL_CLEANUP
REFRDEL Bypass
CLEANUP_PRMQUEUE
USESSION_CLEAR_LOGICAL_DELETES
CLEANUP_LOGICAL_DELETES
PRMAUDIT_CLEANUP
CLEANUP_USESSAUD
USER_DEFINED_BACKGROUND
DAMON was configured to run every Saturday around 4pm but we noticed that it had been continuously failing. This was due to an offline backup process which started at 10pm. We first assumed that this was preventing the REFRDEL_CLEANUP from running.
However after monitoring REFRDEL for a couple of weeks, we found that REFRDEL_CLEANUP was actually running and removing data from the table. You can check your table by running the following query on week 1 and then again in week 2 to verify the oldest records are being deleted.
select min(delete_date), max(delete_date), count(*) from admuser.refrdel;
The problem is to do with the default parameters used by the REFRDEL_CLEANUP procedure. These are described here, but in summary the procedure is set to retain the 5 most recent days' worth of records and to delete just 1 day's worth per run. That is what causes the issue: DAMON runs only once a week, so each run accumulates roughly 7 days of new deletions but purges only 1 day's worth, and the backlog grows by about 6 days every week.
The default parameters can be overridden in the SETTINGS table.
Here are the steps I took to correct the issue:
First, clean up the table..
-- 1. create backup table
CREATE TABLE ADMUSER.REFRDEL_BACKUP TABLESPACE PMDB_DAT1 NOLOGGING AS
Select * from admuser.refrdel where delete_date >= (sysdate - 5);
-- CHECK DATA HAS BEEN COPIED
-- 2. disable indexes on REFRDEL
alter index NDX_REFRDEL_DELETE_DATE unusable;
alter index NDX_REFRDEL_TABLE_PK unusable;
-- 3. truncate REFRDEL table
truncate table admuser.refrdel;
-- 4. restore backed up data
ALTER TABLE ADMUSER.REFRDEL NOLOGGING;
insert /*+ append */ into admuser.refrdel select * from admuser.refrdel_backup;
--verify number of rows copied
ALTER TABLE ADMUSER.REFRDEL LOGGING;
commit;
-- 5. rebuild indexes on REFRDEL
alter index NDX_REFRDEL_DELETE_DATE rebuild;
alter index NDX_REFRDEL_TABLE_PK rebuild;
-- 6. gather table stats
exec dbms_stats.gather_table_stats(ownname => 'ADMUSER', tabname => 'REFRDEL', cascade => TRUE);
-- 7. drop backup table
drop table admuser.refrdel_backup purge;
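As a quick sanity check for the "verify" comments in the script above, comparing the two counts before dropping the backup table (i.e. before step 7) is enough; they should match:
select count(*) from admuser.refrdel;
select count(*) from admuser.refrdel_backup;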
Next, override the parameters so we try to delete at least 10 days' worth of data. The retention period will always keep 5 days' worth of data.
exec settings_write_string('10','database.cleanup.Refrdel','DaysToDelete'); -- delete the oldest 10 days of data
exec settings_write_string('15','database.cleanup.Refrdel','IntervalStep'); -- commit after deleting every 15 minutes of data
exec settings_write_string('5d','database.cleanup.Refrdel','KeepInterval'); -- only keep the 5 most recent days of data
This final step is only relevant to my environment and will not apply to you unless you have similar issues. It alters the start time for DAMON so that it can complete before our offline backup process kicks in. In this instance I have changed the start time from 4pm to midnight.
BEGIN
DBMS_SCHEDULER.SET_ATTRIBUTE (
name => 'BGJOBUSER.DAMON',
attribute => 'start_date',
value => TO_TIMESTAMP_TZ('2016/08/13 00:00:00.000000 +00:00','yyyy/mm/dd hh24:mi:ss.ff tzr'));
END;
/
It is normal for UDFVALUE to hold a large number of records. Each value for any user-defined field attached to any object in P6 will be represented as a record in this table.
REFRDEL on the other hand should be automatically cleaned up during normal operation in a healthy system. In P6 8.x, they should be cleaned up by the data_monitor process, which by default is configured to run once a week (on Saturdays).
You should be able to execute it manually, but be forewarned: it could take a long time to complete if it hasn't executed properly since 2012.
36gb is still a very large database, though for some clients a database of that magnitude might not be unreasonable, depending on the total number of activities and, especially, the kinds of data that are stored. For example, notepads take up a comparatively large amount of space.
In your case though, since you already know data_monitor hasn't executed properly for a while, it's more likely that the tables are full of records that have been soft-deleted but haven't yet been purged. You can see such records by running a query such as:
select count(*) from task where delete_session_id is not null;
Note that you must select from the task table, not the view, as the view automatically filters these soft-deleted records out.
You shouldn't delete such records manually. They should be cleaned up, along with the records in REFRDEL, as a result of running data_monitor.

Change Data Capture (CDC) cleanup job only removes a few records at a time

I'm a beginner with SQL Server. For a project I need CDC to be turned on. I copy the CDC data to another (archive) database, and after that the CDC tables can be cleaned immediately. So the retention time doesn't need to be high; I just set it to 1 minute, and when the cleanup job runs (after the retention time has already passed) it appears that it only deletes a few records (the oldest ones). Why doesn't it delete everything? Sometimes it doesn't delete anything at all. After running the job a few times, the other records get deleted. I find this strange because the retention time has long since passed.
I set the retention time to 1 minute (I actually wanted 0, but that was not possible) and didn't change the threshold (= 5000). I disabled the schedule, since I want the cleanup job to run immediately after the CDC records are copied to my archive database rather than at a particular time.
My reasoning was this: suppose there are updates in the afternoon. The task that copies CDC records to the archive database runs at 2:00 AM, and after that task the cleanup job gets called. Because of the minimal retention time, all the CDC records should then be removed by the cleanup job; the retention time has passed, after all.
I also tried setting up a schedule in the job again, the way CDC is meant to be used in general. After the scheduled time had passed I checked the CDC table, and it turns out it also only deletes the oldest records. So what am I doing wrong?
I made a workaround: a new job that deletes all records in the CDC tables (and I disabled the default CDC cleanup job entirely). This works better, as it removes everything, but it bothers me because I want to work with the original cleanup job, and I think it should be able to work the way I want it to.
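For context, the knobs described in this question map to the documented CDC procedures roughly as follows. This is only a sketch; 'dbo_MyTable' is a placeholder capture instance name.
-- retention is expressed in minutes; 1 is the minimum allowed value
exec sys.sp_cdc_change_job @job_type = 'cleanup', @retention = 1, @threshold = 5000

-- manual cleanup up to a chosen low-water mark, instead of relying on the cleanup job's schedule
declare @lwm binary(10) = sys.fn_cdc_get_max_lsn()
exec sys.sp_cdc_cleanup_change_table
     @capture_instance = 'dbo_MyTable',
     @low_water_mark = @lwm,
     @threshold = 5000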
Thanks,
Kim
Rather than worrying about what's in the table, I'd use the helper functions that are created for each capture instance, specifically cdc.fn_cdc_get_all_changes_ and cdc.fn_cdc_get_net_changes_. A typical workflow that I've used with these goes something like the following (do this for all of the capture instances). First, you'll need a table to keep processing status. I use something like:
create table dbo.ProcessingStatus (
    CaptureInstance sysname,   -- name of the CDC capture instance
    LSN numeric(25,0),         -- last LSN recorded for that instance
    IsProcessed bit            -- 0 = in flight, 1 = done
)
-- filtered unique index: at most one unprocessed row per capture instance
create unique index [UQ_ProcessingStatus]
on dbo.ProcessingStatus (CaptureInstance)
where IsProcessed = 0
Get the current max log sequence number (LSN) using fn_cdc_get_max_lsn.
Get the last processed LSN and increment it using fn_cdc_increment_lsn. If you don't have one (i.e. this is the first time you've processed), use fn_cdc_get_min_lsn for this instance instead (but don't increment it!). Record whichever LSN you're using in the table, with IsProcessed = 0.
Select from whichever of the cdc.fn_cdc_get… functions makes sense for your scenario and process the results however you're going to process them.
Update IsProcessed = 1 for this run.
As for monitoring your original issue, just make sure that the data in the capture table is generally within the retention period. That is, if you set it to 2 days, I wouldn't even think about it being a problem until it got to be over 4 days (assuming that your call to the cleanup job is scheduled at something like every hour). And when you process with the above scheme, you don't need to worry about "too much" data being there; you're always processing a specific interval rather than "everything".
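A minimal sketch of that workflow, assuming a capture instance named dbo_MyTable (hypothetical) and the dbo.ProcessingStatus table above; the sys.fn_cdc_* functions and cdc.fn_cdc_get_all_changes_<capture_instance> are the documented CDC helpers.
declare @from_lsn binary(10), @to_lsn binary(10), @last_lsn binary(10)

-- 1. current high-water mark for the database
set @to_lsn = sys.fn_cdc_get_max_lsn()

-- 2. resume point: @last_lsn would be read from dbo.ProcessingStatus for this instance
--    (the CDC functions work with binary(10) values)
if @last_lsn is not null
    set @from_lsn = sys.fn_cdc_increment_lsn(@last_lsn)
else
    set @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyTable')

-- 3. pull the changes for this interval and archive/process them
select *
from cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, 'all')

-- 4. record @to_lsn in dbo.ProcessingStatus with IsProcessed = 0, and flip it to 1
--    once the archive step has succeeded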
