I'm not a P6 admin, nor am I a (SQL Server) DBA. I'm just a Winforms developer (with T-SQL) who has agreed to do a little research for the scheduling group.
I believe the version they're running is 8.2, desktop (non-Citrix). Backend is SQL Server. The backend has grown to 36gb and nightly backups are periodically filling drives to their limits.
REFRDEL holds 135 million records, dating back to some time in 2012.
UDFVALUE holds 26 million records
All other tables have reasonable numbers of records.
Can someone clue us in as to which of the several cleanup-oriented stored procedures to run (if any), or offer some sane advice, so that we can get the backend down to a manageable size? Something that would not violate best practices and is considered very safe, please.
When you look at the data in the database, there is a column named "delete_session_id". Do you see any rows with the value -99? If so, then there is an unresolved issue on this. If not, then proceed with the following to get the cleanup jobs running again...
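For example, a quick check might look like the following (using the admuser schema from the queries further down; adjust the schema and table to your install):

```sql
-- Count rows stuck with the special -99 session id.
-- REFRDEL is shown here; any P6 table with a delete_session_id
-- column can be checked the same way.
select count(*)
from admuser.refrdel
where delete_session_id = -99;
```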
If you are using SQL Server (Full Editions), perform the following steps to resolve the issue:
Verify that the SQL Server Agent service is started on the server and has a startup type of automatic.
Logs for this service can be found (by default) at:
C:\Program Files\Microsoft SQL Server\\LOG\SQLAGENT.x
This log includes information on when the service was stopped/started
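On recent versions of SQL Server you can also check the Agent service state from T-SQL rather than the Services console; a sketch (requires VIEW SERVER STATE permission):

```sql
-- Shows the current status and startup type of each SQL Server
-- service, including the Agent.
select servicename, startup_type_desc, status_desc
from sys.dm_server_services
where servicename like 'SQL Server Agent%';
```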
If the SQL Agent is started, you can then check what jobs exist on the SQL Server database by issuing the following command as SA through SQL Query Analyzer (2000) or through Microsoft SQL Server Management Studio:
select * from msdb.dbo.sysjobs
If the Primavera background processes (SYMON and DAMON) are not listed, or the SQL Agent was not started, then these background processes can be reinitialized by running the following commands as SA user against the Project Management database:
exec initialize_background_procs
exec system_monitor
exec data_monitor
A bit late coming to this, but thought the following may be useful to some.
We noticed REFRDEL had grown to a large size and after some investigation discovered the following ...
DAMON runs the following procedures to perform clean-up:
BGPLOG_CLEANUP
REFRDEL_CLEANUP
REFRDEL Bypass
CLEANUP_PRMQUEUE
USESSION_CLEAR_LOGICAL_DELETES
CLEANUP_LOGICAL_DELETES
PRMAUDIT_CLEANUP
CLEANUP_USESSAUD
USER_DEFINED_BACKGROUND
DAMON was configured to run every Saturday around 4pm but we noticed that it had been continuously failing. This was due to an offline backup process which started at 10pm. We first assumed that this was preventing the REFRDEL_CLEANUP from running.
However after monitoring REFRDEL for a couple of weeks, we found that REFRDEL_CLEANUP was actually running and removing data from the table. You can check your table by running the following query on week 1 and then again in week 2 to verify the oldest records are being deleted.
select min(delete_date), max(delete_date), count(*) from admuser.refrdel;
The problem is to do with the default parameters used by the REFRDEL_CLEANUP procedure. These are described here, but in summary the procedure is set to retain the 5 most recent days' worth of records and delete just 1 day's worth. This is what causes the issue: DAMON runs just once a week, so each time the cleanup job runs it deletes only 1 day's data but has accumulated a week's worth. The backlog therefore just keeps growing.
The default parameters can be overridden in the SETTINGS table.
Here are the steps I took to correct the issue:
First, clean up the table..
-- 1. create backup table
CREATE TABLE ADMUSER.REFRDEL_BACKUP TABLESPACE PMDB_DAT1 NOLOGGING AS
Select * from admuser.refrdel where delete_date >= (sysdate - 5);
-- CHECK DATA HAS BEEN COPIED
-- 2. disable indexes on REFRDEL
alter index NDX_REFRDEL_DELETE_DATE unusable;
alter index NDX_REFRDEL_TABLE_PK unusable;
-- 3. truncate REFRDEL table
truncate table admuser.refrdel;
-- 4. restore backed up data
ALTER TABLE ADMUSER.REFRDEL NOLOGGING;
insert /*+ append */ into admuser.refrdel select * from admuser.refrdel_backup;
--verify number of rows copied
ALTER TABLE ADMUSER.REFRDEL LOGGING;
commit;
-- 5. rebuild indexes on REFRDEL
alter index NDX_REFRDEL_DELETE_DATE rebuild;
alter index NDX_REFRDEL_TABLE_PK rebuild;
-- 6. gather table stats
exec dbms_stats.gather_table_stats(ownname => 'ADMUSER', tabname => 'REFRDEL', cascade => TRUE);
-- 7. drop backup table
drop table admuser.refrdel_backup purge;
Next, override the parameters so we try to delete at least 10 days' worth of data. The retention period will always keep 5 days' worth of data.
exec settings_write_string('10','database.cleanup.Refrdel','DaysToDelete'); -- delete the oldest 10 days of data
exec settings_write_string('15','database.cleanup.Refrdel','IntervalStep'); -- commit after deleting every 15 minutes of data
exec settings_write_string('5d','database.cleanup.Refrdel','KeepInterval'); -- only keep the 5 most recent days of data
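To confirm the overrides took effect, you can read them back from the SETTINGS table; a sketch (the namespace column name here is an assumption and may differ between P6 versions):

```sql
select *
from settings
where namespace = 'database.cleanup.Refrdel';
```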
This final step is only relevant to my environment and will not apply to you unless you have similar issues. It alters the start time for DAMON so that it can complete before our offline backup process kicks in. In this instance I changed the start time from 4pm to midnight.
BEGIN
DBMS_SCHEDULER.SET_ATTRIBUTE (
name => 'BGJOBUSER.DAMON',
attribute => 'start_date',
value => TO_TIMESTAMP_TZ('2016/08/13 00:00:00.000000 +00:00','yyyy/mm/dd hh24:mi:ss.ff tzr'));
END;
/
It is normal for UDFVALUE to hold a large number of records. Each value for any user-defined field attached to any object in P6 will be represented as a record in this table.
REFRDEL on the other hand should be automatically cleaned up during normal operation in a healthy system. In P6 8.x, they should be cleaned up by the data_monitor process, which by default is configured to run once a week (on Saturdays).
You should be able to execute it manually, but be forewarned: it could take a long time to complete if it hasn't executed properly since 2012.
36gb is not, in itself, an unreasonably large database. For some clients a database of that magnitude might not be unreasonable, depending on the total number of activities and, especially, the kinds of data stored. Notepads, for example, take a comparatively large amount of space.
In your case though, since you already know data_monitor hasn't executed properly for a while, it's more likely that the tables are full of records that have been soft-deleted but haven't yet been purged. You can see such records by running a query such as:
select count(*) from task where delete_session_id is not null;
Note that you must select from the task table, not the view, as the view automatically filters these soft-deleted records out.
You shouldn't delete such records manually. They should be cleaned up, along with the records in REFRDEL, as a result of running data_monitor.
Related
I need to know what the impact on a production DB might be of creating triggers on ~30 production tables that capture every UPDATE, DELETE and INSERT statement and write the following information to a separate table: "PK", "Table Name", "Time of modification".
I have limited ability to test this, as I have read-only permissions to both the Prod and Test environments (and I can get one work day with 10 end users to test it).
I estimate that the number of records inserted by those triggers will be around 150-200k daily.
Background:
I have a project to deploy a data warehouse for a database that is heavily customized, plus there are jobs running every day that manipulate the data. The "updated on" date column is not being maintained (a customization), and there are hard deletes occurring on tables. We decided to ask the DEV team to add triggers like:
CREATE TRIGGER [dbo].[triggerName] ON [dbo].[ProductionTable]
FOR INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- new/changed rows (populated for INSERT and UPDATE)
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM inserted;
    -- removed/changed rows (populated for DELETE and UPDATE)
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM deleted;
END
on core ~30 production tables.
Based on this table we will pull delta from last 24 hours and push it to Data Warehouse staging tables.
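For reference, the daily delta pull based on this capture table would be something like the following (column names as in the trigger above):

```sql
-- Distinct keys touched in the last 24 hours, per table.
SELECT Table_Name, Regular_PK
FROM For_ETL_Warehouse
WHERE Insert_Date >= DATEADD(HOUR, -24, GETDATE())
GROUP BY Table_Name, Regular_PK;
```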
If anyone has had a similar issue and can help me estimate how this might impact performance on the production database, I would really appreciate it. (If it works, I am saved; if not, I need to propose another solution. Mirroring or replication might be hard to get right now, as the local DEVs have no idea how to set them up...)
Other ideas how to handle this situation or perform tests are welcome (My deadline is Friday 26-01).
First of all, I would suggest you encode the table name as a small numeric value rather than a character string (30 tables fit easily in a tinyint).
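For example, the table-name encoding could be a small lookup table that the ETL joins back to when it needs the real name (all names here are illustrative):

```sql
CREATE TABLE dbo.Tracked_Tables
(
    Table_Id   tinyint NOT NULL PRIMARY KEY,
    Table_Name sysname NOT NULL UNIQUE
);

-- One row per tracked production table.
INSERT INTO dbo.Tracked_Tables (Table_Id, Table_Name)
VALUES (1, 'ProductionTable');

-- The capture table then stores the 1-byte Table_Id
-- instead of a character table name.
```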
Second, you need to understand how big the payload you are going to write is, and how it will be written:
if you choose a suitable clustered index (the date column), then the server just needs to write the data row by row in sequence. That is an easy job even if you put in all 200k rows at once.
if you encode the table name as a tinyint, then it basically has to write:
1 byte (table name) + PK size (hopefully numeric, so <= 8 bytes) + 8 bytes datetime - so approx 17 bytes on the data page, plus indexes if any, plus the log file. This is very lightweight and will put no "real" pressure on SQL Server.
The trigger itself will add a small overhead, but with the amount of rows you are talking about, it is negligible.
I have seen systems that do similar stuff on a far larger scale with close to zero effect on the work process, so I would say it's a safe bet. The only problem with this approach is that it will not work in some cases (e.g. outputs to temp tables from DML statements). But if you do not have those kinds of blockers, then go for it.
I hope it helps.
I'm sure this is probably SQL 101 but Google searches keep finding 'creating logins' entries. Let me give a quick overview.
I have drifted into SQL reporting from general IT Support as a result of the need for more detailed reports than our systems can provide. My company runs leisure centres and we very helpfully use 3 different leisure management systems across 70+ sites. All 3 are SQL based but do the same job in very different ways. I have produced loads of reports in SSRS but the 2 or 3 I have done that access all systems are very, very time consuming and just one link down means the whole report is inaccessible.
A request to send data to a third party for marketing purposes has forced us to finally look at centralising data from all of the systems to make reporting much easier. There will essentially be only 2 tables: one for membership details and one for activities. I have done the hard part of creating a view that produces the same information from each of the 3 systems and set up a central database to bring the data back to. I will have a stored procedure running on each system that populates a table with records from the previous day. A job on the central server will then copy data from these tables and remove it once transferred. So far so (relatively) simple.
The problem is that the central server will be trying to retrieve data from over 60 servers - all with their own network links. Some sites are remote with poor DSL connections so there will be times when some of the data can't be copied by the scheduled job. I am happy that a SQL agent job can have these as steps and one failed connection won't stop the whole process but my concern is that troubleshooting when something goes wrong will be tricky if I don't get some kind of logging in place.
The stored procedures although complicated SQL wise are just update/insert record jobs. What would be helpful is that the update job writes to a log file somehow reporting that it affected 20 rows. And the insert job affected 100 rows. Basic stuff but I have no idea how I go about it. What would also be useful is some kind of warning when one of the steps fails. SQL Agent will help but I want to build as much resilience as possible in whilst I am at the 3 server stage before rolling out to the 60+ server stage.
Any pointers in the right direction would be much appreciated. My SQL skills are self taught (With a lot of Stack Overflow help!) and although I have learnt a lot about producing complicated views and queries in the last couple of weeks, most of my SQL has just been queries for SSRS so this is all new to me.
Many thanks.
The OUTPUT clause will get you what you want for logging. It allows you to capture what your statement is doing. Below is an example of how to perform an update and capture the changes in a logging table.
As for error handling and resilience I would take a look into using SSIS to perform your ETL. SSIS gives you a much more robust feature set for error handling.
-- Create Temp Tables
CREATE TABLE #myLog
(
id int,
oldVal int,
newVal int
);
CREATE TABLE #myTable
(
id int,
val int
);
-- Add Values to #myTable
INSERT INTO #myTable VALUES
(1, 1234),
(2, 1234);
-- Output Contents of #myTable
SELECT * FROM #myTable;
-- Update #myTable & Capture Changes
UPDATE #myTable
SET val = 12345
OUTPUT
inserted.id,
deleted.val,
inserted.val
INTO #myLog
WHERE id = 2
-- Output Contents of #myTable and #myLog
SELECT * FROM #myTable
SELECT * FROM #myLog
-- Drop Temp Tables
DROP TABLE #myLog
DROP TABLE #myTable
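If all you need in the log is the number of rows affected rather than the old/new values, @@ROWCOUNT is a lighter-weight alternative to the OUTPUT clause; a sketch using illustrative temp tables:

```sql
CREATE TABLE #EtlLog
(
    LogDate      datetime    NOT NULL DEFAULT GETDATE(),
    StepName     varchar(50) NOT NULL,
    RowsAffected int         NOT NULL
);

CREATE TABLE #Members (id int, active bit);
INSERT INTO #Members VALUES (1, 0), (2, 0);

-- Your existing update step...
UPDATE #Members SET active = 1;

-- ...followed immediately by the log write; @@ROWCOUNT must be
-- read on the very next statement or it will be overwritten.
INSERT INTO #EtlLog (StepName, RowsAffected)
VALUES ('membership update', @@ROWCOUNT);

SELECT * FROM #EtlLog;
```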
I haven't been able to find documentation/an explanation on how you would reload incremental data using Change Data Capture (CDC) in SQL Server 2014 with SSIS.
Basically, on a given day, if your SSIS incremental processing fails and you need to start again. How do you stage the recently changed records again?
I suppose it depends on what you're doing with the data, eh? :) In the general case, though, you can break it down into three cases:
Insert - check if the row is there. If it is, skip it. If not, insert it.
Delete - assuming that you don't reuse primary keys, just run the delete again. It will either find a row to delete or it won't, but the net result is that the row with that PK won't exist after the delete.
Update - kind of like the delete scenario. If you reprocess an update, it's not really a big deal (assuming that your CDC process is the only thing keeping things up to date at the destination and there's no danger of overwriting someone/something else's changes).
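Those three cases collapse into idempotent statements that are safe to replay; a sketch against hypothetical staging and target tables:

```sql
-- Insert: only if the key is not already there.
INSERT INTO dbo.Target (Id, Val)
SELECT s.Id, s.Val
FROM dbo.StagedChanges s
WHERE s.Operation = 'I'
  AND NOT EXISTS (SELECT 1 FROM dbo.Target t WHERE t.Id = s.Id);

-- Delete: harmless if the row is already gone.
DELETE t
FROM dbo.Target t
JOIN dbo.StagedChanges s ON s.Id = t.Id
WHERE s.Operation = 'D';

-- Update: reapplying the same values is a no-op.
UPDATE t
SET t.Val = s.Val
FROM dbo.Target t
JOIN dbo.StagedChanges s ON s.Id = t.Id
WHERE s.Operation = 'U';
```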
Assuming you are using the new CDC SSIS 2012 components, specifically the CDC Control Task at the beginning and end of the package: if the package fails for any reason before it runs the CDC Control Task at the end, those LSNs (Log Sequence Numbers) will NOT be marked as processed, so you can just restart the SSIS package from the top after fixing the issue and it will reprocess those records. You MUST use the CDC Control Task to make this work, though, or keep track of the LSNs yourself (before SSIS 2012 this was the only way to do it).
Matt Masson (Sr. Program Manager on MSFT SQL Server team) has a great post on this with a step-by-step walkthrough: CDC in SSIS for SQL Server 2012
Also, see Bradley Schacht's post: Understanding the CDC state Value
So I did figure out how to do this in SSIS.
I record the min and max LSN numbers every time my SSIS package runs, in a table in my data warehouse.
If I want to reload a set of data from the CDC source to staging, in the SSIS package I need to use the CDC Control Task and set it to "Mark CDC Start" and in the text box labelled "SQL Server LSN to start...." I put the LSN value I want to use as a starting point.
I haven't figured out how to set the end point, but I can go into my staging table and delete any data with an LSN value greater than my endpoint.
You can only do this for CDC changes that have not been 'cleaned up' - so only for data that has been changed within the last 3 days.
As a side point, I also bring across the lsn_time_mapping table to my data warehouse since I find this information historically useful and it gets 'cleaned up' every 4 days in the source database.
To reload the same changes you can use the following methods.
Method #1: Store the TFEND marker from the [cdc_states] table in another table or variable. Load the marker back into [cdc_states] from the "saved" value to process the same range again. This method starts processing from the same LSN, but if in the meantime your change table received more changes, those will be captured as well. So you can potentially pick up changes that happened after you did the first data capture.
Method #2: In order to capture the specified range, record the TFEND markers before and after the range is processed. Now, you can use the OLEDB Source Connection (SSIS) with the following cdc functions. Then use the CDC Splitter as usual to direct Inserts, Updates, and Deletes.
DECLARE @start_lsn binary(10);
DECLARE @end_lsn binary(10);
SET @start_lsn = 0x004EE38E921A01000001; -- TFEND (1) -- if null, use sys.fn_cdc_get_min_lsn('YourCapture') to start from the beginning of the _CT table
SET @end_lsn = 0x004EE38EE3BB01000001; -- TFEND (2)
SELECT * FROM [cdc].[fn_cdc_get_net_changes_YOURTABLECAPTURE](
    @start_lsn
    ,@end_lsn
    ,N'all' -- { all | all with mask | all with merge }
    --,N'all with mask' -- shows values in the "__$update_mask" column
    --,N'all with merge' -- merges inserts and updates together; meant for processing the results with a T-SQL MERGE statement
)
ORDER BY __$start_lsn;
I have data coming in from datastage that is being put in our SQL Server 2008 database in a table: stg_table_outside_data. The outside source is putting the data into that table every morning. I want to move the data from stg_table_outside_data to table_outside_data, where I keep multiple days' worth of data.
I created a stored procedure that inserts the data from stg_table_outside_Data into table_outside_data and then truncates stg_table_outside_Data. The outside datastage process is outside of my control, so I have to do this all within SQL Server 2008. I had originally planned on using a simple AFTER INSERT trigger, but datastage is doing a commit after every 100,000 rows. The trigger would run after the first commit and cause a deadlock error for the datastage process.
Is there a way to set up an after insert to wait 30 minutes then make sure there wasn't a new commit within that time frame? Is there a better solution to my problem? The goal is to get the data out of the staging table and into the working table without duplications and then truncate the staging table for the next morning's load.
I appreciate your time and help.
One way you could do this is take advantage of the new MERGE statement in SQL Server 2008 (see the MSDN docs and this blog post) and just schedule that as a SQL job every 30 minutes or so.
The MERGE statement allows you to easily just define operations (INSERT, UPDATE, DELETE, or nothing at all) depending on whether the source data (your staging table) and the target data (your "real" table) match on some criteria, or not.
So in your case, it would be something like:
MERGE table_outside_data AS target
USING stg_table_outside_data AS source
ON (target.ProductID = source.ProductID) -- whatever join makes sense for you
WHEN NOT MATCHED THEN
    INSERT VALUES(.......);
-- to do nothing when the rows match, simply omit the WHEN MATCHED clause
You shouldn't be using a trigger to do this, you should use a scheduled job.
Maybe build a procedure that moves all data from stg_table_outside_Data to table_outside_data once a day, scheduled via the job scheduler.
Do a row count in the trigger; if the count is less than 100,000, do nothing. Otherwise, run your process.
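A sketch of that guard (the procedure name is hypothetical, and the 100,000-row threshold comes from the batch size described in the question):

```sql
CREATE TRIGGER trg_stage_after_insert
ON dbo.stg_table_outside_data
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Do nothing for batches below the threshold.
    IF (SELECT COUNT(*) FROM inserted) < 100000
        RETURN;
    -- Otherwise run the move/truncate process.
    EXEC dbo.usp_move_staging_to_working;  -- hypothetical procedure
END;
```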
I am moving a system from a VB/Access app to SQL server. One common thing in the access database is the use of tables to hold data that is being calculated and then using that data for a report.
eg.
delete from treporttable
insert into treporttable (.... this thing and that thing)
Update treporttable set x = x * price where (...etc)
and then report runs from treporttable
I have heard that SQL Server does not like it when all records in a table are deleted, as it creates huge logs etc. I tried temp SQL tables, but they don't persist long enough for the report, which runs in a different process.
There are a number of places where this is done to different report tables in the application. The reports can be run many times a day and have a large number of records created in the report tables.
Can anyone tell me if there is a best practise for this or if my information about the logs is incorrect and this code will be fine in SQL server.
If you do not need to log the deletion activity you can use the truncate table command.
From books online:
TRUNCATE TABLE is functionally
identical to DELETE statement with no
WHERE clause: both remove all rows in
the table. But TRUNCATE TABLE is
faster and uses fewer system and
transaction log resources than DELETE.
http://msdn.microsoft.com/en-us/library/aa260621(SQL.80).aspx
delete from sometable
Is going to allow you to roll back the change. So if your table is very large, this can cause a lot of log usage and take a long time.
However, if you have no fear of failure then:
truncate table sometable
Will perform nearly instantly, and with minimal memory requirements. There is no rollback though.
To Nathan Feger:
You can rollback from TRUNCATE. See for yourself:
CREATE TABLE dbo.Test(i INT);
GO
INSERT dbo.Test(i) SELECT 1;
GO
BEGIN TRAN
TRUNCATE TABLE dbo.Test;
SELECT i FROM dbo.Test;
ROLLBACK
GO
SELECT i FROM dbo.Test;
GO
The first SELECT, issued after the TRUNCATE but inside the open transaction, returns no rows; the second SELECT, issued after the ROLLBACK, returns the original row:

i
(0 row(s) affected)

i
1
(1 row(s) affected)
You could also DROP the table, and recreate it...if there are no relationships.
Note, though, that in SQL Server both [DROP table] and [TRUNCATE] are transactional and can be rolled back (as demonstrated above).
So it depends on your schema which direction you want to go!!
Also, use SQL Profiler to analyze your execution times. Test it out and see which is best!!
The answer depends on the recovery model of your database. If you are in full recovery mode, then you have transaction logs that could become very large when you delete a lot of data. However, if you're backing up transaction logs on a regular basis to free the space, this might not be a concern for you.
Generally speaking, if the transaction logging doesn't matter to you at all, you should TRUNCATE the table instead. Be mindful, though, of any key seeds, because TRUNCATE will reseed the table.
EDIT: Note that even if the recovery model is set to Simple, your transaction logs will grow during a mass delete; the log space is simply reused afterward (without being released back to the OS). This is because DELETE is still fully logged, even if only until the next checkpoint.
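If you must use DELETE rather than TRUNCATE, deleting in batches keeps each transaction small, so under the Simple recovery model the log space can be reused between chunks; a sketch:

```sql
-- Delete in chunks of 10,000 rows until the table is empty.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM treporttable;
    IF @@ROWCOUNT = 0 BREAK;
END;
```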
Consider using temporary tables. Their names start with # and they are deleted when nobody refers to them. Example:
create table #myreport (
id int identity,
col1,
...
)
Temporary tables are made to be thrown away, and that happens very efficiently.
Another option is using TRUNCATE TABLE instead of DELETE. The truncate will not grow the log file.
I think your example has a possible concurrency issue. What if multiple processes are using the table at the same time? If you add a JOB_ID column or something like that will allow you to clear the relevant entries in this table without clobbering the data being used by another process.
Actually tables such as treporttable do not need to be recovered to a point of time. As such, they can live in a separate database with simple recovery mode. That eases the burden of logging.
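Setting that up is a one-time change; a sketch with an illustrative database name:

```sql
CREATE DATABASE ReportWork;
ALTER DATABASE ReportWork SET RECOVERY SIMPLE;
-- Create treporttable and friends in ReportWork; DELETE/TRUNCATE
-- activity there no longer needs log backups to keep the log small.
```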
There are a number of ways to handle this. First, you can move the creation of the data into the running of the report itself. This, I feel, is the best way to handle it; you can then use temp tables to temporarily stage your data, and no one will have concurrency issues if multiple people try to run the report at the same time. Depending on how many reports we are talking about, it could take some time to do this, so you may need a short-term solution as well.
Second, you could move all your reporting tables to a different db that is set to simple mode and truncate them before running your queries to populate them. This is closest to your current process, but multiple users trying to run the same report could be an issue.
Third, you could set up a job to populate the tables (still in a separate db set to simple recovery) once a day (truncating at that time). Then anyone running a report that day will see the same data and there will be no concurrency issues. However, the data will not be up-to-the-minute. You could also set up a reporting data warehouse, but that is probably overkill in your case.