Shrink DB by replacing data with NULL? - sql-server

I have an Azure SQL DB that stores photos in varchar(MAX) columns; they are uploaded from a PowerApp during a sign-in/out process. The DB is growing quickly. I don't need the old pictures, but I do want to keep the old records for the in/out times. I thought I could shrink the DB, or at least free up reusable space, by replacing the photo data with NULL. However, in testing the DB grows rather than shrinks when I do this. I can confirm the fields are becoming NULL, but the DB keeps growing as I go.
This is what I ran:
UPDATE [dbo].[Daily Activity Attendance]
SET Signature = NULL, Photo = NULL, SigninSig = NULL, SigninPhoto = NULL
WHERE [AttendanceDate] < '2019-03-20'
Is this just a bad idea or am I doing something wrong?

To compact your LOB pages you should run an ALTER INDEX command and include the option WITH ( LOB_COMPACTION = ON)
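Something along these lines, using the table from the question (LOB_COMPACTION is an option of the REORGANIZE form):
ALTER INDEX ALL ON [dbo].[Daily Activity Attendance]
REORGANIZE WITH (LOB_COMPACTION = ON);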
I'm not sure exactly what is causing the growth. It may be worth monitoring the undocumented DMF sys.dm_db_database_page_allocations to see what is happening to your rows as you change the values to NULL:
SELECT *
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID(N'<your table name>'),1,1,'limited');

Related

MS SQL Trigger for ETL vs Performance

I need to know what the impact on a production DB might be of creating triggers on ~30 production tables that capture every UPDATE, DELETE and INSERT statement and write the following information to a separate table: "PK", "Table Name", "Time of modification".
I have limited ability to test it, as I have read-only permissions to both the Prod and Test environments (and I can get one working day with 10 end users to test it).
I have estimated that the number of records inserted by those triggers will be around 150-200k daily.
Background:
I have a project to deploy a Data Warehouse for a database that is heavily customized, plus there are jobs running every day that manipulate the data. The "Updated on" date column is not being maintained (customization) and there are hard deletes occurring on tables. We decided to ask the DEV team to add triggers like:
CREATE TRIGGER [dbo].[triggerName] ON [dbo].[ProductionTable]
FOR INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- rows inserted or updated (new versions)
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM inserted;
    -- rows deleted or updated (old versions)
    INSERT INTO For_ETL_Warehouse (Table_Name, Regular_PK, Insert_Date)
    SELECT 'ProductionTable', PK_ID, GETDATE() FROM deleted;
END
on the core ~30 production tables.
Based on this table we will pull the delta from the last 24 hours and push it to the Data Warehouse staging tables.
If anyone has had a similar issue and can help me estimate how this could impact performance on the production database, I would really appreciate it. (If it works, I am saved; if not, I need to propose another solution. Currently mirroring or replication might be hard to get, as the local DEVs have no idea how to set them up...)
Other ideas on how to handle this situation or perform tests are welcome (my deadline is Friday 26-01).
First of all, I would suggest you encode the table name as a smaller data type rather than a character one (30 tables => tinyint).
Second of all, you need to understand how big the payload you are going to write is, and how it gets written (see the sketch below):
if you choose a correct clustered index (the date column), then the server just has to write the rows out in sequence. That is a trivially easy job, even if you put all 200k rows in at once.
if you encode the table name as a tinyint, then it basically has to write: 1 byte (table code) + the PK size (hopefully numeric, so <= 8 bytes) + 8 bytes for the datetime - so approx. 17 bytes on the data page, plus indexes (if any) and the log record. This is very lightweight and again will put no "real" pressure on SQL Server.
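As a rough sketch of what that narrow audit table could look like (the Table_Code column and the bigint PK type are assumptions on my part, not from your question):
CREATE TABLE dbo.For_ETL_Warehouse (
    Table_Code  tinyint  NOT NULL,  -- 1..30, mapped to the production table names
    Regular_PK  bigint   NOT NULL,  -- assumes numeric primary keys
    Insert_Date datetime NOT NULL CONSTRAINT DF_For_ETL_Warehouse_Insert_Date DEFAULT (GETDATE())
);
-- A clustered index on the date keeps the trigger writes append-only, as described above.
CREATE CLUSTERED INDEX CIX_For_ETL_Warehouse_Insert_Date
    ON dbo.For_ETL_Warehouse (Insert_Date);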
The trigger itself will add a small overhead, but with the number of rows you are talking about it is negligible.
I have seen systems that do similar things on a much larger scale with close to zero effect on the workload, so I would say it's a safe bet. The only problem with this approach is that it will not work in some cases (for example, DML statements that return rows through an OUTPUT clause, without OUTPUT INTO, fail against a table that has triggers). But if you do not have that kind of blocker, then go for it.
I hope it helps.

improve sql delete performance and reduce log file and tempDB size?

I have a huge database that processes email traffic every day. The system needs to delete some old emails every day:
Delete from EmailList(nolock)
WHERE EmailId IN (
SELECT EmailId
FROM Emails
WHERE EmailDate < DATEADD([days], -60, GETDATE())
)
It works, but the problem is that it takes a long time to finish, and the log file becomes huge because of it: the log file grows by more than 100 GB every day.
I'm thinking we can change it to
Delete from EmailList(nolock)
WHERE EXISTS (
SELECT EmailId
FROM Emails
WHERE (Emails.EmailId = EmailList.EmailId) AND
(EmailDate < DATEADD([days], -60, GETDATE()))
)
But other than this, is there anything we can do to improve the performance and, most of all, reduce the log file size?
EmailId is indexed.
I've seen
GetDate() - 60
style syntax perform MUCH better than
DATEADD([days], -60, GETDATE())
especially if there is an index on the date column. A few fellow DBAs and I spent quite a bit of time trying to understand WHY it performed better, but the proof was in the pudding.
Another thing you might want to consider, given the volume of records I presume you have to delete, is to chunk the deletes into batches of, say, 1,000 or 10,000 records. This would probably speed up the delete process.
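A minimal sketch of that batching, reusing the EXISTS predicate from above (the batch size of 10,000 is arbitrary):
DECLARE @rows int;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) FROM EmailList
    WHERE EXISTS (
        SELECT 1
        FROM Emails
        WHERE Emails.EmailId = EmailList.EmailId
          AND Emails.EmailDate < DATEADD(DAY, -60, GETDATE())
    );
    SET @rows = @@ROWCOUNT;
    -- In simple recovery (or with frequent log backups) the log space can be reused between batches.
END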
Have you tried partitioning by date? Then you can just drop the partitions for the days you are not interested in anymore. Given a "huge" database you surely run Enterprise edition of SQL Server (after all, huge is bigger than very large), and that edition has table partitioning.
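A rough sketch of what date partitioning with a sliding window could look like; it assumes the purged table has a date column to partition on, and the function, scheme, boundary dates and staging table names are all illustrative (the staging table must have an identical schema on the same filegroup):
CREATE PARTITION FUNCTION pf_EmailDate (datetime)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-02-01', '2012-03-01');

CREATE PARTITION SCHEME ps_EmailDate
AS PARTITION pf_EmailDate ALL TO ([PRIMARY]);

-- The table (or its clustered index) is created ON ps_EmailDate(EmailDate); old data
-- is then removed almost instantly by switching a partition out and truncating it:
ALTER TABLE dbo.EmailList SWITCH PARTITION 1 TO dbo.EmailList_Staging;
TRUNCATE TABLE dbo.EmailList_Staging;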
[EDIT]:
regarding @TomTom's comment:
If you have SQL Server Enterprise edition available you should use Table Partitioning.
If this is not the case, my original post may be helpful:
[ORIGINAL POST]
Deleting a large amount of data is always difficult. I faced the same problem and I went with the following solution:
Depending on your requirements this will not work, but maybe you can get some ideas from it.
Instead of using one table, use two tables with the same schema. Create a synonym (I assume you are using MS SQL Server) that points to the "active" table of the two (active meaning the table you currently write to). Use this synonym for the inserts in your application; alternatively, instead of using a synonym, the application could simply switch the table it writes to every x days.
Every x days you can truncate the old/inactive table and afterwards recreate the synonym to point at the truncated table (if you use the synonym approach), so effectively you are partitioning the data by time.
You have to synchronize the switch of the active table. I automated this completely, using a shared app lock for the application and an exclusive app lock when changing the synonym (i.e. blocking the writing application during the switching process).
If changing your application's code is not an option, consider using the same principle, but instead of writing to the synonym you could create a view with INSTEAD OF triggers (the insert operation would then go into the "active" partition). The trigger code would need to synchronize using something like the app lock mentioned above (so that writes that happen during the switching process still work).
My solution is a little more complex, so I cannot post the code here, but it has run without problems for a high-load application, and the switching/cleanup process is completely automated.
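A minimal sketch of the switch step described above (the table, synonym and lock names are only illustrative; the writing application would take a shared app lock on the same resource name before each insert through the synonym):
BEGIN TRAN;
EXEC sp_getapplock @Resource = 'EmailListSwitch',
                   @LockMode = 'Exclusive',
                   @LockOwner = 'Transaction';

DROP SYNONYM dbo.ActiveEmailList;
CREATE SYNONYM dbo.ActiveEmailList FOR dbo.EmailList_B;  -- previously pointed at dbo.EmailList_A

COMMIT;  -- releases the app lock

TRUNCATE TABLE dbo.EmailList_A;  -- the now-inactive table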

How can I fix this Access 2003 bug? Data Entry Auto-Generating a Value

I'm experiencing an odd data entry bug in MS Access and I am hoping that someone can possibly help shed a bit of light on why this might be happening and how to fix it.
I have a data table that is defined in our SQL Server database. The definition is below, with only the field names changed.
CREATE TABLE [dbo].[MyTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[TextField1] [nvarchar](10) NOT NULL,
[TextField2] [nvarchar](50) NOT NULL,
[Integer1] [int] NOT NULL CONSTRAINT [DF_MyTable_Integer1] DEFAULT (0),
[Integer2] [int] NOT NULL,
[LargerTextField] [nvarchar](300) NULL
) ON [PRIMARY]
As you can see from the definition, there is nothing special about this table. The problem I am having is with a linked table in an MS Access 2003 database that connects to this table through ODBC.
After defining and creating the data table in SQL Server, I opened my working Access Database and linked to the new table. I need to manually create the records that belong in this table. However, when I started to add the data rows, I noticed that as I tabbed out of the LargerTextField to a new row, the LargerTextField was being defaulted to '2', even though I had not entered anything nor defined a default value on the field?!
Initially, I need this field to be Null. I'll come back later and with an update routine populate the data. But why would MS Access default a value in my field, even though the schema for the table clearly does not define one? Has anyone seen this or have any clue why this may happen?
EDIT
One quick correction: the value defaults to '2' as soon as I tab into LargerTextField, not when I tab out of it. A small, subtle difference, but possibly important.
As a test, I also created a new, fresh MS Access database and linked the table. I'm having the exact same problem, so I assume this could be a problem with either MS SQL Server or, possibly, ODBC.
Wow, problem solved. This isn't a bug but it was certainly not behavior I desire or expected.
This behavior is occurring because of the data I am manually entering in the Integer1 and Integer2 fields. I am entering 0 as the value of Integer1 and 1 as the value of Integer2. I had never seen Access try to anticipate my data entry, but it appears to recognize sequentially entered values and continue the sequence.
As a test, I entered a record with Integer1 set to 1 and Integer2 set to 2. Sure enough, when I tabbed into LargerTextField, the value 3 was auto-populated.
I hate that this problem came down to user ignorance but, I'll be honest, in my 10+ years of using MS Access I cannot recall ever seeing this behavior. I would almost prefer to delete this question to save face, but since it caught me off guard and I'm an experienced user, I might as well leave it in the Stack Exchange archives for others who have the same experience. :/
As an experiment fire up a brand-new Access DB and connect to this table to see if you get the same behavior. I suspect this Access DB was connected to a table like this in the past and had that default set. Access has trouble forgetting sometimes :)

TSQL updating large table with another from TEMPDB causes enormous growth

I have a custom import tool which bulk-inserts the data into a table in tempdb (421,776 rows). After that, the tool inserts unknown rows into the target table and updates existing rows based on a hash key (a combination of 2 columns). The target table has nearly the same row count. The update query looks something like this (with about 20 further update columns omitted):
update targetTable set
theDB.dbo.targetTable.code=temp.code,
theDB.dbo.targetTable.name=temp.name
from [tempDB].[dbo].[targettable] as temp
where theDB.dbo.targetTable.hash=temp.hash COLLATE SQL_Latin1_General_CP1_CI_AS
I know the nvarchar comparison with a COLLATE is a bit bad, but it is not easy to avoid. Still, the hash column has its own unique index. Locally it works well, but on this server tempdb keeps growing to 21 GB. Reindexing and shrinking don't help at all.
Just a side note for others who face tempdb problems. A good read is http://bradmcgehee.com/wp-content/uploads/presentations/Optimizing_tempdb_Performance_chicago.pdf
It looks like you're using tempdb explicitly, with data you've put there yourself. Is there a reason to use tempdb as if it were your own database?
The reason tempdb is growing is that you're explicitly putting data there. 420k rows doesn't sound heavy, but it's best to keep that data within your own user DB.
Suggest changing your business logic to move away from [tempDB].[dbo].[targettable] to something in your own user database.
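For instance, a persistent staging table in the user database could take over the role of the tempdb table; the staging schema and the column types below are assumptions, so match them to the real temp table:
USE theDB;
GO
CREATE SCHEMA staging;
GO
CREATE TABLE staging.targettable (
    hash nvarchar(100) NOT NULL,
    code nvarchar(50)  NULL,
    name nvarchar(100) NULL
    -- ...plus the remaining ~20 columns of the current temp table
);
CREATE UNIQUE INDEX UX_staging_targettable_hash ON staging.targettable (hash);
GO
-- The import tool then bulk-inserts into staging.targettable, and the UPDATE joins
-- to it instead of [tempDB].[dbo].[targettable].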
You can also temporarily change the recovery model of the user database from full or bulk-logged down to simple while the import runs. The statements are still logged (so they can roll back), but in simple recovery the log space is reused after each checkpoint instead of accumulating until the next log backup.
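A minimal sketch, assuming the user database is called theDB as in the query above:
ALTER DATABASE theDB SET RECOVERY SIMPLE;
-- ... run the large INSERT/UPDATE from the staging table here ...
ALTER DATABASE theDB SET RECOVERY FULL;
-- Take a full or differential backup afterwards to restart the log backup chain.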
Is this a cartesian product when there's no explicit join?

How to reduce size of SQL Server table that grew from a datatype change

I have a table on SQL Server 2005 that was about 4 GB in size (about 17 million records).
I changed one of the fields from char(30) to char(60). (There are 25 fields in total, most of which are char(10), so the char space adds up to about 300 bytes per row.)
This caused the table to double in size (over 9 GB).
I then changed the char(60) to varchar(60) and ran a function to trim the extra whitespace out of the data (reducing the average length of the data in the field to about 15).
This did not reduce the table size, and shrinking the database did not help either.
Short of actually recreating the table structure and copying the data over (that's 17 million records!), is there a less drastic way of getting the size back down?
You have not cleaned or compacted any data, even with a "shrink database".
DBCC CLEANTABLE
Reclaims space from dropped variable-length columns in tables or indexed views.
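For example, against the table in question (0 means the current database; the optional third argument limits how many rows are processed per transaction):
DBCC CLEANTABLE (0, N'dbo.Mytable');
-- or, to keep each transaction small on a 17-million-row table:
DBCC CLEANTABLE (0, N'dbo.Mytable', 5000);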
However, a simple index rebuild, if there is a clustered index, should also do it:
ALTER INDEX ALL ON dbo.Mytable REBUILD
A worked example from Tony Rogerson
Well it's clear you're not getting any space back ! :-)
When you changed your text fields to CHAR(60), they were all padded to full capacity with spaces, so ALL those fields are now really 60 characters long.
Changing them back to VARCHAR(60) won't help on its own - the values are still all 60 characters long...
What you really need to do is run RTRIM over all those fields to reduce them back to their trimmed length, and then shrink the database.
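Something like this for each of the converted columns (the column name here is just a placeholder):
UPDATE dbo.MyTable
SET MyCharField = RTRIM(MyCharField);  -- repeat for every column you converted to varchar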
After you've done that, you need to REBUILD your clustered index in order to reclaim some of that wasted space. The clustered index is really where your data lives - you can rebuild it like this:
ALTER INDEX IndexName ON YourTable REBUILD
By default, your primary key is your clustered index (unless you've specified otherwise).
Marc
I know I'm not answering your question as you are asking, but have you considered archiving some of the data to a history table, and work with fewer rows?
Most of the time you might think at first glance that you need all that data all the time, but when you actually sit down and examine it, there are cases where that's not true. At least I've experienced that situation before.
I had a similar problem here: SQL Server, Converting NTEXT to NVARCHAR(MAX). It was related to changing ntext to nvarchar(max).
I had to do an UPDATE MyTable SET MyValue = MyValue in order to get it to resize everything nicely.
This obviously takes quite a long time with a lot of records. There were a number of suggestions on how to do it better; the key one was a temporary flag column indicating whether a row had been processed, then updating a few thousand rows at a time in a loop until it was all done. That gave me some control over how much it was doing at once.
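Something along these lines; the flag column name and the batch size are illustrative, not what I actually used:
ALTER TABLE dbo.MyTable ADD Resized bit NOT NULL CONSTRAINT DF_MyTable_Resized DEFAULT (0);
GO
DECLARE @done bit;
SET @done = 0;
WHILE @done = 0
BEGIN
    UPDATE TOP (5000) dbo.MyTable
    SET MyValue = MyValue, Resized = 1
    WHERE Resized = 0;

    IF @@ROWCOUNT = 0 SET @done = 1;
END
GO
-- Drop the helper column once every row has been touched:
ALTER TABLE dbo.MyTable DROP CONSTRAINT DF_MyTable_Resized;
ALTER TABLE dbo.MyTable DROP COLUMN Resized;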
On another note, if you really want to shrink the database as much as possible, it can help to switch the recovery model down to simple, shrink the transaction log, reorganise all the data in the pages, and then set it back to the full recovery model. Be careful though: shrinking databases is generally not advisable, and if you reduce the recovery model of a live database you are asking for something to go wrong.
Alternatively, you could do a full table rebuild to ensure there's no extra data hanging around anywhere:
CREATE TABLE tmp_table(<column definitions>);
GO
INSERT INTO tmp_table(<columns>) SELECT <columns> FROM <table>;
GO
DROP TABLE <table>;
GO
EXEC sp_rename N'tmp_table', N'<table>';
GO
Of course, things get more complicated with identity, indexes, etc etc...
