Table using too much disk space - sql-server

When I run the "Top disk usage by table" report in Sql Server Management Studio, it shows one of my tables using about 1.8GB of disk space:
The table definition:
CREATE TABLE [dbo].[RecipeItems](
    [wo_id] [varchar](50) NOT NULL,
    [invent_id] [varchar](50) NOT NULL,
    [invent_dim_id] [varchar](50) NULL,
    [ratio] [float] NOT NULL
) ON [PRIMARY]
I'd roughly estimate that each row takes less than 200 bytes, and with only 7K records, this shouldn't take up more than 1-2 MB. But obviously that is not the case. What might be the reason this table uses so much storage?

Chances are that a lot of data has been updated or deleted. Since it is a heap, updates can lead to forwarded records. I would try this first:
ALTER TABLE dbo.RecipeItems REBUILD;
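If you want to confirm the diagnosis before (and after) the rebuild, the heap's physical stats will show it. A quick sketch, assuming the table lives in the current database:
-- forwarded_record_count and avg_page_space_used_in_percent reveal how bloated the heap is
SELECT index_type_desc, page_count, record_count,
       forwarded_record_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.RecipeItems'), NULL, NULL, N'DETAILED');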
Next I would consider adding a clustered index.
Do not run a shrink database command to fix this table, PLEASE.
When you perform your "delete all and bulk insert" I would do it this way, running a rebuild in the middle:
TRUNCATE TABLE dbo.RecipeItems;
ALTER TABLE dbo.RecipeItems REBUILD;
BULK INSERT dbo.RecipeItems FROM ...
If you add a clustered index you may want to do this a little differently. And if you can't use TRUNCATE, keep using DELETE, obviously. TRUNCATE will cause less log churn if the table is eligible, and since you are wiping out the table and re-populating it, it's not something you seem to need to recover from. In fact you might just consider dropping the table and re-creating it each time.
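For reference, a minimal sketch of adding a clustered index; the key columns here are only an assumption, since the question doesn't say what uniquely identifies a recipe line:
-- Hypothetical key choice: adjust to whatever actually identifies a row
CREATE CLUSTERED INDEX CIX_RecipeItems
    ON dbo.RecipeItems (wo_id, invent_id, invent_dim_id);
With a clustered index in place there are no forwarding pointers to worry about, and a rebuild (now ALTER INDEX ... REBUILD) still reclaims space after large deletes.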

Related

Index Seek Query Will Sometimes Take Minutes To Complete

I have a query that (~95% of the time) executes nearly instantly on the production Azure SQL Database. Running the query in SSMS (in production) shows that my non-clustered index is being utilized with an index seek (cost 100%).
However, the database randomly gets into a state where this same query fails to complete: it always times out from the calling application. Logging into SSMS while the episode is occurring, I can manually execute the query and it will eventually complete after minutes of execution (since there is no timeout limit in SSMS, unlike the calling application).
After I allow the query to fully execute without timeouts I can subsequently execute the query again with instant results. The calling application can also call it now with instant results again. It appears that by allowing it to fully execute without a timeout clears up whatever issue was occurring and returns execution back to normal.
Monitoring the server metrics shows no real issues or spikes in CPU utilization that would suggest the server is just in a stressed state during this time. All other queries within the application still execute quickly as normal. Even queries that utilize this same table and non-clustered index.
Table
CREATE TABLE [dbo].[Item] (
    [Id] UNIQUEIDENTIFIER NOT NULL,
    [UserId] UNIQUEIDENTIFIER NULL,
    [Type] TINYINT NOT NULL,
    [Data] NVARCHAR (MAX) NULL,
    [CreationDate] DATETIME2 (7) NOT NULL,
    CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [FK_Item_User] FOREIGN KEY ([UserId]) REFERENCES [dbo].[User] ([Id])
);
This table has millions of rows in it.
Index
CREATE NONCLUSTERED INDEX [IX_Item_UserId_Type_IncludeAll]
ON [dbo].[Item]([UserId] ASC, [Type] ASC)
INCLUDE ([Data], [CreationDate]);
Issue Query
SELECT *
FROM [dbo].[Item]
WHERE [UserId] = @UserId
    AND [Data] IS NOT NULL
While I was catching it in the act today in SSMS, I also modified the query to remove the AND [Data] IS NOT NULL from the WHERE clause. Ex:
SELECT *
FROM [dbo].[Item]
WHERE [UserId] = @UserId
This query executed instantly, and the execution plan shows that it is utilizing the index properly. Adding back AND [Data] IS NOT NULL causes the query to be slow again. The Data column can hold large amounts of JSON data, so I am not sure if that somehow has anything to do with it.
Running sp_WhoIsActive while the episode is occurring and my query is long-running shows that reads, physical_reads, cpu, and used_memory are ever-increasing as the query continues to execute. Interestingly, the query_plan column is NULL while it is running so I am not able to see what plan it is actually utilizing. Though I can always see that the index seek is utilized while running it manually thereafter.
Why would this query get into a state where it takes a really long time to execute, while the majority of the time it executes with near-instant results? We can see that it is properly utilizing its non-clustered index as a seek operation.
Why does allowing the query to fully execute in SSMS (vs timing out as the calling application does) seem to clear up the problem going forward?
How can I avoid these types of episodes?
A few things I would check:
1. Your query doesn't have an index that fully supports it. Since you are doing a SELECT * and filtering on [Data] IS NOT NULL (and [Data] is nvarchar(max), which cannot be an index key column), a filtered covering index along these lines would serve the query:
CREATE NONCLUSTERED INDEX [IX_Item_UserId_DataNotNull]
ON [dbo].[Item] ([UserId] ASC)
INCLUDE ([Type], [Data], [CreationDate])
WHERE [Data] IS NOT NULL;
2. Try updating statistics and rebuilding the indexes on this table; this will help if there is index fragmentation or stale statistics.
3. Try the OPTION (RECOMPILE) hint to see whether parameter sniffing is the problem, as sketched below.
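For point 3, a minimal sketch of the query from the question with the hint added:
SELECT *
FROM [dbo].[Item]
WHERE [UserId] = @UserId
    AND [Data] IS NOT NULL
OPTION (RECOMPILE); -- compiles a fresh plan for each execution, sidestepping a badly sniffed parameter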

Recommended SQL Server table design for file import and processing

I have a scenario where files will be uploaded into a database table (dbo.FileImport), with each line of the file in a new row. Each row will contain the line data and the name of the file it came from. The file names are unique, but a file may contain a few million lines. Multiple files' data may exist in the table at one time.
Each file is processed and the results are stored in a separate table. After processing the data related to the file, the data is deleted from the import table to keep the table from growing indefinitely.
The table structure is as follows:
CREATE TABLE [dbo].[FileImport] (
    [Id] BIGINT IDENTITY (1, 1) NOT NULL,
    [FileName] VARCHAR (100) NOT NULL,
    [LineData] NVARCHAR (300) NOT NULL
);
During the processing the data for the relevant file is loaded with the following query:
SELECT [LineData] FROM [dbo].[FileImport] WHERE [FileName] = @FileName
And then deleted with the following statement:
DELETE FROM [dbo].[FileImport] WHERE [FileName] = @FileName
My question is pertaining to the table design with regard to performance and longevity...
Is it necessary to have the [Id] column if I never use it (I am concerned about running out of numbers in the Identity eventually too)?
Should I add a PRIMARY KEY Constraint to the [Id] column?
Should I have a CLUSTERED or NONCLUSTERED index for the [FileName] column?
Should I be making use of NOLOCK whenever I query this table (it is updated very regularly)?
Would there be concern of fragmentation with the continual adding and deleting of data to/from this table? If so, how should I handle this?
Any advice or thoughts would be much appreciated. Opinionated designs are welcome ;-)
Update 2017-12-10
I failed to mention that the lines of a file may not be unique. So please take this into account if this affects the recommendation.
An example script in the answer would be an added bonus! ;-)
Is it necessary to have the [Id] column if I never use it (I am concerned about running out of numbers in the Identity eventually too)?
It is not necessary to have an unused column. This is not a relational table and will not be referenced by a foreign key, so one could argue that a primary key is unnecessary.
I would not be concerned about running out of 64-bit integer values. bigint can hold a positive value of up to 9,223,372,036,854,775,807, so it would take centuries (roughly 292 years at a sustained 1 billion rows per second) to run out of values.
Should I add a PRIMARY KEY Constraint to the [Id] column?
I would create a composite clustered primary key on FileName and ID. That would provide an incremental value to facilitate retrieving rows in the order of insertion and the FileName leftmost key column would benefit your queries greatly.
Should I have a CLUSTERED or NONCLUSTERED index for the [FileName] column?
See above.
Should I be making use of NOLOCK whenever I query this table (it is updated very regularly)?
No. Assuming you query by FileName, only the rows requested will be touched with the suggested primary key.
Would there be concern of fragmentation with the continual adding and deleting of data to/from this table? If so, how should I handle this?
Incremental keys avoid fragmentation.
EDIT:
Here's the suggested DDL for the table:
CREATE TABLE dbo.FileImport (
    FileName VARCHAR(100) NOT NULL
    , RecordNumber BIGINT NOT NULL IDENTITY
    , LineData NVARCHAR(300) NOT NULL
    , CONSTRAINT PK_FileImport PRIMARY KEY CLUSTERED (FileName, RecordNumber)
);
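With that key in place, the per-file queries from the question seek on the leading FileName column, and the incremental RecordNumber lets you read the lines back in insertion order if you need to; a sketch:
SELECT LineData
FROM dbo.FileImport
WHERE FileName = @FileName
ORDER BY RecordNumber;

DELETE FROM dbo.FileImport
WHERE FileName = @FileName;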
Here is a rough sketch of how I would do it:
CREATE TABLE [FileImport].[FileName] (
    [FileId] BIGINT IDENTITY (1, 1) NOT NULL,
    [FileName] VARCHAR (100) NOT NULL
);
GO
ALTER TABLE [FileImport].[FileName]
    ADD CONSTRAINT pk_FileName PRIMARY KEY NONCLUSTERED (FileId);
GO
CREATE CLUSTERED INDEX cix_FileName ON [FileImport].[FileName] ([FileName]);
GO
CREATE TABLE [FileImport].[LineData] (
    [FileId] BIGINT NOT NULL,
    [LineDataId] BIGINT IDENTITY (1, 1) NOT NULL,
    [LineData] NVARCHAR (300) NOT NULL,
    CONSTRAINT fk_LineData_FileName FOREIGN KEY (FileId) REFERENCES [FileImport].[FileName] (FileId)
);
GO
ALTER TABLE [FileImport].[LineData]
    ADD CONSTRAINT pk_LineData PRIMARY KEY CLUSTERED (FileId, LineDataId);
GO
This adds some normalization so you don't have to repeat the full file name on every row. You don't strictly have to do it (if you prefer, keep FileName in the second table instead of FileId and cluster the index on (FileName, LineDataId)), but since we are using a relational database...
No need for any additional indexes; the tables are already ordered by the right keys.
Should I be making use of NOLOCK whenever I query this table (it is updated very regularly)?
If your data means anything to you, don't use it. As a matter of fact, if you feel you have to use it, something is really wrong with your DB architecture. With the indexing above, SQL Server will use a seek operation, which is very fast.
Would there be concern of fragmentation with the continual adding and deleting of data to/from this table? If so, how should I handle this?
You can set up a maintenance job that rebuilds your indexes and run it nightly with SQL Server Agent (or whatever scheduler you use).
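A minimal sketch of such a job step, assuming the two-table layout above; a real job would normally check sys.dm_db_index_physical_stats and choose REORGANIZE or REBUILD per index:
-- Nightly index maintenance step (run from a SQL Server Agent job)
ALTER INDEX ALL ON [FileImport].[FileName] REORGANIZE;
ALTER INDEX ALL ON [FileImport].[LineData] REORGANIZE;
-- For heavily fragmented indexes, rebuild instead:
-- ALTER INDEX ALL ON [FileImport].[LineData] REBUILD;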

Best way to avoid deadlocks from triggers? Non-clustered index? SQL Server 2012

I have a table with this structure:
CREATE TABLE Log_File
(
    Hostname CHAR(15) NOT NULL,
    Line_Number INT NOT NULL,
    Log_Line VARCHAR(8000) NOT NULL,
    CONSTRAINT pk_Log_File PRIMARY KEY (Hostname, Line_Number)
)
Each server is bulk inserting about 1,000 rows every 5 seconds. Whenever there is a bulk insert, a trigger fires and iterates through the inserted log_line records with a cursor, updating another table.
When only one server is writing to the Log_File table, I have no issues. When I have 20 servers writing to the table at the same time, I occasionally get a deadlock error and the transaction closes on some of the machines, killing the thread.
This is usually a problem when I start up the application on each server because it has to scan the Log_File table to find the MAX(Line_Number) for itself so it knows where to begin reading its own log file from.
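A sketch of what that startup query presumably looks like (the exact form and the @Hostname parameter are assumptions, not taken from the application):
SELECT MAX(Line_Number)
FROM dbo.Log_File
WHERE Hostname = @Hostname;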
I haven't set an index on this table. Would creating a clustered or nonclustered index help the situation? Unfortunately a cursor is absolutely necessary since I need to iterate through each record in order to deal with islands and gaps.
Any help on reducing deadlocks or making this faster is appreciated!

ORACLE table performance basics

Complete newbie to Oracle DBA-ing, and yet trying to migrate a SQL Server DB (2008R2) to Oracle (11g - total DB size only ~20Gb)...
I'm having a major problem with my largest single table (~30 million rows). Rough structure of the table is:
CREATE TABLE TableW (
    WID NUMBER(10,0) NOT NULL,
    PID NUMBER(10,0) NOT NULL,
    CID NUMBER(10,0) NOT NULL,
    ColUnInteresting1 NUMBER(3,0) NOT NULL,
    ColUnInteresting2 NUMBER(3,0) NOT NULL,
    ColUnInteresting3 FLOAT NOT NULL,
    ColUnInteresting4 FLOAT NOT NULL,
    ColUnInteresting5 VARCHAR2(1024 CHAR),
    ColUnInteresting6 NUMBER(3,0) NOT NULL,
    ColUnInteresting7 NUMBER(5,0) NOT NULL,
    CreatedDate DATE NOT NULL,
    ModifiedDate DATE NOT NULL,
    CreatedByUser VARCHAR2(20 CHAR),
    ModifiedByUser VARCHAR2(20 CHAR)
);
ALTER TABLE TableW ADD CONSTRAINT WPrimaryKey PRIMARY KEY (WID)
ENABLE;
CREATE INDEX WClusterIndex ON TableW (PID);
CREATE INDEX WCIDIndex ON TableW (CID);
ALTER TABLE TableW ADD CONSTRAINT FKTableC FOREIGN KEY (CID)
REFERENCES TableC (CID) ON DELETE CASCADE
ENABLE;
ALTER TABLE TableW ADD CONSTRAINT FKTableP FOREIGN KEY (PID)
REFERENCES TableP (PID) ON DELETE CASCADE
ENABLE;
Running through some basic tests, it seems a simple 'DELETE FROM TableW WHERE PID=13455' is taking a huge amount of time (~880s) to execute what should be a quick delete (~350 rows). [Query run via SQL Developer.]
Generally, the performance of this table is noticeably worse than its SQL equivalent. There are no issues under SQL Server, and the structure of this table and the surrounding ones look sensible for Oracle by comparison to SQL.
My problem is that I cannot find a useful set of diagnostics to start looking for where the problem lies. Any queries / links greatly appreciated.
[The above is a request for help based on the assumption it should not take anything like 10 minutes to delete 350 rows from a table with 30 million records, when it takes SQL Server <1s to do the same for an equivalent DB structure]
EDIT:
The migration is being performed thus:
1. In SQL Developer:
- Create Oracle user, tablespace, grants, etc. as SYS
- Create the tables, sequences, triggers, etc. as the new user
2. Via some Java:
- Check SQL-Oracle structure consistency
- Disable all foreign keys
- Move data (Truncate destination table, Select From Old, Insert Into New)
- Adjust sequences to correct starting value
- Enable foreign keys
If you are asking how to improve the performance, there are several ways to do it:
Parallel DML
Partitioning.
Parallel DML uses all the resources you have to perform the operation. Oracle runs several threads to complete it, and other sessions may have to wait for the end of the operation because system resources are busy (a sketch follows below).
Partitioning lets you get rid of old sections right away. For example, say your table stores data from 2000 to 2014; most likely you don't need the old records, so you can split your table into several partitions and drop the oldest ones.
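A rough sketch of the parallel DML option against the DELETE from the question; the degree of 4 is just an example value:
ALTER SESSION ENABLE PARALLEL DML;
DELETE /*+ PARALLEL(TableW, 4) */ FROM TableW WHERE PID = 13455;
COMMIT;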
Check the wait events for your session that's doing the DELETE. That will tell you what your main bottleneck is.
And echoing Marco's comment above: make sure your table stats are up to date; that will help the optimizer build a good plan for those queries.
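A couple of starting points for both suggestions, as a sketch; the bind variable and schema name are placeholders to substitute:
-- What the deleting session is currently waiting on
SELECT sid, event, state, seconds_in_wait
FROM v$session
WHERE sid = :deleting_sid;

-- Refresh optimizer statistics on the table
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MYSCHEMA', tabname => 'TABLEW');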
To update all (and in case anyone else finds this):
The correct question to find a solution was: what tables do you have referencing this one?
The problem was another table (let's call it TableV) using WID as a foreign key, but the WID column in TableV was not indexed. This means that for every record deleted in TableW, the whole of TableV had to be searched for associated records to delete. As TableV has >3 million rows, deleting the small set of 350 rows from TableW meant the Oracle server trying to read a total of >1 billion rows.
A single index added to WID in TableV, and the delete statement now takes <1s.
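For reference, the fix was simply an index on the foreign key column (the index name is made up here):
CREATE INDEX IX_TableV_WID ON TableV (WID);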
Thanks to all for the comments - a lot learnt about Oracle's inner workings!

SQL Server 2008: Table size is increased due to ghost records related to LOB data

I have the following table in the Production environment. It is heavily updated (lots of inserts and deletes). The table contains LOB data types - ntext and nvarchar(max).
Data is constantly removed and inserted in this table, yet the total row count is quite stable at about 150,000.
But for some unknown reason, the table size only ever increases; the space from deleted data is not being released.
For example, at this moment there are 150,000 rows in the table and it occupies about 60GB. If I copy this data to a new table (a simple INSERT INTO ... SELECT), the data occupies only 10GB.
What I have tried:
Shrinking the file or the database does not help
Rebuilding the indexes does not help
DBCC CLEANTABLE does not help
Here's the table structure:
CREATE TABLE dbo.T_Test(
    KeyHash nvarchar(50) NOT NULL,
    SiteDomainId int NOT NULL,
    srcFullUrl nvarchar(max) NOT NULL,
    srcResponse ntext NOT NULL,
    srcExpirationDate datetime NOT NULL,
    srcKey nvarchar(max) NOT NULL,
    srcCachePeriodInMinutes int NOT NULL,
    srcNumOfHits int NOT NULL,
    srcVital bit NOT NULL,
    CONSTRAINT PK_T_Test_1 PRIMARY KEY NONCLUSTERED
    (
        KeyHash ASC,
        SiteDomainId ASC
    )
)
GO
CREATE CLUSTERED INDEX [IX_T_Test_srcExpirationDate_ppa] ON dbo.T_Test
(
srcExpirationDate ASC
)
GO
What I know for sure is that the issue is ghost records related to the LOB data.
select * from sys.dm_db_index_physical_stats(db_id(), object_id('MyTable'), null, null, N'DETAILED') returned the following:
index_type_desc alloc_unit_type_desc record_count ghost_record_count
CLUSTERED INDEX LOB_DATA 394996 2869376
But the ghost cleanup process is working normally, i.e. ghost records are removed for the IN_ROW_DATA allocation unit of the clustered index.
At this moment I have no idea how to delete the ghost records and reclaim the space.
The only way I see is to truncate the table and upload the data again.
Any suggestions on how to avoid this issue are appreciated. Thank you.
Configuration of my environment is Microsoft SQL Server Web Edition (64-bit) 10.0.2531.0
Could be the ghost clean up never catches up with DELETEs (on SF from Paul Randal) especially given your usage pattern. I remember seeing this on SF but I've never had this issue and have never tried Paul's suggested fix so YMMV sorry.
"Could be the ghost clean up never catches up with DELETEs (on SF from Paul Randal) especially given your usage pattern. I remember seeing this on SF but I've never had this issue and have never tried Paul's suggested fix so YMMV sorry."
I have read Paul Randal's approach, and it really works, but only for IN_ROW_DATA. In my case the ghost records for IN_ROW_DATA are cleaned, but the ghost records for the LOB data are not. I ran a loop with a constant table scan on production; the ghost records for IN_ROW_DATA stayed close to zero, but it didn't change the behaviour of the LOB ghost records.
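For anyone trying the same workaround, a minimal sketch of such a scan loop; the exact query used in production isn't shown above, so this is only an illustration:
WHILE 1 = 1
BEGIN
    -- Repeatedly scan the clustered index so the ghost cleanup task revisits the pages;
    -- in this case it kept IN_ROW_DATA ghost records near zero but did not touch the LOB ghosts
    SELECT COUNT_BIG(*) FROM dbo.T_Test WITH (INDEX(1));
    WAITFOR DELAY '00:00:30';
END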
I have restarted the SQL Server process and the issue has been resolved.
See also cumulative update 4 for SQL Server 2008 R2 SP1:
http://support.microsoft.com/kb/2622823
