SQL Server row size limit and table design

I have this CREATE TABLE statement on SQL Server 2008:
CREATE TABLE MediaLibrary
(
MediaId bigint NOT NULL IDENTITY (1, 1),
MediaTypeId smallint NOT NULL,
ImageNameByUser nchar(100) NULL,
GeneratedName uniqueidentifier NOT NULL,
UploadedByUserId uniqueidentifier NOT NULL,
UploadedDate date NOT NULL,
ProfilePhoto bit NOT NULL,
PublicPhoto bit NOT NULL,
AppointmentId bigint NULL,
OriginalImage nchar(1000) NULL,
ThumbImage nchar(1000) NULL,
MediumImage nchar(1000) NULL,
LargeImage nchar(1000) NULL,
UrlThumb nchar(1000) NULL,
UrlMedium nchar(1000) NULL,
UrlLarge nchar(1000) NULL,
InactiveReasonId smallint NULL,
InactiveDate datetime NULL
) ON [PRIMARY]
GO
When I attempt to create the table, I get this error:
Creating or altering table 'MediaLibrary' failed because the minimum row size would be 14273, including 9 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.
I get that I am hitting the limit on row size, but this is not a big table, so I am wondering whether this is a poor design.
When I changed the nchar(1000) columns to varchar(1000), the table saved fine. My concern is that once data is actually saved into the table, I will hit the row size limit again.

Assuming you're not going to populate all columns, you need to use nvarchar (or just varchar) and not nchar (or char). The reason is that an nchar(1000) needs to reserve 2000 bytes, whether you're going to use it or not. This isn't true for varchar/nvarchar.
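A quick way to see the difference is to compare DATALENGTH for the two types. This is only a minimal sketch, and the variable names are made up for illustration. The last statement shows one way to account for the 14273 bytes in the error message, assuming the two bit columns share a single byte.

```sql
-- Hypothetical variables, used only to compare storage sizes.
DECLARE @fixed nchar(1000) = N'abc';
DECLARE @variable nvarchar(1000) = N'abc';

SELECT DATALENGTH(@fixed)    AS FixedBytes,    -- padded to 1000 characters: 2000 bytes
       DATALENGTH(@variable) AS VariableBytes; -- only the 3 characters stored: 6 bytes

-- Fixed-length bytes per row in the original MediaLibrary definition:
--   7 x nchar(1000) = 14000, nchar(100) = 200, 2 x bigint = 16,
--   2 x smallint = 4, 2 x uniqueidentifier = 32, date = 3,
--   2 x bit = 1, datetime = 8, plus the 9 bytes of internal overhead
SELECT 14000 + 200 + 16 + 4 + 32 + 3 + 1 + 8 + 9 AS MinimumRowSize; -- 14273
```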
Now, if you are potentially going to have 1000 characters in each of these columns, it's not going to work no matter what data type you use. The reason is that the fundamental storage element in SQL Server is an 8K page, so it's not possible to store a row with more than ~8K (there is some page header overhead, as well as other bits that may be used depending on the data types of the columns). The typical workarounds are to:
- use varchar(max), which can store data that doesn't fit in the row off-row as a blob; there is a performance overhead for this, and it can introduce some limitations, e.g. on the ability to perform online rebuilds
- change the table structure, so that these URLs are stored as separate rows in a separate table. Example:
CREATE TABLE dbo.Media
(
MediaID BIGINT IDENTITY(1,1) PRIMARY KEY,
MediaTypeID SMALLINT NOT NULL,
ImageNameByUser NVARCHAR(100) NULL, -- should also not be nchar
GeneratedName UNIQUEIDENTIFIER NOT NULL,
UploadedByUserId UNIQUEIDENTIFIER NOT NULL,
UploadedDate date NOT NULL,
ProfilePhoto bit NOT NULL,
PublicPhoto bit NOT NULL,
AppointmentId bigint NULL,
InactiveReasonId smallint NULL,
InactiveDate datetime NULL
);
CREATE TABLE dbo.URLTypes
(
URLTypeID TINYINT NOT NULL PRIMARY KEY,
Description NVARCHAR(32) NOT NULL UNIQUE
);
INSERT dbo.URLTypes VALUES(1,'OriginalImage'),(2,'ThumbImage'),...;
CREATE TABLE dbo.MediaURLs
(
MediaID BIGINT NOT NULL FOREIGN KEY REFERENCES dbo.Media(MediaID),
URLTypeID TINYINT NOT NULL FOREIGN KEY REFERENCES dbo.URLTypes(URLTypeID),
URL VARCHAR(2048) NOT NULL
);
As an aside, are you really going to need to support Unicode for URLs?
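With that structure, fetching every URL for one media item becomes a simple join. A minimal sketch against the tables above; @MediaID and its value are placeholders, not part of the original answer.

```sql
DECLARE @MediaID BIGINT = 1; -- placeholder value

-- One row per stored URL, labelled with its type description.
SELECT t.Description, u.URL
FROM dbo.MediaURLs AS u
INNER JOIN dbo.URLTypes AS t
    ON t.URLTypeID = u.URLTypeID
WHERE u.MediaID = @MediaID;
```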

I was facing a similar issue with a small table of no more than 7 columns, all of them integers.
It turned out that the developers had found it faster to drop and recreate one of the columns (to reset its value) than to update the value in every row.
Dropping a column does not immediately reclaim its space in the row, so once the column had been recreated too many times, it triggered this error:
Creating or altering table 'TableName' failed because the minimum row size would be XX, including X bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.
The solution was to run:
ALTER TABLE [TableName] REBUILD
I hope this helps someone

Related

Cannot create a row of size 8084 which is greater than the allowable maximum row size of 8060

I have an existing table like the one below, which has a char(4000) column to save the document path. The table already holds a lot of data.
CREATE TABLE [dbo].[Document_Master] (
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[TMasterID] [bigint] NULL,
[DocumentName] [varchar](100) NULL,
[DocumentPath] [nchar](4000) NULL,
[IsMisc] [bit] NOT NULL,
[ReceivedDate] [date] NULL
)
GO
Now I want to change the DocumentPath column from char(4000) to nvarchar(255). When I try to alter the column type, I get the error below.
ALTER TABLE Document_Master
ALTER column DocumentPath nvarchar(255)
go
Error:
Warning: The table "Document_Master" has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit.
Msg 511, Level 16, State 1, Line 22
Cannot create a row of size 8084 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.
Any help would be appreciated.
You wrote: "now I want to change the DocumentPath column from char(4000) to nvarchar(max)"
That is NOT what you are doing, you know:
ALTER TABLE Document_Master
ALTER column DocumentPath nvarchar(4000)
This is NOT setting it to max, sorry. It says 4000, not max. 4000 is already comically long for a path (seriously, that is a path 1.7 printed pages long). I would suggest taking a really good look at this and choosing a realistic length rather than max.

SQL Server - Order Identity Fields in Table

I have a table with this structure:
CREATE TABLE [dbo].[cl](
[ID] [int] IDENTITY(1,1) NOT NULL,
[NIF] [numeric](9, 0) NOT NULL,
[Name] [varchar](80) NOT NULL,
[Address] [varchar](100) NULL,
[City] [varchar](40) NULL,
[State] [varchar](30) NULL,
[Country] [varchar](25) NULL,
Primary Key([ID],[NIF])
);
Imagine that this table has 3 records: 1, 2 and 3.
Whenever I delete record number 2, the IDENTITY field leaves a gap. The table then has record 1 and record 3. That's not correct!
Even if I use:
DBCC CHECKIDENT('cl', RESEED, 0)
It does not solve my problem, because it sets the ID of the next inserted record to 1. That's not correct either, because the table will then have a duplicate ID.
Does anyone have a clue about this?
No database is going to reseed or recalculate an auto-increment/identity column to reuse values between existing IDs, as in your example. This is impractical on many levels; for example:
- Integrity: a reused ID could mean that records in other systems still refer to the old row when the new row is saved
- Performance: the engine would have to find the lowest gap for every row inserted
In MySQL, this is not really happening either (at least in InnoDB or MyISAM; are you using something different?). In InnoDB, the behavior is identical to SQL Server: the counter is managed outside of the table, so deleted values or rolled-back transactions leave gaps between the last value and the next insert. In MyISAM, the value is calculated at insertion time instead of being managed through an external counter. This calculation is what gives the impression of the value being recalculated; it is simply never calculated until actually needed (MAX(Id) + 1). Even this won't insert inside gaps (like the id = 2 in your example).
Many people will argue if you need to use these gaps, then there is something that could be improved in your data model. You shouldn't ever need to worry about these gaps.
If you insist on using those gaps, your fastest method would be to log deletes in a separate table, then use an INSTEAD OF INSERT trigger to perform the inserts with your intended keys by first looking for records in these deletions table to re-use (then deleting them to prevent re-use) and then using the MAX(Id) + 1 for any additional rows to insert.
I guess what you want is something like this:
create table dbo.cl
(
SurrogateKey int identity(1, 1)
primary key
not null,
ID int not null,
NIF numeric(9, 0) not null,
Name varchar(80) not null,
Address varchar(100) null,
City varchar(40) null,
State varchar(30) null,
Country varchar(25) null,
unique (ID, NIF)
)
go
I added a surrogate key so you'll have the best of both worlds. Now you just need a trigger on the table to "adjust" the ID whenever some prior ID gets deleted:
create trigger tr_on_cl_for_auto_increment on dbo.cl
after delete, update
as
begin
update dbo.cl
set ID = d.New_ID
from dbo.cl as c
inner join (
select c2.SurrogateKey,
row_number() over (order by c2.SurrogateKey asc) as New_ID
from dbo.cl as c2
) as d
on c.SurrogateKey = d.SurrogateKey
end
go
Of course this solution also implies that you'll have to ensure (whenever you insert a new record) that you check for yourself which ID to insert next.
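One way to pick the next ID at insert time is sketched below. It assumes the dbo.cl table with the surrogate key defined above; the inserted values are placeholders, and the locking hints are just one option for keeping two concurrent inserts from computing the same ID.

```sql
-- Take the next free ID as MAX(ID) + 1 (or 1 for an empty table).
-- UPDLOCK + HOLDLOCK hold the range until the statement's transaction ends,
-- so a concurrent insert cannot read the same MAX(ID).
INSERT INTO dbo.cl (ID, NIF, Name)
SELECT ISNULL(MAX(ID), 0) + 1, 123456789, 'Example name' -- placeholder values
FROM dbo.cl WITH (UPDLOCK, HOLDLOCK);
```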

How can I block access for a given index value for the duration of a transaction in SQL Server 2012?

With the table specified below, how could I modify the locking behavior in SQL Server 2012 to block statements and transactions attempting to select data pertaining to a specific UserId column?
I have been attempting to come up with a stored procedure that will successfully change an address for a specific user. In order to do this, the existing record is marked as deleted by setting DeletedOn to the current date. Afterward, the new record is inserted. I do not want any queries to be able to see that no valid address is present for the given user in the table between the deletion mark and the insertion.
Queries related to a different user's address should be able to complete, so long as that user's address is not in the process of being modified.
CREATE TABLE [Address]
(
[Id] BIGINT NOT NULL,
[UserId] BIGINT NOT NULL FOREIGN KEY REFERENCES [User]([Id]), -- data type assumed; it must match [User].[Id]
[House] CHARACTER VARYING(255) NOT NULL,
[Street] CHARACTER VARYING(255) NOT NULL,
[City] CHARACTER VARYING (255) NOT NULL,
[State] CHARACTER VARYING(255) NOT NULL,
[Zip] CHARACTER VARYING(15) NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL,
[DeletedOn] DATETIMEOFFSET NULL,
UNIQUE([UserId], [DeletedOn]),
CHECK(([DeletedOn] IS NULL) OR ([CreatedOn] <= [DeletedOn])),
PRIMARY KEY([Id])
);
Using a history table solved this issue. It seems that UNIQUE constraints cause lots of lock escalations when they are defined as composites.
The history table now tracks all of the old versions of a particular record and history inserts are combined with live table updates in a repeatable read transaction.
What do you know, I was approaching the whole problem the wrong way!
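The history-table approach might look roughly like the sketch below. It is not the poster's actual code: [AddressHistory] and its columns are hypothetical, the live [Address] table is assumed to hold one row per user with no DeletedOn column, and the variable values are placeholders. Error handling is omitted for brevity.

```sql
-- Placeholder parameters; in practice these would be stored procedure arguments.
DECLARE @UserId BIGINT       = 42,            -- type assumed; must match [User].[Id]
        @House  VARCHAR(255) = '221B',
        @Street VARCHAR(255) = 'Baker Street',
        @City   VARCHAR(255) = 'London',
        @State  VARCHAR(255) = 'N/A',
        @Zip    VARCHAR(15)  = 'NW1 6XE';

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

    -- Archive the current version of the row into the (hypothetical) history table.
    -- UPDLOCK takes an update lock up front, since the same row is modified next.
    INSERT INTO [AddressHistory]
        ([AddressId], [UserId], [House], [Street], [City], [State], [Zip], [CreatedOn], [ArchivedOn])
    SELECT [Id], [UserId], [House], [Street], [City], [State], [Zip], [CreatedOn], SYSDATETIMEOFFSET()
    FROM [Address] WITH (UPDLOCK)
    WHERE [UserId] = @UserId;

    -- Overwrite the live row in place, so there is never a moment
    -- at which the user has no valid address.
    UPDATE [Address]
    SET [House] = @House, [Street] = @Street, [City] = @City,
        [State] = @State, [Zip] = @Zip, [CreatedOn] = SYSDATETIMEOFFSET()
    WHERE [UserId] = @UserId;

COMMIT TRANSACTION;
```

Because only the rows for @UserId are locked, queries for other users' addresses are not blocked by this transaction.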

Store Time series data in SQL Server, and pull the data dynamically

I am trying to design a relational database that also stores time series data.
For example, I have one table
CREATE TABLE [dbo].[Fund](
[FundID] [int] NOT NULL,
[FundName] [nvarchar](50) NULL,
[FundCurrency] [nchar](3) NOT NULL,
CONSTRAINT [PK_Fund] PRIMARY KEY CLUSTERED ([FundID])
)
I have another table to store the data. Apart from the first column, 'Dates', every column is named f + FundID, e.g. f1001.
CREATE TABLE [dbo].[FundData](
[Dates] [datetime] NOT NULL,
[f1001] [float] NULL,
CONSTRAINT [PK_FundData] PRIMARY KEY CLUSTERED ([Dates])
)
I don't know whether this naive approach is efficient, although the data volume will be small: daily quotes, at most 10 years of daily data, and at most 500 columns.
The real problem I am facing is how to create a UDF that returns a series of data with dates, given a FundID, BeginDate and EndDate, or how to create a stored procedure that returns a single quote, given a FundID and Date.
Since UDFs don't accept dynamic column names, I really don't know how to achieve this, whether by redesigning how the data is stored or by some smart way of writing the UDF.
Thank you very much in advance.
You should add FundId as a foreign key to FundData and add a begin and end date for each record (assuming I understand the problem correctly):
CREATE TABLE [dbo].[FundData](
[Dates] [datetime] NOT NULL,
[FundId] int NOT NULL,
[Value] [float] NULL,
[BeginDate] [datetime],
[EndDate] [datetime],
CONSTRAINT [PK_FundData] PRIMARY KEY CLUSTERED ([FundId], [Dates]), -- key columns assumed
CONSTRAINT [FK_FundData_Fund] FOREIGN KEY ([FundId]) REFERENCES [dbo].[Fund]([FundID])
)
You can then do the following, where @FundId, @BeginDate and @EndDate are all T-SQL variables or stored procedure parameters:
SELECT Value
FROM FundData
WHERE FundID = @FundId
AND BeginDate >= @BeginDate
AND EndDate <= @EndDate
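Once the data is stored as one row per fund per date, the UDF asked about in the question no longer needs dynamic column names. A minimal sketch as an inline table-valued function follows; dbo.GetFundSeries is a hypothetical name, and the sketch filters on the [Dates] column rather than on BeginDate/EndDate, which is an assumption about the intended meaning.

```sql
-- Hypothetical function; assumes dbo.FundData has FundId, Dates and Value columns.
CREATE FUNCTION dbo.GetFundSeries
(
    @FundId    int,
    @BeginDate datetime,
    @EndDate   datetime
)
RETURNS TABLE
AS
RETURN
(
    SELECT [Dates], [Value]
    FROM dbo.FundData
    WHERE [FundId] = @FundId
      AND [Dates] >= @BeginDate
      AND [Dates] <= @EndDate
);
GO

-- Example call with placeholder arguments.
SELECT [Dates], [Value]
FROM dbo.GetFundSeries(1001, '20100101', '20101231')
ORDER BY [Dates];
```

Passing the same date as @BeginDate and @EndDate returns the single quote for that day.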

SQL design for various data types

I need to store data in a SQL Server 2008 database from various data sources with different data types. Data types allowed are: Bit, Numeric (1, 2 or 4 bytes), Real and String. There is going to be a value, a timestamp, a FK to the item of which the value belongs and some other information for the data stored.
The most important points are the read performance and the size of the data. There might be a couple thousand items and each item may have millions of values.
I have 5 possible options:
1. Separate tables for each data type (ValueBit, ValueTinyInt, ValueSmallInt, etc... tables)
2. Separate tables with inheritance (Value table as base table, ValueBit table just for storing the Bit value, etc...)
3. Single value table for all data types, with separate fields for each data type (Value table, with ValueBit BIT, ValueTinyInt TINYINT etc...)
4. Single table and single value field using sql_variant
5. Single table and single value field using UDT
With option 2, a PK is a must, and:
- 1,000 items * 10,000,000 values each > Int32.Max
- 1,000 items * 10,000,000 values each * 8-byte BigInt PK is huge
Other than that, I am considering option 1 or 3 with no PK. Will they differ in size?
I do not have experience with 4 or 5 and I do not think that they will perform well in this scenario.
Which way shall I go?
Your question is hard to answer, as you seem to be using a relational database system for something it is not designed for. The data you want to keep in the database seems too unstructured to benefit much from a relational database system. Database designs built mostly from fields like "parameter type" and "parameter value", which try to cover very generic situations, are generally considered bad designs. Maybe you should consider using a non-relational database like BigTable. If you really want to use a relational database system, I'd strongly recommend reading Beginning Database Design by Clare Churcher. It's an easy read, but it gets you on the right track with respect to RDBMSs.
What are usage scenarios? Start with samples of queries and calculate necessary indexes.
Consider data partitioning as mentioned before. Try to understand your data / relations more. I believe the decision should be based on business meaning/usages of the data.
I think it's a great question - This situation is fairly common, though it is awkward to make tables to support it.
In terms of performance, a table like the one in #3 potentially wastes a huge amount of storage and RAM, because for each row you allocate space for a value of every type but only use one. The new sparse columns feature of SQL Server 2008 could help, but there are other issues too: it's a little hard to constrain/normalize, because you want only one of the multiple values to be populated for each row. Having two values in two columns would be an error, but the design doesn't reflect that. I'd cross that off.
So, if it were me, I'd be looking at option 1, 2 or 4, and the decision would be driven by this: do I typically need to make one query returning rows that have a mix of values of different types in the same result set? Or will I almost always ask for the rows by item and by type? I ask because if the values are different types, it implies to me some difference in the source or the use of that data (you are unlikely, for example, to compare a string and a real, or a string and a bit). This is relevant because having different tables per type might actually be a significant performance/scalability advantage, if partitioning the data that way makes queries faster. Partitioning data into smaller sets of more closely related data can give a performance advantage.
It's like having all the data in one massive (albeit sorted) set or having it partitioned into smaller, related sets. The smaller sets favor some types of queries, and if those are the queries you will need, it's a win.
Details:
CREATE TABLE [dbo].[items](
[itemid] [int] IDENTITY(1,1) NOT NULL,
[item] [varchar](100) NOT NULL,
CONSTRAINT [PK_items] PRIMARY KEY CLUSTERED
(
[itemid] ASC
)
)
/* This table has the problem of allowing two values
in the same row, plus allocates but does not use a
lot of space in memory and on disk (bad): */
CREATE TABLE [dbo].[vals](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[valueBit] [bit] NULL,
[valueNumericA] [numeric](2, 0) NULL,
[valueNumericB] [numeric](8, 2) NULL,
[valueReal] [real] NULL,
[valueString] [varchar](100) NULL,
CONSTRAINT [PK_vals] PRIMARY KEY CLUSTERED
(
[itemid] ASC,
[datestamp] ASC
)
)
ALTER TABLE [dbo].[vals] WITH CHECK
ADD CONSTRAINT [FK_vals_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[vals] CHECK CONSTRAINT [FK_vals_items]
GO
/* This is probably better, though casting is required
all the time. If you search with the variant as criteria,
that could get dicey as you have to be careful with types,
casting and indexing. Also everything is "mixed" in one
giant set */
CREATE TABLE [dbo].[allvals](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[value] [sql_variant] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[allvals] WITH CHECK
ADD CONSTRAINT [FK_allvals_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[allvals] CHECK CONSTRAINT [FK_allvals_items]
GO
/* This would be an alternative, but you trade multiple
queries and joins for the casting issue. OTOH the implied
partitioning might be an advantage */
CREATE TABLE [dbo].[valsBits](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] [bit] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[valsBits] WITH CHECK
ADD CONSTRAINT [FK_valsBits_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[valsBits] CHECK CONSTRAINT [FK_valsBits_items]
GO
CREATE TABLE [dbo].[valsNumericA](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] numeric( 2, 0 ) NOT NULL
) ON [PRIMARY]
GO
... FK constraint ...
CREATE TABLE [dbo].[valsNumericB](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] numeric ( 8, 2 ) NOT NULL
) ON [PRIMARY]
GO
... FK constraint ...
etc...
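For the sql_variant design, the casting mentioned in the comments above might look like this. A minimal sketch against dbo.allvals; filtering on the base type first is one way to avoid casting values of the wrong type.

```sql
-- Return only the rows whose variant actually holds a real,
-- then cast the value back to its base type for the caller.
SELECT itemid,
       datestamp,
       CAST([value] AS real) AS realValue
FROM dbo.allvals
WHERE SQL_VARIANT_PROPERTY([value], 'BaseType') = 'real';
```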

Resources