SQL Server Row Length - sql-server

I'm attempting to determine the row length in bytes of a table by executing the following script:
CREATE TABLE #tmp
(
[ID] int,
Column_name varchar(640),
Type varchar(640),
Computed varchar(640),
Length int,
Prec int,
Scale int,
Nullable varchar(640),
TrimTrailingBlanks varchar(640),
FixedLenNullInSource varchar(640),
Collation varchar(256)
)
INSERT INTO #tmp exec sp_help MyTable
SELECT SUM(Length) FROM #tmp
DROP TABLE #tmp
The problem is that I don't know the table definition (data types, etc.) of the result set returned by sp_help.
I get the following error:
Insert Error: Column name or number of supplied values does not match table definition.
Looking at the sp_help stored procedure does not give me any clues.
What is the proper CREATE TABLE statement to insert the results of a sp_help?

How about doing it this way instead?
CREATE TABLE tblShowContig
(
ObjectName CHAR (255),
ObjectId INT,
IndexName CHAR (255),
IndexId INT,
Lvl INT,
CountPages INT,
CountRows INT,
MinRecSize INT,
MaxRecSize INT,
AvgRecSize INT,
ForRecCount INT,
Extents INT,
ExtentSwitches INT,
AvgFreeBytes INT,
AvgPageDensity INT,
ScanDensity DECIMAL,
BestCount INT,
ActualCount INT,
LogicalFrag DECIMAL,
ExtentFrag DECIMAL
)
GO
INSERT tblShowContig
EXEC ('DBCC SHOWCONTIG WITH TABLERESULTS')
GO
SELECT * from tblShowContig WHERE ObjectName = 'MyTable'
GO

Try this:
-- Sum up lengths of all columns
select SUM(sc.length)
from syscolumns sc
inner join systypes st on sc.xusertype = st.xusertype
where sc.id = object_id('table')
-- Look at various items returned
select st.name, sc.*
from syscolumns sc
inner join systypes st on sc.xusertype = st.xusertype
where sc.id = object_id('table')
No guarantees, but it appears to be the same Length value that sp_help 'table' reports.
DISCLAIMER:
Note that I read the article linked by John Rudy and in addition to the maximum sizes here you also need other things like the NULL bitmap to get the actual row size. Also the sizes here are maximum sizes. If you have a varchar column the actual size is less on most rows....
Vendoran has a nice solution, but I do not see the maximum row size anywhere (based on table definition). I do see the average size and all sorts of allocation information which is exactly what you need to estimate DB size for most things.
If you are interested in just what sp_help returns for Length and adding it up, then I think (I'm not 100% sure) that the query against syscolumns returns those same numbers. Do they represent the full maximum row size? No, you are missing things like the NULL bitmap. Do they represent a realistic measure of your actual data? No. Again, VARCHAR(500) does not take 500 bytes if you are only storing 100 characters. Also, TEXT fields and other fields stored separately from the row do not show their actual size, just the size of the pointer.
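If what you actually want is the size of the data currently stored in each row (rather than the declared maximums), a rough sketch is to sum DATALENGTH over the columns you care about; the table and column names here are placeholders:
-- Approximate per-row data size for the listed columns
-- (ignores row overhead such as the NULL bitmap and the variable-length offset array)
SELECT TOP (10)
ISNULL(DATALENGTH(Col1), 0)
+ ISNULL(DATALENGTH(Col2), 0)
+ ISNULL(DATALENGTH(Col3), 0) AS ApproxDataBytes
FROM dbo.MyTable;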

None of the aforementioned answers is correct or valid.
The question is one of determining the number of bytes consumed per row by each column's data type.
The only method(s) I have that work are:
exec sp_help 'mytable' - then add up the Length field of the second result set (If working from Query Analyzer or Management Studio - simply copy and paste the result into a spreadsheet and do a SUM)
Write a C# or VB.NET program that accesses the second resultset and sums the Length field of each row.
Modify the code of sp_help.
This cannot be done using Transact SQL and sp_help because there is no way to deal with multiple resultsets.
FWIW: The table definitions of the resultsets can be found here:
http://msdn.microsoft.com/en-us/library/aa933429(SQL.80).aspx

I can't help you with creating a temp table to store sp_help information, but I can help you with calculating row lengths. Check out this MSDN article; it helps you calculate such based on the field lengths, type, etc. Probably wouldn't take too much to convert it into a SQL script you could reuse by querying against sysobjects, etc.
EDIT:
I'm redacting my offer to do a script for it. My way was nowhere near as easy as Vendoran's. :)
As an aside, I take back what I said earlier about not being able to help with the temp table. I can: You can't do it. sp_help outputs seven rowsets, so I don't think you'll be able to do something as initially described in the original question. I think you're stuck using a different method to come up with it.

This will give you all the information you need
Select * into #mytables
from INFORMATION_SCHEMA.columns
select * from #mytables
drop table #mytables
UPDATE:
The answer I gave was incomplete, NOT incorrect. If you look at the data returned, you'd realize that you could write a query using CASE to calculate a row's size in bytes. It has all you need: the data type, size, and precision. BOL lists the bytes used by each data type.
I will post the complete answer when I get a chance.
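In the meantime, here is a rough sketch of that idea (not a complete answer): it adds up declared maximum sizes from INFORMATION_SCHEMA.COLUMNS, the per-type byte counts are simplified, row overhead is ignored, and MAX types will skew the total (CHARACTER_MAXIMUM_LENGTH is -1 for those):
-- Approximate maximum in-row bytes per table, based on declared column types
SELECT TABLE_SCHEMA, TABLE_NAME,
SUM(CASE
WHEN DATA_TYPE IN ('char', 'varchar') THEN CHARACTER_MAXIMUM_LENGTH
WHEN DATA_TYPE IN ('nchar', 'nvarchar') THEN CHARACTER_MAXIMUM_LENGTH * 2
WHEN DATA_TYPE IN ('bigint', 'float', 'datetime', 'money') THEN 8
WHEN DATA_TYPE IN ('int', 'real', 'smalldatetime', 'smallmoney') THEN 4
WHEN DATA_TYPE = 'smallint' THEN 2
WHEN DATA_TYPE IN ('tinyint', 'bit') THEN 1
WHEN DATA_TYPE IN ('decimal', 'numeric') THEN 17 -- worst case
ELSE 16 -- rough guess for anything else (uniqueidentifier, etc.)
END) AS ApproxMaxRowBytes
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY TABLE_SCHEMA, TABLE_NAME
ORDER BY ApproxMaxRowBytes DESC;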

Related

Converting bigint to smallint shows an error

ALTER TABLE employee
ALTER COLUMN emp_phoneNo SMALLINT;
I am trying to alter the data type from BIGINT to SMALLINT and it is showing this error:
Arithmetic overflow error converting expression to data type int.
I am not able to understand what is wrong.
You have existing rows with values in that specific column that are bigger than the new data type allows.
You need to update or delete the rows that are currently "oversized".
(or not perform the column alter at all, because most likely you don't want to lose the information)
You can find the rows with this query:
SELECT 'CurrentlyOverSized' as MyLabel, * FROM dbo.employee WHERE ABS(emp_phoneNo) > 32767
Note that a phone number like 5555555555 (the numeric form of 555-555-5555) is far greater than 32767.
Even 5555555 (555-5555, with no area code) is too big for 32767.
Also
A debatable topic, but whether to use a number or a string for storing phone numbers is worth considering; check out this link for food for thought:
What datatype should be used for storing phone numbers in SQL Server 2005?
Personally I think numeric is the wrong data type for phone numbers.
Whatever you do, be consistent. If you go with a string (varchar(n), for example), store them with no extra characters (5555555555), with hyphens (555-555-5555), or with dots (555.555.5555); whatever you choose, do them all the same way would be my advice.
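If you do decide to go the string route, a minimal sketch of the conversion (the length and nullability here are assumptions; pick values that match your data and formatting choice):
-- Convert the phone number column to a string type instead of shrinking it;
-- SQL Server converts the existing BIGINT values to their digit strings
ALTER TABLE dbo.employee
ALTER COLUMN emp_phoneNo VARCHAR(15) NULL;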

T-SQL joining external variables into tables

I've come from application dev and have been thrust into web dev, and I'm getting my head around asymmetrical data requests/returns and how to handle them.
I need to make a number of SQL requests, and thought the best way to manage which ones are returned would be to insert a UUID or something similar into the returned SQL table.
Also, in general I'm pretty basic with my SQL, but I want to add an external value into my returned table, where @ext would be the external data added in from the original request.
SELECT *
FROM
@ext AS uuid,
dbo.Orders
WHERE ....
expected return table
uuid: 12234
customer: jack
orderNo: 774
postAddy: 123 Albert St
...
The error I'm always getting is "Must declare the table variable "@ext"".
Is this the right approach or am I just doing something dumb?
The error message you are getting is telling you that you haven't declared the table variable @ext. This is because you've used a variable name (with the @ prefix) in the FROM clause, where a table or other table-like object is expected (i.e. a table, view, table variable, TVF, etc.).
The @ext variable appears to be a scalar (single-valued) variable, so it isn't recognised in the FROM clause. You should try something like this instead:
SELECT
-- scalar values and column names / aliases go here
@ext AS uuid, *
FROM
-- only tables, views, table variables, TVFs, etc. go here
dbo.Orders
WHERE ....
Note that if your query returns multiple rows, they will all have the same value for uuid. This may or may not be desirable, and there may be better ways to achieve what you want, in terms of managing the data that is returned from multiple queries, but this is best posed in another question once you have a working example.
Make sure you know what @ext is in your case and how to reference it properly.
If it's a scalar value, you can use it in expressions:
DECLARE @ext INT = 5
SELECT
@ext AS ScalarValue,
@ext + 10 AS ScalarOperation,
@ext + S.SomeColumn AS ScalarOperationWithTableColumn
FROM
SomeTable AS S
If it's a table variable, you can reference it as a table (as in your example):
DECLARE @ext TABLE (
FirstValue INT,
SecondValue VARCHAR(100))
INSERT INTO @ext (
FirstValue,
SecondValue)
VALUES
(10, 'SomeText'),
(20, 'AnotherText')
SELECT
E.FirstValue,
E.SecondValue
FROM
@ext AS E
/*
LEFT JOIN ....
WHERE
....
*/

Convert Date Stored as VARCHAR into INT to compare to Date Stored as INT

I'm using SQL Server 2014. My request I believe is rather simple. I have one table containing a field holding a date value that is stored as VARCHAR, and another table containing a field holding a date value that is stored as INT.
The date value in the VARCHAR field is stored like this: 2015M01
The data value in the INT field is stored like this: 201501
I need to compare these tables against each other using EXCEPT. My thought process was to somehow extract or TRIM the "M" out of the VARCHAR value and see if it would let me compare the two. If anyone has a better idea such as using CAST to change the date formats or something feel free to suggest that as well.
I am also concerned that even extracting the "M" out of the VARCHAR may still prevent the comparison since one will still remain VARCHAR and the other is INT. If possible through a T-SQL query to convert on the fly that would be great advice as well. :)
REPLACE the string and then CONVERT to integer
SELECT A.*, B.*
FROM TableA A
INNER JOIN
(SELECT intField
FROM TableB
) as B
ON CONVERT(INT, REPLACE(A.varcharField, 'M', '')) = B.intField
Since you say you already have the query and are using EXCEPT, you can simply change the definition of that one "date" field in the query containing the VARCHAR value so that it matches the INT format of the other query. For example:
SELECT Field1, CONVERT(INT, REPLACE(VarcharDateField, 'M', '')) AS [DateField], Field3
FROM TableA
EXCEPT
SELECT Field1, IntDateField, Field3
FROM TableB
HOWEVER, while I realize that this might not be feasible, your best option, if you can make this happen, would be to change how the data in the table with the VARCHAR field is stored so that it is actually an INT in the same format as the table with the data already stored as an INT. Then you wouldn't have to worry about situations like this one.
Meaning:
Add an INT field to the table with the VARCHAR field.
Do an UPDATE of that table, setting the INT field to the string value with the M removed.
Update any INSERT and/or UPDATE stored procedures used by external services (app, ETL, etc) to do that same M removal logic on the way in. Then you don't have to change any app code that does INSERTs and UPDATEs. You don't even need to tell anyone you did this.
Update any "get" / SELECT stored procedures used by external services (app, ETL, etc) to do the opposite logic: convert the INT to VARCHAR and add the M on the way out. Then you don't have to change any app code that gets data from the DB. You don't even need to tell anyone you did this.
This is one of many reasons that having a Stored Procedure API to your DB is quite handy. I suppose an ORM can just be rebuilt, but you still need to recompile, even if all of the code references are automatically updated. But making a datatype change (or even moving a field to a different table, or even replacing a field with a simple CASE statement) "behind the scenes", and masking it so that any code outside of your control doesn't know that a change happened, is not nearly as difficult as most people might think. I have done all of these operations (datatype change, move a field to a different table, replace a field with simple logic, etc.) and it buys you a lot of time until the app code can be updated. That might be another team who handles that. Maybe their schedule won't allow for making any changes in that area (plus testing) for 3 months. Ok. It will be there waiting for them when they are ready. And if there are several areas to update, then they can be done one at a time. You can even create new stored procedures to run in parallel for any updated app code to have the proper INT datatype as the input parameter. And once all references to the VARCHAR value are gone, then delete the original versions of those stored procedures.
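A minimal sketch of steps 1 and 2 above, reusing the hypothetical TableA / VarcharDateField names from the earlier example:
-- Step 1: add the INT column alongside the existing VARCHAR column
ALTER TABLE dbo.TableA
ADD IntDateField INT NULL;
GO
-- Step 2: backfill it from the VARCHAR value with the 'M' removed
UPDATE dbo.TableA
SET IntDateField = CONVERT(INT, REPLACE(VarcharDateField, 'M', ''));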
If you want everything in the first table that is not in the second, you might consider something like this:
select t1.*
from t1
where not exists (select 1
from t2
where cast(replace(t1.varcharfield, 'M', '') as int) = t2.intfield
);
This should be close enough to EXCEPT for your purposes.
I should add that you might need to include other columns in the where statement. However, the question only mentions one column, so I don't know what those are.
You could create a persisted (schema-bound, indexed) view on the table with the char column, with a computed column where the M is removed. Then you could JOIN the view to the table containing the INT column.
CREATE VIEW dbo.PersistedView
WITH SCHEMABINDING
AS
SELECT ConvertedDateCol = CONVERT(INT, REPLACE(VarcharCol, 'M', ''))
--, other columns, including the PK, etc.
FROM dbo.TablewithCharColumn;
GO
-- The first index on an indexed view must be unique and clustered
CREATE UNIQUE CLUSTERED INDEX IX_PersistedView
ON dbo.PersistedView(<the PK column>);
SELECT *
FROM dbo.PersistedView pv
INNER JOIN dbo.TableWithIntColumn ic ON pv.ConvertedDateCol = ic.IntDateCol;
If you provide the actual details of both tables, I will edit my answer to make it clearer.
A persisted view with a computed column will perform far better on the SELECT statement where you join the two columns compared with doing the CONVERT and REPLACE every time you run the SELECT statement.
However, a persisted view will slightly slow down inserts into the underlying table(s), and will prevent you from making DDL changes to the underlying tables.
If you're looking to not persist the values via a schema-bound view, you could create a non-persisted computed column on the table itself, then create a non-clustered index on that column. If you are using the computed column in WHERE or JOIN clauses, you may see some benefit.
By way of example:
CREATE TABLE dbo.PCT
(
PCT_ID INT IDENTITY(1,1) NOT NULL
CONSTRAINT PK_PCT
PRIMARY KEY CLUSTERED
, SomeChar VARCHAR(50) NOT NULL
, SomeCharToInt AS CONVERT(INT, REPLACE(SomeChar, 'M', ''))
);
CREATE INDEX IX_PCT_SomeCharToInt
ON dbo.PCT(SomeCharToInt);
INSERT INTO dbo.PCT(SomeChar)
VALUES ('2015M08');
SELECT SomeCharToInt
FROM dbo.PCT;
Results:
SomeCharToInt
-------------
201508

How to measure the size of a table in GB in SQL Server

In a previous question @Morawski said that "a table with 1,000 columns and 44,000 rows is about 330 MB; that's how much a browser uses for just a few open tabs".
How many columns and rows would the table need for its size to exceed 10 GB (supposing the table holds only double values)?
How did @Morawski conclude that 1,000 columns and 44,000 rows come to 330 MB?
Is there any script that could tell this in SQL?
There is a sproc called sp_spaceused. I don't know if this is what @Morawski used, but as an example on a dev db I had handy:
exec sp_spaceused 'aspnet_users'
gives
name rows reserved data index_size unused
------------- ------- ------------ -------- ------------ ----------
aspnet_Users 3 48 KB 8 KB 40 KB 0 KB
-- Measures tables size (in kilobytes)
-- Tested in MS SQL Server 2008 R2
declare @t table (
name nvarchar(100), [rows] int, [reserved] nvarchar(100), [data] nvarchar(100), [index_size] nvarchar(100), [unused] nvarchar(100)
)
declare @name nvarchar(100)
declare tt cursor for
Select name from sys.tables
open tt
fetch next from tt into @name
while @@FETCH_STATUS = 0
begin
insert into @t
exec sp_spaceused @name
fetch next from tt into @name
end
close tt
deallocate tt
select name as table_name, [rows] as rows_count, data + [index] as total_size, data as data_size, [index] as index_size
from (select name,
[rows],
cast (LEFT(data, LEN(data)-3) as int) data,
cast (LEFT(index_size, LEN(index_size)-3) as int) [index]
from @t
) x
order by 3 desc, 1
Not sure about the TSQL script (I'm sure it exists), but you can find it through the UI (SSMS) as follows:
1) R-click the table
2) ...Properties
3) ...Storage tab
From there, it will tell you both the "data space" and the "index space" -- so if you want a total footprint, just add those up.
EDIT
Consider also log space if you're looking for a total footprint for the table.
Here is info on the stored procedure listed in @jon's answer. Also, it references the sys views where you can query the space usage data directly. http://msdn.microsoft.com/en-us/library/ms188776.aspx
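If you would rather query the catalog views directly than call the procedure, here is a sketch that reports reserved space per table, converting 8 KB pages to GB (adjust to data or index pages as needed):
-- Reserved space per table, in GB (8 KB pages)
SELECT t.name AS table_name,
SUM(ps.reserved_page_count) * 8.0 / 1024 / 1024 AS reserved_gb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t ON t.object_id = ps.object_id
GROUP BY t.name
ORDER BY reserved_gb DESC;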
There are precise formulas to do capacity planning for SQL Server:
Estimating the Size of a Clustered Index
Estimating the Size of a Nonclustered Index
Estimating the Size of a Heap
With 1,000 columns of fixed-length doubles (that would be the float(53) SQL type, 8 bytes of storage), your row approaches the max row size limit, but it actually fits in a page. 44k rows require 44k pages (due to the huge row size, only one row fits per page); that is, at 8 KB a page, 44,000 * 8 KB = ~344 MB. If you have a clustered index, the size would increase depending on the key size; see the first link above.
But a table design of 1,000 columns is a huge code smell. Your question is very vague about the database part; your previous question never mentions a database and is about in-memory arrays, and taken together the two questions just don't make much sense.
Perhaps you are interested in reading about Sparse Columns, about EAV modeling or about XML data type.
To find the size of all tables in the database, you can use the undocumented stored procedure sp_MSforeachtable.
From SQL Shack:
There have always been some undocumented objects in SQL Server that are used internally by Microsoft, but they can be used by anybody that have access to it. One of those objects is a stored procedure called sp_MSforeachtable.
sp_MSforeachtable is a stored procedure that is mostly used to apply a T-SQL command to every table, iteratively, that exists in the current database.
Here's how you would use it:
sp_MSforeachtable 'exec sp_spaceused [?]'
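To collect the output for every table into a single result set, one common pattern is to insert into a temp table shaped like sp_spaceused's output (a sketch, using the same column layout as the cursor example above):
-- Run sp_spaceused for every table and gather the rows in one place
CREATE TABLE #SpaceUsed (
name nvarchar(128), [rows] int, reserved varchar(50), data varchar(50), index_size varchar(50), unused varchar(50)
)
EXEC sp_MSforeachtable 'INSERT INTO #SpaceUsed EXEC sp_spaceused ''?'''
SELECT * FROM #SpaceUsed ORDER BY name
DROP TABLE #SpaceUsed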

My IN clause leads to a full scan of an index in T-SQL. What can I do?

I have a SQL query with 50 parameters, such as this one.
DECLARE
@p0 int, @p1 int, @p2 int, (text omitted), @p49 int
SELECT
@p0=111227, @p1=146599, @p2=98917, (text omitted), @p49=125319
--
SELECT
[t0].[CustomerID], [t0].[Amount],
[t0].[OrderID], [t0].[InvoiceNumber]
FROM [dbo].[Orders] AS [t0]
WHERE ([t0].[CustomerID]) IN
(@p0, @p1, @p2, (text omitted), @p49)
The estimated execution plan shows that the database will collect these parameters, order them, and then read the index Orders.CustomerID from the smallest parameter to the largest, then do a bookmark lookup for the rest of the record.
The problem is that the smallest and largest parameters could be quite far apart, and this can lead to reading possibly the entire index.
Since this is being done in a loop from the client side (50 params sent each time, for 1000 iterations), this is a bad situation. How can I formulate the query/client side code to get my data without repetitive index scanning while keeping the number of round trips down?
I thought about ordering the 50k parameters so that smaller ranges of the index would be read. There is a weird mitigating circumstance that prevents this, so I can't use that solution. To model this circumstance, just assume that I only have 50 IDs available at any time and can't control their relative position in the global list.
Insert the parameters into a temporary table, then join it with your table:
DECLARE @params AS TABLE(param INT);
INSERT
INTO #params
VALUES (@p1)
...
INSERT
INTO #params
VALUES (@p49)
SELECT
[t0].[CustomerID], [t0].[Amount],
[t0].[OrderID], [t0].[InvoiceNumber]
FROM @params AS p, [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] = p.param
This will most probably use NESTED LOOPS with an INDEX SEEK over CustomerID on each loop.
An index range scan is pretty fast. There's usually a lot less data in the index than in the table and there's a much better chance that the index is already in memory.
I can't blame you for wanting to save round trips to the server by putting all the IDs you're looking for in a bundle. If the index RANGE scan really worries you, you can create a parameterized server-side cursor (e.g., in T-SQL) that takes the CustomerID as a parameter. Stop as soon as you find a match. That query should definitely use an index unique scan instead of a range scan.
To build on Quassnoi's answer, if you were working with SQL 2008, you could save yourself some time by inserting all 50 items with one statement. SQL 2008 has a new feature for multi-valued inserts.
e.g.
INSERT INTO #Customers (CustID)
VALUES (@p0),
(@p1),
<snip>
(@p49)
Now the #Customers table is populated and ready to INNER JOIN on, or to use in your IN clause.
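For example, a sketch of the join against the populated table (column names assumed from the question):
-- Join the 50 IDs against Orders; the optimizer can seek the CustomerID index once per row in #Customers
SELECT [t0].[CustomerID], [t0].[Amount],
[t0].[OrderID], [t0].[InvoiceNumber]
FROM #Customers AS c
INNER JOIN [dbo].[Orders] AS [t0]
ON [t0].[CustomerID] = c.CustID;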
