Does a SELECT COUNT(*) query have to do a full table scan? - sql-server

Does a query that gets the count of all rows in a table have to do a full table scan or does SQL Server maintain a count of rows somewhere?
SELECT COUNT(*) FROM TABLE_NAME;
The table TABLE_NAME has a primary key, and therefore a clustered index, and looks like so:
CREATE TABLE TABLE_NAME
(
Id int PRIMARY KEY IDENTITY(1, 1),
Name nvarchar(50) NOT NULL
);
I am using Microsoft SQL Server 2014.

The server will always read all records (if there's an index then it will scan the entire index) to count the rows. You can't escape this as long as you are doing SELECT COUNT(*) FROM Table.
If your table has a clustered index, you can change your query to an "under the hood" query to retrieve the count without actually fetching the records with:
SELECT OBJECT_NAME(i.id) [Table_Name], i.rowcnt [Row_Count]
FROM sys.sysindexes i WITH (NOLOCK)
WHERE i.indid in (0,1)
ORDER BY i.rowcnt desc
if you are looking for an approximate count of the records, you can also use the following query:
SELECT
TableName = t.NAME,
SchemaName = s.Name,
[RowCount] = p.rows,
TotalSpaceMB = CONVERT(DECIMAL(18,2), SUM(a.total_pages) * 8 / 1024.0),
UsedSpaceMB = CONVERT(DECIMAL(18,2), SUM(a.used_pages) * 8 / 1024.0),
UnusedSpaceMB = CONVERT(DECIMAL(18,2), (SUM(a.total_pages) - SUM(a.used_pages)) * 8 / 1024.0)
FROM
sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name,
s.Name,
p.Rows
ORDER BY
TotalSpaceMB DESC
This will show non-system tables with their calculated (not exact) row count and the sum of the sizes of their data (with any index they might have), relatively fast without retrieving the records.

When SQL Server performs a query like SELECT COUNT(*), SQL Server will use the narrowest non-clustered index to count the rows. If the table does not have any non-clustered index, it will have to scan the table.
If your table has a clustered index you can get your count even faster.

SELECT COUNT(*) FROM TABLE_NAME;
Does a full table scan.
For optimizations you can refer to this.

you can following way. it is better in performance I guess.
SELECT COUNT(1) FROM TABLE_NAME

Related

Tsql a way to find empty tables without using sys. Partitions

Is there a way I can find all empty tables in a database without using sys. Partition rows?
You can use this query:
SELECT i.rowcnt, o.NAME
FROM sysindexes AS i
INNER JOIN sysobjects AS o ON i.id = o.id
WHERE i.indid < 2 AND OBJECTPROPERTY(o.id, 'IsMSShipped') = 0
AND i.rowcnt = 0
ORDER BY i.rowcnt desc

Query to return distinct values in fields containing string across multiple tables

The following query looks for fields within a db that contain '%string%' and returns them in a table with 5 columns; Schema, Table, Number of fields containing desired string in title of field within table, number of rows in table, and finally the field names.
SELECT s.name schemaName
, t.name tabName
, COUNT(c.name) OVER (PARTITION BY t.name ORDER BY t.name) totalColsWithString
, rc.row_count
, c.name colName
FROM sys.all_columns c
JOIN sys.tables t ON (t.object_id = c.object_id)
JOIN sys.schemas s ON (s.schema_id = t.schema_id)
LEFT JOIN
(
SELECT o.name
, ddps.row_count
FROM sys.indexes i
JOIN sys.objects o ON (i.object_id = o.object_id)
JOIN sys.dm_db_partition_stats AS ddps ON (i.object_id = ddps.object_id AND i.index_id = ddps.index_id)
WHERE i.index_id < 2 AND o.is_ms_shipped = 0
) rc ON (rc.name = t.name)
WHERE c.name LIKE '%String%'
AND row_count <> 0;
What I now want is a field that shows the number of distinct values in those fields which contain 'string' in the title (in all the columns returned in above query).
Does MS SQL Server store any info about distinct values in fields? Can it be made to?
Updated answer
You actually should consider to run 2 queries in that case. Or use subqueries, which may cause performance issues at a big dataset. Subqueries could look like this
SELECT [...],
(SELECT COUNT(*) FROM all_columns c2 WHERE c2.name= c.name) AS totalcolswithstring
FROM [...]
I set up a fiddle for you SqlFiddle

How to find list of tables having no records in SQL server

How to display a list of tables having no record in them and they are existing in the sql server database.Required is only to show tables with no record in them.
Try this:
SELECT
t.NAME AS TableName,
p.rows AS RowCounts
FROM
sys.tables t
INNER JOIN
sys.partitions p ON t.object_id = p.OBJECT_ID
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND p.rows = 0
GROUP BY
t.Name, p.Rows
ORDER BY
t.Name
The query goes to the sys.tables and other catalog views to find the tables, their indexes and partitions, to find those tables that have a row count of 0.
Alteration to add Schema names:
SELECT
sch.name,
t.NAME AS TableName,
p.rows AS RowCounts
FROM
sys.tables t
INNER JOIN
sys.partitions p ON t.object_id = p.OBJECT_ID
inner Join sys.schemas sch
on t.schema_id = sch.schema_id
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND p.rows = 0
GROUP BY
sch.name,t.Name, p.Rows
ORDER BY
sch.name,t.Name
I found some of the previous answers still returned tables with data for me.
However, the following seems to correctly on report those with no rows (using SQL Server 2019):
SELECT schema_name(tab.schema_id) + '.' + tab.name AS [table]
FROM sys.tables tab
INNER JOIN sys.partitions part ON tab.object_id = part.object_id
WHERE part.index_id IN (
1
,0
) -- 0 - table without PK, 1 table with PK
GROUP BY schema_name(tab.schema_id) + '.' + tab.name
HAVING sum(part.rows) = 0
ORDER BY [table]

How to find largest objects in a SQL Server database?

How would I go about finding the largest objects in a SQL Server database? First, by determining which tables (and related indices) are the largest and then determining which rows in a particular table are largest (we're storing binary data in BLOBs)?
Are there any tools out there for helping with this kind of database analysis? Or are there some simple queries I could run against the system tables?
I've been using this SQL script (which I got from someone, somewhere - can't reconstruct who it came from) for ages and it's helped me quite a bit understanding and determining the size of indices and tables:
SELECT
t.name AS TableName,
i.name as indexName,
sum(p.rows) as RowCounts,
sum(a.total_pages) as TotalPages,
sum(a.used_pages) as UsedPages,
sum(a.data_pages) as DataPages,
(sum(a.total_pages) * 8) / 1024 as TotalSpaceMB,
(sum(a.used_pages) * 8) / 1024 as UsedSpaceMB,
(sum(a.data_pages) * 8) / 1024 as DataSpaceMB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.object_id = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
WHERE
t.name NOT LIKE 'dt%' AND
i.object_id > 255 AND
i.index_id <= 1
GROUP BY
t.name, i.object_id, i.index_id, i.name
ORDER BY
object_name(i.object_id)
Of course, you can use another ordering criteria, e.g.
ORDER BY SUM(p.rows) DESC
to get the tables with the most rows, or
ORDER BY SUM(a.total_pages) DESC
to get the tables with the most pages (8K blocks) used.
In SQL Server 2008, you can also just run the standard report Disk Usage by Top Tables. This can be found by right clicking the DB, selecting Reports->Standard Reports and selecting the report you want.
This query help to find largest table in you are connection.
SELECT TOP 1 OBJECT_NAME(OBJECT_ID) TableName, st.row_count
FROM sys.dm_db_partition_stats st
WHERE index_id < 2
ORDER BY st.row_count DESC
You may also use the following code:
USE AdventureWork
GO
CREATE TABLE #GetLargest
(
table_name sysname ,
row_count INT,
reserved_size VARCHAR(50),
data_size VARCHAR(50),
index_size VARCHAR(50),
unused_size VARCHAR(50)
)
SET NOCOUNT ON
INSERT #GetLargest
EXEC sp_msforeachtable 'sp_spaceused ''?'''
SELECT
a.table_name,
a.row_count,
COUNT(*) AS col_count,
a.data_size
FROM #GetLargest a
INNER JOIN information_schema.columns b
ON a.table_name collate database_default
= b.table_name collate database_default
GROUP BY a.table_name, a.row_count, a.data_size
ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC
DROP TABLE #GetLargest
#marc_s's answer is very great and I've been using it for few years. However, I noticed that the script misses data in some columnstore indexes and doesn't show complete picture. E.g. when you do SUM(TotalSpace) against the script and compare it with total space database property in Management Studio the numbers don't match in my case (Management Studio shows larger numbers). I modified the script to overcome this issue and extended it a little bit:
select
tables.[name] as table_name,
schemas.[name] as schema_name,
isnull(db_name(dm_db_index_usage_stats.database_id), 'Unknown') as database_name,
sum(allocation_units.total_pages) * 8 as total_space_kb,
cast(round(((sum(allocation_units.total_pages) * 8) / 1024.00), 2) as numeric(36, 2)) as total_space_mb,
sum(allocation_units.used_pages) * 8 as used_space_kb,
cast(round(((sum(allocation_units.used_pages) * 8) / 1024.00), 2) as numeric(36, 2)) as used_space_mb,
(sum(allocation_units.total_pages) - sum(allocation_units.used_pages)) * 8 as unused_space_kb,
cast(round(((sum(allocation_units.total_pages) - sum(allocation_units.used_pages)) * 8) / 1024.00, 2) as numeric(36, 2)) as unused_space_mb,
count(distinct indexes.index_id) as indexes_count,
max(dm_db_partition_stats.row_count) as row_count,
iif(max(isnull(user_seeks, 0)) = 0 and max(isnull(user_scans, 0)) = 0 and max(isnull(user_lookups, 0)) = 0, 1, 0) as no_reads,
iif(max(isnull(user_updates, 0)) = 0, 1, 0) as no_writes,
max(isnull(user_seeks, 0)) as user_seeks,
max(isnull(user_scans, 0)) as user_scans,
max(isnull(user_lookups, 0)) as user_lookups,
max(isnull(user_updates, 0)) as user_updates,
max(last_user_seek) as last_user_seek,
max(last_user_scan) as last_user_scan,
max(last_user_lookup) as last_user_lookup,
max(last_user_update) as last_user_update,
max(tables.create_date) as create_date,
max(tables.modify_date) as modify_date
from
sys.tables
left join sys.schemas on schemas.schema_id = tables.schema_id
left join sys.indexes on tables.object_id = indexes.object_id
left join sys.partitions on indexes.object_id = partitions.object_id and indexes.index_id = partitions.index_id
left join sys.allocation_units on partitions.partition_id = allocation_units.container_id
left join sys.dm_db_index_usage_stats on tables.object_id = dm_db_index_usage_stats.object_id and indexes.index_id = dm_db_index_usage_stats.index_id
left join sys.dm_db_partition_stats on tables.object_id = dm_db_partition_stats.object_id and indexes.index_id = dm_db_partition_stats.index_id
group by schemas.[name], tables.[name], isnull(db_name(dm_db_index_usage_stats.database_id), 'Unknown')
order by 5 desc
Hope it will be helpful for someone.
This script was tested against large TB-wide databases with hundreds of different tables, indexes and schemas.
If you are using Sql Server Management Studio 2008 there are certain data fields you can view in the object explorer details window. Simply browse to and select the tables folder. In the details view you are able to right-click the column titles and add fields to the "report". Your mileage may vary if you are on SSMS 2008 express.
I've found this query also very helpful in SqlServerCentral, here is the link to original post
Sql Server largest tables
select name=object_schema_name(object_id) + '.' + object_name(object_id)
, rows=sum(case when index_id < 2 then row_count else 0 end)
, reserved_kb=8*sum(reserved_page_count)
, data_kb=8*sum( case
when index_id<2 then in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count
else lob_used_page_count + row_overflow_used_page_count
end )
, index_kb=8*(sum(used_page_count)
- sum( case
when index_id<2 then in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count
else lob_used_page_count + row_overflow_used_page_count
end )
)
, unused_kb=8*sum(reserved_page_count-used_page_count)
from sys.dm_db_partition_stats
where object_id > 1024
group by object_id
order by
rows desc
In my database they gave different results between this query and the 1st answer.
Hope somebody finds useful

when were index statistics last updated?

Is there a quick and easy way to list when every index in the database last had their statistics updated? The preferred answer would be a query. Also, is it possible to determine the "quality" of the statistics: FULLSCAN, SAMPLE n, etc.
EDIT
This worked for what I needed, a slight mod to #OrbMan great answer...
SELECT
STATS_DATE(i.object_id, i.index_id) AS LastStatisticsDate
,o.Name AS TableName
,i.name AS IndexName
FROM sys.objects o
INNER JOIN sys.indexes i ON o.object_id = i.object_id
WHERE o.is_ms_shipped=0
ORDER BY 1 DESC
You can do: STATS_DATE ( table_id , index_id )
So:
USE AdventureWorks;
GO
SELECT 'Index Name' = i.name, 'Statistics Date' = STATS_DATE(i.object_id, i.index_id)
FROM sys.objects o
JOIN sys.indexes i ON o.name = 'Address' AND o.object_id = i.object_id;
GO
where Address is the name of the table whose indexes you would like to examine.

Resources