Snowflake: Count the number of NULL values in each column

What is the Snowflake SQL version of a SQL Server solution that counts NULL values in each column of a table (where the SQL Server version executes via the EXEC function)? I would prefer not to use a stored procedure because of access restrictions, but such a solution would be good to have for the future.
I would like the result to display the column name in one column and the number of NULL values in another, like so:
COLUMN_NAME  NULLVALUES
-----------  ----------
HATS         325
SHOES        0
DOGS         9998
The dynamic SQL (SQL built at runtime) below is the best I have been able to do so far. The name of the column containing the column names will, unfortunately, be the name of the first column in the table. Ideally the SQL would execute automatically, but experiments with BEGIN and END as well as EXECUTE IMMEDIATE did not work.
-- SUMMARY: Count NULL values for every column of the specified table.
-- This SQL dynamically builds a query that counts the NULL values in every column of a table.
-- One copies and pastes the code in the resulting SQLROW column (can double-click to open query window then copy to clipboard)
USE DATABASE YOUR_DATABASE;
-- What database contains the table of interest?
SET DBName = 'YOUR_DATABASE';
-- What is the schema of the table?
SET SchemaName = 'YOUR_SCHEMA';
--What is the table name?
SET TableName = 'YOUR_TABLE';
SELECT ($DBName || '.' || $SchemaName || '.' || $TableName) as FullTablePath;
WITH SQLText AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY c.column_name) AS RowNum,
        'SELECT ' || '''' || c.column_name || '''' ||
        ', SUM(CASE WHEN ' || c.column_name || ' IS NULL THEN 1 ELSE 0 END) AS NullValues
FROM ' || $DBName || '.' || $SchemaName || '.' || $TableName AS SQLRow
    FROM
        information_schema.tables t
        INNER JOIN information_schema.columns c
            ON c.table_name = t.table_name AND c.table_schema = t.table_schema
    WHERE
        t.table_name = $TableName AND t.table_schema = $SchemaName
),
Recur AS (
    SELECT
        RowNum,
        TO_VARCHAR(SQLRow) AS SQLRow
    FROM SQLText
    WHERE RowNum = 1
    UNION ALL
    SELECT
        t.RowNum,
        r.SQLRow || ' UNION ALL ' || t.SQLRow
    FROM SQLText t
        INNER JOIN Recur r ON t.RowNum = r.RowNum + 1
),
no_dupes AS (
    SELECT * FROM Recur WHERE RowNum = (SELECT MAX(RowNum) FROM Recur)
)
SELECT SQLRow FROM no_dupes;
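For illustration, with the hypothetical HATS/SHOES/DOGS table from the example output above, the final SQLRow would contain a query along these lines:
SELECT 'DOGS', SUM(CASE WHEN DOGS IS NULL THEN 1 ELSE 0 END) AS NullValues
FROM YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE
UNION ALL SELECT 'HATS', SUM(CASE WHEN HATS IS NULL THEN 1 ELSE 0 END) AS NullValues
FROM YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE
UNION ALL SELECT 'SHOES', SUM(CASE WHEN SHOES IS NULL THEN 1 ELSE 0 END) AS NullValues
FROM YOUR_DATABASE.YOUR_SCHEMA.YOUR_TABLE
Pasting and running that yields one row per column; the first result column is named after the first literal ('DOGS' here), which is the naming quirk mentioned above.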

some setup:
create table test.test.some_nulls(a string, b int, c boolean) as
select * from values
('a', 1, true),
('b', 2, false),
('c', 3, null),
('d', null, true),
('e', null, false),
('f', null, null),
(null, 7, true),
('h', 8, false),
(null, 9, null),
('j', null, true),
(null, null, false),
('l', null, null);
and:
SET DBName = 'TEST';
SET SchemaName = 'TEST';
SET TableName = upper('some_nulls');
this is the SQL we want to have run in the end:
select count(*)-count(a) as a,
count(*)-count(b) as b,
count(*)-count(c) as c
from TEST.TEST.some_nulls;
A  B  C
-  -  -
3  6  4
and this SQL builds that SQL, via LISTAGG:
select 'select '
|| listagg('count(*)-count('||column_name||') as ' || column_name, ', ') WITHIN GROUP (order by ordinal_position )
|| ' from '|| $DBName ||'.'|| $SchemaName ||'.'|| $TableName as sql
from information_schema.columns
where table_name = $TableName;
SQL
select count(*)-count(A) as A, count(*)-count(B) as B, count(*)-count(C) as C from TEST.TEST.SOME_NULLS
Thus, via RESULTSETs, cursors, and Snowflake Scripting, this block builds and runs the second SQL, then runs its result, which is the first SQL:
declare
    res resultset;
    select_statement text;
begin
    select_statement := 'select ''select '' '||
        ' || listagg(''count(*)-count(''||column_name||'') as '' || column_name, '', '') WITHIN GROUP (order by ordinal_position ) ' ||
        ' || '' from '|| $DBName ||'.'|| $SchemaName ||'.'|| $TableName || ''' as sql' ||
        ' from '||$DBName||'.information_schema.columns '||
        ' where table_name = ''' || $TableName || '''';
    res := (execute immediate :select_statement);
    let cur1 cursor for res;
    for record in cur1 do
        res := (execute immediate record.sql);
        break;
    end for;
    return table(res);
end;
and gives:
A  B  C
-  -  -
3  6  4
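The question asked for one row per column (COLUMN_NAME, NULLVALUES) rather than this wide layout; a minimal sketch of rotating the result with Snowflake's UNPIVOT, assuming the some_nulls table from the setup:
select column_name, nullvalues
from (
    select count(*)-count(a) as A,
           count(*)-count(b) as B,
           count(*)-count(c) as C
    from TEST.TEST.some_nulls
) unpivot (nullvalues for column_name in (A, B, C));
This returns the rows (A, 3), (B, 6), (C, 4); the same wrapper could be concatenated around the generated SQL inside the Snowflake Scripting block.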

Related

Validate decimal and integer values by looping through list of dynamic columns

I was tasked with validating the decimal and integer values of the columns from a list of tables. I have around 10-12 tables with different column names.
I created a lookup table that holds each table name and its required decimal/integer column names, as shown below. For example, the 'Pricedetails' and 'Itemdetails' tables have many columns, of which only the ones mentioned in the lookup table are required.
lkpTable

TableName     requiredcolumns
------------  ---------------------------------------
Pricedetails  sellingPrice,RetailPrice,Wholesaleprice
Itemdetails   ItemID,Itemprice
Pricedetails

Priceid  Mafdate     MafName   sellingPrice  RetailPrice  Wholesaleprice
-------  ----------  --------  ------------  -----------  --------------
01       2020-01-01  Americas  25.00         43.33        33.66
02       2020-01-01  Americas  43.45         22.55        11.11
03       2021-01-01  Asia      -23.00        -34.00       23.00
Itemdetails

ItemID  ItemPrice  Itemlocation  ItemManuf
------  ---------  ------------  ---------
01      45.11      Americas      SA
02      25.00      Americas      SA
03      35.67      Americas      SA
I have created a stored procedure with the table name as an input parameter. It pulls the required column names of that table from the lookup table and stores the result set in a table variable; below is the code.
declare @resultset Table
(
    id INT identity(1,1),
    tablename varchar(200),
    ColumnNames varchar(max)
)
declare @tblname varchar(200), @sql varchar(max), @cols varchar(max)
INSERT INTO @resultset
select tablename, ColumnNames
from lkptable where tablename = 'itemdetails'
select @cols = ColumnNames from @resultset;
select @tblname = TableName from @resultset;
----- Split the comma separated column names
Create table ##splitcols
(
    ID int identity(1,1),
    Name varchar(50)
)
Insert into ##splitcols
select value from string_split(@cols, ',')
set @sql = 'select ' + @cols + ' from ' + @tblname
--print (@cols)
exec (@sql)
select * from ##splitcols
On executing the above code I get the result sets below; similarly, whatever table name I provide, I can get the required columns and their relevant data. Now I am stuck at the point of how to validate whether the columns are decimal or int. I tried using a while loop and a cursor to pass the Name value from Resultset2 to a new dynamic query, but somehow I don't find any way to do the validation.
Resultset1

ItemID  ItemPrice
------  ---------
01      45.11
02      25.00
03      35.67
Resultset2

ID  Name
--  ---------
01  ItemID
02  ItemPrice
You can validate it in this way (a self-contained sketch; #splitcols here is a temp table with a single numeric column t):
create table #splitcols (t decimal(10,2))
insert into #splitcols values (1.1),(1.11)
select case when (t * 100)%10 = 0 then 1 else 0 end as valid from #splitcols
Similarly for number/integer.
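For the number/integer case, one hedged option is to compare each value with its rounded self, the same predicate the UNPIVOT answer below builds dynamically:
select case when round(t, 0) = t then 1 else 0 end as is_int from #splitcols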
You can generate one giant UNION ALL query using dynamic SQL, then run it.
Each query would be of the form:
SELECT
TableName = 'SomeTable',
ColumnName,
IsInt
FROM (
SELECT
[Column1] = CASE WHEN COUNT(CASE WHEN ROUND([Column1], 0) <> [Column1] THEN 1 END) = 0 THEN 'All ints' ELSE 'Not All ints' END,
[Column2] = CASE WHEN COUNT(CASE WHEN ROUND([Column2], 0) <> [Column2] THEN 1 END) = 0 THEN 'All ints' ELSE 'Not All ints' END
FROM SomeTable
) t
UNPIVOT (
ColumnName FOR IsInt IN (
[Column1], [Column2]
)
) u
The script is as follows:
DECLARE @sql nvarchar(max);
SELECT #sql = STRING_AGG('
SELECT
TableName = ' + QUOTENAME(t.name, '''') + ',
ColumnName,
IsInt
FROM (
SELECT ' + c.BeforePivotColumns + '
FROM ' + QUOTENAME(t.name) + '
) t
UNPIVOT (
ColumnName FOR IsInt IN (
' + c.UnpivotColumns + '
)
) u
', '
UNION ALL '
)
FROM sys.tables t
JOIN lkpTable lkp ON lkp.TableName = t.name
CROSS APPLY (
SELECT
BeforePivotColumns = STRING_AGG(CAST('
' + QUOTENAME(c.name) + ' = CASE WHEN COUNT(CASE WHEN ROUND(' + QUOTENAME(c.name) + ', 0) <> ' + QUOTENAME(c.name) + ' THEN 1 END) = 0 THEN ''All ints'' ELSE ''Not All ints'' END'
AS nvarchar(max)), ','),
UnpivotColumns = STRING_AGG(QUOTENAME(c.name), ', ')
FROM sys.columns c
JOIN STRING_SPLIT(lkp.requiredcolumns, ',') req ON req.value = c.name
WHERE c.object_id = t.object_id
) c;
PRINT @sql;
EXEC sp_executesql @sql;
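For the sample Itemdetails table above (ItemID and ItemPrice being its required columns in lkpTable), the PRINT should emit a query shaped roughly like this:
SELECT
    TableName = 'Itemdetails',
    ColumnName,
    IsInt
FROM (
    SELECT
        [ItemID] = CASE WHEN COUNT(CASE WHEN ROUND([ItemID], 0) <> [ItemID] THEN 1 END) = 0 THEN 'All ints' ELSE 'Not All ints' END,
        [ItemPrice] = CASE WHEN COUNT(CASE WHEN ROUND([ItemPrice], 0) <> [ItemPrice] THEN 1 END) = 0 THEN 'All ints' ELSE 'Not All ints' END
    FROM [Itemdetails]
) t
UNPIVOT (
    ColumnName FOR IsInt IN ([ItemID], [ItemPrice])
) u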
If you are on an older version of SQL Server then you can't use STRING_AGG and instead you need to hack it with FOR XML and STUFF.
DECLARE @unionall nvarchar(100) = '
UNION ALL ';
DECLARE @sql nvarchar(max);
SET @sql = STUFF(
(SELECT @unionall + '
SELECT
TableName = ' + QUOTENAME(t.name, '''') + ',
ColumnName,
IsInt
FROM (
SELECT ' + STUFF(c1.BeforePivotColumns.value('text()[1]','nvarchar(max)'), 1, 1, '') + '
FROM ' + QUOTENAME(t.name) + '
) t
UNPIVOT (
ColumnName FOR IsInt IN (
' + STUFF(c2.UnpivotColumns.value('text()[1]','nvarchar(max)'), 1, 1, '') + '
)
) u'
FROM sys.tables t
JOIN (
SELECT DISTINCT
lkp.TableName
FROM lkpTable lkp
) lkp ON lkp.TableName = t.name
CROSS APPLY (
SELECT
',
' + QUOTENAME(c.name) + ' = CASE WHEN COUNT(CASE WHEN ROUND(' + QUOTENAME(c.name) + ', 0) <> ' + QUOTENAME(c.name) + ' THEN 1 END) = 0 THEN ''All ints'' ELSE ''Not All ints'' END'
FROM lkpTable lkp2
CROSS APPLY STRING_SPLIT(lkp2.requiredcolumns, ',') req
JOIN sys.columns c ON req.value = c.name
WHERE c.object_id = t.object_id
AND lkp2.TableName = lkp.TableName
FOR XML PATH(''), TYPE
) c1(BeforePivotColumns)
CROSS APPLY (
SELECT
', ' + QUOTENAME(c.name)
FROM lkpTable lkp2
CROSS APPLY STRING_SPLIT(lkp2.requiredcolumns, ',') req
JOIN sys.columns c ON req.value = c.name
WHERE c.object_id = t.object_id
AND lkp2.TableName = lkp.TableName
FOR XML PATH(''), TYPE
) c2(UnpivotColumns)
FOR XML PATH(''), TYPE
).value('text()[1]','nvarchar(max)'), 1, LEN(@unionall), '');
PRINT @sql;
EXEC sp_executesql @sql;

Inserting records from a table to another using dynamic SQL with conditions to convert data type

I have two tables whose structures as follows:
table_A
CREATE TABLE table_A
(
col_a varchar(100),
col_b bigint,
col_c datetime
)
table_b
-- Note that the columns are the same --
CREATE TABLE table_B
(
col_a varchar(10),
col_b varchar(10),
col_c varchar(20)
)
Now I want to INSERT data into table_A from table_B with proper data type conversion.
Below is the SQL string:
INSERT INTO table_A(col_a,col_b,col_c)
SELECT CONVERT(varchar,col_a),CONVERT(INT,col_b),CONVERT(datetime,col_c) FROM table_B
So far so good.
Now I want to generate the SQL dynamically with the help of INFORMATION_SCHEMA.COLUMNS.
For this I have followed the below steps:
Step 1:
Join the information schema views for the two tables above, table_A and table_B, and store the result in a #TempTable. Let's assume that #TempTable has an ID column that is IDENTITY(1,1) but doesn't follow a strict sequence like 1,2,3... (typically this happens in Synapse SQL).
INSERT INTO #TempTable
SELECT S.COLUMN_NAME AS Src_Col,
S.DATA_TYPE AS Src_dtype,
D.COLUMN_NAME AS Dest_Col,
D.DATA_TYPE AS Dest_dtype,
CASE WHEN S.DATA_TYPE NOT LIKE D.DATA_TYPE THEN
'CONVERT('+ '''' + D.DATA_TYPE + '''' + ',' + '''' + S.DATA_TYPE + '''' + ')'
ELSE S.DATA_TYPE END AS Modified_Col
FROM INFORMATION_SCHEMA.COLUMNS S
JOIN INFORMATION_SCHEMA.COLUMNS D
ON S.COLUMN_NAME = D.COLUMN_NAME AND S.TABLE_NAME = REPLACE(D.TABLE_NAME,'_B','_A')
Step 2:
Iterate over #TempTable to fetch the Modified_Col values
SET @Max_ID = (SELECT MAX(ID) FROM #TempTable);
SET @Min_ID = (SELECT MIN(ID) FROM #TempTable);
SET @ColToInsert = '';
SET @Dest_Col = '';
WHILE @Min_ID <= @Max_ID
BEGIN
SET @ColToInsert = (SELECT @ColToInsert + Modified_Col FROM #TempTable T WHERE T.ID = @Min_ID);
SET @Dest_Col = (SELECT @Dest_Col + Dest_Col FROM #TempTable T WHERE T.ID = @Min_ID);
SET @Min_ID = @Min_ID + 1;
END
Step 3:
Use that @ColToInsert in the dynamic SQL below:
SET @DySQL = 'INSERT INTO Table_A(' + @Dest_Col + ') SELECT ' + @ColToInsert + ' FROM table_B';
exec (@DySQL);
Now at step 3 I am not getting the expected result; no data is getting inserted into table_A. I understand that in the CASE statement I have to make some fixes so that the convert... portion becomes a string, but I am not able to do so.
Any clue would be appreciated.
I don't understand why you need the temp table at all. You just need to aggregate using STRING_AGG.
You also need to quote the objects and columns using QUOTENAME, and you should use sys.columns etc rather than INFORMATION_SCHEMA, which is for compatibility only.
DECLARE @tableA sysname = 't';
DECLARE @tableB sysname = 's';
DECLARE @sql nvarchar(max) = (
SELECT CONCAT(
'INSERT INTO ',
QUOTENAME(@tableA),
'(',
STRING_AGG(CAST(QUOTENAME(cA.name) AS nvarchar(max)), ', '),
')
SELECT ',
STRING_AGG(
CASE WHEN cA.user_type_id <> cB.user_type_id THEN
CONCAT(
'CONVERT(',
typ.name,
CASE
WHEN typ.name IN ('varchar','nvarchar','char','nchar','varbinary','binary')
THEN CONCAT('(', CASE WHEN cA.max_length = -1 THEN 'max' END, NULLIF(cA.max_length, -1), ')')
WHEN typ.name IN ('datetime2','datetimeoffset','time')
THEN CONCAT('(', cA.scale, ')')
WHEN typ.name IN ('float','real')
THEN CONCAT('(', cA.precision, ')')
WHEN typ.name IN ('decimal','numeric')
THEN CONCAT('(', cA.precision, ',', cA.scale, ')')
END,
', ',
CAST(QUOTENAME(cB.name) AS nvarchar(max)),
')'
)
ELSE
CAST(QUOTENAME(cB.name) AS nvarchar(max))
END
, ', '),
'
FROM ',
QUOTENAME(@tableB)
)
FROM sys.columns cA
JOIN sys.tables tA ON tA.object_id = cA.object_id AND tA.name = @tableA
JOIN sys.types typ ON typ.user_type_id = cA.user_type_id
JOIN sys.columns cB ON cB.name = cA.name
JOIN sys.tables tB ON tB.object_id = cB.object_id AND tB.name = @tableB
);
PRINT @sql; -- your friend
EXEC sp_executesql @sql;
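With the question's table_A and table_B definitions (and @tableA = 'table_A', @tableB = 'table_B'), the PRINT should produce something close to:
INSERT INTO [table_A]([col_a], [col_b], [col_c])
SELECT [col_a], CONVERT(bigint, [col_b]), CONVERT(datetime, [col_c])
FROM [table_B]
Note that col_a needs no CONVERT because both sides are varchar; only the declared lengths differ.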

Get a list of all columns that do not have only NULL values in SQL Server

I NEVER do complicated stuff in SQL - until now...
I have a database with over 2000 tables, each table has about 200 columns.
I need to get a list of all the columns in one of those tables that are populated at least 1 time.
I can get a list of all the columns like this:
SELECT [name] AS [Column name]
FROM syscolumns with (nolock)
WHERE id = (SELECT id FROM sysobjects where name like 'DOCSDB_TDCCINS')
But I need only the columns that are populated 1 or more times.
Any help would be appreciated.
Here is how I would do it, first run this:
SELECT 'SELECT '''+syscolumns.name+''' FROM '+sysobjects.name+' HAVING COUNT('+syscolumns.name+') > 0'
FROM syscolumns with (nolock)
JOIN sysobjects with (nolock) ON syscolumns.id = sysobjects.id
WHERE syscolumns.id = (SELECT id FROM sysobjects where name like 'Email')
Copy all the select statements and run them.
This will give you a list of the column names without nulls.
(nb I did not test because I don't have an SQL server available right now, so I could have a typo)
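Each generated statement has this shape (FirstName being a hypothetical column of the Email table); it returns one row when the column holds at least one non-NULL value and no rows otherwise, because HAVING without GROUP BY treats the whole table as a single group:
SELECT 'FirstName' FROM Email HAVING COUNT(FirstName) > 0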
It may also be useful to count the non-null instances; obviously 0 versus not 0 was your initial question, but note that counting the instances rather than using EXISTS/NOT EXISTS will be slower.
select 'union select ''' + Column_Name + ''',count(*)'
+ ' from ' + table_name
+ ' where ' + column_name + ' is not null'
from
(
select * from information_schema.columns with (nolock)
where Is_Nullable = 'YES'
AND Table_Name like 'DOCSDB_TDCCINS'
) DD
Then remove the superfluous leading 'union' and run the query.
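After stripping the leading 'union', the assembled query looks roughly like this (HATS and SHOES being hypothetical nullable columns of DOCSDB_TDCCINS):
select 'HATS',count(*) from DOCSDB_TDCCINS where HATS is not null
union select 'SHOES',count(*) from DOCSDB_TDCCINS where SHOES is not null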
A different idea is to create a dynamic unpivot for every table.
Declare @q NVarchar(MAX) = NULL
;With D AS (
SELECT TABLE_SCHEMA
, TABLE_NAME
, STUFF((SELECT ', ' + QUOTENAME(ci.COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS ci
WHERE (ci.TABLE_NAME = c.TABLE_NAME)
AND (ci.TABLE_SCHEMA = c.TABLE_SCHEMA)
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)')
,1,2,'') AS _Cols
, STUFF((SELECT ', Count(' + QUOTENAME(ci.COLUMN_NAME) + ') '
+ QUOTENAME(ci.COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS ci
WHERE (ci.TABLE_NAME = c.TABLE_NAME)
AND (ci.TABLE_SCHEMA = c.TABLE_SCHEMA)
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)')
,1,2,'') AS _ColsCount
FROM INFORMATION_SCHEMA.COLUMNS c
GROUP BY TABLE_SCHEMA, TABLE_NAME
)
SELECT @q = COALESCE(@q + ' UNION ALL ', '') + '
SELECT ''' + TABLE_SCHEMA + ''' _Schema, ''' + TABLE_NAME + ''' _Table, _Column
FROM (SELECT ' + _ColsCount + ' from ' + TABLE_SCHEMA + '.' + TABLE_NAME + ') x
UNPIVOT
(_Count FOR _Column IN (' + _Cols + ')) u
WHERE _Count > 0'
FROM D
exec sp_executesql @q
In the CTE, _Cols returns the comma-separated quoted names of the table's columns, while _ColsCount returns the same list wrapped in the COUNT function. For example, for a table of mine, a row of D is:
TABLE_SCHEMA  TABLE_NAME       _Cols                         _ColsCount
------------  ---------------  ----------------------------  -----------------------------------------------------------------------------
dbo           AnnualInterests  [Product_ID], [Rate], [Term]  Count([Product_ID]) [Product_ID], Count([Rate]) [Rate], Count([Term]) [Term]
while the main query transforms this row via UNPIVOT to return the columns as rows:
SELECT 'dbo' _Schema, 'AnnualInterests' _Table, _Column
FROM (SELECT Count([Product_ID]) [Product_ID], Count([Term]) [Term]
, Count([Rate]) [Rate] from dbo.AnnualInterests) x
UNPIVOT
(_Count FOR _Column IN ([Product_ID], [Term], [Rate])) u
WHERE _Count > 0
Concatenating into the string variable and running it with sp_executesql completes the script.
Hope you can achieve this with a simple alteration to your code, like:
SELECT [name] AS [Column name]
FROM syscolumns with (nolock)
WHERE id = (SELECT id FROM sysobjects where name like 'DOCSDB_TDCCINS')
and (select count(*) from DOCSDB_TDCCINS)>0

How can I inexpensively determine if a column contains only NULL records?

I have a large table with 500 columns and 100M rows. Based on a small sample, I believe only about 50 of the columns contain any values, and the other 450 contain only NULL values. I want to list the columns that contain no data.
On my current hardware, it would take about 24 hours to query every column (select count(1) from tab where col_n is not null)
Is there a less expensive way to determine that a column is completely empty/NULL?
What about this:
SELECT
SUM(CASE WHEN column_1 IS NOT NULL THEN 1 ELSE 0 END) column_1_count,
SUM(CASE WHEN column_2 IS NOT NULL THEN 1 ELSE 0 END) column_2_count,
...
FROM table_name
?
You can easily create this query using the INFORMATION_SCHEMA.COLUMNS view.
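A minimal sketch of that generation step, assuming SQL Server 2017+ for STRING_AGG and a hypothetical table_name:
DECLARE @sql nvarchar(max);
SELECT @sql = 'SELECT ' + STRING_AGG(CAST(
        'SUM(CASE WHEN ' + QUOTENAME(COLUMN_NAME) + ' IS NOT NULL THEN 1 ELSE 0 END) AS '
        + QUOTENAME(COLUMN_NAME + '_count') AS nvarchar(max)), ', ')
    + ' FROM ' + QUOTENAME('table_name')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'table_name';
EXEC sp_executesql @sql;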
EDIT:
Another idea:
SELECT MAX(column_1), MAX(column_2),..... FROM table_name
If result contains value, column is populated. It should require one table scan.
Try this one -
DDL:
IF OBJECT_ID ('dbo.test2') IS NOT NULL
DROP TABLE dbo.test2
CREATE TABLE dbo.test2
(
ID BIGINT IDENTITY(1,1) PRIMARY KEY
, Name VARCHAR(10) NOT NULL
, IsCitizen BIT NULL
, Age INT NULL
)
INSERT INTO dbo.test2 (Name, IsCitizen, Age)
VALUES
('1', 1, NULL),
('2', 0, NULL),
('3', NULL, NULL)
Query 1:
DECLARE
@TableName SYSNAME
, @ObjectID INT
, @SQL NVARCHAR(MAX)
SELECT
@TableName = 'dbo.test2'
, @ObjectID = OBJECT_ID(@TableName)
SELECT @SQL = 'SELECT' + CHAR(13) + STUFF((
SELECT CHAR(13) + ', [' + c.name + '] = ' +
CASE WHEN c.is_nullable = 0
THEN '0'
ELSE 'CASE WHEN ' + totalrows +
' = SUM(CASE WHEN [' + c.name + '] IS NULL THEN 1 ELSE 0 END) THEN 1 ELSE 0 END'
END
FROM sys.columns c WITH (NOWAIT)
CROSS JOIN (
SELECT totalrows = CAST(MIN(p.[rows]) AS VARCHAR(50))
FROM sys.partitions p
WHERE p.[object_id] = @ObjectID
AND p.index_id IN (0, 1)
) r
WHERE c.[object_id] = @ObjectID
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ') + CHAR(13) + 'FROM ' + @TableName
PRINT @SQL
EXEC sys.sp_executesql @SQL
Output 1:
SELECT
[ID] = 0
, [Name] = 0
, [IsCitizen] = CASE WHEN 3 = SUM(CASE WHEN [IsCitizen] IS NULL THEN 1 ELSE 0 END) THEN 1 ELSE 0 END
, [Age] = CASE WHEN 3 = SUM(CASE WHEN [Age] IS NULL THEN 1 ELSE 0 END) THEN 1 ELSE 0 END
FROM dbo.test2
Query 2:
DECLARE
@TableName SYSNAME
, @SQL NVARCHAR(MAX)
SELECT @TableName = 'dbo.test2'
SELECT @SQL = 'SELECT' + CHAR(13) + STUFF((
SELECT CHAR(13) + ', [' + c.name + '] = ' +
CASE WHEN c.is_nullable = 0
THEN '0'
ELSE 'CASE WHEN '+
'MAX(CAST([' + c.name + '] AS CHAR(1))) IS NULL THEN 1 ELSE 0 END'
END
FROM sys.columns c WITH (NOWAIT)
WHERE c.[object_id] = OBJECT_ID(@TableName)
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ') + CHAR(13) + 'FROM ' + @TableName
PRINT @SQL
EXEC sys.sp_executesql @SQL
Output 2:
SELECT
[ID] = 0
, [Name] = 0
, [IsCitizen] = CASE WHEN MAX(CAST([IsCitizen] AS CHAR(1))) IS NULL THEN 1 ELSE 0 END
, [Age] = CASE WHEN MAX(CAST([Age] AS CHAR(1))) IS NULL THEN 1 ELSE 0 END
FROM dbo.test2
Results:
ID Name IsCitizen Age
----------- ----------- ----------- -----------
0 0 0 1
You could also check whether a filtered index on the column helps performance:
CREATE UNIQUE NONCLUSTERED INDEX IndexName ON dbo.TableName(ColumnName)
WHERE ColumnName IS NOT NULL;
GO
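If such an index can be created (a UNIQUE filtered index will fail when the column contains duplicate non-NULL values), the emptiness check becomes a scan of a tiny or empty index; a sketch:
-- An empty filtered index means the column holds only NULLs.
SELECT COUNT(*) AS non_null_rows
FROM dbo.TableName WITH (INDEX(IndexName))
WHERE ColumnName IS NOT NULL;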
See "SQL server query to get the list of columns in a table along with Data types, NOT NULL, and PRIMARY KEY constraints". Run the SQL from the best answer of that question and generate a new query like the one below.
Select ISNULL(column1,1), ISNULL(column2,1), ISNULL(column3,1) from table
You would not need to 'count' all of the 100M records. Simply backing out of the query with a TOP 1 as soon as you hit a non-null value would save a lot of time while providing the same information.
500 Columns?!
Ok, the right answer to your question is: normalize your table.
Here's what's happening for the time being:
You don't have an index on that column, so SQL Server has to do a full scan of your humongous table.
SQL Server will certainly read every row fully (that means every column, even if you're only interested in one).
And since your rows are most likely over 8 KB... http://msdn.microsoft.com/en-us/library/ms186981%28v=sql.105%29.aspx
Seriously, normalize your table and, if needed, split it vertically (put "theme grouped" columns in separate tables, so you only read them when you need them).
EDIT: You can rewrite your query like this
select count(col_n) from tab
and if you want to get all columns at once (better):
SELECT
COUNT(column_1) column_1_count,
COUNT(column_2) column_2_count,
...
FROM table_name
If most records are not null, maybe you can mix some of the approaches suggested (for example, check only nullable fields) with this:
if exists (select * from table where field is not null)
This should speed up the search because EXISTS stops as soon as the condition is met; in this example, a single not-null record is enough to decide the status of the field.
If the field has an index this should be almost instant.
Normally adding top 1 to this query is not needed because the query optimizer knows that you do not need to retrieve all the matching records.
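A sketch of the per-column probe, using the question's tab and col_n:
IF EXISTS (SELECT * FROM tab WHERE col_n IS NOT NULL)
    PRINT 'col_n has at least one value';
ELSE
    PRINT 'col_n contains only NULLs';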
You can use this stored procedure to do the trick. You need to provide the name of the table you wish to query; note that if you pass the @exec parameter = 1, it will execute the generated SELECT query.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[SP_SELECT_NON_NULL_COLUMNS] ( @tablename varchar (100)=null, @exec int =0)
AS BEGIN
SET NOCOUNT ON
IF @tablename IS NULL
RAISERROR('CANT EXECUTE THE PROC, TABLE NAME IS MISSING',16 ,1)
ELSE
BEGIN
IF OBJECT_ID('tempdb..#table') IS NOT NULL DROP TABLE #table
DECLARE @i VARCHAR (max)=''
DECLARE @sentence VARCHAR (max)=''
DECLARE @SELECT VARCHAR (max)
DECLARE @LocalTableName VARCHAR(50) = '['+@tablename+']'
CREATE TABLE #table (ColumnName VARCHAR (max))
SELECT @i+=
' IF EXISTS ( SELECT TOP 1 '+column_name+' FROM ' +@LocalTableName+' WHERE ' +column_name+
' '+'IS NOT NULL) INSERT INTO #table VALUES ('''+column_name+''');'
FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name=@tablename
INSERT INTO #table
EXEC (@i)
SELECT @sentence = @sentence+' '+columnname+' ,' FROM #table
DROP TABLE #table
IF @exec=0
BEGIN
SELECT 'SELECT '+ LTRIM (left (@sentence,NULLIF(LEN (@sentence)-1,-1)))+
+' FROM ' +@LocalTableName
END
ELSE
BEGIN
SELECT @SELECT= 'SELECT '+ LTRIM (left (@sentence,NULLIF(LEN (@sentence)-1,-1)))+
+' FROM '+@LocalTableName
EXEC (@SELECT)
END
END
END
Use it like this:
EXEC [dbo].[SP_SELECT_NON_NULL_COLUMNS] 'YourTableName' , 1

SQL Server COALESCE function with WHERE clause

I want to concatenate my SQL Server table's column values into a varchar variable. For that I am using the COALESCE function. But when I use it in a SELECT statement with a WHERE clause, I think the WHERE conditions are not being applied.
SELECT
@UserIds = COALESCE(@UserIds,'') + CONVERT(VARCHAR(MAX), UserID) +','
FROM vw_Users
WHERE GroupID = @GroupID
AND ISNULL(Active_yn, 'Y') = 'Y'
AND ISNULL(Delete_YN, 'N') = 'N'
So can anybody help me out on this?
If I correctly understood your problem, you can use CASE, something like:
SELECT
CASE WHEN ISNULL(Active_yn, 'Y') = 'Y'
AND ISNULL(Delete_YN, 'N') = 'N'
AND GroupID = @GroupID THEN @UserIds = COALESCE(@UserIds,'') + CONVERT(VARCHAR(MAX), UserID) +','
ELSE Condition
END
FROM vw_Users
In this case it will concatenate only when all conditions are met; otherwise you can pass a condition to ELSE.
Your query looks fine and the WHERE conditions are correct; just a small change to the COALESCE call so that "," is concatenated only when needed. Here's a sample with data:
DECLARE @UserIds VARCHAR(100);
DECLARE @GroupID INT = 1;
WITH vw_Users (UserID, GroupID, Active_YN, Delete_YN) AS (
SELECT 1, 1, 'Y', 'N' UNION ALL -- valid value
SELECT 2, 1, 'Y', 'N' UNION ALL -- valid value
SELECT 3, 1, 'N', 'N' UNION ALL -- invalid value because Active_YN <> 'Y'
SELECT 4, 1, 'Y', 'Y' UNION ALL -- invalid value because Delete_YN <> 'N'
SELECT 5, 2, 'Y', 'N' -- invalid value because GroupID <> 1
)
SELECT
@UserIds = COALESCE(@UserIds + ',','') + CONVERT(VARCHAR(MAX), UserID)
FROM vw_Users
WHERE GroupID = @GroupID
AND ISNULL(Active_yn, 'Y') = 'Y'
AND ISNULL(Delete_YN, 'N') = 'N'
SELECT @UserIds --> 1,2
You should try the following:
Declare @cols nvarchar(max) = ''
SELECT @cols =
STUFF(( SELECT DISTINCT TOP 100 PERCENT
',' + CONVERT(VARCHAR(10), t2.UserID)
FROM vw_Users AS t2
WHERE GroupID = @GroupID
AND ISNULL(Active_yn, 'Y') = 'Y'
AND ISNULL(Delete_YN, 'N') = 'N'
FOR XML PATH('')
), 1, 1, '')
select @cols
