Selecting sample rows from queried list of tables in Snowflake - snowflake-cloud-data-platform

On the newbie struggle bus and using Snowflake. I queried to find all of the tables in a database that have a column name containing a certain phrase. Is there a simple-ish way to select 10 (or any number, just something) rows from each of these tables in a single query?
What I’m working with so far:
Use database mydatabase;
Select table_name, column_name
From information_schema.columns
Where column_name like '%phrase%'
Let’s say I get 15 tables in the result. I’d like to see a small sample of records from each of those tables so I can confirm what actual values are in those ‘%phrase%’ columns. I’d LOVE to do it without individually querying each table in the list.
Any suggestions for someone who has given up on Googling how to do it?

This will take some steps.
First, let's construct a query that will sample 10 rows from a list of tables:
with list_cols as (
select table_name, column_name
from information_schema.columns
where column_name like '%WEEK%'
), tables as (
select distinct table_name
from list_cols
)
select 'select * from ' || listagg(x, ' union all \n')
from (
select '(select \''|| table_name || '\' table_name, object_construct(x.*) sample_row from ' || table_name || ' x limit 10)' x
from tables
)
The above query will return a value like:
select * from (select 'CLAIRE_DECAYING' table_name, object_construct(x.*) sample_row from CLAIRE_DECAYING x limit 10) union all
(select 'CLAIRE_DECAYING_HUGE' table_name, object_construct(x.*) sample_row from CLAIRE_DECAYING_HUGE x limit 10) union all
(select 'DECAY_PUZZLE' table_name, object_construct(x.*) sample_row from DECAY_PUZZLE x limit 10) union all
(select 'DECAY_PUZZLE_100M' table_name, object_construct(x.*) sample_row from DECAY_PUZZLE_100M x limit 10) union all
(select 'DECAY_PUZZLE_10M' table_name, object_construct(x.*) sample_row from DECAY_PUZZLE_10M x limit 10) union all
(select 'QL2_CARS_WEEKLY_VIZ' table_name, object_construct(x.*) sample_row from QL2_CARS_WEEKLY_VIZ x limit 10) union all
(select 'SAMPLE_TABLE' table_name, object_construct(x.*) sample_row from SAMPLE_TABLE x limit 10) union all
(select 'QL2_CARS_WEEKLY' table_name, object_construct(x.*) sample_row from QL2_CARS_WEEKLY x limit 10)
Which is a query that you can run to get 10 sample rows from each input table.
Note that we use object_construct(x.*) to get around the need for each select to have the same number of columns: the whole row gets represented in one column.
Then to have that query run and get its output, you can either copy paste it and run it, or capture that value and execute immediately:
https://docs.snowflake.com/en/sql-reference/sql/execute-immediate.html
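If you would rather assemble the statement on the client side, the same string can be built in a few lines. This is only a sketch: build_sample_query is a made-up helper name, the two table names are taken from the sample output above, and it assumes the names are plain identifiers that need no quoting.

```python
def build_sample_query(table_names, limit=10):
    """Build one UNION ALL query that samples `limit` rows from each table."""
    parts = [
        # object_construct(x.*) packs the whole row into a single column,
        # so every branch of the union has the same two columns.
        f"(select '{name}' table_name, object_construct(x.*) sample_row "
        f"from {name} x limit {limit})"
        for name in table_names
    ]
    return "select * from " + " union all \n".join(parts)


# Table names standing in for the result of the information_schema query.
query = build_sample_query(["SAMPLE_TABLE", "QL2_CARS_WEEKLY"])
print(query)
```

The printed text is what you would then paste into a worksheet or hand to execute immediate.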

Related

SQL Server : efficient way to find missing Ids

I am using SQL Server to store tens of millions of records. I need to be able to query its tables to find missing rows where there are gaps in the Id column, as there should be none.
I am currently using a solution that I have found here on StackOverflow:
CREATE PROCEDURE [dbo].[find_missing_ids]
@Table NVARCHAR(128)
AS
BEGIN
DECLARE @query NVARCHAR(MAX)
SET @query = 'WITH Missing (missnum, maxid) '
+ N'AS '
+ N'('
+ N' SELECT 1 AS missnum, (select max(Id) from ' + @Table + ') '
+ N' UNION ALL '
+ N' SELECT missnum + 1, maxid FROM Missing '
+ N' WHERE missnum < maxid '
+ N') '
+ N'SELECT missnum '
+ N'FROM Missing '
+ N'LEFT OUTER JOIN ' + @Table + ' tt on tt.Id = Missing.missnum '
+ N'WHERE tt.Id is NULL '
+ N'OPTION (MAXRECURSION 0);';
EXEC sp_executesql @query
END;
This solution has been working very well, but it has been getting slower and more resource intensive as the tables have grown. Now, running the procedure on a table of 38 million rows is taking about 3.5 minutes and lots of CPU.
Is there a more efficient way to perform this? After a certain range has been found to not contain any missing Ids, I no longer need to check that range again.
JBJ's answer is almost complete. The query needs to return the From and Through for each range of missing values.
select B + 1 as [From], A - 1 as [Through]
from (select StuffID as A,
             lag(StuffID) over (order by StuffID) as B
      from Stuff) z
where A <> B + 1
order by A
I created a test table with 50 million records, then deleted a few. The first row of the result is:
From Through
33 35
This indicates that all IDs in the range from 33 through 35 are missing, i.e. 33, 34 and 35.
On my machine the query took 37 seconds.
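To see the lag() approach on a small scale, here is a sketch that runs the same query shape against an in-memory SQLite database instead of SQL Server. The table and column names follow the query above, the aliases gap_from and gap_through are my own, and it assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Stuff (StuffID integer primary key)")
# IDs 1..40 with 33, 34 and 35 deleted, mirroring the first result row above.
conn.executemany(
    "insert into Stuff (StuffID) values (?)",
    [(i,) for i in range(1, 41) if i not in (33, 34, 35)],
)

gaps = conn.execute(
    """
    select B + 1 as gap_from, A - 1 as gap_through
    from (select StuffID as A,
                 lag(StuffID) over (order by StuffID) as B
          from Stuff) z
    where A <> B + 1
    order by A
    """
).fetchall()
print(gaps)
```

One thing to keep in mind: the first row has no previous ID, so B is NULL there and the comparison filters it out. IDs missing before the smallest existing ID are therefore not reported.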
Try:
select pId
from (select Id, lag(Id) over (order by Id) pId from yourschema.yourtable) e
where pId <> (Id-1)
order by Id
Replace yourschema.yourtable with the appropriate table information.
Try this solution; it should be faster than the recursive CTE.
;WITH CTE AS
(
SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) RN
FROM ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v(id) --10 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v1(id)--100 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v2(id) --1000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v3(id) --10000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v4(id)--100000 ROWS
CROSS JOIN ( values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) v5(id)--1000000 ROWS
)
SELECT RN AS Missing
FROM CTE C
LEFT JOIN YOURTABLE T ON T.ID=C.RN
WHERE T.ID IS NULL
If you want, you can also use master..[spt_values] to generate the numbers, as follows:
SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) RN
FROM master..[spt_values] T1
CROSS JOIN (select top 500 * from master..[spt_values]) T2
The above query will generate 1,268,500 numbers.
Note: You need to add the CROSS JOIN as per your requirement.
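The number-generator idea can be tried out without a SQL Server instance. The sketch below uses SQLite with a made-up table called items; three cross-joined digit sets produce 1..1000, and the cap at max(id) is my own addition so that numbers beyond the end of the table are not listed as missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key)")
# IDs 1..200 with 7, 120 and 121 left out.
conn.executemany(
    "insert into items (id) values (?)",
    [(i,) for i in range(1, 201) if i not in (7, 120, 121)],
)

missing = [
    row[0]
    for row in conn.execute(
        """
        with digits(d) as (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)),
        numbers(n) as (
            -- three cross-joined digit sets give the numbers 1..1000
            select a.d + 10 * b.d + 100 * c.d + 1
            from digits a cross join digits b cross join digits c
        )
        select n.n
        from numbers n
        left join items t on t.id = n.n
        where t.id is null
          and n.n <= (select max(id) from items)
        order by n.n
        """
    )
]
print(missing)
```

Add one more cross join per extra power of ten you need to cover.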

Display All Columns from all Table by using Union ALL with Different no of Columns in Each Table

I have three tables with different numbers of columns, e.g. T1(C1), T2(C1,C2,C3), T3(C1,C4). I want to generate dynamic SQL that will create a view like:
CREATE VIEW [dbo].[vwData]
AS
SELECT C1,NULL AS C2,NULL AS C3,NULL AS C4
FROM DBO.T1
UNION ALL
SELECT C1,C2,C3,NULL AS C4
FROM DBO.T2
UNION ALL
SELECT C1,NULL AS C2,NULL AS C3,C4
FROM DBO.T3
I have achieved this with two nested loops that check, for each column, whether it exists in a given table.
But in production we have around 30 tables with around 60 columns in each table.
Creating the dynamic SQL takes around 7 minutes, which is not acceptable to us. We want to improve performance further.
Immediate help would be highly appreciated.
Here's some Dynamic SQL which would create and execute what you describe. How does this compare to your current SQL's performance?
Fiddle: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=800747a3d832e6e29a15484665f5cc8b
declare @tablesOfInterest table(tableName sysname, sql nvarchar(max))
declare @allColumns table(columnName sysname)
declare @sql nvarchar(max)
insert @tablesOfInterest(tableName) values ('table1'), ('table2')
insert @allColumns (columnName)
select distinct c.name
from sys.columns c
where c.object_id in
(
select object_id(tableName)
from @tablesOfInterest
)
update t
set sql = 'select ' + columnSql + ' from ' + quotename(tableName)
from @tablesOfInterest t
cross apply
(
select string_agg(coalesce(quotename(c.Name), 'null') + ' ' + quotename(ac.columnName), ', ') within group (order by ac.columnName)
from @allColumns ac
left outer join sys.columns c
on c.object_id = object_id(t.tableName)
and c.Name = ac.columnName
) x(columnSql)
select @sql = string_agg(sql, ' union all ')
from @tablesOfInterest
print @sql
exec (@sql)
As mentioned in the comments, rather than running this dynamic SQL every time you need to execute this query, you could use it to generate a view which you can then reuse as required.
Adding indexes and filters to the underlying tables as appropriate could further improve performance; but without knowing more of the context, we can't give much advice on specifics.
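The padding logic itself is small once the column lists are known, so it can also be done in a single pass outside the database. Below is a sketch of that idea, assuming the table-to-columns mapping has already been read from the catalog. build_union_sql is a made-up name, and SQLite stands in for SQL Server only to show that the generated statement runs.

```python
import sqlite3


def build_union_sql(tables):
    """tables maps each table name to the list of its column names."""
    all_columns = sorted({col for cols in tables.values() for col in cols})
    selects = []
    for name, cols in tables.items():
        # Use the real column where the table has it, otherwise pad with NULL.
        select_list = ", ".join(
            f'"{col}"' if col in cols else f'null as "{col}"'
            for col in all_columns
        )
        selects.append(f'select {select_list} from "{name}"')
    return " union all ".join(selects)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table T1 (C1);
    create table T2 (C1, C2, C3);
    create table T3 (C1, C4);
    insert into T1 values ('a');
    insert into T2 values ('b', 2, 3);
    insert into T3 values ('c', 4);
""")
tables = {"T1": ["C1"], "T2": ["C1", "C2", "C3"], "T3": ["C1", "C4"]}
union_sql = build_union_sql(tables)
rows = conn.execute(union_sql).fetchall()
print(union_sql)
print(rows)
```

With 30 tables and 60 columns this is a few thousand membership checks, so generating the statement should not be the slow part.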
You might try this:
To show the principle, I use some general tables that I know share some of their columns. Just replace the tables with your own tables:
Attention: I do not use these INFORMATION_SCHEMA tables to read their content. They serve as examples with overlapping columns...
DECLARE @statement NVARCHAR(MAX);
WITH cte(x) AS
(
SELECT
(SELECT TOP 1 * FROM INFORMATION_SCHEMA.TABLES FOR XML AUTO, ELEMENTS XSINIL,TYPE) AS [*]
,(SELECT TOP 1 * FROM INFORMATION_SCHEMA.COLUMNS FOR XML AUTO, ELEMENTS XSINIL,TYPE) AS [*]
,(SELECT TOP 1 * FROM INFORMATION_SCHEMA.ROUTINES FOR XML AUTO, ELEMENTS XSINIL,TYPE) AS [*]
--add all your tables here...
FOR XML PATH(''),TYPE
)
,AllColumns AS
(
SELECT DISTINCT a.value('local-name(.)','nvarchar(max)') AS ColumnName
FROM cte
CROSS APPLY x.nodes('/*/*') A(a)
)
,AllTables As
(
SELECT a.value('local-name(.)','nvarchar(max)') AS TableName
,a.query('*') ConnectedColumns
FROM cte
CROSS APPLY x.nodes('/*') A(a)
)
SELECT @statement=
STUFF((
(
SELECT 'UNION ALL SELECT ' +
'''' + TableName + ''' AS SourceTableName ' +
(
SELECT ',' + CASE WHEN ConnectedColumns.exist('/*[local-name()=sql:column("ColumnName")]')=1 THEN QUOTENAME(ColumnName) ELSE 'NULL' END + ' AS ' + QUOTENAME(ColumnName)
FROM AllColumns ac
FOR XML PATH('root'),TYPE
).value('.','nvarchar(max)') +
' FROM ' + REPLACE(QUOTENAME(TableName),'.','].[')
FROM AllTables
FOR XML PATH(''),TYPE).value('.','nvarchar(max)')
),1,10,'');
EXEC( @statement);
Short explanation:
The first row of each table will be transformed into XML. Using AUTO mode will use the table's name in the <root> and add all columns as nested elements.
The second CTE will create a distinct list of all columns existing in any of the tables.
The third CTE will extract all tables with their connected columns.
The final SELECT will use a nested string concatenation to create a UNION ALL SELECT of all columns. The existence of a given name decides whether the column is selected by its name or as NULL.
Just use PRINT to print out @statement in order to see the resulting dynamically created SQL command.

Creating an array of column names in Microsoft Server SQL

I'm fairly new to SQL, roughly a week of using it.
I'm trying to figure out a way to create an array of column names.
Through research, I've found a way to select column names and a way to select an nth row. However, I need some way to combine these two.
Here is the code for each:
Selecting columns:
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'tablename'
Selecting nth row:
SELECT * FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY Starting) NUM,
* FROM tablename
) A
WHERE NUM = 1
Is there a way to combine the two so that I can get a particular value for the nth row for the first select command (column names)?
SELECT COLUMN_NAME FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY ORDINAL_POSITION) Num, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'tablename'
) A
WHERE NUM = 1
ORDINAL_POSITION is simply the column identification number, i.e. the column's position within the table.
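For experimenting without a SQL Server instance, the same row_number() pattern can be tried on SQLite. This is only an analogue: pragma_table_info and its cid column stand in for INFORMATION_SCHEMA.COLUMNS and ORDINAL_POSITION, nth_column_name is a made-up helper, and the Ending and Total columns are invented for the demo.

```python
import sqlite3


def nth_column_name(conn, table_name, n):
    """Return the name of the n-th column (1-based) of table_name, or None."""
    row = conn.execute(
        """
        select name from (
            select row_number() over (order by cid) as num, name
            from pragma_table_info(?)
        )
        where num = ?
        """,
        (table_name, n),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename (Starting text, Ending text, Total integer)"
)
print(nth_column_name(conn, "tablename", 1))
```

Dropping the outer WHERE and fetching all rows gives the whole list of column names in order, which is the closest thing to an "array" here.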

Simple query gone wrong

I must be having an off day. This should be obvious but I don't get it.
-- check for necessary updates to dbnotes
select count(distinct table_name)
from ccsv4.[INFORMATION_SCHEMA].[COLUMNS]
returns 46
select count(distinct table_name)
from dbnotes
returns 44
select distinct table_name
from ccsv4.[INFORMATION_SCHEMA].[COLUMNS]
where table_name not in (select distinct table_name from dbnotes)
order by table_name
returns nothing
select distinct table_name
from dbnotes
where table_name not in (select distinct table_name
from ccsv4.[INFORMATION_SCHEMA].[COLUMNS])
order by table_name
returns nothing
What am I missing guys?
You are using not in. If any value from the subquery is NULL, nothing will be returned.
With a subquery, always use not exists. It has the right semantics:
select distinct table_name
from ccsv4.[INFORMATION_SCHEMA].[COLUMNS] c
where not exists (select 1
from dbnotes d
where d.table_name = c.table_name
);
I am pretty sure that tables have to have at least one column, so you might as well use information_schema.tables instead. It saves you the distinct:
select table_name
from ccsv4.information_schema.tables t
where not exists (select 1
from dbnotes d
where d.table_name = t.table_name
);
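The NULL behaviour is easy to reproduce on a toy data set. Here is a sketch using SQLite, where not in follows the same three-valued logic. The schema_tables table and its rows are invented for the demo and stand in for the information schema view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table schema_tables (table_name text);
    create table dbnotes (table_name text);
    insert into schema_tables values ('orders'), ('customers'), ('invoices');
    -- one NULL table_name in dbnotes is enough to break NOT IN
    insert into dbnotes values ('orders'), ('customers'), (null);
""")

not_in_rows = conn.execute("""
    select table_name from schema_tables
    where table_name not in (select table_name from dbnotes)
""").fetchall()

not_exists_rows = conn.execute("""
    select table_name from schema_tables s
    where not exists (select 1 from dbnotes d
                      where d.table_name = s.table_name)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

For 'invoices', the not in test compares against NULL and evaluates to unknown rather than true, so the row is dropped. not exists only asks whether a matching row was found, so it reports the table as expected.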

Dynamic sql to convert column names with one row into table with 2 columns and several rows

After searching for several ways of converting columns to rows using PIVOT, cross join etc my question still goes unanswered
I have a Result set which returns 1 row and 147 columns
ID | Name | DOB | BloodGroup | ...... 147 columns
1 XYZ 17MAY A+ ......
My aim is to convert this result set into 2 columns and 147 rows
Column_Name | Value
ID 1
NAME XYZ
: :
How should I go about it ? I appreciate your feedback
I took the second approach Gordon mentioned in his post, but built dynamic SQL from it. I CROSS JOINED the result of a few sys table JOINs and a source table, then built a CASE statement off the column names. I UNION it all together as dynamic SQL then EXECUTE it. To make it easy, I've made all the variable items into variables which you fill out at the beginning of the routine. Here's the code:
USE AdventureWorks2012;
GO
DECLARE @MySchema VARCHAR(100),
@MyTable VARCHAR(100),
@MyUIDColumn VARCHAR(100),
@MyFieldsMaxLength VARCHAR(10),
@SQL AS VARCHAR(MAX);
SET @MySchema = 'Person';
SET @MyTable = 'Person';
-- Unique ID which routine will identify unique entities by. Will also sort on this value in the end result dataset.
SET @MyUIDColumn = 'BusinessEntityID';
-- This determines the max length of the fields you will cast in your Value column.
SET @MyFieldsMaxLength = 'MAX';
WITH cteSQL
AS
(
SELECT 1 AS Sorter, 'SELECT c.name AS ColumnName,' AS SQL
UNION ALL
SELECT 2, 'CASE' AS Statement
UNION ALL
SELECT 3, 'WHEN c.name = ''' + c.name + ''' THEN CAST(mt.' + c.name + ' AS VARCHAR(' + @MyFieldsMaxLength + ')) '
FROM sys.tables t INNER JOIN sys.columns c
ON t.object_id = c.object_id
WHERE t.name = @MyTable
UNION ALL
SELECT 4, 'END AS Value' AS Statement
UNION ALL
SELECT 5, 'FROM sys.tables t INNER JOIN sys.columns c ON t.object_id = c.object_id INNER JOIN sys.schemas s ON t.schema_id = s.schema_id, ' + @MySchema + '.' + @MyTable + ' mt WHERE t.name = ''' + @MyTable + ''' AND s.name = ''' + @MySchema + ''' ORDER BY mt. ' + @MyUIDColumn + ', c.name;'
)
SELECT @SQL =
(
SELECT SQL + ' '
FROM cteSQL
ORDER BY Sorter
FOR XML PATH ('')
);
EXEC(@SQL);
I really can't say what execution time will be like. I ran it against AdventureWorks2012, Person.Person table (~20k rows, 13 columns) on my local machine and it brought back ~2.5 million rows in about 8 seconds, if that means anything. The good thing is that it's flexible enough to take any table seamlessly. Anyway, just thought it was a fun puzzle so decided to play with it a bit. Hope it helps.
EDIT: Thinking about it, this is probably even slower than Gordon's proposed method, but I did it already. Oh well. (Yeah, his method works in about half the time. Getting fancy didn't help me much.)
This is called unpivot. The easiest way, conceptually, is to do:
select 'id' as column_name, cast(id as varchar(255)) as column_value
from Result
union all
select 'name', name
from Result
union all
. . .
This can be cumbersome to type. If result is a table, you can use information_schema.columns to create the SQL, something like:
select 'select ''' + column_name + ''' as column_name, cast(' + column_name + ' as varchar(255)) as column_value from result union all'
from information_schema.columns
where table_name = 'Result'
This method is not the most efficient approach, because it requires reading the table for each column. For that unpivot is the better approach. The syntax is described here.
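To make the generated-SQL variant concrete, here is a sketch that builds the union all statement from the table's column list and runs it. It uses SQLite rather than SQL Server, so pragma_table_info stands in for information_schema.columns and text for varchar(255). build_unpivot_sql is a made-up name, the sample row follows the question, and the column names are assumed to be plain identifiers.

```python
import sqlite3


def build_unpivot_sql(conn, table_name):
    """Generate one 'select name, value' branch per column of table_name."""
    cols = [
        row[0]
        for row in conn.execute(
            "select name from pragma_table_info(?) order by cid", (table_name,)
        )
    ]
    selects = [
        f"select '{col}' as column_name, "
        f"cast({col} as text) as column_value from {table_name}"
        for col in cols
    ]
    return " union all ".join(selects)


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Result (ID integer, Name text, DOB text, BloodGroup text)"
)
conn.execute("insert into Result values (1, 'XYZ', '17MAY', 'A+')")

unpivot_sql = build_unpivot_sql(conn, "Result")
pairs = conn.execute(unpivot_sql).fetchall()
print(pairs)
```

As noted above, each branch reads the table once, so this suits a one-row result much better than a large table.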
Thanks for the response.
I figured out a way of doing it.
1. I got all the column names into a comma-separated string variable.
2. I passed the same string to UNPIVOT.
With this approach, hard-coding the 140 column names was completely avoided.
