I want to copy tables from a SQL Server to a data lake and dynamically add certain columns based on the data type of existing columns. The tables in question vary in row and column size, but the main goal is to transform any column of type geometry into two new columns, one with the WKT string of the geometry and one with SRID information about the geometry. Here is an example of such a table:
CREATE TABLE [dbo].[Stations](
[Id] [uniqueidentifier] NOT NULL,
[StationNumber] [nvarchar](256) NOT NULL,
[StationName] [nvarchar](256) NULL,
[Location] [geometry] NOT NULL,
[LocationTypeTypeListItemId] [uniqueidentifier] NOT NULL,
[mLastModified] [datetimeoffset](7) NOT NULL,
[LocationRef] [geometry] NULL)
I managed to write this snippet of T-SQL
SELECT STRING_AGG(select_string, ', ') AS sstring
FROM (
SELECT Object_Schema_name(c.object_id) as [SCHEMA_NAME]
, object_NAME(c.object_id) AS TABLE_NAME
, c.NAME AS COLUMN_NAME
, t.NAME AS DATA_TYPE
, CONCAT(c.NAME, ' AS ', c.NAME) AS select_string
FROM sys.all_columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
where Object_Schema_name(c.object_id) <> 'sys'
and object_NAME(c.object_id) = 'Stations'
UNION
SELECT Object_Schema_name(c.object_id) as [SCHEMA_NAME]
, object_NAME(c.object_id) AS TABLE_NAME
, c.NAME AS COLUMN_NAME
, t.NAME AS DATA_TYPE
, CONCAT(c.NAME, '.STSrid AS ', c.NAME, 'Srid') AS select_string
FROM sys.all_columns c
INNER JOIN sys.types t ON t.user_type_id = c.user_type_id
where Object_Schema_name(c.object_id) <> 'sys'
and t.name = 'geometry'
and object_NAME(c.object_id) = 'Stations'
) AS temp
This generates a list with the desired columns as a long string with the format
"Id AS Id, StationNumber AS StationNumber, StationName AS StationName,
Location.STAsText() AS Location, Location.STSrid AS
LocationSRID, [...]"
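For reference, here is a minimal sketch of a generator that produces all three kinds of expression in that list (plain column, STAsText() and STSrid). It is an illustration, not the exact query used in the pipeline: the dbo schema, the @table variable and the ordering by column_id are assumptions, and STRING_AGG requires SQL Server 2017 or later.

```sql
DECLARE @table sysname = N'Stations';  -- example table from above; dbo schema is assumed

SELECT STRING_AGG(CAST(x.select_string AS nvarchar(max)), N', ')
           WITHIN GROUP (ORDER BY x.column_id, x.part) AS sstring
FROM (
    -- Every column once; geometry columns are emitted as WKT text.
    SELECT c.column_id,
           1 AS part,
           CASE WHEN t.name = N'geometry'
                THEN QUOTENAME(c.name) + N'.STAsText() AS ' + QUOTENAME(c.name)
                ELSE QUOTENAME(c.name) + N' AS ' + QUOTENAME(c.name)
           END AS select_string
    FROM sys.columns AS c
    JOIN sys.types AS t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID(N'dbo.' + @table)

    UNION ALL

    -- One extra SRID column for each geometry column.
    SELECT c.column_id,
           2 AS part,
           QUOTENAME(c.name) + N'.STSrid AS ' + QUOTENAME(c.name + N'Srid')
    FROM sys.columns AS c
    JOIN sys.types AS t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID(N'dbo.' + @table)
      AND t.name = N'geometry'
) AS x;
```

QUOTENAME keeps column names with spaces or reserved words valid in the generated SELECT list.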
Now, when I try to use the string in a copy data activity, I get the following error
The identifier that starts with '["Active AS Active",
" DistanceToOutletKm AS DistanceToOutletKm"," Id AS Id",
" Location AS Location"," Location.STSrid AS Locatio' is
too long. Maximum length is 128.
One of the queries I tried in the Copy Data Activity looks like this:
'SELECT ' + [@{split(activity('Lookup1').output.value[0].sstring, ',')}] +
' FROM ' + [@{item().table_schema}].[@{item().table_name}]
where the lookup activity executes the previous query to generate the list.
How can I get the copy data activity to understand that it needs to treat the string as a list of columns?
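split() turns the string into an array, which is then serialized as JSON inside the square brackets and read by SQL Server as a single, over-long identifier. Pass the string through unchanged and use the concat() function to join string values and variables in the expression:
@concat('select ', activity('Lookup1').output.value[0].sstring, ' from ',item().schema_name,'.',item().table_name)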
I have the code below to pull the row and column counts from each table within a database (e.g., db1). But I have several databases (e.g., db1, db2, etc.), so manually updating the database name in the USE statement for every run isn't very convenient. Is there a way to pass a list of database names to a cursor (or something else that allows iteration) and then run the query below for every database in the list, appending the results from each run? I can get the list of database names from this query: select name from master.dbo.sysdatabases where name like '%db%'.
USE [db1]
;with [rowCount] as
(
SELECT DB_NAME() as [DB_Name],
QUOTENAME(SCHEMA_NAME(sOBJ.schema_id)) + '.' + QUOTENAME(sOBJ.name) AS [TableName],
SUM(sPTN.Rows) AS [RowCount]
FROM SYS.OBJECTS AS sOBJ
INNER JOIN SYS.PARTITIONS AS sPTN
ON sOBJ.object_id = sPTN.object_id
WHERE
sOBJ.type = 'U'
AND sOBJ.is_ms_shipped = 0x0
AND index_id < 2 -- 0:Heap, 1:Clustered
GROUP BY
sOBJ.schema_id
,sOBJ.name
)
,columnCount as
(
select
QUOTENAME(col.TABLE_SCHEMA) + '.' + QUOTENAME(col.TABLE_NAME) AS [TableName],
count(*) as ColumnCount
from INFORMATION_SCHEMA.COLUMNS col
inner join INFORMATION_SCHEMA.TABLES tbl
on col.TABLE_SCHEMA = tbl.TABLE_SCHEMA
and col.TABLE_NAME = tbl.TABLE_NAME
and tbl.TABLE_TYPE <> 'view'
group by
QUOTENAME(col.TABLE_SCHEMA) + '.' + QUOTENAME(col.TABLE_NAME)
)
select r.[DB_Name], r.TableName, r.[RowCount], c.ColumnCount
from [rowCount] r
inner join columnCount c
on r.TableName = c.TableName
ORDER BY r.[TableName]
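One possible way to iterate, sketched under assumptions: a cursor over sys.databases builds one dynamic batch per database and appends its output to a temp table. The #Results table, the LIKE pattern and the ONLINE filter are illustrative choices, and only the row-count half of the query is shown to keep the sketch short; the column-count CTE could be added to the dynamic batch in the same way.

```sql
-- Collects the per-database results; the name and shape are assumptions.
CREATE TABLE #Results (
    [DB_Name]   sysname,
    [TableName] nvarchar(517),
    [RowCount]  bigint
);

DECLARE @db sysname, @sql nvarchar(max);

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name LIKE '%db%'
      AND state_desc = 'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Three-part names replace the USE statement inside the dynamic batch.
    SET @sql = N'
        SELECT ' + QUOTENAME(@db, '''') + N',
               QUOTENAME(s.name) + ''.'' + QUOTENAME(o.name),
               SUM(p.rows)
        FROM ' + QUOTENAME(@db) + N'.sys.objects AS o
        JOIN ' + QUOTENAME(@db) + N'.sys.schemas AS s ON s.schema_id = o.schema_id
        JOIN ' + QUOTENAME(@db) + N'.sys.partitions AS p ON p.object_id = o.object_id
        WHERE o.type = ''U''
          AND o.is_ms_shipped = 0
          AND p.index_id < 2   -- 0: heap, 1: clustered
        GROUP BY s.name, o.name;';

    INSERT INTO #Results ([DB_Name], [TableName], [RowCount])
    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM db_cursor INTO @db;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;

SELECT * FROM #Results ORDER BY [DB_Name], [TableName];
DROP TABLE #Results;
```

QUOTENAME(@db, '''') emits the database name as a quoted string literal, while QUOTENAME(@db) emits it as a bracketed identifier.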
By "result shape" I mean a list of column names and data types returned by the function. With a stored procedure I can use sys.dm_exec_describe_first_result_set_for_object:
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set_for_object
(
OBJECT_ID('[dbo].[MyProcedureName]'),
NULL
);
But if I call that system function with the name of a table valued function, I just get null.
Is there any way to do the same for a table valued function?
A table function cannot be used in that context because it has no inherent "result"; it is in essence a parameterized view.
sys.columns has all the details you need:
SELECT c.name, t.name as system_type_name
FROM sys.columns c
JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('[dbo].[MyProcedureName]');
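As a quick check of the sys.columns approach, the sketch below creates a throwaway inline table-valued function and lists its output columns. The function dbo.ufn_DemoShape is hypothetical and exists only for this demonstration; GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical inline table-valued function used only for this demonstration.
CREATE FUNCTION dbo.ufn_DemoShape (@n int)
RETURNS TABLE
AS
RETURN (SELECT @n AS N, CAST('x' AS varchar(10)) AS Label);
GO

-- The function's output columns are listed in sys.columns under its object_id.
SELECT c.column_id,
       c.name,
       TYPE_NAME(c.user_type_id) AS data_type,
       c.max_length,
       c.is_nullable
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.ufn_DemoShape')
ORDER BY c.column_id;
GO

DROP FUNCTION dbo.ufn_DemoShape;
```

The query should return one row per output column of the function (here N and Label).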
Another option: you could use dm_exec_describe_first_result_set on a batch instead.
Note that you need to fill in parameters if necessary. They could just be default.
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set
(
'select * from [dbo].[MyProcedureName]()',
NULL,
1
);
-- for parameters:
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set
(
'select * from [dbo].[MyProcedureName2](default, default)',
NULL,
1
);
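If you don't know how many parameters there are, you can use a little bit of dynamic SQL:
-- @functionName is assumed to be declared earlier and to hold the function's name.
-- ISNULL covers functions without parameters, where STRING_AGG returns NULL.
DECLARE @sql nvarchar(max) = N'SELECT * FROM ' + QUOTENAME(@functionName) + N'(' +
ISNULL((
SELECT STRING_AGG(N'DEFAULT', N',')
FROM sys.parameters p
WHERE p.object_id = OBJECT_ID(@functionName)
), N'') + N')';
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set
(
@sql,
NULL,
1
);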
Good afternoon.
I am using SQL Server 2008/T-SQL. It is probably important to note that I am a read-only user: I do not have write access to any db, but I can create temp tables if absolutely needed.
I would like to preface what I am about to say with letting everyone know I have no formal education in SQL. Hopefully this makes sense - I may be inaccurate in some vocabulary etc.
Scope of what I am trying to do:
1. Select a specific recordID (value) in a column (primary key) from a table
2. Find where that specific number/recordID is used in all dependents/foreign keys
3. Return the tablename and columnname with a count of how many times that value was found
So, as an example...
You have a table with information on a person tied to a recordID, say something like:
dbo.MemberInfo with RecordID, Name Etc.
The ID number of the member (MemberInfo.RecordID) is used in other
tables, say: dbo.Awards as [HonoreeID]
(dbo.Awards.HonoreeID=MemberInfo.RecordID) dbo.Address as [MemberID]
(dbo.Address.MemberID=MemberInfo.RecordID) dbo.Contact as [PersonID]
(dbo.Contact.PersonID=MemberInfo.RecordID) ...and potentially a few
hundred others
I basically want to run through all the tables and see how many times a particular value/record is in use. Now, to add some complexity to this, it needs to be generic, as the column I may be looking up dependents on may change from day to day. (Ex. I may be looking for dependents of EventID tomorrow)
My current process is:
-Use a select to find the ID of the person I need
-Look at all the foreign keys linked to RecordID (Primary Key) of dbo.Members
-Dump the tablenames and columns of the foreign keys out into Excel
-Do a find and replace to make a bunch of SELECT COUNTS with a WHERE=@Variable
-Put it into SQL, define my variable and set it equal to the initial ID number
There has to be a better way. I have attempted many variations of the following with lots of errors and no success:
--DECLARE @Selected CHAR
SELECT T.Name, C.Name
--SET @Selected=(SELECT T.Name FROM sys.tables T)
--CASE WHEN (T.NAME IS NOT NULL) THEN 1
--ELSE '0' END AS 'MyTrial'
FROM
--sys.tables t
sys.foreign_key_columns AS fk
INNER JOIN sys.tables AS t ON fk.parent_object_id = t.object_id
INNER JOIN sys.columns AS c ON fk.parent_object_id = c.object_id AND fk.parent_column_id = c.column_id
WHERE Referenced_Object_ID=--Insert object ID here
My line of thinking was/is:
1. Query foreign keys used in a table/column, return the table name and column name from the dependent tables
2. Feed these results into something that can build me a new query to return a count of a value in each of the tables/columns where applicable
3. Return the tablename as well, so it can easily be fed into another select statement should I need to look at the details making up the count.
So my results might look something like this:
Tablename, Columnname, Count of Value in Column
Ideally, no value, table name etc. would be returned if the count is less than one.
My process may be extremely flawed out of the gate, but anything offered helps me learn. Thanks!
The script below uses no stored procedures but does use a temporary table. As far as I can conceive, there's no way around it. It runs very quickly for me, although I was working with a relatively small test database.
This is my first venture into dynamic SQL, so I can't testify to the safety of this code against SQL injection attacks and such. I, too, am self-taught.
Declare @Primary_Table varchar(100) = ''; -- name of the table that holds the primary key
Declare @Column_Name varchar(100) = ''; -- name of the primary key column
Declare @Specific_Value int = ; -- the ID value to look for
IF(OBJECT_ID('tempdb..#Selected_Tables') is not null)
Begin
Drop Table #Selected_Tables;
End
-- Collect every table/column pair that references the chosen primary key column
Select Distinct T.Name as Table_Name
, C.Name as Column_Name
, CAST(null as bigint) as Referenced_Records
Into #Selected_Tables
FROM sys.foreign_key_columns AS fk
INNER JOIN sys.tables AS t
ON fk.parent_object_id = t.object_id
INNER JOIN sys.columns AS c
ON fk.parent_object_id = c.object_id
AND fk.parent_column_id = c.column_id
Inner Join sys.tables AS t2
ON fk.referenced_object_id = t2.object_id
Inner Join sys.columns AS c2
on fk.referenced_object_id = c2.object_id -- match the column within the referenced table only
and fk.referenced_column_id = c2.column_id
Where t2.name = @Primary_Table
and c2.Name = @Column_Name;
Declare @sqlCommand nvarchar(max);
Declare @Unprocessed_Records int = (Select Count(1)
From #Selected_Tables
Where Referenced_Records is null);
Declare @Processing_Table varchar(1000) = (Select Top 1 Table_Name
From #Selected_Tables
Where Referenced_Records is null);
Declare @Processing_Column varchar(1000) = (Select Top 1 Column_Name
From #Selected_Tables
Where Referenced_Records is null
and Table_Name = @Processing_Table);
-- Count the matching rows one table/column pair at a time
While @Unprocessed_Records > 0
Begin
Set @sqlCommand = 'Update #Selected_Tables '
+ 'Set Referenced_Records = (Select Count(1) '
+ 'From ' + QUOTENAME(@Processing_Table) + ' '
+ 'Where ' + QUOTENAME(@Processing_Column) + ' = ' + CAST(@Specific_Value as nvarchar(1000)) + ') '
+ 'Where Table_Name = ''' + @Processing_Table + ''' '
+ 'and Column_Name = ''' + @Processing_Column + ''';'
Exec (@sqlCommand);
Set @Unprocessed_Records = (Select Count(1)
From #Selected_Tables
Where Referenced_Records is null);
Set @Processing_Table = (Select Top 1 Table_Name
From #Selected_Tables
Where Referenced_Records is null);
Set @Processing_Column = (Select Top 1 Column_Name
From #Selected_Tables
Where Referenced_Records is null
and Table_Name = @Processing_Table);
End;
Select * From #Selected_Tables;
After searching for several ways of converting columns to rows using PIVOT, cross join, etc., my question still goes unanswered.
I have a Result set which returns 1 row and 147 columns
ID | Name | DOB | BloodGroup | ...... 147 columns
1 XYZ 17MAY A+ ......
My aim is to convert this result set into 2 columns and 147 rows
Column_Name | Value
ID 1
NAME XYZ
: :
How should I go about it ? I appreciate your feedback
I took the second approach Gordon mentioned in his post, but built dynamic SQL from it. I CROSS JOINED the result of a few sys table JOINs and a source table, then built a CASE statement off the column names. I UNION it all together as dynamic SQL then EXECUTE it. To make it easy, I've made all the variable items into variables which you fill out at the beginning of the routine. Here's the code:
USE AdventureWorks2012;
GO
DECLARE @MySchema VARCHAR(100),
@MyTable VARCHAR(100),
@MyUIDColumn VARCHAR(100),
@MyFieldsMaxLength VARCHAR(10),
@SQL AS VARCHAR(MAX);
SET @MySchema = 'Person';
SET @MyTable = 'Person';
-- Unique ID which routine will identify unique entities by. Will also sort on this value in the end result dataset.
SET @MyUIDColumn = 'BusinessEntityID';
-- This determines the max length of the fields you will cast in your Value column.
SET @MyFieldsMaxLength = 'MAX';
WITH cteSQL
AS
(
SELECT 1 AS Sorter, 'SELECT c.name AS ColumnName,' AS SQL
UNION ALL
SELECT 2, 'CASE' AS Statement
UNION ALL
SELECT 3, 'WHEN c.name = ''' + c.name + ''' THEN CAST(mt.' + c.name + ' AS VARCHAR(' + @MyFieldsMaxLength + ')) '
FROM sys.tables t INNER JOIN sys.columns c
ON t.object_id = c.object_id
WHERE t.name = @MyTable
UNION ALL
SELECT 4, 'END AS Value' AS Statement
UNION ALL
SELECT 5, 'FROM sys.tables t INNER JOIN sys.columns c ON t.object_id = c.object_id INNER JOIN sys.schemas s ON t.schema_id = s.schema_id, ' + @MySchema + '.' + @MyTable + ' mt WHERE t.name = ''' + @MyTable + ''' AND s.name = ''' + @MySchema + ''' ORDER BY mt. ' + @MyUIDColumn + ', c.name;'
)
SELECT @SQL =
(
SELECT SQL + ' '
FROM cteSQL
ORDER BY Sorter
FOR XML PATH ('')
);
EXEC(@SQL);
I really can't say what execution time will be like. I ran it against AdventureWorks2012, Person.Person table (~20k rows, 13 columns) on my local machine and it brought back ~2.5 million rows in about 8 seconds, if that means anything. The good thing is that it's flexible enough to take any table seamlessly. Anyway, just thought it was a fun puzzle so decided to play with it a bit. Hope it helps.
EDIT: Thinking about it, this is probably even slower than Gordon's proposed method, but I did it already. Oh well. (Yeah, his method works in about half the time. Getting fancy didn't help me much.)
This is called unpivot. The easiest way, conceptually, is to do:
select 'id' as column_name, cast(id as varchar(255)) as column_value
from Result
union all
select 'name', name
from Result
union all
. . .
This can be cumbersome to type. If result is a table, you can use information_schema.columns to create the SQL, something like:
select 'select ''' + column_name + ''' as column_name, cast(' + column_name + ' as varchar(255)) as column_value from result union all'
from information_schema.columns
where table_name = 'Result'
This method is not the most efficient approach, because it requires reading the table for each column. For that unpivot is the better approach. The syntax is described here.
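As a rough illustration of the UNPIVOT approach mentioned above, here is a minimal sketch on a four-column stand-in for the real result set. The #Result temp table and its sample row are assumptions based on the example in the question. UNPIVOT requires all source columns to share one data type, hence the casts, and it silently drops columns whose value is NULL.

```sql
-- Hypothetical one-row table standing in for the 147-column result set.
CREATE TABLE #Result (ID int, Name varchar(50), DOB varchar(20), BloodGroup varchar(5));
INSERT INTO #Result (ID, Name, DOB, BloodGroup) VALUES (1, 'XYZ', '17MAY', 'A+');

SELECT u.Column_Name, u.[Value]
FROM (
    -- Cast every column to the same type before unpivoting.
    SELECT CAST(ID AS varchar(255))         AS ID,
           CAST(Name AS varchar(255))       AS Name,
           CAST(DOB AS varchar(255))        AS DOB,
           CAST(BloodGroup AS varchar(255)) AS BloodGroup
    FROM #Result
) AS src
UNPIVOT ([Value] FOR Column_Name IN (ID, Name, DOB, BloodGroup)) AS u;

DROP TABLE #Result;
```

Unlike the UNION ALL version, this reads the source only once.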
Thanks for the response.
I figured out a way of doing it.
1. I got all the column names in a comma-separated string variable.
2. I passed the same string to the UNPIVOT operator.
With this approach, hard-coding the 140 column names was completely avoided.
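The two steps could look roughly like the sketch below. It is not the poster's actual code: the table name dbo.Result is hypothetical, STRING_AGG needs SQL Server 2017 or later (older versions would use FOR XML PATH instead), and every column is assumed to be castable to nvarchar(255).

```sql
DECLARE @table nvarchar(300) = N'dbo.Result';  -- hypothetical source table
DECLARE @casts nvarchar(max), @cols nvarchar(max), @sql nvarchar(max);

-- Step 1: build the column lists from the catalog instead of typing them out.
SELECT @casts = STRING_AGG(
                    CAST(N'CAST(' + QUOTENAME(name) + N' AS nvarchar(255)) AS '
                         + QUOTENAME(name) AS nvarchar(max)), N', ')
                    WITHIN GROUP (ORDER BY column_id),
       @cols  = STRING_AGG(CAST(QUOTENAME(name) AS nvarchar(max)), N', ')
                    WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE object_id = OBJECT_ID(@table);

-- Step 2: pass the lists to UNPIVOT through dynamic SQL.
-- @table is concatenated as-is, so it must come from a trusted source.
SET @sql = N'SELECT u.Column_Name, u.[Value] '
         + N'FROM (SELECT ' + @casts + N' FROM ' + @table + N') AS src '
         + N'UNPIVOT ([Value] FOR Column_Name IN (' + @cols + N')) AS u;';

EXEC sys.sp_executesql @sql;
```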
I want to get the table name. I have the column name, and when I look it up in sys.columns I get the matching column name. How do I get the name of the table that the column belongs to?
SELECT OBJECT_SCHEMA_NAME(object_id) AS TableSchemaName,
OBJECT_NAME(object_id) AS TableName
FROM sys.columns
WHERE name = 'YourColumnName'
I hope this helps:
select t.name from sys.columns c
inner join sys.tables t
on c.object_id = t.object_id
where c.name = 'insert column name here'
select OBJECT_NAME(object_id) as TableName from sys.Columns where name='columnNamehere'
Try this
declare @columnName As varchar(50) = 'ParentColumnName'
select t.name from sys.tables t
join sys.columns c
on c.object_id = t.object_id
and c.name = @columnName
select name as 'TableName' from sys.tables where object_id in
(select object_id from sys.columns where name='UserName') -- IN, because several tables may contain the column
I know that this is an old question, but the answers listed to date do not get at what the parent table name is for a view's columns, nor if a column is aliased to have a new name with respect to the column name in the parent table.
Unfortunately (at least in 2008R2), it seems that even with registering your Views to a Schema, the referencing_minor_id of sys.dm_sql_referenced_entities (or the equivalent column from sys.SQL_Modules) is always set to zero. However, you can retrieve all referred-to tables (and parent views), along with which fields of those tables are queried, with sys.dm_sql_referenced_entities (or sys.SQL_Modules). It does not capture the order of those bindings, though, so the following won't quite work to link view columns directly to table columns, but it'll provide an approximation:
DECLARE @obj_name nvarchar(50) = 'Table_or_View_name_here';
with obj_id as (
select object_id, name, OBJECT_SCHEMA_NAME(object_id)+'.'+name as qualified_name from sys.all_objects as o
where o.name = @obj_name
),
tv_columns as ( -- table or view
select o.name as Obj_Name, c.* from sys.columns as c join
obj_id as o on
c.object_id=o.object_id
),
sql_referenced_entities as (
SELECT
o.name as referencing_name,
o.object_id,
COALESCE(NULLIF(referencing_minor_id,0),ROW_NUMBER() OVER (ORDER BY (select 'NO_SORT'))) as referencing_minor_id,
referenced_server_name,
referenced_database_name,
referenced_schema_name,
referenced_entity_name,
referenced_minor_name,
referenced_id,
referenced_minor_id,
referenced_class,
referenced_class_desc,
is_caller_dependent,
is_ambiguous
FROM obj_id as o, sys.dm_sql_referenced_entities((select qualified_name from obj_id), 'OBJECT') where referenced_minor_id<>0
)
select
c.object_id as object_id,
o.name as object_name,
c.column_id,
c.name as column_name,
c2.object_id as parent_table_object_id,
o2.name as parent_table_name,
c2.column_id as parent_column_id,
c2.name as parent_column_name
-- ,c.*,
-- ,c2.*
from sys.columns as c join
obj_id as o on
c.object_id=o.object_id left outer join
(sql_referenced_entities as s join
sys.all_objects as o2 on
s.referenced_id=o2.object_id and s.referenced_class=1 join
sys.columns as c2 on
s.referenced_id=c2.object_id and s.referenced_minor_id=c2.column_id
) on
c.object_id=s.object_id and c.column_id=s.referencing_minor_id
To get the true aliases used, as well as any calculations involving the combinations of multiple fields, you would have to parse the output of either OBJECT_DEFINITION(OBJECT_ID('schema.view')) (or potentially exec sp_helptext 'schema.view'), as follows
Starting with OBJECT_DEFINITION(OBJECT_ID('schema.view'))
Mark what is enclosed in single quotes as superseding the removal and other rules to follow
Remove blocks between /* */ comments
Remove any text after --, up to the next linebreak sequence (see EXEC SP_HELPTEXT 'sp_helptext' for their end-of-line code)
Look up table/subselect aliases from the FROM clause
Parse out the SELECT clause and break on commas, rather than end-of-line.
Reduce any contiguous whitespace to a single space character. Force whitespace before [ and after ] when a dot/period (or whitespace) don't already appear.
We'll put the above into a stored procedure that we'll call usp_helptext_for_view
See which table_alias.field_name are aliased or simply appear as field_name. See the code snippet below for how to separate the field alias from the definition.
Link the view to the table as appropriate.
IF OBJECT_ID('tempdb..#s') IS NOT NULL drop table #s;
create table #s (id bigint identity(1,1) primary key, text nvarchar(max));
-- @qualified_viewname and @viewfieldname are assumed to be declared earlier in the batch
insert into #s (text) exec usp_helptext_for_view @qualified_viewname;
with s as (select
id,
text,
-- az: first position at which the field name (bare, bracketed, or after AS) appears; NULL if absent
az=(select
MIN(NULLIF(x,0)) FROM (VALUES
(charindex(@viewfieldname,text COLLATE Latin1_General_CI_AS)),
(charindex('['+@viewfieldname+']',text COLLATE Latin1_General_CI_AS)),
(charindex('AS ['+@viewfieldname+']',text COLLATE Latin1_General_CI_AS)),
(charindex('as '+@viewfieldname,text COLLATE Latin1_General_CI_AS))
) AS value(x)
),
NULLIF(charindex('=',text),0) as eq --oh, the irony of how the two different styles are applied
FROM #s
)
SELECT
@viewfieldname as ViewField,
CASE WHEN eq IS NULL
THEN IIF(az IS NULL, NULL, LEFT(text, az-2))
ELSE RIGHT(text,LEN(text)-eq) -- alternately ELSE CASE WHEN az IS NULL THEN NULL WHEN az<eq THEN RIGHT(text,LEN(text)-eq) ELSE NULL END
END as ViewFieldDefinition,
id as sortPosition
FROM s
WHERE text like '%'+@viewfieldname+'%' -- you should be able to eliminate this clause without affecting the results.
ORDER BY id