SSIS SCD wizard does a select * on my data - sql-server

Visual studio hangs on me when using the Slowly Changing Dimension Wizard.
I select the correct connection.
Then I try to open the dropdown 'Table or view' to select a destination table.
At this point Visual Studio hangs.
This happens on all client machines and on different Visual Studio versions, but only with this specific database. In Activity Monitor I noticed that the wizard runs a select * against every table in the database. One table has more than 4 billion rows (over 300 GB), and it is the select * on this table that takes so long.
Does anybody have any idea what causes the select * on my database, or why they are doing this? And even better, how to fix this?

Don't use the slowly changing dimension wizard in SSIS at all. The data flow it creates performs really badly compared to what you can write with TSQL.
A couple of assumptions: you need a type 2 SCD, and you are using at least SQL Server 2008, where the MERGE statement is available.
Instead of SSIS use the OUTPUT clause of the MERGE statement within TSQL to perform the dimension update/insert. For example:
INSERT INTO Customer_Master
SELECT
Source_Cust_ID,
First_Name,
Last_Name,
Eff_Date,
End_Date,
Current_Flag
FROM
(
MERGE
Customer_Master CM
USING
Customer_Source CS
ON
CM.Source_Cust_ID = CS.Source_Cust_ID
WHEN NOT MATCHED
THEN
INSERT VALUES
(
CS.Source_Cust_ID,
CS.First_Name,
CS.Last_Name,
CONVERT(char(10), GETDATE()-1, 101),
'12/31/2199',
'y'
)
WHEN MATCHED
AND CM.Current_Flag = 'y'
AND (CM.Last_Name <> CS.Last_Name )
THEN
UPDATE
SET
CM.Current_Flag = 'n',
CM.End_date = convert(char(10), getdate()- 2, 101)
OUTPUT
$Action Action_Out,
CS.Source_Cust_ID,
CS.First_Name,
CS.Last_Name,
convert(char(10), getdate()-1, 101) Eff_Date,
'12/31/2199' End_Date,
'y' Current_Flag
) AS MERGE_OUT
WHERE
MERGE_OUT.Action_Out = 'UPDATE';
Source: http://www.kimballgroup.com/2008/11/design-tip-107-using-the-sql-merge-statement-for-slowly-changing-dimension-processing/

Related

Achieve the same result from this postgresql query in SQL Server 2017

I have the following query that runs in postgresql-9.6, I need to achieve the same output on a SQL Server DB.
Here is the query, I've replaced all fields from my DB with the string values that would come from them anyway (DB Fields are: "primary_key_fields", "primary_key_values", "table_name", "min_sequence"):
SELECT
UNNEST(STRING_TO_ARRAY(demo.primary_key_fields, ',')) AS primary_key_fields,
UNNEST(STRING_TO_ARRAY(demo.primary_key_values, ',')) AS primary_key_values,
table_name,
min_sequence,
ROW_NUMBER() OVER(partition by demo.primary_key_fields) AS rn
FROM (
SELECT
'Name,surname,age,location,id' AS primary_key_fields,
'Nash,Marley,27,South Africa,121' AS primary_key_values,
'person' AS table_name,
'1' AS min_sequence
UNION ALL
SELECT
'Name,surname,age,location,id' AS primary_key_fields,
'Paul,Scott,25,South America,999' AS primary_key_values,
'person' AS table_name,
'1' AS min_sequence
) demo
I'm expecting the following output:
Highly appreciate the assistance. I'm using SQL Server 2017.
No longer needed; this question can be closed. No solution was found, so the source system was changed to accommodate what was needed.
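For readers with the same need, here is one possible approach, offered as a sketch rather than a confirmed answer. It assumes a database compatibility level of 130 or higher (needed for OPENJSON) and that the comma-separated lists contain no double quotes or backslashes. Each list is rewritten as a JSON array so that OPENJSON returns every element together with its position, which lets the n-th field name be paired with the n-th value. The rn column is left out because its intended numbering is not shown in the question.

```sql
-- Sketch only: requires OPENJSON (compatibility level 130+).
-- Assumes the lists contain no double quotes or backslashes.
SELECT
    f.[value] AS primary_key_fields,
    v.[value] AS primary_key_values,
    demo.table_name,
    demo.min_sequence
FROM (
    SELECT
        'Name,surname,age,location,id' AS primary_key_fields,
        'Nash,Marley,27,South Africa,121' AS primary_key_values,
        'person' AS table_name,
        '1' AS min_sequence
    UNION ALL
    SELECT
        'Name,surname,age,location,id',
        'Paul,Scott,25,South America,999',
        'person',
        '1'
) AS demo
-- Turn 'a,b,c' into the JSON array ["a","b","c"]; OPENJSON then exposes each
-- element's zero-based position in [key] and its text in [value].
CROSS APPLY OPENJSON('["' + REPLACE(demo.primary_key_fields, ',', '","') + '"]') AS f
CROSS APPLY OPENJSON('["' + REPLACE(demo.primary_key_values, ',', '","') + '"]') AS v
WHERE f.[key] = v.[key]   -- pair the n-th field name with the n-th value
ORDER BY demo.primary_key_values, CAST(f.[key] AS int);
```

STRING_SPLIT is deliberately not used here: in SQL Server 2017 it does not return element positions, so the two lists could not be paired reliably.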

SQL Server STUFF String Concat Slow

Is there any alternative to the SQL Server STUFF function?
I am developing a Windows Service that loops over a database and does some data processing, but the step of fetching data is extremely slow.
I have these tables
Sensors table that define sensors config
Items table that records each item information from devices
Itemdata table that stores sensor values for each item row, so Itemdata table is linked to Sensors and Items tables
I need to select data from items with grouping itemsdata as col like this
1=5|2=6|
I use the T-SQL below. It works fine, but it is slow with more than 200,000 rows.
Without the STUFF column, the query executes extremely fast.
According to the actual execution plan, 99% of the cost is in the STUFF expression.
I am using the following T-SQL:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
IF (@dtFrom IS NOT NULL AND @dtTo IS NOT NULL)
BEGIN -- with both dates
    SELECT
        m.itemsId,
        m.ObjectId,
        0 AS [type],
        STUFF((SELECT
                   (CAST(Sensors.SourceNameId AS nvarchar(10)) + '=' + CAST(t.Value AS nvarchar(20)) + '|')
               FROM [tavl2].[tavl].[itemsData] t WITH (NOLOCK)
               LEFT JOIN tavl2.tavl.Sensors WITH (NOLOCK) ON t.SensorsId = Sensors.SensorsId
               WHERE t.itemsId = m.itemsId
               FOR xml PATH (''), TYPE).value('.[1]', 'nvarchar(max)'), 1, 0, '') AS params
    FROM
        tavl.[items] m WITH (NOLOCK)
    WHERE
        m.ObjectId = @objId
        AND m.GpsTime BETWEEN @dtFrom AND @dtTo
        AND m.Valid = 1;
END
Any better solutions?
I managed to do it using SQLCLR.
Here is an open-source CLR aggregate that does it very quickly:
GROUP_CONCAT string aggregate for SQL Server (hosted at CodePlex)
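On SQL Server 2017 or later, the built-in STRING_AGG aggregate is another option that avoids both FOR XML and a CLR assembly. The following is a minimal sketch under that version assumption. It reuses the table and column names from the question, assumes the current database is tavl2, and the parameter types and sample values are made up for illustration. Note also that STUFF(..., 1, 0, '') removes zero characters, so the cost shown in the plan probably belongs to the correlated subquery rather than to STUFF itself; an index on itemsData(itemsId) may matter more than the concatenation method.

```sql
-- Sketch only: assumes SQL Server 2017+ (STRING_AGG) and current database tavl2.
-- Parameter types and sample values are assumptions.
DECLARE @objId int = 1,
        @dtFrom datetime = '20240101',
        @dtTo datetime = '20240201';

SELECT
    m.itemsId,
    m.ObjectId,
    0 AS [type],
    p.params
FROM tavl.[items] AS m
OUTER APPLY (
    -- One aggregated string per item, e.g. 1=5|2=6|
    -- Casting to nvarchar(max) avoids the 8000-byte limit on the result.
    SELECT STRING_AGG(
               CAST(s.SourceNameId AS nvarchar(max)) + N'=' + CAST(t.Value AS nvarchar(20)),
               N'|') + N'|' AS params
    FROM tavl.itemsData AS t
    LEFT JOIN tavl.Sensors AS s ON t.SensorsId = s.SensorsId
    WHERE t.itemsId = m.itemsId
) AS p
WHERE m.ObjectId = @objId
  AND m.GpsTime BETWEEN @dtFrom AND @dtTo
  AND m.Valid = 1;
```

If the order of the pairs inside the string matters, STRING_AGG accepts WITHIN GROUP (ORDER BY ...) after the closing parenthesis.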

Selecting Oracle Stored Procedure in SSRS Crashes Visual Studio 2005 IDE

Problem Conditions
I have a very simple Oracle (11g) Stored Procedure that is declared like so:
CREATE OR REPLACE PROCEDURE pr_myproc(L_CURSOR out SYS_REFCURSOR)
is
BEGIN
OPEN L_CURSOR FOR
SELECT * FROM MyTable;
END;
This compiles correctly. The cursor contains col1, col2 and col3.
In SSRS, I have a Shared Data Source that uses the Oracle OLE DB Provider for Oracle 11g:
Provider=OraOLEDB.Oracle.1;Data Source=LIFEDEV
(Plus the user credentials).
What Works OK:
The stored procedure executes correctly in PL/SQL Developer
The 'test connect' works fine in SSRS
A query string of SELECT * FROM MyTable; with Command Type of 'text' produces the correct fields in the SSRS report.
Using the .NET Oracle Provider instead of the Oracle OLE DB Provider
What Fails:
If I change the Command Type to 'Stored Procedure' and enter 'pr_myproc', Visual Studio 2005 (Service Pack 2) simply hangs/crashes when I click 'OK'.
Does anyone have any knowledge/experience of this?
Any help would be most appreciated. Thanks.
FURTHER INFORMATION
I've modified the provider from the Oracle OLE DB Provider to the .NET Oracle Provider, and, magically, it works.
This would seem to indicate an issue with the Oracle provider.
Any more thoughts?
We got to the bottom of this.
On the environment where the procedure resided, we have a substantial data dictionary. The two providers when looking up information use two different queries.
Here is the one the Oracle Provider used, taking 10+ minutes:
select * from (select null PROCEDURE_CATALOG
, owner PROCEDURE_SCHEMA
, object_name PROCEDURE_NAME
, decode (object_type, 'PROCEDURE', 2, 'FUNCTION', 3, 1) PROCEDURE_TYPE
, null PROCEDURE_DEFINITION
, null DESCRIPTION
, created DATE_CREATED
, last_ddl_time DATE_MODIFIED
from all_objects where object_type in ('PROCEDURE','FUNCTION')
union all
select null PROCEDURE_CATALOG
, arg.owner PROCEDURE_SCHEMA
, arg.package_name||'.'||arg.object_name PROCEDURE_NAME
, decode(min(arg.position), 0, 3, 2) PROCEDURE_TYPE
, null PROCEDURE_DEFINITION
, decode(arg.overload, '', '', 'OVERLOAD') DESCRIPTION
, min(obj.created) DATE_CREATED
, max(obj.last_ddl_time) DATE_MODIFIED
from all_objects obj, all_arguments arg
where arg.package_name is not null
and arg.owner = obj.owner
and arg.object_id = obj.object_id
group by arg.owner, arg.package_name, arg.object_name, arg.overload ) PROCEDURES
WHERE PROCEDURE_NAME = '[MY_PROCEDURE_NAME]' order by 2, 3
More info can be found here

Making one table equal to another without a delete *

I know this is bit of a strange one but if anyone had any help that would be greatly appreciated.
The scenario is that we have a production database at a remote site and a developer database in our local office. Developers make changes directly to the developer db, and as part of the deployment process a C# application runs and produces a series of .sql scripts that we can execute on the remote side (essentially delete *, insert). We are looking for something a bit more elaborate, as the downtime from the delete * is unacceptable. This is all reference data that controls menu items, functionality etc. of a major website.
I have a sproc that essentially returns a diff of two tables. My thinking is that I can insert all the expected data in to a tmp table, execute the diff, and drop anything from the destination table that is not in the source and then upsert everything else.
The question is: is there an easy way to do this without using a cursor? To illustrate, the sproc returns a recordset structured like this:
TableName Col1 Col2 Col3
Dest
Src
Anything in the recordset with TableName = Dest should be deleted (as it does not exist in src) and anything in Src should be upserted in to dest. I cannot think of a way to do this purely set based but my DB-fu is weak.
Any help would be appreciated. Apologies if the explanation is sketchy; let me know if you need any more details.
Yeah, that sproc would work. Use a FULL JOIN with that table and add a column to indicate insert, update or delete. Then create separate SQL statements for them based on the column indicator. Set based.
Sorry not a FULL JOIN, you'll need to break them down to separate LEFT and RIGHT JOINS. Did this in NotePad, so apologies if it doesn't work:
INSERT INTO tmpDeployData(ID,IUDType)
SELECT ed.id, 'D'
FROM tmpDeployData td
RIGHT JOIN existingData ed ON td.id = ed.id
WHERE td.id IS NULL
UPDATE td
SET td.IUDType = CASE WHEN ed.id IS NULL THEN
'I'
ELSE
'U'
END
FROM tmpDeployData td
LEFT JOIN existingData ed ON td.id = ed.id
INSERT INTO existingData(ID,a,b,c)
SELECT td.ID,td.a,td.b,td.c
FROM tmpDeployData td
WHERE td.IUDType = 'I'
DELETE ed
FROM existingData ed
INNER JOIN tmpDeployData td ON ed.ID = td.ID
WHERE td.IUDType = 'D'
UPDATE ed
SET ed.a = td.a,
ed.b = td.b,
ed.c = td.c
FROM existingData ed
INNER JOIN tmpDeployData td ON ed.ID = td.ID
WHERE td.IUDType = 'U'
Just realized you're pulling info into the temptable as a staging table, not the source of the data. In that case you can use the FULL JOIN:
INSERT INTO tmpDeployData(ID,a,b,c,IUDType)
SELECT COALESCE(sd.ID, ed.ID), -- sd.ID is NULL for rows that exist only in existingData
    sd.a,
    sd.b,
    sd.c,
    CASE WHEN ed.id IS NULL THEN
        'I'
    WHEN sd.id IS NULL THEN
        'D'
    ELSE
        'U'
    END AS IUDType
FROM sourceData sd
FULL JOIN existingData ed ON sd.id = ed.id
Then same DML statements as before.
Take a look at tablediff.
Tables do not need to participate in replication to run the utility. There's a wonderful -f switch to generate T-SQL to put the tables 'in sync':
Generates a Transact-SQL script to bring the table at the destination server into convergence with the table at the source server. You can optionally specify a name and path for the generated Transact-SQL script file. If file_name is not specified, the Transact-SQL script file is generated in the directory where the utility runs.
There's a much, much easier way to do this assuming you're using SQL Server 2008: The MERGE statement.
Migrating all changes from one table to another is as simple as:
MERGE DestinationTable d
USING SourceTable s
ON d.Id = s.Id
WHEN MATCHED THEN UPDATE
SET d.Col1 = s.Col1, d.Col2 = s.Col2, ...
WHEN NOT MATCHED BY TARGET THEN
INSERT (Id, Col1, Col2, ...)
VALUES (s.Id, s.Col1, s.Col2, ...)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
That's it. DestinationTable will be identical to SourceTable after that.
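To make the pattern concrete, here is a small self-contained sketch. The temp tables #Dest and #Src and their columns are hypothetical stand-ins for the real tables. It also adds two things the statement above does not have: a change check on the MATCHED branch, so identical rows are not rewritten, and an OUTPUT clause that reports what the MERGE did.

```sql
-- Hypothetical demo tables; names and columns are assumptions for illustration.
CREATE TABLE #Dest (Id int PRIMARY KEY, Col1 varchar(20));
CREATE TABLE #Src  (Id int PRIMARY KEY, Col1 varchar(20));

INSERT INTO #Dest (Id, Col1) VALUES (1, 'old'), (2, 'obsolete');
INSERT INTO #Src  (Id, Col1) VALUES (1, 'new'), (3, 'added');

MERGE #Dest AS d
USING #Src AS s
    ON d.Id = s.Id
WHEN MATCHED AND d.Col1 <> s.Col1 THEN   -- skip rows that are already identical
    UPDATE SET d.Col1 = s.Col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1) VALUES (s.Id, s.Col1)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action AS MergeAction,
       COALESCE(inserted.Id, deleted.Id) AS Id;   -- one row per change made

SELECT Id, Col1 FROM #Dest ORDER BY Id;   -- now holds the same rows as #Src

DROP TABLE #Dest, #Src;
```

The <> comparison does not detect a change to or from NULL, so nullable columns need an explicit IS NULL check in the MATCHED condition.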
Why don't you just take a backup of the production database and restore it over your development database? You should have change scripts for all ddl differences from the production database that you can run on the database after the restore and it would test the deployment to production.
edit:
Sorry, just re-read your question, it looks like you are storing your configuration info in your development db and generating your change scripts from that so this wouldn't work.
I would recommend creating change scripts by hand and storing them in source control. Then use sqlcmd or osql and a batch file to run your change scripts on the database.

Hidden Features of SQL Server

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
What are some hidden features of SQL Server?
For example, undocumented system stored procedures, tricks to do things which are very useful but not documented enough?
Answers
Thanks to everybody for all the great answers!
Stored Procedures
sp_msforeachtable: Runs a command with '?' replaced with each table name (v6.5 and up)
sp_msforeachdb: Runs a command with '?' replaced with each database name (v7 and up)
sp_who2: just like sp_who, but with a lot more info for troubleshooting blocks (v7 and up)
sp_helptext: Shows the code of a stored procedure, view, or UDF
sp_tables: return a list of all tables and views of database in scope.
sp_stored_procedures: return a list of all stored procedures
xp_sscanf: Reads data from the string into the argument locations specified by each format argument.
xp_fixeddrives: Find the fixed drive with largest free space
sp_help: Shows the table structure, indexes, and constraints of a table. Also works for views and UDFs. Shortcut is Alt+F1
Snippets
Returning rows in random order
All database User Objects by Last Modified Date
Return Date Only
Find records whose date falls somewhere inside the current week.
Find records whose date occurred last week.
Returns the date for the beginning of the current week.
Returns the date for the beginning of last week.
See the text of a procedure that has been deployed to a server
Drop all connections to the database
Table Checksum
Row Checksum
Drop all the procedures in a database
Re-map the login Ids correctly after restore
Call Stored Procedures from an INSERT statement
Find Procedures By Keyword
Query the transaction log for a database programmatically.
Functions
HashBytes()
EncryptByKey
PIVOT command
Misc
Connection String extras
TableDiff.exe
Triggers for Logon Events (New in Service Pack 2)
Boosting performance with persisted-computed-columns (pcc).
DEFAULT_SCHEMA setting in sys.database_principals
Forced Parameterization
Vardecimal Storage Format
Figuring out the most popular queries in seconds
Scalable Shared Databases
Table/Stored Procedure Filter feature in SQL Management Studio
Trace flags
Number after a GO repeats the batch
Security using schemas
Encryption using built in encryption functions, views and base tables with triggers
In Management Studio, you can put a number after a GO end-of-batch marker to cause the batch to be repeated that number of times:
PRINT 'X'
GO 10
Will print 'X' 10 times. This can save you from tedious copy/pasting when doing repetitive stuff.
A lot of SQL Server developers still don't seem to know about the OUTPUT clause (SQL Server 2005 and newer) on the DELETE, INSERT and UPDATE statement.
It can be extremely useful to know which rows have been INSERTed, UPDATEd, or DELETEd, and the OUTPUT clause lets you do this very easily: it gives access to the "virtual" tables called inserted and deleted (like in triggers):
DELETE FROM (table)
OUTPUT deleted.ID, deleted.Description
WHERE (condition)
If you're inserting values into a table which has an INT IDENTITY primary key field, with the OUTPUT clause, you can get the inserted new ID right away:
INSERT INTO MyTable(Field1, Field2)
OUTPUT inserted.ID
VALUES (Value1, Value2)
And if you're updating, it can be extremely useful to know what changed - in this case, inserted represents the new values (after the UPDATE), while deleted refers to the old values before the UPDATE:
UPDATE (table)
SET field1 = value1, field2 = value2
OUTPUT inserted.ID, deleted.field1, inserted.field1
WHERE (condition)
If a lot of info will be returned, the output of OUTPUT can also be redirected to a temporary table or a table variable (OUTPUT INTO #myInfoTable).
Extremely useful - and very little known!
Marc
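As an illustration of the OUTPUT ... INTO form mentioned in the answer above, here is a minimal sketch that captures the old and new values of an UPDATE in a table variable. The table #Demo, its columns, and the variable @Changes are hypothetical names chosen for the example.

```sql
-- Hypothetical demo objects; names are assumptions for illustration.
CREATE TABLE #Demo (ID int IDENTITY(1,1) PRIMARY KEY, Field1 varchar(50));
INSERT INTO #Demo (Field1) VALUES ('a'), ('b');

DECLARE @Changes TABLE (ID int, OldField1 varchar(50), NewField1 varchar(50));

UPDATE #Demo
SET Field1 = UPPER(Field1)
OUTPUT inserted.ID, deleted.Field1, inserted.Field1   -- old value, then new value
    INTO @Changes (ID, OldField1, NewField1)
WHERE ID = 1;

-- The captured rows can now be queried like any other table.
SELECT ID, OldField1, NewField1 FROM @Changes;

DROP TABLE #Demo;
```

Because @Changes is a table variable, the DECLARE and the UPDATE have to run in the same batch.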
sp_msforeachtable: Runs a command with '?' replaced with each table name.
e.g.
exec sp_msforeachtable "dbcc dbreindex('?')"
You can issue up to 3 commands for each table
exec sp_msforeachtable
@command1 = 'print ''reindexing table ?''',
@command2 = 'dbcc dbreindex(''?'')',
@command3 = 'select count (*) [?] from ?'
Also, sp_MSforeachdb
Connection String extras:
MultipleActiveResultSets=true;
This makes ADO.Net 2.0 and above read multiple, forward-only, read-only results sets on a single database connection, which can improve performance if you're doing a lot of reading. You can turn it on even if you're doing a mix of query types.
Application Name=MyProgramName
Now when you want to see a list of active connections by querying the sysprocesses table, your program's name will appear in the program_name column instead of ".Net SqlClient Data Provider"
TableDiff.exe
Table Difference tool allows you to discover and reconcile differences between a source and destination table or a view. Tablediff Utility can report differences on schema and data. The most popular feature of tablediff is the fact that it can generate a script that you can run on the destination that will reconcile differences between the tables.
A less known TSQL technique for returning rows in random order:
-- Return rows in a random order
SELECT
SomeColumn
FROM
SomeTable
ORDER BY
CHECKSUM(NEWID())
In Management Studio, you can quickly get a comma-delimited list of columns for a table:
In the Object Explorer, expand the nodes under a given table (so you will see folders for Columns, Keys, Constraints, Triggers etc.)
Point to the Columns folder and drag into a query.
This is handy when you don't want to use the heinous format returned by right-clicking on the table and choosing Script Table As..., then Insert To... The same trick works with the other folders: it gives you a comma-delimited list of the names contained within the folder.
Row Constructors
You can insert multiple rows of data with a single insert statement.
INSERT INTO Colors (id, Color)
VALUES (1, 'Red'),
(2, 'Blue'),
(3, 'Green'),
(4, 'Yellow')
If you want to know the table structure, indexes and constraints:
sp_help 'TableName'
HashBytes() to return the MD2, MD4, MD5, SHA, or SHA1 hash of its input.
Figuring out the most popular queries
With sys.dm_exec_query_stats, you can figure out many combinations of query analyses by a single query.
For example, with the command:
select * from sys.dm_exec_query_stats
order by execution_count desc
The spatial results tab can be used to create art.
http://michaeljswart.com/wp-content/uploads/2010/02/venus.png
EXCEPT and INTERSECT
Instead of writing elaborate joins and subqueries, these two keywords are a much more elegant shorthand and readable way of expressing your query's intent when comparing two query results. New as of SQL Server 2005, they strongly complement UNION which has already existed in the TSQL language for years.
The concepts of EXCEPT, INTERSECT, and UNION are fundamental in set theory which serves as the basis and foundation of relational modeling used by all modern RDBMS. Now, Venn diagram type results can be more intuitively and quite easily generated using TSQL.
I know it's not exactly hidden, but not too many people know about the PIVOT command. I was able to change a stored procedure that used cursors and took 2 minutes to run into a speedy 6 second piece of code that was one tenth the number of lines!
Useful when restoring a database for testing purposes or similar. Re-maps the login IDs correctly:
EXEC sp_change_users_login 'Auto_Fix', 'Mary', NULL, 'B3r12-36'
Drop all connections to the database:
Use Master
Go
Declare @dbname sysname
Set @dbname = 'name of database you want to drop connections from'
Declare @spid int
Declare @sql varchar(20)
Select @spid = min(spid) from master.dbo.sysprocesses
where dbid = db_id(@dbname)
While @spid Is Not Null
Begin
-- Build the command as a string first, converting the int session id explicitly
Set @sql = 'Kill ' + Cast(@spid As varchar(10))
Execute (@sql)
Select @spid = min(spid) from master.dbo.sysprocesses
where dbid = db_id(@dbname) and spid > @spid
End
Table Checksum
Select CheckSum_Agg(Binary_CheckSum(*)) From Table With (NOLOCK)
Row Checksum
Select CheckSum_Agg(Binary_CheckSum(*)) From Table With (NOLOCK) Where Column = Value
I'm not sure if this is a hidden feature or not, but I stumbled upon this and have found it useful on many occasions. You can concatenate the values of a field across a set of rows in a single SELECT statement, rather than using a cursor and looping through the rows.
Example:
DECLARE @nvcConcatonated nvarchar(max)
SET @nvcConcatonated = ''
SELECT @nvcConcatonated = @nvcConcatonated + C.CompanyName + ', '
FROM tblCompany C
WHERE C.CompanyID IN (1,2,3)
SELECT @nvcConcatonated
Results:
Acme, Microsoft, Apple,
If you want the code of a stored procedure you can:
sp_helptext 'ProcedureName'
(not sure if it is hidden feature, but I use it all the time)
A stored procedure trick is that you can call them from an INSERT statement. I found this very useful when I was working on an SQL Server database.
CREATE TABLE #toto (v1 int, v2 int, v3 char(4), status char(6))
INSERT #toto (v1, v2, v3, status) EXEC dbo.sp_fulubulu @sp_param1 -- EXEC takes its arguments without parentheses
SELECT * FROM #toto
DROP TABLE #toto
In SQL Server 2005/2008 to show row numbers in a SELECT query result:
SELECT ( ROW_NUMBER() OVER (ORDER BY OrderId) ) AS RowNumber,
GrandTotal, CustomerId, PurchaseDate
FROM Orders
ORDER BY is a compulsory clause. The OVER() clause tells the SQL Engine to sort data on the specified column (in this case OrderId) and assign numbers as per the sort results.
Useful for parsing stored procedure arguments: xp_sscanf
Reads data from the string into the argument locations specified by each format argument.
The following example uses xp_sscanf to extract two values from a source string based on their positions in the format of the source string.
DECLARE @filename varchar (20), @message varchar (20)
EXEC xp_sscanf 'sync -b -fproducts10.tmp -rrandom', 'sync -b -f%s -r%s',
@filename OUTPUT, @message OUTPUT
SELECT @filename, @message
Here is the result set.
-------------------- --------------------
products10.tmp random
Return Date Only
Select Cast(Floor(Cast(Getdate() As Float))As Datetime)
or
Select DateAdd(Day, 0, DateDiff(Day, 0, Getdate()))
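On SQL Server 2008 or later, where the date data type exists, a plain cast is a shorter alternative. This is a sketch under that version assumption; the column aliases are made up for the example.

```sql
-- Assumes SQL Server 2008 or later, where the date type is available.
SELECT CAST(GETDATE() AS date) AS TodayDateOnly;                       -- date type, no time part
SELECT CAST(CAST(GETDATE() AS date) AS datetime) AS TodayAtMidnight;   -- datetime at 00:00:00
```

The second form is useful when the result has to stay a datetime, as it does with the two expressions above.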
dm_db_index_usage_stats
This allows you to know if data in a table has been updated recently even if you don't have a DateUpdated column on the table.
SELECT OBJECT_NAME(OBJECT_ID) AS TableName, last_user_update,*
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID( 'MyDatabase')
AND OBJECT_ID=OBJECT_ID('MyTable')
Code from: http://blog.sqlauthority.com/2009/05/09/sql-server-find-last-date-time-updated-for-any-table/
Information referenced from:
SQL Server - What is the date/time of the last inserted row of a table?
Available in SQL 2005 and later
Here are some features I find useful but a lot of people don't seem to know about:
sp_tables
Returns a list of objects that can be queried in the current environment. This means any object that can appear in a FROM clause, except synonym objects.
sp_stored_procedures
Returns a list of stored procedures in the current environment.
Find records whose date falls somewhere inside the current week.
where dateadd( week, datediff( week, 0, TransDate ), 0 ) =
dateadd( week, datediff( week, 0, getdate() ), 0 )
Find records whose date occurred last week.
where dateadd( week, datediff( week, 0, TransDate ), 0 ) =
dateadd( week, datediff( week, 0, getdate() ) - 1, 0 )
Returns the date for the beginning of the current week.
select dateadd( week, datediff( week, 0, getdate() ), 0 )
Returns the date for the beginning of last week.
select dateadd( week, datediff( week, 0, getdate() ) - 1, 0 )
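Two details about the week expressions above are worth spelling out. DATEDIFF(week, ...) always counts Sunday boundaries, regardless of SET DATEFIRST, so the "week" these predicates match runs from Sunday through Saturday, and the Monday they return is only a label for that week. Wrapping TransDate in functions also stops an index on that column from being used for a seek. The sketch below expresses the current-week filter as a plain range instead; the table #Trans and its columns are hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE #Trans (TransId int IDENTITY(1,1) PRIMARY KEY, TransDate datetime NOT NULL);
INSERT INTO #Trans (TransDate) VALUES (GETDATE()), (DATEADD(day, -30, GETDATE()));

-- The Monday label minus one day gives the Sunday that opens the matched week.
DECLARE @weekStart datetime;
SET @weekStart = DATEADD(day, -1, DATEADD(week, DATEDIFF(week, 0, GETDATE()), 0));

-- Same rows as the function-wrapped predicate, but TransDate is left bare
-- so an index on it can be used.
SELECT TransId, TransDate
FROM #Trans
WHERE TransDate >= @weekStart
  AND TransDate < DATEADD(day, 7, @weekStart);

DROP TABLE #Trans;
```

For last week, subtract 7 days from @weekStart and keep the same half-open range.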
Not so much a hidden feature but setting up key mappings in Management Studio under Tools\Options\Keyboard:
Alt+F1 defaults to sp_help "selected text", but I cannot live without adding Ctrl+F1 for sp_helptext "selected text".
Persisted-computed-columns
Computed columns can help you shift the runtime computation cost to the data modification phase. The computed column is stored with the rest of the row and is used transparently when the expression on the computed column matches the query. You can also build indexes on PCCs to speed up filters and range scans on the expression.
There are times when there's no suitable column to sort by, or you just want the default sort order on a table and you want to enumerate each row. In order to do that you can put "(select 1)" in the "order by" clause and you'd get what you want. Neat, eh?
select row_number() over (order by (select 1)), * from dbo.Table as t
Simple encryption with EncryptByKey

Resources