How to load data in different servers - sql-server

I am designing an ETL project in SSIS and I want it to be dynamic. I will use this project for many customers, so the extraction queries will have to run against different servers.
For example, I have this query in an Execute SQL Task step:
INSERT DataWarehouse.schema.fact1
SELECT *
FROM Database.schema.table1
My data warehouse is always on localhost, but "Database.schema.table1" could be on a different server for each customer, so I will have a different linked server on each customer's server to retrieve its data.
This means that, for example, the query needs to look like this for customer1:
INSERT DataWarehouse.schema.fact1
SELECT *
FROM [192.168.1.100].Database.schema.table1
And for customer2 it needs to look like this:
INSERT DataWarehouse.schema.fact1
SELECT *
FROM [10.2.5.100].Database.schema.table1
I have tried doing the extract and load with SSIS components, but because my queries are complex it became very messy.
Any ideas on how to make my query dynamic?

As per this link: Changing Properties of a Linked Server in SQL Server
One way to solve your problem is to make sure that the linked server's logical name is always the same, regardless of what the actual physical host is.
So the process here would be:
Create the linked server with the linked server wizard.
Use the approach from that link to rename the linked server to a consistent name that can be used in your code.
i.e.
EXEC master.dbo.sp_serveroption
@server=N'192.168.1.100',
@optname=N'name',
@optvalue=N'ALinkedServer'
Now you can refer to ALinkedServer in your code
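As a rough illustration of why a fixed logical name helps, here is a minimal Python sketch (Python is an assumption here; the thread itself only uses T-SQL and SSIS) that builds the question's INSERT statement from a linked server name. The helper names `quote_ident`, `four_part_name`, and `build_insert` are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def four_part_name(server: str, database: str, schema: str, table: str) -> str:
    """Return server.database.schema.table with every part bracket-quoted."""
    return ".".join(quote_ident(part) for part in (server, database, schema, table))


def build_insert(linked_server: str) -> str:
    """Build the question's INSERT ... SELECT against the given linked server."""
    source = four_part_name(linked_server, "Database", "schema", "table1")
    return f"INSERT DataWarehouse.schema.fact1\nSELECT *\nFROM {source}"


# With per-customer names the statement text differs for every customer...
per_customer = [build_insert(s) for s in ("192.168.1.100", "10.2.5.100")]
# ...with one agreed logical name it is identical everywhere.
shared = build_insert("ALinkedServer")
print(shared)
```

With the consistent name, the same statement text can be deployed to every customer, and only the linked server definition changes per site.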
A better way is to script the linked server creation properly; don't use the SSMS wizard.
Here's the template. You will need to do more research to find out the correct values here:
USE master;
GO
EXEC sp_addlinkedserver
@server = 'ConsistentServerName',
@srvproduct = 'product name',
@provider = 'provider name',
@datasrc = 'ActualPhysicalServerName',
@location = 'location',
@provstr = 'provider string',
@catalog = 'catalog';
GO
But the last word is: Don't use linked servers. Use SSIS

I would suggest the steps below to execute the same statement across multiple servers. As suggested by @Nick.McDermaid, I would strongly recommend against linked servers. It is better to use the exact server name in SSIS.
1. Put the INSERT statement into a separate variable.
2. Create a Foreach Loop container in SSIS.
3. Inside the Foreach Loop container, add a Script Task and get the current server name from the list of server names. You can keep a comma-separated list of server names and pick the current one.
4. Also inside the Foreach Loop container, create an Execute Process Task and call sqlcmd.exe with the connection information specific to each server, based on the server name obtained in step 3, using SSIS expressions. Refer to this Stack Overflow post on using expressions for the Execute Process Task for more information on calling an Execute Process Task in SSIS.
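Outside SSIS, the same "one sqlcmd call per server" idea can be sketched in Python. This is only an illustration under assumptions: the statement, the database name, and the helper names `parse_server_list` and `sqlcmd_args` are placeholders, and Windows authentication is assumed. The block only builds the argument lists; it does not run sqlcmd.

```python
def parse_server_list(servers: str) -> list:
    """Split a comma-separated server list, dropping blanks and whitespace."""
    return [s.strip() for s in servers.split(",") if s.strip()]


def sqlcmd_args(server: str, database: str, statement: str) -> list:
    """Build a sqlcmd argument list using Windows authentication.

    -S server, -d database, -E trusted connection, -b return an error
    code if the batch fails, -Q run the statement and exit.
    """
    return ["sqlcmd", "-S", server, "-d", database, "-E", "-b", "-Q", statement]


# Placeholder statement and database name, for illustration only.
STATEMENT = "SELECT COUNT(*) FROM dbo.table1"

commands = [
    sqlcmd_args(server, "Database", STATEMENT)
    for server in parse_server_list("192.168.1.100, 10.2.5.100")
]
for argv in commands:
    print(argv)  # each list could be handed to subprocess.run(argv, check=True)
```

Passing the arguments as a list rather than one concatenated string avoids shell quoting problems when the statement contains spaces or quotes.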

How about making an SSIS package that works for one of your systems?
Parameterize your working package to accept a connection string.
Create another package that loops through your connection strings, calls your working package, and passes it the connection string.
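The parent/child pattern described here can be sketched in plain Python as a driver that calls one parameterized worker per connection string. This is a loose analogy, not SSIS itself: `load_customer` is a hypothetical stand-in for the working package, and the connection strings are made up for the example.

```python
def load_customer(connection_string: str) -> str:
    """Stand-in for the parameterized 'working package' (hypothetical)."""
    if "Server=" not in connection_string:
        raise ValueError("connection string has no Server= entry")
    return "loaded"


def run_for_all(connection_strings, worker):
    """Call worker once per connection string; record errors instead of stopping."""
    outcomes = {}
    for conn in connection_strings:
        try:
            outcomes[conn] = worker(conn)
        except Exception as exc:  # keep going so one bad customer does not block the rest
            outcomes[conn] = f"failed: {exc}"
    return outcomes


customers = [
    "Server=192.168.1.100;Database=Database;Trusted_Connection=yes;",
    "Server=10.2.5.100;Database=Database;Trusted_Connection=yes;",
    "Database=Database;",  # deliberately malformed to show the failure path
]
outcomes = run_for_all(customers, load_customer)
for conn, result in outcomes.items():
    print(result, "<-", conn)
```

Keeping the worker ignorant of where its connection string comes from is what lets the same package be reused for every customer.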

Related

SSIS - Dynamically loop over multiple databases

I have to consolidate data from 1000+ databases that have the same structure/tables into a single database.
DBs may be added and removed potentially on a daily basis so I need to retrieve the list of DBs dynamically and run the dynamically generated SQL query to extract data on each of them.
I designed the Data Flow with a query from a variable that is working fine if executed with a static value:
With a SQL task I get the list of instances, loop over them, and with a nested Foreach Loop/SQL task I retrieve the database names and create the dynamic SQL with the following statement (DB name is anonymized):
SELECT 'select ''' + name + ''' as DatabaseName, ID from ' + name + '.[dbo].[Orders]' as querytext FROM sys.databases WHERE name LIKE ( 'XXX%_%' );
This part is also working fine:
How can I use the result of the SQL task "Execute SQL Task - Get query text" as query to be executed in the Source "OLE DB Source 1" (part of "Data Flow Task 3")?
I tried mapping an Object variable "User::SqlCommandFromSQLTask" in the result set of the SQL task, then setting it up as the ADO object source variable and, with a Script task, converting it to a string and passing the value to the variable SqlStringFromSQLTask3 (used as the source in "OLE DB Source 1"), but I get the error Violation of PRIMARY KEY constraint, as if the data flow were always running with the static value I set up as the default:
However, if I remove the value from the variable panel, I get the error "Command text was not set for the command object.", even after changing the DelayValidation property of the Data Flow to false.
Any help is much appreciated.
When I have used SSIS to connect to multiple SQL Server boxes, I have stored the SQL Server connection strings in a table in a central database. I then use a query of that table as the input to the Foreach Loop that contains the data flow task. If we ever have to change a SQL Server connection string, which does happen, we just update that table with the newest value.

How can I query over all db of my server without looping over DB in pymssql connection

First, I'd like to know how to run a query over all the databases in my server instance with pymssql. (In SQL Server Management Studio you can right-click the server node and choose New Query; you then don't need to specify the database name in the query, and the output just has one more column showing where each record came from.) Second, how do you do the same across registered servers on two or more hosts? (I have 2 hosts and I want to send them the same query; do I really need to make two connections?)
Thanks
You could use sp_MSforeachdb, like this:
EXECUTE master.sys.sp_MSforeachdb 'USE [?]; UPDATE [table] SET foo = bar'
Maybe this can help you (but - to be honest - I did not really understand what you want :-) )
SELECT * FROM sys.databases
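If the goal is one result set with an extra column naming the source database, one client-side option is to take the names returned by the sys.databases query and build a UNION ALL statement in Python. The sketch below only builds the SQL text; the table `dbo.Orders`, the column `ID`, and the database names are placeholders, and the helper names are hypothetical. The resulting string could then be passed to a cursor's execute call on a single connection.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Wrap text in an N'...' literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def union_over_databases(databases, table: str = "dbo.Orders") -> str:
    """Build one SELECT per database and glue them together with UNION ALL."""
    parts = [
        f"SELECT {quote_literal(db)} AS DatabaseName, ID FROM {quote_ident(db)}.{table}"
        for db in databases
    ]
    return "\nUNION ALL\n".join(parts)


# Database names would normally come from: SELECT name FROM sys.databases
query = union_over_databases(["Sales_EU", "Sales_US"])
print(query)
```

This covers databases on one instance only; a second host still needs its own connection (or a linked server), since a connection is bound to a single server.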

Dynamic SQL without having to use fully qualified table names in SQL (Openrowset?)

I have a large set of pre-existing sql select statements.
From a stored procedure on [Server_A], I would like to execute each of these statements on multiple different SQL Servers and databases (the list is stored in a local table on [Server_A]), and return the results into a table on [Server_A].
However, I do not want to have to use fully qualified table names in my sql statements. I want to execute "select * from users", not "select * from ServerName.DatabaseName.SchemaName.Users"
I've investigated using Openrowset, but I am unable to find any examples where both the Server name and DatabaseName can be specified as an attribute of the connection, rather than physically embedded within the actual SQL statement.
Is Openrowset capable of this? Is there an alternate way of doing this (from within a stored procedure, as opposed to resorting to Powershell or some other very different approach?)
The inevitable "Why do I want to do this?"
You can do it (specify the server and database in the connection
attributes and then use entirely generic sql across all databases) in
virtually every other language that accesses SQL Server.
Changing all my pre-existing complex SQL to be fully qualified is a
huge PITA (besides, you simply shouldn't have to do this)
This can be done quite easily via SQLCLR. If the result set is to be dynamic then it needs to be a Stored Procedure instead of a TVF.
Assuming you are doing a Stored Procedure, you would just:
Pass in @ServerName, @DatabaseName, @SQL
Create a SqlConnection with a Connection String of: String.Concat("Server=", ServerName.Value, "; Database=", DatabaseName.Value, "; Trusted_Connection=yes; Enlist=false;") or use ConnectionStringBuilder
Create a SqlCommand for that SqlConnection and using SQL.Value.
Enable Impersonation via SqlContext.WindowsIdentity.Impersonate();
_Connection.Open();
undo Impersonation -- was only needed to establish the connection
_Reader = Command.ExecuteReader();
SqlContext.Pipe.Send(_Reader);
Dispose of Reader, Command, Connection, and ImpersonationContext in finally clause
This approach is less of a security issue than enabling Ad Hoc Distributed Query access as it is more insulated and controllable. It also does not allow for a SQL Server login to get elevated permissions since a SQL Server login will get an error when the code executes the Impersonate() method.
Also, this approach allows for multiple result sets to be returned, something that OPENROWSET doesn't allow for:
Although the query might return multiple result sets, OPENROWSET returns only the first one.
UPDATE
Modified pseudo-code based on comments on this answer:
Pass in @QueryID
Create a SqlConnection (_MetaDataConnection) with a Connection String of: Context Connection = true;
Query _MetaDataConnection to get ServerName, DatabaseName, and Query based on QueryID.Value via SqlDataReader
Create another SqlConnection (_QueryConnection) with a Connection String of: String.Concat("Server=", _Reader["ServerName"].ToString(), "; Database=", _Reader["DatabaseName"].ToString(), "; Trusted_Connection=yes; Enlist=false;") or use ConnectionStringBuilder
Create a SqlCommand (_QueryCommand) for _QueryConnection using _Reader["SQL"].ToString().
Using _MetaDataConnection, query to get parameter names and values based on QueryID.Value
Cycle through SqlDataReader to create SqlParameters and add to _QueryCommand
_MetaDataConnection.Close();
Enable Impersonation via SqlContext.WindowsIdentity.Impersonate();
_QueryConnection.Open();
undo Impersonation -- was only needed to establish the connection
_Reader = _QueryCommand.ExecuteReader();
SqlContext.Pipe.Send(_Reader);
Dispose of Readers, Commands, Connections, and ImpersonationContext in finally clause
If you want to execute a SQL statement on every database in an instance you can use (the unsupported, unofficial, but widely used) sp_MSforeachdb like this:
EXEC sp_MSforeachdb 'use [?]; select * from users'
This will be the equivalent of going through every database through a
use db...
go
select * from users
This is an interesting problem because I googled for many, many hours, and found several people trying to do exactly the same thing as asked in the question.
Most common responses:
Why would you want to do that?
You can not do that, you must fully qualify your objects names
Luckily, I stumbled upon the answer, and it is brutally simple. I think part of the problem is, there are so many variations of it with different providers & connection strings, and there are so many things that could go wrong, and when one does, the error message is often not terribly enlightening.
Regardless, here's how you do it:
If you are using static SQL:
select * from OPENROWSET('SQLNCLI','Server=ServerName[\InstanceName];Database=AdventureWorks2012;Trusted_Connection=yes','select top 10 * from HumanResources.Department')
If you are using Dynamic SQL - since OPENROWSET does not accept variables as arguments, you can use an approach like this (just as a contrived example):
declare @sql nvarchar(4000) = N'select * from OPENROWSET(''SQLNCLI'',''Server=ServerName[\InstanceName];Database=AdventureWorks2012;Trusted_Connection=yes'',''@zzz'')'
set @sql = replace(@sql,'@zzz','select top 10 * from HumanResources.Department')
EXEC sp_executesql @sql
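When the OPENROWSET text is generated from client code rather than inside T-SQL, the main thing to get right is doubling single quotes in each argument. A minimal Python sketch of that, assuming the same provider and Windows authentication as above (the helper names and the WHERE clause are illustrative only):

```python
def quote_literal(value: str) -> str:
    """Wrap text in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def openrowset_query(server: str, database: str, inner_sql: str,
                     provider: str = "SQLNCLI") -> str:
    """Build a SELECT over OPENROWSET with all three arguments as literals."""
    conn = f"Server={server};Database={database};Trusted_Connection=yes"
    return (
        "select * from OPENROWSET("
        f"{quote_literal(provider)},{quote_literal(conn)},{quote_literal(inner_sql)})"
    )


sql = openrowset_query(
    "ServerName",
    "AdventureWorks2012",
    "select top 10 * from HumanResources.Department where Name = 'Sales'",
)
print(sql)
```

Because the server and database live in the connection string, the inner query stays free of fully qualified names, which is the point of the approach above.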
Noteworthy: In case you think it would be nice to wrap this syntax up in a table-valued function that accepts @ServerName, @DatabaseName, @SQL: you cannot, as a TVF's result set columns must be known at compile time.
Relevant reading:
http://blogs.technet.com/b/wardpond/archive/2005/08/01/the-openrowset-trick-accessing-stored-procedure-output-in-a-select-statement.aspx
http://blogs.technet.com/b/wardpond/archive/2009/03/20/database-programming-the-openrowset-trick-revisited.aspx
Conclusion:
OPENROWSET is the only way that you can 100% avoid at least some full-qualification of object names; even with EXEC AT you still have to prefix objects with the database name.
Extra tip: The prevalent opinion seems to be that OPENROWSET shouldn't be used "because it is a security risk" (without any details on the risk). My understanding is that the risk is only if you are using SQL Server Authentication, further details here:
https://technet.microsoft.com/en-us/library/ms187873%28v=sql.90%29.aspx?f=255&MSPPError=-2147217396
When connecting to another data source, SQL Server impersonates the login appropriately for Windows authenticated logins; however, SQL Server cannot impersonate SQL Server authenticated logins. Therefore, for SQL Server authenticated logins, SQL Server can access another data source, such as files, nonrelational data sources like Active Directory, by using the security context of the Windows account under which the SQL Server service is running. Doing this can potentially give such logins access to another data source for which they do not have permissions, but the account under which the SQL Server service is running does have permissions. This possibility should be considered when you are using SQL Server authenticated logins.

Running SQL Transport Schema Generation Wizard with datetime parameter in biztalk

I am trying to run the SQL Transport Schema Generation Wizard against a SQL Server 2012 instance. The stored procedure has a datetime parameter. If I simply put in a date like 12/26/2013 05:00:00 AM, the "Generate" button doesn't show an argument. If I try putting the date/time in single quotes or using a string like 2013-12-26T05:00:00.000, the parameter is generated, but I get the following error when I try to execute: "Failed to execute SQL Statement. Please ensure that the supplied syntax is correct."
I got to this point by creating a SQL query that outputs its response using FOR XML AUTO, ELEMENTS. I then open my BizTalk solution in VS 2012, go to "Add Items -> Add Generated Items", and select Add Adapter Metadata. From there, it asks for the location of the message box; I use my local server. It then asks for the connection string for the SQL Server with the stored procedure. I enter that (it's the same as the server with the message box). I specify the namespace and the root element name for the document. This is set as a receive port. Next I select stored procedure and move to the next screen. I then select the stored proc from a drop-down list. Below, in a grid, I am shown the parameters for the stored proc. Here is where I am having trouble: I cannot seem to get it to accept the datetime argument no matter what I put in here.
Is there something I am doing wrong?
It is better to do the following steps and to use the new WCF-SQL rather than the old deprecated SQL adapter.
Add Items
Add Generated Items
Consume Adapter Service
Select sqlBinding and Configure the URI
Click Connect
Select Client (Outbound operations)
Select Strongly-Typed Procedures
Select the Stored Procedure from Available categories and operations
Click Add
Give it a Filename Prefix
Click OK
This will generate the schemas plus the binding files to create the port.
You also don't need to have the FOR XML AUTO, ELEMENTS in your stored procedure any more.

SQL Server Linked Server Example Query

While in Management Studio, I am trying to run a query/do a join between two linked servers.
Is this a correct syntax using linked db servers:
select foo.id
from databaseserver1.db1.table1 foo,
databaseserver2.db1.table1 bar
where foo.name=bar.name
Basically, do you just preface the db server name to the db.table ?
The format should probably be:
<server>.<database>.<schema>.<table>
For example:
DatabaseServer1.db1.dbo.table1
Update: I know this is an old question and the answer I have is correct; however, I think anyone else stumbling upon this should know a few things.
Namely, when querying against a linked server in a join situation the ENTIRE table from the linked server will likely be downloaded to the server the query is executing from in order to do the join operation. In the OP's case, both table1 from DB1 and table1 from DB2 will be transferred in their entirety to the server executing the query, presumably named DB3.
If you have large tables, this may result in an operation that takes a long time to execute. After all, it is now constrained by network transfer speeds, which are orders of magnitude slower than memory or even disk transfer speeds.
If possible, perform a single query against the remote server, without joining to a local table, to pull the data you need into a temp table. Then query off of that.
If that's not possible, then you need to look at the various things that would cause SQL Server to load the entire table locally, for example using GETDATE() or even certain joins. Other performance killers include not granting appropriate rights.
See http://thomaslarock.com/2013/05/top-3-performance-killers-for-linked-server-queries/ for some more info.
SELECT * FROM OPENQUERY([SERVER_NAME], 'SELECT * FROM DATABASE_NAME..TABLENAME')
This may help you.
For those having trouble with the other answers, try OPENQUERY.
Example:
SELECT * FROM OPENQUERY([LinkedServer], 'select * from [DBName].[schema].[tablename]')
If you still have issues with <server>.<database>.<schema>.<table>, enclose the server name in [].
You need to specify the schema/owner (dbo by default) as part of the reference. Also, it would be preferable to use the newer (ANSI-92) join style.
select foo.id
from databaseserver1.db1.dbo.table1 foo
inner join databaseserver2.db1.dbo.table1 bar
on foo.name = bar.name
select * from [Server].[database].[schema].[tablename]
This is the correct way to call.
Be sure to verify that the servers are linked before executing the query!
To check for linked servers call:
EXEC sys.sp_linkedservers
Right-click on a table and click "Script Table as" > "SELECT To".
select name from drsql01.test.dbo.employee
drsql01 is the server name (the linked server)
test is the database name
dbo is the schema (the default schema)
employee is the table name
I hope this helps in understanding how to write a query against a linked server.
Direct (four-part) queries should usually be avoided with linked servers because they make heavy use of SQL Server's tempdb: the data is first retrieved into tempdb and only then filtered. There are many threads about this. It is better to use OPENQUERY because it passes the SQL to the linked server, which returns only the filtered results, e.g.
SELECT *
FROM OPENQUERY(Linked_Server_Name , 'select * from TableName where ID = 500')
For what it's worth, I found the following syntax to work the best:
SELECT * FROM [LINKED_SERVER]...[TABLE]
I couldn't get the recommendations of others to work, using the database name. Additionally, this data source has no schema.
In SQL Server (local) there are two ways to query data from a linked server (remote).
Distributed query (four-part notation):
Might not work with all remote servers. If your remote server is MySQL, a distributed query will not work.
Filters and joins might not work efficiently. If you have a simple query with a WHERE clause, the local SQL Server might first fetch the entire table from the remote server and then apply the WHERE clause locally. For large tables this is very inefficient, since a lot of data is moved from remote to local. However, this is not always the case: if the local server has access to the remote server's table statistics, it might be as efficient as using OPENQUERY.
On the positive side, T-SQL syntax will work.
SELECT * FROM [SERVER_NAME].[DATABASE_NAME].[SCHEMA_NAME].[TABLE_NAME]
OPENQUERY
This is basically a pass-through. The query is fully processed on the remote server, so it makes use of indexes and any other optimization on the remote server, effectively reducing the amount of data transferred from the remote server to the local SQL Server.
A minor drawback of this approach is that T-SQL syntax will not work if the remote server is anything other than SQL Server.
SELECT * FROM OPENQUERY([SERVER_NAME], 'SELECT * FROM DATABASE_NAME.SCHEMA_NAME.TABLENAME')
Overall, OPENQUERY seems like the better option in the majority of cases.
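To make the contrast concrete, here is a small Python sketch that generates both forms for the same filtered read. It only builds SQL text; the helper names are hypothetical and the filter `ID = 500` is just an example value.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Wrap text in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def four_part_select(server, database, schema, table, where=""):
    """Distributed query: the WHERE clause is written in local T-SQL."""
    target = ".".join(quote_ident(p) for p in (server, database, schema, table))
    sql = f"SELECT * FROM {target}"
    return f"{sql} WHERE {where}" if where else sql


def openquery_select(server, remote_sql):
    """Pass-through query: remote_sql is sent to the linked server as-is."""
    return f"SELECT * FROM OPENQUERY({quote_ident(server)}, {quote_literal(remote_sql)})"


distributed = four_part_select(
    "SERVER_NAME", "DATABASE_NAME", "SCHEMA_NAME", "TABLE_NAME", "ID = 500"
)
passthrough = openquery_select(
    "SERVER_NAME",
    "SELECT * FROM DATABASE_NAME.SCHEMA_NAME.TABLE_NAME WHERE ID = 500",
)
print(distributed)
print(passthrough)
```

In the second form the filter is part of the text sent to the remote server, so it is written in the remote server's SQL dialect and evaluated there.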
I used OPENQUERY to find the data types of the columns in a table on the linked server, and it worked:
SELECT * FROM OPENQUERY (LINKSERVERNAME, '
SELECT DATA_TYPE, COLUMN_NAME
FROM [DATABASENAME].INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME =''TABLENAME''
')
It works for me.
The following query works best.
Try this query:
SELECT * FROM OPENQUERY([LINKED_SERVER_NAME], 'SELECT * FROM [DATABASE_NAME].[SCHEMA].[TABLE_NAME]')
It is very helpful for linking MySQL to MS SQL.
PostgreSQL:
You must provide a database name in the Data Source DSN.
Run Management Studio as Administrator
You must omit the DBName from the query:
SELECT * FROM OPENQUERY([LinkedServer], 'select * from schema."tablename"')
For MariaDB (and so probably MySQL), attempting to specify the schema using the three-dot syntax did not work, resulting in the error "invalid use of schema or catalog". The following solution worked:
In SSMS, go to Server Objects > Linked Servers > Providers > MSDASQL
Ensure that "Dynamic parameter", "Level zero only", and "Allow inprocess" are all checked
You can then query any schema and table using the following syntax:
SELECT TOP 10 *
FROM LinkedServerName...[SchemaName.TableName]
Source: SELECT * FROM MySQL Linked Server using SQL Server without OpenQuery
Have you tried adding " around the first name?
like:
select foo.id
from "databaseserver1".db1.table1 foo,
"databaseserver2".db1.table1 bar
where foo.name=bar.name
