SSIS - How to improve a work flow - sql-server

I previously asked about how to implement a SQL query that was originally written in Access. I have reached a solution (after asking a lot), but I would like to know whether anyone has another way to execute this query.
I have two tables, one in SQL Server (S) and one in Oracle (O):
O(A, B, C) => PK = (A, B) and S(D, E, F) => PK = (D, E)
The query looks like this:
SELECT A,B,C,E,F
FROM S INNER JOIN O ON
S.D = O.A -- only one attribute of the PK in O
S has over 10,000 records and O more than 700 million. Given this, it does not make sense to use a Merge Join, or a Lookup, because the Lookup would return only the first match between D and A.
So I thought it would be better to assemble the query on the Oracle side. To do this I have implemented a scheme like this.
On the SQL Server side I execute this query:
with tmp(A) as ( select distinct D as A from S
)
select cast((select concat(' or A = ', A)
from tmp
for xml path('')) as nvarchar(max)) as ID
This gives me a string with the values that I am going to search for in Oracle.
Finally, in the data flow, I build an expression like this:
select A, B, C
from O
where A= '' + @ID
I download these values to SQL Server, where I can manipulate them as I wish.
The Foreach Loop was necessary because I store the SQL string in an Object variable; I found that SSIS has trouble with nvarchar(max) variables.
Some considerations:
1) The Oracle database is administered by another area of the company, and they only give read permissions on the tables.
2) The DBA of the SQL Server does not allow the O table to be downloaded into a staging area. There is no possibility of negotiating with him; besides, this table is updated every day with more records. He only manages this server and has no authority over Oracle.
3) The solution proposed by some members of my team was to create a query in Oracle across different tables that returns the attributes of O that I need. As a result I could get more than 3 million records, and not all of the values of A are present in S. Moreover, some of the values of D have been manipulated, so they may not be present in O.
With this implementation I get more than 150,000 records from Oracle. I would like to know whether another solution can be implemented, or whether there are other components I can use to reach the same result. Believe me when I say that I have read, asked and searched a lot before implementing this flow.
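For illustration only, here is one way the string-building step could be varied: instead of one long chain of " or A = ..." conditions, the distinct keys are split into batches and each batch becomes an IN list. Oracle rejects IN lists with more than 1000 literals (ORA-01795), hence the default batch size. This is a sketch in Python, not SSIS code; the helper name build_batched_queries is hypothetical, and it assumes the D values are integers (string keys would need quoting or bind variables).

```python
from typing import Iterable, List

# Oracle raises ORA-01795 when a single IN list holds more than 1000 literals.
ORACLE_IN_LIST_LIMIT = 1000


def build_batched_queries(keys: Iterable[int],
                          batch_size: int = ORACLE_IN_LIST_LIMIT) -> List[str]:
    """Return one SELECT against O per batch of distinct keys (hypothetical helper)."""
    distinct = sorted(set(keys))  # same effect as SELECT DISTINCT D FROM S
    queries = []
    for start in range(0, len(distinct), batch_size):
        batch = distinct[start:start + batch_size]
        # int() ensures only numeric literals end up in the SQL text.
        in_list = ", ".join(str(int(k)) for k in batch)
        queries.append(f"SELECT A, B, C FROM O WHERE A IN ({in_list})")
    return queries


# Small demonstration with a batch size of 2 so the batching is visible.
queries = build_batched_queries([3, 1, 2, 3, 5, 4], batch_size=2)
for q in queries:
    print(q)
```

Each generated statement could then be run in its own loop iteration, which also keeps every individual SQL string short.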

EDITED:
Option 1 (you say that you cannot use this solution, but it would be the first choice and the best one)
Use a DBLink to let Oracle access the S table (this requires Oracle Database Gateway), create a view in Oracle joining O and S, and finally use a linked server to let SQL Server query that view and get the results.
The process is as follows:
You must convince your Oracle DBA to configure the Oracle Database Gateway for SQL Server (see http://docs.oracle.com/cd/B28359_01/gateways.111/b31043/conf_sql.htm#CIHGADGB). Once it is properly configured, you can create a DBLink from Oracle to SQL Server. With the DBLink, Oracle will have direct access to the S table.
Now create a view V that simply joins the O and S tables.
As you want the result back in your SQL Server and you cannot use SSIS, you can then proceed as described in:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/111df59c-b309-4d59-b56c-9cd5574ee181/how-to-access-oracle-table-from-sql-server-?forum=transactsql
Option 2 (you say that you cannot use this solution, but it would be the second choice)
Since your Oracle admins seem to be monsters that will kill you if they get their paws on you, you can try this instead (if they at least let you create a table in Oracle):
Create a linked server in SQL Server (to access Oracle from SQL Server), as mentioned in Option 1.
Create a (temporary) table in the Oracle schema with only one column (it will store the D values from SQL Server).
Every time you need to evaluate your query, execute this in SQL Server:
INSERT INTO ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE
SELECT DISTINCT D FROM S;
SELECT * FROM OPENQUERY(ORACLE_LINKED_SERVER, 'SELECT * FROM ORACLE_OWNER.O WHERE A IN (SELECT D FROM ORACLE_OWNER.TEMP_TABLE)');
And finally, don't forget to empty the Oracle temp table:
DELETE FROM ORACLE_LINKED_SERVER..ORACLE_OWNER.TEMP_TABLE;
Option 3 (If you have an Oracle license and one available host)
You can install your own Oracle server in your host and use Option 2.
Option 4
If your solution is really the only way out, then let's try to improve it a little bit.
As you know, your solution works, but it is a little aggressive (you are transforming a relational algebra semijoin operator into a relational algebra selection operator with a monster condition). You say that the Oracle table is updated every day with more records, but if the update rate of your tables is lower than your query rate, you can create a result cache and use it for as long as tables S and O have not changed.
Proceed as follows:
Create a table in your SQL Server to store the Oracle result of your monster query. Before building and launching your query, execute this:
SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID( 'YourDatabaseName')
AND OBJECT_ID=OBJECT_ID('S')
This returns the most recent time your table S was updated. Store this value in a table (create a new table or use a typical parameter table).
Build your monster query. But before launching it, send this query to Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
It returns the latest SCN (System Change Number) that caused a change in the table. Store this value in a table (create a new table or use a typical parameter table).
Launch the big query and store its result in the cache table.
Finally, when you need to repeat the big query, first execute in your SQL Server:
SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID( 'YourDatabaseName')
AND OBJECT_ID=OBJECT_ID('S')
And execute in Oracle:
SELECT MAX(ORA_ROWSCN)
FROM O;
If one or both values have changed with respect to the ones stored in your parameter table, you must store the new values in the parameter table (overwriting the old ones) and launch the big query again. But if neither value has changed, your cache is up to date and you can use it.
Note that:
The SCN is not absolutely precise, but it is a good approximation (see: http://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns007.htm)
The higher your query rate is relative to your update rate, the better this solution works.
If you can tolerate working with old values, you can improve the cache by adding an expiration time.
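The decision logic of Option 4 can be sketched as follows. This is only an outline in Python under stated assumptions: the names Markers, ResultCache, read_markers and run_big_query are hypothetical, and the two callables stand in for "read last_user_update and MAX(ORA_ROWSCN)" and "launch the big query". In a real package the stored markers and the cached rows would live in SQL Server tables rather than in memory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, int, str]  # stands in for one (A, B, C) row of the result


@dataclass
class Markers:
    s_last_update: str  # last_user_update read for table S
    o_max_scn: int      # MAX(ORA_ROWSCN) read for table O


class ResultCache:
    """Re-runs the big query only when one of the change markers has moved."""

    def __init__(self, read_markers: Callable[[], Markers],
                 run_big_query: Callable[[], List[Row]]) -> None:
        self._read_markers = read_markers
        self._run_big_query = run_big_query
        self._stored: Optional[Markers] = None  # the "parameter table"
        self._rows: List[Row] = []              # the "cache table"

    def get(self) -> List[Row]:
        current = self._read_markers()
        if self._stored != current:        # S or O changed, or first run
            self._rows = self._run_big_query()
            self._stored = current         # overwrite the old marker values
        return self._rows


# Demonstration with fake markers and a fake query that counts its calls.
calls = []
state = {"markers": Markers("2015-01-01 10:00", 100)}


def fake_big_query() -> List[Row]:
    calls.append(1)
    return [(1, 1, "x")]


cache = ResultCache(lambda: state["markers"], fake_big_query)
cache.get()                                           # first run: query executes
cache.get()                                           # nothing changed: cache is used
state["markers"] = Markers("2015-01-01 10:00", 101)   # O changed in Oracle
cache.get()                                           # marker moved: query executes again
print(len(calls))
```

An expiration time, as mentioned in the last note, would be one more field checked inside get().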

Related

How to edit records in SQL-Server stored procedure

I would like to know the secret of how SQL statements in SQL-Server go from being read-only to editable. Right click on any table, and the interface gives the option of "Selecting" or "Editing" records. Is there a property in the SQL statement that designates the recordset as editable or read-only?
I will use the simplest possible example: I have designed a table with two fields. The first is an integer field, designated as an identity with a unique index. The second is an nvarchar, designed for manual editing. When I write a SQL statement for the table in a query window, I am not able to edit the text field. Stored procedures, which I favor because I can invoke them with the greatest efficiency, also render an uneditable recordset. The only way I have found to succeed is in SSMS, by choosing the edit feature on a table.
I use Microsoft Access extensively, and all the tables that Access hosts are linked to SQL-Server tables. When I use the Microsoft Access JET engine to write queries on these same tables, I can edit the recordsets the queries generate, but not when I use pass-through queries to invoke the same contents through a table function or stored procedure. With no table joins, no calculated fields, nor anything else that would give a known reason for the recordsets to be read-only, this inability poses a barrier to producing some of my deliverables.
Thanks, in advance, for your support. Here are quick examples:
Select
IDField
, TextField
From
SampTable
Create Procedure TestProc
AS
BEGIN
Select
IDField
, TextField
From
SampTable
END
Create FUNCTION [dbo].[TestFunction]()
RETURNS TABLE
AS
RETURN
(
Select
IDField
, TextField
From
SampTable
)
SQL Server is not the same kind of thing as MS Access. MS Access is a combination of front end and back end at the same time, which is nice and easy for users, and does have its place. It's like a souped-up version of Excel with some very limited multi-user functionality. But with SQL Server, the expectation is that you are splitting the responsibilities between front end and back end.
Yes, SSMS does provide the ability to right click a table (or a view referencing one table) and "edit top 200 rows". Honestly, I wish it didn't. It shouldn't.
If you have an Access "front end" using linked tables in SQL Server in the "back end", that's similar functionality. And yeah, there are some limited use cases where that's an appropriate sort of solution, ideally as a temporary thing. But really, if you're putting data into SQL Server, the expectation is that you're building some kind of "real" user interface, which uses DML statements constructed by the application, or stored procedure execution, or some kind of ORM and DBContext, to modify the data. Even in MS Access, you should switch from direct table editing to forms.
The reason why you can't edit the results of a stored procedure or function is that the output of those objects is just a temporary copy of the data. It's not the "actual data in the tables". And, if you think about it, how could it be? For example, imagine if I wrote a stored procedure like this:
create table t (i int primary key, j int);
go
create procedure p as begin
select total_j = sum(j) from t;
end
When I run that stored procedure I'm going to get a single value which is the sum of j across all rows. How could I edit this value? If I changed it from, say, 100 to 200, what does that mean in terms of the contents of column j in the table? Do I add 100 to some arbitrary row? Do I add 1 to each of the first 100 rows in order of the primary key? The concept becomes incoherent.
I know what you're thinking: "But what if my stored procedure doesn't aggregate? Surely then the data that comes back can really just be a "pointer" to the data in the table, not a copy?". And yeah, in principle that could be true. But think about the implications of that. While you're looking at the results, can anyone else change the underlying data in the table? Can you both change it at the same time? Who decides how to resolve that problem - the SQL engine? Can someone else drop the table while you are editing data? And so on and so forth.
It's the wrong way to think about SQL Server (or any "real" database engine). The data you see as the result of a select is read from the tables, and sent over the network to the client as your own personal copy. It is no longer connected to the tables it came from.
Oh... and in case you're wondering how you can edit the data "directly in the tables" if you're using linked tables in MS Access: you still can't. Access does some work under the covers for you. To prove this, try linking a SQL Server table to MS Access, then pulling up the row in Access, and starting to edit it. Then, before finishing your edit, go into SSMS and update the row you are editing in Access. Then try to save your changes in Access.
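To make the "your own personal copy" point concrete, here is a small sketch of the kind of work a front end has to do when it lets you edit a row: read a copy, then send an UPDATE keyed on the primary key that only matches if the row still looks the way it did when it was read. This is an illustration, not a description of how Access is actually implemented; SQLite (via Python's sqlite3 module) stands in for SQL Server, and the table reuses the SampTable names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SampTable (IDField INTEGER PRIMARY KEY, TextField TEXT)")
conn.execute("INSERT INTO SampTable VALUES (1, 'original')")

# 1. The client reads the row. What it now holds is a copy, not the table row.
row_id, text_seen = conn.execute(
    "SELECT IDField, TextField FROM SampTable WHERE IDField = 1"
).fetchone()

# 2. Meanwhile, somebody else changes the same row (the SSMS step in the experiment).
conn.execute("UPDATE SampTable SET TextField = 'changed elsewhere' WHERE IDField = 1")

# 3. The client saves its edit as an UPDATE that only matches the row
#    if it still holds the value the client originally read.
cur = conn.execute(
    "UPDATE SampTable SET TextField = ? WHERE IDField = ? AND TextField = ?",
    ("my edit", row_id, text_seen),
)
conflict = cur.rowcount == 0  # no row matched: the data changed underneath us

final_text = conn.execute(
    "SELECT TextField FROM SampTable WHERE IDField = 1"
).fetchone()[0]
print(conflict, final_text)
```

The result of a stored procedure carries no such key-to-table mapping, which is one way to see why a client cannot turn edits to it back into UPDATE statements.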

what would be the best way to copy table from server A to server B in sql 2012?

I have table A on server 1 and table B on server 2.
The table contains around 1.5 million rows.
What would be the fastest way to copy table A to server B on a nightly basis?
Or what would be the fastest way to bring only the records that changed in table A over to table B?
So far I have tried MERGE along with the HASHBYTES function to capture only the records that changed. It works perfectly if the target and source tables are on the same server (it takes approx. 1 min).
But if the target is on server B and the source is on server A, then it takes more than 15 min.
What is, in your opinion, the best and fastest technique for such operations?
Some sort of replication? Or maybe SSIS would be the best for that?
My 2 cents. Since you qualified your question with "On nightly basis", I'd say do this in SSIS.
I would use SSIS; it is designed for fast, large data copies between servers.
Also, if you can drop table B, then you could try using SELECT INTO rather than INSERT INTO.
SELECT INTO is much faster as it is minimally logged, but note that table B will be locked while the insert is running.
You could also try disabling the indexes on table B before you insert and re-enabling them afterwards.
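The hash-comparison idea from the question can be sketched independently of MERGE. The point of the sketch is that only keys and hashes need to be compared to decide which rows are new, changed or gone, so the full rows of unchanged records never have to cross the network. This is a Python illustration under assumptions: row_hash and diff_by_hash are hypothetical helpers, SHA-256 stands in for whatever HASHBYTES algorithm is used, and all column values are taken to be strings.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Tuple[str, ...]


def row_hash(row: Row) -> str:
    """Hash all column values of a row, separated by a character unlikely to occur in data."""
    joined = "\x1f".join(row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def diff_by_hash(source: Dict[int, Row],
                 target_hashes: Dict[int, str]) -> Tuple[List[int], List[int], List[int]]:
    """Compare source rows with the target's key -> hash map; return keys to insert, update, delete."""
    inserts = [k for k in source if k not in target_hashes]
    updates = [k for k in source
               if k in target_hashes and row_hash(source[k]) != target_hashes[k]]
    deletes = [k for k in target_hashes if k not in source]
    return inserts, updates, deletes


# Table A on the source server, table B on the target server (toy data).
source = {1: ("alice", "10"), 2: ("bob", "20"), 3: ("carol", "30")}
target = {1: ("alice", "10"), 2: ("bob", "99"), 4: ("dave", "40")}

# Only this small key -> hash map would need to travel between the servers.
target_hashes = {k: row_hash(v) for k, v in target.items()}

inserts, updates, deletes = diff_by_hash(source, target_hashes)
print(inserts, updates, deletes)
```

In SSIS the same comparison is typically done by staging the keys and hashes on one side and joining there, rather than running MERGE across a linked server.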

SQL Server: figure out the tables behind a report sent by daily email (I have access to the db)

I have access to the database in SQL Server Management Studio, I can see all the tables.
We have a daily report sent by email - however we want to know what the SQL query behind the report is, as we cannot get hold of the developers.
So far I have found the foreign keys and which primary keys they relate to, but halfway through I have come across columns that do not seem to have a key associated with them.
I do not have the time to go through 150+ tables.
How can I find out which table a value has come from without a key? Should there always be a key? Can I search the entire database, across all of the tables, for a value in that column so that I can find the offending tables, wherever they are?
Help - it's like reverse engineering and taking too long... please
On a Microsoft SQL Server you can use SQL Server Profiler to log all DB queries. If you know the time of the day the report is populated, run the trace at that time, and you'll be able to see the exact SQL statements used for it.
See https://youtu.be/IaxG6jbNuj8
If the report is generated from a stored procedure, then finding the stored procedure would give you all of that info.
This might help you find the stored procedure:
select *
from sysobjects so
inner join syscomments sc on so.id = sc.id
where sc.text like '%columnname%'
and xtype = 'P'
Just put in some search strings (maybe the column outputs) between the % signs.

optimizing a lookup task in ssis

I have a question about this kind of query. I am migrating an ETL from Access to SSIS. One query involves an inner join with a table in an Oracle database:
SELECT
SQL_TABLE.COLUMN1,
SQL_TABLE.COLUMN2,
ORACLE_TABLE.COLUMN5,
ORACLE_TABLE.COLUMN6
FROM
SQL_TABLE INNER JOIN ORACLE_TABLE ON
SQL_TABLE.ID_PPAL = ORACLE_TABLE.IDENTIF
WHERE
(((ORACLE_TABLE.COLUMN6) Is Not Null));
The issue is that the Oracle table has more than 18 million records and the SQL table has fewer than 300. The inner join should return something like 2,500 records.
First I tried a Merge Join task, as you can see in the picture, but this is not efficient at all given the characteristics of the tables. While looking for an alternative, someone suggested a Lookup task, but this gives me only one record for each match it finds, which is not useful for me; I cannot lose any records.
I wonder whether there is another way to perform this query, because I cannot believe that Access would be more efficient than SSIS in this respect.
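The difference between the two behaviours described above is easy to see on toy data. The sketch below is plain Python, not SSIS: the first half mimics a lookup that keeps one reference row per key, the second half mimics an inner join that returns every matching row. The sample rows are made up for the illustration.

```python
from collections import defaultdict

# (ID_PPAL, COLUMN1) rows from the small SQL table.
sql_rows = [(1, "a"), (2, "b")]
# (IDENTIF, COLUMN5) rows from the large Oracle table; key 1 appears twice.
oracle_rows = [(1, "x"), (1, "y"), (2, "z"), (3, "w")]

# Lookup-style matching: one reference row per key, so later duplicates are ignored.
first_match = {}
for identif, col5 in oracle_rows:
    first_match.setdefault(identif, col5)
lookup_result = [(i, c1, first_match[i]) for i, c1 in sql_rows if i in first_match]

# Join-style matching: every reference row with the same key is kept.
all_matches = defaultdict(list)
for identif, col5 in oracle_rows:
    all_matches[identif].append(col5)
join_result = [(i, c1, c5) for i, c1 in sql_rows for c5 in all_matches.get(i, [])]

print(lookup_result)  # one row per SQL-side record
print(join_result)    # one row per matching pair
```

With one-to-many data like this, the lookup-style result silently drops the second match for key 1, which is exactly the record loss the question wants to avoid.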
In my experience SQL Server will not optimize queries involving Oracle. The fastest approach I found was: 1) Use the Oracle drivers to access the data from SSIS. 2) Use fast load (with table lock) to load the Oracle table (with a WHERE condition if appropriate) into a SQL Server work table. 3) Create a clustered index on the table. 4) Do the join. If you are going to reuse the package, you will want to truncate the work table and drop the index as the first two steps of the package.
Check whether any filters can be applied, or try to do the joins in the Oracle database, so that less data has to be pulled across. If that does not give the right result, try storing the data in variables and building the queries from them.
This may help:
http://www.bidn.com/blogs/ShawnHarrison/ssis/4579/looping-through-variable-values-with-a-foreach-loop-container

How do I filter one of the columns in a SQL Server SQL Query

I have a table (that relates to a number of other tables) where I would like to filter ONE of the columns (RequesterID) - that column will be a combobox where only people that are not sales people should be selectable.
Here is the "unfiltered" query, let's call it QUERY 1:
SELECT RequestsID, RequesterID, ProductsID
FROM dbo.Requests
If I use a separate query, let's call it QUERY 2, to filter RequesterID (which is a People-related column, connected to People.PeopleID), it looks like this:
SELECT People.PeopleID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
Now, is there a way of "merging" the QUERY 2 into QUERY 1?
(dbo.Requests in QUERY 1 has RequesterID populated as a foreign key from dbo.People, so no problem there... The connections are all right, I just don't know how to write the SQL query!)
UPDATE
Trying to explain what I mean a bit more:
The result set should be a number of REQUESTS, and the number of REQUESTS should not be limited by QUERY 2. QUERY 2's only function is to limit the selectable subset in column Requests.RequesterID. No, it's not that clear, but in the C# VS2008 implementation I use Requests.RequesterID to eventually populate a ComboBox with [Full name], which is another column in the People table, and in that column I don't want SalesGuy to show up as selectable. Here I'm trying to make it EVEN clearer (but with wrong syntax, of course):
SELECT RequestsID, (RequesterID WHERE RequesterID != 8), ProductsID
FROM dbo.Requests
Yes, RequesterID 8 happens to be the SalesGuy :-)
Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions. The main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will perform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(@Search1 IS NULL or Column1=@Search1)
AND (@Search2 IS NULL or Column2=@Search2)
AND (@Search3 IS NULL or Column3=@Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
Column2=@Search2
and an index can be used (if you have one defined on Column2)
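To see the optional-condition pattern itself in action, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in, so it only demonstrates the semantics of the "parameter IS NULL OR column = parameter" predicate; it says nothing about index usage or OPTION (RECOMPILE), which are specific to SQL Server. The table T, its sample rows and the search helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [("a", "x", "1"), ("a", "y", "2"), ("b", "x", "3")],
)

# Each condition is skipped when its parameter is NULL (None in Python).
QUERY = """
SELECT Column1, Column2, Column3
FROM T
WHERE (:search1 IS NULL OR Column1 = :search1)
  AND (:search2 IS NULL OR Column2 = :search2)
  AND (:search3 IS NULL OR Column3 = :search3)
ORDER BY Column3
"""


def search(search1=None, search2=None, search3=None):
    """Run the query with any subset of the three optional filters (hypothetical helper)."""
    params = {"search1": search1, "search2": search2, "search3": search3}
    return conn.execute(QUERY, params).fetchall()


only_x = search(search2="x")   # only the Column2 filter is active
everything = search()          # no filter is active, all rows come back
print(only_x)
print(everything)
```

The same statement serves every combination of filters, which is the convenience of the pattern; the article linked above is about what that convenience can cost in query plans.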
How about this? Since the query already joins on the requests table you can simply add the columns to the select-list like so :
SELECT Requests.RequestsID, Requests.RequesterID, Requests.ProductsID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
You can in fact select any column from any of the joined tables (Roles, Requests, People, etc.)
It becomes clear if you just replace People.PeopleId with * and it will show you everything retrieved from the tables.
