Call stored procedure from SSIS Dataflow - sql-server

The question in short:
Can I call a stored procedure that has an output parameter in a data flow?
In long:
I have many tables to extract, transform, and load from one db to another one.
Almost all of the tables require one transformation: fixing the country codes (from three letters to two). So my idea is as follows:
For each row: call the stored procedure, pass in the wrong country code, and replace the wrong code with the correct one (the output of the stored procedure).

There are at least two solutions for this:
Lookup component: configure it in advanced mode and make sure the last statement of the stored procedure is the SELECT that returns the corrected country code (e.g. SELECT @good_country_code).
Using an OLE DB Command
The latter (OLE DB Command) is actually quite simple; you need to configure it with:
EXEC ? = dbo.StoredProc @param1 = ?, @param2 = ?
As a consequence, an @RETURN_VALUE entry will appear in the Available Destination Columns, which you can then map to an existing column in the pipeline. Remember to create a new pipeline column (e.g. Good_Country_Code) using a Derived Column component before the OLE DB Command; that way you keep both values, or you can replace the wrong one using another Derived Column component after the OLE DB Command.

No, natively there isn't a component that is going to handle that. You can accomplish it with a Script Component, but you don't want to.
What you're describing is a Lookup. The Data Flow Task has a Lookup Component, but you'll be better served, especially for a finite list of values like countries, by pushing your query into the component.
SELECT T.Country3, T.Country2 FROM dbo.Table T;
Then you drag your SourceCountry column and match to Country3. Check Country2 and for all the rows that match, you'll get the 2 letter abbreviation.
A big disadvantage of trying to use your stored procedure is efficiency. The default Lookup is going to cache all those values. With the script version, say you have 10k rows come through, all with CAN. That's 10k invocations of your stored procedure where the result never changes.
You do pay a startup cost, as the default Lookup mode is Full Cache, which means it's going to run your query and keep all those values local. This is great with your data set: 1000 countries max, 5 or 10 bytes per row. That's nothing.

Yes, you can. You'll want to use a couple of Execute SQL Tasks to do this.
Use an Execute SQL Task to gather a Result Set of Wrong_Country_Codes.
Add a ForEach Container as a successor to the previous Execute SQL Task. Pass the Result Set to this Container.
Inside that ForEach container, you will have another Execute SQL Task that will call your sproc, using each row (e.g. Wrong_Country_Code) as a variable parameter.
That should work. Only select the columns necessary to pass to your stored procedure.
Edit
In acknowledgement of the other answer, performance is going to be an issue. Perhaps, rather than having the stored procedure produce an output, alter the sproc to do the updates for you.
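A minimal sketch of what "have the sproc do the updates" could look like, assuming a mapping table of three-letter to two-letter codes exists. The names dbo.FixCountryCodes, dbo.CountryMap, and dbo.Customer are hypothetical and not from the question:

```sql
-- Hypothetical schema (assumption, not from the question):
--   dbo.CountryMap(Country3 char(3), Country2 char(2))
--   dbo.Customer(CountryCode varchar(3), ...)
CREATE PROCEDURE dbo.FixCountryCodes
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based UPDATE replaces every three-letter code that has a mapping,
    -- instead of one stored procedure call per row.
    UPDATE c
    SET c.CountryCode = m.Country2
    FROM dbo.Customer AS c
    INNER JOIN dbo.CountryMap AS m
        ON m.Country3 = c.CountryCode;
END;
```

A single Execute SQL Task running EXEC dbo.FixCountryCodes after the load would then take the place of the ForEach loop.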

Related

Getting a call to a job and a result of a complex select in a single procedure in Microsoft SQL

I have an SSIS job, and a relatively complex select, that use the same data. I have to make it so that my client doesn't have to call them separately, but use one thing to get the result of the select and call the job.
My original plan was to create a procedure, which will take necessary input, and then output a table variable with the select result.
However, after reading the Microsoft documentation, I found out that table variables might not be able to hold a result with more than 100 rows, while I might want to select ~10 000 rows. And now I'm stumped. What is the best way to call a job and select data, from one component?
I have permissions to create views, procedures, and I can edit the SSIS job. The user will provide me with 2 parameters.
Here is what I would suggest in this scenario, to take the complexity out of SSIS.
Create the SP that you wanted to, but instead of a table variable, push your output into a table. This table can be added on the fly (dynamically, using a CREATE TABLE script) or can exist in the DB permanently as a buffer.
Call this SP in your control flow.
In the Data flow task, select from this buffer table.
After completing the SSIS work, flush the buffer table, i.e. truncate the table.
Caveat: You may face problems in concurrent scenarios. To eliminate them, add a BatchID or BatchStartTimeStamp column that stores a unique value for each run.
You can pass data for BatchID or BatchStartTimeStamp from SSIS package.
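The steps above could be sketched roughly as follows. Every object name here (dbo.ResultBuffer, dbo.FillResultBuffer, dbo.SourceData, and the columns) is a placeholder, and the SELECT stands in for the real complex query:

```sql
-- Permanent buffer table; BatchID keeps concurrent runs apart.
CREATE TABLE dbo.ResultBuffer (
    BatchID  uniqueidentifier NOT NULL,  -- one value per package run
    ClientID int              NOT NULL,
    Amount   decimal(18, 2)   NOT NULL
);
GO

CREATE PROCEDURE dbo.FillResultBuffer
    @BatchID uniqueidentifier,
    @Param1  int,
    @Param2  int
AS
BEGIN
    SET NOCOUNT ON;

    -- Stand-in for the complex select; each row is tagged with the caller's batch.
    INSERT INTO dbo.ResultBuffer (BatchID, ClientID, Amount)
    SELECT @BatchID, s.ClientID, s.Amount
    FROM dbo.SourceData AS s
    WHERE s.RegionID = @Param1
      AND s.YearNo   = @Param2;
END;
GO

-- Data flow source query (? is the SSIS parameter marker for the batch):
--   SELECT ClientID, Amount FROM dbo.ResultBuffer WHERE BatchID = ?;
-- Cleanup once the package is done:
--   DELETE FROM dbo.ResultBuffer WHERE BatchID = ?;
```

Note that once runs can overlap, deleting by BatchID is probably safer than truncating, since TRUNCATE TABLE would also remove rows that belong to another run still in progress.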

Stored procedure to update different columns

I have an API that I'm trying to read that gives me just the updated fields. I'm trying to take that and update my tables using a stored procedure. So far the only way I have been able to figure out how to do this is with dynamic SQL, but I would prefer not to do that if there is another way.
If it was just a couple of columns, I'd just write a proc for each, but we are talking about 100 fields, and any of them could be updated together. One ticket might just need a timestamp updated at this time, but the next ticket might be a timestamp and who modified it, while the next one might just be a note.
Everything I've read and have been taught has told me that dynamic SQL is bad, and while I'll write it if I have to, I'd prefer to have a proc.
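You can perhaps do something like this:
-- EXCEPT assumes NEWTABLE and OLDTABLE have the same column layout
IF EXISTS (SELECT * FROM NEWTABLE EXCEPT SELECT * FROM OLDTABLE)
BEGIN
    -- copy the new values onto the matching old rows
    UPDATE o
    SET o.OLDRECORDS = n.NEWRECORDS
    FROM OLDTABLE AS o
    INNER JOIN NEWTABLE AS n ON o.PRIMARYKEY = n.PRIMARYKEY;
END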
The best way to solve your problem is using MERGE:
Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
As you can see your update could be more complex but more efficient as well. Using MERGE requires some proficiency, but when you start to use it you'll use it with pleasure again and again.
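As a rough illustration of the MERGE approach (available from SQL Server 2008), here is a sketch that assumes the incoming API data has first been loaded into a staging table. dbo.Ticket, dbo.TicketStaging, and their columns are made-up names:

```sql
-- Hypothetical tables: dbo.Ticket is the target, dbo.TicketStaging holds incoming rows.
MERGE dbo.Ticket AS t
USING dbo.TicketStaging AS s
    ON t.TicketID = s.TicketID
WHEN MATCHED THEN
    -- Existing ticket: overwrite the updatable columns with the staged values.
    UPDATE SET t.ModifiedBy = s.ModifiedBy,
               t.ModifiedAt = s.ModifiedAt,
               t.Note       = s.Note
WHEN NOT MATCHED BY TARGET THEN
    -- Ticket not seen before: insert it.
    INSERT (TicketID, ModifiedBy, ModifiedAt, Note)
    VALUES (s.TicketID, s.ModifiedBy, s.ModifiedAt, s.Note);
```

The statement must end with a semicolon. This sketch overwrites every listed column, so it assumes the staging row carries the full current values rather than only the changed fields.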
I am not sure how your business logic works that determines what columns are updated at what time. If there are separate business functions that require updating different but consistent columns per function, you will probably want to have individual update statements for each function. This will ensure that each process updates only the columns that it needs to update.
On the other hand, if your API is such that you really don't know ahead of time what needs to be updated, then building a dynamic SQL query is a good idea.
Another option is to build a save proc that sets every user-configurable field. As long as the calling process has all of that data, it can call the save procedure and pass every updateable column. There is no harm in having an UPDATE MyTable SET MyCol = @MyCol with the same values on each side.
Note that even if all of the values are the same, the rowversion (or timestamp) columns will still be updated, if present.
With our software, the tables that users can edit have a widely varying range of columns. We chose to create a single save procedure for each table that has all of the update-able columns as parameters. The calling processes (our web servers) have all the required columns in memory. They pass all of the columns on every call. This performs fine for our purposes.
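A save procedure of that kind might look like the sketch below. The table dbo.Ticket and its three columns are invented for illustration; a real version would list every updateable column as a parameter:

```sql
-- Hypothetical table and columns; one parameter per updateable column.
CREATE PROCEDURE dbo.SaveTicket
    @TicketID   int,
    @ModifiedBy nvarchar(100),
    @ModifiedAt datetime,
    @Note       nvarchar(max)
AS
BEGIN
    SET NOCOUNT ON;

    -- Every column is set on every call; unchanged values are simply rewritten.
    UPDATE dbo.Ticket
    SET ModifiedBy = @ModifiedBy,
        ModifiedAt = @ModifiedAt,
        Note       = @Note
    WHERE TicketID = @TicketID;
END;
```

The caller is expected to hold the full row in memory and pass all values back, which is the precondition described above.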

Count number of times a procedure is executed

Requirement:
To count the number of times a procedure has executed
From what I understand so far, sys.dm_exec_procedure_stats can be used for approximate count but that's only since the last service restart. I found this link on this website relevant but I need count to be precise and that should not flush off after the service restart.
Can I have some pointers on this, please?
Hack: The procedure I need to keep track of has a select statement so returns some rows that are stored in a permanent table called Results. The simplest solution I can think of is to create a column in Results table to keep track of the procedure execution, select the maximum value from this column before the insert and add one to it to increment the count. This solution seems quite stupid to me as well but the best I could think of.
What I thought is you could create a sequence object, assuming you're on SQL Server 2012 or newer.
CREATE SEQUENCE ProcXXXCounter
AS int
START WITH 1
INCREMENT BY 1 ;
And then in the procedure fetch a value from it:
declare @CallCount int
select @CallCount = NEXT VALUE FOR ProcXXXCounter
There is of course a small overhead with this, but it doesn't cause the kind of blocking issue that could happen when using a table, because sequences are handled outside of transactions.
Sequence parameters: https://msdn.microsoft.com/en-us/library/ff878091.aspx
The only way I can think of to keep track of the number of executions even after the service has restarted is to have a table in your database and insert a row into that table inside your procedure every time it is executed.
Maybe add a datetime column as well to collect more info about the execution. And a column for user who executed etc..
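A minimal sketch of that logging table, with hypothetical names (dbo.ProcExecutionLog, dbo.TrackedProc) and the extra datetime and user columns suggested above:

```sql
CREATE TABLE dbo.ProcExecutionLog (
    LogID      int IDENTITY(1, 1) PRIMARY KEY,
    ProcName   sysname  NOT NULL,
    ExecutedAt datetime NOT NULL DEFAULT (GETDATE()),
    ExecutedBy sysname  NOT NULL DEFAULT (SUSER_SNAME())
);
GO

-- dbo.TrackedProc stands in for the procedure being counted.
CREATE PROCEDURE dbo.TrackedProc
AS
BEGIN
    SET NOCOUNT ON;

    -- Record this execution; @@PROCID is the object id of the running procedure.
    INSERT INTO dbo.ProcExecutionLog (ProcName)
    VALUES (OBJECT_NAME(@@PROCID));

    -- ... existing procedure body goes here ...
END;
GO

-- Execution count per procedure; survives service restarts because it is ordinary table data.
SELECT ProcName, COUNT(*) AS ExecutionCount
FROM dbo.ProcExecutionLog
GROUP BY ProcName;
```

One thing to keep in mind: if the procedure is called inside a transaction that is later rolled back, the log row is rolled back with it, so the count reflects committed executions only.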
This can be done easily, and without Enterprise Edition, by using extended events. The sqlserver.module_end event fires when a module finishes executing; set your predicates correctly and use a histogram target.
http://sqlperformance.com/2014/06/extended-events/predicate-order-matters
https://technet.microsoft.com/en-us/library/ff878023(v=sql.110).aspx
To consume the value, query the histogram target (under the reviewing target output examples).

Returning a subset of a stored procedure

I have an application that (unfortunately) contains a lot of its business logic in stored procedures.
Some of these return masses of data. Occasionally the code will need only a small amount of the data returned from the stored procedure. To get a single client's name, I need to call a stored procedure that returns 12 tables and 950 rows.
I am not able (due to project politics) to change the existing stored procedures or create a replacement stored procedure: the original massive procedure must be called, as it contains the logic to find the correct client. I can create a new procedure as long as it uses the original massive procedure.
Is there any way I can get SQL Server to return only a subset (a single table, or even better a single row of a single table) of a stored procedure's results?
I have to support SQL Server 2000 and later.
It is not possible to conditionally modify the query behaviour of a procedure whose source code you cannot change.
However, you can create a new procedure that calls the original then trims down the result. A SQL 2000 compatible way of doing this might be:
declare @OriginalResult table (
    -- manually declare every column that is returned in the original procedure's result set, with the correct data types, in the correct order
)
insert into @OriginalResult execute OriginalProcedure -- procedure parameters go here
select MyColumns from @OriginalResult -- your joins, groups, filters etc. go here
You could use a temporary table instead of a table variable; the principle is the same. On SQL Server 2000 itself the temporary table is the form to use, because INSERT ... EXEC into a table variable is only supported from SQL Server 2005 onwards.
You will definitely pay a performance penalty for this. However, you will only pay the penalty inside the server, you will not have to send lots of unnecessary data over the network connection to the client.
EDIT - Other suggestions
Ask for permission to factor out the magic find client logic into a separate procedure. You can then write a replacement procedure that follows the "rules" instead of bypassing them.
Ask whether support for SQL 2000 can be dropped. If the answer is yes, then you can write a CLR procedure to consume all 12 resultsets, take only the one you want, and filter it.
Give up and call the original procedure from your client code, but find a way of measuring the performance drop, so that you can exert some influence on the decision-making backed up with hard data.
No, you can't. A stored procedure is a single executable entity.
You have to create a new stored proc (to return what you want) or modify the current one (to branch) if you want to do this: project politics cannot change real life.
Edit: I didn't tell you this...
For every bit of data you need from the database, call the stored procedure each time and use the bit you want.
Don't "re-use" a call to get more data and cache it. After all, this is surely the intention of your Frankenstein stored procedure to give a consistent contract between client and databases...?
You can try writing a SQL CLR stored procedure that handles all the tables returned by your stored procedure, finds the data you need in C# code, and returns only that. But I think that is just going to make things more complicated.
When you fill a DataSet from a stored procedure that returns several result sets, you get one DataTable per result set in the DataSet.

Grabbing first result set from a stored proc called from another stored proc

I have a SQL Server 2005 stored proc which returns two result sets which are different in schema.
Another stored proc executes it as an Insert-Exec. However I need to insert the first result set, not the last one. What's a way to do this?
I can create a new stored proc which is a copy of the first one which returns just the result set I want but I wanted to know if I can use the existing one which returns two.
Actually, INSERT..EXEC will try to insert BOTH datasets into the table. If the column counts match and the datatype can be implicitly converted, then you will actually get both.
Otherwise, it will always fail because there is no way to only get one of the resultsets.
The solution to this problem is to extract the functionality that you want from the called procedure and incorporate it into the (formerly) calling procedure. And remind yourself while doing it that "SQL is not like client code: redundant code is more acceptable than redundant data".
In case this was not clear above, let me delineate the facts and options available to anyone in this situation:
1) If the two result sets returned are compatible, then you can get both in the same table with the INSERT and try to remove the ones that you do not want.
2) If the two result sets are incompatible then INSERT..EXEC cannot be made to work.
3) You can copy the code out of the called procedure and re-use it in the caller, and deal with the cost of dual-editing maintenance.
4) You can change the called procedure to work more compatibly with your other procedures.
That's it. Those are your choices in T-SQL for this situation. There are some additional tricks that you can play with SQLCLR or client code, but they will involve going about this a little bit differently.
Is there a compelling reason why you can't just have that first sproc return only one result set? As a rule, you should probably avoid having one sproc do both an INSERT and a SELECT (the exception is if the SELECT is to get the newly created row's identity).
So, to prevent code from getting out of sync between the two processes, why not write a proc that does what you want for the insert, call that in your process, and have the original proc call it to get the first recordset and then do whatever else it needs to do?
Depending on how you get to this select, it is possible it might be refactored to a table-valued function instead of a proc that both processes would call.
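To make the table-valued function idea concrete, here is a sketch with invented names (dbo.GetFirstResult, dbo.Client, dbo.Target); the SELECT inside the function stands in for whatever query currently produces the first result set:

```sql
-- Inline table-valued function holding the shared query (hypothetical schema).
CREATE FUNCTION dbo.GetFirstResult (@ClientID int)
RETURNS TABLE
AS
RETURN
(
    SELECT c.ClientID, c.ClientName
    FROM dbo.Client AS c
    WHERE c.ClientID = @ClientID
);
GO

-- The calling procedure inserts straight from the function, with no INSERT..EXEC needed:
INSERT INTO dbo.Target (ClientID, ClientName)
SELECT ClientID, ClientName
FROM dbo.GetFirstResult(42);

-- The original procedure can return the same rows as its first result set:
SELECT ClientID, ClientName
FROM dbo.GetFirstResult(42);
```

This only works if the first result set can be produced by a single SELECT without side effects, since functions cannot modify data.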

Resources