I have an SSIS package with a data flow task. The OLE DB source executes a stored procedure. It fails while saving, with the error message below.
an OLEDB record is available... The metadata could not be determined because the statement 'select appname....' in procedure is not compatible with the statement 'select appid....' in procedure
This proc has several SELECT statements and returns the appropriate result set depending on the parameters passed. Any pointers on how to get around this error?
So you're saying that the SP returns different metadata depending on the parameter passed? SSIS doesn't like this: it can't update the metadata dynamically at run time. For example, if you create a package that splits or sorts on a certain column, and then you run the SP and it doesn't return that column, or the same column comes back as a different data type, what should SSIS do? It can't work that out automatically.
I suggest you create a data source for each possible result set and conditionally execute each one as required.
In short, SPs that optionally return different result sets are often not a good idea, and definitely not from an ETL perspective.
Here is some code that shows how to create dynamically built outputs (you could use the same method with just one output), but you'll still face the same problems downstream.
http://www.codeproject.com/Articles/32151/How-to-Use-a-Multi-Result-Set-Stored-Procedure-in
I ran into this issue as well. In my case, the result returned looked identical no matter which branch was executed; the difference was just in how that result was obtained (including different source tables). I simply executed all the cases with a union, and each WHERE clause included the conditions for its execution, instead of using IF logic to choose a query.
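For illustration, a minimal sketch of that rewrite in T-SQL (table, column, and parameter names are hypothetical):

-- Instead of IF logic choosing between two differently sourced queries,
-- fold each branch's condition into its WHERE clause; the result set
-- shape is now the same on every execution.
SELECT appid, appname FROM dbo.table_a WHERE @source = 'A'
UNION ALL
SELECT appid, appname FROM dbo.table_b WHERE @source = 'B';

UNION ALL is safe here because the branch conditions are mutually exclusive, and it avoids the duplicate-elimination cost of plain UNION.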
I want to get the list of objects referenced in a Snowflake procedure. Say it uses tables and views inside it; I want to find those items from the procedure definition, as there is currently no function in Snowflake that provides this information.
GET_OBJECT_REFERENCES (https://docs.snowflake.com/en/sql-reference/functions/get_object_references.html) is currently only available for views, not for procedures.
Any pointers on scanning the definition of a procedure to figure out the objects in it?
As Felipe pointed out, you can pass the name of a table or view as a parameter into the stored procedure. In that case there's no way to know what objects the SP will reference.
If your organization tends not to do that, and the SQL in your stored procedures tends to be more along the lines of "select * from my_table", you can simply search for those references in the stored procedure code.
The following statement is crude but effective. It could be developed and polished a lot, and it can miss references. It also only finds the first match, while a more useful query would return all matches and flatten out the array; I may have time to work on that a bit. It did find a lot in my test. It simply looks for the following pattern:
SQL Command ... Matching Clause for that Command ... Semicolon
The reason this works is that even if you don't terminate the SQL in the stored procedure, the JavaScript line should be terminated with a semicolon. JavaScript is comparatively forgiving of missing semicolons, but it should hit one eventually and match the SQL statement.
-- find the first embedded SQL statement in each procedure definition
select PROCEDURE_CATALOG
,PROCEDURE_SCHEMA
,PROCEDURE_NAME
,ARGUMENT_SIGNATURE
-- 'ims' flags: case-insensitive, multi-line, dot matches newline
,regexp_substr(PROCEDURE_DEFINITION, 'SELECT\\s.*FROM.*;|INSERT\\s.*INTO.*;|UPDATE\\s.*SET.*;|MERGE\\s.*INTO.*;|DELETE\\s.*FROM.*;|MERGE\\s.*USING.*;',1, 1, 'ims') STATEMENT
from MY_DATABASE.INFORMATION_SCHEMA.PROCEDURES
where STATEMENT is not null; -- Snowflake permits referencing the select alias here
I write a lot of stored procedures, and to Felipe's point this returns a lot of rows like this for me:
select ${params.leftColumnList} from ${params.leftObject} order by ${leftTimestamp};`);
In those cases, you'd need to have someone who can read code figure out what it's referencing. In this case, the SP accepts parameters for those fields, so they could be any tables.
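Following up on the earlier note about returning all matches and flattening the array: here is a rough sketch using Snowflake's REGEXP_SUBSTR_ALL with LATERAL FLATTEN. I've made the quantifiers non-greedy (an assumption on my part) so each match stops at its first semicolon, instead of one greedy match swallowing most of the definition:

select p.PROCEDURE_SCHEMA
,p.PROCEDURE_NAME
,f.value::string as STATEMENT
from MY_DATABASE.INFORMATION_SCHEMA.PROCEDURES p
-- regexp_substr_all returns an array of every match; flatten it to rows
,lateral flatten(input => regexp_substr_all(p.PROCEDURE_DEFINITION,
'SELECT\\s.*?FROM.*?;|INSERT\\s.*?INTO.*?;|UPDATE\\s.*?SET.*?;|MERGE\\s.*?INTO.*?;|DELETE\\s.*?FROM.*?;|MERGE\\s.*?USING.*?;',
1, 1, 'ims')) f;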
Having seen other questions with answers that don't totally address what I am after, I am wondering how in SSIS to use an OLE DB Command transformation to do an Insert and immediately get the resulting primary key for each row inserted as a new column, all within the same Data Flow Task. That sounds like it should be a common, built-in, fairly simple thing to ask for in SSIS, right?
So the obvious first choice for me would be to use an OLE DB Command where I do an INSERT and include an OUTPUT clause in my command:
INSERT INTO dbo.MyReleaseTable(releaseDate)
OUTPUT ?=Inserted.id
VALUES (?)
Only I can't figure out how to do this in an OLE DB Command (with an output parameter) without it complaining. I've read about using stored procedures to do this, so am I required to use a stored procedure if I want to do this?
Let's say this won't work. I could use a Script Transformation and execute direct SQL in that, right? Well, if that's what I must do, then the line between using custom code and SSIS's built-in components gets blurred, and I am tempted to throw SSIS away and just do the whole ETL in code.
Then I hear talk about using an Execute SQL task. So now I can't even do one data flow within one Data Flow Task? Am I getting that right? I'd like to keep a single data flow contained within one Data Flow Task and not have to break my one flow out across separate tasks.
If it turns out that this seemingly simple data flow objective is not built into SSIS then I will consider dumping SSIS altogether. Talend has a free ETL offering, don't they?
Well, this can be done with SSIS inside a Data Flow, but with some tricks. You need to create a stored procedure with input and output parameters and use it in the Data Flow, as described here, fetching the result value.
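A rough sketch of such a procedure (table, column, and procedure names are hypothetical); the OLE DB Command text would then be something like EXEC dbo.InsertRelease ?, ? OUTPUT, with the second parameter mapped as an output:

CREATE PROCEDURE dbo.InsertRelease
    @releaseDate datetime,
    @newId int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.MyReleaseTable (releaseDate)
    VALUES (@releaseDate);
    -- hand the generated key back through the output parameter
    SET @newId = SCOPE_IDENTITY();
END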
Drawbacks of this approach:
You need to create a Stored Procedure
Each row is processed with the SP, which causes implicit transactions instead of batch processing. This can slow down your package.
A solution without the performance penalty is to do it in two Data Flows: the first inserts the values into some temp table, and the second runs a SQL MERGE command at the OLE DB source and handles the output data as you wish. All of this runs inside a transaction, handled either by MSDTC or by yourself.
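A sketch of what the second Data Flow's source query could look like, assuming a staging table named dbo.StagingReleases (all names hypothetical):

-- The ON predicate never matches, so every staged row takes the
-- insert branch; OUTPUT returns the new keys as a rowset the
-- OLE DB source can consume.
MERGE INTO dbo.MyReleaseTable AS tgt
USING dbo.StagingReleases AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (releaseDate) VALUES (src.releaseDate)
OUTPUT inserted.id, inserted.releaseDate;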
In my Informatica mapping, when an SP is called via an unconnected Stored Procedure transformation, the workflow succeeds. However, there is a divide-by-zero error in the SP, and ideally the workflow should fail. The source and target used in this mapping are dummy flat files.
However, when I use dummy tables instead for source and target, with a connected SP transformation this time, the error bubbles up successfully.
Any idea why this would happen? Why does the error show up only with a table source and a connected SP transformation, and not with a flat-file source and an unconnected SP transformation?
I've been dealing with this. One thing you can try is using SET NOCOUNT ON in the stored procedure. Otherwise, the "x rows affected" message can be treated by PowerCenter as an indicator that the stored procedure executed successfully, no matter what else is returned.
The other thing is that it depends on whether you're using the Native or ODBC connector; with the latter, errors are not properly escalated. You can read more about my trials here: http://powercenternotes.blogspot.com/2014/09/ms-sql-server-stored-procedure-error.html
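To illustrate the first point, a minimal sketch (procedure and table names are hypothetical):

ALTER PROCEDURE dbo.MyProc
AS
BEGIN
    SET NOCOUNT ON;  -- suppress "x rows affected" messages
    -- without NOCOUNT, the message emitted by this statement could be
    -- read by PowerCenter as a sign the procedure succeeded
    UPDATE dbo.SomeTable SET val = val + 1;
    SELECT 1 / 0;  -- the divide-by-zero error is now the only output
END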
These components have the ability to retrieve multiple result sets (e.g. from a stored proc) in one go, and using Delphi 5 I can successfully use NextRecordSet to get at the second and subsequent ones from a SQL Server.
However, this only works if I specify the cursor location as clClient; if I use clServer, I get a "Does not return multiple result sets" error. Is this an inherent limitation (e.g. one imposed by the MDAC layer on the client), or can multiple recordsets be successfully retrieved from a server-side cursor?
It is an inherent limitation of server-side cursors. As stated in the following MSDN documentation:
Server cursors cannot be used with statements that generate more than one recordset. This restriction applies to all statements described in Generating Multiple Recordsets. For more information, see Generating Multiple Recordsets. If a server cursor is used with any statement that generates multiple recordsets, an application can return one of the following errors:
Cannot open a cursor on a stored procedure that has anything other than a single SELECT statement in it.
sp_cursoropen. The statement parameter can only be a single SELECT statement or stored procedure.
I'm trying to build a designer in .NET, and would like to be able to retrieve the columns and column types of the output from a stored procedure without calling it, so the designer can be used to map the output. Is this possible? I'm even willing to use an unmanaged API if necessary.
I believe what you're looking for is SET FMTONLY (Documentation).
This allows you to execute an SP (or SELECT statement) and see what columns would be returned without actually executing the query.
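For example, a minimal sketch (procedure and parameter names are hypothetical):

SET FMTONLY ON;
-- only result-set metadata (column names and types) comes back;
-- no rows are processed
EXEC dbo.MyProc @param = 1;
SET FMTONLY OFF;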
This isn't possible, in general, because even a single stored procedure can return different result sets, with different columns or column data types.
In extreme cases, even the number of returned result sets may depend on parameters, and when the stored procedure uses dynamic SQL it's definitely impossible.
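For instance, a contrived procedure like the following sketch defeats any static inspection (names are hypothetical):

CREATE PROCEDURE dbo.Unpredictable @mode int
AS
BEGIN
    IF @mode = 1
        SELECT 1 AS appid;  -- one int column
    ELSE
        -- dynamic SQL: the shape is unknowable until run time
        EXEC('SELECT ''x'' AS appname, GETDATE() AS created_at');
END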