How to specify Snowflake session parameters in Talend

I am using Talend to load data from Oracle to Snowflake. I am able to set up the load pipeline, but I want to set a query tag as part of it so that I can do some analysis based on the tag. However, I could not find any way to specify the query tag along with the query statements (ALTER SESSION SET QUERY_TAG='TALENDLOAD') in the load pipeline.
Does Talend not allow setting session parameters?

You first need to run ALTER SESSION SET MULTI_STATEMENT_COUNT=0; because the default value is 1, which allows only one statement per request through the JDBC and ODBC drivers (see the Snowflake documentation on multi-statement execution for details).
Then you may pass ALTER SESSION SET QUERY_TAG='TALENDLOAD' along with other query statements.
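For example, once the MULTI_STATEMENT_COUNT change has been applied in a prior request, a single Talend row component (e.g. tSnowflakeRow) could send the session tag and the load together in one multi-statement request. This is only a minimal sketch; the table names are placeholders, not from the original pipeline:

-- First request (single statement): allow multi-statement batches for this session
ALTER SESSION SET MULTI_STATEMENT_COUNT = 0;

-- Second request: tag the session, then run the load in the same batch
ALTER SESSION SET QUERY_TAG = 'TALENDLOAD';
INSERT INTO target_table SELECT * FROM staging_table;  -- placeholder load statement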

Related

Snowflake performance test

I have an ETL pipeline scheduled in Airflow; the Airflow DAG calls a Snowflake stored procedure.
The stored procedure reads data from a view and writes it into a table by performing a MERGE.
I am making some changes to the pipeline by rewriting the query in the view, specifically removing the filter from the view and applying it in the stored procedure instead.
How can I test this without using any cache in Snowflake?
I have tested with separate warehouses
I have tested with ALTER SESSION SET USE_CACHED_RESULT=FALSE;
I have checked the view's query plans.
I have tested the new pipeline via the Airflow DAG in a non-prod environment, but I am not able to fetch the query ID of this pipeline in the query history to check the query plan.
How can I get the query ID of the non-prod pipeline?
Any suggestions on an easier way to test?
While running this kind of test you can set USE_CACHED_RESULT=FALSE at the user level. Then you do not need to worry about setting it for the session.
ALTER USER <your_test_user> SET USE_CACHED_RESULT=FALSE;
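To confirm the parameter took effect for that user, you can check it afterwards (reusing the same placeholder user name):

-- Verify the user-level override (user name is a placeholder)
SHOW PARAMETERS LIKE 'USE_CACHED_RESULT' IN USER <your_test_user>;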
The queries generated by external apps might not appear in your query history if the corresponding filter is disabled. You can enable it under Activity - Query History - Filters - Client generated statements. Also make sure that if you use different usernames for Airflow and for the UI, the respective user filter is set accordingly.
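If the UI still does not surface the runs, the query history can also be queried directly; a minimal sketch, assuming the Airflow connection uses a dedicated user such as AIRFLOW_NONPROD (a placeholder name):

-- Recent queries run by the Airflow user (user name is a placeholder)
SELECT query_id, query_text, query_tag, start_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(
         USER_NAME => 'AIRFLOW_NONPROD',
         RESULT_LIMIT => 100))
ORDER BY start_time DESC;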

Setting Multi Statement as 0 for Snowflake in Talend tdbconnection

I am trying to set MULTI_STATEMENT_COUNT=0 in Talend when making a tDBConnection.
Right now I have to add a separate tSnowflakeRow to set this parameter using ALTER SESSION.
Is there any way to do this while making the connection with tDBConnection?
A session parameter can be set at the user level (CREATE USER or ALTER USER).
That way the parameter is set as soon as you initiate your connection.
But as indicated by @hkandpal and noted in the Snowflake documentation, you should be cautious with the MULTI_STATEMENT_COUNT parameter, as it opens up the possibility of SQL injection.
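A minimal sketch of the user-level approach; the user name is a placeholder for the account Talend connects with:

-- Set the parameter on the connecting user (user name is a placeholder)
ALTER USER talend_load_user SET MULTI_STATEMENT_COUNT = 0;
-- Verify the effective value for that user
SHOW PARAMETERS LIKE 'MULTI_STATEMENT_COUNT' IN USER talend_load_user;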

AWS DataPipeline insert status with SQLActivity

I am looking for a way to record the status of the pipeline in a DB table. Assuming this is a very common use case.
Is there any way I can record:
status and time of completion of the complete pipeline.
status and time of completion of selected individual activities.
the ID of individual runs/execution.
The only way I found was using SQLActivity that depends on an individual activity, but even there I cannot access the status or timestamp of the parent/node.
I am using a JDBC connection to connect to a remote SQL Server, and the pipeline copies S3 files into the SQL Server DB.
Hmmm... I haven't tried this, but I can hit you with some pointers to possibly achieve the desired results. However, you will have to do the research and figure out the actual implementation.
Option 1
Create a ShellCommandActivity whose dependsOn is set to the last activity in your pipeline. Your shell script will use the AWS CLI (aws datapipeline list-runs) to fetch details of the current run; you can use filters to achieve this.
Use data staging to move the output of the previous ShellCommandActivity into the SQLActivity, which eventually inserts into the destination SQL Server.
Option 2
Use AWS Lambda to run aws datapipeline list-runs periodically, with filters, and update the destination table with the latest activities.
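Either option ultimately needs a table in the destination SQL Server to hold the run status. The sketch below is only illustrative; the table name, columns, and values are assumptions, not part of the original pipeline:

-- Hypothetical status table in the destination SQL Server database
CREATE TABLE pipeline_run_status (
    run_id        VARCHAR(100) NOT NULL,  -- Data Pipeline run/execution id
    activity_name VARCHAR(100) NOT NULL,
    status        VARCHAR(20)  NOT NULL,  -- e.g. FINISHED, FAILED
    finished_at   DATETIME2    NOT NULL
);

-- Insert issued by the SQLActivity (or Lambda) once the run details are available
INSERT INTO pipeline_run_status (run_id, activity_name, status, finished_at)
VALUES ('df-0123456789EXAMPLE', 'CopyS3ToSqlServer', 'FINISHED', SYSUTCDATETIME());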

How to set Timeout for a specific sql query?

I want to set a timeout value for a specific SQL query that will execute inside a stored procedure.
Is it possible to set the timeout value for a particular query?
It is the client API rather than SQL Server that enforces query timeouts (RPC or batch). Consequently, you can't set a client command timeout at a more granular level when a stored procedure contains multiple statements. You'll need to split the proc such that the desired query is executed separately by the client, and specify a different timeout for that command.
The specifics of how to set the timeout vary depending on the client API. In the case of .NET, it is the SqlCommand.CommandTimeout property.

Why must QUOTED_IDENTIFIER be on for the whole db if you have an indexed view?

Yesterday I added some indexes on a view in my MSSQL 2008 db. After that it seems like all the stored procedures need to run with QUOTED_IDENTIFIER set to ON, even those that don't use the view in question.
Why is it so? Is this something I can configure on the db or do I have to update all my stored procedures to set the QUOTED_IDENTIFIER to ON? I think it is rather weird that this is required for the stored procedures not using the view.
Do these stored procedures relate to the base table(s) that the view is based upon? To quote from Creating Indexed Views:
After the clustered index is created, any connection that tries to modify the base data for the view must also have the same option settings required to create the index. SQL Server generates an error and rolls back any INSERT, UPDATE, or DELETE statement that will affect the result set of the view if the connection executing the statement does not have the correct option settings. For more information, see SET Options That Affect Results.
And it's kind of obvious, when you think about it - you're potentially going to be updating the contents of the view whenever you touch these base tables, and so you inherit the same responsibilities as when you created the index.
You can set the defaults at multiple levels (a sketch of the session- and database-level options follows this list):
Any application can explicitly override any default settings by executing a SET statement after it has connected to a server. The SET statement overrides all previous settings and can be used to turn options on and off dynamically as the application runs. The option settings are applicable only to the current connection session.
OLE DB and ODBC applications can specify the option settings that are in effect at connection time by specifying option settings in connection strings. The option settings are applicable only to the current connection session.
SET options specified for a SQL Server ODBC data source by using the ODBC application in Control Panel or the ODBC SQLConfigDataSource function.
Default settings for a database. You can specify these values by using ALTER DATABASE or the Object Explorer in SQL Server Management Studio.
Default settings for a server. You can specify these values by using either sp_configure or Object Explorer in SQL Server Management Studio to set the server configuration option named user options.
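As a minimal sketch of the session-level and database-level approaches (the database name is a placeholder):

-- Session level: set the option before statements that touch the indexed view's base tables
SET QUOTED_IDENTIFIER ON;

-- Database level: change the default for connections that don't set the option themselves
ALTER DATABASE MyDatabase SET QUOTED_IDENTIFIER ON;

Note that QUOTED_IDENTIFIER is also saved with each stored procedure when it is created, so procedures created while the option was OFF typically need to be re-created with it ON.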
