AWS DataPipeline insert status with SQLActivity - sql-server

I am looking for a way to record the status of the pipeline in a DB table, assuming this is a fairly common use case.
Is there any way I can record:
the status and completion time of the whole pipeline,
the status and completion time of selected individual activities,
the ID of individual runs/executions?
The only way I found was using a SQLActivity that depends on an individual activity, but even there I cannot access the status or timestamp of the parent node.
I am using a JDBC connection to connect to a remote SQL Server, and the pipeline copies S3 files into the SQL Server DB.

Hmmm... I haven't tried this, but I can give you some pointers that might achieve the desired results. However, you will have to do the research and figure out the actual implementation.
Option 1
Create a ShellCommandActivity whose dependsOn is set to the last activity in your pipeline. The shell script uses the AWS CLI (aws datapipeline list-runs) to fetch details of the current run; you can use filters to narrow it down to the run you want.
Use staged data to move the output of that ShellCommandActivity into a SQLActivity, which then inserts it into the destination SQL Server table (a sketch of the destination table and insert follows below).
Option 2
Use an AWS Lambda function to run aws datapipeline list-runs periodically, with filters, and update the destination table with the latest activities.
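For illustration, the destination side might look something like the following sketch. The table name, columns, and the use of ? placeholders are assumptions for this example; how the staged values actually reach the insert depends on how you wire the activity and data nodes.
-- Hypothetical destination table for recording run status
CREATE TABLE dbo.PipelineRunLog (
    RunId        VARCHAR(128) NOT NULL,  -- pipeline/attempt identifier
    ActivityName VARCHAR(256) NOT NULL,  -- which activity the row describes
    Status       VARCHAR(64)  NOT NULL,  -- e.g. FINISHED, FAILED
    FinishedAt   DATETIME2    NULL,      -- completion time reported by list-runs
    LoggedAt     DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()  -- when this row was written
);
-- Parameterized insert fed by the staged output of the ShellCommandActivity
INSERT INTO dbo.PipelineRunLog (RunId, ActivityName, Status, FinishedAt)
VALUES (?, ?, ?, ?);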

snowflake session_id on query_history, can it be reinitialized to have new id per application execution

I want to use the session_id in Snowflake's query_history to find all the queries executed in one session. It works fine on the Snowflake end when I have different worksheets, which create different sessions. But from other tools (which appear to reuse the same connection pool until the connection is recreated), multiple jobs end up under the same session ID in Snowflake's query_history. Is there a way to have a new session ID created on every execution? I am using the Control-M scheduling/job automation tool to execute multiple jobs, each of which executes a different Snowflake stored procedure, and I want to see if I can get a different session ID for each execution of a procedure in the query_history view.
Thanks
Djay
You can change the "idle session timeout". See the Snowflake documentation on session policies.
You can set it as low as 5 minutes, which means any queries that are at least 5 minutes apart will need to reauthenticate and will have a new session.
CREATE [OR REPLACE] SESSION POLICY DO_NOT_IDLE SESSION_IDLE_TIMEOUT_MINS = 5
Though I believe this will affect any applications that use your account, and will make your applications reauthenticate every time the session expires.
Another option, if you need a window smaller than 5 minutes, is to get the session ID and explicitly run an ABORT_SESSION after your query has finished.
It would look something like this:
SELECT SYSTEM$ABORT_SESSION(CURRENT_SESSION());
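Once each execution ends up in its own session, a lookup like this (a sketch; the session ID value is a placeholder) pulls everything that ran in that session from query_history:
SELECT query_id, query_text, start_time
FROM snowflake.account_usage.query_history
WHERE session_id = 1234567890
ORDER BY start_time;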

Liquibase stuck on Computed checksum

Running Liquibase with Java/Spring against a Snowflake database. The first deployment works fine: I let Liquibase create the DatabaseChangeLogTable and the DatabaseChangeLogLockTable, they get created and written to, and the database objects are created.
The second time I try to run it, it acquires the change log lock but then sits for a long time at liquibase.util : Computed checksum for xxxx, and then times out after 5 minutes (due to other config settings). If I drop the DatabaseChangeLogTable and DatabaseChangeLogLockTable (interactively) and update the lock status to false, it works fine again. Any ideas on why it can't seem to finish when the DatabaseChangeLogTable and DatabaseChangeLogLockTable already exist? When I log into the database using the same credentials that Liquibase is using, I can select and update those tables just fine.
Could you try using clearChecksums?
clearCheckSums clears all checksums and nullifies the MD5SUM column of the DATABASECHANGELOG table so they will be re-computed on the next database update. Changesets that have already been deployed will have their checksums re-computed, and pending changesets will be deployed. For more details about this approach, see the Liquibase documentation for clearCheckSums.
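For what it's worth, the command amounts to roughly the following SQL (assuming the default change-log table name); Liquibase then recomputes the checksums on the next update run:
UPDATE DATABASECHANGELOG SET MD5SUM = NULL;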

How can I get Azure to notify me when a db Copy operation is complete?

Our deployment process involves two db copy procedures, one where we copy the production db to our rc site for rc testing, and then one where we copy the production db to our staging deployment slot for rollback purposes. Both of these can take as long as ten minutes, even though our db is very small. Ah, well.
What I'd like to do is have a way to get notified when a db Copy operation is done. Ideally, I could link this to an SMS alert or email.
I know that Azure has a big Push Notification subsystem, but I'm not sure whether it can hook the completion of an arbitrary db copy, or whether there's a lighter-weight solution.
There is some information about copying a database on this page: http://msdn.microsoft.com/en-us/library/azure/ff951631.aspx. If you are using T-SQL, you can check the copy's progress with a query like SELECT name, state, state_desc FROM sys.databases WHERE name = 'DEST_DB'. So you can keep running this query and send an SMS when it shows the copy has finished.
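A slightly fuller polling sketch, run against the master database of the destination server ('DEST_DB' is a placeholder): sys.dm_database_copies reports percent_complete while the copy is in flight, and sys.databases shows the database ONLINE once it has finished.
SELECT d.name, d.state_desc, c.percent_complete
FROM sys.databases AS d
LEFT JOIN sys.dm_database_copies AS c ON d.database_id = c.database_id
WHERE d.name = 'DEST_DB';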

Using a scalar as a condition in an OLE DB data flow

I am having a problem with a data flow task in an SSIS package I am trying to build. The objective of the package is to update tables on our local server using a connection to a distant server containing the source data, through a VPN connection.
There are no problems for tables which are re-downloaded entirely.
But some of the tables must be updated properly, by which I mean they are not re-downloaded in full. For each of those tables, I have to check the maximum value of the date column on our local server (an int in YYYYMMDD form) and ask the package to download only the data added after that date.
I thought about using a scalar (#MAXDATE, for example), but the issue is that I have to declare this scalar in a session with our local server, and I cannot use it as a condition in an OLE DB Source task, because the latter implies a new session, this time with the distant server.
I can only view the database on the distant server and import from it, so there is no way to create a table on it.
I hope this is clear enough. Would you have any tips to solve this problem?
Thank you in advance.
Ozgur
You can do this easily by using an Execute SQL Task, a Data Flow task and one variable. I would probably add some error checking just in case no value is found on the local system, but that depends very much on what might go wrong.
Assuming VS2008
Declare a package-level variable of type datetime. Give it an appropriate default value.
Create an Execute SQL Task with a query that returns the appropriate date value. On the first page of the properties window, make sure the Result Set is set to "Single Row". On the Result Set page, map the date column to the package variable.
Create a Data Flow task. In the OLE DB Source, write your query to include a question mark for the incoming date value, e.g. "and MaxDate > ?". Now when you click on the Parameters button, you should get a pop-up that allows you to map "Parameter0" to your package-level variable; the two queries are sketched below.
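For illustration, the two queries might look like this (table and column names are placeholders, not taken from the question):
-- Execute SQL Task, run against the local server; the single-row result
-- is mapped to the package variable (e.g. User::MaxDate)
SELECT MAX(DateColumn) AS MaxDate FROM dbo.LocalTable;
-- OLE DB Source, run against the distant server; the ? is mapped to the
-- same package variable via the Parameters button
SELECT * FROM dbo.RemoteTable WHERE DateColumn > ?;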

How do I configure cocoon to use a database as a store for quartz jobs and triggers

I'm using Cocoon and want to store the jobs and triggers for the quartz scheduler in the database so they are persisted. I can see where I need to make the change in cocoon.xconf but I can't find much on how to configure the datasource etc.
How do I configure this to use our existing (postgres) database?
You need to do 2 things:
Add the following configuration to quartz.properties with appropriate values substituted for the $ placeholders
org.quartz.jobStore.dataSource=myDS
org.quartz.dataSource.myDS.URL=$URL
org.quartz.dataSource.myDS.driver=$driver
org.quartz.dataSource.myDS.maxConnections=5
org.quartz.dataSource.myDS.password=$password
org.quartz.dataSource.myDS.user=$user
org.quartz.dataSource.myDS.validationQuery=$any query that doesn't return an error when properly connected
org.quartz.jobStore.tablePrefix=QREPL_
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
Create the database tables in which Quartz stores the job data - you should find a DDL script included in the Quartz distribution that will create them for you. Each of the Quartz table names should begin with the same prefix; in the configuration above, I've assumed this prefix is "QREPL_".
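As a quick sanity check once the DDL has run (these table names assume the standard Quartz schema with the QREPL_ prefix applied):
SELECT COUNT(*) FROM qrepl_job_details;
SELECT COUNT(*) FROM qrepl_triggers;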
Hope this helps,
Don
