SSIS 2012 - Package Task Run Time Status - sql-server

In the table [catalog].[execution_component_phases] there is a column called Phase. The values of the Phase column are:
PreExecute
Validate
ProcessInput
ReleaseConnection
AcquireConnection
Can someone please suggest which value indicates that a specific task in a package is in a running state?
Is there any value which states that a task has started but has not completed yet?
Regards

Based on the official documentation of the [catalog].[execution_component_phases] view:
Displays the time spent by a data flow component in each execution phase.
This view displays a row for each execution phase of a data flow component, such as Validate, Pre-Execute, Post-Execute, PrimeOutput, and ProcessInput. Each row displays the start and end time for a specific execution phase.
Based on my experience, I assume that the order of the execution phases is:
AcquireConnection : Acquiring the related connections required
Validate : Validating the Task/Component
Pre-Execute
ProcessInput : Processing phase
PrimeOutput : Generating outputs
Post-Execute
ReleaseConnection : Release acquired connections
The official documentation provides the following query to read the time spent in each phase:
use SSISDB
select package_name, task_name, subcomponent_name, execution_path,
       SUM(DATEDIFF(ms, start_time, end_time)) as active_time,
       DATEDIFF(ms, min(start_time), max(end_time)) as total_time
from catalog.execution_component_phases
where execution_id = 1841
group by package_name, task_name, subcomponent_name, execution_path
order by package_name, task_name, subcomponent_name, execution_path
Based on the information above, you can, for example, check a task's current phase to determine whether it is still running.
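As an illustration, here is a minimal sketch (using pyodbc, a placeholder connection string, and the sample execution_id 1841 from the query above) that lists phases that have started but not yet finished, assuming end_time stays NULL while a phase is still in progress:
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=SSISDB;Trusted_Connection=yes;")
cursor = conn.cursor()
cursor.execute("""
    select package_name, task_name, phase, start_time
    from catalog.execution_component_phases
    where execution_id = ? and end_time is null
""", 1841)
for row in cursor.fetchall():
    # Any row returned here is a phase that has started but not yet finished.
    print("%s | %s | %s | %s" % (row.package_name, row.task_name,
                                 row.phase, row.start_time))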
References
catalog.execution_component_phases

Related

Lightning Experience Specialist - Step 6 - Unable to complete

I am stuck on this challenge and not sure why it is not completing. Please have a look at the details below.
Error Message -
Challenge Not yet complete... here's what's wrong: The Fulfillment Cancellation Automation process does not appear to be working properly. Make sure that a cancelled Fulfillment updates the Adventure Package correctly.
My Process builder is as follows:
Object: Fulfillment
Entry Criteria: [Fulfillment__c].Status__c = Cancelled AND [Fulfillment__c].Schedule_Date__c > TODAY()
Immediate Actions:
Based on [Fulfillment__c].Opportunity.OpportunityLineItems
Field Update Filter condition :
Line Item ID equals Formula [FullFillment__c].AdventurePackageId__c
Field to Update :
Sales Price equal to [Fulfillment__c].Deposit__c
I did some searching on the web and changed the things below as well, but it is not working for me.
The Explorer__c field was set to "Required" and "What to do if the lookup record is deleted?" was set to "Don't allow deletion of the lookup record that's part of a lookup relationship.".
I updated the "Required" to false and changed "What to do if the lookup record is deleted?" to "Clear the value of this field. You can't choose this option if you make this field required."
I have also made the Explorer__c field non-required on the layout.
After all the above changes, I am still not able to complete the challenge.
Any help will be really appreciated.
Thanks in advance.
I'm getting this as well, and I think there may well be a bug in their test.
I've manually tested the processes, and it works as described. The Sales Price on the Adventure Package gets updated to the Fulfillment's Deposit amount.
Looking in the debug logs, the query clearly selects 1 record (which is what we'd expect) into a List called fullfillmentList before the code immediately fails an assertion with the message Fulfillment list is empty.
This error shows up because you might have deactivated the previous process, i.e. Fulfillment Creation, which also needs to be active to complete this step of the superbadge.

Asynchronous cursor execution in Snowflake

(Submitting on behalf of a Snowflake user)
At the time of query execution on Snowflake, I need its query ID, so I am using the following code snippet:
cursor.execute(query, _no_results=True)
query_id = cursor.sfqid
cursor.query_result(query_id)
This code snippet works fine for short-running queries, but for a query that takes more than 40-45 seconds to execute, the query_result function fails with KeyError: u'rowtype'.
Stack trace:
File "snowflake/connector/cursor.py", line 631, in query_result
self._init_result_and_meta(data, _use_ijson)
File "snowflake/connector/cursor.py", line 591, in _init_result_and_meta
for column in data[u'rowtype']:
KeyError: u'rowtype'
Why would this error occur? How to solve this problem?
Any recommendations? Thanks!
The Snowflake Python Connector allows for async SQL execution by using cur.execute(sql, _no_results=True).
This "fire and forget" style of SQL execution allows the parent process to continue without waiting for the SQL command to complete (think long-running SQL that may time out).
If this is used, many developers will write code that captures the unique Snowflake Query ID (as you have in your code) and then use that Query ID to "check back on the query status later" in some sort of looping process. When you check back and the query has completed, you can get the results from that query_id using the result_scan() function.
https://docs.snowflake.net/manuals/sql-reference/functions/result_scan.html
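As a rough sketch of that poll-then-result_scan pattern (the connection parameters are placeholders, and get_query_status()/is_still_running() are only available in newer versions of the Python connector):
import time
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="my_password")
cur = conn.cursor()

# Fire-and-forget execution: returns immediately with a query ID.
cur.execute("select count(*) from my_big_table", _no_results=True)
query_id = cur.sfqid

# Poll until the query is no longer running.
while conn.is_still_running(conn.get_query_status(query_id)):
    time.sleep(5)

# Read the finished query's output via RESULT_SCAN.
cur.execute("select * from table(result_scan('%s'))" % query_id)
print(cur.fetchall())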
I hope this helps...Rich

Power BI SFDC Object Connection - Task - Navigation Query Error

I am hoping to get some help with the following error when I try to connect to the Task object in Salesforce. It shows up at the navigation query (Task1 below), and I am unclear about the nature of the error or its resolution.
DataSource.Error: exceeded 100000 distinct who/what's
Details:
List
Query:
let
    Source = Salesforce.Data(),
    Task1 = Source{[Name="Task"]}[Data]
in
    Task1
The task volume in our SFDC instance is excessive. With other objects I have been applying a date filter after the navigation query, but here the error occurs at the navigation query (Task1 above) itself, which precludes me from using a date filter as I had been doing.
Thanks,
Rich
It is a permission problem/limitation. You can find the solution here.

How do I check the number of tasks currently in the queue?

According to the Push queue documentation in GAE, there are a number of task request headers:
X-AppEngine-QueueName, the name of the queue (possibly default)
X-AppEngine-TaskName, the name of the task, or a system-generated unique ID if no name was specified
X-AppEngine-TaskRetryCount, the number of times this task has been retried; for the first attempt, this value is 0. This number includes attempts where the task failed due to a lack of available instances and never reached the execution phase.
X-AppEngine-TaskExecutionCount, the number of times this task has previously failed during the execution phase. This number does not include failures due to a lack of available instances.
X-AppEngine-TaskETA, the target execution time of the task, specified in milliseconds since January 1st 1970.
Is there a way to check how many tasks are already enqueued?
Not from the headers, no. But you can use the QueueStatistics class to query that information.
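A minimal sketch of that approach, using the first-generation GAE Python SDK's taskqueue API (the queue name 'default' is just an example):
from google.appengine.api import taskqueue

stats = taskqueue.Queue('default').fetch_statistics()
# stats.tasks     -> approximate number of tasks currently in the queue
# stats.in_flight -> number of tasks currently being executed
print('queued: %d, in flight: %d' % (stats.tasks, stats.in_flight))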

How to debug long-running mapreduce jobs with millions of writes?

I am using the simple control.start_map() function of the appengine-mapreduce library to start a mapreduce job. This job successfully completes and shows ~43M mapper-calls on the resulting /mapreduce/detail?mapreduce_id=<my_id> page. However, this page makes no mention of the reduce step or any of the underlying appengine-pipeline processes that I believe are still running. Is there some way to return the pipeline ID that this call makes so I can look at the underlying pipelines to help debug this long-running job? I would like to retrieve enough information to pull up this page: /mapreduce/pipeline/status?root=<guid>
Here is an example of the code I am using to start up my mapreduce job originally:
from third_party.mapreduce import control

mapreduce_id = control.start_map(
    name="Backfill",
    handler_spec="mark_tos_accepted",
    reader_spec=(
        "third_party.mapreduce.input_readers.DatastoreInputReader"),
    mapper_parameters={
        "input_reader": {
            "entity_kind": "ModelX"
        },
    },
    shard_count=64,
    queue_name="backfill-mapreduce-queue",
)
Here is the mapping function:
# This is where we keep our copy of appengine-mapreduce
from third_party.mapreduce import operation as op

def mark_tos_accepted(modelx):
    # Skip users who have already been marked
    if (not modelx
            or modelx.tos_accepted == myglobals.LAST_MATERIAL_CHANGE_TO_TOS):
        return
    modelx.tos_accepted = user_models.LAST_MATERIAL_CHANGE_TO_TOS
    yield op.db.Put(modelx)
Here are the relevant portions of the ModelX:
class BackupModel(db.Model):
    backup_timestamp = db.DateTimeProperty(indexed=True, auto_now=True)

class ModelX(BackupModel):
    tos_accepted = db.IntegerProperty(indexed=False, default=0)
For more context, I am trying to debug a problem I am seeing with writes showing up in our data warehouse.
On 3/23/2013, we launched a MapReduce job (let's call it A) over a db.Model (let's call it ModelX) with ~43M entities. 7 hours later, the job "finished" and the /mapreduce/detail page showed that we had successfully mapped over all of the entities, as shown below.
mapper-calls: 43613334 (1747.47/sec avg.)
On 3/31/2013, we launched another MapReduce job (let's call it B) over ModelX. 12 hours later, the job finished with status Success and the /mapreduce/detail page showed that we had successfully mapped over all of the entities, as shown below.
mapper-calls: 43803632 (964.24/sec avg.)
I know that MR job A wrote to all ModelX entities, since we introduced a new property that none of the entities contained before. ModelX contains an auto_now property, like so:
backup_timestamp = ndb.DateTimeProperty(indexed=True, auto_now=True)
Our data warehousing process runs a query over ModelX to find those entities that changed on a certain day and then downloads those entities and stores them in a separate (AWS) database so that we can run analysis over them. An example of this query is:
db.GqlQuery('select * from ModelX where backup_timestamp >= DATETIME(2013, 4, 10, 0, 0, 0) and backup_timestamp < DATETIME(2013, 4, 11, 0, 0, 0) order by backup_timestamp')
I would expect that our data warehouse would have ~43M entities on each of the days that the MR jobs completed, but it is actually more like ~3M, with each subsequent day showing an increase, as shown in this progression:
3/16/13 230751
3/17/13 193316
3/18/13 344114
3/19/13 437790
3/20/13 443850
3/21/13 640560
3/22/13 612143
3/23/13 547817
3/24/13 2317784 // Why isn't this ~43M ?
3/25/13 3701792 // Why didn't this go down to ~500K again?
3/26/13 4166678
3/27/13 3513732
3/28/13 3652571
This makes me think that the op.db.Put() calls issued by the mapreduce job are still running in some pipeline or queue, causing this trickle effect.
Furthermore, if I query for entities with an old backup_timestamp, I can go back pretty far and still get plenty of entities, but I would expect all of these queries to return 0:
In [4]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2013,2,23,1,1,1)').count()
Out[4]: 1000L
In [5]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2013,1,23,1,1,1)').count()
Out[5]: 1000L
In [6]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2012,1,23,1,1,1)').count()
Out[6]: 1000L
However, there is this strange behavior where the query returns entities that it should not:
In [8]: old = ModelX.all().filter('backup_timestamp <', 'DATETIME(2012,1,1,1,1,1)')
In [9]: paste
for o in old[1:100]:
    print o.backup_timestamp
## -- End pasted text --
2013-03-22 22:56:03.877840
2013-03-22 22:56:18.149020
2013-03-22 22:56:19.288400
2013-03-22 22:56:31.412290
2013-03-22 22:58:37.710790
2013-03-22 22:59:14.144200
2013-03-22 22:59:41.396550
2013-03-22 22:59:46.482890
2013-03-22 22:59:46.703210
2013-03-22 22:59:57.525220
2013-03-22 23:00:03.864200
2013-03-22 23:00:18.040840
2013-03-22 23:00:39.636020
Which makes me think that the index is just taking a long time to be updated.
I have also graphed the number of entities that our data warehousing downloads and am noticing some cliff-like drops that makes me think that there is some behind-the-scenes throttling going on somewhere that I cannot see with any of the diagnostic tools exposed on the appengine dashboard. For example, this graph shows a fairly large spike on 3/23, when we started the mapreduce job, but then a dramatic fall shortly thereafter.
This graph shows the count of entities returned by the BackupTimestamp GqlQuery for each 10-minute interval for each day. Note that the purple line shows a huge spike as the MapReduce job spins up, and then a dramatic fall ~1hr later as the throttling kicks in. This graph also shows that there seems to be some time-based throttling going on.
I don't think you'll have any reducer functions there, because all you've done is start a mapper. To do a complete mapreduce, you have to explicitly instantiate a MapReducePipeline and call start on it. As a bonus, that answers your question, as it returns the pipeline ID which you can then use in the status URL.
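As a rough sketch of that MapreducePipeline approach (reusing the question's third_party.mapreduce copy; the reducer_spec here is an illustrative placeholder, not part of the original job):
from third_party.mapreduce import mapreduce_pipeline

pipeline = mapreduce_pipeline.MapreducePipeline(
    "Backfill",
    mapper_spec="mark_tos_accepted",
    reducer_spec="some_module.some_reducer",  # hypothetical reducer
    input_reader_spec=(
        "third_party.mapreduce.input_readers.DatastoreInputReader"),
    mapper_params={"entity_kind": "ModelX"},
    shards=64)
pipeline.start(queue_name="backfill-mapreduce-queue")

# The pipeline ID gives you the status page you asked about:
#   /mapreduce/pipeline/status?root=<pipeline_id>
print(pipeline.pipeline_id)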
Just trying to understand the specific problem. Is it that you are expecting a bigger number of entities in your AWS database? I would suspect that the problem lies with the process that downloads your old ModelX entities into an AWS database, that it's somehow not catching all the updated entities.
Is the AWS-downloading process modifying ModelX in any way? If not, then why would you be surprised to find entities with an old modified timestamp? The modified timestamp would only be updated on writes, not on read operations.
Kind of unrelated - with respect to throttling I've usually found a throttled task queue to be the problem, so maybe check how old your tasks in there are or if your app is being throttled due to a large amount of errors incurred somewhere else.
control.start_map doesn't use a pipeline and has no shuffle/reduce step. When the mapreduce status page shows it's finished, all mapreduce-related taskqueue tasks should have finished. You can examine your queue or even pause it.
I suspect there are problems related to old indexes for the old Model or to eventual consistency. To debug MR, it is useful to filter your warnings/errors log and search by the mr id. To help with your particular case, it might be useful to see your Map handler.
