How to resolve a Salesforce flow error where the flow calls an Apex class

I have a flow which calls an Apex class, and here is the error from the execution logs. How do I resolve this error?
Error Occurred During Flow "ServiceAppointment_API": An Apex error occurred: System.AsyncException:
Warning: Approaching hourly email limit for this flow.
Each flow in your organization is limited to 100 error emails per hour. Since 12:00 PM, Salesforce has sent 99 error emails for ServiceAppointment_API flow. After 100 emails, Salesforce suppresses all error emails for this flow. The limit resets at 1:00 PM.
Error element myWaitEvent_myWait_myRule_3_event_0_SA1 (FlowActionCall).
An Apex error occurred: System.AsyncException: You've exceeded the limit of 100 jobs in the flex queue for org 00D36000000rhGR.
Wait for some of your batch jobs to finish before adding more. To monitor and reorder jobs, use the Apex Flex Queue page in Setup.
Your help is appreciated.
Regards,
Carolyn

The Apex class schedules some asynchronous (background) processing. It could be a batch job, a method annotated with @future, or something implementing Queueable. You can have up to 100 of these jobs submitted to the flex queue and up to 5 active (running) at a given time.
It's hard to say how to fix without seeing the code.
Maybe the @future isn't needed; the developer meant well but something went wrong. Maybe it has to be async but could be done with a time-based workflow or a scheduled action in Process Builder?
Maybe it's legitimately your bug, and the code could be rewritten to run faster or process more than one record at a time.
Maybe it's not your fault; maybe there's a managed package that scheduled lots of jobs and the next ones fail to submit.
Maybe you'll need to consider a rewrite that decouples the processing even further. Say, instead of almost instant processing, have code that runs every 5 minutes, checks whether anything recently changed needs processing, and handles it.

Related

AWS SageMaker Training Job not saving model output

I'm running a training job on SageMaker. The job doesn't fully complete and hits the MaxRuntimeInSeconds stopping condition. When the job is stopping, the documentation says the artifact will still be saved. I've attached the status progression of my training job below. It looks like the training job finished correctly. However, the output S3 folder is empty. Any ideas on what is going wrong here? The training data is located in the same bucket so it should have everything it needs.
From the status progression, it seems the training image download completed at 15:33 UTC, and by that time the stopping condition had already been triggered based on the MaxRuntimeInSeconds parameter you specified. From then, it takes 2 minutes (15:33 to 15:35) to save any available model artifact, but in your case the training process did not happen at all; the only thing that was done was downloading the pre-built image (containing the ML algorithm). Please refer to the following lines from the documentation, which say that saving the model is subject to the state the training process is in. Maybe you can try increasing MaxRuntimeInSeconds and running the job again. Also, check the MaxWaitTimeInSeconds value if you have set one; it must be equal to or greater than MaxRuntimeInSeconds.
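If you submit the job through boto3, a rough sketch of where those limits live is below; the job name, image URI, role ARN, bucket, and instance settings are placeholders rather than values from the question:

import boto3

sm = boto3.client("sagemaker")

# Re-submit the job with a larger MaxRuntimeInSeconds. If you use managed spot
# training, MaxWaitTimeInSeconds must be equal to or greater than MaxRuntimeInSeconds.
sm.create_training_job(
    TrainingJobName="my-training-job-v2",                        # placeholder
    AlgorithmSpecification={
        "TrainingImage": "<algorithm-image-uri>",                 # placeholder
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",       # placeholder
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output/"},  # placeholder
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={
        "MaxRuntimeInSeconds": 7200,
        # "MaxWaitTimeInSeconds": 7200,  # only when managed spot training is enabled
    },
)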
Please find the excerpt from the AWS documentation:
"The training algorithms provided by Amazon SageMaker automatically save the intermediate results of a model training job when possible. This attempt to save artifacts is only a best effort case as model might not be in a state from which it can be saved. For example, if training has just started, the model might not be ready to save."
If MaxRuntimeInSeconds is exceeded then model upload is only best-effort and really depends on whether the algorithm saved any state to /opt/ml/model at all prior to being terminated.
The two-minute wait period between 15:33 and 15:35 in the Stopping stage is the maximum time between the SIGTERM and SIGKILL signals sent to your algorithm (see the SageMaker docs for more detail). If your algorithm traps the SIGTERM, it is supposed to use that as a signal to gracefully save its work and shut down before the SageMaker platform kills it forcibly with a SIGKILL 2 minutes later.
Given that the wait period in the Stopping step is exactly 2 minutes, and that the Uploading step started at 15:35 and completed almost immediately, it's likely that your algorithm did not take advantage of the SIGTERM warning and that nothing was saved to /opt/ml/model. To get a definitive answer as to whether this was indeed the case, please create a SageMaker forum post and the SageMaker team can private-message you to gather details of your job.
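If you control the training code, a minimal sketch of trapping SIGTERM so that partial state is written to /opt/ml/model before the container is force-killed might look like this; the checkpoint helper and file name are illustrative assumptions, not part of any SageMaker API:

import os
import signal
import sys

MODEL_DIR = "/opt/ml/model"  # SageMaker uploads the contents of this directory to S3

def save_checkpoint(path):
    # Placeholder: replace with your framework's own serialization call.
    with open(path, "w") as f:
        f.write("partial model state")

def handle_sigterm(signum, frame):
    # SageMaker sends SIGTERM roughly 2 minutes before SIGKILL when stopping a job,
    # so save whatever exists now and exit cleanly.
    save_checkpoint(os.path.join(MODEL_DIR, "model.ckpt"))
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

# ... training loop goes here; writing periodic checkpoints to MODEL_DIR is also a good idea ...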

NDB query().iter() of 1000<n<1500 entities is wigging out

I have a script that, using the Remote API, iterates through all entities for a few models. Let's say two models, called FooModel with about 200 entities, and BarModel with about 1200 entities. Each has 15 StringProperty fields.
for model in [FooModel, BarModel]:
    print 'Downloading {}'.format(model.__name__)
    new_items_iter = model.query().iter()
    new_items = [i.to_dict() for i in new_items_iter]
    print new_items
When I run this in my console, it hangs for a while after printing 'Downloading BarModel'. It hangs until I hit ctrl+C, at which point it prints the downloaded list of items.
When this is run in a Jenkins job, there's no one to press ctrl+C, so it just runs continuously (last night it ran for 6 hours before something, presumably Jenkins, killed it). Datastore activity logs reveal that the datastore was taking 5.5 API calls per second for the entire 6 hours, racking up a few dollars in GAE usage charges in the meantime.
Why is this happening? What's with the weird behavior of ctrl+C? Why is the iterator not finishing?
This is a known issue currently being tracked on the Google App Engine public issue tracker under Issue 12908. The issue was forwarded to the engineering team and progress on this issue will be discussed on said thread. Should this be affecting you, please star the issue to receive updates.
In short, the issue appears to be with the remote_api script. When querying entities of a given kind, it will hang when fetching 1001 + batch_size entities when the batch_size is specified. This does not happen in production outside of the remote_api.
Possible workarounds
Using the remote_api
One could limit the number of entities fetched per script execution using the limit argument for queries. This may be somewhat tedious but the script could simply be executed repeatedly from another script to essentially have the same effect.
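For example, one way to cap each fetch and resume later is a page-sized fetch with a cursor; this is a variation on the limit-based approach described above, and the function name and page size here are illustrative:

from google.appengine.ext import ndb

def download_all(model, page_size=500):
    # Fetch in bounded pages via cursors so each remote_api round trip stays
    # well below the point where the unbounded iterator hangs.
    results = []
    items, cursor, more = model.query().fetch_page(page_size)
    results.extend(i.to_dict() for i in items)
    while more and cursor:
        items, cursor, more = model.query().fetch_page(page_size, start_cursor=cursor)
        results.extend(i.to_dict() for i in items)
    return results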
Using admin URLs
For repeated operations, it may be worthwhile to build a web UI accessible only to admins. This can be done with the help of the users module as shown here. This is not really practical for a one-time task but far more robust for regular maintenance tasks. As this does not use the remote_api at all, one would not encounter this bug.
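A minimal sketch of such an admin-only handler, assuming webapp2 on the Python 2.7 runtime (the URL, module names, and what the handler does with each model are illustrative):

import webapp2
from google.appengine.api import users
from models import FooModel, BarModel  # hypothetical module holding your ndb models

class ExportHandler(webapp2.RequestHandler):
    def get(self):
        # Restrict access to admins using the users module.
        if not users.is_current_user_admin():
            self.abort(403)
        # Runs inside App Engine itself, so the remote_api bug does not apply.
        for model in [FooModel, BarModel]:
            entities = model.query().fetch()
            self.response.write('%s: %d entities\n' % (model.__name__, len(entities)))

app = webapp2.WSGIApplication([('/admin/export', ExportHandler)])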

Salesforce Schedulable not working

I have several HTTP callouts that are in a Schedulable class set to run every hour or so. After I deployed the app on the AppExchange and had a Salesforce user install it to test, it seems the jobs are not executing.
I can see the jobs are being scheduled to run accordingly however the database never seems to change. Is there any reason this could be happening or is there a good chance the flaw lies in my code?
I was thinking that it could be permissions, however I am not sure (it's the first app I am deploying).
Check whether your end user's organisation has added your endpoint to "Remote Site Settings" in Setup. By endpoint I mean the address that's being called (or just the domain).
If the class is scheduled properly (which I believe would be a manual action, not just something that magically happens after installation... unless you've used a post-install script?), you could also examine Setup -> Apex Jobs and check whether there are any errors. If I'm right, there will be an error about a callout not being allowed due to remote site settings. If not, there's still a chance you'll see something that makes you think. For example, a batch job that executed successfully but with 0 iterations -> problem?
Last but not least - you can always try the debug logs :) Enable them in Setup (or open the Developer Console), fire the scheduled class's execute() manually and observe the results. How to fire it manually? Something like this pasted into "Execute Anonymous":
MySchedulableClass sched = new MySchedulableClass();
sched.execute(null);
Or - since you know what's inside the scheduled class - simply experiment.
Please note that if the updates you're performing somehow violate, for example, validation rules your client has, then yes, the database will be unchanged. But in that case you should still be able to see failures in Setup -> Apex Jobs.

Nagios Core 3 custom notification exception

I'm hoping to avoid a hacked-together mishmash to achieve something. I know it can be done with a mishmash, but let's see if I'm missing a SIMPLE, easy way. This is Nagios Core 3.
I have a service. That service is checked 24x7x365. Notifications are sent 24x7x365, on WARNING and also on CRITICAL.
That is good--that is what I want.
However...now I want one single exception to that notification setup. NOTE: I do not want an exception to the monitoring setup--I want the console to always show the correct status, 24x7. I just want to make one exception for the notification (via email) on this service.
Here is the exception:
IF service state is WARN AND time of day is between 0300 and 0600, do NOT notify.
That's it. If it's CRITICAL, email-notify 24x7 (as it already does). If it's not between 3 and 6 a.m., notify regardless of WARN vs. CRIT (as it already does). The only exception is WARNING and 3-6 a.m.
Background: This is because we have maintenance that occurs every night between 3 and 6, which we've customized to produce a WARNING (not CRITICAL). I want notifications any time outside of this window (an admin may have accidentally launched maintenance in the middle of the day), and I want CRITICAL notifications at any time. I don't want to simply skip CHECKS during that time because I do want the console to be correct (a big bunch of yellows 0300-0600).
So, anyway, seems like I can kludge together a bunch of constructs but does anybody have a simple way to define this one "boolean AND" condition to the notification (only) schedule?
This is what scheduled downtime is for. If you create a scheduled downtime window, alerts will be suppressed during that timeframe.
If that's not an option, then you need two different contacts for this service: one that notifies 24x7 and only on CRITICAL, and another that notifies 24x7 (minus 03:00-06:00) and only receives WARNING notifications. Have them both point to the same contact email address.

receive an alert when job is *not* running

I know how to set up a job to alert when it's running.
But I'm writing a job which is meant to run many times a day, and I don't want to be bombarded by emails, but rather I'd like a solution where I get an alert when the job hasn't been executed for X minutes.
This can be achieved by setting the job to alert on execution, and then setting up some process which checks for these alerts and warns when no such alert has been seen for X minutes.
I'm wondering if anyone's already implemented such a thing (or equivalent).
Supporting multiple jobs with different X values would be great.
The danger of this approach is this: suppose you set this up. One day you receive no emails. What does this mean?
It could mean
the supposed-to-be-running job is running successfully (and silently), and so the absence-of-running monitor job has nothing to say
or alternatively
the supposed-to-be-running job is NOT running successfully, but the absence-of-running monitor job has ALSO failed
or even
your server has caught fire, and can't send any emails even if it wants to
Don't seek to avoid receiving success messages - instead, devise a strategy for coping with them, because the only way to know that a job is running successfully is to get a message which says precisely that.
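If you do still go the absence-detection route the question describes, a minimal sketch of the monitor side might look like this, keeping the failure modes above in mind; the heartbeat file path, threshold, and alerting mechanism are illustrative assumptions:

import os
import sys
import time

HEARTBEAT_FILE = "/var/run/myjob.heartbeat"  # the monitored job touches this on each success
MAX_AGE_MINUTES = 30                         # the "X minutes" threshold

def check_heartbeat():
    try:
        age = time.time() - os.path.getmtime(HEARTBEAT_FILE)
    except OSError:
        return "CRITICAL: heartbeat file missing - the job may never have run"
    if age > MAX_AGE_MINUTES * 60:
        return "WARNING: no successful run for %.0f minutes" % (age / 60)
    return None

if __name__ == "__main__":
    problem = check_heartbeat()
    if problem:
        print(problem)  # or send an email / page here instead
        sys.exit(1)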
