Flink not giving full reason as to why job submission failed

After upgrading Flink to 1.7.2, when I try to submit a job from the dashboard and there's some issue with the job, the job submission fails with the following error.
Exception occurred in REST handler: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
There's no other reason given as to why the job failed to submit. This was not the case in 1.4.2. Is there a way to see the full reason why the job failed to deploy? I see the same error in the logs too with no additional information.

This is a known issue in version 1.7.2. It has been fixed in 1.8.0.
Jira ticket - https://issues.apache.org/jira/browse/FLINK-11902

Related

Why am I getting a client error while running a Data Wrangler processing job in SageMaker?

I am working on Feature Store creation with the help of AWS Data Wrangler, a feature of AWS SageMaker Studio. When I try to run the Data Wrangler job (for ingesting data into the feature store), I encounter the following error:
"ClientError: API error (404): manifest for XXXXXXXXXXXX.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-data-wrangler-container:1.30.2 not found: manifest unknown: Requested image not found"
I suspect some updates were made to SageMaker Studio and that this may be the cause, but I am not sure what exactly the error means or how it can be resolved. The reason I say this is that even a job which ran properly yesterday is failing today. Can anyone please help me with this?

delete_model() error when cleaning up AWS sagemaker

I followed the tutorial on https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/
I got an error when trying to clean up with the following code.
xgb_predictor.delete_endpoint()
xgb_predictor.delete_model()
ClientError: An error occurred (ValidationException) when calling the DescribeEndpointConfig operation: Could not find the endpoint configuration.
Does it mean I need to delete the model first instead?
I checked on the console and deleted the model manually.

No, you don't need to delete the model before deleting the endpoint. From the error logs, it looks like it's not able to find the endpoint configuration. Can you verify whether you are setting delete_endpoint_config to True?
xgb_predictor.delete_endpoint(delete_endpoint_config=True)
Additionally, you can check on the AWS console whether the endpoint configuration is still available.
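If the SDK helpers keep tripping over each other, the same cleanup can be done directly with boto3. This is a minimal sketch under stated assumptions, not the tutorial's method: it assumes a boto3 SageMaker client and that you know the endpoint name, and cleanup_endpoint is a hypothetical helper. It reads the model names out of the endpoint configuration before deleting anything, so no call needs the configuration after it is gone.

```python
def cleanup_endpoint(sm_client, endpoint_name):
    """Delete a SageMaker endpoint, its endpoint configuration and its models.

    sm_client is assumed to be boto3.client("sagemaker"); cleanup_endpoint
    itself is a hypothetical helper, not part of the SageMaker Python SDK.
    """
    # Look everything up first: once the endpoint configuration has been
    # deleted, DescribeEndpointConfig can no longer resolve it.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Then delete from the outside in: endpoint, configuration, models.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return model_names
```

You would call it as cleanup_endpoint(boto3.client("sagemaker"), "my-endpoint"), with your own endpoint name in place of the placeholder. It returns the names of the models it deleted.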

Unable to run a Flink Job on Kubernetes

I am running a Flink Job on Kubernetes and trying to read JSON messages from a Kafka topic as shown below:
var consumer = new FlinkKafkaConsumer("inv-json", new SimpleStringSchema(), properties);
I get the following error, and the job fails:
ERROR org.apache.flink.runtime.webmonitor.handlers.JarRunHandler [] - Exception occurred in REST handler: Could not execute application.
I am not sure how to resolve this (the same code runs fine locally). I googled this error but did not find any solutions. Thanks.

This refers to a Flink v1.12.1-scala.2.12-java11 application deployed on Kubernetes using Lyft's K8s operator. The application reads streaming data from Kafka and processes it. The job never got submitted or ran; it failed with the above error, which is not very helpful in identifying and fixing the issue. I had to try several options, and what resolved the issue was building a fat jar with all the dependencies included and running that on Kubernetes. I later improved on this by switching to a shaded jar.
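One plausible reading of that fix, which the answer does not confirm, is that the thin jar did not carry the Kafka connector, so the job's main method failed on the cluster even though it ran locally with the IDE's classpath. A quick way to check what a jar actually bundles is to open it as a zip archive. This is a sketch: jar_contains is a hypothetical helper, and if your shade configuration relocates the connector packages, the class path inside the jar changes accordingly.

```python
import zipfile

# Class file of the Kafka source used in the question; it ships in the
# flink-connector-kafka artifact, not in the default Flink distribution.
KAFKA_CONSUMER_CLASS = (
    "org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer.class"
)


def jar_contains(jar_path, class_entry=KAFKA_CONSUMER_CLASS):
    """Return True if the jar (a zip archive) bundles the given class file."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()


if __name__ == "__main__":
    import sys

    # Usage: python check_jar.py path/to/job.jar [more.jar ...]
    for path in sys.argv[1:]:
        status = "bundles" if jar_contains(path) else "is MISSING"
        print("{} {} FlinkKafkaConsumer".format(path, status))
```

Running it against both the thin jar and the fat or shaded jar should show whether the connector classes are the difference between them.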

How to Stop "Salesforce Error" Emails Reporting Batch Apex errors

We regularly receive automated emails from Salesforce about Batch Apex errors, but are having trouble 1) disabling the error emails and 2) tracking down the issues. Is there a way to disable these error messages (many people in the organization receive them regularly, and I'd rather that just one user, if any, receive them)? Is there a way to see specifically which Apex script is triggering these errors, or to get any more information about the error?
Here is an example error message:
Organization: Organization Name(0000000000000000)
User: email#gmail.com(0000000000000000)
Salesforce reported the below errors as NPSP was attempting to execute its batch jobs, or at a time when it was unable to display error messages directly to a user. It’s likely that NPSP was attempting to update summary fields on Accounts and Contacts, but was unable to save certain records. This failure might have been caused by a variety of issues unrelated to NPSP, such as custom code or validation rules.
Read this article on the Power of Us Hub to learn how these Scheduled Jobs work: https://powerofus.force.com/NPSP_Scheduled_Jobs
If you’re not sure how to resolve these errors, post a message in the Nonprofit Success Pack group in the Power of Us Hub: https://powerofus.force.com/HUB_NPSP_Group
Errors:
----------
Error #1:
Error Type: Batch Apex error
Error Date: 2017-09-11 04:00:25
Message: "First error: Update failed. First exception on row 0 with id 003i000001ILolWAAT; first error: FIELD_CUSTOM_VALIDATION_EXCEPTION, Please enter a Mailing Country.: []"
Context: npsp__RLLP_OppSoftCreditRollup_BATCH
Stack Trace:
null

This is not a standard email sent by the platform. It is a custom email sent by NPSP (the Nonprofit Success Pack), which you must have installed.
If you don't need these batch processes to work, you can stop them from running by going to Setup | Scheduled Jobs.
If you do need them to run, go to the record indicated in the error (Contact 003i000001ILolWAAT) and populate the Mailing Country. Alternatively, you could turn off the validation rule that requires Mailing Country, or update it so that it does not apply to your user, which lets the batch process get past it when it runs.
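The email itself already names the failing batch class (the Context line) and the failing record (the id inside Message). If many of these emails pile up, a small parser can pull those fields out for triage. This is a sketch that assumes every error block follows the layout of the example above; parse_npsp_errors is a hypothetical helper, and other NPSP versions may word the email differently.

```python
import re

# One "Error #N:" block, using the field labels from the example email.
ERROR_BLOCK = re.compile(
    r"Error #\d+:\s*"
    r"Error Type: (?P<error_type>.+?)\s*\n"
    r"Error Date: (?P<date>.+?)\s*\n"
    r"Message: (?P<message>.+?)\s*\n"
    r"Context: (?P<context>\S+)"
)
# Salesforce record Ids are 15 or 18 alphanumeric characters.
RECORD_ID = re.compile(r"with id ([A-Za-z0-9]{15,18})\b")


def parse_npsp_errors(email_body):
    """Return one dict per error block: type, date, batch class and record Ids."""
    errors = []
    for match in ERROR_BLOCK.finditer(email_body):
        errors.append({
            "error_type": match.group("error_type"),
            "date": match.group("date"),
            "context": match.group("context"),  # the NPSP batch class that failed
            "record_ids": RECORD_ID.findall(match.group("message")),
        })
    return errors
```

For the example email, this yields a single entry whose context is npsp__RLLP_OppSoftCreditRollup_BATCH and whose record_ids list contains 003i000001ILolWAAT.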

Google App Engine down: Server Error

I built a system running at http://www.hijgoo.com.tw, and all of a sudden the following message appeared, and it persists. I am using GAE SDK 1.7.2 with Python 2.7. Can anyone help?
Error: Server Error
The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention this error message and the query that caused it.

It looks like you have a session that does not contain the key 'account', but your code assumes that it does.
The reason for this isn't easy to see without the actual code, but it sounds like a programming error.
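A minimal sketch of the defensive pattern, assuming the session object is dict-like: the original code and session library are not shown, so render_dashboard and the /login URL are hypothetical. Indexing with session["account"] raises KeyError when the key is missing, and an uncaught exception typically surfaces as the generic 500 page quoted in the question; reading the key with .get lets the handler decide what to do instead.

```python
def render_dashboard(session):
    """Hypothetical handler body; 'session' is any dict-like session object."""
    # session["account"] would raise KeyError here if the key is absent.
    account = session.get("account")
    if account is None:
        # Session expired or the user never logged in: redirect instead of crashing.
        return 302, "/login"
    return 200, "Welcome, %s" % account
```

The same check works under Python 2.7, which the question uses.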