Unable to run a Flink Job on Kubernetes - apache-flink

I am running a Flink Job on Kubernetes and trying to read JSON messages from a Kafka topic as shown below:
var consumer = new FlinkKafkaConsumer<String>("inv-json", new SimpleStringSchema(), properties);
I get the following error and the job fails:
ERROR org.apache.flink.runtime.webmonitor.handlers.JarRunHandler [] - Exception occurred in REST handler: Could not execute application.
I am not sure how to resolve this (the same code runs fine locally). I searched for this error but did not find any solutions. Thanks.

This refers to a Flink v1.12.1-scala.2.12-java11 application deployed on Kubernetes using Lyft's K8S operator. The application reads streaming data from Kafka and processes it. The job was never submitted and failed with the error above, which is not much help in identifying the cause. I tried several options, including building a fat jar with all the dependencies included and running that on Kubernetes, and that resolved the issue. I later improved on this by switching to a shaded jar.
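Since bundling the dependencies is what fixed the job, one quick check is whether the connector classes are actually inside the jar that gets submitted. The following is a minimal sketch, assuming the job uses FlinkKafkaConsumer from the Kafka connector; the helper name jar_contains and any jar path passed to it are hypothetical.

```python
import zipfile

# Class entry for the connector used in the question. A jar is a zip archive,
# so a missing entry means the class must come from somewhere else at runtime.
KAFKA_CONSUMER_CLASS = (
    "org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer.class"
)


def jar_contains(jar_path, class_entry=KAFKA_CONSUMER_CLASS):
    """Return True if the jar at jar_path bundles the given class entry."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry in jar.namelist()
```

Calling jar_contains on the artifact built for the cluster and getting False would suggest that the connector is only on the local classpath, which would fit the job running locally but failing on Kubernetes.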

Related

Why am I getting a client error while running a Data Wrangler processing job in SageMaker?

I am working on Feature Store creation with AWS Data Wrangler, a feature of AWS SageMaker Studio. When I try to run the Data Wrangler job (to ingest data into the feature store), I encounter the following error.
"ClientError: API error (404): manifest for XXXXXXXXXXXX.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-data-wrangler-container:1.30.2 not found: manifest unknown: Requested image not found"
I guess some updates were made to SageMaker Studio and this may be the cause, but I am not sure what exactly the error means or how it can be resolved. I say this because even a job that ran properly yesterday is failing today. Can anyone please help me with this?

Query on automating Flink Job submission

I am trying to use the Flink REST API to automate the Flink job submission process via a pipeline. To call any Flink REST endpoint, we need to know the JobManager web interface IP. For my POC, I got the IP after running the flink-yarn-session command on the CLI, but how can I get it from code?
For automation, I am planning to call the following REST APIs in sequence:
requests.get('http://ip-10-0-127-59.ec2.internal:8081/jobs/overview')  # Get running job ID
requests.post('http://ip-10-0-127-59.ec2.internal:8081/jobs/:jobId/savepoints')  # Cancel job with savepoint
requests.get('http://ip-10-0-127-59.ec2.internal:8081/jobs/:jobId/savepoints/:savepointId')  # Get savepoint status
requests.post('http://ip-10-0-127-59.ec2.internal:8081/jars/upload')  # Upload jar for new job
requests.post('http://ip-10-0-127-59.ec2.internal:8081/jars/de05ced9-03b7-4f8a-bff9-4d26542c853f_ATVPlaybackStateMachineFlinkJob-1.0-super-2.3.3.jar/run')  # Submit new job
requests.get('http://ip-10-0-116-99.ec2.internal:35497/jobs/:jobId')  # Get status of new job
If you have the flexibility to run on Kubernetes instead of YARN (it looks like you are on AWS from your hostnames, so you could use EKS), then I would recommend the official Flink Kubernetes Operator, which the community built for exactly this purpose.
If YARN is a given for your use case, you can follow the code that Flink itself uses to talk to the YARN ResourceManager in the flink-yarn package, especially the following:
https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L384
https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManagerDriver.java#L258
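The call sequence from the question can be sketched in Python roughly as follows. This is a minimal sketch using only the standard library, not a complete client: the function names and the send parameter are assumptions introduced for illustration. It assumes the savepoint trigger accepts target-directory and cancel-job in its JSON body, and that the YARN ResourceManager REST API lists the Flink session with an applicationType and a trackingUrl; check both against the versions you run.

```python
import json
import urllib.request


def http_json(method, url, payload=None):
    """Send an optional JSON body and decode the JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def flink_tracking_urls(resource_manager_url, send=http_json):
    """Map running YARN application IDs to tracking URLs for Flink apps.

    Assumption: Flink applications can be recognised by "flink" appearing
    in the applicationType field reported by the ResourceManager.
    """
    reply = send("GET", f"{resource_manager_url}/ws/v1/cluster/apps?states=RUNNING")
    apps = (reply.get("apps") or {}).get("app", [])
    return {
        app["id"]: app["trackingUrl"]
        for app in apps
        if "flink" in app.get("applicationType", "").lower()
    }


def running_job_ids(base_url, send=http_json):
    """IDs of the jobs that /jobs/overview reports as RUNNING."""
    overview = send("GET", f"{base_url}/jobs/overview")
    return [job["jid"] for job in overview["jobs"] if job["state"] == "RUNNING"]


def cancel_with_savepoint(base_url, job_id, target_directory, send=http_json):
    """Trigger a savepoint that also cancels the job; returns the trigger ID."""
    body = {"target-directory": target_directory, "cancel-job": True}
    reply = send("POST", f"{base_url}/jobs/{job_id}/savepoints", body)
    return reply["request-id"]


def savepoint_status(base_url, job_id, trigger_id, send=http_json):
    """Status ID of the asynchronous savepoint operation, e.g. COMPLETED."""
    reply = send("GET", f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}")
    return reply["status"]["id"]
```

Because each function takes send as a parameter, the sequence can be exercised against canned responses before it is pointed at a real cluster. Jar upload is left out here because it needs a multipart request rather than a JSON body.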

Azure logic app - Transform to XML - MapNotReady

I'm trying to translate an X12 EDI message using a map created in VS2015, but I get the following error:
MapNotReady. The map '' is still being processed. Please try again later.
Running the input in VS2015 gives the correct result, but running it in Azure Logic Apps does not.
I resolved this issue by creating a new Integration Account in a new Resource Group and a different Location.
This looks like a bug in Azure; I will log a call with MS.
I faced the same issue after deploying a logic app using an ARM template.
What was I doing?
In the deployment PowerShell script, I was creating the integration account and adding schemas and maps.
Then I deployed the logic app using the ARM template.
Immediately after deployment, I tried to execute the logic app, and at that point I received the MapNotReady exception in the transform action.
However, when I retried the message 10 minutes later, the problem was gone. It looks like the map service was not fully deployed.
So there is no need to deploy to a different resource group. Just wait a few minutes before executing the logic app.
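If the first run is triggered from a deployment script, the waiting can be automated with a retry loop. The following is a minimal sketch in Python, not an Azure API: call_with_retry, is_map_not_ready, and the two-minute delay are assumptions for illustration, and action stands for whatever function your script uses to trigger the logic app.

```python
import time


def call_with_retry(action, is_transient, attempts=5, delay_seconds=120.0,
                    sleep=time.sleep):
    """Call action(); on a transient failure wait and retry.

    At most `attempts` calls are made. Non-transient errors, and the error
    from the final attempt, are re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            if attempt == attempts or not is_transient(error):
                raise
            sleep(delay_seconds)


def is_map_not_ready(error):
    # Assumption: the failure text contains the error code from the question.
    return "MapNotReady" in str(error)
```

With the defaults, a map that becomes ready within about eight minutes of deployment would be picked up without manual resubmission; any other error still fails immediately.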

Carrot2 dcs webapp setup

I have been struggling to set up Carrot2 for use with PHP on a local machine. The plan is to have Carrot2 retrieve clusters from Solr populated by Nutch. Solr and Nutch are currently configured correctly, and I have been able to access the information via Carrot2 Workbench. Carrot2-dcs-3.10.0 has been deployed through the tomcat6 manager, correctly as far as I can tell, although the documentation on setting this up is horribly vague and incomplete. Changes to source-solr-attributes.xml were made according to https://sites.google.com/site/profileswapnilkulkarni/tech-talk/howtoconfigureandruncarrot2webapplicationwithsolrdocumentsource . Tomcat is set up on port 8080. The Carrot2 DCS PHP example, example.php, works and displays the test output correctly. However, when I try to perform clustering using localIPAddress:8080/carrot2-dcs/index.html, I run into a problem. When I set the document source to Solr and the query to : and then click Cluster, I get the following error message.
HTTP Status 500 - Could not perform processing: org.apache.http.conn.HttpHostConnectException: Connection to localhost:8983 refused
type Status report
message Could not perform processing: org.apache.http.conn.HttpHostConnectException: Connection to localhost:8983 refused
description The server encountered an internal error that prevented it from fulfilling this request.
I have searched everywhere in the deployed webapp folder for carrot2 and can't find where it is getting localhost:8983 from.
Any assistance would be appreciated, thank you.
It turns out that the source-solr-attributes.xml file had an extra overridden-attributes. One was before the default block comment with the example parameters, and the second was added by me with the parameters needed for my config. Deleting one of the lines so that only one remained corrected the problem. Apparently, with two of them it ignores the server settings and uses default values instead.

'Version is not ready' error on update - GAE Python

I am unable to update my frontends or my backends. I get the error message 'Version is not ready'. This bug has persisted for close to 24 hours now. I have a task perpetually running in a queue. My best guess is that this task is blocking the update. I am unable to delete the task because it is perpetually running, and I cannot delete the queue because I am unable to upload a new queue.yaml definition. The same task previously failed with a maximum recursion error because I had a synchronous RPC within an asynchronous tasklet.
I'm pretty sure the fix will require someone on the GAE side to forcibly reset the task queue. This question would therefore be better directed to the GAE team, with details about my app, in a less public forum. From what I can see, though, they do not allow direct support questions and suggest posting the question here. My follow-up question, then, is: when you have a GAE issue that requires action from the GAE team, how do you get hold of them (other than paying US$500/month for a premium support account)?
EDIT:
The task is/was meant to be running on a backend instance. I intended to shut down all backend and frontend instances via the console, assuming that they would cancel the task and restart themselves. But I found that only one frontend instance was running and no backends. After I shut down that frontend instance, the dashboard reported 0 instances running, yet the website is still serving and the task remains perpetually running.
EDIT:
Disabling the app stopped the task from running. After reenabling the app, I was able to update it. Though I am left with a ghost task in my queue.
If you have a stuck task queue job, I'd try disabling the queue and killing the instance running that job. If that doesn't work, I'd try disabling the app temporarily.

Resources