Zeppelin: Showing wrong job running time - apache-zeppelin

I executed a Spark job multiple times from Zeppelin, and its progress is
shown in the Zeppelin JobManager. However, it displays the wrong time for
when the job was executed: I ran it recently, but it shows "20 hours ago".
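
No answer yet, but one first check worth making (an assumption on my part, not a confirmed cause) is whether the Zeppelin server's clock or timezone disagrees with your browser's, since the JobManager page renders timestamps produced on the server. A minimal sketch on a Linux host running the Zeppelin daemon:

date                # server wall-clock time and zone
timedatectl status  # timezone configuration on systemd-based systems

A server clock set to the wrong zone can make a just-finished run display as many hours old.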

Related

Flink - Why does Flink throw an error when I submit a job via 'flink run' after running Flink stand-alone for more than 1 month?

I'm running Flink in stand-alone mode on one host (JobManager and TaskManager on the same machine). At first I was able to submit and cancel jobs normally; the jobs showed up in the web UI and ran.
However, after about a month, when I canceled the old job and submitted a new one, I got org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
At that point I was still able to run flink list to list current jobs and flink cancel to cancel the job, but flink run failed: the exception was thrown and the job was not shown in the web UI.
When I tried to stop the stand-alone cluster using stop-cluster, it said no cluster was found. I then had to find the PIDs of the Flink processes and stop them manually. After that, running start-cluster to create a new stand-alone cluster let me submit jobs normally again.
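
In shell form, the manual restart described above is roughly the following (a sketch; the grep pattern is an assumption about how the Flink JVMs appear in jps output, and <pid> is a placeholder):

jps -l | grep org.apache.flink   # find the JobManager/TaskManager PIDs
kill <pid>                       # stop each Flink process found above
bin/start-cluster.sh             # bring up a fresh stand-alone cluster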
The shortened stack trace (the full stack trace is at a Google Docs link):
org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result. (JobID: 7ef1cbddb744cd5769297f4059f7c531)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob (RestClusterClient.java:261)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.ConnectionClosedException: Channel became inactive.
Caused by: org.apache.flink.runtime.rest.ConnectionClosedException: Channel became inactive.
... 37 more
The error is consistent: it always happens after I let Flink run for a while, usually more than 1 month. Why am I no longer able to submit jobs to Flink after a while? What happened here?
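
A hedged set of diagnostics to run before the next restart (pure assumption: long-lived JVMs commonly degrade by exhausting file descriptors or temp-directory space, which could also explain stop-cluster losing track of the processes):

ulimit -n                             # file-descriptor limit for new processes
ls /proc/<jobmanager-pid>/fd | wc -l  # descriptors held by the running JobManager
df -h /tmp                            # default blob/temp directory on many setups

If either number is near its limit when the failure occurs, that would point at resource exhaustion as the trigger.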

Jenkins plugin monitoring for stuck jobs

Does anybody know if there is a Jenkins plugin that detects stuck jobs and, e.g., sends an email?
I know that I can abort the build after n minutes (via the Binding configuration). My actual problem is that I have to check 60 projects and reconfigure each of them ...
Thx
The Build-timeout Plugin gives you the option to stop builds that match specific criteria, for example if a build takes more than T minutes, or three times longer than the average successful build.
Is that what you meant?
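
On the "60 projects" part: the reconfiguration can be scripted instead of clicked through. A minimal sketch using the standard Jenkins CLI jar (the server URL is a placeholder, job names are assumed to contain no spaces, and the timeout settings would have to be patched into each fetched config.xml in the marked step):

for job in $(java -jar jenkins-cli.jar -s http://jenkins:8080/ list-jobs); do
  java -jar jenkins-cli.jar -s http://jenkins:8080/ get-job "$job" > "$job.xml"
  # edit "$job.xml" here to add the Build-timeout configuration, then:
  java -jar jenkins-cli.jar -s http://jenkins:8080/ update-job "$job" < "$job.xml"
done

list-jobs, get-job, and update-job are standard jenkins-cli commands; authentication flags are omitted for brevity.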

How to get Flink started in FLIP6 mode

We are using Flink-1.4 on a cluster of 3 machines.
We started the JobManager on one machine with the following command
bin/jobmanager.sh start flip6
Next, we started the TaskManager on two machines with the following command:
bin/taskmanager.sh start flip6
However, the Flink Dashboard web UI does not come up, and we do not see any errors in the logs.
Is there something we are missing, maybe in the config file?
Here is the log for the JobManager:
https://gist.github.com/jamesisaactm/72cda2bb286d3a3e20f91e64138941b6
For 1.4, the FLIP-6 mode is still a work in progress and is missing major parts, like the web UI.
You will have to wait for 1.5 to use the FLIP-6 mode.
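
For reference once you are on 1.5: FLIP-6 became the default there, so no special argument should be needed. A sketch (the config switch is only for opting back out):

bin/start-cluster.sh   # starts JobManager and TaskManagers in FLIP-6 mode
# to fall back to the pre-FLIP-6 code path, set this in conf/flink-conf.yaml:
# mode: legacy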

Google app engine jobs in datastore admin freeze

I tried to delete one kind of entity at once from the GAE Datastore Admin page. The problem is, I fired two jobs for deleting the same kind. After one job finished successfully, the other just froze, preventing other jobs from running.
The job description is:
Job #158264924762856ED17CF
Overview
Running
Elapsed time: 00:00:00
Start time: Tue Nov 20 2012 09:58:27 GMT+0800
entity_kind: "CacheObj"
Counters
How can I clear these jobs? Deleting them from the task queue doesn't help much; they still show up on the Datastore Admin page.
I faced the same problem, but the frozen job didn't prevent new jobs from being executed. Still, a frozen job is misleading. My workaround:
Go to the Datastore Viewer
Select _AE_DatastoreAdmin_Operation as kind
Find the frozen job
Delete it
You might get an error saying that the app failed
Go back to the Datastore Admin, and check that the job is no longer there

App Engine upload issues.

I'm no longer able to upload code to my App Engine app. The symptoms I'm experiencing are:
the code compiles and the files are uploaded, but the update fails during "Verifying availability". Here's the specific output from the console:
Deploying new version.
Verifying availability:
Will check again in 1 seconds.
Will check again in 2 seconds.
Will check again in 4 seconds.
Will check again in 8 seconds.
Will check again in 16 seconds.
Will check again in 32 seconds.
Will check again in 60 seconds.
.
.
.
Will check again in 60 seconds.
Will check again in 60 seconds.
Will check again in 60 seconds.
on backend null.
java.lang.RuntimeException: Version not ready.
When I look in the Admin Console, it indicates that the new version uploaded; however, the old code is still running. I tried using a different machine to upload a Hello World example, to isolate my main development machine and project, and got the same result with the other machine/app. I also tried to upload the app from the command line, with the same result. I ran appcfg.sh rollback (I'm on a Mac); it completes without error, but the problem persists.
I don't recall a specific change to my App Engine configuration immediately before the problem began, but I had changed the version to 0.1 within a couple of hours of my problems starting. App Engine rejected the version number because of the period in it, and it took a long time to figure this out because the feedback was cryptic. I mention this because (a) the feedback was cryptic and (b) the error specifically mentions "Version not ready".
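
If the bad version id might still be in play, redeploying with a period-free id is worth ruling out first: App Engine version ids allow lowercase letters, digits, and hyphens, so 0-1 is legal where 0.1 is not. A sketch for the Java SDK (paths are illustrative):

# after setting <version>0-1</version> in war/WEB-INF/appengine-web.xml:
./appengine-java-sdk/bin/appcfg.sh update war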
Can anyone help me figure out what I did and how to undo it? If it helps, my app ID is milkmooseexperimental
Thank you.
It's a server-side issue. I'm experiencing the same problem and have been unable to deploy for the last 12 hours. I'm not sure whether this has anything to do with the scheduled outage window.