I am attempting to run the SocketWindowWordCount example from the tutorial on the Flink site.
I started the flink cluster, then ran a local socket server:
nc -l 9000
After compiling the example source taken from GitHub, I run the job:
flink run target/SocketWindowWordCount.jar --port 9000
I then type some words into the terminal running nc. Nothing reaches the expected output, and the log has this error repeating:
2019-07-09 15:54:32,673 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job f9b3c58ca3026855fd2612e3c86551fa not found
2019-07-09 15:54:35,673 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job f9b3c58ca3026855fd2612e3c86551fa not found
2019-07-09 15:54:38,673 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job f9b3c58ca3026855fd2612e3c86551fa not found
2019-07-09 15:54:39,769 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception occurred in REST handler: Job f9b3c58ca3026855fd2612e3c86551fa not found
This usually happens when you have a Flink UI tab left open in your browser from a previous job deployment.
The UI is then polling a URL containing http://.../f9b3c58ca3026855fd2612e3c86551fa, a JobID that no longer exists, which produces the repeating errors above.
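To confirm whether the job itself is actually running, you can list the jobs known to the cluster from the CLI (a standard Flink command, offered here as a suggestion rather than something from the original post):

flink list

Also note that this example writes its word counts to the TaskManager's .out file rather than to the client console, so assuming a default local setup you can watch the output with:

tail -f log/flink-*-taskexecutor-*.out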
I just started to learn Solr with the official documentation, and during the first exercise, "Index Techproducts Example Data", it failed with the following error: "Failed to create collection 'techproducts' due to: Underlying core creation failed while creating collection: techproducts".
I tried changing the Java version from 13 to 8, but it didn't help.
Here is a link to the documentation: https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html#exercise-1
Stack trace from the Solr Admin console:
Collection: techproducts operation: create failed:org.apache.solr.common.SolrException: Underlying core creation failed while creating collection: techproducts
at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:304)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:504)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I ran into a similar situation while following Solr's official tutorial:
➜ solr-8.7.0 ERROR: Failed to create collection 'techproducts' due to: Underlying core creation failed while creating collection: techproducts
The problem was solved by turning off my VPN. I guess the VPN routing somehow interfered with Solr's localhost setup.
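If turning the VPN off is not an option, one workaround that may help (my suggestion, not something from the original post) is pinning the hostname Solr advertises, via the SOLR_HOST setting in its include script:

# in bin/solr.in.sh (bin/solr.in.cmd on Windows): force Solr to advertise localhost
SOLR_HOST="localhost"

This keeps the example nodes talking to each other over the loopback interface instead of a VPN-assigned address.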
I had the same "Underlying core creation failed..." error too, using Java 11 on Windows 10.
The log file was ${solr-home}\example\cloud\node1\logs\solr.log. Inside, it had:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.16:7574/solr: Error CREATEing SolrCore 'techproducts_shard1_replica_n1': Unable to create core [techproducts_shard1_replica_n1] Caused by: no segments* file found in LockValidatingDirectoryWrapper(NRTCachingDirectory(MMapDirectory#{solr_home}\example\cloud\node2\solr\techproducts_shard1_replica_n1\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory#16326253; maxCacheMB=48.0 maxMergeSizeMB=4.0)): files: [write.lock] at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) ~[?:?]
at (etc. etc.)
But this was the second time I had launched Solr. The first time, it timed out trying to contact one of the nodes and the tutorial script aborted, but the nodes were still running. I killed them off using the Windows Task Manager rather than with solr stop, so I suspect I left an unstable mess behind, and the second run of the tutorial crashed into that mess.
I erased everything and started over from unzipping, and this third time there were no timeouts and the tutorial completed without error.
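For anyone in the same state, a cleanup sequence along these lines (a sketch using the standard Solr CLI; these exact commands are not from the original post) avoids leaving a stale write.lock behind:

# stop every example node cleanly instead of killing the JVMs
bin/solr stop -all
# remove the half-created cloud example so no stale lock files remain
rm -rf example/cloud
# rerun the tutorial from a clean slate
bin/solr start -e cloud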
File: /opt/solr/server/etc/jetty.xml
(1) In the Set name="requestHeaderSize" element, set the Property solr.jetty.request.header.size default to 81920.
(2) In the Set name="responseHeaderSize" element, set the Property solr.jetty.response.header.size default to 81920.
(3) Restart Solr
Hm, tried this, still getting the exact same error.
After Change:
<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="81920" /></Set>
<Set name="responseHeaderSize"><Property name="solr.jetty.response.header.size" default="81920" /></Set>
I stopped everything and retried. This time Windows Firewall prompted me to authorize 'SAP Machine' (the Java 11 runtime); I accepted it, retried, and then it worked. It seems to be Windows Firewall related.
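If the Firewall prompt never appears, an equivalent rule can be added by hand. This is a generic Windows Firewall command, and the java.exe path is a placeholder you would need to adjust for your own JDK install:

netsh advfirewall firewall add rule name="Solr Java" dir=in action=allow program="C:\path\to\sapmachine-11\bin\java.exe"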
I am getting the following Flink job submission error:
[centos1 flink-1.10.0]$ ./bin/flink run -m 10.0.2.4:8081 ./examples/batch/WordCount.jar --input file:///storage/flink-1.10.0/test.txt --output file:///storage/flink-1.10.0/wordcount_out
Job has been submitted with JobID 33d489aee848401e08c425b053c854f9
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.runtime.rest.util.RestClientException: [org.apache.flink.runtime.rest.handler.RestHandlerException: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (33d489aee848401e08c425b053c854f9)
....
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (33d489aee848401e08c425b053c854f9)
Caused by: org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (33d489aee848401e08c425b053c854f9)
at org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGatewayFuture(Dispatcher.java:776)
at org.apache.flink.runtime.dispatcher.Dispatcher.requestJobStatus(Dispatcher.java:505)
... 27 more
]
Logs from the TaskManager nodes say the file was not found. Is this the correct way of pointing to files in a Flink cluster setup?
2020-03-19 13:15:29,843 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN DataSource (at main(WordCount.java:69) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:84)) -> Combine (SUM(1), at main(WordCount.java:87) (1/2)
java.io.IOException: Error opening the Input Split file:/storage/flink-1.10.0/test.txt [0,19]: /storage/flink-1.10.0/test.txt (No such file or directory)
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
How do I troubleshoot the above error, and what should I check? There are very few clues in the Flink logs.
This happens because you are submitting a job to a distributed cluster, and the location you specified is probably accessible only to the JobManager, or to the machine from which you submitted the job. The actual program and job execution, however, take place on the TaskManagers. A better approach is to specify a location that is accessible by all the nodes, such as HDFS or NFS.
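For example, assuming the input has already been copied into HDFS (the paths below are illustrative, not from the original post), the same job could be submitted as:

./bin/flink run -m 10.0.2.4:8081 ./examples/batch/WordCount.jar --input hdfs:///data/test.txt --output hdfs:///data/wordcount_out

Without HDFS, copying the file to the same local path on every TaskManager node also works, e.g. for each worker host:

scp /storage/flink-1.10.0/test.txt <worker-host>:/storage/flink-1.10.0/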
I am trying to run the QuickStart2_ResourceOwnerPasswords sample. I could view http://localhost:5000/.well-known/openid-configuration without issues. However, when I run Client.dll from the console, it gives me the following error:
Unhandled Exception: System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.WinHttpException: A connection with the server could not be established
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Threading.Tasks.RendezvousAwaitable`1.GetResult()
at System.Net.Http.WinHttpHandler.<StartRequest>d__105.MoveNext()
Any help is appreciated!
Note: I am running QuickStartIdentityServer as the startup project (this runs the IdentityServer implementation in the background, with a command window showing logs). I am currently running Client.dll from a separate window using the command:
dotnet Client.dll
Not sure if this is the correct way of testing it...
I am running Neo4j Desktop version 1.0.15. Trying to start the DB causes the start to fail:
Database failed to start:
DB [database-f8950fdd-6b5f-4fea-8c9f-e8457ee1da9a] 'v3.3.1' exited
with status 'KILLED'. Check the logs
Major Log parts are below
2018-02-26 23:03:38.004+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#6411d3c8' was successfully initialized, but failed to start. Please see the attached cause exception "Connection timed out: connect". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#6411d3c8' was successfully initialized, but failed to start. Please see the attached cause exception "Connection timed out: connect".
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, C:\Users\kiril\AppData\Roaming\Neo4j Desktop\Application\neo4jDatabases\database-f8950fdd-6b5f-4fea-8c9f-e8457ee1da9a\installation-3.3.1\data\databases\graph.db
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.backup.OnlineBackupKernelExtension#c6e0f32' was successfully initialized, but failed to start. Please see the attached cause exception "Connection timed out: connect".
Suppressed: org.neo4j.kernel.lifecycle.LifecycleException: Exception during graceful attempt to stop partially started component. Please use non suppressed exception to see original component failure.
Caused by: java.io.IOException: Unable to establish loopback connection
It might be that your Neo4j was still running.
I solved this problem by shutting down the database and restarting it.
Platform: Windows 10;
Neo4j Desktop version: 1.1.13
Open a command-line window and go to the directory of your Neo4j database.
run
bin\neo4j status
to check the status of your database.
If it is running, run
bin\neo4j stop
to shut down your database.
Then go back to Neo4j Desktop, click the start button, and the database should start without the error.
I had the same problem, and the issue was resolved when I cloned the graph I was connecting to. Thereafter, I could connect to the new graph without any problem.
I'm getting the below error in my production environment:
DEBUG org.apache.camel.processor.DefaultErrorHandler - Failed delivery for exchangeId: ID-*-56874-1372457272212-0-1. On delivery attempt: 0 caught: org.apache.camel.CamelExecutionException: Exception occurred during execution on the exchange: Exchange
DEBUG org.apache.camel.processor.Pipeline - Message exchange has failed: so breaking out of pipeline for exchange: Exchange[***] Handled by the error handler.
The same build with the same dependencies works in 3 of my other environments on Linux boxes. We are using camel-core-2.8.0-fuse and Java 1.6.
There were two JARs containing the beans, and the classpath was picking up the wrong one. After removing the old JAR, the error was rectified. Not sure why Camel was throwing a CamelExecutionException for a JAR issue :(
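If you hit the same symptom, one way to spot duplicate bean JARs (a generic check assuming a Maven build; it is not from the original post) is to inspect the resolved dependency tree and look for two artifacts supplying the same classes:

mvn dependency:tree -Dverbose | grep -i beans

The -Dverbose flag also prints the dependencies Maven omitted due to version conflicts, which is often where the stale duplicate is hiding.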