Flow has failed with error Shutting down because of violation of the Reactive Streams specification - akka-stream

It seems I can never get the error handling right when using Akka Streams.
Here is my code:
var db = Database.forConfig("oracle")
var mysqlDb = Database.forConfig("mysql_read")
var mysqlDbWrite = Database.forConfig("mysql_write")

implicit val actorSystem = ActorSystem()

val decider: Supervision.Decider = {
  case _: Exception =>
    println("got an exception restarting connections")
    // let us restart our connections
    db.close()
    mysqlDb.close()
    mysqlDbWrite.close()
    db = Database.forConfig("oracle")
    mysqlDb = Database.forConfig("mysql_read")
    mysqlDbWrite = Database.forConfig("mysql_write")
    Supervision.Restart
}

implicit val materializer = ActorMaterializer(ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))
and I have a flow like this:
val alreadyExistsFilter: Flow[Foo, Foo, NotUsed] = Flow[Foo].mapAsync(10) { foo =>
  try {
    val existsQuery = sql"""SELECT id FROM foo WHERE id = ${foo.id}""".as[Long]
    mysqlDbWrite.run(existsQuery).map(v => (foo, v))
  } catch {
    case e: Throwable =>
      println(s"Lookup failed for ${foo}")
      throw e // will restart the stream
  }
}.collect { case (f, v) if v.isEmpty => f }
So basically, if the foo already exists in MySQL, the record should not be processed any further by the stream.
My hope with this code was that if anything fails with the MySQL lookup (the MySQL machine is pretty bad and timeouts are common), the record would be printed and discarded and the stream would continue with the remaining records, courtesy of the supervision.
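A minimal sketch of that drop-and-continue behaviour, handled inside the Future rather than via the supervision decider (this assumes the same Foo, mysqlDbWrite, and Slick imports as the code above; the name alreadyExistsFilterSafe and the use of the system dispatcher are illustrative):

implicit val ec: ExecutionContext = actorSystem.dispatcher

val alreadyExistsFilterSafe: Flow[Foo, Foo, NotUsed] =
  Flow[Foo].mapAsync(10) { foo =>
    val existsQuery = sql"""SELECT id FROM foo WHERE id = ${foo.id}""".as[Long]
    mysqlDbWrite.run(existsQuery)
      .map(ids => Option((foo, ids))) // lookup succeeded
      .recover { case e =>
        println(s"Lookup failed for $foo: ${e.getMessage}")
        None // discard this element; the stream keeps running
      }
  }.collect { case Some((f, ids)) if ids.isEmpty => f }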
When I run this code, I see errors like:
[error] (mysql_write network timeout executor) java.lang.RuntimeException: java.sql.SQLException: Invalid socket timeout value or state
java.lang.RuntimeException: java.sql.SQLException: Invalid socket timeout value or state
at com.mysql.jdbc.ConnectionImpl$12.run(ConnectionImpl.java:5576)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Invalid socket timeout value or state
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:998)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:937)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:872)
at com.mysql.jdbc.MysqlIO.setSocketTimeout(MysqlIO.java:4852)
at com.mysql.jdbc.ConnectionImpl$12.run(ConnectionImpl.java:5574)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Socket is closed
at java.net.Socket.setSoTimeout(Socket.java:1137)
at com.mysql.jdbc.MysqlIO.setSocketTimeout(MysqlIO.java:4850)
at com.mysql.jdbc.ConnectionImpl$12.run(ConnectionImpl.java:5574)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
and
[error] (mysql_write network timeout executor) java.lang.NullPointerException
java.lang.NullPointerException
at com.mysql.jdbc.MysqlIO.setSocketTimeout(MysqlIO.java:4850)
at com.mysql.jdbc.ConnectionImpl$12.run(ConnectionImpl.java:5574)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
One thing which surprises me here is that these exceptions don't come from my catch block, because I don't see the println statement from my catch block. The stack trace doesn't show me where they originated... but since they mention mysql_write, I can assume it's the Flow above, because only this Flow uses mysql_write.
Finally, the entire stream crashes with the error:
[trace] Stack trace suppressed: run last compile:runMain for the full output.
flow has failed with error Shutting down because of violation of the Reactive Streams specification.
14:51:06,973 |-INFO in ch.qos.logback.classic.AsyncAppender[asyncKafkaAppender] - Worker thread will flush remaining events before exiting.
[success] Total time: 3480 s, completed Sep 26, 2017 2:51:07 PM
14:51:07,603 |-INFO in ch.qos.logback.core.hook.DelayingShutdownHook#2320545b - Sleeping for 1 seconds
I don't know what I did to violate the Reactive Streams specification!

A first stab at getting a more predictable solution would be removing the blocking behaviour (Await.result) and using mapAsync. A rewrite of the alreadyExistsFilter flow could be:
val alreadyExistsFilter: Flow[Foo, Foo, NotUsed] = Flow[Foo].mapAsync(3) { foo =>
  val existsQuery = sql"""SELECT id FROM foo WHERE id = ${foo.id}""".as[Long]
  mysqlDbWrite.run(existsQuery).map(res => foo -> res) // assumes an implicit ExecutionContext in scope
}.collect {
  case (foo, res) if res.isEmpty => foo
}
More info on blocking in Akka can be found in the docs.
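For cases where a blocking call truly cannot be avoided, a common pattern is to isolate it on a dedicated dispatcher. A minimal sketch, assuming a dispatcher named custom-blocking-dispatcher is defined in application.conf (the name, pool size, and the dummy body below are all illustrative):

// application.conf (assumed):
//   custom-blocking-dispatcher {
//     type = Dispatcher
//     executor = "thread-pool-executor"
//     thread-pool-executor.fixed-pool-size = 10
//   }
val blockingEc: ExecutionContext =
  actorSystem.dispatchers.lookup("custom-blocking-dispatcher")

val lookupOnBlockingPool: Flow[Foo, (Foo, Vector[Long]), NotUsed] =
  Flow[Foo].mapAsync(3) { foo =>
    Future {
      // any blocking JDBC work would run here, off the default dispatcher
      foo -> Vector.empty[Long]
    }(blockingEc)
  }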

The answer given by Stefano is correct. The error was indeed caused by blocking code in the flow.
However, my initial program was running against Scala 2.11, and even after switching to mapAsync the problem persisted.
Since this is a command-line tool, it was easy for me to switch to Scala 2.12 and try again.
When I tried with Scala 2.12 it worked perfectly.
One thing which greatly helped me was having "ch.qos.logback" % "logback-classic" % "1.2.3" in the dependencies. This will show you each and every SQL statement which is being executed, so you can easily see if something is going wrong.
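For reference, a minimal sketch of that setup: the dependency goes into build.sbt, and Slick's statement logger (slick.jdbc.JdbcBackend.statement) can then be raised to DEBUG in logback.xml to print each executed statement:

// build.sbt
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
// logback.xml (assumed minimal config):
//   <logger name="slick.jdbc.JdbcBackend.statement" level="DEBUG"/>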

Related

Flink: adding rebalance to a stream causes job failure when StreamExecutionEnvironment is set using TimeCharacteristic.IngestionTime

I am trying to run a streaming job that consumes messages from Kafka, transforms them, and sinks them to Cassandra.
The current code snippet is failing:
val env: StreamExecutionEnvironment = getExecutionEnv("dev")
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
.
.
.
.
val source = env.addSource(kafkaConsumer)
  .uid("kafkaSource")
  .rebalance
val transformedObjects = source.process(new EnrichEventWithIngestionTimestamp)
  .setParallelism(dataSinkParallelism)
sinker.apply(transformedObjects, dataSinkParallelism)

class EnrichEventWithIngestionTimestamp extends ProcessFunction[RawData, TransforemedObjects] {
  override def processElement(rawData: RawData,
                              context: ProcessFunction[RawData, TransforemedObjects]#Context,
                              collector: Collector[TransforemedObjects]): Unit = {
    val currentTimestamp = context.timerService().currentProcessingTime()
    context.timerService().registerProcessingTimeTimer(currentTimestamp)
    collector.collect(TransforemedObjects.fromRawData(rawData, currentTimestamp))
  }
}
but if rebalance is commented out, or the job is changed to use TimeCharacteristic.EventTime and watermark assignment as in the following snippet, then it works:
val env: StreamExecutionEnvironment = getExecutionEnv("dev")
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
.
.
val source = env.addSource(kafkaConsumer)
  .uid("kafkaSource")
  .rebalance
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessRawDataTimestampExtractor[RawData](Time.seconds(1)))
val transformedObjects = source.map(rawData => TransforemedObjects.fromRawData(rawData))
  .setParallelism(dataSinkParallelism)
sinker.apply(transformedObjects, dataSinkParallelism)
The stack trace is:
java.lang.Exception: java.lang.RuntimeException: 1
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.checkThrowSourceExecutionException(SourceStreamTask.java:217)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.processInput(SourceStreamTask.java:133)
at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: 1
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:143)
at org.apache.flink.streaming.api.collector.selector.DirectedOutput.collect(DirectedOutput.java:45)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$AutomaticWatermarkContext.processAndCollect(StreamSourceContexts.java:176)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$AutomaticWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:194)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:91)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:156)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:203)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.getBufferBuilder(RecordWriter.java:246)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:169)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:154)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:120)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 16 more
Am I doing something wrong?
Or is there a limitation on using the rebalance function when the TimeCharacteristic is set to IngestionTime?
Thank you in advance.
Can you provide the Flink version that you are using?
Your issue seems related to this Jira ticket: https://issues.apache.org/jira/browse/FLINK-14087
Did you only use rebalance once in your task? The RecordWriters may share the same ChannelSelector, which decides where each record will be forwarded. Your stack trace shows it trying to select an out-of-bounds channel.

Flink - Failed to recover from a checkpoint

I'm running my cluster on Kubernetes with a single jobmanager and 2 taskmanagers.
I tested the checkpoint mechanism by killing one of the taskmanager pods while the job was running.
I got the following exceptions on the jobmanager and the restarted taskmanager:
Jobmanager exception:
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3_(1/8) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
... 6 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:326)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
... 8 more
Caused by: java.nio.file.NoSuchFileException: /rocksdb/job_0a1a61f5cbecc09fbaef1257b3392b3a_op_WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3__1_8__uuid_8b95eb2f-f6cf-4c35-8274-a9055376163d/db/000021.sst -> /rocksdb/job_0a1a61f5cbecc09fbaef1257b3392b3a_op_WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3__1_8__uuid_8b95eb2f-f6cf-4c35-8274-a9055376163d/f1a97117-3810-400e-85ca-6e8c998a5ed4/000021.sst
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
at java.nio.file.Files.createLink(Files.java:1086)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:473)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
... 12 more
Taskmanager exception:
2020-01-13 09:26:01,943 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder - Caught unexpected exception.
org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:219)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:317)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:834)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listPrefix(PrestoS3FileSystem.java:484)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.access$000(PrestoS3FileSystem.java:112)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem$1.<init>(PrestoS3FileSystem.java:271)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listLocatedStatus(PrestoS3FileSystem.java:269)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listStatus(PrestoS3FileSystem.java:258)
at org.apache.flink.fs.s3.common.hadoop.HadoopFileSystem.listStatus(HadoopFileSystem.java:157)
at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.listStatus(SafetyNetWrapperFileSystem.java:97)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:460)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.AbortedException:
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:210)
at java.io.BufferedReader.read(BufferedReader.java:286)
at java.io.Reader.read(Reader.java:140)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:191)
... 44 more
2020-01-13 09:26:01,944 WARN org.apache.flink.streaming.api.operators.BackendRestorerProcedure - Exception while restoring keyed state backend for WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3_(7/8) from alternative (1/1), will retry while more alternatives are available.
org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:326)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:219)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:317)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:834)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listPrefix(PrestoS3FileSystem.java:484)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.access$000(PrestoS3FileSystem.java:112)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem$1.<init>(PrestoS3FileSystem.java:271)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listLocatedStatus(PrestoS3FileSystem.java:269)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listStatus(PrestoS3FileSystem.java:258)
at org.apache.flink.fs.s3.common.hadoop.HadoopFileSystem.listStatus(HadoopFileSystem.java:157)
at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.listStatus(SafetyNetWrapperFileSystem.java:97)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:460)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
... 12 more
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.AbortedException:
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:210)
at java.io.BufferedReader.read(BufferedReader.java:286)
at java.io.Reader.read(Reader.java:140)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:191)
... 44 more
When I tried to restore from a savepoint, everything worked as expected.
Any idea?
From our experience, one possible cause is that you first run into the exception below:
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
The procedure to restore the RocksDB state backend is then interrupted, which leads to the file /rocksdb/job_0a1a61f5cbecc09fbaef1257b3392b3a_op_WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3__1_8__uuid_8b95eb2f-f6cf-4c35-8274-a9055376163d/f1a97117-3810-400e-85ca-6e8c998a5ed4/000021.sst being deleted in https://github.com/apache/flink/blob/390926e61aeb69837c70a024ad6e7ff02eccdf2d/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/restore/RocksDBIncrementalRestoreOperation.java#L197
That's why you see the NoSuchFileException.

How can I see if a job failed and why?

How can I use ClusterClient to check if a job failed and why?
ClusterClient#getJobStatus may seem like a good first candidate, but it only says whether the job failed, without any information regarding the exceptions.
The submission of the job is done with a detached client, therefore waiting for its ClusterClient#run to return a JobExecutionResult is not an option.
I've also tried RestClusterClient#retrieveJob, which does not work either, failing with:
org.apache.flink.runtime.client.JobRetrievalException: Couldn't retrieve leading JobManager.
at org.apache.flink.runtime.client.JobListeningContext.getJobManager(JobListeningContext.java:157)
at org.apache.flink.runtime.client.JobListeningContext.getClassLoader(JobListeningContext.java:141)
at org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:262)
at org.apache.flink.client.program.ClusterClient.retrieveJob(ClusterClient.java:586)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:82)
at org.apache.flink.runtime.client.JobListeningContext.getJobManager(JobListeningContext.java:152)
... 10 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at scala.concurrent.Await.result(package.scala)
at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:80)
... 11 more
Use NewClusterClient#requestJobResult, which can be called on a RestClusterClient.
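A minimal sketch of how that could look, assuming Flink 1.5+ where RestClusterClient implements NewClusterClient (client construction and the job id are placeholders, and printFailureCause is a hypothetical helper):

import org.apache.flink.api.common.JobID
import org.apache.flink.client.program.rest.RestClusterClient

def printFailureCause(client: RestClusterClient[_], jobId: JobID): Unit = {
  // requestJobResult completes once the job reaches a globally terminal state
  val jobResult = client.requestJobResult(jobId).get()
  if (!jobResult.isSuccess) {
    val maybeError = jobResult.getSerializedThrowable // Optional[SerializedThrowable]
    if (maybeError.isPresent) {
      // deserialize the recorded failure cause and print it
      maybeError.get().deserializeError(getClass.getClassLoader).printStackTrace()
    }
  }
}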

Transaction commit error is not captured in EJB code

When I shut down the DB after flush() and before commit(), an exception is logged but not captured by the code:
@Stateless
@TransactionAttribute(TransactionAttributeType.NEVER)
public class OuterService {
  @EJB InnerService innerService;

  public String outerMethod() {
    try {
      innerService.innerMethod();
      return "success";
    } catch (Exception e) {
      return "failure";
    }
  }
}

@Stateless
@TransactionAttribute(TransactionAttributeType.REQUIRED)
public class InnerService {
  @PersistenceContext EntityManager em;

  public void innerMethod() {
    em.persist(new Entity());
    em.flush();
  } // put the breakpoint here
}
I run the code in debug mode and set a breakpoint after flush() but before exiting the transactional method. When execution is paused, I stop the DB service and then resume the code.
An exception is logged with the following root cause:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
but it is not captured by the outer try..catch block, and the method completes successfully. It seems that the JTA implementation swallows the exception. How can I be notified of the error?
I have already tried BMT and CDI events, but neither worked. Plain JDBC and JPA (Hibernate, with the built-in and C3P0 pools) in a Java SE environment, however, do work.
My setup: Ubuntu 17.10, Wildfly 10, MySQL 5.7.20, Connector/J 5.1.44
Here is the log (some lines removed because of the character limit):
2018-01-07 12:38:44,980 INFO [stdout] (default task-1) Hibernate: insert into Entity values ( )
2018-01-07 12:39:06,027 WARN [org.jboss.jca.core.connectionmanager.listener.TxConnectionListener] (default task-1) IJ000305: Connection error occured: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener#f0b0aed[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection#327b7fd0 connection handles=0 lastReturned=1515316110106 lastValidated=1515316098805 lastCheckedOut=1515316124981 trackByTx=true pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool#a4e7bad mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool#42037075[pool=TestDS] xaResource=LocalXAResourceImpl#306327f6[connectionListener=f0b0aed connectionManager=6110d60 warned=false currentXid=null productName=MySQL productVersion=5.7.20-0ubuntu0.17.10.1 jndiName=java:/datasources/TestDS] txSync=TransactionSynchronization#360457732{tx=TransactionImple < ac, BasicAction: 0:ffff7f000101:-7fd727eb:5a51e37d:1d status: ActionStatus.COMMITTING > wasTrackByTx=true enlisted=true cancel=false}]: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.Util.getInstance(Util.java:408)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
at com.mysql.jdbc.ConnectionImpl.commit(ConnectionImpl.java:1552)
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.commit(LocalManagedConnection.java:96)
at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.commit(LocalXAResourceImpl.java:172)
at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:96)
at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)
at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:89)
at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:239)
at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:636)
at InnerService$$$view26.innerMethod(Unknown Source)
at OuterService.outerMethod(OuterService.java:23)
at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
at OuterService$$$view33.outerMethod(Unknown Source)
at RestManager.test(RestManager.java:112)
2018-01-07 12:39:06,032 WARN [com.arjuna.ats.jta] (default task-1) ARJUNA016039: onePhaseCommit on < formatId=131077, gtrid_length=47, bqual_length=36, tx_uid=0:ffff7f000101:-7fd727eb:5a51e37d:1d, node_name=mypc, branch_uid=0:ffff7f000101:-7fd727eb:5a51e37d:20, subordinatenodename=null, eis_name=java:/datasources/TestDS > (LocalXAResourceImpl#306327f6[connectionListener=f0b0aed connectionManager=6110d60 warned=false currentXid=null productName=MySQL productVersion=5.7.20-0ubuntu0.17.10.1 jndiName=java:/datasources/TestDS]) failed with exception XAException.XAER_RMFAIL: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001156: Could not commit local transaction
at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.commit(LocalXAResourceImpl.java:177)
at com.arjuna.ats.internal.jta.resources.arjunacore.XAOnePhaseResource.commit(XAOnePhaseResource.java:120)
at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:91)
at InnerService$$$view26.innerMethod(Unknown Source)
at OuterService.outerMethod(OuterService.java:23)
at OuterService$$$view33.outerMethod(Unknown Source)
at RestManager.test(RestManager.java:112)
at RestManager$Proxy$_$$_Weld$EnterpriseProxy$.test(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.jboss.jca.core.spi.transaction.local.LocalResourceException: Communications link failure during commit(). Transaction resolution unknown.
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.commit(LocalManagedConnection.java:103)
at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.commit(LocalXAResourceImpl.java:172)
... 248 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
at com.mysql.jdbc.ConnectionImpl.commit(ConnectionImpl.java:1552)
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.commit(LocalManagedConnection.java:96)
... 249 more
I investigated this and it looks to me like a bug. I created a JIRA issue here: https://issues.jboss.org/browse/JBTM-2983. Feel free to follow the discussion there if it's confirmed.
I expect the caller should be informed that there was an exception during commit. By the way, for further investigation I created a small test project based on your issue: https://github.com/ochaloup/catch-ejb-exception-test.git
I'm not sure; there could be several causes.
The first thing I would try is to increase the net_write_timeout property in your MySQL configuration.
Also, your exception is caused by error code XAException.XAER_RMFAIL. The JBoss Javadoc for XAException describes it as: "Error code indicating that the resource manager has failed and is not available."
So it seems that the PersistenceContext is broken or invalid because of the breakpoint interruption.

Why does Dataflow erratically fail in Datastore access?

My simple Dataflow pipeline successfully copies multiple Kinds from one project's Datastore to another in most cases. But for certain Kinds (about 5% of them), we always get these errors.
Dataflow retries 4-8 times with a delay of about 75 seconds, and then the pipeline fails.
How can I diagnose and resolve this?
EDIT: The answer includes: (1) there was a bug in the Datastore library used by Dataflow; after they fixed this bug, you can see the underlying cause, and (2) the default batch size for putting entities in this library is 500, which is also the max, and that goes over the 10 MB limit of the Datastore API.
The (very simple) Pipeline looks like this:
Query.Builder qb = Query.newBuilder();
qb.addKindBuilder().setName(kindName);
Query query = qb.build();
Read dsRead = DatastoreIO.v1().read().withProjectId(inputProject).withQuery(query);
Write dsWrite = DatastoreIO.v1().write().withProjectId(outputProject);
PCollection<Entity> sourceEntities = pipeline.apply("read", dsRead);
Bound<Entity, Entity> entityFromSrcToTarget = ParDo.of(new EntityDoFn()); // simple DoFn that copies Entities for insertion into the target
PCollection<Entity> clonedEntities = sourceEntities.apply("clone-entity", entityFromSrcToTarget);
clonedEntities.apply("write-to-ds", dsWrite);
First stack trace:
com.google.datastore.v1.client.DatastoreException: I/O error, code=UNAVAILABLE
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:95)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:925)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:899)
Caused by: java.io.IOException: insufficient data written
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3500)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:87)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:925)
at com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:899)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:158)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:196)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:65)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:80)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Also
(9908b474b1492772): java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: com.google.datastore.v1.client.DatastoreException: I/O error, code=UNAVAILABLE
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:283)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:507)
at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsViaIteratorsDoFn.processElement(GroupAlsoByWindowsViaIteratorsDoFn.java:125)
at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:202)
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:143)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:72)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
