How to diagnose problems (Failed to route and batch data on XXX channel)? - database

I am using SymmetricDS and i have stumbled across a problem on one of my clients.
Log says:
2015-11-11 09:10:57,688 ERROR [blagajna_XXX] [RouterService] [blagajna_XXX-job-17] Failed to route and batch data on 'cFpromet' channel
java.lang.NullPointerException
There is no further explanation for null pointer exception, so i can't debug it myself. Replication itself works for some time and then this error appears and replication stops working. Identical system works without any problems.
select * from sym_outgoing_batch where error_flag=1; returns 0 rows, so how can i debug this problem?
GregaJ
EDIT:
java.lang.NullPointerException
at org.jumpmind.db.platform.AbstractJdbcDdlReader.getTableNamePattern(AbstractJdbcDdlReader.java:638)
at org.jumpmind.db.platform.AbstractJdbcDdlReader$3.execute(AbstractJdbcDdlReader.java:574)
at org.jumpmind.db.platform.AbstractJdbcDdlReader$3.execute(AbstractJdbcDdlReader.java:563)
at org.jumpmind.db.sql.JdbcSqlTemplate.execute(JdbcSqlTemplate.java:432)
at org.jumpmind.db.platform.AbstractJdbcDdlReader.readTable(AbstractJdbcDdlReader.java:563)
at org.jumpmind.db.platform.AbstractDatabasePlatform.readTableFromDatabase(AbstractDatabasePlatform.java:239)
at org.jumpmind.db.platform.AbstractDatabasePlatform.getTableFromCache(AbstractDatabasePlatform.java:314)
at org.jumpmind.symmetric.db.AbstractSymmetricDialect.getTable(AbstractSymmetricDialect.java:377)
at org.jumpmind.symmetric.service.impl.RouterService.routeData(RouterService.java:689)
at org.jumpmind.symmetric.service.impl.RouterService.selectDataAndRoute(RouterService.java:634)
at org.jumpmind.symmetric.service.impl.RouterService.routeDataForChannel(RouterService.java:436)
at org.jumpmind.symmetric.service.impl.RouterService.routeDataForEachChannel(RouterService.java:328)
at org.jumpmind.symmetric.service.impl.RouterService.routeData(RouterService.java:175)
at org.jumpmind.symmetric.job.RouterJob.doJob(RouterJob.java:40)
at org.jumpmind.symmetric.job.AbstractJob.invoke(AbstractJob.java:180)
at org.jumpmind.symmetric.job.AbstractJob.run(AbstractJob.java:224)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Related

Unable to use extended_choice_parameter.ExtendedChoiceParameterValue for security reasons in jenkins

please find the error while trying to pass the parameters during build -
java.lang.UnsupportedOperationException: Refusing to marshal com.cwctravel.hudson.plugins.extended_choice_parameter.ExtendedChoiceParameterValue for security reasons; see https://jenkins.io/redirect/class-filter/
at hudson.util.XStream2$BlacklistedTypesConverter.marshal(XStream2.java:541)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:64)
at com.thoughtworks.xstream.converters.collections.CollectionConverter.marshal(CollectionConverter.java:74)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:264)
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:251)
Caused: java.lang.RuntimeException: Failed to serialize hudson.model.ParametersAction#parameters for class hudson.model.ParametersAction
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:255)
at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:223)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:64)
at com.thoughtworks.xstream.converters.collections.CollectionConverter.marshal(CollectionConverter.java:74)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:264)
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:251)
Caused: java.lang.RuntimeException: Failed to serialize hudson.model.Actionable#actions for class org.jenkinsci.plugins.workflow.job.WorkflowRun
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:255)
at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:223)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:82)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
at com.thoughtworks.xstream.XStream.marshal(XStream.java:1026)
at com.thoughtworks.xstream.XStream.marshal(XStream.java:1015)
at com.thoughtworks.xstream.XStream.toXML(XStream.java:988)
at hudson.XmlFile.write(XmlFile.java:195)
at hudson.model.Run.save(Run.java:2077)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.forRun(EnvActionImpl.java:136)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:149)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:142)
at org.jenkinsci.plugins.workflow.cps.CpsScript.getProperty(CpsScript.java:121)
at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:174)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.getProperty(ScriptBytecodeAdapter.java:456)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:355)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onGetProperty(GroovyInterceptor.java:68)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:354)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:353)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:357)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.getProperty(SandboxInvoker.java:29)
at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawGet(PropertyAccessBlock.java:20)
Caused: java.io.IOException
at hudson.XmlFile.write(XmlFile.java:202)
at hudson.model.Run.save(Run.java:2077)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.forRun(EnvActionImpl.java:136)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:149)
at org.jenkinsci.plugins.workflow.cps.EnvActionImpl$Binder.getValue(EnvActionImpl.java:142)
at org.jenkinsci.plugins.workflow.cps.CpsScript.getProperty(CpsScript.java:121)
at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:174)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.getProperty(ScriptBytecodeAdapter.java:456)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:355)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onGetProperty(GroovyInterceptor.java:68)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:354)
at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:353)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:357)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:333)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.getProperty(SandboxInvoker.java:29)
at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawGet(PropertyAccessBlock.java:20)
at WorkflowScript.run(WorkflowScript:8)
at ___cps.transform___(Native Method)
at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.get(PropertyishBlock.java:74)
at com.cloudbees.groovy.cps.LValueBlock$GetAdapter.receive(LValueBlock.java:30)
at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.fixName(PropertyishBlock.java:66)
at jdk.internal.reflect.GeneratedMethodAccessor629.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
at com.cloudbees.groovy.cps.Next.step(Next.java:83)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:237)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:331)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:243)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:231)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:136)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Finished: FAILURE
We are stated to see security issue related to extended choice parameters in jenkins ,which gets json array as input. Please find the error . Can you please help on this. FYI There was no jenkins or plugins update done recently. This stared happening all of the sudden (2 weeks back) and all the "extended choice parameters" got deleted without any clue in jenkins build.
extended choice parameters VERSION - 0.78
jenkins VERSION - 2.263.1
Issue has been fixed by replacing the extended choice parameter jar file with the latest version which had fix to extended_choice_parameter.ExtendedChoiceParameterValue parameters. Also, make sure to White list "export JDK_JAVA_OPTIONS="Dhudson.remoting.ClassFilter=com.cwctravel.hudson.plugins.extended_choice_parameter.ExtendedChoiceParameterValue" in .bash_profile of your jenkins machine

Flink - Failed to recover from a checkpoint

I'm running my cluster on kubernetes with a single jobmanager and 2 taskmanagers.
I tested the mechanism of checkpoint by killing one of the taskmanager pods while the job is running.
I got the following exceptions on the jobmanager and the restarted taskmanager:
Jobmanager exception:
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3_(1/8) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
... 6 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:326)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
... 8 more
Caused by: java.nio.file.NoSuchFileException: /rocksdb/job_0a1a61f5cbecc09fbaef1257b3392b3a_op_WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3__1_8__uuid_8b95eb2f-f6cf-4c35-8274-a9055376163d/db/000021.sst -> /rocksdb/job_0a1a61f5cbecc09fbaef1257b3392b3a_op_WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3__1_8__uuid_8b95eb2f-f6cf-4c35-8274-a9055376163d/f1a97117-3810-400e-85ca-6e8c998a5ed4/000021.sst
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
at java.nio.file.Files.createLink(Files.java:1086)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:473)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
... 12 more
Taskmanager exception:
2020-01-13 09:26:01,943 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder - Caught unexpected exception.
org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:219)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:317)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:834)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listPrefix(PrestoS3FileSystem.java:484)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.access$000(PrestoS3FileSystem.java:112)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem$1.<init>(PrestoS3FileSystem.java:271)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listLocatedStatus(PrestoS3FileSystem.java:269)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listStatus(PrestoS3FileSystem.java:258)
at org.apache.flink.fs.s3.common.hadoop.HadoopFileSystem.listStatus(HadoopFileSystem.java:157)
at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.listStatus(SafetyNetWrapperFileSystem.java:97)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:460)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.AbortedException:
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:210)
at java.io.BufferedReader.read(BufferedReader.java:286)
at java.io.Reader.read(Reader.java:140)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:191)
... 44 more
2020-01-13 09:26:01,944 WARN org.apache.flink.streaming.api.operators.BackendRestorerProcedure - Exception while restoring keyed state backend for WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3_(7/8) from alternative (1/1), will retry while more alternatives are available.
org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:326)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:219)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:317)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:834)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listPrefix(PrestoS3FileSystem.java:484)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.access$000(PrestoS3FileSystem.java:112)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem$1.<init>(PrestoS3FileSystem.java:271)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listLocatedStatus(PrestoS3FileSystem.java:269)
at org.apache.flink.fs.s3presto.shaded.com.facebook.presto.hive.s3.PrestoS3FileSystem.listStatus(PrestoS3FileSystem.java:258)
at org.apache.flink.fs.s3.common.hadoop.HadoopFileSystem.listStatus(HadoopFileSystem.java:157)
at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.listStatus(SafetyNetWrapperFileSystem.java:97)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:460)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
... 12 more
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.AbortedException:
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:210)
at java.io.BufferedReader.read(BufferedReader.java:286)
at java.io.Reader.read(Reader.java:140)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:191)
... 44 more
When I tried to restore from a savepoint, everything works as expected.
Any idea?
From our experience, one possible cause could be you come across below exception first:
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
Then the procedure to restore rocksdb state-backend would be interrupted leading to file /rocksdb/job_0a1a61f5cbecc09fbaef1257b3392b3a_op_WindowOperator_54288f79b169ee3e8cb1feb33bbad4c3__1_8__uuid_8b95eb2f-f6cf-4c35-8274-a9055376163d/f1a97117-3810-400e-85ca-6e8c998a5ed4/000021.sst deleted in https://github.com/apache/flink/blob/390926e61aeb69837c70a024ad6e7ff02eccdf2d/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/restore/RocksDBIncrementalRestoreOperation.java#L197
That's why you found the NoSuchFileException.

Why does Dataflow erratically fail in Datastore access?

My simple Dataflow pipeline successfully copies multiple Kinds from one project's Datastore to another in most cases. But in certain Kinds (about 5% of them), we always get these errors.
Dataflow retries 4-8 times with a delay of about 75 seconds, and then the pipeline fails.
How can I diagnose and resolve this?
EDIT: The answer includes: (1) there was a bug in the Datastore library used by Dataflow; after they fixed this bug, you can see the underlying cause and (2) the default batch size for putting entities in this library is 500, which is also the max, and that goes over the 10 Mb limit of the Datastore API.
The (very simple) Pipeline looks like this:
Query.Builder qb = Query.newBuilder();
qb.addKindBuilder().setName(kindName);
Query query = qb.build();
Read dsRead = DatastoreIO.v1().read().withProjectId(inputProject).withQuery(query);
Write dsWrite = DatastoreIO.v1().write().withProjectId(outputProject);
PCollection<Entity> sourceEntities = pipeline.apply("read", dsRead);
Bound<Entity, Entity> entityFromSrcToTarget = ParDo.of(new EntityDoFn());/*Simple DoFn that copies Entities for insertion to target*/
PCollection<Entity> clonedEntities = sourceEntities.apply("clone-entity", entityFromSrcToTarget);
clonedEntities.apply("write-to-ds", dsWrite);
First stacktrace
com.google.datastore.v1.client.DatastoreException: I/O error, code=UNAVAILABLE at
com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126) at
com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:95) at
com.google.datastore.v1.client.Datastore.commit(Datastore.java:84) at
com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:925) at
com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:899) Caused by: java.io.IOException: insufficient data written at
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3500) at
com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:81) at
com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981) at
com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:87) at
com.google.datastore.v1.client.Datastore.commit(Datastore.java:84) at
com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:925) at
com.google.cloud.dataflow.sdk.io.datastore.DatastoreV1$DatastoreWriterFn.finishBundle(DatastoreV1.java:899) at
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:158) at
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:196) at
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47) at
com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:65) at
com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:80) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160) at
java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at
java.lang.Thread.run(Thread.java:745)
Also
(9908b474b1492772): java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException:
com.google.datastore.v1.client.DatastoreException: I/O error, code=UNAVAILABLE at
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162) at
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287) at
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:283) at
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:507) at
com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsViaIteratorsDoFn.processElement(GroupAlsoByWindowsViaIteratorsDoFn.java:125) at
com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49) at
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138) at
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190) at
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47) at
com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55) at
com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) at
com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:202) at
com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:143) at
com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:72) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173) at
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160) at
java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at
java.lang.Thread.run(Thread.java:745)

deserializeObject exception on WindowOperator

It's seem that the data in the WindowOperator can serialize successfully. however deserialize failed when the taskmanager restarted by jobmanager.
env: state
backend: hdfs;
jobmanager: high-availability
Root exception:
java.lang.Exception: Could not restore checkpointed state to operators and functions
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:414)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.StreamCorruptedException: invalid type code: 00
at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2508)
at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2543)
at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2615)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2820)
at java.io.ObjectInputStream.readInt(ObjectInputStream.java:971)
at java.util.HashMap.readObject(HashMap.java:1158)
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:294)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.<init>(WindowOperator.java:446)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.restoreState(WindowOperator.java:621)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateLazy(StreamTask.java:406)
I have experienced the same issue, when I run multiple flink application with the same custom window/trigger class reused.
(I don't know if it's your case)
But I add generated serialVersionUID in my different reused class and it's work fine now

DSE 3.2 SOLR FileNotFoundException

Just updated to DSE 3.2 from 3.1 using the guide to run the update, now the logs littered with this exception. When querying via SOLR we are getting missing data, however it seems that when querying using cqlsh or the cli, the data is there.
ERROR [IndexPool work thread-6] 2013-11-18 22:32:18,748 AbstractSolrSecondaryIndex .java (line 912) _yaqn8_Lucene41_0.tip
java.io.FileNotFoundException: _yaqn8_Lucene41_0.tip
at org.apache.lucene.store.bytebuffer.ByteBufferDirectory.fileLength( ByteBufferDirectory.java:129)
at org.apache.lucene.store.NRTCachingDirectory.sizeInBytes(NRTCachingDirectory .java:158)
at org.apache.lucene.store.NRTCachingDirectory.doCacheWrite( NRTCachingDirectory.java:289)
at org.apache.lucene.store.NRTCachingDirectory.createOutput( NRTCachingDirectory.java:199)
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput( TrackingDirectoryWrapper.java:62)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.<init>( CompressingStoredFieldsWriter.java:107)
at com.datastax.bdp.cassandra.index.solr.CassandraStoredFieldsWriter.<init>( CassandraStoredFieldsWriter.java:25)
at com.datastax.bdp.cassandra.index.solr.CassandraStoredFieldsFormat. fieldsWriter(CassandraStoredFieldsFormat.java:39)
at org.apache.lucene.index.StoredFieldsProcessor.initFieldsWriter( StoredFieldsProcessor.java:86)
at org.apache.lucene.index.StoredFieldsProcessor.finishDocument( StoredFieldsProcessor.java:119)
at org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument( TwoStoredFieldsConsumers.java:65)
at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor. java:274)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument( DocumentsWriterPerThread.java:274)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter. java:376)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1485)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2. java:201)
at com.datastax.bdp.cassandra.index.solr.CassandraDirectUpdateHandler2.addDoc( CassandraDirectUpdateHandler2.java:103)
at com.datastax.bdp.cassandra.index.solr.AbstractSolrSecondaryIndex.doIndex( AbstractSolrSecondaryIndex.java:929)
at com.datastax.bdp.cassandra.index.solr.AbstractSolrSecondaryIndex. doUpdateOrDelete(AbstractSolrSecondaryIndex.java:586)
at com.datastax.bdp.cassandra.index.solr.ThriftSolrSecondaryIndex. updateColumnFamilyIndex(ThriftSolrSecondaryIndex.java:114)
at com.datastax.bdp.cassandra.index.solr.AbstractSolrSecondaryIndex$3.run( AbstractSolrSecondaryIndex.java:896)
at com.datastax.bdp.cassandra.index.solr.concurrent.IndexWorker.run( IndexWorker.java:38)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. java:615)
at java.lang.Thread.run(Thread.java:724)
alo this:
ERROR 22:53:01,426 auto commit error...:org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Error opening new searcher
at com.datastax.bdp.cassandra.index.solr.CassandraDirectUpdateHandler2.commit(CassandraDirectUpdateHandler2.java:318)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1457)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1569)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:557)
at com.datastax.bdp.cassandra.index.solr.CassandraDirectUpdateHandler2.commit(CassandraDirectUpdateHandler2.java:276)
... 9 more
Caused by: java.io.FileNotFoundException: _xfgfw_Lucene41_0.tim
at org.apache.lucene.store.bytebuffer.ByteBufferDirectory.fileLength(ByteBufferDirectory.java:129)
at org.apache.lucene.store.NRTCachingDirectory.sizeInBytes(NRTCachingDirectory.java:158)
at org.apache.lucene.store.NRTCachingDirectory.doCacheWrite(NRTCachingDirectory.java:289)
at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:199)
at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:62)
at org.apache.lucene.codecs.lucene42.Lucene42FieldInfosWriter.write(Lucene42FieldInfosWriter.java:49)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:88)
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:493)
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:365)
at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:250)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1393)
... 12 more
This is known issue that is fixed in DSE 3.2.1.
We just released 3.2.1, which should address your issues. Our developers where able to replicate the stack trace, and resolved that. We also addressed the issue with the indexes not properly handled after a restart.
That looks like some files did not flush correctly on shutdown. You will have to do a full re-index (with deleting) on nodes showing those errors to get the lucene indexes to rebuild.
This page shows how to initiate a re-index. http://www.datastax.com/docs/datastax_enterprise3.2/solutions/dse_search_upload#reloading-a-solr-core
A workaround for this is to change your solr config to use (we are working on a proper fix):
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
If the problem continues, then the CF needs to be re-indexed.

Resources