i just configure my flink cluster (1.0.3 version) and i whanted to run tersort with the flink framework, but its me returned the next erreor, i used this comamnde to run my programm:
/bin/flink run -c --class es.udc.gac.flinkbench.ScalaTeraSort flinkbench-assembly-1.0.jar hdfs://192.168.3.89:8020/teragen/10mo hdfs://192.168.3.89:8020/teragen/rstflink
and its me return this :
Cluster configuration: Standalone cluster with JobManager at localhost/127.0.0.1:6123
Using address localhost:6123 to connect to JobManager.
JobManager web interface address http://localhost:8081
Starting execution of program
2017-06-23 10:28:43,692 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
Spent 2278ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 2281ms
Sampling 2 splits of 2
Making -1 from 100000 sampled records
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:339)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
Caused by: java.lang.NegativeArraySizeException
at es.udc.gac.flinkbench.terasort.TeraInputFormat$TextSampler.createPartitions(TeraInputFormat.java:103)
at es.udc.gac.flinkbench.terasort.TeraInputFormat.writePartitionFile(TeraInputFormat.java:183)
at es.udc.gac.flinkbench.ScalaTeraSort$.main(ScalaTeraSort.scala:49)
at es.udc.gac.flinkbench.ScalaTeraSort.main(ScalaTeraSort.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
... 13 more
Related
I'm using flink mysql connector with a single executor of 32Gb RAM, 16vCPU with 32 slots. If I run a job with parallelism 32 (job parallelism 224) that is doing temporal lookup joins with 10 MySQL tables, it starts to fail after 2-3 successful runs with below error.
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679)
at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.lang.IllegalArgumentException: open() failed.
at org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction.open(JdbcRowDataLookupFunction.java:138)
at LookupFunction$55178.open(Unknown Source)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
at org.apache.flink.table.runtime.operators.join.lookup.LookupJoinRunner.open(LookupJoinRunner.java:67)
at org.apache.flink.table.runtime.operators.join.lookup.LookupJoinWithCalcRunner.open(LookupJoinWithCalcRunner.java:51)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:100)
at org.apache.flink.streaming.api.operators.ProcessOperator.open(ProcessOperator.java:56)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:110)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:711)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:687)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:403)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:335)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2187)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2220)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2015)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:768)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:403)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:385)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:323)
at org.apache.flink.connector.jdbc.internal.connection.SimpleJdbcConnectionProvider.getOrEstablishConnection(SimpleJdbcConnectionProvider.java:121)
at org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction.establishConnectionAndStatement(JdbcRowDataLookupFunction.java:211)
at org.apache.flink.connector.jdbc.table.JdbcRowDataLookupFunction.open(JdbcRowDataLookupFunction.java:129)
... 17 more
Caused by: java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:478)
at java.net.Socket.getImpl(Socket.java:538)
at java.net.Socket.setTcpNoDelay(Socket.java:998)
at com.mysql.jdbc.StandardSocketFactory.configureSocket(StandardSocketFactory.java:132)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:203)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:299)
... 32 more
Did Some debugging, the process list on MySQL shows ~ 2* (total job parallelism) connections, i.e. 448 connections from Task Manager IP. The output of lsof | grep mysql-cj- | wc -l on task manager also reached to 12k from 3k. But after cancelling job, sometime this number doesn't go down. Am I missing something ?
The error is mainly because there are too many connections requesting mysql at the same time. Provide several optimization ideas for reference
Consider reducing the total concurrency of tasks
By default, lookup cache is not enabled. You can enable it by setting both lookup.cache.max-rows and lookup.cache.ttl, refer to https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/table/jdbc/
Recently we got "(Too many open files)" error in production and this leads to very high CPU spike(98%) on Linux machine and finally brought machine down. We have to reboot the machine to bring it back.
Flow of code is like this :-
Consume message from one of IBM-MQ queue -> start processing it -> make some update entries into SQL-Server DB and commit the transaction.
We are using Hikari for connection pool management and value is set as 30
hikariConfig.setMaximumPoolSize(30);
While checking the logs, i found following stack trace :-
2021-03-24T20:00:22.404 WARNING org.apache.catalina.loader.WebappClassLoaderBase Failed to open JAR [null]
java.io.FileNotFoundException: /opt/apache-tomcat-7.0.88/webapps/ROOT/WEB-INF/lib/HikariCP-2.5.1.jar (Too many open files)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:219)
at java.util.zip.ZipFile.<init>(ZipFile.java:149)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.apache.tomcat.util.compat.JreCompat.jarFileNewInstance(JreCompat.java:221)
at org.apache.catalina.loader.WebappClassLoaderBase.openJARs(WebappClassLoaderBase.java:3102)
at org.apache.catalina.loader.WebappClassLoaderBase.findResourceInternal(WebappClassLoaderBase.java:3414)
at org.apache.catalina.loader.WebappClassLoaderBase.findResource(WebappClassLoaderBase.java:1494)
at org.apache.catalina.loader.WebappClassLoaderBase.getResourceAsStream(WebappClassLoaderBase.java:1722)
at java.util.ResourceBundle$Control$1.run(ResourceBundle.java:2677)
at java.util.ResourceBundle$Control$1.run(ResourceBundle.java:2662)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2661)
at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1501)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1465)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1361)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:773)
at com.ibm.mq.jmqi.JmqiException.buildMessage(JmqiException.java:428)
at com.ibm.mq.jmqi.JmqiException.getMessage(JmqiException.java:543)
at com.ibm.mq.jmqi.JmqiException.getMessage(JmqiException.java:515)
at java.lang.Throwable.getLocalizedMessage(Throwable.java:391)
at com.ibm.mq.jmqi.internal.JmqiTools.getExSumm(JmqiTools.java:947)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect(RemoteFAP.java:2282)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect(RemoteFAP.java:1294)
at com.ibm.msg.client.wmq.internal.WMQSession.connect(WMQSession.java:387)
at com.ibm.msg.client.wmq.internal.WMQSession.<init>(WMQSession.java:331)
at com.ibm.msg.client.wmq.internal.WMQConnection.createSession(WMQConnection.java:909)
at com.ibm.msg.client.jms.internal.JmsConnectionImpl.createSession(JmsConnectionImpl.java:896)
at com.ibm.mq.jms.MQConnection.createSession(MQConnection.java:337)
at org.springframework.jms.connection.SingleConnectionFactory.createSession(SingleConnectionFactory.java:437)
at org.springframework.jms.connection.CachingConnectionFactory.getSession(CachingConnectionFactory.java:236)
at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.invoke(SingleConnectionFactory.java:604)
at com.sun.proxy.$Proxy195.createSession(Unknown Source)
at org.springframework.jms.support.JmsAccessor.createSession(JmsAccessor.java:192)
at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:475)
at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:526)
at com.test.config.messaging.MainframeMessageProducer.lambda$sendMessageToQueueWithHeaders$0(MainframeMessageProducer.java:120)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:164)
at com.test.config.messaging.MainframeMessageProducer.sendMessageToQueueWithHeaders(MainframeMessageProducer.java:118)
at com.test.config.messaging.MainframeMessageProducer.sendMessageToMainframeQueue(MainframeMessageProducer.java:101)
at com.test.config.messaging.MainframeMessageProducer$$FastClassBySpringCGLIB$$f2d092d2.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85)
at com.test.common.aop.RecordTimingAspect.recordTimingAspect(RecordTimingAspect.java:28)
at sun.reflect.GeneratedMethodAccessor505.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
at com.test.config.messaging.MainframeMessageProducer$$EnhancerBySpringCGLIB$$dc5ae691.sendMessageToMainframeQueue(<generated>)
at com.test.posting.GDISPurchasePostServiceImpl.postMessage(GDISPurchasePostServiceImpl.java:315)
at com.test.posting.GDISPostingService.processMessage(GDISPostingService.java:56)
at com.test.posting.GDISPostingService$$FastClassBySpringCGLIB$$3c0c40af.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85)
at com.test.common.aop.CorrelationIdAspect.correlationIdAdvice(CorrelationIdAspect.java:40)
at sun.reflect.GeneratedMethodAccessor554.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629)
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618)
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
at com.test.posting.GDISPostingService$$EnhancerBySpringCGLIB$$dd870248.processMessage(<generated>)
at com.test.posting.GDISPostingMessageListener.onMessage(GDISPostingMessageListener.java:167)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:746)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:684)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:651)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:317)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:255)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1166)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1158)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1055)
at java.lang.Thread.run(Thread.java:748)
I am confused here what leads to max open files error here :-
is it due to Hikari CP as this is first line we got as an error (java.io.FileNotFoundException: /opt/apache-tomcat-7.0.88/webapps/ROOT/WEB-INF/lib/HikariCP-2.5.1.jar (Too many open files) ) ?
Is it due to IBM-MQ connection not available ?
You could use the lsof command to LiSt Open Files.
It produces lots of output, so you'll want to capture and post process it ( eg grep or sort)
You can list open files by pid, userid, disk etc.
You might try lsof > before wait... lsof > after then use diff to see the differences
Flink job failed,The error information is as follows
2020-12-02 09:37:27
java.util.concurrent.CompletionException: java.lang.reflect.UndeclaredThrowableException
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy41.submitTask(Unknown Source)
at org.apache.flink.runtime.jobmaster.RpcTaskManagerGateway.submitTask(RpcTaskManagerGateway.java:77)
at org.apache.flink.runtime.executiongraph.Execution.lambda$deploy$9(Execution.java:735)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
... 7 more
Caused by: java.io.IOException: The rpc invocation size exceeds the maximum akka framesize.
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.createRpcInvocationMessage(AkkaInvocationHandler.java:270)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invokeRpc(AkkaInvocationHandler.java:200)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invoke(AkkaInvocationHandler.java:129)
... 11 more
The logic of this job is simple,Consumption data of Kafka is saved to Clickhouse.
Start command
flink run -m yarn-cluster -p 2 -ys 2 -yjm 2048 -ytm 2048 -ynm xx --class xx /data/flink/lib/xx.jar -name --input --groupId xx --bootstrapServers xx:9092 --CheckpointInterval 60000 --CheckpointTimeout 600000 --clientId xx
Why is that? thanks
The exception means the payload of message(JM submits task to TM) exceeds max size. Try to increase the max size by adding akka.framesize to flink-conf.yaml.
The default for this is: 10485760b. Try to set a bigger number for this. Probably needing to restart the JM/TM or Flink cluster.
Doc: https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#akka-framesize
I have 4 SOLR indexes running on a single node Hadoop environment.
The VM went out of space so i freed some space on the OS but after i restarted the VM , the indexes were not starting.
The VM where I am running SOLR has 64 GB of RAM
I am getting solr.DirectUpdateHandler2 failed to instantiate error
I need help on this as my system is not working anymore due to this issue and the backup i have is old.
This is the error:
null:org.apache.solr.common.SolrException: SolrCore 'EMAIL_DOMAINS_shard1_replica1' is not available due to init failure: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:752)
at org.apache.solr.servlet.SolrDispatchFilter.checkProps(SolrDispatchFilter.java:768)
at org.apache.solr.servlet.SolrDispatchFilter.getCoreByCollection(SolrDispatchFilter.java:742)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:325)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:211)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:394)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:589)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:552)
at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:399)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:893)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:663)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:498)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:262)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:256)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to instantiate org.apache.solr.update.UpdateHandler
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:581)
at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:637)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:855)
... 8 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:567)
... 10 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2359)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1934)
at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1853)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
at org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:93)
at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:200)
at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:136)
at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:94)
at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:100)
... 15 more
In the exception is clearly written that you have an Out Of Memory (OOM) problem.
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
Given that you already had an out of space, please double check if you have already freed enough disk space.
After this, even if there are many reasons for an OOM, the common reason is that your JVM hasn't enough memory so double check your heap size configuration (i.e. –Xms and –Xmx).
If freeing more disk space and allocate more memory heap doesn't resolve your problem, I suggest also to:
check and if you need to increase user limits (ulimit –a), this means user max thread limits, max open processes and max open files limits.
check threads system wide settings and, even in this case, raise the limit if you need it.
On the other hand, rarely happens that you have allocated so much heap space or so much resource that the operating system has no enough room to run your application.
In conclusion just try to find the right balance between allocated resources and application needs.
I have a very simple ftp route that should recursively download files from an URL. It would be very important to get it running still today.
There are no proxies, no extra authentication, no firewall. However, it only downloads the first file and then the socket will be closed. I have experimented with the different timeouts but they did not solve the issue. Either error code 226 is retuned if I don't use any timeout or extra config option or if I use the commented out things, error code 221 is returned. The 226 does not seem to be an error as it just counts as an indication that the server completed the transfer. Stack traces I copied below. I would appreciate the responses and thanks also in advance.
The route I am using in pollEnrich as I have to start it depending on a timer.
The code:
[...]
from("direct:ftp").routeId("ftp")
.log("### FTP is in progress ")
.pollEnrich().simple(ConfigData.getConfigData().getFtpUrl()
+ "?binary=true&"
+ "recursive=true"
/* + "soTimeout=300000&"
+ "stepwise=true&"
+ "ignoreFileNotFoundOrPermissionError=true&"
+ "ftpClient.dataTimeout=30000&"
+ "disconnect=true&"
+ "consumer.delay=1000" */
)
.to("file:modelFiles")
.end();
[...]
Update 1
I removed pollEnrich() and without it it works fine. However, I cannot start it from another route e.g. from a timer. So this is a quick hack to get FTP running. I also copy the code here as I did it with an idempotent consumer, so that only those files will be downloaded that are not yet on the disk. It can maybe be useful also to other people. The idempotent consumer example you can find here (Apache Camel ftp consumer loads the same files again and again) for more information.
Any further comments?
from(ConfigData.getConfigData().getFtpUrl()
+ "?binary=true&"
+ "recursive=true&"
+ "passiveMode=true&"
+ "ftpClient.bufferSize=10000000&"
+ "localWorkDirectory=" + ConfigData.getConfigData().getLocalTmpDirectory())
.idempotentConsumer(header("CamelFileName"), FileIdempotentRepository.fileIdempotentRepository(new File("data", "repo.dat")))
.to("file:modelFiles")
.log("Downloaded file ${file:name} complete.")
.end();
Stack Trace
2016-03-19 15:27:53,921 [WARN|org.apache.camel.component.file.remote.FtpConsumer|MarkerIgnoringBase] Error processing file RemoteFile[2009-03-25/BioModels_Database-r13-sbml_files.tar.gz] due to File operation failed: null socket closed. Code: 221. Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - File operation failed: null socket closed. Code: 221]
org.apache.camel.component.file.GenericFileOperationFailedException: File **operation failed: null socket closed. Code: 221**
[...]
Caused by: java.net.SocketException: socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.read(Unknown Source)
at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:314)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:294)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:483)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:608)
at org.apache.commons.net.ftp.FTP.cwd(FTP.java:828)
at org.apache.commons.net.ftp.FTPClient.changeWorkingDirectory(FTPClient.java:1128)
at org.apache.camel.component.file.remote.FtpOperations.doChangeDirectory(FtpOperations.java:769)
2016-03-19 12:42:16,068 [WARN|org.apache.camel.component.file.remote.FtpConsumer|MarkerIgnoringBase] Error processing file RemoteFile[2008-12-03/BioModels_Database-r12-sbml_files.tar.gz] due to File operation failed: null Socket Closed. Code: 226. Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - **File operation failed: null Socket Closed. Code: 226]**
[...]
Caused by: java.net.SocketException: Socket Closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.read(Unknown Source)
at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:314)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:294)
at org.apache.commons.net.ftp.FTP.getReply(FTP.java:692)
at org.apache.commons.net.ftp.FTPClient.completePendingCommand(FTPClient.java:1813)
at org.apache.commons.net.ftp.FTPClient._retrieveFile(FTPClient.java:1885)
at org.apache.commons.net.ftp.FTPClient.retrieveFile(FTPClient.java:1845)
at org.apache.camel.component.file.remote.FtpOperations.retrieveFileToStreamInBody(FtpOperations.java:367)
Solution:
As added in Upadete 1, I do not use pollEnrich(). The route cannot be started from a timer now but it works. So I will close the question. I really like the idea of the idempotent consumer (unrelated to the original question).
Don't forget that a pollEnrich delete the oldExchange and keep the newExchange.
So you don't want to do a from("ftp").pollEnrich("timer") but a from("timer").pollEnrich("ftp").
See the Solution at the end of the question.