flink Connection reset by peer - apache-flink

I have a Flink streaming job that failed, and I got the log below. Can anyone tell me how to solve the problem?
It sometimes fails after running for a day, and sometimes after only a few hours.
09:30:25 948 INFO (org.apache.flink.runtime.executiongraph.ExecutionGraph:1240) - TriggerWindow(TumblingProcessingTimeWindows(600000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer#ece0f926}, ProcessingTimeTrigger(), WindowedStream.process(WindowedStream.scala:563)) -> Filter -> Filter -> Map (40/48) (19ea993ced2b161422c345c9b633853a) switched from RUNNING to FAILED.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Lost connection to task manager . This indicates that the remote task manager was lost.
at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:146)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
at org.apache.flink.shaded.netty4.io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253)
at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.flink.shaded.netty4.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
... 6 more

I ended up finding the root cause in the JobManager log:
- Closing TaskExecutor connection container_e06_1554425226316_0158_01_000024 because: Container [pid=14446,containerID=container_e06_1554425226316_0158_01_000024] is running beyond physical memory limits. Current usage: 12.5 GB of 12.5 GB physical memory used; 14.7 GB of 26.2 GB virtual memory used. Killing container.
so I increased the TaskManager memory.
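For reference, here is a minimal sketch of what "increased the TaskManager memory" can look like in flink-conf.yaml. The exact keys depend on the Flink version, and the numbers are examples only, not recommendations:

taskmanager.heap.size: 16384m          # Flink 1.8/1.9 key; older releases use taskmanager.heap.mb, Flink 1.10+ uses taskmanager.memory.process.size
containerized.heap-cutoff-ratio: 0.3   # on YARN, reserve extra head-room for off-heap/native memory (e.g. RocksDB state backend)

Since the YARN log shows the container being killed for exceeding its physical memory limit, raising only the JVM heap may not be enough; giving the container more total memory, or more cut-off head-room for native allocations, is what usually stops the kills.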

Related

Error occurred while executing a write operation to database 'component' due to limited free space on the disk (1759 MB)

I am getting a database error while creating tasks in Nexus Repository Manager; the logs show the following.
Error Log
2022-09-23 09:57:34,637+0000 ERROR [status-delayed-tasks-2-thread-1] *SYSTEM com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPooled - $ANSI{green {db=component}} Error on transaction commit `52E3D568`
com.orientechnologies.orient.core.exception.OLowDiskSpaceException: Error occurred while executing a write operation to database 'component' due to limited free space on the disk (1751 MB). The database is now working in read-only mode. Please close the database (or stop OrientDB), make room on your hard drive and then reopen the database. The minimal required space is 4096 MB. Required space is now set to 4096MB (you can change it by setting parameter storage.diskCache.diskFreeSpaceLimit) .
DB name="component"
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkLowDiskSpaceRequestsAndReadOnlyConditions(OAbstractPaginatedStorage.java:5073)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1729)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:541)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:99)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2908)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2870)
at org.sonatype.nexus.orient.transaction.OrientTransaction.commit(OrientTransaction.java:74)
at org.sonatype.nexus.transaction.TransactionalWrapper.proceedWithTransaction(TransactionalWrapper.java:69)
at org.sonatype.nexus.transaction.Operations.proceedWithTransaction(Operations.java:232)
at org.sonatype.nexus.transaction.Operations.transactional(Operations.java:223)
at org.sonatype.nexus.transaction.Operations.run(Operations.java:175)
at org.sonatype.nexus.orient.transaction.OrientOperations.run(OrientOperations.java:62)
at org.sonatype.nexus.orient.internal.status.OrientStatusHealthCheckStore.checkWritable(OrientStatusHealthCheckStore.java:82)
at org.sonatype.nexus.orient.internal.status.OrientStatusHealthCheckStore$$EnhancerByGuice$$180293120.GUICE$TRAMPOLINE(<generated>)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:74)
at org.sonatype.nexus.common.stateguard.MethodInvocationAction.run(MethodInvocationAction.java:39)
at org.sonatype.nexus.common.stateguard.StateGuard$GuardImpl.run(StateGuard.java:272)
at org.sonatype.nexus.common.stateguard.GuardedInterceptor.invoke(GuardedInterceptor.java:54)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:75)
at com.google.inject.internal.InterceptorStackCallback.invoke(InterceptorStackCallback.java:55)
at org.sonatype.nexus.orient.internal.status.OrientStatusHealthCheckStore$$EnhancerByGuice$$180293120.checkWritable(<generated>)
at org.sonatype.nexus.orient.internal.freeze.OrientFreezeService.checkWritable(OrientFreezeService.java:119)
at org.sonatype.nexus.thread.DatabaseStatusDelayedExecutor.lambda$1(DatabaseStatusDelayedExecutor.java:103)
at org.sonatype.nexus.thread.DatabaseStatusDelayedExecutor.lambda$0(DatabaseStatusDelayedExecutor.java:90)
at org.sonatype.nexus.thread.internal.MDCAwareRunnable.run(MDCAwareRunnable.java:40)
at org.apache.shiro.subject.support.SubjectRunnable.doRun(SubjectRunnable.java:120)
at org.apache.shiro.subject.support.SubjectRunnable.run(SubjectRunnable.java:108)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Does anyone know what is wrong or what may be causing this problem?
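As the exception text itself says, OrientDB switched the database to read-only because the disk it lives on has less than the required 4096 MB free. A minimal sketch of the two usual remedies; the data-directory path and the placement of the JVM property in bin/nexus.vmoptions are assumptions for illustration, not verified against your install:

# 1) Check and free space on the volume that holds the Nexus data directory
df -h /path/to/sonatype-work
# delete old logs / tmp files or move blob stores until more than 4 GB is free

# 2) Or lower the threshold named in the error via an OrientDB system property,
#    e.g. by adding this line to bin/nexus.vmoptions and restarting Nexus
-Dstorage.diskCache.diskFreeSpaceLimit=2048

Freeing disk space is the safer option; lowering the limit only postpones the problem.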

Cassandra failed to connect

I'm a newbie with Apache Cassandra. The tutorial video says to type bin/nodetool status to check the status of the node, but when I try it, the terminal returns
Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection
refused (Connection refused)'.
I tried changing JVM_OPTS to "$JVM_OPTS -Djava.rmi.server.hostname=localhost" in cassandra-env.sh,
but I still can't connect.
What should I do to fix this error?
Debug logs:
DEBUG [main] 2017-01-21 13:57:48,095 ColumnFamilyStore.java:881 - Enqueuing flush of local: 38.338KiB (0%) on-heap, 0.000KiB (0%) off-heap
DEBUG [PerDiskMemtableFlushWriter_0:1] 2017-01-21 13:57:48,167 Memtable.java:435 - Writing Memtable-local#858986260(8.879KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
DEBUG [PerDiskMemtableFlushWriter_0:1] 2017-01-21 13:57:48,168 Memtable.java:464 - Completed flushing /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db (5.367KiB) for commitlog position CommitLogPosition(segmentId=1484978256521, position=32861)
DEBUG [MemtableFlushWriter:1] 2017-01-21 13:57:48,471 ColumnFamilyStore.java:1184 - Flushed to [BigTableReader(path='/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db')] (1 sstables, 9.527KiB), biggest 9.527KiB, smallest 9.527KiB
DEBUG [CompactionExecutor:1] 2017-01-21 13:57:48,472 CompactionTask.java:150 - Compacting (896b3470-df9e-11e6-9508-7dc463a45cc9) [/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-53-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-54-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-55-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db:level=0, ]
DEBUG [main] 2017-01-21 13:57:48,539 StorageService.java:2084 - Node localhost/127.0.0.1 state NORMAL, token [-1035692197905104867, -1103547951527719073, -1136980347732340590, -1150272208899529050, -1184340318934652250, -1251847845785777189, -1355083122390358187,
INFO [main] 2017-01-21 13:57:48,539 StorageService.java:2087 - Node localhost/127.0.0.1 state jump to NORMAL
DEBUG [main] 2017-01-21 13:57:48,545 StorageService.java:1336 - NORMAL
DEBUG [PendingRangeCalculator:1] 2017-01-21 13:57:48,575 PendingRangeCalculatorService.java:66 - finished calculation for 3 keyspaces in 19ms
INFO [main] 2017-01-21 13:57:49,125 NativeTransportService.java:70 - Netty using native Epoll event loop
DEBUG [CompactionExecutor:1] 2017-01-21 13:57:49,286 CompactionTask.java:230 - Compacted (896b3470-df9e-11e6-9508-7dc463a45cc9) 4 sstables to [/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-57-big,] to level=0. 9.869KiB to 4.938KiB (~50% of original) in 812ms. Read Throughput = 12.145KiB/s, Write Throughput = 6.077KiB/s, Row Throughput = ~2/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [main] 2017-01-21 13:57:49,368 Server.java:159 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO [main] 2017-01-21 13:57:49,369 Server.java:160 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO [main] 2017-01-21 13:57:49,429 CassandraDaemon.java:521 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
Remove the JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost" change that you made.
Set listen_address and broadcast_rpc_address to the local IP (run ifconfig to find the IP address of the system); a sketch follows below.
Restart Cassandra.
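A minimal sketch of those cassandra.yaml settings; 192.168.1.10 is a placeholder for whatever address ifconfig reports on your machine:

# conf/cassandra.yaml
listen_address: 192.168.1.10
rpc_address: 192.168.1.10
broadcast_rpc_address: 192.168.1.10

Then restart with sudo service cassandra restart and retry bin/nodetool status.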
Find and uncomment the following line in cassandra-env.sh, changing it from
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="
to
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.0.0.1"
If you have any difficulty starting Cassandra, delete the DataStax commit logs:
C:\Program Files\DataStax-DDC\data\commitlog
Check your system memory. I had the same issue, but after increasing the RAM to 4 GB it works properly.
Connection refused can have multiple causes, the most common one being that the application you are trying to connect to is not running at all. Check this using
sudo service cassandra status # exit by pressing 'q'
If it says active (exited) in bold then Cassandra is not even running!
Check Cassandra's log for error messages:
grep -A2 ERROR /var/log/cassandra/system.log
Watch htop after you sudo service cassandra restart -- if it fills up all of your available memory, Cassandra will die without an error message. On my EC2 instance an empty Cassandra takes up about 1.3 GB of RAM, which is more than a t2.nano or t2.micro instance has available.

cassandra connection error: Unable to connect to any servers

Cassandra doesn't work on my VM.
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
When I use the command:
cassandra
......
INFO 07:55:31 Enqueuing flush of local: 578 (0%) on-heap, 0 (0%) off-heap
INFO 07:55:31 Writing Memtable-local#2014850649(0.081KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)
INFO 07:55:31 Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/tmp-la-305-big-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1448697324414, position=105487)
INFO 07:55:31 Enqueuing flush of local: 51468 (0%) on-heap, 0 (0%) off-heap
INFO 07:55:31 Writing Memtable-local#280469114(8.354KiB serialized bytes, 259 ops, 0%/0% of on/off-heap limit)
INFO 07:55:31 Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/tmp-la-306-big-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1448697324414, position=117466)
INFO 07:55:32 Node localhost/127.0.0.1 state jump to normal
INFO 07:55:32 Compacted (64dd8610-95a5-11e5-af1d-a752adc4283f) 4 sstables to [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-91-big,] to level=0. 20,658 bytes to 20,029 (~96% of original) in 2,376ms = 0.008039MB/s. 0 total partitions merged to 225. Partition merge counts were {1:225, }
Then cqlsh works:
cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]
Use HELP for help.
cqlsh>
But a few minutes later, cqlsh is down again:
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
Can anyone help me? Thanks in advance!
Sounds like the server is going down after a few minutes. You should check the logs for the reason.
I found the root cause: there was not enough memory. I created a Linux swap file, and then everything was OK.
How to add swap on Ubuntu:
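A minimal sketch, assuming a 2 GB swap file is enough for your workload (adjust the size as needed):

# create and enable a 2 GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab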

Error during node startup: Unable to start DSE server / Plugin activation failed / Cannot find core

I've been having these issues for quite a while already, but I ignored them initially because I could still start my nodes. However, one of these issues recently became serious enough that it now takes me a lot of tries to successfully start a node.
Issue #1: Unable to start DSE server / Plugin activation failed / Cannot find core
ERROR [main] 2015-01-28 03:30:40,058 DseDaemon.java (line 492) Unable to start DSE server.
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:135)
at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:480)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:509)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:659)
Caused by: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:284)
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:128)
... 3 more
Caused by: java.lang.IllegalStateException: Cannot find core: myks.mycf
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.doWaitForCore(SolrCoreResourceManager.java:742)
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.waitForCore(SolrCoreResourceManager.java:478)
at com.datastax.bdp.plugin.SolrContainerPlugin.waitForSecondaryIndexesLoading(SolrContainerPlugin.java:237)
at com.datastax.bdp.plugin.SolrContainerPlugin.onActivate(SolrContainerPlugin.java:98)
at com.datastax.bdp.plugin.PluginManager.initialize(PluginManager.java:334)
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:263)
... 4 more
INFO [Thread-3] 2015-01-28 03:30:40,059 DseDaemon.java (line 505) DSE shutting down...
INFO [StorageServiceShutdownHook] 2015-01-28 03:30:40,164 Gossiper.java (line 1307) Announcing shutdown
INFO [Thread-3] 2015-01-28 03:30:40,620 PluginManager.java (line 356) All plugins are stopped.
INFO [Thread-3] 2015-01-28 03:30:40,620 CassandraDaemon.java (line 463) Cassandra shutting down...
INFO [StorageServiceShutdownHook] 2015-01-28 03:30:42,165 MessagingService.java (line 701) Waiting for messaging service to quiesce
INFO [ACCEPT-/144.76.201.233] 2015-01-28 03:30:42,814 MessagingService.java (line 941) MessagingService has terminated the accept() thread
This exception started as a "mild" issue - mild because although it prevents a node from starting up when it happens, it usually takes me only one more try to successfully start the affected node. However, about two weeks ago, after not having restarted any of my nodes for quite a while, I discovered that I now need many more attempts (20+) to start a node.
From the stack trace, it looks like a timeout issue (in doWaitForCore()), but I cannot find a setting to increase the amount of time that DSE will wait for a core to load during startup before giving up. The core mentioned in the stack trace is always the same, and I assume that is because it is my biggest core (~1.4 billion records) and takes the longest time to load. But when I do manage to start the node successfully, there are no signs of errors; I can query the core like any other core.
--
There are two other issues that may or may not be related to the one above. Both of them always appear during startup, and unlike the first one, they do not cause a startup failure (i.e., they also appear when a node starts successfully).
Issue #2: Invalid Number: static
ERROR [searcherExecutor-67-thread-1] 2015-01-28 04:26:49,691 SolrException.java (line 124) org.apache.solr.common.SolrException: Invalid Number: static
at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
at org.apache.solr.schema.FieldType.getFieldQuery(FieldType.java:697)
at org.apache.solr.schema.TrieField.getFieldQuery(TrieField.java:343)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:741)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:545)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:153)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:135)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:183)
I looked at the data that I imported, and I couldn't find a supposedly numeric value that was incorrectly supplied as "static". In the Java application that I wrote to convert CSVs to SSTables, I cast all numeric values to int/long/double depending on the field type, so I honestly don't think this has anything to do with my data.
Issue #3: Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
WARN [SolrSecondaryIndex myks.mycf2 index initializer.] 2015-01-28 04:26:51,770 JmxMonitoredMap.java (line 256) Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
java.lang.RuntimeException: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:185)
at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:236)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.registerExtraMBeans(CassandraCoreContainer.java:679)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.register(CassandraCoreContainer.java:427)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doLoad(CassandraCoreContainer.java:757)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:162)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex$2.run(AbstractSolrSecondaryIndex.java:882)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:174)
... 16 more
I have absolutely no idea what this is.
--
Has anyone encountered these errors/exceptions/warnings before? What did you do?
Issue #1: The max waiting time to load a core was hard-coded at 1 minute. So your assumption is right: a very large core, or hundreds of cores, could prevent the node from starting due to the excessive time needed to load that particular core. In the next patch releases (4.5.6, 4.6.1) we address this issue by adding a new option, load_max_time_per_core, to dse.yaml. This option allows you to increase the max waiting time for core loading beyond the 1-minute default. For 500 cores, for example, you would need to increase load_max_time_per_core to about 3 minutes.
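A minimal dse.yaml sketch of that option; per the answer above the value is in minutes, and 10 is only an example you should size to how long your biggest core actually takes to load:

# dse.yaml (DSE 4.5.6 / 4.6.1 and later)
load_max_time_per_core: 10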
Issue #2: Unfortunately, I don't know what could be causing this. We would need further info about this to see why it's happening.
Issue #3: We are currently investigating what this could be.
Regarding issue #2, are you sure you don't have a QuerySenderListener with a wrong warmup query in your solrconfig?
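For context on that question: the stock Solr example solrconfig.xml ships a firstSearcher warm-up query whose text begins with the literal word "static", and if the default search field is a Trie numeric field, parsing that query fails with exactly this "Invalid Number: static" error (note the searcherExecutor thread in the stack trace). This is what such a listener looks like; the contents shown are the stock example, not necessarily your config:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
    </lst>
  </arr>
</listener>

If you find one of these, either remove it or change the warm-up query to something valid against your schema's default field.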

Solr error when doing full-import 250000 rows org.apache.solr.common.SolrException;null:org.eclipse.jetty.io.EofException

I am using Solr 4.6.0 with Jetty on Windows 7 Enterprise with a max heap of 2G. I can do a full-import of 200,000 records properly from the Solr Admin UI, but as soon as I increase it to 250,000 records, it starts giving me the error below:
webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=true&verbose=true&entity=files&command=full-import&debug=true&wt=json&rows=250000} {add=[8065121, 8065126, 8065128, 8065146, 8065963, 7838189, 7838186, 8065155, 8065174, 8065179, ... (250001 adds)],commit=} 0 2693420
org.apache.solr.common.SolrException; null:org.eclipse.jetty.io.EofException
at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
at su
Caused by: java.net.SocketException: Software caused connection abort: socket write error at java.net.SocketOutputStream.socketWrite0(Native Method)
at j......
org.apache.solr.common.SolrException;null:org.eclipse.jetty.io.EofException at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
org.eclipse.jetty.servlet.ServletHandler; /solr/dihdb/dataimport
java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
I changed example/etc/jetty.xml to set maxIdleTime=3500000.
I changed example/etc/webdefault.xml to set session-timeout=720.
I still keep getting the error above.
TIA,
Vijay
I changed the max heap to -Xmx5120M, and that seems to have fixed the issue for 500K and 1 million records. In essence, lack of memory was the cause of this misleading error (a start command with the larger heap is sketched below).
I also tried 100000 and 1800000 for the DataImportHandler.
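For reference, a sketch of how the larger heap can be applied when running the Solr 4.x example under Jetty; the flag values are examples, not tuned recommendations:

# from the Solr distribution's example directory
cd example
java -Xms512m -Xmx5120m -jar start.jar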
