Cassandra nodetool failed to connect to '127.0.0.1:7199' - Connection refused

I'm a newbie with Apache Cassandra. A tutorial video says to type bin/nodetool status to check the status of a node, but when I run it, the terminal returns:
Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection
refused (Connection refused)'.
I tried setting JVM_OPTS to "$JVM_OPTS -Djava.rmi.server.hostname=localhost" in cassandra-env.sh, but I still can't connect.
What should I do to fix this error?
Debug logs:
DEBUG [main] 2017-01-21 13:57:48,095 ColumnFamilyStore.java:881 - Enqueuing flush of local: 38.338KiB (0%) on-heap, 0.000KiB (0%) off-heap
DEBUG [PerDiskMemtableFlushWriter_0:1] 2017-01-21 13:57:48,167 Memtable.java:435 - Writing Memtable-local#858986260(8.879KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
DEBUG [PerDiskMemtableFlushWriter_0:1] 2017-01-21 13:57:48,168 Memtable.java:464 - Completed flushing /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db (5.367KiB) for commitlog position CommitLogPosition(segmentId=1484978256521, position=32861)
DEBUG [MemtableFlushWriter:1] 2017-01-21 13:57:48,471 ColumnFamilyStore.java:1184 - Flushed to [BigTableReader(path='/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db')] (1 sstables, 9.527KiB), biggest 9.527KiB, smallest 9.527KiB
DEBUG [CompactionExecutor:1] 2017-01-21 13:57:48,472 CompactionTask.java:150 - Compacting (896b3470-df9e-11e6-9508-7dc463a45cc9) [/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-53-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-54-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-55-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db:level=0, ]
DEBUG [main] 2017-01-21 13:57:48,539 StorageService.java:2084 - Node localhost/127.0.0.1 state NORMAL, token [-1035692197905104867, -1103547951527719073, -1136980347732340590, -1150272208899529050, -1184340318934652250, -1251847845785777189, -1355083122390358187,
INFO [main] 2017-01-21 13:57:48,539 StorageService.java:2087 - Node localhost/127.0.0.1 state jump to NORMAL
DEBUG [main] 2017-01-21 13:57:48,545 StorageService.java:1336 - NORMAL
DEBUG [PendingRangeCalculator:1] 2017-01-21 13:57:48,575 PendingRangeCalculatorService.java:66 - finished calculation for 3 keyspaces in 19ms
INFO [main] 2017-01-21 13:57:49,125 NativeTransportService.java:70 - Netty using native Epoll event loop
DEBUG [CompactionExecutor:1] 2017-01-21 13:57:49,286 CompactionTask.java:230 - Compacted (896b3470-df9e-11e6-9508-7dc463a45cc9) 4 sstables to [/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-57-big,] to level=0. 9.869KiB to 4.938KiB (~50% of original) in 812ms. Read Throughput = 12.145KiB/s, Write Throughput = 6.077KiB/s, Row Throughput = ~2/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [main] 2017-01-21 13:57:49,368 Server.java:159 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO [main] 2017-01-21 13:57:49,369 Server.java:160 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO [main] 2017-01-21 13:57:49,429 CassandraDaemon.java:521 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it

Remove the JVM_OPTS "$JVM_OPTS -Djava.rmi.server.hostname=localhost" change you made.
Set listen_address and broadcast_rpc_address in cassandra.yaml to the machine's local IP (the address reported by ifconfig).
Restart Cassandra (see the sketch below).
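A minimal sketch of steps 2 and 3, assuming a package-style install where Cassandra runs as a service (with a tarball install like the one in the question, stop the process and re-run bin/cassandra instead); the IP address is illustrative:

# cassandra.yaml -- use the machine's own IP as reported by ifconfig, e.g.:
#   listen_address: 192.168.1.10
#   broadcast_rpc_address: 192.168.1.10
# then restart and re-check the node:
sudo service cassandra restart
bin/nodetool status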

Find and uncomment the following line in cassandra-env.sh, changing it from
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="
to
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.0.0.1"
If you face any difficulty starting Cassandra, delete the DataStax commit logs at
C:\Program Files\DataStax-DDC\data\commitlog

Check your system memory. I had the same issue, and after increasing the RAM to 4 GB it worked properly.

Connection refused can have multiple causes, the most common one being that the application you are trying to connect to is not running. Check this using
sudo service cassandra status # exit by pressing 'q'
If it says active (exited) in bold then Cassandra is not even running!
Check Cassandra's log for error messages:
grep -A2 ERROR /var/log/cassandra/system.log
Watch htop after you sudo service cassandra restart -- if it fills up all of your available memory, Cassandra will die without an error message. On my EC2 instance an empty Cassandra takes up about 1.3 GB of RAM, which would be too little for a t2.nano or t2.micro instance.
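Two quick additional checks that pair with the above (a sketch; assumes iproute2's ss is available and a package-style install):

ss -ltnp | grep -E ':(7199|9042)'   # is anything actually listening on the JMX (7199) / CQL (9042) ports?
free -h                             # is there enough free memory left for Cassandra's JVM heap?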

Related

VOLTTRON actuator agent RPC revert not working

I have a BACnet system for HVAC controls where I am using the VOLTTRON actuator agent to write a value of 2 at BACnet priority 10, which works well.
result = self.vip.rpc.call('platform.actuator', 'set_multiple_points', self.core.identity, set_multi_topic_values_master).get(timeout=20)
_log.debug(f'*** [Setter Agent INFO] *** - set_multiple_points ON ALL VAVs WRITE SUCCESS!')
Then the system sleeps for some time period for testing purposes:
_log.debug(f'*** [Setter Agent INFO] *** - SETTING UP GEVENT SLEEP!')
gevent.sleep(120)
_log.debug(f'*** [Setter Agent INFO] *** - GEVENT SLEEP DONE!')
After the gevent sleep, I run into issues with the revert not working. The code below executes just fine, but a BACnet scanning tool shows that the priority 10 value of 2 is still present on the HVAC controls, as if the revert point isn't doing anything.
for device in revert_topic_devices_jci:
    response = self.vip.rpc.call('platform.actuator', 'revert_point', self.core.identity, topic_jci, self.jci_setpoint_topic).get(timeout=20)
    _log.debug(f'*** [Setter Agent INFO] *** - REVERT POINTS ON {device} SUCCESS!')
_log.debug(f'*** [Setter Agent INFO] *** - REVERT POINTS JCI DONE DEAL SUCCESS!')
One thing I notice is that the building automation writes occupancy/unoccupancy to the HVAC controls at BACnet priority 12. It is always either a 1 for occupancy or a 2 for unoccupancy.
What I am trying to do with VOLTTRON is write a value of 2 at BACnet priority 10, and then release to nothing on the revert. Could it be that the revert isn't doing anything because there was nothing to revert to? I was hoping that VOLTTRON could write at BACnet priority 10 and then just release. With a BACnet scan tool I can do the same thing: write at priority 10, then release priority 10 with a priority 10 write of null.
Should I just be writing at priority 12, the same as the building automation system, so VOLTTRON can revert back to whatever the building automation was doing?
I have a few observations:
In your revert loop, the third code block above, you're not actually changing the topic being passed to the RPC call. Each call uses topic_jci and self.jci_setpoint_topic, neither of which is defined or changed inside that block, so every iteration reverts the same point rather than one per device. It is likely worth setting some breakpoints and/or debug statements here to be sure that you're passing the correct topics to revert on.
Your use of priority appears to be consistent
with BACnet protocol specification, and with the VOLTTRON BACnet
driver implementation. We would not recommend that you attempt to
write at the same priority as an existing building automation
system.
The BACnet driver code will send a NULL (None) value in a "writeProperty"
service request when the "revert_point" function is called by the
Platform Driver. This functionality I am frankly not terribly
familiar with, but given that your scan tool performs the expected
revert functionality when passed a NULL value, I suspect this is the expected
way of performing a "revert to previous value" type function in BACnet protocol.
I do not have reason to believe that the behavior you're experiencing is the
result of a bug in the driver code base.
Overall, I suggest debugging the topics being passed in the "revert_point" RPC call.
I am having good luck reverting points by using set_multiple_points with None values.
Something like this:
self.jci_device_map = {
    'VMA-2-6': '27',
    'VMA-2-4': '29',
    'VMA-2-7': '30',
    'VMA-1-8': '6',
}
revert_multi_topic_values_master = []
set_multi_topic_values_master = []
for device in self.jci_device_map.values():
    topic_jci = '/'.join([self.building_topic, device])
    final_topic_jci = '/'.join([topic_jci, self.jci_setpoint_topic])
    # BACnet enum point for VAV occ
    # 1 == occ, 2 == unnoc
    # create a (topic, value) tuple and add it to our topic values
    set_multi_topic_values_master.append((final_topic_jci, self.unnoccupied_value))  # TO SET UNNOCUPIED
    revert_multi_topic_values_master.append((final_topic_jci, None))  # TO SET FOR REVERT
result = self.vip.rpc.call('platform.actuator', 'set_multiple_points', self.core.identity, revert_multi_topic_values_master).get(timeout=20)

Solr recovery mode

I am running a Solr 7.4 cluster with 2 nodes, 9 shards, and 2 replicas for each shard.
When one of the servers crashes, I see this message (Skipping download for _3nap.fnm because it already exists) in logs:
2019-04-16 09:20:21.333 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.fnm because it already exists
2019-04-16 09:20:35.265 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.dim because it already exists
2019-04-16 09:20:51.437 INFO (recoveryExecutor-4-thread-36-processing-n:192.168.1.2:4239_solr x:telegram_channel_post_archive_shard5_replica_n53 c:telegram_channel_post_archive s:shard5 r:core_node54) [c:telegram_channel_post_archive s:shard5 r:core_node54 x:telegram_channel_post_archive_shard5_replica_n53] o.a.s.h.IndexFetcher Skipping download for _3nap.si because it already exists
2019-04-16 09:21:00.528 INFO (qtp1543148593-32) [c:telegram_channel_post_archive s:shard20 r:core_node41 x:telegram_channel_post_archive_shard20_replica_n38] o.a.s.u.p.LogUpdateProcessorFactory [telegram_channel_post_archive_shard20_replica_n38] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=dedupe&distrib.from=http://192.168.1.1:4239/solr/telegram_channel_post_archive_shard20_replica_n83/&min_rf=2&wt=javabin&version=2}{add=[9734588300_4723 (1630961769251864576), 9734588300_4693 (1630961769253961728), 9734588300_4670 (1630961769255010304), 9734588300_4656 (1630961769255010305)]} 0 80197
How does the recovery process work in Solr?
Will it transfer all the documents in the shard, or only the broken parts?
I found this note in the document:
If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a synch process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.
but I don't understand this part; what does it mean?
If a replica is too far out of sync
The note just says that Solr will attempt to sync as little as possible, but if that's not possible - i.e. the replica is so far behind that the transaction log isn't usable any longer - the complete set of index files will be replicated to the replica. This takes longer than regular replication.
The message you're getting is that the file in question has already been replicated, so it doesn't have to be sent to the replica again.

SolrCloud becoming slow over time

I have a 3-node SolrCloud setup (replication factor 3), running Solr 6.0 on Ubuntu 14.04 on SSDs. A lot of indexing is taking place, with only soft commits. After some time, indexing speed becomes really slow, but when I restart the Solr service on the node that became slow, everything goes back to normal. The problem is that I need to guess which node is becoming slow.
I have 5 collections, but only one collection (the most heavily used) is getting slow. Total data size is 144G including tlogs.
Said core/collection is 99G including tlogs; the tlog itself is just 313M. Heap size is 16G, total memory is 32G, and data is stored on SSD. Every node is configured the same.
What appears to be strange is that I see literally hundreds or thousands of log lines per second on both slaves when this hits:
2016-09-16 10:00:30.476 INFO (qtp1190524793-46733) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [mycollection_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://192.168.0.3:8983/solr/mycollection_shard1_replica3/&wt=javabin&version=2}{add=[ka2PZAqO_ (1545622027473256450)]} 0 0
2016-09-16 10:00:30.477 INFO (qtp1190524793-46767) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [mycollection_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://192.168.0.3:8983/solr/mycollection_shard1_replica3/&wt=javabin&version=2}{add=[nlFpoYNt_ (1545622027474305024)]} 0 0
2016-09-16 10:00:30.477 INFO (qtp1190524793-46766) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [mycollection_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://192.168.0.3:8983/solr/mycollection_shard1_replica3/&wt=javabin&version=2}{add=[tclMjXH6_ (1545622027474305025), 98OPJ3EJ_ (1545622027476402176)]} 0 0
2016-09-16 10:00:30.478 INFO (qtp1190524793-46668) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [mycollection_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://192.168.0.3:8983/solr/mycollection_shard1_replica3/&wt=javabin&version=2}{add=[btceXK4M_ (1545622027475353600)]} 0 0
2016-09-16 10:00:30.479 INFO (qtp1190524793-46799) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [mycollection_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://192.168.0.3:8983/solr/mycollection_shard1_replica3/&wt=javabin&version=2}{add=[3ndK3HzB_ (1545622027476402177), riCqrwPE_ (1545622027477450753)]} 0 1
2016-09-16 10:00:30.479 INFO (qtp1190524793-46820) [c:mycollection s:shard1 r:core_node2 x:mycollection_shard1_replica1] o.a.s.u.p.LogUpdateProcessorFactory [mycollection_shard1_replica1] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://192.168.0.3:8983/solr/mycollection_shard1_replica3/&wt=javabin&version=2}{add=[wr5k3mfk_ (1545622027477450752)]} 0 0
In this case 192.168.0.3 is the master.
My workflow is that I insert batches of 2500 docs with ~10 threads at the same time, which works perfectly fine most of the time, but sometimes it becomes slow as described. Occasionally there are updates/indexing calls from other sources, but they account for less than a percent.
UPDATE
Complete config (output from Config API) is http://pastebin.com/GtUdGPLG
UPDATE 2
These are the command line args:
-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Dhost=192.168.0.1
-Djetty.home=/opt/solr/server
-Djetty.port=8983
-Dlog4j.configuration=file:/var/solr/log4j.properties
-Dsolr.install.dir=/opt/solr
-Dsolr.solr.home=/var/solr/data
-Duser.timezone=UTC
-DzkClientTimeout=15000
-DzkHost=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/var/solr/logs/solr_gc.log
-Xms16G
-Xmx16G
-Xss256k
-verbose:gc
UPDATE 3
Happened again, these are some Sematext Graphs:
(Sematext dashboard and GC graphs for the master and both secondaries; images omitted.)
UPDATE 4 (2018-01-10)
This is a quite old question, but I recently discovered that someone had installed a cryptocoin miner on all of my Solr machines using CVE-2017-12629, which I fixed by upgrading to 6.6.2.
If you're not sure whether your system is infiltrated, check the processes running as the solr user with ps aux | grep solr. If you see two or more processes, especially a non-Java process, you might be running a miner.
So you're seeing disk I/O hitting 100% during indexing with a high-write throughput application.
There are two major drivers of disk I/O with Solr indexing:
Flushing in-memory index segments to disk.
Merging disk segments into new larger segments.
If your indexer isn't directly calling commit as a part of the indexing process (and you should make sure it isn't), Solr will flush index segments to disk based on your current settings:
Every time your RAM buffer fills up ("ramBufferSizeMB":100.0)
Based on your 3 min hard commit policy ("maxTime":180000)
If your indexer isn't directly calling optimize as a part of the indexing process (and you should make sure it isn't), Solr will periodically merge index segments on disk based on your current settings (the default merge policy):
mergeFactor: 10, or roughly each time the number of on-disk index segments exceeds 10. (You can confirm these values with the quick check below.)
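A quick way to confirm what your node is actually using for these settings is to pull them from the Config API the question already references (a sketch; the collection name mycollection is illustrative):

curl -s 'http://localhost:8983/solr/mycollection/config' \
  | grep -E 'ramBufferSizeMB|maxTime|mergeFactor|maxMergeCount|maxThreadCount'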
Based on the way you've described your indexing process:
2500 doc batches per thread x 10 parallel threads
... you could probably get away with a larger RAM buffer, to yield larger initial index segments (that are then flushed to disk less frequently).
However the fact that your indexing process
works perfectly fine for most of the time but sometimes it becomes slow
... makes me wonder if you're just seeing the effect of a large merge happening in the background, and cannibalizing system resources needed for fast indexing at that moment.
Ideas
You could experiment with a larger mergeFactor (e.g. 25). This will reduce the frequency of background index segment merges, but not the resource drain when they happen. (Also, be aware that more index segments often translate to worse query performance.)
In the indexConfig, you can try overriding the default settings for the ConcurrentMergeScheduler to throttle the number of merges that can run at one time (maxMergeCount), and/or throttle the number of threads that can be used for merges (maxThreadCount), based on the system resources you're willing to make available; see the sketch after this list.
You could increase your ramBufferSizeMB. This will reduce the frequency of in-memory index segments being flushed to disk, also serving to slow down the merge cadence.
If you are not relying on Solr for durability, you'll want /var/solr/data pointing to a local SSD volume. If you're going over a network mount (this has been documented with Amazon's EBS), there is a significant write throughput penalty, up to 10x less than writing to ephemeral/local storage.
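If you decide to experiment with the overrides above, this is roughly how they might look in solrconfig.xml; the values are illustrative starting points rather than recommendations, and mergeFactor is the legacy shorthand the first idea refers to (newer Solr releases use mergePolicyFactory instead):

<indexConfig>
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">2</int>
  </mergeScheduler>
</indexConfig>

After editing, reload the collection so the changes take effect:

curl -s 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'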
Do you have the CPU load of each core of the master, and not only the combined CPU graph? What I have noticed when indexing with Solr is that if Xmx is too small (which could be the case with 144 GB of data and Xmx=16G), merging takes more and more time as the indexing progresses.
During merging, typically one core sits at 100% CPU while the other cores do nothing.
Your master's combined CPU graph looks like that: only about 20% combined load during those sequences. A per-core view makes this easy to spot (see below).
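To see per-core load rather than the combined graph, a quick sketch (mpstat comes from the sysstat package; alternatively run top and press 1 to toggle the per-CPU view):

mpstat -P ALL 5    # prints per-core utilisation every 5 seconds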
So, check that the merge factor is a reasonable value (between 10 and 20 or something) and potentially raise Xmx.
Those are the two things I would start playing with.
Question: do you have anything special in your analyzers (a custom tokenizer, etc.)?

cassandra connection error: Unable to connect to any servers

Cassandra doesn't work on my VM.
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
when I use the command:
cassandra
......
INFO 07:55:31 Enqueuing flush of local: 578 (0%) on-heap, 0 (0%) off-heap
INFO 07:55:31 Writing Memtable-local#2014850649(0.081KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)
INFO 07:55:31 Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/tmp-la-305-big-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1448697324414, position=105487)
INFO 07:55:31 Enqueuing flush of local: 51468 (0%) on-heap, 0 (0%) off-heap
INFO 07:55:31 Writing Memtable-local#280469114(8.354KiB serialized bytes, 259 ops, 0%/0% of on/off-heap limit)
INFO 07:55:31 Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/tmp-la-306-big-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1448697324414, position=117466)
INFO 07:55:32 Node localhost/127.0.0.1 state jump to normal
INFO 07:55:32 Compacted (64dd8610-95a5-11e5-af1d-a752adc4283f) 4 sstables to [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-91-big,] to level=0. 20,658 bytes to 20,029 (~96% of original) in 2,376ms = 0.008039MB/s. 0 total partitions merged to 225. Partition merge counts were {1:225, }
then cqlsh works:
cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]
Use HELP for help.
cqlsh>
but a few minutes later, cqlsh is down again:
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
Can anyone help me? Thanks in advance!
Sounds like the server is going down after a few minutes. You should check the logs for the reason.
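For example (paths assume a package install; the dmesg check catches the kernel OOM killer, which commonly stops Cassandra without a trace in its own logs on small VMs):

grep -iE 'ERROR|OutOfMemory' /var/log/cassandra/system.log | tail -n 20
dmesg | grep -i 'killed process'    # did the kernel OOM killer terminate the JVM?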
I found the root cause: there was not enough memory. I created Linux swap space, and then everything was OK.
See: how to add swap on Ubuntu.
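The standard Ubuntu steps, for reference (the 2G size is illustrative; pick what fits your VM):

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots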

Error during node startup: Unable to start DSE server / Plugin activation failed / Cannot find core

I've been having these issues for quite a while already, but I ignored them initially because I could still start my nodes. However, one of these issues recently became serious enough that it now takes me a lot of tries to successfully start a node.
Issue #1: Unable to start DSE server / Plugin activation failed / Cannot find core
ERROR [main] 2015-01-28 03:30:40,058 DseDaemon.java (line 492) Unable to start DSE server.
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:135)
at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:480)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:509)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:659)
Caused by: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:284)
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:128)
... 3 more
Caused by: java.lang.IllegalStateException: Cannot find core: myks.mycf
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.doWaitForCore(SolrCoreResourceManager.java:742)
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.waitForCore(SolrCoreResourceManager.java:478)
at com.datastax.bdp.plugin.SolrContainerPlugin.waitForSecondaryIndexesLoading(SolrContainerPlugin.java:237)
at com.datastax.bdp.plugin.SolrContainerPlugin.onActivate(SolrContainerPlugin.java:98)
at com.datastax.bdp.plugin.PluginManager.initialize(PluginManager.java:334)
at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:263)
... 4 more
INFO [Thread-3] 2015-01-28 03:30:40,059 DseDaemon.java (line 505) DSE shutting down...
INFO [StorageServiceShutdownHook] 2015-01-28 03:30:40,164 Gossiper.java (line 1307) Announcing shutdown
INFO [Thread-3] 2015-01-28 03:30:40,620 PluginManager.java (line 356) All plugins are stopped.
INFO [Thread-3] 2015-01-28 03:30:40,620 CassandraDaemon.java (line 463) Cassandra shutting down...
INFO [StorageServiceShutdownHook] 2015-01-28 03:30:42,165 MessagingService.java (line 701) Waiting for messaging service to quiesce
INFO [ACCEPT-/144.76.201.233] 2015-01-28 03:30:42,814 MessagingService.java (line 941) MessagingService has terminated the accept() thread
This exception started as a "mild" issue - mild because although it prevents a node from starting up when it happens, it usually took me only one more try to successfully start the affected node. However, about two weeks ago, after not having restarted any of my nodes for quite a while, I discovered that I now need a lot more attempts (20+) to start a node.
From the stack trace, it looks like a timeout issue (in doWaitForCore()); but I cannot find a setting to increase the amount of time that DSE will wait for a core to load during startup before giving up. The core mentioned in the stack trace is always the same, and I assume that this is because it is my biggest core (~1.4 billion records) and it takes the longest time to load. But when I manage to start the node successfully, there are no signs of errors - I can query the core like any other core.
--
There are two other issues that may or may not be related to the one above. Both of them always appear during startup, and unlike the first one, they do not cause a startup failure (i.e. they also appear when a node starts successfully).
Issue #2: Invalid Number: static
ERROR [searcherExecutor-67-thread-1] 2015-01-28 04:26:49,691 SolrException.java (line 124) org.apache.solr.common.SolrException: Invalid Number: static
at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
at org.apache.solr.schema.FieldType.getFieldQuery(FieldType.java:697)
at org.apache.solr.schema.TrieField.getFieldQuery(TrieField.java:343)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:741)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:545)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:153)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:135)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:183)
I looked at the data that I imported and I couldn't find a supposedly numeric value that was incorrectly supplied as "static". In the Java application that I wrote to convert CSVs to SSTables, I cast all numeric values to int/long/double depending on the field type, so I honestly don't think it has anything to do with my data.
Issue #3: Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
WARN [SolrSecondaryIndex myks.mycf2 index initializer.] 2015-01-28 04:26:51,770 JmxMonitoredMap.java (line 256) Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
java.lang.RuntimeException: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:185)
at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:236)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.registerExtraMBeans(CassandraCoreContainer.java:679)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.register(CassandraCoreContainer.java:427)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doLoad(CassandraCoreContainer.java:757)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:162)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex$2.run(AbstractSolrSecondaryIndex.java:882)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:174)
... 16 more
I have absolutely no idea what this is.
--
Has anyone encountered these errors/exceptions/warnings before? What did you do?
Issue #1: The max waiting time to load a core was hard-coded at 1 min, so your assumption is right: a very large core, or hundreds of cores, could prevent the node from starting because of the excessive time needed to load that particular core. In the next patch releases (4.5.6, 4.6.1) we address this by adding a new option, load_max_time_per_core, to dse.yaml. This option lets you increase the max waiting time for core loading, starting at 1 min. For 500 cores you would need to increase load_max_time_per_core to about 3 minutes, for example.
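For example, assuming a package install where DSE runs as the dse service (the value is in minutes, as described above, and 10 is just an illustrative choice for a very large core):

# dse.yaml (DSE 4.5.6 / 4.6.1 and later):
#   load_max_time_per_core: 10
sudo service dse restart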
Issue #2: Unfortunately, I don't know what could be causing this. We would need further info about this to see why it's happening.
Issue #3: We are currently investigating what this could be.
Regarding issue #2, are you sure you don't have a QuerySenderListener with a wrong warmup query in your solrconfig?
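One way to check, assuming you have the core's solrconfig.xml available on disk (with DSE Search you may need to export the core's resources first):

grep -n -B2 -A6 'QuerySenderListener' solrconfig.xml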
