We upgraded Cassandra (5+5 nodes) from 2.0.9 to 2.1.2 (binaries) and ran nodetool upgradesstables node by node via a bash script. Since then we have observed several problems:
On every node we see about 50 pending compaction tasks, and on one node more than 500. This has persisted for 5 days, ever since we started nodetool upgradesstables. Even though concurrent_compactors is set to 8, Cassandra never runs more than 3-4 compactions at the same time. The node with more than 500 pending tasks has about 11k files in one column family directory.
We have 2 SSD disks, but during compaction we see at most 10 MB/s of reads and 5 MB/s of writes, even with compaction_throughput_mb_per_sec set to 32, 64 or 256.
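As a quick sanity check (a minimal sketch, assuming nodetool is on the PATH of each node and local JMX is reachable), it is worth confirming what the compaction manager reports and whether the throttle value has actually taken effect:

nodetool compactionstats             # pending task count and compactions currently running
nodetool setcompactionthroughput 0   # 0 = unthrottled, to rule the throttle out
nodetool compactionstats             # re-check whether disk throughput changes afterwards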
During upgradesstables we got the following on some tables:
WARN [RMI TCP Connection(100)-10.64.72.34] 2014-12-21 23:53:18,953 ColumnFamilyStore.java:2492 - Unable to cancel in-progress compactions for reco_active_items_v1. Perhaps there is an unusually large row in progress somewhere, or the system is simply overloaded.
INFO [RMI TCP Connection(100)-10.64.72.34] 2014-12-21 23:53:18,953 CompactionManager.java:247 - Aborting operation on reco_prod.reco_active_items_v1 after failing to interrupt other compaction operations
nodetool is failing with:
Aborted upgrading sstables for atleast one column family in keyspace reco_prod, check server logs for more information.
On some nodes nodetool upgradesstables finished successfully, but we can still see old-format jb files in the column family directories.
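A rough way to see how many old-format sstables remain per table (a sketch, assuming the default data directory /var/lib/cassandra/data; adjust the path to your data_file_directories):

find /var/lib/cassandra/data -name '*-jb-*-Data.db' \
  | awk -F/ '{print $(NF-1)}' | sort | uniq -c | sort -rn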
nodetool upgradesstables on some nodes returns:
error: null
-- StackTrace --
java.lang.NullPointerException
at org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewStart(SSTableReader.java:952)
at org.apache.cassandra.io.sstable.SSTableRewriter.moveStarts(SSTableRewriter.java:250)
at org.apache.cassandra.io.sstable.SSTableRewriter.switchWriter(SSTableRewriter.java:300)
at org.apache.cassandra.io.sstable.SSTableRewriter.abort(SSTableRewriter.java:186)
at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:75)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at org.apache.cassandra.db.compaction.CompactionManager$4.execute(CompactionManager.java:340)
at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
This is our production environment (24/7), and we are seeing higher load on the nodes and higher read latency, sometimes over 1 second.
Any advice?
I am using Artemis 2.7.0.redhat-00056 (AMQ 5.11 I think)
One of the Camel routes was set up ages ago. It sends XML to a queue, which is then handled in a bean.
So it uses JMS-style code:
producer = session.createProducer(destination);
producer.setDeliveryMode(DeliveryMode.PERSISTENT);
...
TextMessage message = session.createTextMessage(xStream.toXML(corMessage));
producer.send(message);
This works 99% of the time, but today we started getting:
Caused by: java.lang.IndexOutOfBoundsException: Error reading in simpleString, length=6702096 is greater than readableBytes=963134
at org.apache.activemq.artemis.api.core.SimpleString.readSimpleString(SimpleString.java:183)
In addition to this, the warning:
AMQ212054: Destination address=... is blocked. If the system is configured to block make sure you consume messages on this configuration.
To fix this, I went into the Docker container, looked at our /amq/broker/data/large-messages folder, and moved all those files somewhere else. It threw a bunch of exceptions, but new messages seem to be going through now.
But I am still getting the AMQ212054 'destination blocked' warning.
So, how do I
a) get rid of the warning
b) fix this going forward?
I have looked at the docs, but I don't see anything in particular that would help. There's a minLargeMessageSize field, but would setting it make any difference? Would I then check the size of the XML before sending and use an if/else to either send with producer.send(message) or send it as a BytesMessage (as in sections 9.4/9.5 of the docs) if it's a 'large' message?
EDIT: (FROM INSIDE DOCKER)
...
<max-disk-usage>90</max-disk-usage>
...
[root@409a74d7eadd /]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-253:2-268789825-123 10G 2.8G 7.3G 28% /
tmpfs 64M 0 64M 0% /dev
tmpfs 12G 0 12G 0% /sys/fs/cgroup
/dev/mapper/centos-root 36G 14G 23G 37% /temp
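For reference, a small check from inside the container can help correlate the AMQ212054 warning with disk and address limits (a sketch; the data path is taken from the question, while the broker.xml location under /amq/broker/etc is an assumption):

df -h /amq/broker/data                       # overall usage vs the 90% max-disk-usage limit
du -sh /amq/broker/data/large-messages       # space held by large-message files
du -sh /amq/broker/data/paging 2>/dev/null   # paged messages for blocked addresses, if any
grep -E 'max-disk-usage|address-full-policy|max-size-bytes' /amq/broker/etc/broker.xml

If disk usage is well below max-disk-usage and the address is still blocked, the per-address max-size-bytes and address-full-policy settings are the next thing to look at.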
How can I track down this dump error?
And most importantly, which process is causing it?
What are the consequences?
It happens almost every weekend. See the SQL dump output below:
*Current time is 23:26:40 11/05/17.
=====================================================================
BugCheck Dump
=====================================================================
This file is generated by Microsoft SQL Server
version 13.0.4446.0
upon detection of fatal unexpected error. Please return this file,
the query or program that produced the bugcheck, the database and
the error log, and any other pertinent information with a Service Request.
Computer type is Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz.
Bios Version is VRTUAL - 5001223
Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz
4 X64 level 8664, 10 Mhz processor (s).
Windows NT 6.2 Build 9200 CSD .
Memory
MemoryLoad = 96%
Total Physical = 32767 MB
Available Physical = 994 MB
Total Page File = 39679 MB
Available Page File = 5602 MB
Total Virtual = 134217727 MB
Available Virtual = 134132460 MB
**Dump thread - spid = 0, EC = 0x000001DE6E277240
***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\LOG\SQLDump0006.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
* 11/05/17 23:26:40 spid 38
*
* Latch timeout
*
*
* *******************************************************************************
* -------------------------------------------------------------------------------
* Short Stack Dump*
You can use SQL Server Diagnostics (Preview), available as an extension for SSMS 17.1 onwards, to check for potential causes and available resolutions.
After installing it, you upload the dump and it suggests potential solutions or patches that may help. Ensure you upload the dump to a location near you.
You can also load the dump in WinDbg and dig into it yourself if you have the right symbols. Further, the event logs can show you more info.
Most of the time, stack dumps are caused by bugs. The best way to proceed with them is to raise a ticket with Microsoft.
Elapsed time: 0 hours 0 minutes 6 seconds. Internal database snapshot has split point LSN = 00014377:000000a5:0001 and first LSN = 00014377:000000a3:0001.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (master.
**Dump thread - spid = 0, EC = 0x0000022824F95600
You need to check the dump file, where the stack dump details are provided. We are sharing a small piece of it for analysis.
I have been using Zeppelin for the last 3 months and noticed this strange problem recently. Every morning I have to restart Zeppelin for it to work; otherwise paragraph execution goes into the pending state and never runs. I tried to dig deeper to find the problem. The state of the Zeppelin application in YARN is FINISHED. I checked the log and it shows the error below, but I couldn't make anything out of it.
2017-06-28 22:04:08,986 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56876 for container-id container_1498627544571_0001_01_000002: 1.2 GB of 4 GB physical memory used; 4.0 GB of 20 GB virtual memory used
2017-06-28 22:04:08,995 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56787 for container-id container_1498627544571_0001_01_000001: 330.2 MB of 1 GB physical memory used; 1.4 GB of 5 GB virtual memory used
2017-06-28 22:04:09,964 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1498627544571_0001_01_000002 is : 1
2017-06-28 22:04:09,965 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1498627544571_0001_01_000002 and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-06-28 22:04:09,972 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-06-28 22:04:09,972 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1498627544571_0001_01_000002
2017-06-28 22:04:09,972 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
I am the only user in that environment; no one else is using it, and there aren't any other processes running at that time either. I can't understand why this is happening.
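One thing that may help narrow it down (a sketch, assuming YARN log aggregation is enabled; the application id is derived from the container id in the log above, and the Zeppelin log path is an assumption):

yarn logs -applicationId application_1498627544571_0001 > app.log   # full logs of the container that exited with code 1
less app.log
ls "${ZEPPELIN_HOME:-/opt/zeppelin}/logs/"                           # Zeppelin server/interpreter logs are also worth a look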
My instance on Google Compute Engine is not booting up due to some boot order issues.
So I have created another instance and re-configured my machine.
My questions:
How can I handle these issues when I host some websites?
How can I recover my data from old disk?
logs
[ 0.348577] Key type trusted registered
[ 0.349232] Key type encrypted registered
[ 0.349769] AppArmor: AppArmor sha1 policy hashing enabled
[ 0.350351] ima: No TPM chip found, activating TPM-bypass!
[ 0.351070] evm: HMAC attrs: 0x1
[ 0.351549] Magic number: 11:333:138
[ 0.352077] block ram3: hash matches
[ 0.352550] rtc_cmos 00:00: setting system clock to 2015-12-19 17:06:53 UTC (1450544813)
[ 0.353492] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[ 0.354108] EDD information not available.
[ 0.536267] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[ 0.537862] md: Waiting for all devices to be available before autodetect
[ 0.538979] md: If you don't use raid, use raid=noautodetect
[ 0.539969] md: Autodetecting RAID arrays.
[ 0.540699] md: Scanned 0 and added 0 devices.
[ 0.541565] md: autorun ...
[ 0.542093] md: ... autorun DONE.
[ 0.542723] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
[ 0.543731] Please append a correct "root=" boot option; here are the available partitions:
[ 0.545011] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.546199] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-39-generic #44~14.04.1-Ubuntu
[ 0.547579] Hardware name: Google Google, BIOS Google 01/01/2011
[ 0.548728] ffffea00008ae140 ffff880024ee7db8 ffffffff817af92b 000000000000111e
[ 0.549004] ffffffff81a7c7c8 ffff880024ee7e38 ffffffff817a976b ffff880024ee7dd8
[ 0.549004] ffffffff00000010 ffff880024ee7e48 ffff880024ee7de8 ffff880024ee7e38
[ 0.549004] Call Trace:
[ 0.549004] [] dump_stack+0x45/0x57
[ 0.549004] [] panic+0xc1/0x1f5
[ 0.549004] [] mount_block_root+0x210/0x2a9
[ 0.549004] [] mount_root+0x54/0x58
[ 0.549004] [] prepare_namespace+0x16d/0x1a6
[ 0.549004] [] kernel_init_freeable+0x1f6/0x20b
[ 0.549004] [] ? initcall_blacklist+0xc0/0xc0
[ 0.549004] [] ? rest_init+0x80/0x80
[ 0.549004] [] kernel_init+0xe/0xf0
[ 0.549004] [] ret_from_fork+0x58/0x90
[ 0.549004] [] ? rest_init+0x80/0x80
[ 0.549004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 0.549004] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
What Causes This?
That is the million dollar question. After inspecting my GCE VM, I found out there were 14 different kernels installed, taking up several hundred MB of space. Most of the kernels didn't have a corresponding initrd.img file and were therefore not bootable (including 3.19.0-39-generic).
I certainly never went around trying to install random kernels, and once removed, they no longer appear as available upgrades, so I'm not sure what happened. Seriously, what happened?
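For anyone hitting the same thing, here is a quick sketch (run on the affected VM; assumes an Ubuntu/Debian-style image) to list installed kernels and spot the ones missing a matching initrd:

dpkg -l 'linux-image-*' | grep '^ii'       # installed kernel packages
for v in /boot/vmlinuz-*; do               # kernels with no initrd.img are the unbootable ones
  k=${v#/boot/vmlinuz-}
  [ -e "/boot/initrd.img-$k" ] || echo "missing initrd for $k"
done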
Edit: New response from Google Cloud Support.
I received another disconcerting response. This may explain the additional, errant kernels.
"On rare occasions, a VM needs to be migrated from one physical host to another. In such case, a kernel upgrade and security patches might be applied by Google."
1. "How can I handle these issues when I host some websites?"
My first instinct is to recommend using AWS instead of GCE. However, GCE is less expensive. Before doing any upgrades, make sure you take a snapshot, and try rebooting the server to see if the upgrades broke anything.
2. How can I recover my data from old disk?
Even Better - How to recover your instance...
After several back-and-forth emails, I finally received a response from support that allowed me to resolve the issue. Be mindful, you will have to change things to match your unique VM.
Take a snapshot of the disk first in case we need to roll back any of the changes below.
Edit the properties of the broken instance to disable this option: "Delete boot disk when instance is deleted"
Delete the broken instance.
IMPORTANT: ensure not to select the option to delete the boot disk. Otherwise, the disk will get removed permanently!!
Start up a new temporary instance.
Attach the broken disk (this will appear as /dev/sdb1) to the temporary instance
When the temporary instance is booted up, do the following:
In the temporary instance:
# Run fsck to fix any disk corruption issues
$ sudo fsck.ext4 -a /dev/sdb1
# Mount the disk from the broken vm
$ sudo mkdir /mnt/sdb
$ sudo mount /dev/sdb1 /mnt/sdb/ -t ext4
# Find out the UUID of the broken disk. In this case, the uuid of sdb1 is d9cae47b-328f-482a-a202-d0ba41926661
$ ls -alt /dev/disk/by-uuid/
lrwxrwxrwx. 1 root root 10 Jan 6 07:43 d9cae47b-328f-482a-a202-d0ba41926661 -> ../../sdb1
lrwxrwxrwx. 1 root root 10 Jan 6 05:39 a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 -> ../../sda1
# Update the UUID in grub.cfg (if necessary)
$ sudo vim /mnt/sdb/boot/grub/grub.cfg
Note: This ^^^ is where I deviated from the support instructions.
Instead of modifying all the boot entries to set root=UUID=[uuid character string], I looked for all the entries that set root=/dev/sda1 and deleted them. I also deleted every entry that didn't set an initrd.img file. The top boot entry with correct parameters in my case ended up being 3.19.0-31-generic. But yours may be different.
# Flush all changes to disk
$ sudo sync
# Shut down the temporary instance
$ sudo shutdown -h now
Finally, detach the HDD from the temporary instance, and create a new instance based off of the fixed disk. It will hopefully boot.
Assuming it does boot, you have a lot of work to do. If you have half as many unused kernels as me, then you might want to purge the unused ones (especially since some are likely missing a corresponding initrd.img file).
I used the second answer (the terminal-based one) in this askubuntu question to purge the other kernels.
Note: Make sure you don't purge the kernel you booted in with!
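A sketch of that purge, keeping the running kernel (assumes an apt-based Ubuntu image; review the list before removing anything):

current=$(uname -r)
dpkg -l 'linux-image-*' 'linux-image-extra-*' | awk '/^ii/{print $2}' | grep -v "$current"
# Review the list above, then purge the unwanted ones, e.g.:
# sudo apt-get purge linux-image-3.19.0-39-generic linux-image-extra-3.19.0-39-generic
sudo update-grub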
How to handle these issues when I host some websites?
I'm not sure how you got into this situation, but it would be nice to have additional information (see my comment above) to be able to understand what triggered this issue.
How to recover my data from old disk?
Attach and mount the disk
Assuming you did not delete the original disk when you deleted the instance, you can simply mount this disk from another VM to read the data from it. To do this:
attach the disk to another VM instance, e.g.,
gcloud compute instances attach-disk $INSTANCE --disk $DISK
mount the disk:
sudo mkdir -p /mnt/disks/[MNT_DIR]
sudo mount [OPTIONS] /dev/disk/by-id/google-[DISK_NAME] /mnt/disks/[MNT_DIR]
Note: you'll need to substitute appropriate values for the following (a concrete example follows this list):
MNT_DIR: directory
OPTIONS: options appropriate for your disk and filesystem
DISK_NAME: the id of the disk after you attach it to the VM
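For example, with hypothetical values (the disk is assumed to be attached as my-data-disk and to hold an ext4 filesystem on its first partition, hence the -part1 suffix; mounting read-only keeps the recovery non-destructive):

sudo mkdir -p /mnt/disks/olddisk
sudo mount -o ro /dev/disk/by-id/google-my-data-disk-part1 /mnt/disks/olddisk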
Unmounting and detaching the disk
When you are done using the disk, reverse the steps:
Note: Before you detach a non-root disk, unmount the disk first. Detaching a mounted disk might result in incomplete I/O operation and data corruption.
unmount the disk
sudo umount /dev/disk/by-id/google-[DISK_NAME]
detach the disk from the VM:
gcloud compute instances detach-disk $INSTANCE --device-name my-new-device
In my case grub's (/boot/grub/grub.cfg) first menuentry (3.19.0-51-generic) was missing an initrd entry and was unable to boot.
Upon further investigation, looking at dpkg for the specific kernel, it is marked as failed and unconfigured:
dpkg -l | grep 3.19.0-51-generic
iF linux-image-3.19.0-51-generic 3.19.0-51.58~14.04.1
iU linux-image-extra-3.19.0-51-generic 3.19.0-51.58~14.04.1
This all stemmed from the Ubuntu image supplied by Google having unattended-upgrades enabled. For some reason the initrd was killed when it was being built and something else came along and ran update-grub2.
unattended-upgrades-dpkg_2016-03-10_06:49:42.550403.log:update-initramfs: Generating /boot/initrd.img-3.19.0-51-generic
Killed
E: mkinitramfs failure cpio 141 xz -8 --check=crc32 137
unattended-upgrades-dpkg_2016-03-10_06:49:42.550403.log:update-initramfs: failed for /boot/initrd.img-3.19.0-51-generic with 1.
To work around the immediate problem, run:
dpkg --force-confold --configure -a
Although unattended-upgrades in theory is a great idea, having it enabled by default can have unattended consequences.
There are a few cases where the kernel fails to handle an initrd-less boot. Disable the GRUB_FORCE_PARTUUID option so that it boots with an initrd.
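A sketch of how to do that on an Ubuntu cloud image (the exact file name under /etc/default/grub.d/ varies; 40-force-partuuid.cfg is an assumption):

grep -r GRUB_FORCE_PARTUUID /etc/default/grub /etc/default/grub.d/   # find where it is set
sudo sed -i 's/^GRUB_FORCE_PARTUUID=/#GRUB_FORCE_PARTUUID=/' /etc/default/grub.d/40-force-partuuid.cfg
sudo update-grub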
The problem incident:
Our production system started denying services with the error message "Too many open files in system". Most services were affected, including the inability to start a new ssh session or even log in to a virtual console from the physical terminal. Luckily, one root ssh session was open, so we could interact with the system (moral: always keep one root session open!). As a side effect, some services (named, dbus-daemon, rsyslogd, avahi-daemon) saturated the CPU (100% load). The system also serves a large directory via NFS to a very busy client, which was backing up 50000 small files at the moment. Restarting all kinds of services and programs normalized their CPU behavior, but did not solve the "Too many open files in system" problem.
The suspected cause
Most likely, some program is leaking file handles. The culprit is probably my Tcl program, which also saturated the CPU (not normal). However, killing it did not help, and, most disturbingly, lsof did not reveal a large number of open files.
Some evidence
We had to reboot, so whatever information was collected is all we have.
root@xeon:~# cat /proc/sys/fs/file-max
205900
root@xeon:~# lsof
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,6 4096 2 /
init 1 root rtd DIR 8,6 4096 2 /
init 1 root txt REG 8,6 124704 7979050 /sbin/init
init 1 root mem REG 8,6 42580 5357606 /lib/i386-linux-gnu/libnss_files-2.13.so
init 1 root mem REG 8,6 243400 5357572 /lib/i386-linux-gnu/libdbus-1.so.3.5.4
...
A pretty normal list, definitely not 200K files, more like two hundred.
This is probably where the problem started:
less /var/log/syslog
Mar 27 06:54:01 xeon CRON[16084]: (CRON) error (grandchild #16090 failed with exit status 1)
Mar 27 06:54:21 xeon kernel: [8848865.426732] VFS: file-max limit 205900 reached
Mar 27 06:54:29 xeon postfix/master[1435]: warning: master_wakeup_timer_event: service pickup(public/pickup): Too many open files in system
Mar 27 06:54:29 xeon kernel: [8848873.611491] VFS: file-max limit 205900 reached
Mar 27 06:54:32 xeon kernel: [8848876.293525] VFS: file-max limit 205900 reached
netstat did not show noticeable anomalies either.
The man pages for ps and top do not indicate an ability to show open file count. Probably the problem will repeat itself after a few months (that was our uptime).
Any ideas on what else can be done to identify the open files?
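One option (a sketch, run as root) is to compare the kernel's own handle counter with a per-process count built from /proc; if the kernel counter is far above the per-process total, the handles are held in kernel space (for example by NFS) rather than by any process lsof can see:

cat /proc/sys/fs/file-nr              # allocated handles, free handles, limit
for p in /proc/[0-9]*; do             # per-process fd counts, worst offenders first
  n=$(ls "$p/fd" 2>/dev/null | wc -l)
  echo "$n ${p#/proc/} $(tr '\0' ' ' < "$p/cmdline" | cut -c1-60)"
done | sort -rn | head -20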
UPDATE
This question has changed its meaning after qehgt identified the likely cause.
Apart from the bug in NFS v4 code, I suspect there is a design limitation in Linux and kernel-leaked file handles can NOT be identified. Consequently, the original question transforms into:
"Who is responsible for file handles in the Linux kernel?" and "Where do I post that question?". The 1st answer was helpful, but I am willing to accept a better answer.
Probably the root cause is a bug in the NFSv4 implementation: https://stackoverflow.com/a/5205459/280758
The symptoms described there are similar.