Why is my Linux system experiencing the "Log I/O Error Detected. Shutting down filesystem" problem? - filesystems

system info:
[root@cpe ~]# uname -a
Linux cpe 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@cpe ~]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
[root@cpe ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/cl_cpe-root 34721216 34721196 20 100% /
devtmpfs 32843124 0 32843124 0% /dev
tmpfs 32855080 0 32855080 0% /dev/shm
tmpfs 32855080 942644 31912436 3% /run
tmpfs 32855080 0 32855080 0% /sys/fs/cgroup
/dev/mapper/cl_cpe-home 16947200 32944 16914256 1% /home
/dev/sda1 1038336 85484 952852 9% /boot
tmpfs 6571016 0 6571016 0% /run/user/0
[root@cpe ~]# mount | grep root
/dev/mapper/cl_cpe-root on / type xfs (rw,relatime,attr2,inode64,noquota)
Problem:
While the system is running, the errors below appear in the log. After that, no command can be run.
XFS (dm-0): metadata I/O error: block 0x2128c70 ("xlog_iodone") error 5 numblks 64
XFS (dm-0): Log I/O Error Detected. Shutting down filesystem
XFS (dm-0): Please umount the filesystem and rectify the problems(s)
XFS (dm-0): metadata I/O error: block 0x2128c7f ("xlog_iodone") error 5 numblks 64
XFS (dm-0): metadata I/O error: block 0x2128c82 ("xlog_iodone") error 5 numblks 64
Can anyone help me analyze or locate the problem?
Thanks in advance.

Related

Why is gdb aborting when I try to print a cosine?

Here's my interaction with it. I first start gdb, set a breakpoint, and run the program; gdb halts at the breakpoint. Then:
<code>
(gdb) b 89
Breakpoint 1 at 0x18cc: file parseGaia3DataToSqDeg.c, line 89.
(gdb) r
Starting program: /sixTB/astro/catalogs/gaia3/shSqDeg/fj
Star 0.0281655 -89.857 not found in 0 tries.
Breakpoint 1, main (argc=1, argv=0x7fffffffe5c8) at parseGaia3DataToSqDeg.c:89
89 exit(0); //TEST
(gdb) p cos(.333)
Abort
</code>
Gdb simply quits, and I'm back at my command line.
Data on gdb:
gdb --version
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
My machine:
total used free shared buff/cache available
Mem: 27Gi 3.1Gi 1.2Gi 123Mi 23Gi 23Gi
Swap: 976Mi 3.0Mi 973Mi
CPU family: 25
AMD Ryzen 5 5600G with Radeon Graphics
CPU MHz: 1397.031
CPU max MHz: 5000.6831
CPU min MHz: 1400.0000
BogoMIPS: 7784.71
CPU cache size: 512 KB
No brand USB OPTICAL MOUSE
Microsoft Corp. Microsoft Ergonomic Keyboard
Filesystem Size Used Avail Use% Mounted on
udev 14G 0 14G 0% /dev
tmpfs 2.8G 1.5M 2.8G 1% /run
/dev/nvme0n1p2 233G 22G 199G 10% /
tmpfs 14G 0 14G 0% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/nvme0n1p1 511M 3.5M 508M 1% /boot/efi
/dev/sdb1 3.6T 93G 3.4T 3% /fourTB
/dev/sda1 5.5T 2.3T 2.9T 45% /sixTB
tmpfs 2.8G 132K 2.8G 1% /run/user/1000
FWIW, in previous versions of gdb, I could always print a cosine or other math function.
OK, the solution from the comment above worked once, and then stopped working: cos(.333) aborted gdb again. Oh well... I'm wondering whether it's a gdb or Debian problem, or whether my machine's hardware is simply weird. I also left "install" out of the command in that comment. The command should read:
apt-get install gdb gdb-doc build-essential devscripts

Logwatch is too noisy

I've been using Logwatch for at least 12 years, but since I moved to Ubuntu 18.04 I've gotten so annoyed that the daily e-mail lists 37 /snap mounts in the filesystem check:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 439G 268G 149G 65% /
/dev/loop0 83M 83M 0 100% /snap/shotcut/119
/dev/loop1 234M 234M 0 100% /snap/gimp/322
/dev/loop3 291M 291M 0 100% /snap/vlc/1620
/dev/loop4 218M 218M 0 100% /snap/gnome-3-34-1804/60
/dev/loop2 256K 256K 0 100% /snap/gtk2-common-themes/13
etc...
I have looked for a solution before without luck. I've been looking through the Logwatch files, but I couldn't find any setting to do this.
I looked in /usr/share/logwatch/scripts/services/zz-disk_space, where the df command is:
df -h -x tmpfs -x devtmpfs -x udf -x iso9660
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 439G 268G 150G 65% /
/dev/loop0 83M 83M 0 100% /snap/shotcut/119
/dev/loop1 234M 234M 0 100% /snap/gimp/322
/dev/loop3 291M 291M 0 100% /snap/vlc/1620
etc... (37 of those in total)
By adding '-x squashfs' I get what I want:
df -h -x tmpfs -x devtmpfs -x udf -x iso9660 -x squashfs
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 439G 268G 150G 65% /
/dev/sda 3.6T 580G 2.9T 17% /backup
/dev/nvme0n1p1 511M 7.4M 504M 2% /boot/efi
//192.168.0.200/nas-office/backup 1.9T 723G 1.2T 39% /mnt/nas
Excellent!
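A minimal sketch of applying that change with a script, assuming the df line in the file reads exactly as quoted above. The helper name add_squashfs_exclude is made up for this illustration. It is probably safer to run it on a copy of zz-disk_space than on the packaged file, since a package update may overwrite edits under /usr/share; Logwatch is generally documented as also reading scripts from /etc/logwatch/scripts/services/, but check that against your version.

```shell
#!/bin/bash
# add_squashfs_exclude FILE   (hypothetical helper, not part of Logwatch)
# Appends "-x squashfs" to the df exclusions in FILE.
# Assumes FILE contains the "-x iso9660" option quoted in the post.
add_squashfs_exclude() {
    file=$1
    # Do nothing if the exclusion is already there, so reruns are harmless.
    if grep -q -- '-x squashfs' "$file"; then
        return 0
    fi
    # GNU sed in-place edit: extend the existing exclusion list.
    sed -i 's/-x iso9660/-x iso9660 -x squashfs/' "$file"
}
```

Usage would look like `add_squashfs_exclude ./zz-disk_space` on a copy of the script, followed by a diff against the original before putting it in place.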

How can I check the integrity of an extracted zImage?

$ binwalk -e linux_image.img
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 Android bootimg, kernel size: 6897653 bytes, kernel addr: 0x81C08000, ramdisk size: 5959520 bytes, ramdisk addr: 0x81C08000, product name: ""
2048 0x800 Linux kernel ARM boot executable zImage (little-endian)
18479 0x482F gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
6761720 0x672CF8 device tree image (dtb)
6883304 0x6907E8 Unix path: /dev/block/platform/soc/7824900.sdhci/by-name/vendor
6899712 0x694800 gzip compressed data, maximum compression, has original file name: "rootfs.cpio", from Unix, last modified: 2019-04-06 00:42:26
9706949 0x941DC5 MySQL ISAM compressed data file Version 11
$ dd if=linux_image.img of=vmlinuz bs=1 skip=2048 count=6897653
$ file vmlinuz
vmlinuz: Linux kernel ARM boot executable zImage (little-endian)
$ dd if=vmlinuz bs=1 skip=$(LC_ALL=C grep -a -b -o $'\x1f\x8b\x08\x00\x00\x00\x00\x00' vmlinuz-3.18.66-perf | head -n 1 | cut -d ':' -f 1) | zcat | grep -a 'Linux version'
Linux version 3.18.66 (build@test) (gcc version 4.9.3 (GCC) ) #1 SMP PREEMPT Fri Apr 1 13:16:33 PDT 2018
Running 'qemu-system-arm.exe -machine vexpress-a9 -cpu cortex-a7 -smp 4 -kernel vmlinuz' gives only a blank screen.
If you pull a random Arm Linux kernel (including Android) from somewhere and try to run it on anything other than the hardware it is intended to boot on, the expected result is that it crashes very early in bootup without being able to output anything to the screen or serial port; that is, you get a black screen and nothing happens. The most likely situation here is that your image is fine and not corrupt; it's just not built to run on the vexpress-a9 board you're running it on.
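To test the "not corrupt" part directly, the compressed payload inside the zImage can be checked with gzip itself. The sketch below reuses the idea from the question (search for the gzip magic bytes, then decompress from that offset). The function names are made up for this illustration, and it assumes GNU grep, tail and gzip.

```shell
#!/bin/bash
# first_gzip_offset FILE
# Prints the byte offset of the first gzip header (1f 8b 08) in FILE.
first_gzip_offset() {
    magic=$(printf '\037\213\010')   # gzip magic bytes plus the "deflate" method byte
    LC_ALL=C grep -a -b -o -F -- "$magic" "$1" | head -n 1 | cut -d ':' -f 1
}

# gzip_payload_ok FILE
# Decompresses from the first gzip header and discards the output.
# Succeeds on gzip exit status 0 (clean stream) or 2 (warning only, which
# is what gzip reports when extra data follows a good stream, as is
# expected inside a zImage). Status 1 (e.g. a CRC error) is a failure.
gzip_payload_ok() {
    off=$(first_gzip_offset "$1")
    [ -n "$off" ] || return 1
    tail -c +"$((off + 1))" "$1" | gzip -dc > /dev/null 2>&1
    rc=$?
    [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]
}
```

For example, `gzip_payload_ok vmlinuz && echo "payload decompresses"`. A passing check only shows that the compressed kernel is intact, not that it can boot on a given board.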
In the unlikely event that this really is a kernel built for the vexpress-a9, the next problem you have is that you haven't passed QEMU a device tree blob via the -dtb option. Modern Linux kernels don't hardcode all the information about the boards they can run on, but instead expect the bootloader (which is QEMU in this case) to pass them a data file which provides information about where all the devices are for the board. If you don't do that, then the result is the same as above: kernel crashes very early in bootup without being able to output any information, so black screen.
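As a rough sketch of what passing a device tree looks like, the snippet below assembles a QEMU command line with -dtb. The file names zImage and vexpress-v2p-ca9.dtb are placeholders, not files from the question; they stand for a kernel and device tree actually built for the vexpress-a9 board. The helper only prints the command so it can be inspected before running.

```shell
#!/bin/bash
# Placeholder file names; substitute a kernel and DTB built for vexpress-a9.
KERNEL=${KERNEL:-zImage}
DTB=${DTB:-vexpress-v2p-ca9.dtb}

# build_qemu_cmd KERNEL DTB   (hypothetical helper)
# Prints a qemu-system-arm command line that passes the device tree blob
# and routes the kernel console to the emulated serial port.
build_qemu_cmd() {
    printf '%s ' qemu-system-arm \
        -machine vexpress-a9 \
        -kernel "$1" \
        -dtb "$2" \
        -append "console=ttyAMA0" \
        -nographic
    printf '\n'
}

build_qemu_cmd "$KERNEL" "$DTB"
```

With -nographic and console=ttyAMA0, any boot messages the kernel manages to emit appear in the terminal, which makes an early crash easier to tell apart from a missing display.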

Unit Testing (assert.h) on Beaglebone Black (ARM) with Linux Headers installed on SD Card

Ok so here it goes:
I'm developing a DMA Kernel Driver on the Beaglebone Black (ARM Cortex-A8) - currently my file system looks like this (important for the question):
/dev/mmcblk1p2 1.7G 1.1G 511M 69% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 247M 4.0K 247M 1% /dev
tmpfs 50M 224K 50M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 248M 0 248M 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/mmcblk1p1 71M 20M 52M 28% /boot/uboot
/dev/mmcblk0p1 3.6G 571M 2.8G 17% /media/microsd
The rootfs and /boot sit on the eMMC NAND flash memory chip.
I mount /media/microsd to give myself an extra ~4GB of space.
My driver code base lives in /home/user/.
The Linux headers were too big to install on the rootfs (NAND flash), so I wrote a little script that installs them to the /media/microsd filesystem and then symbolically links /lib/modules/3.8.13-bone28/build to /media/microsd/usr/src/linux-3.8.13-bone28. In my makefile I then run: make -C /lib/modules/3.8.13-bone28/build M=$(PWD) modules, so that the driver is built where the Linux headers live (/media/microsd ...), and I can include them easily in my code with #include <linux/whatever.h>.
code Reference: GitHub - Mighty_DMA
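The symlink-and-build setup described above can be sketched as follows. The paths and kernel version are the ones given in the post; the function name and its optional root prefix are made up here so the link step can be tried in a scratch directory first.

```shell
#!/bin/bash
KVER=3.8.13-bone28

# link_kernel_headers [ROOT]   (hypothetical helper)
# Points /lib/modules/$KVER/build at the headers installed on the SD card.
# ROOT is an optional prefix for a dry run; leave it empty to act on the
# real filesystem (which needs root privileges).
link_kernel_headers() {
    root=${1:-}
    src="$root/media/microsd/usr/src/linux-$KVER"
    dst="$root/lib/modules/$KVER/build"
    mkdir -p "$(dirname "$dst")"
    # -n replaces an existing link instead of creating one inside its target.
    ln -sfn "$src" "$dst"
}

# Then, from the driver directory (not run here):
#   make -C /lib/modules/3.8.13-bone28/build M="$(pwd)" modules
```

Note that `$(PWD)` in the post is Makefile syntax; on a shell command line the equivalent is `$(pwd)` or `$PWD`.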
My issue comes when trying to build unit tests that use #include <assert.h>, a header file which lives in /usr/include. Since my Makefile uses the -C flag to change immediately to the SD card directory (to access the Linux headers and build there), Make tries to look for assert.h in /media/microsd/usr/include instead of /usr/include/.
What is the best way to build unit tests using either Check (check.h) or Assert (assert.h) when I cannot include them in my code because the file systems are split between the NAND flash and the SD card?
I have tried modifying AutoTools and the Makefiles to include the directory path /usr/include/, but because of the -C flag the path becomes relative. I also tried giving the direct path to the file with #include </usr/include/assert.h>, but that doesn't solve the problem recursively: the build then errors on the header files that assert.h itself includes, and so on.
Thank you in advance for your help, I really don't know what the best route to take is here.
<3,
-q

open() returns with "No such device" error, but there is such a device (linux)

I'm trying to use a somewhat old DAQ, and had to jump through a few hoops to get an old (circa 2004) device driver for it to compile (DTI-DT340 Linux-DAQ-PCI).
I've gotten to the point where it compiles, I can load the kernel module, it finds the card, and I can create the character devices using mknod.
But I can't seem to open these devices and keep getting errno 19 (ENODEV) 'No such device' when I try to
open("/dev/dt340/0",O_RDWR);
but mknod had no complaints about making it, and it's there:
# ls -l /dev/dt340/
total 0
crw-rw-r-- 1 root staff 250, 0 2009-04-23 11:02 0
crw-rw-r-- 1 root staff 250, 1 2009-04-23 11:02 1
crw-rw-r-- 1 root staff 250, 2 2009-04-23 11:02 2
crw-rw-r-- 1 root staff 250, 3 2009-04-23 11:02 3
Is there something I'm neglecting to do? What might be a reason open fails?
Here's the script I use to load the driver and make the devices.
#!/bin/bash
module="dt340"
device="dt340"
mode="664"
# invoke modprobe with all arguments we were passed
#/sbin/modprobe -t misc -lroot -f -s $module.o $* || exit 1
insmod $module.ko
# remove stale nodes
rm -f /dev/${device}/[0-3]
major=`awk "\\$2==\"$module\" {print \\$1}" /proc/devices`
mkdir -p /dev/${device}
mknod /dev/${device}/0 c $major 0
mknod /dev/${device}/1 c $major 1
mknod /dev/${device}/2 c $major 2
mknod /dev/${device}/3 c $major 3
# give appropriate group/permissions, and change the group
# not all distributions have staff; some have "users" instead
group="staff"
grep '^staff:' /etc/group > /dev/null || group="users"
chgrp $group /dev/${device}/[0-3]
chmod $mode /dev/${device}/[0-3]
Some additional info:
#grep dt340 /proc/devices
250 dt340
# lsmod | grep dt340
dt340 21516 0
# tail /var/log/messages
Apr 23 11:59:26 ve kernel: [ 412.862139] dt340 0000:03:01.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
Apr 23 11:59:26 ve kernel: [ 412.862362] dt340: In function dt340_init_one:
Apr 23 11:59:26 ve kernel: [ 412.862363] Device DT340 Rev 0x0 detected at address 0xfebf0000
#lspci | grep 340
03:01.0 Multimedia controller: Data Translation DT340
ANSWER: A printk confirmed that the -ENODEV was returned from inside the driver's open(). Following an old-style
while ((pdev = pci_find_device(PCI_VENDOR_ID_DTI, PCI_ANY_ID, pdev)))
loop (pci_find_device is deprecated), the check if(!pdev) ends up true, and open() returns -ENODEV.
I'm inching closer - I guess I have to work through and update the pci code to use more modern mechanisms...
If the device shows up in /proc/devices, and you're sure you've got the number right in mknod, then the driver itself is refusing the open. The driver can return any error code from open() - including "no such device", which it might if it discovered a problem initialising the hardware.
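The "number right in mknod" part can be checked mechanically. The sketch below compares the major number registered in /proc/devices with the major number stored in a device node. Both function names are made up for this illustration, and node_major relies on GNU stat.

```shell
#!/bin/bash
# major_of NAME [FILE]   (hypothetical helper)
# Prints the major number registered for driver NAME.
# FILE defaults to /proc/devices; passing a saved copy also works.
major_of() {
    awk -v name="$1" '$2 == name { print $1; exit }' "${2:-/proc/devices}"
}

# node_major PATH   (hypothetical helper)
# Prints the major number of a device node in decimal.
# GNU stat's %t format prints the major number in hex, so convert it.
node_major() {
    printf '%d\n' "0x$(stat -c '%t' "$1")"
}
```

For the setup in the question, `[ "$(major_of dt340)" = "$(node_major /dev/dt340/0)" ] && echo "major numbers match"` would confirm that the node points at the loaded driver. If they match and open() still fails, the error is coming from the driver's own open routine.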
I'd guess it is a problem in the driver, check the open function.
It shows up in /proc/devices, so all the generic device stuff seems to be ok.
mknod doesn't care whether there is a device corresponding to the given major/minor numbers. Are you sure insmod is installing your module? What does lsmod tell you?
I'm unfamiliar with having to add the ".ko" extension. Is that something specific to your device driver?
Check with lspci and make sure the hardware is properly initialized. If your system supports hotplug, pci_find_device won't work. The problem with it is reference counting: it hands back the device without taking a reference on it. The best way to deal with this and learn is to dissect the API. Best of luck!
