How to find which process is leaking file handles in Linux?

The incident:
Our production system started denying service with the error message "Too many open files in system". Most services were affected; it was impossible to start a new ssh session, or even to log in on a virtual console from the physical terminal. Luckily, one root ssh session was open, so we could interact with the system (moral: always keep one root session open!). As a side effect, some services (named, dbus-daemon, rsyslogd, avahi-daemon) saturated the CPU (100% load). The system also serves a large directory via NFS to a very busy client, which was backing up 50000 small files at the time. Restarting all kinds of services and programs normalized their CPU behavior, but did not solve the "Too many open files in system" problem.
The suspected cause
Most likely, some program is leaking file handles. The culprit is probably my tcl program, which also saturated the CPU (not normal). However, killing it did not help and, most disturbingly, lsof did not reveal a large number of open files.
Some evidence
We had to reboot, so whatever information was collected is all we have.
root@xeon:~# cat /proc/sys/fs/file-max
205900
root@xeon:~# lsof
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,6 4096 2 /
init 1 root rtd DIR 8,6 4096 2 /
init 1 root txt REG 8,6 124704 7979050 /sbin/init
init 1 root mem REG 8,6 42580 5357606 /lib/i386-linux-gnu/libnss_files-2.13.so
init 1 root mem REG 8,6 243400 5357572 /lib/i386-linux-gnu/libdbus-1.so.3.5.4
...
A pretty normal list, definitely not 200K files, more like two hundred.
This is probably where the problem started:
less /var/log/syslog
Mar 27 06:54:01 xeon CRON[16084]: (CRON) error (grandchild #16090 failed with exit status 1)
Mar 27 06:54:21 xeon kernel: [8848865.426732] VFS: file-max limit 205900 reached
Mar 27 06:54:29 xeon postfix/master[1435]: warning: master_wakeup_timer_event: service pickup(public/pickup): Too many open files in system
Mar 27 06:54:29 xeon kernel: [8848873.611491] VFS: file-max limit 205900 reached
Mar 27 06:54:32 xeon kernel: [8848876.293525] VFS: file-max limit 205900 reached
netstat did not show noticeable anomalies either.
The man pages for ps and top do not indicate an ability to show open file count. Probably the problem will repeat itself after a few months (that was our uptime).
Any ideas on what else can be done to identify the open files?
UPDATE
The meaning of this question has changed since qehgt identified the likely cause.
Apart from the bug in NFS v4 code, I suspect there is a design limitation in Linux and kernel-leaked file handles can NOT be identified. Consequently, the original question transforms into:
"Who is responsible for file handles in the Linux kernel?" and "Where do I post that question?". The 1st answer was helpful, but I am willing to accept a better answer.

The root cause is probably a bug in the NFSv4 implementation: https://stackoverflow.com/a/5205459/280758
That question describes similar symptoms.
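To tell whether the handles are held by user processes or inside the kernel, one option is to compare the kernel's own counter with what is visible under /proc. The following is a minimal sketch, assuming a Linux /proc filesystem and root privileges; the function names are made up for illustration. The two numbers are not expected to match exactly (descriptors can share one open file, for instance after fork or dup), but an allocated count far above the visible total would suggest handles that no process owns, which is consistent with lsof showing nothing unusual.

```python
import os


def parse_file_nr(text):
    """Parse the three fields of /proc/sys/fs/file-nr.

    Returns (allocated, unused, maximum) file handle counts.
    """
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def count_fds_per_process(proc_root="/proc"):
    """Return {pid: open descriptor count} for every process we can inspect."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            counts[int(entry)] = len(os.listdir(os.path.join(proc_root, entry, "fd")))
        except OSError:
            # The process exited meanwhile, or we lack permission (run as root).
            continue
    return counts


def main():
    with open("/proc/sys/fs/file-nr") as f:
        allocated, _unused, maximum = parse_file_nr(f.read())
    counts = count_fds_per_process()
    print(f"kernel reports {allocated} handles allocated (limit {maximum})")
    print(f"descriptors visible under /proc/*/fd: {sum(counts.values())}")
    # Show the ten processes holding the most descriptors.
    for pid, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{n:8d}  pid {pid}")


if __name__ == "__main__" and os.path.exists("/proc/sys/fs/file-nr"):
    main()
```

Running this periodically (for example from cron) and logging the two totals would show whether the gap grows over time, long before file-max is reached.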

Does GDB have a userid?

I am trying to debug a C program using GDB, but when I try to run it from inside gdb I get the following error:
Note: the "Fatal Error" line is printed by the program itself.
gdb-peda$ run
Starting program: /home/masterdungeon/HTAOEBookPrograms/0x200/0x280/0x287/GameOfChance
**************** WELCOME to the GAME OF CHANCE *****************
This game will essentially tell you how lucky you are today ;)
---- New player ----
Please enter your name : user_gdb
[!!!] Fatal Error in register_user() while opening DATAFILE
: Permission denied
[Inferior 1 (process 10636) exited with code 0377]
Warning: not running
gdb-peda$
This program is a command-line game named "GameOfChance" (from the book HTAOE). Whenever a user runs the program, it first checks the user's UID to see whether that user is already registered as a player in the DATAFILE. If the DATAFILE has no entry for that UID (i.e. the player is not yet registered), the program creates a new player and asks for a username, registering the player under that UID and username. But I think GDB does not have a UID, since there is no entry for gdb in /etc/passwd. How do I make the program run while debugging and register GDB as a new player? Is it even possible?
The code looks like this :
12 #define DATAFILE "/var/gameofchance.data" // File to store user data
46 int main(){
//lines of code
53 uid = getuid(); // get current user_id i.e player_id
54 player_exists = get_player_data(uid); // returns -1 if player does not exist
55 //otherwise returns 0 and puts all player data into struct player
56
57 if(player_exists == -1) {
58 register_player(uid);
59 }
//lines of code
148 return 0;
149 } //end main()
314 void register_player(int uid){
//lines of code
327 fd = open(DATAFILE, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
329
330 if(fd==-1){
331 fatal(" in register_user() while opening DATAFILE\n");
332 }
//lines of code
344 } //end register_player
The permissions for DATAFILE are:
-rw------- 1 root masterdungeon 240 Apr 19 13:54 gameofchance.data
The permissions for the game executable GameOfChance are:
-rwsrwxr-x 1 root root 29064 Jan 4 19:45 GameOfChance
Another thing I couldn't understand is when I set a breakpoint at line 54 and check for value of uid I get 1000 as UID of GDB.
Breakpoint 16, main () at gameofchance.c:54
54 player_exists = get_player_data(uid); // returns -1 if player does not exist
gdb-peda$ x/wd &uid
0x7ffd4ed4aee8: 1000
How's it possible that GDB has userid of 1000 ? as there is no entry of gdb in /etc/passwd. 1000 is userid of masterdungeon.
Okay so it works when gdb is run using sudo gdb. But why do I have to run it as root to get it run nicely in GDB ?
Otherwise, in Bash the program runs successfully as user masterdungeon. Only in GDB does it need to be run as root.
Does GDB have a userid?
Yes. Every process that runs, including GDB processes, has both an effective UID and a real UID. Often these are the same. But you seem to have a misunderstanding. These do not describe the process itself. Rather they describe the user on whose behalf the process is running.
How's it possible that GDB has userid of 1000 ? as there is no entry of gdb in /etc/passwd. 1000 is userid of masterdungeon.
Because you're running gdb as user "masterdungeon", or as another user with the same UID number.
Okay so it works when gdb is run using sudo gdb. But why do I have to run it as root to get it run nicely in GDB ?
Your data file is accessible only to root:
-rw------- 1 root masterdungeon 240 Apr 19 13:54 gameofchance.data
When run directly, the program accommodates that by being root-owned and having its SUID bit set:
-rwsrwxr-x 1 root root 29064 Jan 4 19:45 GameOfChance
(note the "s" in the first triad of permission bits). That causes the program, when run directly, to run with the effective UID of root, even though root did not actually launch it. This is one of the cases where the effective and real UIDs differ. It is also a very poor use case for SUID, because SUID root programs present an existential security risk to the host system, and that risk is not justified for a game.
The risk would be much worse if the SUID bit were honored when the program is running under control of a debugger. A debugger can make arbitrary changes to program data and even binary code while the program is running, and that would present an easy vector for privilege escalation if SUID were honored in such contexts. Accordingly, the SUID bit on an executable has no effect when the program is run in a debugger. (See also Can gdb debug suid root programs?)
Thus, if you debug the program as a user other than root, it will not be able to open the data file, but if you use sudo to run the debugger then you obtain the needed privilege to access the data file through sudo, and the fact that the SUID bit on the executable is not honored is irrelevant.
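The distinction between the two IDs can be inspected directly. Below is a minimal sketch in Python (the C equivalents are getuid() and geteuid()); the helper name describe_ids is made up for illustration. Note that on Linux the SUID bit is ignored for interpreted scripts, so this only reports the IDs of the interpreter process and cannot reproduce the SUID behavior of the game binary itself.

```python
import os


def describe_ids(real_uid, effective_uid):
    """Summarise whose authority a process acts under."""
    if real_uid == effective_uid:
        return f"real and effective UID are both {real_uid}"
    return (f"real UID {real_uid}, effective UID {effective_uid}: "
            "permission checks use the effective UID")


if __name__ == "__main__":
    # getuid() identifies the user who launched the process; geteuid() is the
    # identity the kernel uses for permission checks such as opening a file.
    print(describe_ids(os.getuid(), os.geteuid()))
```

Printing the same two values from the C program, once when started from the shell and once under GDB, should make the difference visible: an effective UID of 0 in the first case and 1000 in the second.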
The best way to debug the program is in its build environment, before installation, such that it is owned by you and does not need (or have) its SUID bit set. This may require some manipulation of where or how it looks for its data file, which should also be owned by you.
As for how the program is installed, you have a tension between priorities:
- Programs available for all users to run should be owned by root and writable only by root, to make it difficult for other users to modify them or substitute a different program for them, both of which could lead to data breach and (further) privilege escalation.
- You apparently require that users running the game program be able to write to a shared data file. It's unclear what this file contains, but a shared high score list might be an example.
- But you do not (presumably) want to allow users to manipulate the data file arbitrarily, under their own authority, lest they cheat in some way, or worse.
The easiest approach would be to give each user their own, unshared data file, created at need by the program within the user's home directory, and accessible to that user. Then you don't need to mess with SUID / SGID, nor do you need to have any concern about users interfering with each other. Sure, they may be able to cheat, but it will affect only them. And you will be able to debug the program with GDB.
If it is essential that the data file be both shared among program users and writable (via the program) to all of them, then a better approach than making the program SUID-root would be to make it SGID-some_group_not_root, and make the data file writable by that group. Better still, avoid the SGID bit, and just require users to be members of the chosen group in order to use the program. Do note that SGID is not honored when debugging, either.
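The per-user data file approach described above can be sketched as follows. This is an illustration in Python rather than the game's C code; the file name .gameofchance.data and both function names are hypothetical. The flags and the 0o600 mode mirror the open() call shown in the question (S_IRUSR|S_IWUSR).

```python
import os


def user_data_path(home=None, filename=".gameofchance.data"):
    """Return the path of a per-user data file (the file name is hypothetical)."""
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, filename)


def append_record(path, record):
    """Append bytes to the data file, creating it owner-only if it is missing."""
    # Same flags as the open() call in the question; 0o600 = S_IRUSR|S_IWUSR.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
    finally:
        os.close(fd)
```

Because the file lives in the invoking user's home directory and is owned by that user, no SUID or SGID bit is needed, and the program behaves the same inside and outside a debugger.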

Copy files from emmc via uboot to tftp-server

I have a device that does not boot into Linux.
It hangs at "Starting kernel ...".
To get a better grip on what goes wrong, I thought it would be nice to get access to the Linux logs.
I can access the userland from U-Boot via "ls":
Zynq> ls mmc 0:2
ostree/deploy/poky/deploy/9d325972b955e6584d3fad0a7ff1bf1a8.0/etc
<DIR> 2048 .
<DIR> 1024 ..
<DIR> 1024 modprobe.d
0 motd
<DIR> 1024 xdg
<DIR> 1024 logrotate.d
58 rpcbind.conf
1633 inputrc
828 mke2fs.conf
15 timestamp
10929 login.defs
324 issue
<DIR> 1024 sudoers.d
etc ...
Now I'm looking for a way to copy files from the userland to another device (a remote PC).
I learned about "tftpput", which is available in U-Boot.
My problem is that "tftpput" expects a save address and a size, but I don't know how to get that information.
tftpput - TFTP put command, for uploading files to a server
Usage:
tftpput Address Size [[hostIPaddr:]filename]
I was not able to find good documentation on "tftpput". Maybe someone has a link for me, or can provide a short how-to?
Thanks in advance
To answer the specific question: you need a TFTP server on another machine. When you use 'load' to bring a file into memory, you choose the address yourself, $filesize is then set for you (for the size parameter), and the machine you set up the TFTP server on supplies the final part of the command.
That said, if you only see "Starting kernel" and nothing else, it is quite likely that the Linux kernel isn't getting to the point where the rootfs is mounted and userland runs, so there may be no log files to see. Without more information it's hard to say what you need to do here, but your bootargs are the first thing to verify.
To analyze why the kernel is not booting you could enable the early console.
For ARM 64bit systems the early console is enabled via the kernel command line parameters. U-Boot takes these from the environment variable bootargs.
The arguments for earlycon depend on your board, e.g. for the Odroid C2:
setenv bootargs earlycon=meson,0xc81004c0
For an early console on 32bit ARM system you will have to compile the kernel with appropriate configuration options, e.g. for the Banana Pi:
CONFIG_DEBUG_LL=y
CONFIG_DEBUG_SUNXI_UART0=y
CONFIG_EARLY_PRINTK=y
Let's assume that file.txt is 16 bytes in size (that is 10 in hex, which is how U-Boot reads the size argument).
First, load the file into memory:
fatload mmc 1:1 0x40400000 file.txt
Then you can send it to tftp server:
tftpput 0x40400000 10 192.168.7.1:filetxt
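Since the size must be given in hexadecimal, a small helper on the PC side can avoid conversion mistakes when the file size is only known in decimal. This is a minimal sketch; the function name and the remote file name are illustrative only.

```python
def tftpput_command(address, size_bytes, server_ip, remote_name):
    """Build a U-Boot tftpput command line with address and size in hex."""
    return f"tftpput 0x{address:x} {size_bytes:x} {server_ip}:{remote_name}"


# 16 bytes decimal becomes "10" in the command, as in the example above.
print(tftpput_command(0x40400000, 16, "192.168.7.1", "file.txt"))
```

On the U-Boot prompt itself the conversion is usually unnecessary: after a successful load, the filesize environment variable already holds the size in hex, so $filesize can be passed as the size argument.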

How do I recover a btrfs filesystem that will not mount (but mount returns without error), checks ok, errors out on restore?

SYNOPSIS
mount -o degraded,ro /dev/disk/by-uuid/ec3 /mnt/ec3/ && echo noerror
noerror
DESCRIPTION
mount -t btrfs fails but exits without error, as above; this has only happened since the last reboot.
btrfs check seems clean to me (I am a simple user).
btrfs restore errors out with "We have looped trying to restore files in"...
I had a lingering artifact: btrfs filesystem show reports " *** Some devices missing " on the volume. Because of this, the volume would not automount on boot, so I have been mounting it manually (and searching for a resolution).
I have previously used rdfind to deduplicate with hard links (as many as 10 per file).
I had just backed up using btrfs send/receive, but I still have to check whether I have everything; this was the main RAID1 server.
DETAILS
btrfs-find-root /dev/disk/by-uuid/ec3
Superblock thinks the generation is 103093
Superblock thinks the level is 1
Found tree root at 8049335181312 gen 103093 level 1
btrfs restore -Ds /dev/disk/by-uuid/ec3 restore_ec3
We have looped trying to restore files in
df -h /mnt/ec3/
Filesystem Size Used Avail Use% Mounted on
/dev/dm-0 16G 16G 483M 97% /
mount -o degraded,ro /dev/disk/by-uuid/ec3 /mnt/ec3/ && echo noerror
noerror
df /mnt/ec3/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 16775168 15858996 493956 97% /
btrfs filesystem show /dev/disk/by-uuid/ec3
Label: none uuid: ec3
Total devices 3 FS bytes used 1.94TiB
devid 6 size 2.46TiB used 1.98TiB path /dev/mapper/26d2e367-65ea-47ad-b298-d5c495a33efe
devid 7 size 2.46TiB used 1.98TiB path /dev/mapper/3c193018-5956-4637-9ec2-dd5e49a4a412
*** Some devices missing #### comment, this is an old artifact unchanged since before unable to mount
btrfs check /dev/disk/by-uuid/ec3
Checking filesystem on /dev/disk/by-uuid/ec3
UUID: ec3
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 2132966506496 bytes used err is 0
total csum bytes: 2077127248
total tree bytes: 5988204544
total fs tree bytes: 3492638720
total extent tree bytes: 242151424
btree space waste bytes: 984865976
file data blocks allocated: 3685012271104
referenced 3658835013632
btrfs-progs v4.1.2
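The df output above still reports /dev/dm-0 (the root filesystem) for /mnt/ec3/, which suggests that nothing is actually mounted there even though mount exited with status 0. A minimal sketch for confirming this from /proc/mounts follows; the function name is made up, and the path is written without the trailing slash because that is how mount points are listed.

```python
import os


def find_mount(mounts_text, mount_point):
    """Return (device, fstype) mounted at mount_point, or None if nothing is."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            result = (fields[0], fields[2])  # later entries shadow earlier ones
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        # None means the mount did not take effect.
        print(find_mount(f.read(), "/mnt/ec3"))
```

If this prints None right after the mount command, the kernel log (dmesg) is the next place to look for the reason.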
Update:
After a reboot (I had to wait for a slot to take the system down), the filesystem mounts manually, but not completely cleanly.
Now asking the question on IRC #btrfs:
! http://pastebin.com/359EtZQX
Hi, I'm scratching my head and searched in vain to remove *** Some devices missing. Can anyone help give me a clue to clean this up?
- Is there a good way to 'fix' the artifacts I am seeing? Trying: scrub, balance. To try: resize, defragment.
- Would I be advised to move to a new clean volume set?
- Would a fix via btrfs send/receive be safe from propagating errors?
- Or (more painfully) rsync to a clean volume? http://pastebin.com/359EtZQX (My first ever day using IRC)

Debugging the boot filesystem environment seen by syslinux?

Hope it's OK to jot this down, even if I cannot accept an answer immediately (and hope it's OK for SO - as there is a C patch below):
It seems I screwed up the hard disk on my desktop PC ({DRDY err}). So I wanted to run a bootable media to run fsck, but the CD on this desktop is broken, so I can only use USB flash. I have a couple of USB thumbdrives with Ubuntu and Suse - these start booting on the desktop; but during boot, udev tries to detect hard drives, and since the hard disk is screwed, it just loops there, and the respective OS never finishes booting.
So I tried to download SystemRescueCd; I have this USB thumbdrive, on which I tried to install SystemRescueCD:
# lsusb with sudo, to retrieve all info
$ sudo lsusb -v -d 058f:6387 | grep -i 'id\|iManufacturer\|iProduct\|iSerial\|bInterface'
Bus 001 Device 043: ID 058f:6387 Alcor Micro Corp. Transcend JetFlash Flash Drive
idVendor 0x058f Alcor Micro Corp.
idProduct 0x6387 Transcend JetFlash Flash Drive
iManufacturer 1 takeMS
iProduct 2 Mem-drive Mini
iSerial 3 C5E7F0CC
bInterfaceNumber 0
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk (Zip)
# search by serial:
$ find /dev/disk/by-id/ -name '*C5E7F0CC*'
/dev/disk/by-id/usb-takeMS_Mem-drive_Mini_C5E7F0CC-0:0-part1
/dev/disk/by-id/usb-takeMS_Mem-drive_Mini_C5E7F0CC-0:0
# list and get device node
$ ls -la /dev/disk/by-id/usb-takeMS_Mem-drive_Mini_C5E7F0CC-0:0
lrwxrwxrwx 1 root root 9 2013-03-25 20:37 /dev/disk/by-id/usb-takeMS_Mem-drive_Mini_C5E7F0CC-0:0 -> ../../sdc
$ ls -la /dev/disk/by-id/usb-takeMS_Mem-drive_Mini_C5E7F0CC-0\:0-part1
lrwxrwxrwx 1 root root 10 2013-03-25 20:37 /dev/disk/by-id/usb-takeMS_Mem-drive_Mini_C5E7F0CC-0:0-part1 -> ../../sdc1
# it is /dev/sdc - list disk info
$ sudo fdisk -l /dev/sdc
Disk /dev/sdc: 2108 MB, 2108686336 bytes
94 heads, 29 sectors/track, 1510 cylinders
Units = cylinders of 2726 * 512 = 1395712 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003e405
Device Boot Start End Blocks Id System
/dev/sdc1 * 1 1511 2059263+ c W95 FAT32 (LBA)
I tried to use my Ubuntu 11.04 Natty netbook to image the thumbdrive, using both:
- the recommended usb_inst.sh installer; and
- unetbootin (via sudo apt-get install unetbootin).
In both of these cases, when I try to boot the USB thumbdrive on the desktop, the boot procedure fails with:
SYSLINUX 4.02 debian-20101016 CHS Copyright (C) 1993-2010 H. Peter Anvin et al
ERROR: No configuration file found
No DEFAULT or UI configuration directive found!
boot:
.... with prompt at boot. (In fact, unetbootin fails at "Verifying DMI Pool Data", before entering syslinux - probably because it is much older than the .iso I'm trying to image).
First I checked the md5 as mentioned in No Default or UI Configuration Found!
$ md5sum ./systemrescuecd-x86-3.5.0.iso
48552b9e905872bd5061eb112b73ea20 ./systemrescuecd-x86-3.5.0.iso
... but it seems OK, as per Sysresccd-versions.
Then I tried to reformat the drive to FAT16 (via sudo gparted /dev/sdc); and repeated both usb_inst.sh and unetbootin methods - again no dice. Funny enough, in all of these cases, if I try to run the flash USB thumbdrive in the QEMU emulator:
# sudo apt-get install qemu
sudo qemu -hda /dev/sdc
... it boots fine - showing the syslinux menu and so on; however, boot always fails on the desktop.
Here I should mention, that I could write down the following from the boot screen of the problematic desktop PC:
Award Modular BIOS v6.00PG
AMDRS740 BIOS
It has a boot menu accessed via F12, and in the boot menu, among other options, these are for USB:
...
USB-FDD
USB-ZIP
USB-CDROM
USB-HDD
...
Typically, I choose USB-HDD - but I've tried the others; either the procedure freezes before even entering syslinux - or the boot fails as described above.
There is advice to rename directories/files manually from isolinux to syslinux (Trying to boot from usb - Ask Ubuntu) - when I used usb_inst.sh, only syslinux/isolinux.bin would have to be renamed. There is also advice to copy syslinux.cfg to the root of the USB flash thumbdrive (Cannot boot Live USB, Linux - Super User). But still no improvements - syslinux is still complaining that it is missing the configuration file - which apparently is the syslinux.cfg.
Then I tried to look if it is possible to somehow "debug" syslinux; found log tracing/debugging/trouble shooting in syslinux - The Syslinux Project - reboot.pro:
> Do we have specific commands to trace or log syslinux?
Being open source, one is able to compile Syslinux and enable extra debugging output.
also [SOLVED] Stuck on boot: Syslinux Problem [Archive] - Ubuntu Forums: "Debugging syslinux is described at http://www.syslinux.org/wiki/index.php/Development/Debugging , but effective debugging (if I recall correctly) requires recompiling it to add the debug hooks.". However, Development/Debugging - Syslinux Wiki talks about something called bochs; and I suspect that is to debug syslinux itself - not necessarily to "debug" (or query) the environment it is in.
Anyways, at last, I could see no way out but to get syslinux from source; basically, this was needed so it builds:
sudo apt-get install nasm
sudo apt-get install uuid-dev
git clone git://git.kernel.org/pub/scm/boot/syslinux/syslinux.git syslinux-git
cd syslinux-git/
make OPTFLAGS+=-DDEBUG=1
Turns out, it isn't really clear how to enable the kind of debugging that will show what syslinux "sees" when plugged into a given computer; given that I do load into syslinux at boot, the problem is what it sees as a filesystem. I tried to enable the DEBUG preprocessor define as shown above (after adding override OPTFLAGS := to the Makefile) - but that, in itself, generated no new messages during boot failure. I have used the following command to "burn" the USB thumbdrive (after unmounting it from the Gnome applet):
sudo ./linux/syslinux --stupid --directory /syslinux --install /dev/sdc1
... and I've tried both with stupid and without (and both for the source-built version, and the one from the Ubuntu package repositories for Natty).
Grepping through the source, I realized there is something called rosh (Read-Only SHell) - however, that compiles as a rosh.c32 - and one is supposed to have it as a boot kernel option in syslinux.cfg - which, as noted, I cannot load. So rosh.c32 is unfortunately not much help for my problem.
However, given that rosh implements the ls command, I tried to copy relevant portions into the code of syslinux - and trigger a ls / listing of the root when syslinux scans for the configuration file. With those changes, recorded in syslinux-e40ba60-rosh-ls.patch; now I get the following when I boot:
SYSLINUX 4.06 CHS 5-ge40ba60* Copyright (C) 1993-2010 H. Peter Anvin et al
Listing: "/"
rosh_ls_arg_dir 0 files found
Listing: "/syslinux"
Listing: ""
CurrentDirName: "/syslinux/"
confignamebuf: /syslinux/extlinux.conf; realpath -1
confignamebuf: /syslinux/syslinux.cfg; realpath -1
confignamebuf: /boot/syslinux/extlinux.conf; realpath -1
confignamebuf: /boot/syslinux/syslinux.cfg; realpath -1
confignamebuf: /syslinux/extlinux.conf; realpath -1
confignamebuf: /syslinux/syslinux.cfg; realpath -1
confignamebuf: /extlinux.conf; realpath -1
confignamebuf: /syslinux.cfg; realpath -1
ERROR: No configuration file found
No DEFAULT or UI configuration directive found!
Interestingly; for the root /, the _ls function at least returns "0 files"; the others ("/syslinux", and the empty string "") already fail at the opendir call - and so the _ls function doesn't even get called!
I would have thought that my slapstick copying of the ls function would not work as intended; but running the thumbdrive in qemu on netbook, does in fact provide a full listing of files - and given that at least for /, the function gets called and returns on the desktop - I'd suspect that it does indeed work.
However, that still doesn't solve my problem - why does syslinux, after boot, see 0 files under the root /? What else could I do to debug this problem? I wouldn't mind patching some C code into syslinux - but I just don't know what I should be looking for, that would point me to correct preparation of the USB thumbdrive for booting on the desktop machine...
OK, I got it to boot...
First, I noted there are alternative mbr's in the built git source as per Mbr - Syslinux Wiki and HowTos - Syslinux Wiki, so I tried both mbr.bin and altmbr.bin - altmbr.bin like this:
$ printf '\1' | cat mbr/altmbr.bin - | sudo dd bs=440 count=1 conv=notrunc iflag=fullblock of=/dev/sdc
... but that didn't help much.
Finally, I noted that lsusb says "bInterfaceProtocol 80 Bulk (Zip)"; and I remembered reading something about ZIP drives somewhere, so tried to look it up - and finally found this:
syslinux/doc/usbkey.txt
The proper mode to boot a USB key drive in is "USB-HDD". That is the
ONLY mode in which the C/H/S geometry encoded on the disk itself
doesn't have to match what the BIOS thinks it is. Since geometry on
USB drives is completely arbitrary, and can vary from BIOS to BIOS,
this is the only mode which will work in general.
Some BIOSes have been reported (in particular, certain versions of the
Award BIOS) that cannot boot USB keys in "USB-HDD" mode. This is a
very serious BIOS bug, but it is unfortunately rather typical of the
kind of quality we're seeing out of major BIOS vendors these days. On
these BIOSes, you're generally stuck booting them in USB-ZIP mode.
THIS MEANS THE FILESYSTEM IMAGE ON THE DISK HAS TO HAVE A CORRECT
ZIPDRIVE-COMPATIBLE GEOMETRY.
....
The script "mkdiskimage" which is supplied with the syslinux
distribution can be used to initialize USB keys in a Zip-like fashion.
To do that, calculate the correct number of cylinders (31 in the
example above), and, if your USB key is /dev/sda (CHECK THE KERNEL
MESSAGES CAREFULLY - IF YOU ENTER THE WRONG DISK DRIVE IT CANNOT BE
RECOVERED), run:
mkdiskimage -4 /dev/sda 0 64 32
(The 0 means automatically determine the size of the device, and -4
means mimic a zipdisk by using partition 4.)
So, as recommended there, first I find the number of cylinders for my thumbdrive:
$ grep 512-byte /var/log/syslog | tail -n 1
Mar 25 22:33:34 mypc kernel: [50884.608687] sd 45:0:0:0: [sdc] 4118528 512-byte logical blocks: (2.10 GB/1.96 GiB)
# get number of cylinders:
$ wcalc '4118528/(64*32)'
= 2011
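The same cylinder calculation, as a small Python sketch for anyone without wcalc; the function name is made up, and the defaults are the 64 heads and 32 sectors per track from the mkdiskimage example in usbkey.txt.

```python
def zip_cylinders(total_sectors, heads=64, sectors_per_track=32):
    """Number of whole cylinders for a Zip-like geometry."""
    return total_sectors // (heads * sectors_per_track)


# 4118528 is the 512-byte block count the kernel logged for this thumbdrive.
print(zip_cylinders(4118528))  # 2011
```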
... then I continue with mkdiskimage. After that was done, I tried usb_inst.sh again - and realized that it overwrites the partition 4 that mkdiskimage made, and makes a partition 1 for itself instead. That means one should first copy the files written by usb_inst.sh to a backup elsewhere, then run mkdiskimage - and finally copy the backed-up files to the thumbdrive again; here is a command line log:
# mkdiskimage is present in syslinux-git:
$ ./utils/mkdiskimage
Usage: ./utils/mkdiskimage [-doFMz4][-i id] file c h s (max: 1024 256 63)
....
# ... but also in Debian/Ubuntu packaging of syslinux
$ mkdiskimage -4 /dev/sdc 0 64 32
/usr/bin/mkdiskimage: /dev/sdc: don't know how to determine the size of this device
# use sudo - note this command takes a while to complete:
$ sudo mkdiskimage -4 /dev/sdc 0 64 32
Warning: more than 1024 cylinders (2011).
Not all BIOSes will be able to boot this device.
$ ls /dev/sdc*
/dev/sdc /dev/sdc4
$ sudo fdisk -l /dev/sdc
Disk /dev/sdc: 2108 MB, 2108686336 bytes
64 heads, 32 sectors/track, 2011 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x866262cc
Device Boot Start End Blocks Id System
/dev/sdc4 * 1 2011 2059248 e W95 FAT16 (LBA)
# (make sure umounted / ejected)
# cd to usb_inst.sh directory; and
# run usb_inst.sh for /dev/sdc; note it will:
# write MBR and "Creating filesystem on /dev/sdc1..."
# and "installing boot loader on /dev/sdc1";
# regardless of the previous setup on partition 4:
sudo bash ./usb_inst.sh
# now no more partition 4:
$ ls /dev/sdc*
/dev/sdc /dev/sdc1
# ( mount /dev/sdc1 via disk applet )
$ rsync -a /media/SYSRESC /media/backup/
# ... duhh... - again now
# ( umount/eject via disk applet )
$ sudo mkdiskimage -4 /dev/sdc 0 64 32
Warning: more than 1024 cylinders (2011).
Not all BIOSes will be able to boot this device.
$ sudo ./linux/syslinux --install /dev/sdc4
# ( mount via disk applet )
$ rsync -a /media/backup/SYSRESC/ /media/31A8-40E9/
$ sudo qemu -hda /dev/sdc # works
# ( umount/eject via disk applet )
# boot on desktop - works! loads rescue64 and initram.igz...
The interesting thing is - even if there is the warning "Not all BIOSes will be able to boot this device."; somehow this problematic BIOS loads this thumbdrive without a problem (and the _ls function above lists fine). Also interesting - here I choose the USB-HDD boot option (not the USB-ZIP) and it still works ?!
So, as a partial answer - I guess the way to debug this, would be for syslinux to somehow write on the thumbdrive the CHS geometry it sees during the syslinux installation; and on boot, to query the BIOS (I guess) about which CHS geometry the BIOS sees - and then dump these two geometries to screen; if there is a mismatch, then it is likely one should run mkdiskimage (unfortunately, I wouldn't know how to code that into syslinux)
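One half of that comparison can at least be checked from a running Linux system. The sketch below reads the heads and sectors-per-track values recorded in a FAT boot sector, assuming that the BIOS parameter block is the on-disk geometry of interest; the function name is made up. The BIOS side of the comparison cannot be obtained this way and is not shown.

```python
import struct


def fat_geometry(boot_sector):
    """Return (heads, sectors_per_track) from a FAT boot sector."""
    if len(boot_sector) < 512 or boot_sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid boot sector")
    # BIOS parameter block: offset 0x18 holds sectors per track and
    # offset 0x1A the number of heads, both 16-bit little-endian.
    sectors_per_track, heads = struct.unpack_from("<HH", boot_sector, 0x18)
    return heads, sectors_per_track
```

As a usage example, reading the first 512 bytes of the partition as root, e.g. `fat_geometry(open("/dev/sdc4", "rb").read(512))`, should presumably give (64, 32) after the mkdiskimage step above, which is the geometry fdisk reports there.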
Going back to my original HDD problem - turns out also SystemRescueCD uses udev to probe for devices - and again the boot process cannot complete (even if I choose the boot option "all files to memory (docache)")... So I get messages like:
udevadm settle - timeout of 180 seconds reached, the event queue contains:
Activating dmraid (fake hardware raid) ...
Starting mdadm (linux software raid) ....
udevd[88] worker [91] unexpectedly returned with status 0x0100 ...
udevd[88] worker [91] failed while handling '/devices/pci0000:00/.../sdb/sdb1'
So, I either find a Live USB distro which does not probe for disks using udev - or I better take this HDD out, toss it into a HDD USB enclosure, and try fsck it on another computer (hopefully I'll be able to blacklist this drive from udev on a running system)
Edit Aug 24 2013: Back to this problem, I thought I'd jot down few extra notes:
Since I cannot yet afford the time to fix this PC and its faulty drive, I've used this USB thumbdrive to boot multiple operating systems: PartedMagic and SliTaz did also encounter errors on the hard disk - but apparently use different drivers to access it (so the DRDY ERR loop didn't start), and they could finish booting relatively fast. Then I tried building a custom Ubuntu 12.04 image (using ubuntu-builder) - and this one ended up in a DRDY ERR loop, which may take more than 5 minutes to complete, before the OS finishes booting. I have posted more about this in Bug #1216397 “It should be possible to ignore (skip probing) a known bad disk partition at boot” : Bugs : “linux” package : Ubuntu.
There are a few interesting things in respect to syslinux, now that this USB thumbdrive is used to boot multiple operating systems. First of all, the thumbdrive is, still, first made bootable with syslinux --install while empty (which places a file ldlinux.sys in the partition's root) - which corresponds to the mkdiskimage step above; and only afterwards are files (like kernel images, and including /boot/syslinux/syslinux.cfg) copied to it.
Now, I'd first build the CD image ISO in ubuntu-builder, and test it using VirtualBox (as qemu on my machine is way too slow for that). Once the ISO image was shown to work as expected, then only the files under its casper directory are relevant for the USB thumbdrive thus prepared; and they can be referenced through a boot menu entry in syslinux.cfg. So, I'd edit the syslinux.cfg on the thumbdrive, and copy the casper image files (e.g. filesystem.squashfs) to the thumbdrive - and test it with qemu as above. Once this qemu step passed, I'd move the USB thumbdrive to the target PC with the broken drive - and interestingly, here I might get syslinux boot failures of multiple sorts (during different boot stages):
- "No DEFAULT or UI configuration directive found!" (or sometimes a "Bad <something> ..." message), before the syslinux boot menu is shown - even if the debug, as above, would show that syslinux reads the filesystem on the thumbdrive correctly, and finds the /boot/syslinux/syslinux.cfg (which does have proper directives)!
- "Invalid or corrupt kernel image", once the syslinux menu is shown, and the new kernel image (Ubuntu) chosen - even if the other images (found previously on the thumb) boot fine on the broken drive PC; and the new image boots fine from thumb in qemu on a different machine!
- "/init: line 7: can't open /dev/sr0: no medium found", once the new (Ubuntu) image is chosen from syslinux menu, and it starts booting; this seems an Ubuntu specific message, appearing a few seconds after it starts booting. I still encounter it even if booting completes successfully - when it's a problem, this message just loops repeatedly, not allowing the rest of the boot process to complete.
It turns out, any of these can appear whenever I try to change and save the syslinux.cfg file on the thumbdrive; or when I make changes in the casper image files, and I rsync or copy them to the thumbdrive. Maybe the copying process (since it may change the sectors where the files are located on the thumb), "confuses" parts of the boot process - although, this shouldn't happen, since also the working procedure above starts from a blanked, syslinux'd thumbdrive, to which files are copied after; so I think this may point to failing sectors on the thumbdrive.
However, even in this state, the working procedure above seemed to be useful - because using it, I could recover the thumb back to a working state! In more detail, it goes like this:
Keep a copy of the thumbdrive files somewhere on a different disk (e.g. ~/thumbcopy) - but without the ldlinux.sys file.
Whenever you want to make a change (to syslinux.cfg or to bootable image files) - make sure this change is saved in ~/thumbcopy first
Now, say I've changed some files on the bootable thumbdrive directly, and I encounter one of the errors above. Then:
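First, delete all files but the ldlinux.sys on the thumbdrive, e.g.: (cd /media/31A8-40E9/ && rm -rf $(ls -I"ldlinux.sys")) - note that ls prints bare names, so the rm has to run from inside the thumbdrive's root directory, otherwise it would act on the current directory instead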
Then, rsync or copy (cp -arv ...) the files in ~/thumbcopy to the thumbdrive, e.g.: rsync -aP ~/thumbcopy/ /media/31A8-40E9/
Now, try booting the thumbdrive in the PC again - it usually boots fine!
I've encountered all three types of errors, because I'd often try to change/copy individual files directly on the thumbdrive: sometimes the change doesn't introduce a problem, so booting is fine - however, in many cases, it does introduce a problem. For some reason, using the above procedure I managed to recover the thumbdrive from any of the three problems above - maybe it has to do with USB Flash delayed writes, maybe with USB Flash failing sectors, I cannot really tell... But in any case: deleting all files, and re-copying them in one go, does seem to be a worthwhile procedure to try in case of errors like that.
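For what it's worth, the recovery steps above can also be scripted. What follows is only a minimal Python sketch, not part of the original procedure: the function name restore_thumbdrive and its parameters are made up for illustration. It assumes the backup copy was kept without ldlinux.sys, copies file contents only (no permissions or timestamps), and needs Python 3.

```python
import os
import shutil


def restore_thumbdrive(backup_dir, thumb_dir, keep=("ldlinux.sys",)):
    """Wipe thumb_dir except the files named in keep, then re-copy backup_dir."""
    # Step 1: delete everything in the thumbdrive root except the kept file(s).
    for name in os.listdir(thumb_dir):
        if name in keep:
            continue
        target = os.path.join(thumb_dir, name)
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)
        else:
            os.remove(target)

    # Step 2: copy the whole backup tree across in one go.
    for dirpath, _dirnames, filenames in os.walk(backup_dir):
        rel = os.path.relpath(dirpath, backup_dir)
        dest = os.path.normpath(os.path.join(thumb_dir, rel))
        os.makedirs(dest, exist_ok=True)
        for fname in filenames:
            if rel == "." and fname in keep:
                continue  # never overwrite the installed boot file
            shutil.copyfile(os.path.join(dirpath, fname),
                            os.path.join(dest, fname))

    # Step 3: ask the kernel to flush pending writes (relevant for USB flash).
    if hasattr(os, "sync"):
        os.sync()
```

With the example paths from this post, the call would be restore_thumbdrive(os.path.expanduser("~/thumbcopy"), "/media/31A8-40E9"). The thumbdrive should still be unmounted properly before it is unplugged.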
It's an ancient post, but in case others stumble upon this, I'll add an answer anyway.
If you're struggling to get syslinux to boot, ROSH (Read-only Shell) can be useful, as you mentioned. To start ROSH, you can simply type rosh at the boot: prompt (if you do have a working graphical menu, press escape to drop back to the boot: prompt).
Inside the shell, you have some basic commands to look around in your environment. For more documentation, see https://wiki.syslinux.org/wiki/index.php?title=Read-Only_SHell(rosh.c32)

Linux programming: which device a file is in

I would like to know which entry under /dev a file is in. For example, if /dev/sdc1 is mounted under /media/disk, and I ask for /media/disk/foo.txt, I would like to get /dev/sdc as response.
Using the stat system call on that file, I get its partition's major and minor numbers (8 and 33, for sdc1). Now I need to get the "root" device (sdc) or its major/minor from that. Is there any syscall or library function I could use to link a partition to its main device? Or even better, to get that device directly from the file?
brw-rw---- 1 root floppy 8, 32 2011-04-01 20:00 /dev/sdc
brw-rw---- 1 root floppy 8, 33 2011-04-01 20:00 /dev/sdc1
Thanks in advance!
The quick and dirty version: df $file | awk 'NR == 2 {print $1}'.
Programmatically... well, there's a reason I started with the quick and dirty version. There's no portable way to programmatically get the list of mounted filesystems. (getmntent() is not a portable answer either: it is non-standard, and it parses whichever table file you open for it, so pointing it at fstab gets you fstab entries, which is not the same thing as the live mount table.) Moreover, you can't even parse the output of mount(8) reliably; on different Unixes, the mountpoint may be the first or the last item. The most portable way to do this ends up being... parsing df output (and even that is iffy, as you noticed with the partition number). So you're right back to the quick and dirty shell solution anyway, unless you want to traverse /dev and look for block devices whose major(st_rdev) matches the major of the file's st_dev (major() comes from sys/types.h on older systems and from sys/sysmacros.h on current glibc).
If you restrict this to Linux, you can use /proc/mounts to get the list of mounted filesystems. Other specific Unixes can similarly be optimized: for example, on OS X and I think FreeBSD, you can use sysctl() on the vfs tree to get mountpoints. At worst you can find and use the appropriate header file to decipher whatever the mount table file is (and yes, even that varies: on Solaris it's /etc/mnttab, on many other systems it's /etc/mtab, some systems put it in /var/run instead of /etc, and on many Linuxes it's either nonexistent or a symlink to /proc/mounts). And its format is different on pretty much every Unix-like OS.
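As a rough illustration of the Linux-only route, here is a small Python sketch that picks the mount entry with the longest matching mountpoint from /proc/mounts-style text. The function names are invented for this example, and only the \040 (space) escape is decoded; a fuller version would handle the other octal escapes too.

```python
def parse_mounts(text):
    """Return (device, mountpoint) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mountpoint = fields[0], fields[1]
        # The kernel writes a space inside a path as the octal escape \040.
        mountpoint = mountpoint.replace("\\040", " ")
        entries.append((device, mountpoint))
    return entries


def device_for_path(path, entries):
    """Return the device of the deepest mountpoint containing path, or None.

    path should be absolute and already resolved (e.g. via os.path.realpath).
    """
    best = None
    for device, mountpoint in entries:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # ">=" lets a later entry win when two share a mountpoint,
            # since later mounts cover earlier ones.
            if best is None or len(mountpoint) >= len(best[1]):
                best = (device, mountpoint)
    return best[0] if best else None
```

On a Linux box you would feed it open("/proc/mounts").read() and os.path.realpath(path). For the example in the question this yields the partition (/dev/sdc1), not the disk, so it covers only the first half of the problem.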
The information you want exists in sysfs, which exposes the Linux device tree. This models the relationships between the devices on the system, and since you are trying to determine a parent disk device from a partition, this is the place to look. I don't know if there are any hard and fast rules you can rely on to stop your code breaking with future versions of the kernel, but the kernel developers do try to maintain sysfs as a stable interface.
If you look at /sys/dev/block/<major>:<minor>, you'll see it is a symlink with the tail components being block/<disk-device-name>/<partition-device-name>. If you were to perform a readlink(2) system call on that, you could parse the link destination to get the disk device name. In shell (since it's easier to express this way, but doing it in C will be pretty easy):
$ echo $(basename $(dirname $(readlink /sys/dev/block/8:33)))
sdc
Alternatively, you could take advantage of the nesting of partition directories in the disk directories (again in shell, but from C, it's an open(2), read(2), and close(2)):
$ cat /sys/dev/block/8:33/../dev
8:32
That assumes your starting major:minor is actually for a partition, not some other sort of non-nested device.
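For reference, here is a hedged Python sketch of the same sysfs lookup, starting from a file path. The function names and the sysfs_block parameter (which exists only so the code can be pointed at a test tree) are illustrative. Instead of assuming the starting device is a partition, it checks for the partition attribute file that sysfs places in partition directories.

```python
import os


def disk_name_for_dev(dev, sysfs_block="/sys/dev/block"):
    """Map a device number (such as st_dev) to its whole-disk device name."""
    entry = os.path.join(sysfs_block,
                         "%d:%d" % (os.major(dev), os.minor(dev)))
    if not os.path.exists(entry):
        # Filesystems without a backing block device (tmpfs, NFS, ...)
        # have no entry here.
        raise LookupError("no block device entry at %s" % entry)
    real = os.path.realpath(entry)  # resolve the symlink into the device tree
    # Partition directories contain a "partition" attribute; disks do not.
    if os.path.exists(os.path.join(real, "partition")):
        real = os.path.dirname(real)  # step up to the enclosing disk
    return os.path.basename(real)


def disk_name_for_file(path, sysfs_block="/sys/dev/block"):
    """Return the disk device name holding the filesystem that path is on."""
    return disk_name_for_dev(os.stat(path).st_dev, sysfs_block)
```

For the question's example, disk_name_for_file("/media/disk/foo.txt") would be expected to give sdc. Stacked devices such as LVM or device-mapper volumes resolve to their own dm-N name rather than to an underlying physical disk.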
What you are looking for is impossible - there is no 1:1 connection between a block device file and the partition it describes.
Consider:
You can create multiple block device files with different names (but the same major and minor numbers) and they are indistinguishable (N:1)
You can use a block device file as an argument to mount to mount a partition and then delete the block device file leaving the partition mounted. (0:1)
So there is no way to do what you want except in a few specific and narrow cases.
The major number tells you which device it is: 3 - IDE on 1st controller, 22 - IDE on 2nd controller, and 8 for SCSI.
The minor number tells you the partition number and - for IDE devices - whether it's the primary or secondary drive. This calculation is different for IDE and SCSI.
For IDE it is: x*64 + p, where x is the drive number on the controller (0 or 1) and p is the partition
For SCSI it is: y*16 + p, where y is the drive number and p is the partition
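As a worked example of the arithmetic above, here is a short Python sketch that inverts the two formulas. It assumes the classic static numbering only (majors 3, 22 and 8, as listed above); systems with more than 16 SCSI disks, or with dynamically allocated device numbers, will not fit this scheme. The function name is made up for illustration.

```python
def decode_legacy_minor(major, minor):
    """Return (disk_name, partition) under the classic static numbering.

    partition 0 means the whole disk.
    """
    if major == 3:        # IDE, 1st controller: hda, hdb
        per_disk, first, prefix = 64, 0, "hd"
    elif major == 22:     # IDE, 2nd controller: hdc, hdd
        per_disk, first, prefix = 64, 2, "hd"
    elif major == 8:      # SCSI: sda .. sdp
        per_disk, first, prefix = 16, 0, "sd"
    else:
        raise ValueError("major %d is not covered by this scheme" % major)
    # minor = drive * per_disk + partition, so divmod recovers both parts.
    drive, partition = divmod(minor, per_disk)
    return prefix + chr(ord("a") + first + drive), partition


print(decode_legacy_minor(8, 33))  # ('sdc', 1)
print(decode_legacy_minor(8, 32))  # ('sdc', 0)
```

The two calls use the numbers from the question: minor 33 is 2*16 + 1, i.e. partition 1 of the third SCSI disk, and minor 32 is that disk itself.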
Not a syscall, but:
df -h /path/to/my/file
From https://unix.stackexchange.com/questions/128471/determine-what-device-a-directory-is-located-on
So you could look at df's source code and see what it does.
I realize this post is old, but this question was the 2nd result in my search and no one has mentioned df -h

Resources