Different number of blocks allocated using stat() and ls -s - c

I was trying to get the number of blocks allocated to a file from C. I used the stat struct and its st_blocks member. However, this returns a different number of blocks than ls -s does. Can anybody explain the reason for this, and whether there is a way to correct it?

There is no discrepancy; just a misunderstanding. There are two separate "block sizes" here. Use ls -s --block-size=512 to use 512 byte block size for ls, too.
The ls -s command lists the size allocated to the file in user-specified units ("blocks"), the size of which you can specify using the --block-size option.
The st_blocks field in struct stat is in units of 512 bytes.
You see a discrepancy, because the two "block sizes" are not the same. They just happen to be called the same name.
Here is an example with which you can examine this effect. It works on all POSIXy/Unixy file systems that support sparse files, but not on FAT/VFAT and the like.
First, let's create a file that is one megabyte long but has a hole at the beginning (holes read back as zeros, but are not actually stored on disk), with a single byte at the end (I'll use 'X').
We do this by using dd to skip the first 1048575 bytes of the file (creating a "hole", and thus a sparse file on filesystems that support such):
printf 'X' | dd bs=1 seek=1048575 of=sparse-file count=1
We can use the stat utility to examine the file. Format specifier %s provides the logical size of the file (1048576), %b the number of blocks (st_blocks):
stat -c 'st_size=%s st_blocks=%b' sparse-file
On my system, I get st_size=1048576 st_blocks=8, because the actual filesystem block size is 4096 bytes (= 8×512), and this sparse file needs only one filesystem block.
However, using ls -s sparse-file I get 4 sparse-file, because the default ls block size is 1024 bytes. If I run
ls --block-size=512 -s sparse-file
then I see 8 sparse-file, as I'd expect.
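Since the question is about doing this from C, here is a minimal sketch (not part of the original answer) that prints the same numbers via stat(); run on sparse-file above, it should report st_blocks=8 on the same system. Compile with e.g. gcc -o statblocks statblocks.c and run ./statblocks sparse-file.

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat st;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s FILE\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) == -1) {
        perror("stat");
        return 1;
    }
    /* On Linux and most Unixes, st_blocks counts 512-byte units regardless of
       the actual filesystem block size; st_blksize is only the preferred I/O size. */
    printf("st_size=%lld st_blocks=%lld (%lld bytes allocated) st_blksize=%ld\n",
           (long long)st.st_size,
           (long long)st.st_blocks,
           (long long)st.st_blocks * 512LL,
           (long)st.st_blksize);
    return 0;
}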

"Blocks" here are not real filesystem blocks. They're convenient chunks for display.
st_blocks probably uses 512-byte blocks. See the POSIX spec.
st_blksize is the preferred block size for this file, but not necessarily the actual block size.
BSD ls -s always uses 512 byte "blocks". OS X, for example, uses BSD ls by default.
$ /bin/ls -s index.html
560 index.html
GNU ls appears to use 1K blocks unless overridden with --block-size.
$ /opt/local/bin/gls -s index.html
280 index.html
printf("%lld / %d\n", buf.st_blocks, buf.st_blksize); produces 560 / 4096. The 560 "blocks" are in 512 byte chunks, but the real filesystem blocks are 4k.
The file contains 284938 bytes of data...
$ ls -l index.html
-rw-r--r-- 1 schwern staff 284938 Aug 11 2016 index.html
...but we can see it uses 280K on disk, or 70 4K blocks.
Note that OS X further confuses the issue by using 1000 bytes for a "kilobyte" instead of the correct 1024 bytes; that's why it says 287 KB for 70 4096-byte blocks (i.e. 286720 bytes) instead of 280 KB. This was done because hard drive manufacturers started using 1000-byte "kilobytes" to inflate their sizes, and Apple got tired of customers complaining about "lost" disk space.
The 4K block size can be seen by making a tiny file.
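To make that concrete, here is a rough C sketch (the file name tiny.txt is just an example) that creates a one-byte file and prints its allocation; on a file system with 4K allocation units you would expect st_blocks to be 8.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    int fd = open("tiny.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }
    if (write(fd, "x", 1) != 1) { perror("write"); return 1; }
    fsync(fd);                 /* make sure the allocation is actually reported */
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }
    /* Expect st_size=1 and st_blocks=8 (8 x 512 = 4096) on a 4K file system. */
    printf("st_size=%lld st_blocks=%lld\n",
           (long long)st.st_size, (long long)st.st_blocks);
    close(fd);
    return 0;
}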

Related

How to ensure I am programming on persistent memory

I am working on a final project for school that involves making a program that reads and writes to files on persistent memory. I don't have access to an actual persistent memory device so I followed the tutorial at the following link to emulate it:
https://kb.pmem.io/howto/100000012-How-To-Emulate-Persistent-Memory-using-the-Linux-memmapKernel-Option/
user#node:~/test2$ sudo fdisk -l /dev/pmem0
Disk /dev/pmem0: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
According to the above output, I have an emulated persistent memory device named pmem0, backed by DRAM. I used the tutorial at the following link to get some sample code on how to read and write files on persistent memory:
https://www.intel.com/content/www/us/en/developer/articles/code-sample/create-a-persistent-memory-hello-world-program-using-libpmemobj-with-c.html
I compile and run the code and the output of the program says it is writing a file to persistent memory. I wanted to check if the file is being written to the correct partition so I used the command:
df -P fileName | tail -1 | cut -d' ' -f 1 (which I got from another Stack Overflow post), and the output is /dev/sda1, when I believe it should be /dev/pmem0 if it is actually using the persistent memory partition.
I'm looking for tips on how to ensure I am actually mapping my code to the persistent memory partition I created.
The code from the second link needs the PMDK library set up to run properly, and I needed to modify my Linux kernel boot parameters to set up the partition, so I'm not sure I can supply a much better minimal example.
Edit: I believe the exact lines of code I need to modify would be one of:
pop = pmemobj_create(path, LAYOUT, PMEMOBJ_MIN_POOL, 0666);
PMEMoid root = pmemobj_root(pop, sizeof (struct my_root));
pmemobj_persist(pop, &rootp->len, sizeof (rootp->len));
This creates the persistent memory pool but the tutorial doesn't seem to mention how to actually map to a persistent memory device.
Why does the string get written to the hard drive (sda1), which I believe is regular storage?
The referenced C program calls pmemobj_create() with the path you specify as the second program argument, in your case fileName. If the current working directory is on the hard drive, the created memory pool file fileName of course also resides on the hard drive. If you want the file to be created on your /dev/pmem0 device, you have to mount a file system residing on it and use a path to a file thereon.
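For example, assuming you create a file system on /dev/pmem0 and mount it DAX-enabled at /mnt/pmem0 (both the mount point and the pool file name below are made-up placeholders), pool creation would look roughly like this sketch (link with -lpmemobj):

#include <stdio.h>
#include <libpmemobj.h>

#define LAYOUT "hello"   /* must match the layout name used by the rest of the program */

int main(void)
{
    /* Assumes something like:
     *   mkfs.ext4 /dev/pmem0
     *   mount -o dax /dev/pmem0 /mnt/pmem0
     * so that /mnt/pmem0 is backed by the emulated persistent memory. */
    PMEMobjpool *pop = pmemobj_create("/mnt/pmem0/my_pool",
                                      LAYOUT, PMEMOBJ_MIN_POOL, 0666);
    if (pop == NULL) {
        perror("pmemobj_create");
        return 1;
    }
    /* ... use the pool ... */
    pmemobj_close(pop);
    return 0;
}

With the pool file on the DAX-mounted file system, df -P on it should then report /dev/pmem0 rather than /dev/sda1.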

What kinds of things are stored in 1 byte files?

Page 301 of Tanenbaum's Modern Operating Systems contains the table below. It gives the file sizes on a 2005 commercial Web server. The chapter is on file systems, so these data points are meant to be similar to what you would see on a typical storage device.
File length (bytes)    Percentage of files less than length
1                      6.67
2                      7.67
4                      8.33
8                      11.30
16                     11.46
32                     12.33
64                     26.10
128                    28.49
...                    ...
1 KB                   47.82
...                    ...
1 MB                   98.99
...                    ...
128 MB                 100
In the table, you will see that 6.67% of files on this server are 1 byte in length. What kinds of processes are creating 1 byte files? What kind of data would be stored in these files?
I wasn't familiar with that table, but it piqued my interest. I'm not sure what the 1-byte files were at the time, but perhaps the 1-byte files of today can shed some light?
I searched for files of size 1 byte with
sudo find / -size 1c 2>/dev/null | while read -r line; do ls -lah "$line"; done
Looking at the contents of these files on my system, they contain a single character: a newline. This can be verified by running the file through hexdump. A file with a single newline can exist for multiple reasons, but it probably has to do with the convention of terminating a line with a newline.
There is a second type of file with size 1 byte: symbolic links where the target is a single character. ext4 appears to report the length of the target as the size of the symbolic link (at least for short-length targets).
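That second case can be confirmed with lstat(); here is a small sketch (the link name link-to-x is made up) that creates a symlink with a one-character target and prints the size reported for the link itself:

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    /* Create a symlink whose target is the single character "X";
       ignore unlink() errors if the link does not exist yet. */
    unlink("link-to-x");
    if (symlink("X", "link-to-x") == -1) { perror("symlink"); return 1; }

    /* lstat() examines the link itself rather than following it;
       on ext4, st_size is the length of the target string, i.e. 1. */
    if (lstat("link-to-x", &st) == -1) { perror("lstat"); return 1; }
    printf("st_size=%lld\n", (long long)st.st_size);
    return 0;
}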

C : write() fails while writing to a RAW disk after specific size limit reached

I am using a C program with the open(), close(), read(), and write() system calls to copy a large file to a raw disk. The disk size is 20 GB and the file size is 17 GB, but every time, after writing around 945 MB, write() fails with a No space left on device error.
I have run fdisk -l /dev/sdb and it reports 20.5 GB, while du /dev/sdb says 945126569.
Then I tried cat mylargefile > /dev/sdb; it fails with the same No space left on device error. Then I do cat /dev/sdb > /tmp/sdb.img and it completes normally, and ls -ld /tmp/sdb.img reports 945126569.
I can create an ext4 file system on the same disk and format it without any issues, so a disk error is improbable. (I guess ...)
I am using Ubuntu 16.04 LTS amd64 OS with latest GCC to build my program.
Can anyone suggest where I am going wrong, or what needs to be done to avoid this?
du /dev/sdb should say 0 if /dev/sdb is a block device. Try also blockdev --report /dev/sdb.
What happened is that in the beginning you didn't have a device file named /dev/sdb at all; you created a regular file named /dev/sdb and copied 945 MiB into it. This filled the partition on which /dev/ is located, and thus you get the error. fdisk just reads the partition table contained in the first 945 MiB and thinks it sees a hard disk of 20 GiB.
When you do cat mylargefile > /dev/sdb, the file /dev/sdb is first truncated to size 0, so there is now 945 MiB of free space again that cat will proceed to fill.
To avoid this: make sure that you open the device by its correct name, and in C open the device without O_CREAT.
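A minimal sketch of that check (the device path is hard-coded here only for illustration): open the device without O_CREAT, so the call fails instead of silently creating a regular file, and verify with fstat() that it really is a block device.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    /* No O_CREAT: if /dev/sdb does not exist, open() fails with ENOENT
       instead of creating an ordinary file under /dev. */
    int fd = open("/dev/sdb", O_WRONLY);
    if (fd == -1) { perror("open /dev/sdb"); return 1; }

    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }
    if (!S_ISBLK(st.st_mode)) {
        fprintf(stderr, "/dev/sdb is not a block device\n");
        return 1;
    }
    /* ... proceed with the read()/write() copy loop ... */
    close(fd);
    return 0;
}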

Why does du -k display block counts in 4096-byte blocks even though it says 1024?

When I ran this
du -k *
I expected the output for each file to be ceil(filesize/1024) but the output was ceil(filesize/4096) * 4. Why is that?
Description of -k in $ man du: Display block counts in 1024-byte (1-Kbyte) blocks.
I'm using OS X if that makes any difference.
The file system allocates space in units of 4K (4096 bytes). If you create a 1 byte file, the file system will allocate 4K of storage to hold the file.
The du command reports the storage actually allocated by the file system, not the logical file size; with -k that allocation is expressed in 1024-byte units. So du -k reports 4, meaning the file system is using 4K of space for that file.
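If you want to see that allocation unit programmatically, here is a rough sketch using statvfs() (the path "." is just an example, and f_frsize is usually, though not always, the allocation unit):

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;
    if (statvfs(".", &vfs) == -1) { perror("statvfs"); return 1; }
    /* f_frsize is the fundamental block size the file system allocates in;
       on many systems (ext4, HFS+, APFS, ...) this is 4096. */
    printf("f_frsize=%lu f_bsize=%lu\n",
           (unsigned long)vfs.f_frsize, (unsigned long)vfs.f_bsize);
    return 0;
}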

Header and structure of a tar format

I have a project for school which involves making a C program that works like tar on a Unix system. I have some questions that I would like someone to explain to me:
The size of the archive. I understood (from browsing the internet) that an archive is made up of blocks of 512 bytes each. So the header takes 512 bytes, then it is followed by the content of the file (if there is only one file to archive) organized in blocks of 512 bytes, and then by 2 more blocks of 512 bytes.
For example: let's say that I have a txt file of 0 bytes to archive. This should mean 512*3 bytes are used. Why, when I create the archive with the tar utility on Unix and click Properties, is it 10240 bytes? I think it adds some 0 (NULL) bytes, but I don't know where, why, and how many...
The header checksum. As far as I know, this should be the size of the archive. When I check it with hexdump -C it appears as a number near the real size of the archive (as shown in Properties), for example 11200 or 11205 or something similar if I archive a 0-byte txt file. Is this size in octal or decimal? My bet is that it is in octal, because all information you put in the header needs to be in octal. My second question at this point is: what is added on top of the original size of 10240 bytes?
Header mode. Let's say that I have a file with permissions 664; the file format value will be 0, so I should put 0664 in the header. Why, in an authentic archive, are 3 more 0s printed at the start (0000664)?
There have been various versions of the tar format, and not all of the extensions to previous formats were compatible with each other, so there's always a bit of guessing involved. For example, in very old Unix systems, file names were not allowed to have more than 14 bytes, so the space for the file name (including path) was plenty; later, with longer file names, it had to be extended, but there wasn't space, so the file name got split into 2 parts; even later, GNU tar introduced the ././@LongLink pseudo-entries that would make older tars at least restore the file to its original name.
1) Tar was originally a Tape ARchiver. To achieve constant throughput to tapes and avoid starting and stopping the tape too much, several blocks needed to be written at once. 20 blocks of 512 bytes were the default, and the -b option is there to set the number of blocks. Very often, this size was pre-defined by the hardware, and using the wrong blocking factor made the resulting tape unusable. This is why tar appends \0-filled blocks until the archive size is a multiple of the record size (the blocking factor times 512 bytes).
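To make the arithmetic concrete for the 0-byte file in the question (assuming GNU tar with the default blocking factor of 20): one 512-byte header + zero data blocks + two 512-byte end-of-archive blocks = 1536 bytes, which is then padded with \0-filled blocks up to the record size of 20 × 512 = 10240 bytes. That is exactly the 10240 bytes shown in the file's properties.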
2) The file size is in octal, and contains the true size of the original file that was put into the tar. It has nothing to do with the size of the tar file.
The checksum is calculated from the sum of the header bytes, but then stored in the header as well. So the act of storing the checksum would change the header, thus invalidate the checksum. That's why you store all other header fields first, set the checksum to spaces, then calculate the checksum, then replace the spaces with your calculated value.
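A hedged sketch of that procedure, treating the header as a plain 512-byte buffer (the checksum field sits at offset 148 and is 8 bytes wide in the common ustar layout; check your format reference before relying on the exact field widths):

#include <stdio.h>
#include <string.h>

/* Compute and store the checksum of a 512-byte tar header block. */
static unsigned int tar_checksum(unsigned char header[512])
{
    unsigned int sum = 0;

    memset(header + 148, ' ', 8);          /* step 1: blank the checksum field */
    for (int i = 0; i < 512; i++)          /* step 2: sum all header bytes     */
        sum += header[i];

    /* step 3: store the result as a 6-digit octal number,
       followed by a NUL and a space, as most tar implementations do. */
    snprintf((char *)header + 148, 8, "%06o", sum);
    header[154] = '\0';
    header[155] = ' ';
    return sum;
}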
Note that the header of a tarred file is pure ASCII. This way, in those old days, when a tar file (whose components were plain ASCII) got corrupted, an admin could just open the tar file with an editor and restore the components manually. That's why the designers of the tar format were afraid of \0 bytes and used spaces instead.
3) Tar files can store block devices, character devices, directories and such stuff. Unix stores these file modes in the same place as the permission flags, and the header file mode contains the whole file mode, including file type bits. That's why the number is longer than the pure permission.
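To illustrate just the leading zeros (this is a sketch, not taken from any particular tar implementation): the mode field in the header is a fixed-width octal text field, so the value is zero-padded to fill it, whether or not file type bits are included.

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    char mode_field[8];
    mode_t perm = 0664;                      /* permission bits of the file        */
    mode_t full = S_IFREG | perm;            /* whole st_mode including type bits  */

    /* Zero-padded octal text is where the extra leading zeros come from. */
    snprintf(mode_field, sizeof mode_field, "%07o", (unsigned)perm);
    printf("permissions only: %s\n", mode_field);   /* 0000664 */

    snprintf(mode_field, sizeof mode_field, "%07o", (unsigned)full);
    printf("with type bits:   %s\n", mode_field);   /* 0100664 */
    return 0;
}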
There's a lot of information at http://en.wikipedia.org/wiki/Tar_%28computing%29 as well.

Resources