I want to write a small program in C which can determine the sector size of a hard disk. I wanted to read the file located in /sys/block/sd[X]/queue/hw_sector_size, and it worked in CentOS 6/7.
However when I tested in CentOS 5.11, the file hw_sector_size is missing, and I have only found max_hw_sectors_kb and max_sectors_kb.
Thus, I'd like to know how can I determine (APIs) the sector size in CentOS 5, or is there an other better way to do so. Thanks.
The fdisk utility displays this information (and runs successfully on kernels older even than than the 2.6.x vintage on CentOS 5), so that seems a likely place to look for an answer. Fortunately, we're living in the wonderful world of open source, so all it requires is a little investigation.
The fdisk program is provided by the util-linux package, so we need that first.
The sector size is displayed in the output of fdisk like this:
Disk /dev/sda: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
If we look for Sector size in the util-linux code, we find it in disk-utils/fdisk-list.c:
fdisk_info(cxt, _("Sector size (logical/physical): %lu bytes / %lu bytes"),
fdisk_get_sector_size(cxt),
fdisk_get_physector_size(cxt));
So, it looks like we need to find fdisk_get_sector_size, which is defined in libfdisk/src/context.c:
unsigned long fdisk_get_sector_size(struct fdisk_context *cxt)
{
assert(cxt);
return cxt->sector_size;
}
Well, that wasn't super helpful. We need to find out where cxt->sector_size is set:
$ grep -lri 'cxt->sector_size.*=' | grep -v tests
libfdisk/src/alignment.c
libfdisk/src/context.c
libfdisk/src/dos.c
libfdisk/src/gpt.c
libfdisk/src/utils.c
I'm going to start with alignment.c, since that filename sounds promising. Looking through that file for the same regex I used to list the files, we find this:
cxt->sector_size = get_sector_size(cxt->dev_fd);
Which leads me to:
static unsigned long get_sector_size(int fd)
{
int sect_sz;
if (!blkdev_get_sector_size(fd, §_sz))
return (unsigned long) sect_sz;
return DEFAULT_SECTOR_SIZE;
}
Which in turn leads me to the definition of blkdev_get_sector_size in lib/blkdev.c:
#ifdef BLKSSZGET
int blkdev_get_sector_size(int fd, int *sector_size)
{
if (ioctl(fd, BLKSSZGET, sector_size) >= 0)
return 0;
return -1;
}
#else
int blkdev_get_sector_size(int fd __attribute__((__unused__)), int *sector_size)
{
*sector_size = DEFAULT_SECTOR_SIZE;
return 0;
}
#endif
And there we go. There is a BLKSSZGET ioctl that seems useful. A search for BLKSSZGET leads us to this stackoverflow question, which includes the following information in a comment:
For the record: BLKSSZGET = logical block size, BLKBSZGET = physical
block size, BLKGETSIZE64 = device size in bytes, BLKGETSIZE = device
size/512. At least if the comments in fs.h and my experiments can be
trusted. – Edward Falk Jul 10 '12 at 19:33
Related
I want to calculate the SHA256 value of a file, which has a size of more than 1M. In order to get this hash value with the mbedtls library, I need to copy the whole file to the memory. But my memory size is only 100K. So I want to know if there is some method that calculates the file hash value in sections.
In order to get this hash value with mbedtls library, I need to copy the whole file to the memory.
This is not accurate. The mbedtls library supports incremental calculation of hash values.
To calculate a SHA-256 hash with mbedtls, you would have to take the following steps (reference):
Create an instance of the mbedtls_sha256_context struct.
Initialize the context with mbedtls_sha256_init and then mbedtls_sha256_starts_ret.
Feed data into the hash function with mbedtls_sha256_update_ret.
Calculcate the final hash sum with mbedtls_sha256_finish_ret.
Free the context with mbedtls_sha256_free
Note that this does not mean that the mbedtls_sha256_context struct holds the entire data until mbedtls_sha256_finish_ret is called. Instead, mbedtls_sha256_context only holds the intermediate result of the hash calculation. When feeding additional data into the hash function with mbedtls_sha256_update_ret, the state of the calculation is updated and the new intermediate result is stored in the mbedtls_sha256_context.
The total size of a mbedtls_sha256_context, as determined by sizeof( mbedtls_sha256_context), is 108 bytes on my system. We can also see this from the mbedtls source code (reference):
typedef struct mbedtls_sha256_context
{
uint32_t total[2]; /*!< The number of Bytes processed. */
uint32_t state[8]; /*!< The intermediate digest state. */
unsigned char buffer[64]; /*!< The data block being processed. */
int is224; /*!< Determines which function to use:
0: Use SHA-256, or 1: Use SHA-224. */
}
mbedtls_sha256_context;
We can see that the struct holds a counter of size 2*32 bit = 8 byte that keeps track of the total number of bytes processed so far. 8*32 bit = 32 byte are used to track the intermediate result of the hash calculation. 64 byte are used to track the current data block being processed. As you can see, this is a fixed size buffer that does not grow with the amount of data that is being hashed. Finally an int is used to distinguish between SHA-224 and SHA-256. On my system sizeof(int) == 4. So in total, we get the 8+32+64+4 = 108 byte.
Consider the following example program, which reads a file step by step into a buffer of size 4096 and feeds the buffer into the hash function in each step:
#include <mbedtls/sha256.h>
#include <stdio.h>
#include <stdlib.h>
#define BUFFER_SIZE 4096
#define HASH_SIZE 32
int main(void) {
int ret;
// Initialize hash
mbedtls_sha256_context ctx;
mbedtls_sha256_init(&ctx);
mbedtls_sha256_starts_ret(&ctx, /*is224=*/0);
// Open file
FILE *fp = fopen("large_file", "r");
if (fp == NULL) {
ret = EXIT_FAILURE;
goto exit;
}
// Read file in chunks of size BUFFER_SIZE
uint8_t buffer[BUFFER_SIZE];
size_t read;
while ((read = fread(buffer, 1, BUFFER_SIZE, fp)) > 0) {
mbedtls_sha256_update_ret(&ctx, buffer, read);
}
// Calculate final hash sum
uint8_t hash[HASH_SIZE];
mbedtls_sha256_finish_ret(&ctx, hash);
// Simple debug printing. Use MBEDTLS_SSL_DEBUG_BUF in a real program.
for (size_t i = 0; i < HASH_SIZE; i++) {
printf("%02x", hash[i]);
}
printf("\n");
// Cleanup
fclose(fp);
ret = EXIT_SUCCESS;
exit:
mbedtls_sha256_free(&ctx);
return ret;
}
When running a program on a large sample file, the following behavior can be observed:
$ dd if=/dev/random of=large_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 5.78353 s, 177 MB/s
$ sha256sum large_file
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216 large_file
$ gcc -O3 -static test.c /usr/lib/libmbedcrypto.a
$ ./a.out
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216
We can see that the program calculates the correct SHA-256 hash. We can also inspect the memory used by the program:
$ command time -v ./a.out
...
Maximum resident set size (kbytes): 824
...
We can see that the program consumed at most 824 KB of memory. Thus, we have calculated the hash of a 1 GB file with < 1MB of memory. This shows that we do not have to load the entire file into memory at once to calculate its hash with mbedtls.
Keep in mind this measurement was done on a 64 bit desktop computer, not an embedded platform. Also, no further optimizations were performed besides -O3 and static linking (the latter approximately halved the memory usage of the program). I would expect the memory footprint to be even smaller on an embedded device with a smaller address size and a tool chain performing further optimizations.
I am looking for a definitive specification describing the expected arguments and behavior of ioctl 0x1268 (BLKSSZGET).
This number is declared in many places (none of which contain a definitive reference source), such as linux/fs.h, but I can find no specification for it.
Surely, somebody at some point in the past decided that 0x1268 would get the physical sector size of a device and documented that somewhere. Where does this information come from and where can I find it?
Edit: I am not asking what BLKSSZGET does in general, nor am I asking what header it is defined in. I am looking for a definitive, standardized source that states what argument types it should take and what its behavior should be for any driver that implements it.
Specifically, I am asking because there appears to be a bug in blkdiscard in util-linux 2.23 (and 2.24) where the sector size is queried in to a uint64_t, but the high 32-bits are untouched since BLKSSZGET appears to expect a 32-bit integer, and this leads to an incorrect sector size, incorrect alignment calculations, and failures in blkdiscard when it should succeed. So before I submit a patch, I need to determine, with absolute certainty, if the problem is that blkdiscard should be using a 32-bit integer, or if the driver implementation in my kernel should be using a 64-bit integer.
Edit 2: Since we're on the topic, the proposed patch presuming blkdiscard is incorrect is:
--- sys-utils/blkdiscard.c-2.23 2013-11-01 18:28:19.270004947 -0400
+++ sys-utils/blkdiscard.c 2013-11-01 18:29:07.334002382 -0400
## -71,7 +71,8 ##
{
char *path;
int c, fd, verbose = 0, secure = 0;
- uint64_t end, blksize, secsize, range[2];
+ uint64_t end, blksize, range[2];
+ uint32_t secsize;
struct stat sb;
static const struct option longopts[] = {
## -146,8 +147,8 ##
err(EXIT_FAILURE, _("%s: BLKSSZGET ioctl failed"), path);
/* align range to the sector size */
- range[0] = (range[0] + secsize - 1) & ~(secsize - 1);
- range[1] &= ~(secsize - 1);
+ range[0] = (range[0] + (uint64_t)secsize - 1) & ~((uint64_t)secsize - 1);
+ range[1] &= ~((uint64_t)secsize - 1);
/* is the range end behind the end of the device ?*/
end = range[0] + range[1];
Applied to e.g. https://www.kernel.org/pub/linux/utils/util-linux/v2.23/.
The answer to "where is this specified?" does seem to be the kernel source.
I asked the question on the kernel mailing list here: https://lkml.org/lkml/2013/11/1/620
In response, Theodore Ts'o wrote (note: he mistakenly identified sys-utils/blkdiscard.c in his list but it's inconsequential):
BLKSSZGET returns an int. If you look at the sources of util-linux
v2.23, you'll see it passes an int to BLKSSZGET in
sys-utils/blkdiscard.c
lib/blkdev.c
E2fsprogs also expects BLKSSZGET to return an int, and if you look at
the kernel sources, it very clearly returns an int.
The one place it doesn't is in sys-utils/blkdiscard.c, where as you
have noted, it is passing in a uint64 to BLKSSZGET. This looks like
it's a bug in sys-util/blkdiscard.c.
He then went on to submit a patch¹ to blkdiscard at util-linux:
--- a/sys-utils/blkdiscard.c
+++ b/sys-utils/blkdiscard.c
## -70,8 +70,8 ## static void __attribute__((__noreturn__)) usage(FILE *out)
int main(int argc, char **argv)
{
char *path;
- int c, fd, verbose = 0, secure = 0;
- uint64_t end, blksize, secsize, range[2];
+ int c, fd, verbose = 0, secure = 0, secsize;
+ uint64_t end, blksize, range[2];
struct stat sb;
static const struct option longopts[] = {
I had been hesitant to mention the blkdiscard tool in both my mailing list post and the original version of this SO question specifically for this reason: I know what's in my kernel's source, it's already easy enough to modify blkdiscard to agree with the source, and this ended up distracting from the real question of "where is this documented?".
So, as for the specifics, somebody more official than me has also stated that the BLKSSZGET ioctl takes an int, but the general question regarding documentation remained. I then followed up with https://lkml.org/lkml/2013/11/3/125 and received another reply from Theodore Ts'o (wiki for credibility) answering the question. He wrote:
> There was a bigger question hidden behind the context there that I'm
> still wondering about: Are these ioctl interfaces specified and
> documented somewhere? From what I've seen, and from your response, the
> implication is that the kernel source *is* the specification, and not
> document exists that the kernel is expected to comply with; is this
> the case?
The kernel source is the specification. Some of these ioctl are
documented as part of the linux man pages, for which the project home
page is here:
https://www.kernel.org/doc/man-pages/
However, these document existing practice; if there is a discrepancy
between what is in the kernel has implemented and the Linux man pages,
it is the Linux man pages which are buggy and which will be changed.
That is man pages are descriptive, not perscriptive.
I also asked about the use of "int" in general for public kernel APIs, his response is there although that is off-topic here.
Answer: So, there you have it, the final answer is: The ioctl interfaces are specified by the kernel source itself; there is no document that the kernel adheres to. There is documentation to describe the kernel's implementations of various ioctls, but if there is a mismatch, it is an error in the documentation, not in the kernel.
¹ With all the above in mind, I want to point out that an important difference in the patch Theodore Ts'o submitted, compared to mine, is the use of "int" rather than "uint32_t" -- BLKSSZGET, as per kernel source, does indeed expect an argument that is whatever size "int" is on the platform, not a forced 32-bit value.
My AF-XDP userspace program is based on this tutorial: https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
I am currently trying to parse ~360.000 RTP-packets per second (checking for continuous sequence numbers) but I loose around 25 per second (this means that for 25 packets the statement previous_rtp_sqnz_nmbr + 1 == current_rtp_sqnz_nmbr doesn't hold true).
So I tried to increase the number of allocated packets NUM_FRAMES from 228.000 to 328.000. With default FRAME_SIZE of XSK_UMEM__DEFAULT_FRAME_SIZE = 4096 this results in 1281Mbyte being allocated (no problem because I have 32GB of RAM) but for whatever reason, this function call:
static struct xsk_umem_info *configure_xsk_umem(void *buffer, uint64_t size)
{
printf("Try to allocate %lu\n", size);
struct xsk_umem_info *umem;
int ret;
umem = calloc(1, sizeof(*umem));
if (!umem)
return NULL;
ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
NULL);
if (ret) {
errno = -ret;
return NULL;
}
umem->buffer = buffer;
return umem;
}
fails with
Try to allocate 1343488000
ERROR: Can't create umem "Cannot allocate memory"
I don't know why? But because I know that my RTP-packets are not larger than 1500bytes, I set FRAME_SIZE 3072 so I am now at around 960Mbyte (which works without an error).
However, I am now loosing half of the received packets (this means that for 180.000 packets the previous sequence number doesn't line up with the current sequence number).
Because of this I ask the question: What is the relationship between FRAME_SIZE and the actual size of a packet? Because obviously it can not be the same.
Edit: I am using 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux and just copied the libbpf-repository from here into my code-base: https://github.com/libbpf/libbpf
So I don't know whether the error mentioned here: https://github.com/xdp-project/xdp-tutorial/issues/76 is still valid?
I know that a hard drive are system's files (/dev/sdXX) –then treated like files-, I have questions about this:
I tried those following lines of code but nothing positive
-------first attempt----------
int numSecteur=2;
char secteur [512];
FILE* disqueF=fopen("/dev/sda","r"); //tried "rb" and sda1 ...every thing
fseek(disqueF, numSecteur*512,SEEK_SET);
fread(secteur, 512, 1, disqueF);
fclose(disqueF);
-------Second attempt----------
int i=open("/dev/sda1",O_RDONLY);
lseek(i, 0, SEEK_SET);
read(i,secteur,512);
close(i);
------printing the results----------
printf("hex : %04x\n",secteur);
printf("string : %s\n",secteur);
Why the size is of the file /dev/sda1 is just 8 KBytes ?
How the data is stored (binary or hex….) "for the printing"
Please, I need some clues, and if someone need more details just he ask.
Thanks a lot.
Ps: Running kali 2 64bits ”debian” on VMware and i am RooT.
I tried those following lines of code but nothing positive
That's not a question.
Why the size is of the file /dev/sda1 is just 8 KBytes ?
That isn't the file size, it's the device number. (There are two parts, so the device number of sda1 is 8,1)
How the data is stored (binary or hex….) "for the printing"
Data isn't "stored for the printing". Data is stored (in electrical voltages representing binary, but you don't need to know that), and you can print it any way you want.
You cannot printf() array of chat like that, you need right print result. For example as hex dump:
for (int i = 0; i < sizeof(secteur); i++) {
printf ("%02x ", secteur[i]);
if ((i + 1) % 16 == 0)
printf ("\n");
}
I'm creating a very very simple block RAM disk based on sbull.
So far it works fine if I read/write blocks of data using dd, but whenever I try mounting a filesystem on it (and sometimes creating a file system) my driver crashes.
After long weeks of debugging, I finally found out what is wrong, even though I can't really find a way to solve the problem. Hence my question here :)
Whenever a user space application creates a request to the device WITH AN OFFSET, the driver won't work! Let me show you the source code in order to clarify:
First of all, I'm handling requests using mk_request (not using a request_queue):
static void escsi_mk_request(struct request_queue *q, struct bio *bio)
{
struct block_device *bdev = bio->bi_bdev;
struct escsi_dev *esd = bdev->bd_disk->private_data;
int rw;
struct bio_vec *bvec;
sector_t sector;
int i;
int err = -EIO;
printk("request received nr. sectors = %lu\n",bio_sectors(bio));
sector = bio->bi_sector;
if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
goto out;
if (unlikely(bio->bi_rw & REQ_DISCARD)) {
err = 0;
goto out;
}
rw = bio_rw(bio);
if (rw == READA)
rw = READ;
bio_for_each_segment(bvec, bio, i) {
unsigned int len = bvec->bv_len;
err = esd_do_bvec(esd, bvec->bv_page, len, bvec->bv_offset, rw, sector);
if (err) {
printk("err!\n");
break;
}
sector += len >> SECTOR_SHIFT;
}
out:
bio_endio(bio, err);
}
The esd_do_bvec function:
static int esd_do_bvec(struct escsi_dev *esd, struct page *page,
unsigned int len, unsigned int off, int rw,
sector_t sector)
{
void *mem;
int err = 0;
unsigned int offset;
int i;
offset = off + sector * 512;
printk("ESD RW=%d, len=%d, off=%d, offset=%d, sector=%lu\n",rw,len,off,offset,sector);
mem = kmap_atomic(page);
if (rw == READ) {
memcpy(mem,esd->data+offset,len);
} else {
memcpy(esd->data+offset,mem,len);
}
kunmap_atomic(mem);
out:
return err;
}
OK, so basically when I read or write data using dd, the variable "off" in esd_do_bvec() is always 0, regardless of where and how many bytes I want to write. The file system obviously always performs I/O in 4KB chunks and will write a full block even when only one byte needs to be replaced.
I am sure that reads and writes are working correctly when there's no offset because I created a file that is the same size as my block RAM disk and dumped the entire file into my device using dd, then got the output of the device (also using dd), and the input and output files are exactly the same. I also wrote the same file into a brd (Linux kernel original block RAM disk driver) and the outputs are the same comparing my device and the brd device.
BUT -- in some specific situations I try to mount or create a new file system on my device and somehow it gets I/O requests with an offset, and at that point my driver fails. I assume that I'm not handling the offset properly. For example, when I try "mount -t ext2 /dev/esda":
linux-xjwl:/home/phil/escsi # mount /dev/esda -t ext2 /mnt/esda1/
mount: wrong fs type, bad option, bad superblock on /dev/esda,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
linux-xjwl:/home/phil/escsi # dmesg|tail -n 10
[ 2239.275901] ESD RW=0, len=4096, off=0, offset=16384, sector=32
[ 2239.275947] request received nr. sectors = 8
[ 2239.275959] ESD RW=0, len=4096, off=0, offset=4096, sector=8
[ 2239.276516] request received nr. sectors = 8
[ 2239.276537] ESD RW=0, len=4096, off=0, offset=2097152, sector=4096
[ 2239.276606] request received nr. sectors = 8
[ 2239.276626] ESD RW=0, len=4096, off=0, offset=28672, sector=56
[ 2239.277535] request received nr. sectors = 2
[ 2239.277535] ESD RW=0, len=1024, off=1024, offset=2048, sector=2
[ 2239.277535] EXT4-fs (esda): VFS: Can't find ext4 filesystem
(p.s.: the output shows "EXT4" but I am running with "-t ext2")
I have checked the contents of sector n. 2 in my device and it does contain the ext2 metadata (since I ran mkfs.ext2 prior to trying to mount, of course). So I believe there's a problem with the offset. So far I can't really debug my driver because I wasn't able to come up with a request which would cause an I/O request with an offset (e.g., if I try writing a single byte into my device, Linux will read the whole block and rewrite it with only one different byte).
Hope it's not a too simple question for you.
Thanks in advance,
Phil
Please see the answer provided by Peter below.
If you're wondering what the esd_do_bvec() function looks like now, here it comes:
static int esd_do_bvec(struct escsi_dev *esd, char *buf,
unsigned int len, int rw, sector_t sector)
{
int err = 0;
unsigned int offset;
// Please notice that we STILL have an offset to deal with, but
// this offset comes in sectors and needs to be converted to a
// a byte offset.
offset = sector << SECTOR_SHIFT; // or multiply by 512
//printk("ESD RW=%d, len=%d, off=%d, offset=%d, sector=%lu\n",rw,len,off,offset,sector);
if (rw == READ) {
memcpy(buf,esd->data+offset,len);
} else {
memcpy(esd->data+offset,buf,len);
}
return err;
}
The offset per segment does not refer to an offset from the block device location, but rather an offset into the page. To cause this to be nonzero, you'll probably need to write your own C program that runs read() and write(). Allocate a page-aligned buffer, then read/write to/from different locations in that buffer, and those should show up as offsets in the bvec.
That said, LWN warns of managing this page offset manually, and recommends instead the macro bio_kmap_irq(), which is called on the bio_for_each_segment() variable bio, and takes care of the atomic kmap AND manages the offset entry as well. Source: http://lwn.net/Articles/26404/
Your code will look something like:
bio_for_each_segment(bvec, bio, i) {
unsigned int len = bvec->bv_len;
unsigned long flags;
char *buf = bio_kmap_irq(bio, &flags);
err = esd_do_bvec(esd, buf, len, rw, sector);
bio_kunmap_irq(buf, &flags);
if (err) {
printk("err!\n");
break;
}
sector += len >> SECTOR_SHIFT;
}
Of course this changes the signature of esd_do_bvec to accept the memory buffer directly rather than page/offset.