I am having a problem understanding how to find the Block Group Descriptor table. The literature (D. Poirier: "The Second Extended File System") states that the block group descriptor table is located in the block immediately following the superblock.
Now, when I look at the first disk, with a block size of 1024 bytes, the structure is like this:
MBR, 0-512 bytes
Superblock, 1536-2560 bytes
BG Descriptor, 2560 - ... bytes
This structure is fine, because the superblock starts at the 3rd sector and the BGD follows right after. However, when I look at a second disk with a block size of 4096 bytes, the structure is like this:
MBR, 0-512 bytes
Superblock, 1536-2560 bytes
BG Descriptor, 4608 - ... bytes
In this case, the BGD is located 3072(?) bytes away from the superblock. Could someone enlighten me and tell me how exactly the BGD position is determined? I'm writing a program that reads and analyses the ext structure, and I can't write a generic program that knows how to find the BGD.
The BGD start offset can vary depending on the block size (1k, 2k, 4k).
In a partition, the first 1024 bytes are reserved, followed by 1024 bytes of superblock. The BGD table always begins at the start of the first block after the one containing the superblock, so depending on the block size:
BLK=1K: the BGD starts at partition offset 2048 (1024 reserved + 1024 superblock), i.e. block 2.
BLK=2K: the BGD starts at partition offset 2048, i.e. block 1 (the reserved area and the superblock together fill block 0).
BLK=4K: the BGD starts at partition offset 4096, i.e. block 1, which is why you see it 3072 bytes apart from the superblock.
So pulling the superblock from a file system (i.e. if my sda storage is ext2-formatted) is easy. I just need to skip 1024 bytes to get the superblock from the sda storage:
lseek(fd, 1024, SEEK_SET);
read(fd, &super_block, sizeof(super_block));
and pulling the group descriptor is also super easy (if I understood the code I was reading correctly):
lseek(fd, 1024 + block_size, SEEK_SET); /* block_size == 1024 here */
read(fd, &block_group, sizeof(block_group));
or, with the numbers spelled out,
lseek(fd, 1024 + 1024, SEEK_SET);
read(fd, &block_group, sizeof(block_group));
where the first 1024 is the base offset (the reserved boot area).
But I am not comfortable yet, because the real challenge I found is pulling an inode when all I have is a file name. I know file names are stored in directory structs, so the first challenge is to extract the directory struct; from the directory struct I can get the inode number, and from the inode number I can extract the inode struct. But I do not know how to extract the directory struct from an ext2-formatted image. Can anyone please tell me this? Thanks.
Yes, pulling the superblock is just a matter of skipping BASE_OFFSET=1024 bytes in ext2 and then reading it, like so:
lseek(fd, BASE_OFFSET, SEEK_SET); /* BASE_OFFSET for ext2 == 1024 */
read(fd, &super_block, sizeof(super_block));
block_size = 1024 << super_block.s_log_block_size;
printf("Block size is [%u]\n", block_size);
The size of a block (not of the superblock itself) is given by s_log_block_size. This value expresses the block size as a power of 2, using 1024 bytes (specifically for ext2) as the unit. Thus, 0 denotes 1024-byte blocks, 1 denotes 2048-byte blocks, and so on. To calculate the size in bytes of a block:
unsigned int block_size = 1024 << super.s_log_block_size; /* block size in bytes */
Note that s_log_block_size is an exponent, not a byte count: for a 1024-byte block size it is 0, for 2048 bytes it is 1, for 4096 bytes it is 2.
Then I can extract the group descriptor. For my image there is only one group descriptor. I don't know how many descriptors I would have with 1 TB of storage (which I do have, formatted as ext4); maybe someone can tell me this.
The group descriptor is extracted by moving forward one block past the start, like this:
lseek(fd, BASE_OFFSET + block_size, SEEK_SET); /* correct for 1 KiB blocks; for larger blocks the table starts at offset block_size */
read(fd, &block_group, sizeof(block_group));
I think this gives the idea of finding out how many group descriptors there are in an ext2 storage:
unsigned int group_count = 1 + (super_block.s_blocks_count-1) / super_block.s_blocks_per_group;
So for example, my device image has 128 blocks: the first block is always boot info, the second block contains the superblock, and the third block contains the first group descriptor. I'd still like to know what the offset of my second group descriptor would be if I had more space on my storage. Please, someone shed light on this.
Moving on: to extract a specific inode, the formula to seek to its offset is this:
lseek(fd, BLOCK_OFFSET(block_group->bg_inode_table)+(inode_no-1)*sizeof(struct ext2_inode),
SEEK_SET);
bg_inode_table can be used to locate the inode table.
The group descriptor tells us the location of the block/inode bitmaps and of the inode table (described later) through the bg_block_bitmap, bg_inode_bitmap and bg_inode_table fields.
Now, to extract the root inode (which is always inode number 2), for example, I just need to do:
lseek(fd, BLOCK_OFFSET(block_group->bg_inode_table)+(2-1)*sizeof(struct ext2_inode),
SEEK_SET);
The block number of the first block of the inode table is stored in the bg_inode_table field of the group descriptor.
So the inode table is what helps in finding a specific inode.
To extract the directory struct, I just need to use the inode.i_block[] array filled in the last step.
Each i_block element is a block number; it is basically a pointer to an actual block containing the content of the file that the inode describes:
lseek(fd, BASE_OFFSET + (i_block[x] - 1) * block_size, SEEK_SET);
Note that the block size is not always 1024 for ext2; compute it from s_log_block_size as shown above.
This way I can read the block whose start contains the directory struct in an ext2 file system:
void *block = malloc(block_size);
read(fd, block, block_size);
and the above gives me the first directory entry mapped to the specific inode.
I can simply do a loop to get all entries.
http://www.science.smith.edu/~nhowe/teaching/csc262/oldlabs/ext2.html
A NAND flash device has a block size of 16384, a page size of 512, and an OOB size of 16 bytes.
A partition dump (cleaned of OOB data) is 13548080 bytes in size, so it is not a multiple of 512. Since all writes must be 512-byte aligned, and the blob size must be a multiple of 512 bytes, I need to add n bytes at the end of the binary (filled with 'FF').
13548080 bytes falls just short of 26462 complete pages; 26462 complete pages is 26462 x 512 = 13548544 bytes. The difference is the 464 bytes I need to add.
I tried a two-step way: first I created a 464-byte padding file, dd if=/dev/zero bs=1 count=464 | tr '\000' '\377' >padded.bin, then appended the padding to the original file: dd if=padded.bin bs=1 count=464 >>original.bin
Perhaps there is another way to append n bytes to the end of a file using a shell command?
Since the number of pages is known, you can use the count= operand of dd; the desired 'FF' bytes can be provided by tr.
(cat inputfile; tr </dev/zero \\0 \\377) | dd count=26462 iflag=fullblock >outputfile
Turns out I misinterpreted wear leveling. I initially thought that by accessing the drive raw I would lose this feature, but as it is a feature of the controller, this explains why I am hitting millions of writes to the 'logical sector' I am testing.
I am writing an application where I will be utilizing a raw disk partition as a circular buffer, i.e. no filesystem.
I need somewhere to keep track of my read/write buffer heads that is persistent across boots; I was thinking I could create another partition to store these two pointers.
But I am worried about how many times I can write to a sector of the device's solid state drive before the sector dies, so I wrote the code below to hammer a single sector and see how quickly it fails:
create random block (512 bytes)
write to sector 0 using pwrite
read block from sector 0 using pread
compare each byte
exit when difference found.
But it has been running for millions of sector writes now!
I would have expected it to fail somewhere standard, like between 10,000 and 100,000 writes.
I'm using pread/pwrite as below with a random buffer each loop and a comparison of both buffers afterwards.
void WriteBlock(const char* device, unsigned char* dataBlock, size_t sector, int size)
{
    int fd = open(device, O_WRONLY);
    if (fd < 0)   // open() returns -1 on error; 0 is a valid descriptor
    {
        std::cout << "error opening " << device << " '" << strerror(errno) << "'\r\n";
        exit(EXIT_FAILURE);
    }
    ssize_t r = pwrite(fd, dataBlock, size, sector * SECTOR_SIZE);
    if (r < size)
    {
        std::cout << "failure writing '" << strerror(errno) << "'\r\n";
        exit(EXIT_FAILURE);
    }
    close(fd);
}

void ReadBlock(const char* device, unsigned char* dataBlock, size_t sector, int size)
{
    int fd = open(device, O_RDONLY);
    if (fd < 0)
    {
        std::cout << "error opening " << device << " '" << strerror(errno) << "'\r\n";
        exit(EXIT_FAILURE);
    }
    ssize_t r = pread(fd, dataBlock, size, sector * SECTOR_SIZE);
    if (r < size)
    {
        std::cout << "failure reading '" << strerror(errno) << "'\r\n";
        exit(EXIT_FAILURE);
    }
    close(fd);
}
The code just keeps on running, with the write buffer equal to the read buffer every time.
FYI, I'm not comparing the write buffer to itself! If I hard-code a value into the read buffer, the comparison catches it and reports a failure.
I would have expected it to fail somewhere standard like between 10,000-100,000 times?
Most solid state drives have wear levelling. What this means is that when you write to logical block 0 the device says "Hey, the old data in logical block 0 is getting overwritten, so I can just pretend a completely different physical block is now logical block 0". By continually writing to the same logical block, you can be writing to many completely different physical blocks.
To defeat wear leveling (and actually write to the same physical block) you have to convince the device that all other blocks are in use. This isn't possible because there's spare capacity. For example, for a 1 TiB device you can fill all 1 TiB of logical blocks with data (without doing any "trim", etc), but there might be an extra 512 GiB of spare space and your writes to the same logical block will be spread across that 512 GiB of spare space; and by the time you actually see an error it may mean that every block in that 512 GiB of spare space has failed (and not just one block).
If you know how much spare space there actually is, then you might be able to do calculations based on that - e.g. if there's 1 thousand spare blocks and you do 1 billion writes before seeing an error, then you might be able to say "1 billion writes / 1 thousand blocks = 1 million writes to each physical block".
Now, imagine you're a manufacturer and you've got a 1000 GiB drive. You decide to sell it as a consumer drive (with the assumption that the drive will be mostly empty and wear leveling will be able to work well) and that you can say it's a 900 GiB drive (with 100 GiB of spare blocks) that will fail after 10000 writes. Then you decide to also sell the exact same drive as an enterprise drive (with the assumption that the drive will probably be full and wear leveling won't be so effective) and that you can say it's a 900 GiB drive (with 100 GiB of spare blocks) that will fail after 2000 writes. Then you decide you can increase the amount of spare space, and also sell the "mostly the same" drive as an 500 GiB enterprise drive (with 500 GiB of spare blocks) that will fail after 20000 writes.
Next; imagine that you test 1000 identical drives (with the software you're writing) and measure "20000 writes before failure on average, under the specific conditions used for testing (e.g. with the rest of the logical blocks all free/empty)". This information is mostly useless - if someone uses the same drive under different conditions (e.g. with less logical blocks free/empty) then it'll fail sooner (or later) than you said it would.
For a (potentially better) alternative, you might be able to use information obtained from S.M.A.R.T. (see https://en.wikipedia.org/wiki/S.M.A.R.T. ). Unfortunately, some of the most useful pieces of information you can get are manufacturer-specific (e.g. Used Reserved Block Count Total, Unused Reserved Block Count Total), and some devices (USB flash) don't support it at all.
This works on Linux, but not (as I would like) on FreeBSD:
I wish to exercise my CD-ROM drive, to keep the dust off the lens. On Linux I run (as root) a C program I wrote which seeks back and forth, reading a single block each time as it goes. On FreeBSD this program doesn't get too far. I can open the device and seek to (say) block 1. But when I try to read the block, I get error 22 (EINVAL). It fails on the first read, at block 1, whether or not the device is mounted (-t cd9660). How do I proceed?
Full program is here. The relevant snippet:
lo_fd = Open(ar_argv[1], O_RDONLY, 0);
lo_high_bit = 1;
while (lo_high_bit > 0)
{
    if (lseek(lo_fd, lo_high_bit, SEEK_SET) == (off_t)-1)
    {
        lo_high_bit >>= 1;
        break;
    }
    if (read(lo_fd, lo_buffer, 1) != 1)
    {
        lo_high_bit >>= 1;
        break;
    }
    lo_high_bit <<= 1;
}
It turns out that I was making two errors: trying to read a byte at a time, and lseek()ing to byte 1. fstat() on the device shows st_blksize of 4096.
Seeking to 4096 and reading 4096 bytes works.
Seeking to 2048 and reading 2048 bytes works.
Seeking to 2048 and reading 1024 bytes gives EINVAL on the read().
Seeking to 1024 and reading 2048 bytes gives EINVAL on the read().
I found many links, but almost all of them point to the fix, not the reason.
I created a 7 GB ext4 partition on an SD card connected to a PC via a USB card reader. I have an application which writes 10488576 bytes to the mentioned partition (/dev/sdc2). After the application runs, the filesystem looks corrupt:
#fsck.ext4 -v /dev/sdc2
e2fsck 1.42.8 (20-Jun-2013)
ext2fs_open2: Bad magic number in super-block
fsck.ext4: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear<y>? no
fsck.ext4: Illegal inode number while checking ext3 journal for /dev/sdc2
/dev/sdc2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdc2: ********** WARNING: Filesystem still has errors **********
#dumpe2fs /dev/sdc2
dumpe2fs 1.42.8 (20-Jun-2013)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sdc2
Couldn't find valid filesystem superblock.
The application is simply doing something like the below (I can't post the exact code):
char *write_buf;                   // declared in a header
write_buf = (char *) malloc(size); // size = 10488576; allocated in function a(), called from main
char *buf;                         // declared locally in function b()
buf = write_buf;                   // in function b()
write(fd, buf, size);              // in function b()
The filesystem block size is 4K.
Backup superblocks at 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632.
Let me know if any more information is required. I need to understand what might cause this corruption, because I'm fairly confident something is wrong with the application code.
EDIT:
I can see that the primary superblock starts at block 0, and the lseek() call before write() is doing SEEK_SET to offset 0, which would overwrite the superblock information. I am going to try to lseek far from the superblock before the write().
EDIT:
I have got this fixed by doing as I mentioned above. As per dumpe2fs o/p I had below for group 0:
Group 0: (Blocks 0-32767)
Checksum 0x8bba, unused inodes 8069
Primary superblock at 0, Group descriptors at 1-1
Reserved GDT blocks at 2-474
Block bitmap at 475 (+475), Inode bitmap at 491 (+491)
Inode table at 507-1011 (+507)
24175 free blocks, 8069 free inodes, 2 directories, 8069 unused inodes
Free blocks: 8593-32767
Free inodes: 12-8080
So before writing, I did an lseek to 8593*4096 (the first free block times the block size). Now the filesystem is not getting corrupted.