I'm creating a very very simple block RAM disk based on sbull.
So far it works fine if I read/write blocks of data using dd, but whenever I try mounting a filesystem on it (and sometimes creating a file system) my driver crashes.
After long weeks of debugging, I finally found out what is wrong, even though I can't really find a way to solve the problem. Hence my question here :)
Whenever a user space application creates a request to the device WITH AN OFFSET, the driver won't work! Let me show you the source code in order to clarify:
First of all, I'm handling requests with a custom make_request function (not using a request_queue):
static void escsi_mk_request(struct request_queue *q, struct bio *bio)
{
    struct block_device *bdev = bio->bi_bdev;
    struct escsi_dev *esd = bdev->bd_disk->private_data;
    int rw;
    struct bio_vec *bvec;
    sector_t sector;
    int i;
    int err = -EIO;

    printk("request received nr. sectors = %lu\n", bio_sectors(bio));

    sector = bio->bi_sector;
    if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
        goto out;

    if (unlikely(bio->bi_rw & REQ_DISCARD)) {
        err = 0;
        goto out;
    }

    rw = bio_rw(bio);
    if (rw == READA)
        rw = READ;

    bio_for_each_segment(bvec, bio, i) {
        unsigned int len = bvec->bv_len;

        err = esd_do_bvec(esd, bvec->bv_page, len, bvec->bv_offset, rw, sector);
        if (err) {
            printk("err!\n");
            break;
        }
        sector += len >> SECTOR_SHIFT;
    }

out:
    bio_endio(bio, err);
}
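(For context, not shown in the question: a make_request driver of this kernel era hands the function above to the block layer roughly as sketched below; the esd->queue field name and the cleanup label are assumptions, not from the original post.)

esd->queue = blk_alloc_queue(GFP_KERNEL);
if (!esd->queue)
    goto out_free_esd; /* hypothetical cleanup label */
blk_queue_make_request(esd->queue, escsi_mk_request);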
The esd_do_bvec function:
static int esd_do_bvec(struct escsi_dev *esd, struct page *page,
                       unsigned int len, unsigned int off, int rw,
                       sector_t sector)
{
    void *mem;
    int err = 0;
    unsigned int offset;
    int i;

    offset = off + sector * 512;
    printk("ESD RW=%d, len=%d, off=%d, offset=%d, sector=%lu\n", rw, len, off, offset, sector);

    mem = kmap_atomic(page);
    if (rw == READ) {
        memcpy(mem, esd->data + offset, len);
    } else {
        memcpy(esd->data + offset, mem, len);
    }
    kunmap_atomic(mem);

out:
    return err;
}
OK, so basically when I read or write data using dd, the variable "off" in esd_do_bvec() is always 0, regardless of where and how many bytes I want to write. The file system obviously always performs I/O in 4KB chunks and will write a full block even when only one byte needs to be replaced.
I am sure that reads and writes work correctly when there is no offset: I created a file the same size as my block RAM disk, dumped the entire file into my device using dd, read the device back out (also using dd), and the input and output files were identical. I also wrote the same file into a brd device (the kernel's original block RAM disk driver), and the output of my device matches the output of the brd device.
BUT -- in some specific situations I try to mount or create a new file system on my device and somehow it gets I/O requests with an offset, and at that point my driver fails. I assume that I'm not handling the offset properly. For example, when I try "mount -t ext2 /dev/esda":
linux-xjwl:/home/phil/escsi # mount /dev/esda -t ext2 /mnt/esda1/
mount: wrong fs type, bad option, bad superblock on /dev/esda,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
linux-xjwl:/home/phil/escsi # dmesg|tail -n 10
[ 2239.275901] ESD RW=0, len=4096, off=0, offset=16384, sector=32
[ 2239.275947] request received nr. sectors = 8
[ 2239.275959] ESD RW=0, len=4096, off=0, offset=4096, sector=8
[ 2239.276516] request received nr. sectors = 8
[ 2239.276537] ESD RW=0, len=4096, off=0, offset=2097152, sector=4096
[ 2239.276606] request received nr. sectors = 8
[ 2239.276626] ESD RW=0, len=4096, off=0, offset=28672, sector=56
[ 2239.277535] request received nr. sectors = 2
[ 2239.277535] ESD RW=0, len=1024, off=1024, offset=2048, sector=2
[ 2239.277535] EXT4-fs (esda): VFS: Can't find ext4 filesystem
(p.s.: the output shows "EXT4" but I am running with "-t ext2")
I have checked the contents of sector no. 2 in my device and it does contain the ext2 metadata (I ran mkfs.ext2 before trying to mount, of course). So I believe there is a problem with how I handle the offset. So far I haven't been able to debug this, because I couldn't come up with a request that produces an I/O request with an offset (e.g., if I write a single byte to my device, Linux reads the whole block and rewrites it with only one byte changed).
I hope this isn't too simple a question for you.
Thanks in advance,
Phil
Please see the answer provided by Peter below.
If you're wondering what the esd_do_bvec() function looks like now, here it is:
static int esd_do_bvec(struct escsi_dev *esd, char *buf,
                       unsigned int len, int rw, sector_t sector)
{
    int err = 0;
    unsigned int offset;

    // Please notice that we STILL have an offset to deal with, but
    // this offset comes in sectors and needs to be converted to a
    // byte offset.
    offset = sector << SECTOR_SHIFT; // or multiply by 512

    //printk("ESD RW=%d, len=%d, offset=%d, sector=%lu\n", rw, len, offset, sector);

    if (rw == READ) {
        memcpy(buf, esd->data + offset, len);
    } else {
        memcpy(esd->data + offset, buf, len);
    }

    return err;
}
The offset per segment does not refer to an offset from the block device location, but rather an offset into the page. To cause this to be nonzero, you'll probably need to write your own C program that runs read() and write(). Allocate a page-aligned buffer, then read/write to/from different locations in that buffer, and those should show up as offsets in the bvec.
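For instance, here is a minimal sketch of such a program (not from the original thread; the device path and the 512-byte logical sector size are assumptions). With O_DIRECT, the user page is mapped straight into the bio, so reading into the middle of a page-aligned buffer should surface as a nonzero bv_offset in the driver:

#define _GNU_SOURCE /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *page;
    int fd = open("/dev/esda", O_RDWR | O_DIRECT); /* assumed device node */
    if (fd < 0) { perror("open"); return 1; }

    /* Page-aligned allocation, but read at a 512-byte offset within it;
     * the bio segment should then carry bv_offset == 512. */
    if (posix_memalign(&page, 4096, 8192)) { perror("posix_memalign"); return 1; }
    char *buf = (char *)page + 512;

    if (read(fd, buf, 512) != 512) /* O_DIRECT requires sector-aligned lengths */
        perror("read");

    free(page);
    close(fd);
    return 0;
}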
That said, LWN warns against managing this page offset manually and recommends instead the macro bio_kmap_irq(), which is called on the bio being iterated by bio_for_each_segment() and takes care of the atomic kmap AND the offset handling as well. Source: http://lwn.net/Articles/26404/
Your code will look something like:
bio_for_each_segment(bvec, bio, i) {
    unsigned int len = bvec->bv_len;
    unsigned long flags;
    char *buf = bio_kmap_irq(bio, &flags);

    err = esd_do_bvec(esd, buf, len, rw, sector);
    bio_kunmap_irq(buf, &flags);
    if (err) {
        printk("err!\n");
        break;
    }
    sector += len >> SECTOR_SHIFT;
}
Of course this changes the signature of esd_do_bvec to accept the memory buffer directly rather than page/offset.
Related
I am doing a project where I want to write some info to an MT25Q (MT25QL512ABB1EW9-0SIT) flash device from Micron Technology. But when I try to write to and read from the first few pages (0-13), I get garbage data back. I thought there might be a protected area in the flash, so I checked the corresponding bits in the status register, and the value indicates that none of the sectors are protected. None of the possible protected-sector ranges corresponds to this value either.
I am using zephyr-os which supports the flash device. Here is my code:
#include "MT25Q.h"
#include <flash.h>
struct device *dev = device_get_binding("MT25Q");
struct flash_pages_info myflash;
flash_get_page_info_by_idx(dev, 0, &myflash);
char *test_line = malloc(5);
char *buf = malloc(5);
strcpy(test_line, "test");
size_t page_count = flash_get_page_count(dev);
flash_write_protection_set(dev, false);
for(long i = 0; i < total_pages; i++) {
returnval = flash_get_page_info_by_idx(dev, i, &myflash);
flash_write(dev, myflash.start_offset, test_line, 5);
flash_read(dev, myflash.start_offset, buf, 5);
printk("%s\n", buf);
}
flash_write_protection_set(dev, true);
free(buf);
free(test_line);
return 0;
}
For the first 15 iterations of the loop I read back some garbage string, and afterwards it works as expected. The device is byte-writeable.
Can someone help me understand why this is happening? I hope I posted all the required info, but just in case:
total_pages = 256
myflash.size = 131072
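(An aside, not part of the original question: one thing the snippet never does is erase. NOR flash like the MT25Q can only flip bits from 1 to 0 on write, so writing over pages that are not in the erased state can read back as garbage. A hedged sketch of the loop body, assuming Zephyr's flash_erase() and that myflash.size matches the device's erase unit:)

// Hypothetical variant of the loop body: erase each page before writing.
// Assumes myflash.start_offset/myflash.size are aligned to the erase unit.
returnval = flash_get_page_info_by_idx(dev, i, &myflash);
flash_erase(dev, myflash.start_offset, myflash.size);
flash_write(dev, myflash.start_offset, test_line, 5);
flash_read(dev, myflash.start_offset, buf, 5);
printk("%s\n", buf);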
My AF-XDP userspace program is based on this tutorial: https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
I am currently trying to parse ~360,000 RTP packets per second (checking for continuous sequence numbers), but I lose around 25 per second (meaning that for about 25 packets per second the statement previous_rtp_sqnz_nmbr + 1 == current_rtp_sqnz_nmbr doesn't hold true).
So I tried to increase the number of allocated frames NUM_FRAMES from 228,000 to 328,000. With the default FRAME_SIZE of XSK_UMEM__DEFAULT_FRAME_SIZE = 4096, this results in about 1281 MB being allocated (no problem, because I have 32 GB of RAM), but for whatever reason this function call:
static struct xsk_umem_info *configure_xsk_umem(void *buffer, uint64_t size)
{
    printf("Try to allocate %lu\n", size);
    struct xsk_umem_info *umem;
    int ret;

    umem = calloc(1, sizeof(*umem));
    if (!umem)
        return NULL;

    ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
                           NULL);
    if (ret) {
        errno = -ret;
        return NULL;
    }

    umem->buffer = buffer;
    return umem;
}
fails with
Try to allocate 1343488000
ERROR: Can't create umem "Cannot allocate memory"
I don't know why. But since I know that my RTP packets are no larger than 1500 bytes, I set FRAME_SIZE to 3072, which puts me at around 960 MB (and that works without an error).
However, I am now losing half of the received packets (i.e., for about 180,000 packets per second the previous sequence number doesn't line up with the current sequence number).
Because of this I ask: what is the relationship between FRAME_SIZE and the actual size of a packet? Obviously they cannot be the same.
Edit: I am using 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux and just copied the libbpf repository into my code base from here: https://github.com/libbpf/libbpf
So I don't know whether the error mentioned here is still valid: https://github.com/xdp-project/xdp-tutorial/issues/76
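(A hedged aside, not from the original post: xsk_umem__create() has to register and lock the entire UMEM with the kernel, so on many setups it is the locked-memory rlimit, rather than physical RAM, that makes a ~1.3 GB registration fail with "Cannot allocate memory". The xdp-tutorial itself raises the limit early in main(), roughly like this; needs <sys/resource.h>:)

struct rlimit rlim = {RLIM_INFINITY, RLIM_INFINITY};
if (setrlimit(RLIMIT_MEMLOCK, &rlim)) {
    fprintf(stderr, "ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n", strerror(errno));
    exit(EXIT_FAILURE);
}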
I am developing a PCIe device driver for OpenWrt, and I hit a data bus error when trying to access the I/O memory in a timer interrupt, which I mentioned in my last question. After lots of research I think I might have found the reason, but I am unable to solve it. Below are my findings.
Last week I found out that the PCIe region size might have changed during system startup. The region size of BAR0 is 4096 in my driver (returned from pci_resource_len), but the region size is 4097 according to lspci -vv, which breaks the page-size assumption of the Linux kernel. By reading the source code of pciutils, I found that the lspci command fetches the PCIe information from the /sys/devices/pci0000:00/0000:00:00.0/resource file. So I removed all my custom components and ran the original OpenWrt on my router. From cat /sys/devices/pci0000:00/0000:00:00.0/resource, the first line of the result (BAR0) is
0x0000000010008000 0x0000000010009000 0x0000000000040200
Moreover, I also checked the content of /proc/iomem; the PCIe-related part is
10000000-13ffffff : mem_base
10000000-13ffffff : PCI memory space
10000000-10007fff : 0000:00:00.0
10008000-10008fff : 0000:00:00.0
It is super weird that the BAR0 region sizes indicated by the two files above are different! According to the PCIe specification, a region size should always be a power of 2. How can the region size be 4097?
After spending weeks reading the Linux kernel source code, I found out that this is a bug in Linux kernel 4.4.14.
The content of /sys/devices/pci0000:00/0000:00:00.0/resource is generated by the function resource_show in drivers/pci/pci-sysfs.c. The relevant code is
for (i = 0; i < max; i++) {
    struct resource *res = &pci_dev->resource[i];

    pci_resource_to_user(pci_dev, i, res, &start, &end);
    str += sprintf(str, "0x%016llx 0x%016llx 0x%016llx\n",
                   (unsigned long long)start,
                   (unsigned long long)end,
                   (unsigned long long)res->flags);
}
The pci_resource_to_user() implementation actually invoked is located in arch/mips/include/asm/pci.h:
static inline void pci_resource_to_user(const struct pci_dev *dev, int bar,
        const struct resource *rsrc, resource_size_t *start,
        resource_size_t *end)
{
    phys_addr_t size = resource_size(rsrc);

    *start = fixup_bigphys_addr(rsrc->start, size);
    *end = rsrc->start + size;
}
The calculation of *end is wrong: resource_size() returns the inclusive length (end - start + 1), so the last address of the region is start + size - 1, not start + size. The line should be replaced by
*end = rsrc->start + size - (size ? 1 : 0);
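To see the off-by-one concretely, here is a small standalone illustration (not kernel code; the BAR0 values are taken from the resource file output above):

#include <stdio.h>

int main(void)
{
    unsigned long long start = 0x10008000ULL;
    unsigned long long size  = 0x1000ULL; /* 4096-byte BAR */

    unsigned long long buggy_end = start + size;     /* what 4.4.14 reports: 0x10009000 */
    unsigned long long fixed_end = start + size - 1; /* inclusive end: 0x10008fff */

    /* Userspace computes the size as end - start + 1 */
    printf("buggy: %llu bytes\n", buggy_end - start + 1); /* 4097 */
    printf("fixed: %llu bytes\n", fixed_end - start + 1); /* 4096 */
    return 0;
}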
The Overview
I am using the low-level calls in the libbzip2 library: BZ2_bzCompressInit(), BZ2_bzCompress() and BZ2_bzCompressEnd() to compress chunks of data to standard output.
I am migrating working code from higher-level calls, because I have a stream of bytes coming in and I want to compress those bytes in sets of discrete chunks (a discrete chunk is a set of bytes that contains a group of tokens of interest — my input is logically divided into groups of these chunks).
A complete group of chunks might contain, say, 500 chunks, which I want to compress to one bzip2 stream and write to standard output.
Within a set, using the pseudocode I outline below, if my example buffer is able to hold 101 chunks at a time, I would open a new stream, compress 500 chunks in runs of 101, 101, 101, 101, and one final run of 96 chunks that closes the stream.
The Problem
The issue is that my bz_stream structure instance, which keeps tracks of the number of compressed bytes in a single pass of the BZ2_bzCompress() routine, seems to claim to be writing more compressed bytes than the total bytes in the final, compressed file.
For example, the compressed output could be a file with a true size of 1234 bytes, while the number of reported compressed bytes (which I track while debugging) is somewhat higher than 1234 bytes (say 2345 bytes).
My rough pseudocode is in two parts.
The first part is a rough sketch of what I do to compress a subset of chunks (and I know that I have another subset coming after this one):
bz_stream bzStream;
unsigned char bzBuffer[BZIP2_BUFFER_MAX_LENGTH] = {0};
unsigned long bzBytesWritten = 0UL;
unsigned long long cumulativeBytesWritten = 0ULL;
unsigned char myBuffer[UNCOMPRESSED_MAX_LENGTH] = {0};
size_t myBufferLength = 0;

/* initialize bzStream */
bzStream.next_in = NULL;
bzStream.avail_in = 0U;
bzStream.avail_out = 0U;
bzStream.bzalloc = NULL;
bzStream.bzfree = NULL;
bzStream.opaque = NULL;

int bzError = BZ2_bzCompressInit(&bzStream, 9, 0, 0);
/* bzError checking... */

do
{
    /* read some bytes into myBuffer... */

    /* compress bytes in myBuffer */
    bzStream.next_in = myBuffer;
    bzStream.avail_in = myBufferLength;
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

    do
    {
        bzStream.next_out = bzBuffer;
        bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

        bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
        /* error checking... */

        bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
        cumulativeBytesWritten += bzBytesWritten;

        /* write compressed data in bzBuffer to standard output */
        fwrite(bzBuffer, 1, bzBytesWritten, stdout);
        fflush(stdout);
    }
    while (bzError == BZ_OK);
}
while (/* there is a non-final myBuffer full of discrete chunks left to compress... */);
Now we wrap up the output:
/* read in the final batch of bytes into myBuffer (with a total byte size of `myBufferLength`)... */

/* compress remaining myBufferLength bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

    bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
    /* bzError error checking... */

    /* increment cumulativeBytesWritten by `bz_stream` struct `total_out_*` members */
    bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
    cumulativeBytesWritten += bzBytesWritten;

    /* write compressed data in bzBuffer to standard output */
    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzError != BZ_STREAM_END);

/* close stream */
bzError = BZ2_bzCompressEnd(&bzStream);
/* bzError checking... */
The Questions
Am I calculating cumulativeBytesWritten (or, specifically, bzBytesWritten) incorrectly, and how would I fix that?
I have been tracking these values in a debug build, and I do not seem to be "double counting" the bzBytesWritten value. This value is counted and used once to increment cumulativeBytesWritten after each successful BZ2_bzCompress() pass.
Alternatively, am I not understanding the correct use of the bz_stream state flags?
For example, does the following compress and keep the bzip2 stream open, so long as I keep sending some bytes?
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
Likewise, can the following statement compress data so long as at least some bytes are available at the bzStream.next_in pointer (BZ_RUN), and then wrap up the stream once no more bytes are available (BZ_FINISH)?
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
Or, am I not using these low-level calls correctly at all? Should I go back to using the higher-level calls to continuously append a grouping of compressed chunks of data to one main file?
There's probably a simple solution to this, but I've been banging my head on the table for a couple days in the course of debugging what could be wrong, and I'm not making much progress. Thank you for any advice.
In answer to my own question, it appears I was miscalculating the number of bytes written. I should not use the total_out_* members, which hold the cumulative totals for the entire stream rather than the output of a single pass. The following correction works properly:
bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
The rest of the calculations follow.
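For reference, here is how the inner BZ_RUN loop looks with that correction applied (a minimal sketch using the same names as the pseudocode above; note that BZ2_bzCompress() returns BZ_RUN_OK, not BZ_OK, on a successful BZ_RUN pass):

do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;

    bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
    if (bzError != BZ_RUN_OK)
        break; /* error handling... */

    /* bytes produced by THIS pass = buffer capacity minus what remains */
    bzBytesWritten = BZIP2_BUFFER_MAX_LENGTH - bzStream.avail_out;
    cumulativeBytesWritten += bzBytesWritten;

    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzStream.avail_in > 0); /* the output buffer may fill before the input drains */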
I'm trying to access a SPI sensor using the SPIDEV driver but my code gets stuck on IOCTL.
I'm running embedded Linux on the SAM9X5EK (mounting AT91SAM9G25). The device is connected to SPI0. I enabled CONFIG_SPI_SPIDEV and CONFIG_SPI_ATMEL in menuconfig and added the proper code to the BSP file:
static struct spi_board_info spidev_board_info[] = {
    {
        .modalias = "spidev",
        .max_speed_hz = 1000000,
        .bus_num = 0,
        .chip_select = 0,
        .mode = SPI_MODE_3,
    },
    ...
};
spi_register_board_info(spidev_board_info, ARRAY_SIZE(spidev_board_info));
1 MHz is the maximum accepted by the sensor; I tried 500 kHz but I got an error during Linux boot (too slow, apparently). .bus_num and .chip_select should be correct (I also tried all the other combinations). As for SPI_MODE_3, I checked the datasheet.
I get no error while booting and devices appear correctly as /dev/spidevX.X. I manage to open the file and obtain a valid file descriptor. I'm now trying to access the device with the following code (inspired by examples I found online).
#define MY_SPIDEV_DELAY_USECS 100
// #define MY_SPIDEV_SPEED_HZ 1000000
#define MY_SPIDEV_BITS_PER_WORD 8

int spidevReadRegister(int fd,
                       unsigned int num_out_bytes,
                       unsigned char *out_buffer,
                       unsigned int num_in_bytes,
                       unsigned char *in_buffer)
{
    struct spi_ioc_transfer mesg[2] = { {0}, };
    uint8_t num_tr = 0;
    int ret;

    // Write data
    mesg[0].tx_buf = (unsigned long)out_buffer;
    mesg[0].rx_buf = (unsigned long)NULL;
    mesg[0].len = num_out_bytes;
    // mesg[0].delay_usecs = MY_SPIDEV_DELAY_USECS,
    // mesg[0].speed_hz = MY_SPIDEV_SPEED_HZ;
    mesg[0].bits_per_word = MY_SPIDEV_BITS_PER_WORD;
    mesg[0].cs_change = 0;
    num_tr++;

    // Read data
    mesg[1].tx_buf = (unsigned long)NULL;
    mesg[1].rx_buf = (unsigned long)in_buffer;
    mesg[1].len = num_in_bytes;
    // mesg[1].delay_usecs = MY_SPIDEV_DELAY_USECS,
    // mesg[1].speed_hz = MY_SPIDEV_SPEED_HZ;
    mesg[1].bits_per_word = MY_SPIDEV_BITS_PER_WORD;
    mesg[1].cs_change = 1;
    num_tr++;

    // Do the actual transmission
    if (num_tr > 0)
    {
        ret = ioctl(fd, SPI_IOC_MESSAGE(num_tr), mesg);
        if (ret == -1)
        {
            printf("Error: %d\n", errno);
            return -1;
        }
    }
    return 0;
}
Then I'm using this function:
#define OPTICAL_SENSOR_ADDR "/dev/spidev0.0"
...
int fd;
fd = open(OPTICAL_SENSOR_ADDR, O_RDWR);
if (fd <= 0) {
    printf("Device not found\n");
    exit(1);
}

uint8_t buffer1[1] = {0x3a};
uint8_t buffer2[1] = {0};
spidevReadRegister(fd, 1, buffer1, 1, buffer2);
When I run it, the code gets stuck on the ioctl!
I did it this way because, in order to read a register on the sensor, I need to send a byte with the register address in it and then get the answer back without changing CS (however, when I tried using the write() and read() functions while learning, I got the same result: stuck on them).
I'm aware that specifying .speed_hz causes an ENOPROTOOPT error on Atmel (I checked spidev.c), so I commented that part out.
Why does it get stuck? My guess is that the device node is created but isn't actually tied to any hardware. As I wasn't sure whether hardware SPI0 corresponds to bus_num 0 or 1, I tried both, but still no success (by the way, which one is it?).
UPDATE: I managed to get the SPI working! Half of it, anyway: MOSI is transmitting the right data, but CLK doesn't start... any idea?
When I'm working with SPI I always use an oscilloscope to see the output of the I/Os. If you have a 4-channel scope you can easily debug the issue and find out whether you're accessing the right I/Os, using the right speed, etc. I usually compare the signal I get to the timing diagram in the datasheet.
I think there are several issues here. First of all, SPI is bidirectional: if you want to send something over the bus, you also receive something. Therefore you always have to provide valid buffers to both rx_buf and tx_buf.
Second, all members of the struct spi_ioc_transfer have to be initialized with a valid value. Otherwise they just contain whatever happens to be in memory, and the underlying code accesses arbitrary data, leading to unknown behavior.
Third, why do you use a for loop with ioctl? You already tell ioctl that you have an array of spi_ioc_transfer structs, so all the defined transfers will be performed with one ioctl call.
Fourth, ioctl needs a pointer to your struct array, so the call should look like this:
ret = ioctl(fd, SPI_IOC_MESSAGE(num_tr), &mesg);
You see, there is room for improvement in your code.
This is how I do it in a C++ library for the Raspberry Pi. The whole library will soon be on GitHub; I'll update my answer when it is done.
void SPIBus::spiReadWrite(std::vector<std::vector<uint8_t> > &data, uint32_t speed,
                          uint16_t delay, uint8_t bitsPerWord, uint8_t cs_change)
{
    struct spi_ioc_transfer transfer[data.size()];
    // Zero-initialize all members (including padding), as noted above;
    // requires <cstring> for memset.
    memset(transfer, 0, sizeof(transfer));
    int i = 0;
    for (std::vector<uint8_t> &d : data)
    {
        // see <linux/spi/spidev.h> for details!
        transfer[i].tx_buf = reinterpret_cast<__u64>(d.data());
        transfer[i].rx_buf = reinterpret_cast<__u64>(d.data());
        transfer[i].len = d.size(); // number of bytes in vector
        transfer[i].speed_hz = speed;
        transfer[i].delay_usecs = delay;
        transfer[i].bits_per_word = bitsPerWord;
        transfer[i].cs_change = cs_change;
        i++;
    }

    int status = ioctl(this->fileDescriptor, SPI_IOC_MESSAGE(data.size()), &transfer);
    if (status < 0)
    {
        std::string errMessage(strerror(errno));
        throw std::runtime_error("Failed to do full duplex read/write operation "
                                 "on SPI Bus " + this->deviceNode + ". Error message: " +
                                 errMessage);
    }
}
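A hypothetical call site for the method above (the SPIBus instance, its constructor, and the chosen parameter values are assumptions, not part of the original answer):

// Hypothetical usage sketch: the first transfer writes the register address,
// the second clocks in the reply. "bus" is an assumed SPIBus instance.
std::vector<std::vector<uint8_t> > msgs = { {0x3a}, {0x00} };
bus.spiReadWrite(msgs, 1000000 /* speed_hz */, 100 /* delay_usecs */, 8 /* bits_per_word */, 1 /* cs_change */);
// Since rx_buf == tx_buf, msgs[1][0] now holds the byte received during the second transfer.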