How to access block level storage via the kernel (w/o using scsi libraries)?
My intent is to implement a block-level storage protocol over the network for learning purposes, much the same way SCSI works. Requests will be generated by an initiator and sent to a target (both userspace programs); the target makes calls to a kernel module and returns the data to the initiator over TCP.
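To give an idea, the wire request I have in mind is something like this (just a sketch; the struct and field names are placeholders I made up):

#include <stdint.h>

/* hypothetical on-the-wire request header, initiator -> target */
struct nbs_request {
    uint8_t  op;       /* 0 = read, 1 = write */
    uint64_t lba;      /* starting logical block address */
    uint32_t nblocks;  /* number of 512-byte blocks */
} __attribute__((packed));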
So far, I have managed to build a simple "Hello" module and run it (I am new to kernel programming), but I am unable to proceed with block access.
After searching a lot, I found struct buffer_head * bread(int dev, int block) in linux/fs.h, but the compiler throws an error:
error: implicit declaration of function ‘bread’
Please help, and feel free to advise on getting started with kernel programming.
Thank you!
bread is the interface as used in old kernels; I am now looking into struct request *blk_get_request(struct request_queue *, int, gfp_t); in linux/blkdev.h.
Accessing the block device has to be accomplished via the kernel.
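For reference, something like the following seems to be the modern equivalent of the bread() call I was attempting (a sketch only, untested; blkdev_get_by_path() and __bread() come from linux/blkdev.h and linux/buffer_head.h in roughly 3.x/4.x kernels, and these interfaces change between versions):

#include <linux/blkdev.h>
#include <linux/buffer_head.h>
#include <linux/err.h>
#include <linux/module.h>

static int __init blockread_init(void)
{
    struct block_device *bdev;
    struct buffer_head *bh;

    /* open the device by path, read-only, no exclusive holder */
    bdev = blkdev_get_by_path("/dev/sdb", FMODE_READ, NULL);
    if (IS_ERR(bdev))
        return PTR_ERR(bdev);

    /* read logical block 0 as one 512-byte block */
    bh = __bread(bdev, 0, 512);
    if (bh) {
        pr_info("first byte: %02x\n", (unsigned char)bh->b_data[0]);
        brelse(bh);
    }

    blkdev_put(bdev, FMODE_READ);
    return 0;
}

static void __exit blockread_exit(void)
{
}

module_init(blockread_init);
module_exit(blockread_exit);
MODULE_LICENSE("GPL");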
Not a kernel developer, but a few comments:
The implicit declaration error means that the declaration you've found somehow isn't in scope when you call the function. Maybe it's hidden behind an #ifdef, or maybe you forgot to include linux/fs.h.
As far as advice on linux kernel programming, you might want to check out kernelnewbies.org.
There have been various books written on kernel programming, but be aware that the details in the kernel change very rapidly. Most of the concepts in the older books will still be valid, but at least some of the details in some areas will have changed.
Finally, you might have to brave the linux kernel mailing list. It's rather intimidating, I'm sorry to say, so try to have your questions well thought out before you post them.
A block level storage protocol is itself a fair bit of work. Perhaps you want to get the protocol in place in user space first, with the target doing direct access to, e.g., /dev/sdc, before diving into the kernel.
As I read your question more closely, it appears your main interest is the storage protocol aspect of this project. If so, why do you need to modify the kernel at all? If you have a locally attached disk, say /dev/sdX, on the target, then you can do something like this from user space:
fd = open("/dev/sdX", O_RDWR);   /* open the raw block device */
pwrite(fd, buf, len, offset);    /* write len bytes at byte offset */
pread(fd, buf, len, offset);     /* read len bytes back from the same offset */
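A complete, compilable version of that sketch (error handling kept minimal; /dev/sdX is a placeholder, and raw device access normally requires root):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[512];
    int fd = open("/dev/sdX", O_RDWR);   /* placeholder device node */
    if (fd < 0) { perror("open"); return 1; }

    /* read the first 512-byte sector, then write it back unchanged */
    if (pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        perror("pread");
    else if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        perror("pwrite");

    close(fd);
    return 0;
}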
So, unless you're specifically interested in playing around inside the kernel, I don't think you need to do any kernel module to do a basic storage protocol between user processes.
I am re-implementing mmap in a device driver for DMA.
I saw this question: Linux Driver: mmap() kernel buffer to userspace without using nopage. Its answer uses vm_insert_page() to map one page at a time, so for multiple pages it has to be called in a loop. Is there another API that handles this?
Previously, I used dma_alloc_coherent to allocate a chunk of memory for DMA and used remap_pfn_range to build a page table that maps the process's virtual memory to that physical memory.
Now I would like to allocate a much larger chunk of memory using __get_free_pages with an order greater than 1. I am not sure how to build the page table in that case, for the following reason:
I checked the book Linux Device Drivers and noticed the following:
Background:
When a user-space process calls mmap to map device memory into its address space, the system responds by creating a new VMA to represent that mapping. A driver that supports mmap (and, thus, that implements the mmap method) needs to help that process by completing the initialization of that VMA.
Problem with remap_pfn_range:
remap_pfn_range won’t allow you to remap conventional addresses, which include the ones you obtain by calling get_free_page. Instead, it maps in the zero page. Everything appears to work, with the exception that the process sees private, zero-filled pages rather than the remapped RAM that it was hoping for.
The corresponding implementation in the scullp device driver only supports get_free_pages allocations of order 0, i.e. a single page:
The mmap method is disabled for a scullp device if the allocation order is greater than zero, because nopage deals with single pages rather than clusters of pages. scullp simply does not know how to properly manage reference counts for pages that are part of higher-order allocations.
May I know if there is a way to create a VMA for pages obtained using __get_free_pages with an order greater than 1?
I checked the Linux source code and noticed that some drivers re-implement struct dma_map_ops->alloc() and struct dma_map_ops->map_page(). May I know if this is the correct way to do it?
I think I got the answer to my question. Feel free to correct me if I am wrong.
I happened to see this patch: mm: Introduce new vm_map_pages() and vm_map_pages_zero() API while I was googling for vm_insert_page.
Previously, drivers had their own way of mapping a range of kernel pages/memory into a user vma, done by invoking vm_insert_page() within a loop.
As this pattern is common across different drivers, it can be generalized by creating new functions and using them across drivers.
vm_map_pages() is the API that can be used to map kernel memory/pages into a user vma in drivers that take vm_pgoff into account.
After reading it, I knew I found what I want.
That function can also be found in the Linux Kernel Core API Documentation.
As for the difference between remap_pfn_range() and vm_insert_page() (which requires a loop for a list of contiguous pages), I found this answer extremely helpful; it includes a link to an explanation by Linus.
As a side note, the patch mm: Introduce new vm_insert_range and vm_insert_range_buggy API indicates that the earlier version of vm_map_pages() was vm_insert_range(), which was later renamed, so we should stick to vm_map_pages().
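To make this concrete, here is a minimal sketch of how a driver's mmap method might use vm_map_pages() for an order-4 (16-page) buffer. The mydrv_* names are hypothetical, and split_page() is used so each constituent page of the higher-order allocation carries its own reference count (untested):

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/mm.h>

#define ORDER  4                 /* hypothetical order-4 = 16-page buffer */
#define NPAGES (1 << ORDER)

static struct page *pages[NPAGES];

static int mydrv_alloc_buffer(void)
{
    unsigned long addr;
    int i;

    addr = __get_free_pages(GFP_KERNEL, ORDER);
    if (!addr)
        return -ENOMEM;

    /* give each constituent page its own refcount, as the
     * single-page mapping APIs expect */
    split_page(virt_to_page(addr), ORDER);

    for (i = 0; i < NPAGES; i++)
        pages[i] = virt_to_page(addr + i * PAGE_SIZE);
    return 0;
}

/* mmap method: map all NPAGES pages into the user vma in one call,
 * replacing the per-page vm_insert_page() loop */
static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
    return vm_map_pages(vma, pages, NPAGES);
}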
I have looked into similar questions on this site (listed at the end) but still feel like I'm missing a couple of points; hopefully someone can help here:
Is there a hook into the proc file system that connects the /proc/iomem inode to a function that dumps the information? I wasn't able to find where in procfs this function lives. I did a grep for iomem under fs/proc in the Linux source tree and got nothing, so maybe it is more of a procfs question... The answer to this question might help me dig up the answer to the next one.
/proc/iomem has more entries than the BIOS E820 information I extracted from either dmesg or /sys/firmware/memmap (those two are consistent with each other). For example, /sys/firmware/memmap does not seem to have the PCI memory-mapped regions. Drivers' init code calls request_mem_region() and adds more info to the map, so somewhere there should be a global variable (the root of all resources?) that remembers this graph.
The questions on stackoverflow I have looked into:
How is /proc/io* populated?
Expose information to /proc/iomem
Content of /proc/iomem
struct resource iomem_resource is what you're looking for. It is defined and initialized in kernel/resource.c, where the /proc/iomem entry is also registered via proc_create_seq_data(). In the same file, the struct seq_operations instance resource_op defines what happens when you, for example, cat the file from userland.
iomem_resource is a globally exported symbol and is used throughout the kernel, drivers included, to request resources. You can find instances of devm_request_resource()/request_resource() scattered across the kernel; they take either iomem_resource or its sibling ioport_resource, based either on fixed settings or on configuration. Examples of configuration sources are a) device trees, which are prevalent in embedded settings, and b) E820 or UEFI, found more on x86.
Starting with b), which was asked about in the question, the file arch/x86/kernel/e820.c shows examples of how reserved memory gets inserted into /proc/iomem via insert_resource().
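As an illustration of that mechanism, a minimal sketch of reserving a region the way the e820 code does (the name, addresses, and module boilerplate here are made up):

#include <linux/ioport.h>
#include <linux/module.h>

static struct resource my_res = {
    .name  = "my-reserved",     /* shows up under this name in /proc/iomem */
    .start = 0xfed40000,        /* hypothetical physical range */
    .end   = 0xfed40fff,
    .flags = IORESOURCE_MEM,
};

static int __init my_reserve_init(void)
{
    /* inserts the node into the iomem_resource tree */
    return insert_resource(&iomem_resource, &my_res);
}

module_init(my_reserve_init);
MODULE_LICENSE("GPL");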
This excellent link has more details on the dynamics of requesting memory map details from the BIOS.
An alternative sequence (which relies on CONFIG_OF) by which a device driver requests the resources it needs is the following (see the sketch after these steps):
The Open Firmware API traverses the device tree and finds a matching driver, for example via a struct of_device_id table.
The driver defines a struct platform_driver which contains both the struct of_device_id table and a probe function. This probe function is then called.
Inside the probe function, a call to platform_get_resource() is made which reads the reg property from the device tree. This property defines the physical memory map for a specific device.
A call to devm_request_mem_region() is made (essentially a device-managed request_mem_region()) to actually claim the region and add it to /proc/iomem.
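A sketch of that sequence in driver form (the acme,mydev compatible string and the mydev names are hypothetical; devm_ioremap_resource() is the more modern one-liner for the last two steps):

#include <linux/io.h>
#include <linux/ioport.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int mydev_probe(struct platform_device *pdev)
{
    struct resource *res;
    void __iomem *base;

    /* step 3: the "reg" property, already translated into a resource */
    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    if (!res)
        return -ENODEV;

    /* step 4: claim the region (it now appears in /proc/iomem)... */
    if (!devm_request_mem_region(&pdev->dev, res->start,
                                 resource_size(res), "mydev"))
        return -EBUSY;

    /* ...and map it so the driver can access the registers */
    base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
    return base ? 0 : -ENOMEM;
}

static const struct of_device_id mydev_of_match[] = {
    { .compatible = "acme,mydev" },   /* hypothetical compatible string */
    { }
};

static struct platform_driver mydev_driver = {
    .probe = mydev_probe,
    .driver = {
        .name = "mydev",
        .of_match_table = mydev_of_match,
    },
};
module_platform_driver(mydev_driver);
MODULE_LICENSE("GPL");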
I want to add functions to the Linux kernel to write and read data, but I don't know how/where to store the data so that other programs can read/overwrite/delete it.
Program A calls uf_obj_add(param, param, param), which stores information in memory.
Program B does the same.
Program C calls uf_obj_get(param); the kernel checks whether the operation is allowed and, if it is, returns the data.
Do I just need to malloc() memory, or is it more difficult?
And how can uf_obj_get() access the memory where uf_obj_add() writes?
Where should I store the memory location information so both functions can access the same data?
As pointed out by commentators on your question, achieving this in userspace would probably be much safer. However, if you insist on modifying kernel code, one way to go is to implement a new device driver whose read and write functions you implement according to your needs, giving your processes access to a common memory space. Your processes can then work, as you described, by reading from and writing to the same space, more or less as if they were reading from/writing to a regular file.
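A minimal sketch of that approach as a misc character device (the uf_obj names echo your question; locking and the permission checks you mentioned are omitted for brevity, and this is untested):

#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/module.h>

#define UF_BUF_SIZE 4096
static char uf_buf[UF_BUF_SIZE];   /* kernel-side storage shared by all openers */

static ssize_t uf_read(struct file *f, char __user *ubuf,
                       size_t len, loff_t *off)
{
    /* copies out of the shared kernel buffer (copy_to_user inside) */
    return simple_read_from_buffer(ubuf, len, off, uf_buf, UF_BUF_SIZE);
}

static ssize_t uf_write(struct file *f, const char __user *ubuf,
                        size_t len, loff_t *off)
{
    /* copies into the shared kernel buffer (copy_from_user inside) */
    return simple_write_to_buffer(uf_buf, UF_BUF_SIZE, off, ubuf, len);
}

static const struct file_operations uf_fops = {
    .owner = THIS_MODULE,
    .read  = uf_read,
    .write = uf_write,
};

static struct miscdevice uf_dev = {
    .minor = MISC_DYNAMIC_MINOR,
    .name  = "uf_obj",             /* appears as /dev/uf_obj */
    .fops  = &uf_fops,
};

module_misc_device(uf_dev);
MODULE_LICENSE("GPL");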
I would recommend reading quite a bit of material before diving into kernel code, though. A good resource on device drivers is Linux Device Drivers. Even though a significant portion of its information may be out of date, you may find here a version of the source code used in the book, ported to Linux 3.x. You may find what you are looking for under the scull directory.
Again, as pointed out by commentators on your question, I do not think you should jump right into modifying kernel space. For educational purposes, however, scull may serve as a good starting point for reading kernel code and seeing how to achieve results similar to what you described.
How do I use readlink to fetch the values?
The answer is:
Don't do it
At least not in the way you're proposing.
You specified a solution here without specifying what you really want to do [and why]. That is, what are your needs/requirements? Assuming you get the filename, what do you want to do with it? You posted a bare fragment of your userspace application but didn't post any of your kernel code.
As a long time kernel programmer, I can tell you that this won't work, can't work, and is a terrible hack. There is a vast difference in methods to use inside the kernel vs. userspace.
/proc is strictly for userspace applications to snoop on kernel data. The /proc filesystem drivers assume userspace, so they always do copy_to_user. Data will be written to user address space, and not kernel address space, so this will never work from within the kernel.
Even if you could use /proc from within the kernel, it is a genuinely awful way to do it.
You can get the equivalent data, but it's a bit more complicated than that. If you're intercepting the read syscall inside the kernel, you [already] have access to the current task struct and to the fd number used in the call. From these, you can locate the struct for the given open file and get whatever you want directly, without involving /proc at all. Use this as a starting point.
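For instance, here is a rough sketch of recovering the path of an fd in the current process without touching /proc (error handling and the locking concerns discussed below are mostly elided; verify everything against your kernel version):

#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/printk.h>

static int path_of_fd(unsigned int fd, char *buf, int buflen)
{
    struct file *filp;
    char *name;
    int ret = 0;

    filp = fget(fd);             /* takes a reference on the struct file */
    if (!filp)
        return -EBADF;

    /* d_path() writes into the end of buf and returns a pointer inside it */
    name = d_path(&filp->f_path, buf, buflen);
    if (IS_ERR(name))
        ret = PTR_ERR(name);
    else
        pr_info("fd %u -> %s\n", fd, name);

    fput(filp);                  /* drop the reference */
    return ret;
}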
Note that doing this will necessitate reading kernel documentation, the sources for filesystem drivers, syscalls, etc.: how to lock data structures and lists with the various locking methods (e.g. RCU, rwlocks, spinlocks); per-CPU variables; kernel thread preemption; and how to properly traverse the necessary filesystem-related lists and structs to get the information you want. All this without causing lockups, panics, segfaults, deadlocks, or UB based on stale or inconsistent/dynamically changing data.
You'll need to study all this to become familiar with the way the kernel does things internally, and understand it, before you try doing something like this. If you had, you would have read the source code for the /proc drivers and already known why things were failing.
As a suggestion, forget anything that you've learned about how a userspace application does things. It won't apply here. Internally, the kernel is organized in a completely different way than what you've been used to.
You have no need to use readlink inside the kernel in this instance. That's the way a userspace application would have to do it, but in the kernel it's like driving 100 miles out of your way to get data you already have nearby, and, as I mentioned previously, won't even work.
I want to provide a userspace function that obtains TCP connection stats by implementing a kernel extension. From examining the TCP source, I see that the tcpcb struct holds such stats. How can I, given a socket handle from user space, obtain the associated tcpcb struct via a kernel extension and return the stats to user space?
Direct answer to the question: I believe you can't get at this information from a kext without using some private headers to get the memory layout of the structs involved. This will break if/when Apple changes the layout of those structs.
However, it looks like you don't really care about the kext aspect and are happy to get the information from userspace, so have you investigated the TCPCTL_PCBLIST sysctl? This gives you the control blocks for the TCP connections in the system, and the xtcpcb64 struct does contain the fields you're after. This mechanism might not be granular enough for your purposes, though.
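A rough userland sketch of walking that list (the record layout, with each record starting with a 32-bit length field, follows struct xinpgen/struct xtcpcb64 from netinet/tcp_var.h; on newer SDKs some of those definitions sit behind private headers, so treat the field access as something to verify):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sysctl.h>

int main(void)
{
    size_t len = 0;
    char *buf, *p;
    uint32_t reclen;

    /* first call sizes the buffer, second call fills it */
    if (sysctlbyname("net.inet.tcp.pcblist64", NULL, &len, NULL, 0) < 0)
        return 1;
    buf = malloc(len);
    if (!buf || sysctlbyname("net.inet.tcp.pcblist64", buf, &len, NULL, 0) < 0)
        return 1;

    for (p = buf; p + sizeof reclen <= buf + len; p += reclen) {
        memcpy(&reclen, p, sizeof reclen);  /* every record starts with its length */
        if (reclen < sizeof reclen)
            break;
        /* the first and last records are struct xinpgen markers; the ones
         * in between are struct xtcpcb64, so cast p and read the stats */
    }

    free(buf);
    return 0;
}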