Pci express - communicate kernel -> graphic card

Pci express - communicate kernel -> graphic card - c

The final goal is to be able to write to a PCIE device from the kernel, without the already made functions, to understand the inner working (and then, obviously, use them again).
I saw the PCIE specs which are 800+ pages (the 3.0 especially).
Huge is not enough to describe that.
I cannot afford to go along those at the step I currently am (reading 2.0 and 3.1 would be very time consuming).
I read many sources and it seems that we can write to pcie via messages (and not buses anymore like in pci).
Pcie should be memory mapped so I think that we could write to that memory from the kernel and aknowledge the driver that we did it.
The driver will then make the out(l/b) assembly instruction to notify the device in question.
This my very high level understanding of pcie (I don't want to dive into the spec details now). It may not be correct though.
If someone could tell me where I am wrong in my thinking, that would be very helpful.
Here is the pseudo code of my thinking (no error checking and such):
static int64_t my_driver_address;
pcie_write_device(uint32_t * my_data_physical_address) {
// we pass the physical address where the data is. It has to be contiguous.
pcie_send_address(&my_driver_address, my_data_physical_address);
// now the device is acknowledged that some data has been mmaped, knows where and will treat it as such
}
pcie_read_device(anytype_t ** buff){
// this function calls the inq assembly instruction and store the resulting address in the pointer
ptr * address_to_read = pcie_get_data();
// read the mmaped memory region. No mem allocation code.
*buf = get_data_from region(address_to_read);
// now data from device is in the buff, ready to be sent to the OS or anything..
}

Related

ARM TrustZone: Accessing a non-secure buffer from a secure monitor runtime service

My setup consists of a STM32MP157C-DK2 which uses Trusted Firmware-A to load SP-MIN as BL32 and uBoot+Linux as BL33.
I am trying to get a small example working where I create an SMC from the Linux Kernel which passes a reference to non-secure memory. The data at that location should be altered by the runtime service handling the SMC.
The problem I'm facing is that I can't find any information on what steps are required in order to translate the virtual address from the Linux Kernel at NS:EL1 to the translation regime of EL3.
The code of my runtime service looks like this:
static int32_t my_svc_setup(void)
{
return 0;
}
static uintptr_t my_svc_smc_handler(uint32_t smc_fid,
u_register_t x1,
u_register_t x2,
u_register_t x3,
u_register_t x4,
void *cookie,
void *handle,
u_register_t flags)
{
uint16_t smc_function_number = (uint16_t) smc_fid;
uint32_t *data;
switch(smc_function_number){
case 123:
data = (uint32_t *) x1;
// Address Translation Magic ...
*data = 42;
SMC_RET1(handle, 1);
default:
SMC_RET1(handle, SMC_UNK);
}
}
DECLARE_RT_SVC(
my_svc,
OEN_OEM_START,
OEN_OEM_END,
SMC_TYPE_FAST,
my_svc_setup,
my_svc_smc_handler
);
The SMC reaches the handler without issues, but as soon as I try to dereference the physical address I passed through x1 the CPU (obviously) crashes. If anyone could help me fill in the remaining required steps in order to get a valid reference, that would be greatly appreciated.

The problem I'm facing is that I can't find any information on what steps are required in order to translate the virtual address from the Linux Kernel at NS:EL1 to the translation regime of EL3.
The TrustZone protection is based on a physical address. For either NS:EL1 or EL3, you can map using an MMU in various ways, but both must map to the same physical address. For Linux kernel, you need to add a mapping of the shared memory that is backed by the physical address. You can use virt_to_phys() with such a mapping to find the physical address.
You need to have the same mapping available in the EL3. The simplest is to have a flat virt==phys mapping with sections and super-sections.
Another portion is that you MUST setup the TZASC to have permissions of the physical portion as world shareable. An example of code manipulating TZASC. This depends on your hardware, often this information is only given under NDA with chip manufacturer.
The other caveat is that you SHOULD map the memory as non-cacheable or you rely on flushes, which is error prone and could be a security issue, if the system has a VIVT cache. Some ARM CPUs have a VIPT cache and it maybe possible to use cached memory on those systems.
I would also recommend you do not pass addresses via the SMC API. You know the fixed world shareable buffer size. So, it is better to pass an index that is 0..extent-1 and immediately give an error if the address is outside the range. In this way only your initial Linux code needs to create the mapping and then you can use the virtual address given and only pass the index. Naively this seems more secure. Most attacks against TrustZone will be on the API itself.
Related: DMA and TrustZone, Accessing TZASC

Handling PCI read/write to configuration space in a QEMU device

I'm working on implementing a simple PCI device in QEMU and a kernel driver for it, and I have some trouble with handling pci_read/write_config_* function calls from the device side.
Unlike simple rw operations on a memory mapped bar, where the MemoryRegionOps callbacks receive the exact offset used by the driver, the config_read/write callbacks implemented as members in PCIDevice struct, receive an address that went through some manipulations/mapping that I have a hard time understanding.
Following the code path up to pci_config_host_read/write in QEMU sources, and the same in the kernel side for pci_read/write_config_* functions, didn't provide any clear answers.
Can anyone help me understand how to extract the config offset used by the driver when calling the pci config rw functions?

If you set your PCI device model up to implement the QEMU PCIDevice config_read and config_write methods, the addresses passed to them should be the offsets into the PCI config space (ie starting with the standard 0 == PCI_VENDOR_ID, 2 == PCI_DEVICE_ID, 4 == PCI_COMMAND and so on, and any device-specific stuff after the 64 bytes of standardized config space).

are ALSA hw_params buffer sizes the physical card memory size?

I am trying to come up to speed on the ALSA API and have some questions regarding this extensive API. I am using the "hw" interface not the "plughw" interface.
Question #1: Does the snd_hw_params_any() routine retrieve the default/current parameters?
Some documentation/example code indicate this routine fills in the allocated snd_pcm_hw_params_t struct with some values. Are these the current configured settings for the card:device?
The reason I am confused is because, if that routine retrieves the hw_params values, then I should be able to use any of the snd_hw_params_get routines to actually get those values. This works for the snd_hw_params_get_channels() routine. But a call to either snd_pcm_hw_params_get_format() or snd_pcm_hw_params_get_buffer_size() will fail. If I first set the format or buffer size using the set routines, I can then call the get routines to retrieve the values without error. Another way to express this question would be: why does snd_pcm_hw_params_get_format() and snd_pcm_hw_parmas_get_buffer_size() fail when I use this sequence of calls:
snd_pcm_hw_params_alloca();
snd_pcm_hw_params_any();
snd_pcm_hw_params_get_channels();
snd_pcm_hw_params_get_format();
snd_pcm_hw_params_get_buffer_size();
Question #2: How do I determine the actual size of the physical memory on the sound card?
I have noticed that no matter what size I use when calling snd_pcm_hw_params_set_buffer_size/min/max() and then call the snd_pcm_hw_params_get_buffer_size() that I get the same size. Again, I am using the "hw" interface and not the "plughw" interface so it seems reasonable that I cannot set the buffer size because it is actually the amount of physical memory on the card.
Since the snd_pcm_hw_params_get_buffer_size() retrieves the frame size, then the actually size of available physical capture memory on the sound card would that frame size times the number of channels times the sample word length. For example for:
int nNumberOfChannels = 2;
snd_pcm_format_t tFormat = SND_PCM_FORMAT_S16; // 2 bytes per sample
snd_pcm_uframes_t = 16384;
then the actual physical memory on the card would be: 2 * 2 * 16384 = 65536 bytes of physical memory available on the card.
Is this correct or am I confused here?
Thanks,
-Andres

The buffer size returned by snd_pcm_hw_params_get_buffer_size() is not (and never was) the size of the memory resident on the 'soundcard'. In the case of local audio interfaces (e.g. not serial bus-attached devices such as USB or Firewire), this is the size of a buffer in the system's main memory to or from which DMA of audio samples periodically takes place.
For USB and Firewire audio, the buffer this refers to is an entirely software concept - although it might be DMAd from in the case of Firewire. Don't expect to be able to set the buffer size below the minimum isochronous transfer unit size.
I don't think you get any guarantees from ALSA that you can actually change these parameters - it rather depends on the constraints of the hardware in the DMA controller servicing the audio interface - and on its device driver. Arrangements where the buffer size is a power of two is not unusual because it's far easier to implement in hardware.
You should take great care to check the return values from calls to snd_pcm_hw_* API calls and not assume that you got what you requested.

Kernel level memory handling coding

My requirement is to store data in kernel..Data are incoming packets from networks..which may be different in size and have to store for example 250ms duration..and there should be 5 such candidate for which kernel level memory management is required..since packets are coming very fast..my approach is to allocate a large memory say 2mb memory for each such candidate..bez kmalloc and kfree have timing overhead..any help regarding that?

sk_buffs are a generic answer that is network related or as Mike points out a kernel memory cache is even more generic answer to your question. However, I believe you may have put a solution before the question.
The bottle neck with LTE/HSDPA/GSM is the driver and how you get data from the device to the CPU. This depends on how hardware is connected. Are you using SPI, UART, SDHC, USB, PCI?
Also, at least with HSDPA, you need a ppp connection. Isn't LTE the same? Ethernet is not the model to use in this case. Typically you need to emulate a high speed tty. Also, n_gsm supplies a network interface; I am not entirely familiar with this interface, but I suspect that this is to support LTE. This is not well documented. Also, there is the Option USB serial driver, if this is the hardware you are using. An example patch using n_gsm to handle LTE; I believe this patch was reworked into the current n_gsm network support.
You need to tell us more about your hardware.

As already noted within the comments:
struct sk_buff, and it is created for that exact specific purpose
see for example http://www.linuxfoundation.org/collaborate/workgroups/networking/skbuff

How do I obtain PCI Region size in Windows?

I needed to scan my PCI bus and obtain information for specific devices from specific vendors.
My goal is to find the PCI Region size for the AMD Graphics card, in order to map the PCI memory of that card to userspace in order to do i2c transfers and view information from various sensors.
For scanning the PCI bus I downloaded and compiled pciutils 3.1.7 for Windows x64 around a year ago. It supposedly uses DirectIO.
This is my code.
int scan_pci_bus()
{
struct pci_access *pci;
struct pci_dev *dev;
int i;
pci = pci_alloc();
pci_init(pci);
pci_scan_bus(pci);
for(dev = pci->devices; dev; dev = dev->next)
{
pci_fill_info(dev, PCI_FILL_IDENT | PCI_FILL_CLASS | PCI_FILL_IRQ | PCI_FILL_BASES | PCI_FILL_ROM_BASE | PCI_FILL_SIZES | PCI_FILL_PHYS_SLOT);
if(dev->vendor_id == 0x1002 && dev->device_id == 0x6899)
{
//Vendor is AMD, Device ID is a AMD HD5850 GPU
for(i = 0; i < 6; i++)
{
printf("Region Size %d %x ID %x\n", dev->size[i], dev->base_addr[i], dev->device_id);
}
}
}
pci_cleanup(pci);
return 0;
}
As you see in my printf line, I try to print some data, I am successfully printing device_id and base_addr however size which should contain the PCI region size for this device is always 0. I expected, at least one of the cycles from the loop to display a size > 0.
My code is based on a Linux application which uses the same code, though it uses the pci.h headers that come with Linux(pciutils apparenltly has the same APIs).
Apparently, Windows(that is Windows 7 x64 in my case) does not show this information or the at the very least is not exposed to PCIUtils.
How do you propose I obtain this information? If there are alternatives to pciutils for Windows and provide this information, I'd be glad to obtain a link to them.
EDIT:I have still found no solution. If there are any solutions to my problem and also work for 32-bit Windows, It would be deeply appreciated.

How this works is pretty complicated. PCI devices use Base Address Registers to let the BIOS and Operating System decide where to locate their memory regions. Each PCI device is allowed to specify several memory or IO regions it wants, and lets the BIOS/OS decide where to put it. Complicating matters, there's only one register that is used both to specify the size AND the address. How does this work?
When the card is first powered up, it's 32-bit address register will have something like 0xFFFF0000 in it. Any binary 1 means "the OS can change this", any binary 0 means "must stay zero". So this is telling the OS that any of the top 16 bits can be set to whatever the OS wants, but the bottom 16 bits have to stay zero. Which also means that this memory region takes up 16 bits of address space, or 64k. Because of this, memory regions have to be aligned to their size. If a card wants 64K of address space, the OS can only put it on memory addresses that are a multiple of 64K. When the OS has decided where it wants to locate this card's 64K memory space, it writes it back into this register, overwriting the initial 0xFFFF0000 that was in there.
In other words, the card tells the OS what size/alignment it needs for the memory, then the OS overwrites that same register/variable with the address for the memory. Once it's done this, you can't get the size back out of the register without resetting the address.
This means there's no portable way of asking a card how big its region is, all you can ask it is WHERE the region is.
So why does this work in Linux? Because it's asking the kernel for this information. The kernel has an API to provide this stuff, the same way that lspci works. I am not a Windows expert, but I'm unaware of any way for an application to ask the Windows kernel this information. There may be an API to do this somehow, or you may need to write something that runs on the kernel side to pass this information back to you. If you look in the libpci source, for windows it calls the "generic" version of pci_fill_info(), which returns:
return flags & ~PCI_FILL_SIZES;
which basically means "I'm returning everything you asked for, but the sizes."
BUT, this may not matter anyway. If all you're doing is wanting to read/write to the I2C registers, they're usually (always?) in the first 4K of the control/configuration region. You can probably just map 4K (one page) and ignore the fact that there might be more. Also be warned that you may need to take additional steps to stop the real driver for this card from reading/writing while you are. If you're bit-banging the I2C bus manually, and the driver tries to at the same time, it's likely to cause a mess on the bus.
There also may be an existing way to ask the radeon driver to do I2C requests for you, which might avoid all of this.
(also note I'm simplifying and glossing over a lot of details with how the BARs work, including 64 bit addresses, I/O space, etc, read PCI documentation if you want to learn more)

Well whamma gave a very good answer [but] there's one thing he was wrong about, which is region sizes. Region sizes are pretty easy to find, here i will show two ways, the first by deciphering it from the address of the bar, the second through Windows user interface.
Let's assume that E2000000 is the address of the Base Register. If we convert that to binary we get:
11100010000000000000000000000000
Now there are 32 bits here in total, you can count them if you must. Now if you are not familiar with how the bits in
a BAR are layed out, look here -> http://wiki.osdev.org/PCI , specifically "Base Address Registers" and more specifically
the image that reads "Memory Space BAR Layout". Now lets start reading the bits from the right end to the left end and use
the image in the link i pointed to you above as a guide.
So the first bit(Bit 0) starting from the right is 0, indicating that this is a memory address BAR.
Bits(1-2) are 0, indicating that it's a 32-bit(note this is not the size) memory BAR.
Bit 3 is 0, indicating that it's not Prefetchable memory.
Bits 4-31 represent the address.
The page documents the PCI approved process:
To determine the amount of address space needed by a PCI device, you
must save the original value of the BAR, write a value of all 1's to
the register, then read it back. The amount of memory can then be
determined by masking the information bits, performing a bitwise NOT
('~' in C), and incrementing the value by 1. The original value of the
BAR should then be restored. The BAR register is naturally aligned and
as such you can only modify the bits that are set.
The other way is using Device Manager:
Start->"Device Manager"->Display Adapters->Right Click your video card->Properties->Resources. Each resource type marked
"Memory Range" should be a memory BAR and as you can see it says [start address] to [end address]. For example lets say
it read [00000000E2000000 - 00000000E2FFFFFF], to get the size you would take [start address] from [end address]:
00000000E2FFFFFF - 00000000E2000000 = FFFFFF, FFFFFF in decimal = 16777215 = 16777215 bytes = 16MB.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight