How to write directly to FPGA peripherals from SoC? - arm

I'm working on an Altera Cyclone V SoC and attempting to write directly to FPGA peripherals from the SoC side. However, the hwlib library only provides the function alt_write_word, which, as I understand it, writes to the cache first before the data reaches main memory. On NIOS II, the built-in IOWR function is already configured so that it writes directly to the FPGA peripherals. So my question is: when working with the SoC, if the hwlib library doesn't provide such a function, how can I write directly to the FPGA peripherals? Do I need to configure the memory type, or something else?

If your application is bare-metal and thus you are not using the MMU (i.e. no virtual addresses), you can reach the peripherals directly through a volatile-qualified pointer. Here is a small example:
volatile uint32_t* registerAddress = (volatile uint32_t*) PERIPHERAL_BASE_ADDR;
const uint32_t registerValue = *registerAddress;
It is also possible to write to them, if the hardware permits it.
*registerAddress = 0xDEADBEEF;
The peripheral addresses must be excluded from the cacheable range. If you are using an SoC with a dedicated peripheral bus architecture, you don't need to worry about it. Otherwise, you may need to adjust the cacheable range when designing the system.

Related

ARM TrustZone: Accessing a non-secure buffer from a secure monitor runtime service

My setup consists of an STM32MP157C-DK2 which uses Trusted Firmware-A to load SP-MIN as BL32 and U-Boot+Linux as BL33.
I am trying to get a small example working where I create an SMC from the Linux Kernel which passes a reference to non-secure memory. The data at that location should be altered by the runtime service handling the SMC.
The problem I'm facing is that I can't find any information on what steps are required in order to translate the virtual address from the Linux Kernel at NS:EL1 to the translation regime of EL3.
The code of my runtime service looks like this:
static int32_t my_svc_setup(void)
{
    return 0;
}

static uintptr_t my_svc_smc_handler(uint32_t smc_fid,
                                    u_register_t x1,
                                    u_register_t x2,
                                    u_register_t x3,
                                    u_register_t x4,
                                    void *cookie,
                                    void *handle,
                                    u_register_t flags)
{
    uint16_t smc_function_number = (uint16_t) smc_fid;
    uint32_t *data;

    switch (smc_function_number) {
    case 123:
        data = (uint32_t *) x1;
        // Address Translation Magic ...
        *data = 42;
        SMC_RET1(handle, 1);
    default:
        SMC_RET1(handle, SMC_UNK);
    }
}

DECLARE_RT_SVC(
    my_svc,
    OEN_OEM_START,
    OEN_OEM_END,
    SMC_TYPE_FAST,
    my_svc_setup,
    my_svc_smc_handler
);
The SMC reaches the handler without issues, but as soon as I try to dereference the physical address I passed through x1 the CPU (obviously) crashes. If anyone could help me fill in the remaining required steps in order to get a valid reference, that would be greatly appreciated.
The problem I'm facing is that I can't find any information on what steps are required in order to translate the virtual address from the Linux Kernel at NS:EL1 to the translation regime of EL3.
The TrustZone protection is based on physical addresses. For either NS:EL1 or EL3, you can map with the MMU in various ways, but both must map to the same physical address. In the Linux kernel, you need to add a mapping of the shared memory that is backed by that physical address. You can then use virt_to_phys() on such a mapping to find the physical address.
You need to have the same memory available in EL3. The simplest approach is a flat virt==phys mapping using sections and super-sections.
Another requirement is that you MUST set up the TZASC so that the shared physical region is marked world-shareable (see the "Accessing TZASC" link below for example code manipulating a TZASC). This depends on your hardware; often this information is only given under NDA by the chip manufacturer.
The other caveat is that you SHOULD map the memory as non-cacheable, or else you rely on cache flushes, which is error-prone and could be a security issue if the system has a VIVT cache. Some ARM CPUs have a VIPT cache, and it may be possible to use cached memory on those systems.
I would also recommend that you do not pass raw addresses via the SMC API. You know the fixed world-shareable buffer size, so it is better to pass an index in the range 0..extent-1 and immediately return an error if it is outside the range. That way only your initial Linux code needs to create the mapping; afterwards you can use the virtual address it produced and pass only the index. Naively this seems more secure: most attacks against TrustZone are on the API itself.
Related: DMA and TrustZone, Accessing TZASC

Handling PCI read/write to configuration space in a QEMU device

I'm working on implementing a simple PCI device in QEMU and a kernel driver for it, and I have some trouble with handling pci_read/write_config_* function calls from the device side.
Unlike simple read/write operations on a memory-mapped BAR, where the MemoryRegionOps callbacks receive the exact offset used by the driver, the config_read/write callbacks, implemented as members of the PCIDevice struct, receive an address that has gone through some manipulation/mapping that I have a hard time understanding.
Following the code path up to pci_config_host_read/write in the QEMU sources, and likewise the kernel side of the pci_read/write_config_* functions, didn't provide any clear answers.
Can anyone help me understand how to extract the config offset used by the driver when calling the pci config rw functions?
If you set your PCI device model up to implement the QEMU PCIDevice config_read and config_write methods, the addresses passed to them should be the offsets into the PCI config space (i.e. starting with the standard 0 == PCI_VENDOR_ID, 2 == PCI_DEVICE_ID, 4 == PCI_COMMAND and so on, with any device-specific registers after the 64 bytes of standardized config space).

Where does the DMA store ADC values in STM32?

I enabled DMA peripheral-to-memory transfer for ADC1 in CubeMX and generated the code. However, I'm confused as to where the data from the ADC will be written. Should I explicitly define a variable to contain this data? How can I retrieve the data in the DMA Channel 1 ISR?
The DMA does not manage memory, nor does it choose a valid address to store the data. Generally speaking, the DMA transfers data without using the CPU, but no more than that.
The STM32 microcontrollers provide transfers from:
memory to memory
memory to peripheral
peripheral to memory
In all of them, the developer has to be aware of the transfer's purpose in order to configure (besides the DMA itself) the source and destination, such as the peripheral addresses, the reserved memory (and what kind of memory), etc.
In your particular case (check the reference manual, application notes, docs, etc.), the main actors in an ADC-to-memory (peripheral-to-memory) transfer are:
Source: the ADC peripheral. The developer has to know where the ADC peripheral is located and configure the DMA (besides the ADC itself) with the ADC as the source of the transfer.
Destination: memory. The developer has to reserve a block of memory (heap/stack/global/etc.) and configure the DMA with that allocated region as the destination. Having done that, the DMA can deliver the values in different ways (depending on the device), such as a continuous ring buffer, a single cycle, or a ping-pong buffer (STM32 uses the term "circular double buffer").
DMA and ADC configuration: there is a vast number of factors which, for the sake of simplicity, I am not going to cover; they are usually hidden behind the manufacturer's HAL (it is up to you whether to use it).
You instruct the HAL DMA ADC driver where to put the sample data when you start the conversion:
volatile uint32_t adcBuffer[SAMPLE_COUNT];
HAL_ADC_Start_DMA( &hadc,
                   (uint32_t*)adcBuffer,
                   SAMPLE_COUNT );
Note that some STM32 parts have SRAM divided across multiple buses with one section very much smaller than others. There are performance benefits to be had in reserving this section for DMA buffers since it reduces bus contention with normal software data fetches. So you may want to customise your linker script to create sections and explicitly place DMA buffers in one while excluding placement of application data there.
If you have a look at the HAL documentation and examples, you will find an example of how to use the ADC with DMA.
In short:
To start the conversion you use the function:
HAL_StatusTypeDef HAL_ADC_Start_DMA(ADC_HandleTypeDef* hadc, uint32_t* pData, uint32_t Length);
Where pData is your variable / array where the DMA should put the data.
The DMA and the microcontroller do not know anything about your variables. The DMA peripheral has two configuration registers in which you store the peripheral address and the memory address. If you start by reading the microcontroller's documentation instead of the HAL, everything becomes clear very quickly.

Pci express - communicate kernel -> graphic card

The final goal is to be able to write to a PCIe device from the kernel, without the ready-made functions, to understand the inner workings (and then, obviously, go back to using them).
I have seen the PCIe specs, which are 800+ pages (the 3.0 revision especially).
"Huge" is not enough to describe that.
I cannot afford to go through those at the stage I am currently at (reading 2.0 and 3.1 would be very time-consuming).
I have read many sources, and it seems that PCIe communicates via messages (and no longer over shared buses as in PCI).
PCIe should be memory-mapped, so I think we could write to that memory from the kernel and notify the driver that we did it.
The driver would then issue the out(l/b) assembly instruction to notify the device in question.
This is my very high-level understanding of PCIe (I don't want to dive into the spec details right now). It may not be correct, though.
If someone could tell me where I am wrong in my thinking, that would be very helpful.
Here is the pseudo-code of my thinking (no error checking and such):
static int64_t my_driver_address;

pcie_write_device(uint32_t * my_data_physical_address) {
    // we pass the physical address where the data is. It has to be contiguous.
    pcie_send_address(&my_driver_address, my_data_physical_address);
    // now the device is acknowledged that some data has been mmaped, knows where, and will treat it as such
}

pcie_read_device(anytype_t ** buff) {
    // this function calls the inq assembly instruction and stores the resulting address in the pointer
    ptr * address_to_read = pcie_get_data();
    // read the mmaped memory region. No mem allocation code.
    *buff = get_data_from_region(address_to_read);
    // now the data from the device is in buff, ready to be sent to the OS or anything else
}

Register mapping in a ARM based SoC

I want to understand how the registers of various peripherals/IPs are mapped to the ARM processor memory map in a microcontroller.
Say I have a CONTROL register for a UART block. When I do a write access to address 0x40005008, this register gets configured. Where does this mapping happen: within the peripheral block code itself, or while integrating the peripheral into the SoC/microcontroller?
For a simple peripheral like a UART it's straightforward - taking the ARM PL011 UART as an example (since I know where its documentation lives):
The programmer's model defines a bunch of registers at word-aligned offsets in a 4k block.
In terms of the actual hardware, the bus interface matches what the programmer's model suggests: PADDR[11:2] means only address bits 11:2 are connected, so it can only understand word-aligned addresses from 0x000 to 0xffc (similarly, note that only 16 bits of read/write data are connected, since no register is wider than that).
The memory-mapping between the UART's 12-bit address and the full 32-bit address that the CPU core spits out happens in the interconnect hardware between them. At design time, the interconnect address map will be configured to say "this 4k region at 0x40005000 is assigned to the UART0 block", etc., and the resulting bus circuitry will be generated for that.
More complex things like e.g. DMA-capable devices typically have separate interfaces for configuration and data access, so the registers can be mapped in a small relocatable block on a low-speed peripheral bus much like the UART.
The most significant bits are defined by your ASIC design; the least significant bits are defined by the IP design. Your IP has several registers; the number of registers and their order are defined by the IP design. Here, your register is at offset 8. Then, when designing the ASIC, the peripherals are connected to the memory bus, and the way they are connected defines their base address. Your UART is at 0x40005000. You may have another instance of the same IP at, for instance, 0x40006000. The two UARTs would be strictly identical, and you would be able to access the CONTROL register of the second UART at address 0x40006008.
