HostOnly and GuestOnly PMC bits supported on AMD Family 0x10h CPUs? - c

My company develops a hypervisor, and this question concerns use of AMD's SVM (Secure-Virtual-Machine) API.
I'd like to track exactly how many instructions have executed in my guest operating system in a given period. AMD has kindly provided so-called "HO" and "GO" or "HostOnly" and "GuestOnly" bits in the PerfEvtSel MSRs (0xc0010000..3) in their implementation of their 0x10h family CPUs (Phenom x2, etc.). The BKDG for Family 0x10h indicates that these bits are 40 and 41 of the 64bit PerfEvtSel register. However, the BKDG for Family 0x11h does not state the existence of HostOnly and GuestOnly bits!
The code I have looks like this:
reg_svm_pes_set_unit_mask(&pes, 0x00);
reg_svm_pes_set_usr(&pes, 1); // Count user mode cycles
reg_svm_pes_set_os(&pes, 1); // Count system cycles
reg_svm_pes_set_e(&pes, 0); // Level, not edge
reg_svm_pes_set_pc(&pes, 0);
reg_svm_pes_set_int(&pes, 1); // Trigger interrupt on overflow
reg_svm_pes_set_en(&pes, enabled);
reg_svm_pes_set_inv(&pes, 0); // No invert sense
reg_svm_pes_set_go(&pes, 1); // Count in the guest
reg_svm_pes_set_ho(&pes, 0); // And not in the host...
You have to take my word for it that each of these is a correctly-written inline function that sets the appropriate bit in the PMC register, and that the given code successfully writes and can read back bits 40 and 41 of the MSR. I have verified this.
What I experience is that the counter counts both in the guest and the host. This makes it very difficult to get an exact accounting of only what has happened in the guest.
My questions are:
Do the HostOnly and GuestOnly bits work on Family 0x10h CPUs?
Is there some other machine state that I need to configure in order for this to function?
Has anyone ever seen this feature of this CPU work?
Does anyone know why the BKDG for Family 0x11h CPUs doesn't list this feature as being present. That is, the bits in question are reserved on that family.
Is there any other known method of making SVM implementations turn off PMCs while in the host?

Related

What is the link between Baud rate for embedded C pointers

I am reading manuals regarding controlling a device with C,and in general its just playing with addresses; however when we are connected through UART we have the BAUDRATE present.
So how putting a value into some address have to do with baud-rate?
Is it necessary in embedded programming?
Those addresses are not memory. They are memory-mapped I/O registers.
The address for your UART's baud rate divisor register is a hardware register. The value in hardware registers directly control the hardware. The value written to baud rate divisor register is typically a counter reload value, and one bit period is the time it takes to count up-to (or down from) the value in the divisor given a specific peripheral clock source. So for exaample if the UART peripheral clock were say 12MHz, and you wanted a baud rate of 19200, you would set the divisor register to 12x106/19200 = 625.
Although you can read and write hardware registers as if they were memory, they do not necessarily behave like memory. Some registers may be read-only, others write only, and some writing may have a different effect than reading, such that if you write a value, the value read back will not be what was written. This often works at the bit level, so that each bit in a register may exhibit different behaviour.
For example on many UART implementations the register to which your write data to be sent, is the same address you read for received data - however they are not the same register, but rather a read-only register and a write-only register mapped to the same address.
It is not specifically an embedded programming thing, but rather an I/O hardware thing; it is simply that outside of embedded systems you are not typically writing directly to the hardware unless you happen to be writing a kernel device driver, where you will encounter the same thing.
As well as the device manuals which necessarily assume existing knowledge and expertise, perhaps you should consult a more general reference. Now you know the key term: "memory-mapped I/O" or MMIO, you are in a better position to Google it. Examples:
http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch14s03.html
https://en.wikipedia.org/wiki/Memory-mapped_I/O

Programming GPIO pins on a FINTEK F81866A chipset

I have a Cincoze DE-1000 industrial PC, that features a Fintek F81866A chipset. I have to manage the DIO pins to read the input from a phisical button and to set on/off a LED. I have experience in C++ programming, but not at low/hardware level.
On the documentation accompanying the PC, there is the following C code:
#define AddrPort 0x4E
#define DataPort 0x4F
//<Enter the Extended Function Mode>
WriteByte(AddrPort, 0x87)
WriteByte(AddrPort, 0x87) //Must write twice to entering Extended mode
//<Select Logic Device>
WriteByte(AddrPort, 0x07)
WriteByte(DataPort, 0x06)
//Select logic device 06h
//<Input Mode Selection> //Set GP74 to GP77 input mode
WriteByte(AddrPort, 0x80) //Select configuration register 80h
WriteByte(DataPort, 0x0X)
//Set (bit 4~7) = 0 to select GP 74~77 as Input mode.
//<input Value>
WriteByte(AddrPort, 0x82) // Select configuration register 82h
ReadByte(DataPort, Value) // Read bit 4~7(0xFx)= GP74 ~77 as High.
//<Leave the Extended Function Mode>
WriteByte(AddrPort, 0xAA)
As far as I understood, the above code should read the value of the four input PINs (so it should read 1 for each PIN), but I am really struggling to understand how it actually works. I have understood the logic (selecting an address and reading/writing an hex value to it), but I cannot figure out what kind of C instructions WriteByte() and ReadByte() are. Also, I do not understand where Value in the line ReadByte(DataPort, Value) comes from. It should read the 4 PINs all together, so it should be some kind of "byte" type and it should contain 1 in its bits 4-7, but again I cannot really grasp the meaning of that line.
I have found an answer for a similar chip, but it did not help me in understanding.
Please advice me or point me to some relevant documentation.
That chip looks like a fairly typical Super I/O controller, which is basically the hub where all of the "slow" peripherals are combined into a single chipset.
Coreboot has a wiki page that talks about how to access the super I/O.
On the PC architecture, Port I/O is accomplished using special CPU instructions, namely in and out. These are privileged instructions, which can only be used from a kernel-mode driver (Ring 0), or a userspace process which has been given I/O privileges.
Luckily, this is easy in Linux. Check out the man page for outb and friends.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
So we could adapt your function into a Linux environment like this:
/* Untested: Use at your own risk! */
#include <sys/io.h>
#include <stdio.h>
#define ReadByte(port) inb(port)
#define WriteByte(port, val) outb(val, port)
int main(void)
{
if (iopl(3) < 0) {
fprintf(stderr, "Failed to get I/O privileges (are you root?)\n");
return 2;
}
/* Your code using ReadByte / WriteByte here */
}
Warning
You should be very careful when using this method to talk directly to the Super IO, because your operating system almost certainly has device drivers that are also talking to the chip.
The right way to accomplish this is to write a device driver that properly coordinates with other kernel code to avoid concurrent access to the device.
The Linux kernel provides GPIO access to at least some Super I/O devices; it should be straightforward to port one of these to your platform. See this pull request for the IT87xx chipset.
WriteByte() and ReadByte() are not part of the C language. By the looks of things, they are functions intended to be placeholders for some form of system call for the OS kernel port IO (not macros doing memory mapped IO directly as per a previous version of this answer).
The prototypes for the functions would be something along the lines of:
#include <stdint.h>
void WriteByte(unsigned port, uint8_t value);
void ReadByte(unsigned port, uint8_t *value);
Thus the Value variable would be a pointer to an 8 bit unsigned integer (unsigned char could also be used), something like:
uint8_t realValue;
uint8_t *Value = &realValue;
Of course it would make much more sense to have Value just be a uint8_t and have ReadByte(DataPort, &Value). But then the example code also doesn't have any semicolons, so was probably never anything that actually ran. Either way, this is how Value would contain the data you are looking for.
I also found some more documentation of the registers here - https://www.electronicsdatasheets.com/download/534cf560e34e2406135f469d.pdf?format=pdf
Hope this helps.

How to Access GPU Registers on Linux

I'm attempting to access the GPU registers on an AMD HD7000 series card. I have sensors attached to the GPUs power input and consumption can be measured. I'd like to develop a Linux C++ application to access the performance counters of the GPU to assess power consumption based on GPU events/operations. I am familiar with C and C++, but I am not familiar with operating systems, and I'm not exactly sure where to get started for such a task.
I have a PDF. It is the Bios and Kernel Developers Guide to AMD 15h Family Models 30h-3Fh located here:
https://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf
In the linked document, on page 106, the performance counters are described. The counters must be set via a control register before they begin counting. On page 715, the counters are defined in more detail.
This document does not list the memory offset, so I checked the open source AMD Linux 4.14.12 drivers. Under /drivers/gpu/drm/amd/amdgpu/sid.h line 1256, I found a performance counter address of 0x2688:
#define CB_PERFCOUNTER0_SELECT0 0x2688
http://elixir.free-electrons.com/linux/v4.14.12/source/drivers/gpu/drm/amd/amdgpu/sid.h#L1256
Also, a start and stop counter integer beginning line 1457:
#define PERFCOUNTER_START (23 << 0)
#define PERFCOUNTER_STOP (24 << 0)
http://elixir.free-electrons.com/linux/v4.14.12/source/drivers/gpu/drm/amd/amdgpu/sid.h#L1457
I'm assuming the CB_PERFCOUNTER defines are relevant addresses and can be used as an offset to the GPU's register memory location. I have searched on the internet very much, and there are many resources, although it is difficult for me to assess whether each resource has any relevant information or how any step fits into the process as a whole. Any direct explanations or references to material would be greatly appreciated.

How do I obtain PCI Region size in Windows?

I needed to scan my PCI bus and obtain information for specific devices from specific vendors.
My goal is to find the PCI Region size for the AMD Graphics card, in order to map the PCI memory of that card to userspace in order to do i2c transfers and view information from various sensors.
For scanning the PCI bus I downloaded and compiled pciutils 3.1.7 for Windows x64 around a year ago. It supposedly uses DirectIO.
This is my code.
int scan_pci_bus()
{
struct pci_access *pci;
struct pci_dev *dev;
int i;
pci = pci_alloc();
pci_init(pci);
pci_scan_bus(pci);
for(dev = pci->devices; dev; dev = dev->next)
{
pci_fill_info(dev, PCI_FILL_IDENT | PCI_FILL_CLASS | PCI_FILL_IRQ | PCI_FILL_BASES | PCI_FILL_ROM_BASE | PCI_FILL_SIZES | PCI_FILL_PHYS_SLOT);
if(dev->vendor_id == 0x1002 && dev->device_id == 0x6899)
{
//Vendor is AMD, Device ID is a AMD HD5850 GPU
for(i = 0; i < 6; i++)
{
printf("Region Size %d %x ID %x\n", dev->size[i], dev->base_addr[i], dev->device_id);
}
}
}
pci_cleanup(pci);
return 0;
}
As you see in my printf line, I try to print some data, I am successfully printing device_id and base_addr however size which should contain the PCI region size for this device is always 0. I expected, at least one of the cycles from the loop to display a size > 0.
My code is based on a Linux application which uses the same code, though it uses the pci.h headers that come with Linux(pciutils apparenltly has the same APIs).
Apparently, Windows(that is Windows 7 x64 in my case) does not show this information or the at the very least is not exposed to PCIUtils.
How do you propose I obtain this information? If there are alternatives to pciutils for Windows and provide this information, I'd be glad to obtain a link to them.
EDIT:I have still found no solution. If there are any solutions to my problem and also work for 32-bit Windows, It would be deeply appreciated.
How this works is pretty complicated. PCI devices use Base Address Registers to let the BIOS and Operating System decide where to locate their memory regions. Each PCI device is allowed to specify several memory or IO regions it wants, and lets the BIOS/OS decide where to put it. Complicating matters, there's only one register that is used both to specify the size AND the address. How does this work?
When the card is first powered up, it's 32-bit address register will have something like 0xFFFF0000 in it. Any binary 1 means "the OS can change this", any binary 0 means "must stay zero". So this is telling the OS that any of the top 16 bits can be set to whatever the OS wants, but the bottom 16 bits have to stay zero. Which also means that this memory region takes up 16 bits of address space, or 64k. Because of this, memory regions have to be aligned to their size. If a card wants 64K of address space, the OS can only put it on memory addresses that are a multiple of 64K. When the OS has decided where it wants to locate this card's 64K memory space, it writes it back into this register, overwriting the initial 0xFFFF0000 that was in there.
In other words, the card tells the OS what size/alignment it needs for the memory, then the OS overwrites that same register/variable with the address for the memory. Once it's done this, you can't get the size back out of the register without resetting the address.
This means there's no portable way of asking a card how big its region is, all you can ask it is WHERE the region is.
So why does this work in Linux? Because it's asking the kernel for this information. The kernel has an API to provide this stuff, the same way that lspci works. I am not a Windows expert, but I'm unaware of any way for an application to ask the Windows kernel this information. There may be an API to do this somehow, or you may need to write something that runs on the kernel side to pass this information back to you. If you look in the libpci source, for windows it calls the "generic" version of pci_fill_info(), which returns:
return flags & ~PCI_FILL_SIZES;
which basically means "I'm returning everything you asked for, but the sizes."
BUT, this may not matter anyway. If all you're doing is wanting to read/write to the I2C registers, they're usually (always?) in the first 4K of the control/configuration region. You can probably just map 4K (one page) and ignore the fact that there might be more. Also be warned that you may need to take additional steps to stop the real driver for this card from reading/writing while you are. If you're bit-banging the I2C bus manually, and the driver tries to at the same time, it's likely to cause a mess on the bus.
There also may be an existing way to ask the radeon driver to do I2C requests for you, which might avoid all of this.
(also note I'm simplifying and glossing over a lot of details with how the BARs work, including 64 bit addresses, I/O space, etc, read PCI documentation if you want to learn more)
Well whamma gave a very good answer [but] there's one thing he was wrong about, which is region sizes. Region sizes are pretty easy to find, here i will show two ways, the first by deciphering it from the address of the bar, the second through Windows user interface.
Let's assume that E2000000 is the address of the Base Register. If we convert that to binary we get:
11100010000000000000000000000000
Now there are 32 bits here in total, you can count them if you must. Now if you are not familiar with how the bits in
a BAR are layed out, look here -> http://wiki.osdev.org/PCI , specifically "Base Address Registers" and more specifically
the image that reads "Memory Space BAR Layout". Now lets start reading the bits from the right end to the left end and use
the image in the link i pointed to you above as a guide.
So the first bit(Bit 0) starting from the right is 0, indicating that this is a memory address BAR.
Bits(1-2) are 0, indicating that it's a 32-bit(note this is not the size) memory BAR.
Bit 3 is 0, indicating that it's not Prefetchable memory.
Bits 4-31 represent the address.
The page documents the PCI approved process:
To determine the amount of address space needed by a PCI device, you
must save the original value of the BAR, write a value of all 1's to
the register, then read it back. The amount of memory can then be
determined by masking the information bits, performing a bitwise NOT
('~' in C), and incrementing the value by 1. The original value of the
BAR should then be restored. The BAR register is naturally aligned and
as such you can only modify the bits that are set.
The other way is using Device Manager:
Start->"Device Manager"->Display Adapters->Right Click your video card->Properties->Resources. Each resource type marked
"Memory Range" should be a memory BAR and as you can see it says [start address] to [end address]. For example lets say
it read [00000000E2000000 - 00000000E2FFFFFF], to get the size you would take [start address] from [end address]:
00000000E2FFFFFF - 00000000E2000000 = FFFFFF, FFFFFF in decimal = 16777215 = 16777215 bytes = 16MB.

Writing Flash on STM32

I am implementing a emulated EEPROM in flash memory on a STM32 microprocessor, mostly based on the Application Note by ST (AN2594 - EEPROM emulation in STM32F10x microcontrollers).
The basics outline there and in the respective Datasheet and Programming manual (PM0075) are quite clear. However, I am unsure regarding the implications of power-out/system reset on flash programming and page erasure operations. The AppNote considers this case, too but does not clarify what exactly happens when a programming (write) operations is interrupted:
Does the address have a arbitrary (random) value? OR
Are only part of the bits written? OR
Does it have the default erase value 0xFF?
Thanks for hints or pointers to the relevant documentation.
Arne
This is not really a software question (much less C++). It belongs on electronics.se, but there does not seem to be an option to migrate questions thereā€¦ only to sites such as superuser or webmasters.se.
The short answer is that hardware is inherently unreliable. Something can always in theory go wrong that interrupts the write process or causes the wrong bit to be written.
The long answer is that Flash circuits are usually designed for maximum reliability. A sudden power loss on write will probably not cause corruption because the driver circuit may have enough capacitance or the capability to operate under a low-voltage condition long enough to finish draining the charge as necessary. A power loss on erasure might be trickier. You really need to consult the manufacturer.
For a "soft" system reset with no power interruption, it would be pretty surprising if the hardware didn't always completely erase whatever bytes it was immediately working on. Usually the bytes are erased in a predefined order, so you can use the first or last ones to indicate whether a page is full or empty.
#include "stm32f10x.h"
#define FLASH_KEY1 ((uint32_t)0x45670123)
#define FLASH_KEY2 ((uint32_t)0xCDEF89AB)
#define Page_127 0x0801FC00
uint16_t i;
int main()
{
//FLASH_Unlock
FLASH->KEYR = FLASH_KEY1;
FLASH->KEYR = FLASH_KEY2;
//FLASH_Erase Page
while((FLASH->SR&FLASH_SR_BSY));
FLASH->CR |= FLASH_CR_PER; //Page Erase Set
FLASH->AR = Page_127; //Page Address
FLASH->CR |= FLASH_CR_STRT; //Start Page Erase
while((FLASH->SR&FLASH_SR_BSY));
FLASH->CR &= ~FLASH_CR_PER; //Page Erase Clear
//FLASH_Program HalfWord
FLASH->CR |= FLASH_CR_PG;
for(i=0; i<1024; i+=2)
{
while((FLASH->SR&FLASH_SR_BSY));
*(__IO uint16_t*)(Page_127 + i) = i;
}
FLASH->CR &= ~FLASH_CR_PG;
FLASH->CR |= FLASH_CR_LOCK;
while(1);
}
If you are using the EEProm Emulation driver, you shouldn't worry too much about the flash corruption issues as the EEProm emulation driver always keeps a shadow copy in another page. Worst come worst, you will lose the most recent values that are being written into the flash. If you look closely on the emulation driver, you will notice that it is nothing but essentially a wrapper to stm32fxx_flash.c in the standard peripheral library.
If you look at the application note, you will see the times that the emulation library take for the flash operations. Erasing a page typically takes the longest time (tens of milliseconds on M0 core - this depends on the clock frequency).
If you are using the EEProm Emulation driver, you had bettern add a function such as check the data after write finished.
For example, if you have 10 data to save, so you need write 11 bytes to flash. The last byte is checksum. And check the data after read from flash.

Resources