How do I obtain PCI Region size in Windows? - c

I needed to scan my PCI bus and obtain information for specific devices from specific vendors.
My goal is to find the PCI region size for the AMD graphics card so that I can map the card's PCI memory into userspace, do I2C transfers, and read information from various sensors.
For scanning the PCI bus I downloaded and compiled pciutils 3.1.7 for Windows x64 around a year ago. It supposedly uses DirectIO.
This is my code.
int scan_pci_bus()
{
    struct pci_access *pci;
    struct pci_dev *dev;
    int i;
    pci = pci_alloc();
    pci_init(pci);
    pci_scan_bus(pci);
    for(dev = pci->devices; dev; dev = dev->next)
    {
        pci_fill_info(dev, PCI_FILL_IDENT | PCI_FILL_CLASS | PCI_FILL_IRQ | PCI_FILL_BASES | PCI_FILL_ROM_BASE | PCI_FILL_SIZES | PCI_FILL_PHYS_SLOT);
        if(dev->vendor_id == 0x1002 && dev->device_id == 0x6899)
        {
            //Vendor is AMD, Device ID is a AMD HD5850 GPU
            for(i = 0; i < 6; i++)
            {
                printf("Region Size %d %x ID %x\n", dev->size[i], dev->base_addr[i], dev->device_id);
            }
        }
    }
    pci_cleanup(pci);
    return 0;
}
As you can see in my printf line, I try to print some data. device_id and base_addr print correctly, but size, which should contain the PCI region size for this device, is always 0. I expected at least one iteration of the loop to display a size > 0.
My code is based on a Linux application which uses the same code, though it uses the pci.h headers that come with Linux (pciutils apparently has the same API).
Apparently Windows (Windows 7 x64 in my case) does not provide this information, or at the very least it is not exposed to pciutils.
How do you propose I obtain this information? If there are alternatives to pciutils for Windows that provide this information, I'd be glad to get a link to them.
EDIT: I have still found no solution. If there is any solution to my problem that also works on 32-bit Windows, it would be deeply appreciated.

How this works is pretty complicated. PCI devices use Base Address Registers to let the BIOS and Operating System decide where to locate their memory regions. Each PCI device is allowed to specify several memory or IO regions it wants, and lets the BIOS/OS decide where to put it. Complicating matters, there's only one register that is used both to specify the size AND the address. How does this work?
When the card is first powered up, its 32-bit address register will have something like 0xFFFF0000 in it. Any binary 1 means "the OS can change this", any binary 0 means "must stay zero". So this is telling the OS that any of the top 16 bits can be set to whatever the OS wants, but the bottom 16 bits have to stay zero. This also means that the memory region takes up 16 bits of address space, or 64K. Because of this, memory regions have to be aligned to their size. If a card wants 64K of address space, the OS can only put it at memory addresses that are a multiple of 64K. When the OS has decided where it wants to locate this card's 64K memory space, it writes the address back into this register, overwriting the initial 0xFFFF0000 that was in there.
In other words, the card tells the OS what size/alignment it needs for the memory, then the OS overwrites that same register/variable with the address for the memory. Once it's done this, you can't get the size back out of the register without resetting the address.
This means there's no portable way of asking a card how big its region is, all you can ask it is WHERE the region is.
So why does this work in Linux? Because it's asking the kernel for this information. The kernel has an API to provide this stuff, the same way that lspci works. I am not a Windows expert, but I'm unaware of any way for an application to ask the Windows kernel this information. There may be an API to do this somehow, or you may need to write something that runs on the kernel side to pass this information back to you. If you look in the libpci source, for windows it calls the "generic" version of pci_fill_info(), which returns:
return flags & ~PCI_FILL_SIZES;
which basically means "I'm returning everything you asked for, but the sizes."
BUT, this may not matter anyway. If all you're doing is wanting to read/write to the I2C registers, they're usually (always?) in the first 4K of the control/configuration region. You can probably just map 4K (one page) and ignore the fact that there might be more. Also be warned that you may need to take additional steps to stop the real driver for this card from reading/writing while you are. If you're bit-banging the I2C bus manually, and the driver tries to at the same time, it's likely to cause a mess on the bus.
There also may be an existing way to ask the radeon driver to do I2C requests for you, which might avoid all of this.
(also note I'm simplifying and glossing over a lot of details with how the BARs work, including 64 bit addresses, I/O space, etc, read PCI documentation if you want to learn more)

Well, whamma gave a very good answer, but there's one thing he was wrong about, which is region sizes. Region sizes are pretty easy to find. Here I will show two ways: the first by deciphering it from the address of the BAR, the second through the Windows user interface.
Let's assume that E2000000 is the address of the Base Register. If we convert that to binary we get:
11100010000000000000000000000000
Now there are 32 bits here in total, you can count them if you must. Now if you are not familiar with how the bits in
a BAR are laid out, look here -> http://wiki.osdev.org/PCI , specifically "Base Address Registers" and more specifically
the image that reads "Memory Space BAR Layout". Now let's start reading the bits from the right end to the left end and use
the image in the link I pointed you to above as a guide.
So the first bit(Bit 0) starting from the right is 0, indicating that this is a memory address BAR.
Bits(1-2) are 0, indicating that it's a 32-bit(note this is not the size) memory BAR.
Bit 3 is 0, indicating that it's not Prefetchable memory.
Bits 4-31 represent the address.
The page documents the PCI approved process:
To determine the amount of address space needed by a PCI device, you
must save the original value of the BAR, write a value of all 1's to
the register, then read it back. The amount of memory can then be
determined by masking the information bits, performing a bitwise NOT
('~' in C), and incrementing the value by 1. The original value of the
BAR should then be restored. The BAR register is naturally aligned and
as such you can only modify the bits that are set.
The other way is using Device Manager:
Start->"Device Manager"->Display Adapters->Right Click your video card->Properties->Resources. Each resource type marked
"Memory Range" should be a memory BAR and as you can see it says [start address] to [end address]. For example lets say
it read [00000000E2000000 - 00000000E2FFFFFF], to get the size you would take [start address] from [end address]:
00000000E2FFFFFF - 00000000E2000000 = FFFFFF. Both ends of the range are inclusive, so the size is FFFFFF + 1 = 1000000 hex = 16777216 bytes = 16 MiB.

Related

Usage of EN4B command

Can anybody explains the usage of EN4B command of micron SPI chips.
I want to know the difference between 3 byte and 4 byte address mode in SPI.
I was going through the SPI drivers where I found this commands.
Thanks in Advance !!
From a legacy point of view, SPI flash commands have always used 3 bytes for the address they operate on.
This was fine, as 24 bits can address up to 16 MiB (128 Mibit).
When flashes grew larger, it became necessary to switch from 3-byte to 4-byte addressing.
Whenever you have doubts about the hardware you can find the answers in the relevant datasheet; I don't know which specific chip you are referring to, however.
I found the Micron N25Q512A NOR flash, which is 512 Mibit (64 MiB), so it needs some form of 4-byte addressing. From its datasheet you can learn that:
1. There are legacy 3-byte commands and new 4-byte commands.
   For example, 03h and 13h for the single read.
2. You can supply a default fourth address byte with a specific register.
   The Extended Address Register lets you choose the region of the flash for the legacy commands.
3. You can enable 4-byte addressing for the legacy commands.
   Either write the appropriate bit in the Nonvolatile Configuration Register or use the ENTER / EXIT 4-BYTE ADDRESS MODE commands (opcodes B7h and E9h respectively).
This Linux patch also has some insights, basically saying that some chips support only one of the three points above.
Macronix seems to have first opted for number 3 only, and Spansion for number 1.
Checking some of their datasheets seems to suggest that both now support all three methods.

c166 bootloader write to internal flash

I'm writing a bootloader for a C166 chip, to be exact the 169FH. The bootloader can currently open a TCP/IP connection so a PC can send an Intel HEX file to the bootloader. This Intel HEX file is saved in RAM. After the HEX file is received, it is read line by line to set the bytes at the correct locations in the flash. The flash location where the bootloader is stored is of course different from where the main program can be saved.
These are the first two lines of the Intel HEX file:
:0200000400C03A
:20100000CA11EE111212341258127A129A12BC12DE12001322134413601388138813881349
The first line gives the upper 16 bits of the 32-bit flash address, which in this case is 0x00C0.
The second line contains the lower 16 bits of the 32-bit flash address, which in this case is 0x1000. This gives a total address of 0x00C01000, and the byte written to that address should be 0xCA.
I'm trying to write the byte to that address using the following code:
uint8_t u8Byte = (uint8_t )XHugeStringToDec((const __huge char *)&Ext_RamFlashSave.Byte[u32Idx], 9+(u8ByteIdx*2), 2);
uint8_t *u8Address = (uint8_t*)((uint32_t)(u32ExtendedLinearAddress << 16) + (uint32_t)u16BaseAddress + (uint32_t)u8ByteIdx);
*u8Address = (u8Byte);
XHugeStringToDec() is a function that gets the hexadecimal value from the Intel HEX string. I know this works correctly.
Ext_RamFlashSave.Byte is the array in which the Intel HEX file is stored.
The u32ExtendedLinearAddress variable is 0x00C0 and is set earlier. The u16BaseAddress is 0x1000 and is also set earlier in the code.
The problem is in the last line:
*u8Address = (u8Byte);
I have verified that u8Address is indeed 0x0C01000 and u8Byte is indeed 0xCA. But when I monitor my flash address, I do not see the byte written.
I can imagine that it is some kind of write protection, but I cannot find out how to work around it. Or do I need a different way to write to the flash address?
More info on how an Intel HEX file is built is described here:
https://en.wikipedia.org/wiki/Intel_HEX
I am not familiar with the chip you mention.
But for writing to flash, the following algorithm is generally used:
1. Unlock the flash. This usually means sending a specific data sequence. The flash I use right now needs 0xA0A0 -> delay of 1 msec -> 0x0505. This enables writing to the flash.
2. Clear the error flags. Flash has error flags such as write error or read error. Also check the busy flag to make sure the flash is ready.
3. Erase the page. Yes... you have to erase the page before writing to it. Even if you want to change a single bit, you have to erase the entire page.
4. Write the data (finally). But make sure that the endianness is correct. Sometimes your controller is little-endian and the flash is big-endian, so you have to match the flash's.
5. Lock the flash. This is usually done by sending the same sequence that was used for unlocking. (Refer to the datasheet.)
You cannot write to flash directly; you have to go through the entire process. That's why flash is slow.
Some flashes support 8-bit writes while others support only 16-bit or 32-bit writes, so you have to send that many bits when writing.
When you have to modify a small portion of a page, read the page into a buffer, modify the data in the buffer, erase the page, and write the entire buffer.
If you are modifying a

Pci express - communicate kernel -> graphic card

The final goal is to be able to write to a PCIe device from the kernel without the ready-made functions, in order to understand the inner workings (and then, obviously, use those functions again).
I saw the PCIE specs which are 800+ pages (the 3.0 especially).
Huge is not enough to describe that.
I cannot afford to go along those at the step I currently am (reading 2.0 and 3.1 would be very time consuming).
I read many sources and it seems that we can write to PCIe via messages (and not buses anymore, as in PCI).
PCIe should be memory mapped, so I think that we could write to that memory from the kernel and notify the driver that we did it.
The driver will then issue the out(l/b) assembly instruction to notify the device in question.
This is my very high-level understanding of PCIe (I don't want to dive into the spec details now). It may not be correct though.
If someone could tell me where I am wrong in my thinking, that would be very helpful.
Here is the pseudo code of my thinking (no error checking and such):
static int64_t my_driver_address;
pcie_write_device(uint32_t * my_data_physical_address) {
    // we pass the physical address where the data is. It has to be contiguous.
    pcie_send_address(&my_driver_address, my_data_physical_address);
    // now the device has been notified that some data has been mmaped, knows where and will treat it as such
}
pcie_read_device(anytype_t ** buff){
    // this function calls the inq assembly instruction and stores the resulting address in the pointer
    ptr * address_to_read = pcie_get_data();
    // read the mmaped memory region. No mem allocation code.
    *buff = get_data_from_region(address_to_read);
    // now data from device is in the buff, ready to be sent to the OS or anything..
}

are ALSA hw_params buffer sizes the physical card memory size?

I am trying to come up to speed on the ALSA API and have some questions regarding this extensive API. I am using the "hw" interface not the "plughw" interface.
Question #1: Does the snd_pcm_hw_params_any() routine retrieve the default/current parameters?
Some documentation/example code indicates that this routine fills in the allocated snd_pcm_hw_params_t struct with some values. Are these the currently configured settings for the card:device?
The reason I am confused is that, if that routine retrieves the hw_params values, then I should be able to use any of the snd_pcm_hw_params_get routines to actually get those values. This works for the snd_pcm_hw_params_get_channels() routine. But a call to either snd_pcm_hw_params_get_format() or snd_pcm_hw_params_get_buffer_size() will fail. If I first set the format or buffer size using the set routines, I can then call the get routines to retrieve the values without error. Another way to express this question would be: why do snd_pcm_hw_params_get_format() and snd_pcm_hw_params_get_buffer_size() fail when I use this sequence of calls:
snd_pcm_hw_params_alloca();
snd_pcm_hw_params_any();
snd_pcm_hw_params_get_channels();
snd_pcm_hw_params_get_format();
snd_pcm_hw_params_get_buffer_size();
Question #2: How do I determine the actual size of the physical memory on the sound card?
I have noticed that no matter what size I use when calling snd_pcm_hw_params_set_buffer_size/min/max() and then call the snd_pcm_hw_params_get_buffer_size() that I get the same size. Again, I am using the "hw" interface and not the "plughw" interface so it seems reasonable that I cannot set the buffer size because it is actually the amount of physical memory on the card.
Since snd_pcm_hw_params_get_buffer_size() retrieves the buffer size in frames, the actual size of the available physical capture memory on the sound card would be that number of frames times the number of channels times the sample word length. For example, for:
int nNumberOfChannels = 2;
snd_pcm_format_t tFormat = SND_PCM_FORMAT_S16; // 2 bytes per sample
snd_pcm_uframes_t tBufferSize = 16384; // buffer size in frames
then the actual physical memory on the card would be: 2 * 2 * 16384 = 65536 bytes of physical memory available on the card.
Is this correct or am I confused here?
Thanks,
-Andres
The buffer size returned by snd_pcm_hw_params_get_buffer_size() is not (and never was) the size of the memory resident on the 'soundcard'. In the case of local audio interfaces (e.g. not serial bus-attached devices such as USB or Firewire), this is the size of a buffer in the system's main memory to or from which DMA of audio samples periodically takes place.
For USB and Firewire audio, the buffer this refers to is an entirely software concept - although it might be DMAd from in the case of Firewire. Don't expect to be able to set the buffer size below the minimum isochronous transfer unit size.
I don't think you get any guarantees from ALSA that you can actually change these parameters. It rather depends on the constraints of the hardware in the DMA controller servicing the audio interface, and on its device driver. Arrangements where the buffer size is a power of two are not unusual because they are far easier to implement in hardware.
You should take great care to check the return values of the snd_pcm_hw_* API calls and not assume that you got what you requested.

HostOnly and GuestOnly PMC bits supported on AMD Family 0x10h CPUs?

My company develops a hypervisor, and this question concerns use of AMD's SVM (Secure-Virtual-Machine) API.
I'd like to track exactly how many instructions have executed in my guest operating system in a given period. AMD has kindly provided so-called "HO" and "GO" or "HostOnly" and "GuestOnly" bits in the PerfEvtSel MSRs (0xc0010000..3) in their implementation of their 0x10h family CPUs (Phenom x2, etc.). The BKDG for Family 0x10h indicates that these bits are 40 and 41 of the 64bit PerfEvtSel register. However, the BKDG for Family 0x11h does not state the existence of HostOnly and GuestOnly bits!
The code I have looks like this:
reg_svm_pes_set_unit_mask(&pes, 0x00);
reg_svm_pes_set_usr(&pes, 1); // Count user mode cycles
reg_svm_pes_set_os(&pes, 1); // Count system cycles
reg_svm_pes_set_e(&pes, 0); // Level, not edge
reg_svm_pes_set_pc(&pes, 0);
reg_svm_pes_set_int(&pes, 1); // Trigger interrupt on overflow
reg_svm_pes_set_en(&pes, enabled);
reg_svm_pes_set_inv(&pes, 0); // No invert sense
reg_svm_pes_set_go(&pes, 1); // Count in the guest
reg_svm_pes_set_ho(&pes, 0); // And not in the host...
You have to take my word for it that each of these is a correctly-written inline function that sets the appropriate bit in the PMC register, and that the given code successfully writes and can read back bits 40 and 41 of the MSR. I have verified this.
What I experience is that the counter counts both in the guest and the host. This makes it very difficult to get an exact accounting of only what has happened in the guest.
My questions are:
Do the HostOnly and GuestOnly bits work on Family 0x10h CPUs?
Is there some other machine state that I need to configure in order for this to function?
Has anyone ever seen this feature of this CPU work?
Does anyone know why the BKDG for Family 0x11h CPUs doesn't list this feature as being present? That is, the bits in question are reserved on that family.
Is there any other known method of making SVM implementations turn off PMCs while in the host?
