PAPI_num_counters() shows the system doesn't have available counters

PAPI_num_counters() shows the system doesn't have available counters - c

I have a question regarding PAPI (Performance Application Programming Interface). I downloaded and installed PAPI library. Still not sure how to use it correctly and what additional things I need, to make it work. I am trying to use it in C. I have this simple program:
int retval;
retval = PAPI_library_init(PAPI_VER_CURRENT);
if (retval != PAPI_VER_CURRENT && retval > 0) {
printf("PAPI error: 1\n");
exit(1);
}
if (retval < 0)
printf("PAPI error: 2\n");
retval = PAPI_is_initialized();
if (retval != PAPI_LOW_LEVEL_INITED)
printf("PAPI error: 2\n");
int num_hwcntrs = 0;
if ((num_hwcntrs = PAPI_num_counters()) <= PAPI_OK)
printf("This system has %d available counters. \n", num_hwcntrs);
I have included papi.h library and I am compiling with gcc -lpapi flag. I added library in path so it is able to compile and run, but as a result I get this:
This system has 0 available counters.
Thought initialization seems to work as it doesn't give error code.
Any advice or suggestion would be helpful to determine what I have not done right or missed to run it correctly. I mean, I should have available counters in my system, more precisely I need cache miss and cache hit counters.
I tried to count available counters after I run this another simple program and it gave error code -25:
int numEvents = 2;
long long values[2];
int events[2] = {PAPI_L3_TCA,PAPI_L3_TCM};
printf("PAPI error: %d\n", PAPI_start_counters(events, numEvents));
UPDATE: I just tried to check from terminal hardware information with command: papi_avail | more; and I got this:
Available PAPI preset and user defined events plus hardware information.
PAPI version : 5.7.0.0
Operating system : Linux 4.15.0-45-generic
Vendor string and code : GenuineIntel (1, 0x1)
Model string and code : Intel(R) Core(TM) i5-6200U CPU # 2.30GHz (78, 0x4e)
CPU revision : 3.000000
CPUID : Family/Model/Stepping 6/78/3, 0x06/0x4e/0x03
CPU Max MHz : 2800
CPU Min MHz : 400
Total cores : 4
SMT threads per core : 2
Cores per socket : 2
Sockets : 1
Cores per NUMA region : 4
NUMA regions : 1
Running in a VM : no
Number Hardware Counters : 0
Max Multiplex Counters : 384
Fast counter read (rdpmc): no
PAPI Preset Events
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 No No Level 1 data cache misses
PAPI_L1_ICM 0x80000001 No No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 No No Level 2 data cache misses
PAPI_L2_ICM 0x80000003 No No Level 2 instruction cache misses
.......
So because Number Hardware Counters is 0, I can't use this tool to count cache misses with PAPI's preset events? Is there any configuration that can be useful or should I forget about it till I change my laptop?

Related

Linux kernel development, how to get physical core id?

I am totally beginning in Linux kernel development. I am trying to learn about processes and scheduling (I know what I want to do is not "useful" it is just to learn).
I wrote a syscall which returns me the id of the thread/logical core on which my process is running.
But now, what I would like to do is: write a syscall which returns me the id of the physical core on which my process is running.
I tried to read the task_struct, but I did not find any clue.
I am a lost with all this code. I have no idea where I can start my research, and so on.
I am interested by your methodology. I'm on x86_64 and I'm using Linux 5.6.2.

What you want to do is basically the same thing that /proc/cpuinfo already does:
$ cat /proc/cpuinfo
processor : 0 <== you have this
...
core id : 0 <== you want to obtain this
...
We can therefore take a look at how /proc/cpuinfo does this, in arch/x86/kernel/cpu/proc.c. By analyzing the code a little bit, we see that the CPU information is obtained calling cpu_data():
static void *c_start(struct seq_file *m, loff_t *pos)
{
*pos = cpumask_next(*pos - 1, cpu_online_mask);
if ((*pos) < nr_cpu_ids)
return &cpu_data(*pos); // <=== HERE
return NULL;
}
The cpu_data() macro returns a struct cpuinfo_x86, which contains all the relevant information, and is then used here to print the core ID:
seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
Therefore, in a kernel module, you can do the following:
#include <linux/smp.h> // get_cpu(), put_cpu()
#include <asm/processor.h> // cpu_data(), struct cpuinfo_x86
// ...
static int __init modinit(void)
{
unsigned cpu;
struct cpuinfo_x86 *info;
cpu = get_cpu();
info = &cpu_data(cpu);
pr_info("CPU: %u, core: %d\n", cpu, info->cpu_core_id);
put_cpu(); // Don't forget this!
return 0;
}
Dmesg output on my machine after inserting/removing the module three times:
[14038.937774] cpuinfo: CPU: 1, core: 1
[14041.084951] cpuinfo: CPU: 5, core: 1
[14087.329053] cpuinfo: CPU: 6, core: 2
Which is consistent with the content of my /proc/cpuinfo:
$ cat /proc/cpuinfo | grep -e 'processor' -e 'core id'
processor : 0
core id : 0
processor : 1
core id : 1
processor : 2
core id : 2
processor : 3
core id : 3
processor : 4
core id : 0
processor : 5
core id : 1
processor : 6
core id : 2
processor : 7
core id : 3
Note that as Martin James rightfully pointed out, this information is not very useful since your process could be preempted and moved to a different core by the time your syscall finishes execution. If you want to avoid this you can use the sched_setaffinity() syscall in your userspace program to set the affinity to a single CPU.

How to generate x86 CPU to main memory traffic in C to fill and invalidate TLB buffer on linux machine

I want to generate x86 CPU to Memory traffic on Linux OS (Ubuntu 18) using gcc tool chain that should first fill up tlb (translation look aside buffer) and then cause tlb invalidates as it has already filled up. I have created simple code below but I am not sure if it can achieve the goal of filling up tlb and then invalidating it
#include<stdio.h>
int main()
{
int array[1000];
int i;
long sum = 0;
for(i=0; i < 1000;i++)
{
array[i] = i;
}
for(i=0; i < 1000;i++)
{
sum += array[i]
}
return 0;
}
Here is the processor specific info in case it is useful
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD EPYC 7281 16-Core Processor
stepping : 2
microcode : 0x8001227
cpu MHz : 2694.732
cache size : 512 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
The crux is to have answers to the following
what triggers tlb invalidates to happen?
How can one test tlb invalidates did happen?

How to get number of cores on linux using c programming?

I know how to get the number of logical cores in C.
sysconf(_SC_NPROCESSORS_CONF);
This will return 4 on my i3 processor. But actually there are only 2 cores in an i3.
How can I get physical core count?

This is a C solution using libcpuid.
cores.c:
#include <stdio.h>
#include <libcpuid.h>
int main(void)
{
struct cpu_raw_data_t raw;
struct cpu_id_t data;
cpuid_get_raw_data(&raw);
cpu_identify(&raw, &data);
printf("No. of Physical Core(s) : %d\n", data.num_cores);
return 0;
}
This is a C++ solution using Boost.
cores.cpp:
// use boost to get number of cores on the processor
// compile with : g++ -o cores cores.cpp -lboost_system -lboost_thread
#include <iostream>
#include <boost/thread.hpp>
int main ()
{
std::cout << "No. of Physical Core(s) : " << boost::thread::physical_concurrency() << std::endl;
std::cout << "No. of Logical Core(s) : " << boost::thread::hardware_concurrency() << std::endl;
return 0;
}
On my desktop (i5 2310) it returns:
No. of Physical Core(s) : 4
No. of Logical Core(s) : 4
While on my laptop (i5 480M):
No. of Physical Core(s) : 2
No. of Logical Core(s) : 4
Meaning that my laptop processor have Hyper-Threading tecnology

Without any lib:
int main()
{
unsigned int eax=11,ebx=0,ecx=1,edx=0;
asm volatile("cpuid"
: "=a" (eax),
"=b" (ebx),
"=c" (ecx),
"=d" (edx)
: "0" (eax), "2" (ecx)
: );
printf("Cores: %d\nThreads: %d\nActual thread: %d\n",eax,ebx,edx);
}
Output:
Cores: 4
Threads: 8
Actual thread: 1

You might simply read and parse /proc/cpuinfo pseudo-file (see proc(5) for details; open that pseudo-file as a text file and read it sequentially line by line; try cat /proc/cpuinfo in a terminal).
The advantage is that you just are parsing a (Linux-specific) text [pseudo-]file (without needing any external libraries, like in Gengisdave's answer), the disadvantage is that you need to parse it (not a big deal, read 80 bytes lines with fgets in a loop then use sscanf and test the scanned item count....)
The ht presence in flags: line means that your CPU has hyper-threading. The number of CPU threads is given by the number of processor: lines. The actual number of physical cores is given by cpu cores: (all this using a 4.1 kernel on my machine).
I am not sure you are right in wanting to understand how many physical cores you have. Hyper-threading may actually be useful. You need to benchmark.
And you probably should make the number of working threads (e.g. the size of your thread pool) in your application be user-configurable. Even on a 4 core hyper-threaded processor, I might want to have no more than 3 running threads (because I want to use the other threads for something else).

#include <stdio.h>
int main(int argc, char **argv)
{
unsigned int lcores = 0, tsibs = 0;
char buff[32];
char path[64];
for (lcores = 0;;lcores++) {
FILE *cpu;
snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/topology/thread_siblings_list", lcores);
cpu = fopen(path, "r");
if (!cpu) break;
while (fscanf(cpu, "%[0-9]", buff)) {
tsibs++;
if (fgetc(cpu) != ',') break;
}
fclose(cpu);
}
printf("physical cores %u\n", lcores / (tsibs / lcores));
}
thread_siblings_list has a comma delimited list of cores which are "thread siblings" with the current core.
Divide the number of logical cores by the number of siblings to get the siblings per core. Divide the number of logical cores by the siblings per core to get the number of physical cores.

ARM: Start/Wakeup/Bringup the other CPU cores/APs and pass execution start address?

I've been banging my head with this for the last 3-4 days and I can't find a DECENT explanatory documentation (from ARM or unofficial) to help me.
I've got an ODROID-XU board (big.LITTLE 2 x Cortex-A15 + 2 x Cortex-A7) board and I'm trying to understand a bit more about the ARM architecture. In my "experimenting" code I've now arrived at the stage where I want to WAKE UP THE OTHER CORES FROM THEIR WFI (wait-for-interrupt) state.
The missing information I'm still trying to find is:
1. When getting the base address of the memory-mapped GIC I understand that I need to read CBAR; But no piece of documentation explains how the bits in CBAR (the 2 PERIPHBASE values) should be arranged to get to the final GIC base address
2. When sending an SGI through the GICD_SGIR register, what interrupt ID between 0 and 15 should I choose? Does it matter?
3. When sending an SGI through the GICD_SGIR register, how can I tell the other cores WHERE TO START EXECUTION FROM?
4. How does the fact that my code is loaded by the U-BOOT bootloader affect this context?
The Cortex-A Series Programmer's Guide v3.0 (found here: link) states the following in section 22.5.2 (SMP boot in Linux, page 271):
While the primary core is booting, the secondary cores will be held in a standby state, using the
WFI instruction. It (the primary core) will provide a startup address to the secondary cores and wake them using an
Inter-Processor Interrupt(IPI), meaning an SGI signalled through the GIC
How does Linux do that? The documentation-S don't give any other details regarding "It will provide a startup address to the secondary cores".
My frustration is growing and I'd be very grateful for answers.
Thank you very much in advance!
EXTRA DETAILS
Documentation I use:
ARMv7-A&R Architecture Reference Manual
Cortex-A15 TRM (Technical Reference Manual)
Cortex-A15 MPCore TRM
Cortex-A Series Programmer's Guide v3.0
GICv2 Architecture Specification
What I've done by now:
UBOOT loads me at 0x40008000; I've set-up Translation Tables (TTBs), written TTBR0 and TTBCR accordingly and mapped 0x40008000 to 0x8000_0000 (2GB), so I also enabled the MMU
Set-up exception handlers of my own
I've got Printf functionality over the serial (UART2 on ODROID-XU)
All the above seems to work properly.
What I'm trying to do now:
Get the GIC base address => at the moment I read CBAR and I simply AND (&) its value with 0xFFFF8000 and use this as the GIC base address, although I'm almost sure this ain't right
Enable the GIC distributor (at offset 0x1000 from GIC base address?), by writting GICD_CTLR with the value 0x1
Construct an SGI with the following params: Group = 0, ID = 0, TargetListFilter = "All CPUs Except Me" and send it (write it) through the GICD_SGIR GIC register
Since I haven't passed any execution start address for the other cores, nothing happens after all this
....UPDATE....
I've started looking at the Linux kernel and QEMU source codes in search for an answer. Here's what I found out (please correct me if I'm wrong):
When powering up the board ALL THE CORES start executing from the reset vector
A software (firmware) component executes WFI on the secondary cores and some other code that will act as a protocol between these secondary cores and the primary core, when the latter wants to wake them up again
For example, the protocol used on the EnergyCore ECX-1000 (Highbank) board is as follows:
**(1)** the secondary cores enter WFI and when
**(2)** the primary core sends an SGI to wake them up
**(3)** they check if the value at address (0x40 + 0x10 * coreid) is non-null;
**(4)** if it is non-null, they use it as an address to jump to (execute a BX)
**(5)** otherwise, they re-enter standby state, by re-executing WFI
**(6)** So, if I had an EnergyCore ECX-1000 board, I should write (0x40 + 0x10 * coreid) with the address I want each of the cores to jump to and send an SGI
Questions:
1. What is the software component that does this? Is it the BL1 binary I've written on the SD Card, or is it U-BOOT?
2. From what I understand, this software protocol differs from board to board. Is it so, or does it only depend on the underlying processor?
3. Where can I find information about this protocol for a pick-one ARM board? - can I find it on the official ARM website or on the board webpage?

Ok, I'm back baby. Here are the conclusions:
The software component that puts the CPUs to sleep is the bootloader (in my case U-Boot)
Linux somehow knows how the bootloader does this (hardcoded in the Linux kernel for each board) and knows how to wake them up again
For my ODROID-XU board the sources describing this process are UBOOT ODROID-v2012.07 and the linux kernel found here: LINUX ODROIDXU-3.4.y (it would have been better if I looked into kernel version from the branch odroid-3.12.y since the former doesn't start all of the 8 processors, just 4 of them but the latter does).
Anyway, here's the source code I've come up with, I'll post the relevant source files from the above source code trees that helped me writing this code afterwards:
typedef unsigned int DWORD;
typedef unsigned char BOOLEAN;
#define FAILURE (0)
#define SUCCESS (1)
#define NR_EXTRA_CPUS (3) // actually 7, but this kernel version can't wake them up all -> check kernel version 3.12 if you need this
// Hardcoded in the kernel and in U-Boot; here I've put the physical addresses for ease
// In my code (and in the linux kernel) these addresses are actually virtual
// (thus the 'VA' part in S5P_VA_...); note: mapped with memory type DEVICE
#define S5P_VA_CHIPID (0x10000000)
#define S5P_VA_SYSRAM_NS (0x02073000)
#define S5P_VA_PMU (0x10040000)
#define EXYNOS_SWRESET ((DWORD) S5P_VA_PMU + 0x0400)
// Other hardcoded values
#define EXYNOS5410_REV_1_0 (0x10)
#define EXYNOS_CORE_LOCAL_PWR_EN (0x3)
BOOLEAN BootAllSecondaryCPUs(void* CPUExecutionAddress){
// 1. Get bootBase (the address where we need to write the address where the woken CPUs will jump to)
// and powerBase (we also need to power up the cpus before waking them up (?))
DWORD bootBase, powerBase, powerOffset, clusterID;
asm volatile ("mrc p15, 0, %0, c0, c0, 5" : "=r" (clusterID));
clusterID = (clusterID >> 8);
powerOffset = 0;
if( (*(DWORD*)S5P_VA_CHIPID & 0xFF) < EXYNOS5410_REV_1_0 )
{
if( (clusterID & 0x1) == 0 ) powerOffset = 4;
}
else if( (clusterID & 0x1) != 0 ) powerOffset = 4;
bootBase = S5P_VA_SYSRAM_NS + 0x1C;
powerBase = (S5P_VA_PMU + 0x2000) + (powerOffset * 0x80);
// 2. Power up each CPU, write bootBase and send a SEV (they are in WFE [wait-for-event] standby state)
for (i = 1; i <= NR_EXTRA_CPUS; i++)
{
// 2.1 Power up this CPU
powerBase += 0x80;
DWORD powerStatus = *(DWORD*)( (DWORD) powerBase + 0x4);
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) == 0)
{
*(DWORD*) powerBase = EXYNOS_CORE_LOCAL_PWR_EN;
for (i = 0; i < 10; i++) // 10 millis timeout
{
powerStatus = *(DWORD*)((DWORD) powerBase + 0x4);
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) == EXYNOS_CORE_LOCAL_PWR_EN)
break;
DelayMilliseconds(1); // not implemented here, if you need this, post a comment request
}
if ((powerStatus & EXYNOS_CORE_LOCAL_PWR_EN) != EXYNOS_CORE_LOCAL_PWR_EN)
return FAILURE;
}
if ( (clusterID & 0x0F) != 0 )
{
if ( *(DWORD*)(S5P_VA_PMU + 0x0908) == 0 )
do { DelayMicroseconds(10); } // not implemented here, if you need this, post a comment request
while (*(DWORD*)(S5P_VA_PMU + 0x0908) == 0);
*(DWORD*) EXYNOS_SWRESET = (DWORD)(((1 << 20) | (1 << 8)) << i);
}
// 2.2 Write bootBase and execute a SEV to finally wake up the CPUs
asm volatile ("dmb" : : : "memory");
*(DWORD*) bootBase = (DWORD) CPUExecutionAddress;
asm volatile ("isb");
asm volatile ("\n dsb\n sev\n nop\n");
}
return SUCCESS;
}
This successfully wakes 3 of 7 of the secondary CPUs.
And now for that short list of relevant source files in u-boot and the linux kernel:
UBOOT: lowlevel_init.S - notice lines 363-369, how the secondary CPUs wait in a WFE for the value at _hotplug_addr to be non-zeroed and to jump to it; _hotplug_addr is actually bootBase in the above code; also lines 282-285 tell us that _hotplug_addr is to be relocated at CONFIG_PHY_IRAM_NS_BASE + _hotplug_addr - nscode_base (_hotplug_addr - nscode_base is 0x1C and CONFIG_PHY_IRAM_NS_BASE is 0x02073000, thus the above hardcodings in the linux kernel)
LINUX KERNEL: generic - smp.c (look at function __cpu_up), platform specific (odroid-xu): platsmp.c (function boot_secondary, called by generic __cpu_up; also look at platform_smp_prepare_cpus [at the bottom] => that's the function that actually sets the boot base and power base values)

For clarity and future reference, there's a subtle piece of information missing here thanks to the lack of proper documentation of the Exynos boot protocol (n.b. this question should really be marked "Exynos 5" rather than "Cortex-A15" - it's a SoC-specific thing and what ARM says is only a general recommendation). From cold boot, the secondary cores aren't in WFI, they're still powered off.
The simpler minimal solution (based on what Linux's hotplug does), which I worked out in the process of writing a boot shim to get a hypervisor running on the XU, takes two steps:
First write the entry point address to the Exynos holding pen (0x02073000 + 0x1c)
Then poke the power controller to switch on the relevant core(s): That way, they drop out of the secure boot path into the holding pen to find the entry point waiting for them, skipping the WFI loop and obviating the need to even touch the GIC at all.
Unless you're planning a full-on CPU hotplug implementation you can skip checking the cluster ID - if we're booting, we're on cluster 0 and nowhere else (the check for pre-production chips with backwards cluster registers should be unnecessary on the Odroid too - certainly was for me).
From my investigation, firing up the A7s is a little more involved. Judging from the Exynos big.LITTLE switcher driver, it seems you need to poke a separate set of power controller registers to enable cluster 1 first (and you may need to mess around with the CCI too, especially to have the MMUs and caches on) - I didn't get further since by that point it was more "having fun" than "doing real work"...
As an aside, Samsung's mainline patch for CPU hotplug on the 5410 makes the core power control stuff rather clearer than the mess in their downstream code, IMO.

QEMU uses PSCI
The ARM Power State Coordination Interface (PSCI) is documented at: https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document and controls things such as powering on and off of cores.
TL;DR this is the aarch64 snippet to wake up CPU 1 on QEMU v3.0.0 ARMv8 aarch64:
/* PSCI function identifier: CPU_ON. */
ldr w0, =0xc4000003
/* Argument 1: target_cpu */
mov x1, 1
/* Argument 2: entry_point_address */
ldr x2, =cpu1_entry_address
/* Argument 3: context_id */
mov x3, 0
/* Unused hvc args: the Linux kernel zeroes them,
* but I don't think it is required.
*/
hvc 0
and for ARMv7:
ldr r0, =0x84000003
mov r1, #1
ldr r2, =cpu1_entry_address
mov r3, #0
hvc 0
A full runnable example with a spinlock is available on the ARM section of this answer: What does multicore assembly language look like?
The hvc instruction then gets handled by an EL2 handler, see also: the ARM section of: What are Ring 0 and Ring 3 in the context of operating systems?
Linux kernel
In Linux v4.19, that address is informed to the Linux kernel through the device tree, QEMU for example auto-generates an entry of form:
psci {
method = "hvc";
compatible = "arm,psci-0.2", "arm,psci";
cpu_on = <0xc4000003>;
migrate = <0xc4000005>;
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
};
The hvc instruction is called from: https://github.com/torvalds/linux/blob/v4.19/drivers/firmware/psci.c#L178
static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
which ends up going to: https://github.com/torvalds/linux/blob/v4.19/arch/arm64/kernel/smccc-call.S#L51

Go to www.arm.com and download there evaluation copy of DS-5 developement suite. Once installed, under the examples there will be a startup_Cortex-A15MPCore directory. Look at startup.s.

How to access CPU's heat sensors?

I am working on software in which I need to access the temperature sensors in the CPU and get control over them.
I don't know much hardware interfacing; I just know how to interface with the mouse. I have googled a lot about it but failed to find any relevant information or piece of code.
I really need to add this in my software. Please guide me how to have the control over the sensors using C or C++ or ASM.

Without a specific kernel driver, it's difficult to query the temperature, other than through WMI. Here is a piece of C code that does it, based on WMI's MSAcpi_ThermalZoneTemperature class:
HRESULT GetCpuTemperature(LPLONG pTemperature)
{
if (pTemperature == NULL)
return E_INVALIDARG;
*pTemperature = -1;
HRESULT ci = CoInitialize(NULL); // needs comdef.h
HRESULT hr = CoInitializeSecurity(NULL, -1, NULL, NULL, RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE, NULL);
if (SUCCEEDED(hr))
{
IWbemLocator *pLocator; // needs Wbemidl.h & Wbemuuid.lib
hr = CoCreateInstance(CLSID_WbemAdministrativeLocator, NULL, CLSCTX_INPROC_SERVER, IID_IWbemLocator, (LPVOID*)&pLocator);
if (SUCCEEDED(hr))
{
IWbemServices *pServices;
BSTR ns = SysAllocString(L"root\\WMI");
hr = pLocator->ConnectServer(ns, NULL, NULL, NULL, 0, NULL, NULL, &pServices);
pLocator->Release();
SysFreeString(ns);
if (SUCCEEDED(hr))
{
BSTR query = SysAllocString(L"SELECT * FROM MSAcpi_ThermalZoneTemperature");
BSTR wql = SysAllocString(L"WQL");
IEnumWbemClassObject *pEnum;
hr = pServices->ExecQuery(wql, query, WBEM_FLAG_RETURN_IMMEDIATELY | WBEM_FLAG_FORWARD_ONLY, NULL, &pEnum);
SysFreeString(wql);
SysFreeString(query);
pServices->Release();
if (SUCCEEDED(hr))
{
IWbemClassObject *pObject;
ULONG returned;
hr = pEnum->Next(WBEM_INFINITE, 1, &pObject, &returned);
pEnum->Release();
if (SUCCEEDED(hr))
{
BSTR temp = SysAllocString(L"CurrentTemperature");
VARIANT v;
VariantInit(&v);
hr = pObject->Get(temp, 0, &v, NULL, NULL);
pObject->Release();
SysFreeString(temp);
if (SUCCEEDED(hr))
{
*pTemperature = V_I4(&v);
}
VariantClear(&v);
}
}
}
if (ci == S_OK)
{
CoUninitialize();
}
}
}
return hr;
}
and some test code:
HRESULT GetCpuTemperature(LPLONG pTemperature);
int _tmain(int argc, _TCHAR* argv[])
{
LONG temp;
HRESULT hr = GetCpuTemperature(&temp);
printf("hr=0x%08x temp=%i\n", hr, temp);
}

I assume you are interested in a IA-32 (Intel Architecture, 32-bit) CPU and Microsoft Windows.
The Model Specific Register (MSR) IA32_THERM_STATUS has 7 bits encoding the "Digital Readout (bits 22:16, RO) — Digital temperature reading in 1 degree Celsius relative to the TCC activation temperature." (see "14.5.5.2 Reading the Digital Sensor" in "Intel® 64 and IA-32 Architectures - Software Developer’s Manual - Volume 3 (3A & 3B): System Programming Guide" http://www.intel.com/Assets/PDF/manual/325384.pdf).
So IA32_THERM_STATUS will not give you the "CPU temperature" but some proxy for it.
In order to read the IA32_THERM_STATUS register you use the asm instruction rdmsr, now rdmsr cannot be called from user space code and so you need some kernel space code (maybe a device driver?).
You can also use the intrinsic __readmsr (see http://msdn.microsoft.com/en-us/library/y55zyfdx(v=VS.100).aspx) which has anyway the same limitation: "This function is only available in kernel mode".
Every CPU cores has its own Digital Thermal Sensors (DTS) and so some more code is needed to get all the temperatures (maybe with the affinity mask? see Win32 API SetThreadAffinityMask).
I did some tests and actually found a correlation between the IA32_THERM_STATUS DTS readouts and the Prime95 "In-place large FFTs (maximum heat, power consumption, some RAM tested)" test. Prime95 is ftp://mersenne.org/gimps/p95v266.zip
I did not find a formula to get the "CPU temperature" (whatever that may mean) from the DTS readout.
Edit:
Quoting from an interesting post TJunction Max? #THERMTRIP? #PROCHOT? by "fgw" (December 2007):
there is no way to find tjmax of a certain processor in any register.
thus no software can read this value. what various software developers
are doing, is they simply assume a certain tjunction for a certain
processor and hold this information in a table within the program.
besides that, tjmax is not even the correct value they are after. in
fact they are looking for TCC activacion temperature threshold. this
temperature threshold is used to calculate current absolute
coretemperatures from. theoretical you can say: absolute
coretemperature = TCC activacion temperature threshold - DTS i had to
say theoretically because, as stated above, this TCC activacion
temperature threshold cant be read by software and has to be assumed
by the programmer. in most situations (coretemp, everest, ...) they
assume a value of 85C or 100C depending on processor family and
revision. as this TCC activacion temperature threshold is calibrated
during manufacturing individually per processor, it could be 83C for
one processor but may be 87C for the other. taking into account the
way those programms are calculating coretemperatures, you can figure
out at your own, how accurate absolute coretemperatures are! neither
tjmax nor the "most wanted" TCC activacion temperature threshold can
be found in any public intel documents. following some discussions
over on the intel developer forum, intel shows no sign to make this
information available.

You can read it from the MSAcpi_ThermalZoneTemperature in WMI
Using WMI from C++ is a bit involved, see MSDN explanantion and examples
note: changed original unhelpful answer

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight