Cannot add perf event into group with existing leader - c

I'm trying to add a perf event to a group of events with an existing leader, using the following code snippet (as an example):
struct perf_event_attr perf_event;
int cpu_fd = -1;
int process_fd = -1;

memset(&perf_event, 0, sizeof(perf_event));
perf_event.type = PERF_TYPE_HARDWARE;
perf_event.size = sizeof(perf_event);
perf_event.config = config; // config (PERF_COUNT_HW_CPU_CYCLES) is passed as a function argument
perf_event.use_clockid = 1;
perf_event.clockid = CLOCK_MONOTONIC;
perf_event.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_GROUP;

// First step:
// Create the leader of the perf event group.
// cpu_idx is passed as a function argument.
cpu_fd = perf_event_open(&perf_event, -1, cpu_idx, -1, PERF_FLAG_FD_CLOEXEC);
if (cpu_fd < 0) {
    printf("Unable to open new CPU PMU event.\n");
    return STATUS_NOK;
}

memset(&perf_event, 0, sizeof(perf_event));
perf_event.type = PERF_TYPE_HARDWARE;
perf_event.size = sizeof(perf_event);
perf_event.config = config; // same as above (PERF_COUNT_HW_CPU_CYCLES)
perf_event.use_clockid = 1;
perf_event.clockid = CLOCK_MONOTONIC;
perf_event.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;

// Second step:
// Attempt to add a new perf event, accounting the current process's
// CPU cycles, to the existing group.
// cpu_idx is passed as a function argument.
process_fd = perf_event_open(&perf_event, getpid(), cpu_idx, cpu_fd, PERF_FLAG_FD_CLOEXEC);
if (process_fd < 0) {
    printf("Unable to open new process PMU event.\n");
    return STATUS_NOK;
}
On the first step, the leader perf event is created to account the total number of CPU cycles on the required CPU (cpu_idx).
On the second step, there is an attempt to add a new perf event to the existing leader, in order to account the current process's CPU cycles on that same CPU.
But for some unknown reason, creation of the new process_fd perf descriptor always fails with errno EINVAL (Invalid argument). I tried to investigate the root cause with syscall tracing in the kernel, but it seems I am completely stuck.
Could you please help me find the root cause of this issue?
There are many possible reasons for perf_event_open to return EINVAL, so I would appreciate any suggestions or ideas for debugging.
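One experiment I think could narrow it down (just a sketch, based on my reading of the perf_event_open(2) man page that a group is scheduled as a unit, so mixing a CPU-wide leader with a per-process member may be the mismatch; errno/strerror includes are assumed):
// Same second event, but scoped exactly like the leader (pid = -1, same cpu_idx).
// If this grouping succeeds, the EINVAL above likely comes from mixing a
// CPU-wide leader with a per-task member rather than from the attributes.
process_fd = perf_event_open(&perf_event, -1, cpu_idx, cpu_fd,
                             PERF_FLAG_FD_CLOEXEC);
if (process_fd < 0) {
    printf("Grouping fails even with matching scope: %s\n", strerror(errno));
}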
Thanks a lot for your help!

Related

Vulkan - Asynchronous Texture Upload - Image Transition Issue

I'm using the transfer queue to upload data to GPU local memory to be used by the graphics queue. I believe I need 3 barriers: one to release the texture object from the transfer queue, one to acquire it on the graphics queue, and one to transition it from TRANSFER_DST_OPTIMAL to SHADER_READ_ONLY_OPTIMAL. I think my barriers are what's incorrect, as this is the error I get; oddly, I still see the correct rendered output, as I'm on Nvidia hardware. Is there any synchronization missing?
UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout(ERROR / SPEC): msgNum: 1303270965 -
Validation Error: [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object 0:
handle = 0x562696461ca0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x4dae5635 |
Submitted command buffer expects VkImage 0x1c000000001c[] (subresource: aspectMask 0x1 array
layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL--instead,
current layout is VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL.
I believe what I'm doing wrong is not properly specifying the stageMasks:
VkImageMemoryBarrier tex_barrier = {0};

/* layout transition - UNDEFINED -> TRANSFER_DST */
tex_barrier.srcAccessMask = 0;
tex_barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
tex_barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
tex_barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
tex_barrier.srcQueueFamilyIndex = -1;
tex_barrier.dstQueueFamilyIndex = -1;
tex_barrier.subresourceRange = (VkImageSubresourceRange) { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(transfer_cmdbuffs[0],
                     VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     0,
                     0, NULL, 0, NULL, 1, &tex_barrier);

/* queue ownership transfer */
tex_barrier.srcAccessMask = 0;
tex_barrier.dstAccessMask = 0;
tex_barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
tex_barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
tex_barrier.srcQueueFamilyIndex = device.transfer_queue_family_index;
tex_barrier.dstQueueFamilyIndex = device.graphics_queue_family_index;

vkCmdPipelineBarrier(transfer_cmdbuffs[0],
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     0,
                     0, NULL, 0, NULL, 1, &tex_barrier);

tex_barrier.srcAccessMask = 0;
tex_barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
tex_barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
tex_barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
tex_barrier.srcQueueFamilyIndex = device.transfer_queue_family_index;
tex_barrier.dstQueueFamilyIndex = device.graphics_queue_family_index;

vkCmdPipelineBarrier(transfer_cmdbuffs[0],
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                     0,
                     0, NULL, 0, NULL, 1, &tex_barrier);
Doing an ownership transfer is a 2-way process: the source of the transfer has to release the resource, and the receiver has to acquire it. And by "the source" and "the receiver", I mean the queues themselves. You can't merely have a queue take ownership of a resource; that queue must issue a command to claim ownership of it.
You need to submit a release barrier operation on the source queue. It must specify the source queue family as well as the destination queue family. Then, you have to submit an acquire barrier operation on the receiving queue, using the same source and destination. And you must ensure the order of these operations via a semaphore. So the vkQueueSubmit call for the acquire has to wait on the semaphore from the submission of the release operation (a timeline semaphore would work too).
Now, since these are pipeline/memory barriers, you are free to also specify a layout transition. You don't need a third barrier to change the layout, but both barriers have to specify the same source/destination layouts for the acquire/release operation.
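A minimal sketch of that release/acquire pair, assuming the queue family indices and transfer_cmdbuffs from the question, plus hypothetical graphics_cmdbuf and texture_image handles:
/* Release: recorded and submitted on the TRANSFER queue. The layout
 * transition is folded into the ownership transfer. */
VkImageMemoryBarrier release = {0};
release.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
release.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT; /* make the copy available */
release.dstAccessMask = 0;                            /* ignored on release */
release.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
release.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
release.srcQueueFamilyIndex = device.transfer_queue_family_index;
release.dstQueueFamilyIndex = device.graphics_queue_family_index;
release.image = texture_image; /* hypothetical image handle */
release.subresourceRange = (VkImageSubresourceRange) { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(transfer_cmdbuffs[0],
                     VK_PIPELINE_STAGE_TRANSFER_BIT,
                     VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                     0,
                     0, NULL, 0, NULL, 1, &release);

/* Acquire: identical queue families and layouts, but recorded on a
 * GRAPHICS queue command buffer. The graphics vkQueueSubmit must wait
 * on a semaphore signaled by the transfer submit. */
VkImageMemoryBarrier acquire = release;
acquire.srcAccessMask = 0; /* ignored on acquire */
acquire.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

vkCmdPipelineBarrier(graphics_cmdbuf, /* hypothetical graphics command buffer */
                     VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                     0,
                     0, NULL, 0, NULL, 1, &acquire);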

Windows Filtering Platform: ClassifyFn BSOD at DISPATCH_LEVEL

I'm trying to implement a simple firewall which filters network connections made by Windows processes.
The firewall should either allow or block each connection.
In order to intercept connections from any process, I created a kernel driver which makes use of the Windows Filtering Platform.
I registered a ClassifyFn (FWPS_CALLOUT_CLASSIFY_FN1) callback at the filtering layer FWPM_LAYER_ALE_AUTH_CONNECT_V4:
FWPM_CALLOUT m_callout = { 0 };
m_callout.applicableLayer = FWPM_LAYER_ALE_AUTH_CONNECT_V4;
...
status = FwpmCalloutAdd(filter_engine_handle, &m_callout, NULL, NULL);
The decision to allow or block the connection should be made at user level.
I communicate with user level using FltSendMessage, which cannot be used at IRQL DISPATCH_LEVEL.
Following the Microsoft documentation on how to process callouts asynchronously, I call FwpsPendOperation0 before calling FltSendMessage.
After the call to FltSendMessage, I resume packet processing by calling FwpsCompleteOperation0.
The FwpsPendOperation0 documentation states that calling this function should make it possible to perform such calls at PASSIVE_LEVEL:
A callout can pend the current processing operation on a packet when the callout must perform processing on one of these layers that may take a long interval to complete or that should occur at IRQL = PASSIVE_LEVEL if the current IRQL > PASSIVE_LEVEL.
However, when the ClassifyFn callback is called at DISPATCH_LEVEL, I sometimes still get a BSOD on FltSendMessage (INVALID_PROCESS_ATTACH_ATTEMPT).
I don't understand what's wrong.
Thank you in advance for any advice which could point me in the right direction.
Here is the relevant code of the ClassifyFn callback:
/*************************
   ClassifyFn Function
**************************/
void example_classify(
    const FWPS_INCOMING_VALUES * inFixedValues,
    const FWPS_INCOMING_METADATA_VALUES * inMetaValues,
    void * layerData,
    const void * classifyContext,
    const FWPS_FILTER * filter,
    UINT64 flowContext,
    FWPS_CLASSIFY_OUT * classifyOut)
{
    NTSTATUS status;
    BOOLEAN bIsReauthorize = FALSE;
    BOOLEAN SafeToOpen = TRUE; // Value returned by user level which signals whether to allow/deny the packet
    UINT32 remote_address;
    UINT16 remote_port;

    classifyOut->actionType = FWP_ACTION_PERMIT;

    remote_address = inFixedValues->incomingValue[FWPS_FIELD_ALE_AUTH_CONNECT_V4_IP_REMOTE_ADDRESS].value.uint32;
    remote_port = inFixedValues->incomingValue[FWPS_FIELD_ALE_AUTH_CONNECT_V4_IP_REMOTE_PORT].value.uint16;

    bIsReauthorize = IsAleReauthorize(inFixedValues);
    if (!bIsReauthorize)
    {
        // First time receiving the packet (not a reauthorized packet):
        // communicate with user level asynchronously.
        HANDLE hCompletion;
        status = FwpsPendOperation0(inMetaValues->completionHandle, &hCompletion);

        //
        // FltSendMessage call here
        // ERROR HERE:
        // INVALID_PROCESS_ATTACH_ATTEMPT BSOD on the FltSendMessage call when at IRQL DISPATCH_LEVEL
        //

        FwpsCompleteOperation0(hCompletion, NULL);
    }

    if (!SafeToOpen) {
        // Packet blocked
        classifyOut->actionType = FWP_ACTION_BLOCK;
    }
    else {
        // Packet allowed
    }

    return;
}
You need to invoke FltSendMessage() from another thread running at PASSIVE_LEVEL. You can use IoQueueWorkItem(), or implement your own mechanism to process it on a system worker thread created via PsCreateSystemThread().
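A rough sketch of the work-item approach (the context struct, pool tag, and g_device_object are placeholders; the actual FltSendMessage arguments come from your driver):
typedef struct _CLASSIFY_WORK_CTX {
    PIO_WORKITEM WorkItem;
    HANDLE CompletionContext; // handle returned by FwpsPendOperation0
} CLASSIFY_WORK_CTX;

// Runs at PASSIVE_LEVEL on a system worker thread.
VOID classify_work_routine(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    CLASSIFY_WORK_CTX *ctx = (CLASSIFY_WORK_CTX *)Context;
    UNREFERENCED_PARAMETER(DeviceObject);

    // Safe here: we are no longer at DISPATCH_LEVEL.
    // FltSendMessage(...);

    FwpsCompleteOperation0(ctx->CompletionContext, NULL);
    IoFreeWorkItem(ctx->WorkItem);
    ExFreePoolWithTag(ctx, 'CfwW');
}

// In example_classify, after FwpsPendOperation0 succeeds, queue the work
// item instead of calling FltSendMessage inline:
CLASSIFY_WORK_CTX *ctx =
    (CLASSIFY_WORK_CTX *)ExAllocatePoolWithTag(NonPagedPool, sizeof(*ctx), 'CfwW');
if (ctx != NULL) {
    ctx->WorkItem = IoAllocateWorkItem(g_device_object); // your WDM device object
    ctx->CompletionContext = hCompletion;
    if (ctx->WorkItem != NULL) {
        IoQueueWorkItem(ctx->WorkItem, classify_work_routine, DelayedWorkQueue, ctx);
    }
}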

Windows WFP Driver: Getting BSOD when processing packets in ClassifyFn callback

I am trying to code a simple firewall application which can allow or block network connection attempts made by user-level processes.
To do so, following the WFPStarterKit tutorial, I created a WFP Driver which is set to intercept data at FWPM_LAYER_OUTBOUND_TRANSPORT_V4 layer.
The ClassifyFn callback function is responsible for intercepting the connection attempt and either allowing or denying it.
Once the ClassifyFn callback gets hit, the ProcessID of the packet is sent, along with a few other pieces of info, to a user-level process through the FltSendMessage function.
The user-level process receives the message, checks the ProcessID, and replies with a boolean allow/deny command to the driver.
While this approach works when blocking the first packet, in some cases (especially when allowing multiple packets) the code generates a BSOD with the INVALID_PROCESS_ATTACH_ATTEMPT error code.
The error is triggered at the call to FltSendMessage.
While I am still unable to pinpoint the exact problem, it seems that making the callout thread wait (through FltSendMessage) for a reply from user level can generate this BSOD under some conditions.
I would be very grateful if you could point me in the right direction.
Here is the function where I register the callout:
NTSTATUS register_example_callout(DEVICE_OBJECT * wdm_device)
{
    NTSTATUS status = STATUS_SUCCESS;
    FWPS_CALLOUT s_callout = { 0 };
    FWPM_CALLOUT m_callout = { 0 };
    FWPM_DISPLAY_DATA display_data = { 0 };

    if (filter_engine_handle == NULL)
        return STATUS_INVALID_HANDLE;

    display_data.name = EXAMPLE_CALLOUT_NAME;
    display_data.description = EXAMPLE_CALLOUT_DESCRIPTION;

    // Register a new callout with the filter engine using the provided callout functions
    s_callout.calloutKey = EXAMPLE_CALLOUT_GUID;
    s_callout.classifyFn = example_classify;
    s_callout.notifyFn = example_notify;
    s_callout.flowDeleteFn = example_flow_delete;

    status = FwpsCalloutRegister((void *)wdm_device, &s_callout, &example_callout_id);
    if (!NT_SUCCESS(status)) {
        DbgPrint("Failed to register callout functions for example callout, status 0x%08x", status);
        goto Exit;
    }

    // Set up a FWPM_CALLOUT structure to store/track the state associated with the FWPS_CALLOUT
    m_callout.calloutKey = EXAMPLE_CALLOUT_GUID;
    m_callout.displayData = display_data;
    m_callout.applicableLayer = FWPM_LAYER_OUTBOUND_TRANSPORT_V4;
    m_callout.flags = 0;

    status = FwpmCalloutAdd(filter_engine_handle, &m_callout, NULL, NULL);
    if (!NT_SUCCESS(status)) {
        DbgPrint("Failed to register example callout, status 0x%08x", status);
    }
    else {
        DbgPrint("Example Callout Registered");
    }

Exit:
    return status;
}
Here is the callout function:
/*************************
   ClassifyFn Function
**************************/
void example_classify(
    const FWPS_INCOMING_VALUES * inFixedValues,
    const FWPS_INCOMING_METADATA_VALUES * inMetaValues,
    void * layerData,
    const void * classifyContext,
    const FWPS_FILTER * filter,
    UINT64 flowContext,
    FWPS_CLASSIFY_OUT * classifyOut)
{
    UNREFERENCED_PARAMETER(layerData);
    UNREFERENCED_PARAMETER(classifyContext);
    UNREFERENCED_PARAMETER(flowContext);
    UNREFERENCED_PARAMETER(filter);
    UNREFERENCED_PARAMETER(inMetaValues);

    NETWORK_ACCESS_QUERY AccessQuery;
    BOOLEAN SafeToOpen = TRUE;

    classifyOut->actionType = FWP_ACTION_PERMIT;

    AccessQuery.remote_address = inFixedValues->incomingValue[FWPS_FIELD_OUTBOUND_TRANSPORT_V4_IP_REMOTE_ADDRESS].value.uint32;
    AccessQuery.remote_port = inFixedValues->incomingValue[FWPS_FIELD_OUTBOUND_TRANSPORT_V4_IP_REMOTE_PORT].value.uint16;

    // Get Process ID
    AccessQuery.ProcessId = (UINT64)PsGetCurrentProcessId();
    if (!AccessQuery.ProcessId)
    {
        return;
    }

    // Here we connect to our user-level application using FltSendMessage.
    // Some checks are done and the SafeToOpen variable is populated with a BOOLEAN which indicates whether to allow or block the packet.
    // However, sometimes a BSOD is generated with an INVALID_PROCESS_ATTACH_ATTEMPT error on the FltSendMessage call.
    QueryUserLevel(QUERY_NETWORK, &AccessQuery, sizeof(NETWORK_ACCESS_QUERY), &SafeToOpen, NULL, 0);

    if (!SafeToOpen) {
        classifyOut->actionType = FWP_ACTION_BLOCK;
    }

    return;
}
WFP drivers communicate with user-mode applications using the inverted call model. In this method, you keep an IRP from the user-mode application pending at your kernel-mode driver, and whenever you want to send data back to user mode you complete the IRP along with the data you want to send back.
The problem was that the ClassifyFn callback function can sometimes be called at IRQL DISPATCH_LEVEL.
FltSendMessage does not support DISPATCH_LEVEL, as it can only be run at IRQL <= APC_LEVEL.
Running it at DISPATCH_LEVEL can cause this function to generate a BSOD.
I solved the problem by invoking FltSendMessage from a worker thread which runs at IRQL PASSIVE_LEVEL.
The worker thread can be created using IoQueueWorkItem.
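For reference, a bare-bones sketch of the inverted call model mentioned above (single pending IRP and buffered I/O for brevity; a production driver would use a cancel-safe queue such as IoCsq):
static PIRP g_pending_irp; // hypothetical single-slot store

// IRP_MJ_DEVICE_CONTROL handler: park the user-mode request here.
NTSTATUS dispatch_device_control(PDEVICE_OBJECT dev, PIRP irp)
{
    UNREFERENCED_PARAMETER(dev);
    IoMarkIrpPending(irp);
    g_pending_irp = irp; // the user-mode app is now waiting on this request
    return STATUS_PENDING;
}

// Called when a verdict is needed: hand the query to user mode by
// completing the parked IRP with the data.
void complete_pending_query(const NETWORK_ACCESS_QUERY *query)
{
    PIRP irp = g_pending_irp;
    if (irp == NULL)
        return;
    g_pending_irp = NULL;
    RtlCopyMemory(irp->AssociatedIrp.SystemBuffer, query, sizeof(*query));
    irp->IoStatus.Status = STATUS_SUCCESS;
    irp->IoStatus.Information = sizeof(*query);
    IoCompleteRequest(irp, IO_NO_INCREMENT); // wakes the user-mode app
}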

Suspend/Resume all user processes - Is that possible?

I have PCs with a lot of applications running at once, and I was wondering: is it possible to SUSPEND all applications? I want to do that so I can periodically run one other application that uses the CPU heavily, and give it all the processor time.
The idea is to suspend all applications, run my CPU-heavy thing, and then, when my thingy exits, resume all applications so everything carries on fine....
Any comments are welcome.
It's possible, but not recommended at all.
Instead, set the process and thread priority so your application will be given a larger slice of the CPU.
This also means it won't kill the desktop, any network connections, antivirus, the start menu, the window manager, etc., as your method would.
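For example, a minimal sketch using the documented Win32 priority APIs (HIGH_PRIORITY_CLASS is usually enough; REALTIME_PRIORITY_CLASS can starve the rest of the system):
#include <windows.h>

int main(void)
{
    // Raise this process's scheduling priority instead of suspending
    // everything else.
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);

    // ... run the CPU-heavy work here ...
    return 0;
}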
You could keep a manually curated list of programs that are too demanding (say, for (bad) example, Steam.exe, chrome.exe, 90GB-video-game.exe, etc.). Basically, you get the entire list of running processes, search that list for all of the blacklisted names, and call NtSuspendProcess (and NtResumeProcess, should you need to allow a program to run again in the future).
I don't believe suspending all user processes is a good idea. Many of them are weirdly protected and probably should remain running anyway; it's an uphill battle with very little to gain.
As mentioned in another answer, you can of course just raise your own process's priority if you have permission to do so. This sorts the OS-wide process list in favor of your process, so you get CPU time first.
Here's an example of something similar to your original request. I'm writing a program in C++ that needed this exact feature, so I figured I'd help out. This will find Steam.exe or chrome.exe and suspend the first one it finds for 10 seconds... then it will resume it. The process will show as "Not Responding" on Windows if you try to interact with its window while it's suspended. Some applications may not like being suspended; YMMV.
/* Find, suspend, resume Win32 C++
 * Written by jimmio92. No rights reserved. Public domain.
 * NO WARRANTY! NO LIABILITY! (obviously)
 */
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

typedef LONG (NTAPI *NtSuspendProcess)(IN HANDLE ProcessHandle);
typedef LONG (NTAPI *NtResumeProcess)(IN HANDLE ProcessHandle);

NtSuspendProcess dSuspendProcess = nullptr;
NtResumeProcess dResumeProcess = nullptr;

int get_the_pid() {
    DWORD procs[4096], bytes;
    int out = -1;

    if(!EnumProcesses(procs, sizeof(procs), &bytes)) {
        return -1;
    }

    for(size_t i = 0; i < bytes/sizeof(DWORD); ++i) {
        TCHAR name[MAX_PATH] = "";
        HMODULE mod;
        HANDLE p = nullptr;
        bool found = false;

        p = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, procs[i]);
        if(p == nullptr)
            continue;

        DWORD unused_bytes_for_all_modules = 0;
        if(EnumProcessModules(p, &mod, sizeof(mod), &unused_bytes_for_all_modules)) {
            GetModuleBaseName(p, mod, name, sizeof(name));
            //change this to use an array of names or whatever fits your need better
            if(strcmp(name, "Steam.exe") == 0 || strcmp(name, "chrome.exe") == 0) {
                out = procs[i];
                found = true;
            }
        }
        CloseHandle(p);
        if(found) break;
    }
    return out;
}

void suspend_process_by_id(int pid) {
    HANDLE h = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if(h == nullptr)
        return;
    dSuspendProcess(h);
    CloseHandle(h);
}

void resume_process_by_id(int pid) {
    HANDLE h = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if(h == nullptr)
        return;
    dResumeProcess(h);
    CloseHandle(h);
}

void init() {
    //load NtSuspendProcess/NtResumeProcess from ntdll.dll
    HMODULE ntmod = GetModuleHandle("ntdll");
    dSuspendProcess = (NtSuspendProcess)GetProcAddress(ntmod, "NtSuspendProcess");
    dResumeProcess = (NtResumeProcess)GetProcAddress(ntmod, "NtResumeProcess");
}

int main() {
    init();

    int pid = get_the_pid();
    if(pid < 0) {
        printf("Steam.exe and chrome.exe not found");
        return 1; // nothing to suspend; bail out instead of passing -1 on
    }

    suspend_process_by_id(pid);
    //wait ten seconds for demonstration purposes
    Sleep(10000);
    resume_process_by_id(pid);
    return 0;
}

select() times out immediately after long runtime (C++)

Most of the time this code works just fine. But sometimes, after the executable has been running for a while, select() appears to time out immediately and then gets into a weird state where it keeps getting called, timing out immediately, over and over. It then has to be killed from the outside.
My guess would be that the way standard input changes over time is at fault, since that is what select() is blocking on.
Looking around on StackOverflow, most people's select() troubles seem to be solved by making sure to reset with the macros (FD_ZERO & FD_SET) every time, and by using the right initial parameter to select(). I don't think those are the issues here.
int rc = 0;
fd_set fdset;
struct timeval timeout;

// -- clear out the response -- //
readValue = "";

// -- set the timeout -- //
timeout.tv_sec = passedInTimeout; // 5 seconds
timeout.tv_usec = 0;

// -- indicate which file descriptors to select from -- //
FD_ZERO(&fdset);
FD_SET(passedInFileDescriptor, &fdset); //passedInFileDescriptor = 0

// -- perform the selection operation, with timeout -- //
rc = select(1, &fdset, NULL, NULL, &timeout);
if (rc == -1) // -- select failed -- //
{
    result = TR_ERROR;
}
else if (rc == 0) // -- select timed out -- //
{
    result = TR_TIMEDOUT;
}
else
{
    if (FD_ISSET(mFileDescriptor, &fdset))
    {
        if(rc = readData(readValue) <= 0)
        {
            result = TR_ERROR;
        }
    } else {
        result = TR_SUCCESS;
    }
}
Beware that some implementations of select apply the specification strictly:
"nfds is the highest-numbered file descriptor in any of the three sets, plus 1".
So you'd better change the first parameter from 1 to passedInFileDescriptor + 1.
I don't know if this can solve your problem, but at least your code becomes more... uhm... "traditional" ;)
Bye
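In code, the suggested change is just this (using the variables from your snippet):
// nfds must be one greater than the highest fd in any of the sets
rc = select(passedInFileDescriptor + 1, &fdset, NULL, NULL, &timeout);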
On some OSes, timeout is modified when calling select to reflect the amount of time not slept. It doesn't look like you're reusing timeout in your example, but make sure that you are indeed reinitializing it to 5 seconds every time before calling select.
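A sketch of how that looks, reusing passedInTimeout and passedInFileDescriptor from the question:
for (;;) {
    fd_set fdset;
    struct timeval timeout;

    // Rebuild both the fd_set and the timeout on every iteration:
    // Linux's select() decrements 'timeout' in place.
    FD_ZERO(&fdset);
    FD_SET(passedInFileDescriptor, &fdset);
    timeout.tv_sec = passedInTimeout; // reset to the full 5 seconds each pass
    timeout.tv_usec = 0;

    int rc = select(passedInFileDescriptor + 1, &fdset, NULL, NULL, &timeout);
    if (rc != 0)
        break; // readable or error; handle outside this sketch
}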
I'm having the same problem: it works fine on Windows but not on Linux, and I do have maxfd set to the last socket + 1. It occurs periodically after long runs. I pick up the connection on accept(), and then the first call to select() periodically times out.
Look at this code:
if (FD_ISSET(mFileDescriptor, &fdset))
{
    if(rc = readData(readValue) <= 0)
    {
        result = TR_ERROR;
    }
} else {
    result = TR_SUCCESS;
}
There are two things bothering me here:
- if your FD has no data in it (like, say, an error occurred), FD_ISSET() will return false and your function returns TR_SUCCESS !?
- you FD_SET(passedInFileDescriptor, &fdset), but then test a different value: FD_ISSET(mFileDescriptor, &fdset). If mFileDescriptor != passedInFileDescriptor at some point, you'll fall into my first assumption.
It should look like this:
if (FD_ISSET(passedInFileDescriptor, &fdset))
{
    if((rc = readData(readValue)) <= 0) // note the added parentheses: '<=' binds tighter than '='
    {
        result = TR_ERROR;
    }
    else
    {
        result = TR_SUCCESS;
    }
}
else
{
    result = TR_ERROR;
}
No?
(Edit: also, this answer points out the problem of your use of select() with a bad high_fd value.)
Another edit: well, looks like the guys never came back... frustrating.
