I am trying to figure out how to write an integer value to the end of my file. The value is size.
DWORD size = 12314432;
BOOL ret = WriteFile(hFile, size, sizeof(DWORD), NULL, NULL);
However, WriteFile() requires that its second parameter (the buffer) be of type LPCVOID, so I am not sure how to give it the DWORD instead.
I have tried:
unsigned char b[sizeof(DWORD)] = {0};
sprintf(b, "%d", size);
WriteFile(hFile, b, sizeof(DWORD), NULL, NULL);
However, this just writes the ASCII code of each digit. So if size=1234, it writes "31 32 33 34" to the end of the file.
I would like the number itself to be written to the end of the file as 4 raw bytes.
You provide the address of the DWORD like this:
DWORD size = 12314432;
DWORD written = 0; /* lpNumberOfBytesWritten may be NULL only when lpOverlapped is not NULL */
BOOL ret = WriteFile(hFile, &size, sizeof size, &written, NULL);
Use the ampersand to get the address of the DWORD variable.
Express your intent for the length: you want to write all of size, so say so with sizeof size.
You can also pass (void*)&size. You do need something that has an address, like the variable size. You can't take the address of an arbitrary expression: (void*)&(5*7+3) won't compile.
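To make the difference between the two attempts concrete, here is a minimal sketch. It assumes standard C stdio (fwrite on a FILE*) instead of WriteFile so that it builds anywhere; the function names append_u32_raw, read_last_u32 and format_u32_text are made up for illustration. The point is the same as above: pass the address of the integer and its size, rather than formatting it as text.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Append the 4 raw bytes of value (in host byte order) to the end of fp.
 * Returns 1 on success, 0 on failure. */
int append_u32_raw(FILE *fp, uint32_t value)
{
    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_END) != 0)
        return 0;
    /* &value is the address of the integer, just like &size above. */
    return fwrite(&value, sizeof value, 1, fp) == 1;
}

/* Read the last 4 bytes of fp back into *out. Returns 1 on success. */
int read_last_u32(FILE *fp, uint32_t *out)
{
    assert(fp != NULL && out != NULL);
    if (fseek(fp, -(long)sizeof *out, SEEK_END) != 0)
        return 0;
    return fread(out, sizeof *out, 1, fp) == 1;
}

/* Format value as decimal text, which is what the sprintf attempt stored.
 * Returns the number of characters produced (excluding the terminator). */
int format_u32_text(uint32_t value, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRIu32, value);
}
```

With the text version, 1234 becomes the four characters '1' '2' '3' '4' (bytes 31 32 33 34), and larger numbers need more than four bytes. The raw version always stores exactly sizeof(uint32_t) bytes.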
I'd like to create a map that stores just one element (a port number) and that can be read and written from both user space and kernel space.
Which map type should I use? Which size for key and value is appropriate and how can I write/read from both sides?
_user.c
/* create array map with one element */
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value), 1, 0);
...
/* update map */
ret = bpf_map_update_elem(map_fd, &key, &i, BPF_ANY);
_kern.c
How can I refer to map_fd and operate on the same map?
EDIT:
I could successfully create and interact with the map in only one way:
defining the map within the _kern.c file, as follows:
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(uint32_t),
.value_size = sizeof(uint32_t),
.max_entries = 1,
};
That definition lets the BPF program operate directly on the map using BPF helpers like bpf_map_lookup_elem.
Within _user.c, after loading the _kern.o eBPF program into the kernel through bpf_prog_load, I used
map_fd = bpf_object__find_map_fd_by_name(obj, "my_map");
to retrieve the file descriptor associated with the map (this is the point I was missing). Once you have the file descriptor, you can perform, for instance, a map update by calling
ret = bpf_map_update_elem(map_fd, &key, &value, BPF_ANY);
QUESTION: in this case I retrieve the fd from user space using libbpf, but if I create the map from _user.c with bpf_create_map, how can I retrieve the fd from the eBPF program?
If you know you have just one element, the easiest is probably to go for an array map. In that case, you can trivially access your map entry by its index in the array: 0.
If you were to do that, the size of the key would be the size of a 4-byte integer (sizeof(uint32_t)), which is always used for array indices. The size of the value would be the size you need to store your port number: very likely sizeof(uint16_t).
Then you can read and write from your BPF program by calling the relevant BPF helper functions: bpf_map_lookup_elem() or bpf_map_update_elem() (see man page for details). They are usually defined in bpf_helpers.h, which is not usually installed on your system; you can find versions of it in the bcc or kernel repositories.
From user space, you would update the entry by using the bpf() system call, with its relevant commands: BPF_MAP_LOOKUP_ELEM and BPF_MAP_UPDATE_ELEM (see man page). But you don't necessarily have to re-implement the calls yourself: if you write a program, you should probably have a look at libbpf, which provides wrappers. If you want to update your map from the command line, this is easy to do with bpftool (see man page), something like bpftool map update <map_ref> key 0 0 0 0 value 0x37 0x13 (update) or bpftool map lookup <map_ref> key 0 0 0 0 (lookup).
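The key and value on the bpftool command line are given byte by byte, in the order the bytes sit in memory. As a rough sketch of where "0 0 0 0" and "0x37 0x13" come from, the helper below (format_bytes, a made-up name, not part of libbpf or bpftool) prints the in-memory bytes of any object. It assumes the 4-byte key and 2-byte value sizes suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render the in-memory bytes of an object as "0x.. 0x.." text, the
 * byte-list form used for key and value on the bpftool command line.
 * Returns 1 on success, 0 if out is too small. */
int format_bytes(const void *obj, size_t len, char *out, size_t outlen)
{
    const unsigned char *p = obj;
    size_t used = 0;

    assert(obj != NULL && out != NULL);
    if (outlen == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Separate bytes with a single space, no leading space. */
        int n = snprintf(out + used, outlen - used, "%s0x%02x",
                         i ? " " : "", (unsigned)p[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return 0;
        used += (size_t)n;
    }
    return 1;
}
```

For a uint32_t key of 0 this gives four zero bytes. For a uint16_t port of 0x1337 on a little-endian machine it gives 0x37 0x13, the low byte first, which matches the value in the update command above.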
I am having some trouble with the WriteFile function in C.
I open a handle to the volume T: with CreateFile:
HANDLE hvol = CreateFile("\\\\.\\T:", GENERIC_READ | GENERIC_WRITE, 3,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
Set the file pointer with SetFilePointer:
DWORD offset = SetFilePointer(hvol, bps + datarunOffset[0], NULL, FILE_BEGIN);
And write with:
bool wsuccess = WriteFile(hvol, filearray, secfilesize, &byteswritten, NULL);
Where filearray is an array of the byte values of a small file, and secfilesize is the number of elements in filearray rounded to a multiple of the disk's sector size (512).
If I have set the file pointer to 0x400 (the second sector of $Boot) then the array is written with no error codes.
If I have set the file pointer to 0x3200 (the second sector of $UpCase) then nothing is written and
error code 5: access denied
is returned.
I cannot find a solution, and I have full control over the volume. It seems odd that I can write to $Boot but not to $UpCase. Is there a reason for this, and can anyone offer a solution?
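For reference, the rounding of secfilesize described above can be sketched as follows. The helper names are hypothetical and 512 is the sector size stated in the question. Both offsets mentioned (0x400 and 0x3200) are multiples of 512, so alignment alone does not appear to explain the difference between the two writes.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of sector_size (512 in the question). */
size_t round_up_to_sector(size_t len, size_t sector_size)
{
    assert(sector_size != 0);
    size_t rem = len % sector_size;
    return rem == 0 ? len : len + (sector_size - rem);
}

/* Return 1 if offset lies on a sector boundary, 0 otherwise. */
int is_sector_aligned(unsigned long long offset, size_t sector_size)
{
    assert(sector_size != 0);
    return offset % sector_size == 0;
}
```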
I am working through exercises from Windows System Programming and I don't fully understand the LARGE_INTEGER and OVERLAPPED structures. For example, I have the following structures defined in main. The first structure keeps track of the number of records. The second holds the record data. The author defines and uses two OVERLAPPED structures to keep track of the record file offset.
typedef struct _HEADER {
DWORD numRecords;
DWORD numNonEmptyRecords;
} HEADER; /* 8bytes */
typedef struct _RECORD {
DWORD referenceCount;
SYSTEMTIME recordCreationTime;
SYSTEMTIME recordLastRefernceTime;
SYSTEMTIME recordUpdateTime;
TCHAR dataString[STRING_SIZE];
} RECORD; /* 308bytes */
LARGE_INTEGER currentPtr;
OVERLAPPED ov = {0, 0, 0, 0, NULL}, ovZero = {0, 0, 0, 0, NULL};
After the records are created, the user is prompted to read, write, or delete a record. The record number entered by the user is stored in recNo.
currentPtr.QuadPart = (LONGLONG)recNo * sizeof(RECORD) + sizeof(HEADER);
ov.Offset = currentPtr.LowPart;
ov.OffsetHigh = currentPtr.HighPart;
Can someone please explain how the values for the LARGE_INTEGER currentPtr are calculated? What is a union? I have looked at the example in WinDbg and I don't understand how currentPtr.LowPart and currentPtr.HighPart are calculated. Below is an example of a file read operation being called with the OVERLAPPED structure.
ReadFile (hFile, &record, sizeof (RECORD), &nXfer, &ov)
A union gives different names and types to the same location in memory. So if a LARGE_INTEGER union were stored at location 0x1000, then, since x86 is little-endian:
LARGE_INTEGER.QuadPart is 64 bit integer at 0x1000
LARGE_INTEGER.LowPart is the lower 32 bits of the 64 bit integer at 0x1000.
LARGE_INTEGER.HighPart is the upper 32 bits of the 64 bit integer at 0x1004.
OVERLAPPED is used for asynchronous I/O. A read or write call in overlapped mode will return immediately, and the event specified in the OVERLAPPED structure will be signaled when the I/O completes.
MSDN article for OVERLAPPED structure:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684342(v=vs.85).aspx
In 32-bit mode, Offset shares memory with Pointer; in 64-bit mode, Offset and OffsetHigh together share memory with Pointer. Offset and OffsetHigh are inputs, while Pointer is used internally. InternalHigh is poorly named, since it now reports the number of bytes transferred, and it may change yet again. Internal is now a status.
I am trying to learn about clEnqueueMapBuffer in OpenCL by writing a kernel which finds the square of values in an input buffer, but only returns two items at a time in the output buffer using clEnqueueMapBuffer. As I understand it, this function returns a pointer in the host's memory which points to the buffer memory on the device. Then clEnqueueUnmapMemObject must unmap this buffer to allow the kernels to continue their computations. Now, when I call clEnqueueMapBuffer, it returns random data.
Here is my kernel
__kernel void testStream(
__global int *input_vector,
__global int *output_vector,
__global int *mem_flag) // informs the host when the workload is finished
{
mem_flag[0] = 1;
}
and my source
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <CL/opencl.h>
#include "utils.h"
int main(void)
{
unsigned int n = 24;
int BUFF_SIZE = 2;
// Input and output vectors
int num_bytes = sizeof(int) * n;
int *output_buffer = (int *) malloc(num_bytes);
int output_buffer_offset = 0;
int *mapped_data = NULL;
// use mapped_flag for determining if the job on the device is finished
int *mapped_flag = NULL;
int *host_in = (int *) malloc(num_bytes);
int *host_out = (int *) malloc(num_bytes);
// Declare cl variables
cl_mem device_in;
cl_mem device_out;
cl_mem device_out_flag;
// Declare cl boilerplate
cl_platform_id platform = NULL;
cl_device_id device = NULL;
cl_command_queue queue = NULL;
cl_context context = NULL;
cl_program program = NULL;
cl_kernel kernel = NULL;
// Located in utils.c -- the source is irrelevant here
char *kernel_source = read_kernel("kernels/test.cl");
// Initialize host_in
int i;
for (i = 0; i < n; i++) {
host_in[i] = i + 1;
}
// Set up opencl
cl_int error;
error = clGetPlatformIDs(1, &platform, NULL);
printf("clGetPlatformIDs: %d\n", (int) error);
error = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
printf("clGetDeviceIDs: %d\n", (int) error);
context = clCreateContext(NULL, 1, &device, NULL, NULL, &error);
printf("clCreateContext: %d\n", (int) error);
queue = clCreateCommandQueue(context, device, 0, &error);
printf("clCreateCommandQueue: %d\n", (int) error);
program = clCreateProgramWithSource(context, 1,
(const char**)&kernel_source, NULL, &error);
printf("clCreateProgramWithSource: %d\n", (int) error);
clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
kernel = clCreateKernel(program, "testStream", &error);
printf("clCreateKernel: %d\n", (int) error);
// Create the buffers
device_in = clCreateBuffer(context, CL_MEM_READ_ONLY,
num_bytes, NULL, NULL);
device_out = clCreateBuffer(context,
CL_MEM_WRITE_ONLY,
sizeof(int) * BUFF_SIZE, NULL, NULL);
device_out_flag = clCreateBuffer(context,
CL_MEM_WRITE_ONLY,
sizeof(int) * 2, NULL, NULL);
// Write the input buffer
clEnqueueWriteBuffer(
queue, device_in, CL_FALSE, 0, num_bytes,
host_in, 0, NULL, NULL);
// Set the kernel arguments
error = clSetKernelArg(kernel, 0, sizeof(cl_mem), &device_in);
error = clSetKernelArg(kernel, 1, sizeof(cl_mem), &device_out);
error = clSetKernelArg(kernel, 2, sizeof(cl_mem), &device_out_flag);
// Execute the kernel over the entire range of data
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
(const size_t *) &n, NULL, 0, NULL, NULL);
// Map and unmap until the flag is set to true
int break_flag = 0;
while(1) {
// Map the buffers
mapped_data = (int *) clEnqueueMapBuffer(
queue, device_out, CL_TRUE, CL_MAP_READ, 0,
sizeof(int) * BUFF_SIZE, 0, NULL, NULL, &error);
mapped_flag = (int *) clEnqueueMapBuffer(
queue, device_out_flag, CL_TRUE, CL_MAP_READ, 0,
sizeof(int) , 0, NULL,NULL, &error);
// Extract the data out of the buffer
printf("mapped_flag[0] = %d\n", mapped_flag[0]);
// Set the break_flag
break_flag = mapped_flag[0];
// Unmap the buffers
error = clEnqueueUnmapMemObject(queue, device_out, mapped_data, 0,
NULL, NULL);
error = clEnqueueUnmapMemObject(queue, device_out_flag, mapped_flag,
0, NULL, NULL);
if (break_flag == 1) {break;}
usleep(1000*1000);
}
return 0;
}
When I run the program, I get output similar to
clGetPlatformIDs: 0
clGetDeviceIDs: 0
clCreateContext: 0
clCreateCommandQueue: 0
clCreateProgramWithSource: 0
clCreateKernel: 0
mapped_flag[0] = 45366144
mapped_flag[0] = 45366144
mapped_flag[0] = 45366144
mapped_flag[0] = 45366144
mapped_flag[0] = 45366144
Why is this happening?
Edit
I am running this code on an HP dm1z with fedora 19 64-bit on the kernel 3.13.7-100.fc19.x86_64. Here is the output from clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 AMD-APP (1214.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Board name: AMD Radeon HD 6310 Graphics
Device Topology: PCI[ B#0, D#1, F#0 ]
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 492Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 201326592
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00007fd434852fc0
Name: Loveland
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 1214.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1214.3)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only
Also, it may be worth noting that when I began playing with OpenCL, I ran a test program to calculate the inner product, but that gave weird results. Initially I thought it was an error in the program and forgot about it, but is it possible that the OpenCL implementation is faulty? If it helps, the OpenGL implementation has multiple errors, causing blocks of random data to show up on my desktop background, but this could also be a Linux problem.
You are passing NULL as the global work size to your clEnqueueNDRangeKernel call:
// Execute the kernel over the entire range of data
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, NULL, NULL, 0, NULL, NULL);
If you were checking the error code returned by this call (which you always should), you would get the error code corresponding to CL_INVALID_GLOBAL_WORK_SIZE back. You always need to specify a global work size, so your call should look something like this:
// Execute the kernel over the entire range of data
size_t global[1] = {1};
error = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global, NULL, 0, NULL, NULL);
// check error == CL_SUCCESS!
Your calls to map and unmap buffers are fine; I've tested this code with the above fix and it works for me.
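Since checking every return code by hand gets tedious, a small helper can make it less painful. This is only a sketch: cl_error_name and cl_check are names I made up, and the numeric values are copied from CL/cl.h as literals so the snippet compiles without the OpenCL headers. In real code you would use the CL_* macros and cl_int instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a few OpenCL status codes to their names. The values are the ones
 * defined in CL/cl.h, written as literals (an assumption of this sketch). */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Report a failed call on stderr. Returns 1 if code is CL_SUCCESS, else 0. */
int cl_check(int code, const char *what)
{
    assert(what != NULL);
    if (code == 0)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", what, cl_error_name(code), code);
    return 0;
}
```

You would then wrap each call, for example passing the value returned by clEnqueueNDRangeKernel to cl_check together with the string "clEnqueueNDRangeKernel", and stop as soon as it returns 0.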
Your updated code to fix the above problem looks like this:
unsigned int n = 24;
...
// Execute the kernel over the entire range of data
clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
(const size_t *) &n, NULL, 0, NULL, NULL);
This is not a safe way of passing the global work size parameter to the kernel. As an example, the unsigned int n variable might occupy 32 bits, whereas a size_t could be 64 bits. This means that when you pass the address of n and cast to a const size_t*, the implementation will read a 64-bit value, which will encompass the 32 bits of n plus 32 other bits that have some arbitrary value. You should either assign n to a size_t variable before passing it to clEnqueueNDRangeKernel, or just change it to be a size_t itself.
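To illustrate the size mismatch without invoking undefined behaviour, the sketch below simulates what the implementation sees. misread_as_size_t (a hypothetical name) builds a size_t from the bytes of n followed by filler bytes standing in for whatever happens to live next to n in memory, while widen_to_size_t does the conversion properly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The safe conversion: the value is widened, not reinterpreted. */
size_t widen_to_size_t(unsigned int n)
{
    return (size_t)n;
}

/* Simulate reading sizeof(size_t) bytes starting at the address of n.
 * The bytes beyond n are set to filler, standing in for the unrelated
 * data that would follow n in memory. */
size_t misread_as_size_t(unsigned int n, unsigned char filler)
{
    unsigned char raw[sizeof(size_t)];
    size_t copy = sizeof n < sizeof raw ? sizeof n : sizeof raw;
    size_t result;

    memset(raw, filler, sizeof raw);     /* neighbouring memory */
    memcpy(raw, &n, copy);               /* the bytes of n itself */
    memcpy(&result, raw, sizeof result); /* what a size_t read would see */
    return result;
}
```

On a platform where size_t is wider than unsigned int, misread_as_size_t(24, 0xAB) is no longer 24, whereas widen_to_size_t(24) always is. In the host code the fix amounts to declaring a size_t variable, assigning n to it, and passing its address.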
This may or may not be related to the problems you are having. You could be accidentally launching a huge number of work-items for example, which might explain why the code appears to block on the CPU.
Here are a few remarks that come to mind:
I have the feeling that you think calling clEnqueueMapBuffer will interrupt the execution of the kernel. I don't think this is correct. Once a command is launched for execution, it runs until it has completed (or failed). It is possible for several commands to be launched concurrently, but trying to read data while a kernel is still processing it results in undefined behavior. Besides, the way you create your command queue wouldn't let you run several commands at the same time. You need to pass the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property when creating the queue to allow that.
I don't know which global size you call your kernel with after @jprice's post, but unless you execute the kernel with only one work-item you'll have trouble with this kind of statement: mem_flag[0] = 1; since all the work-items will write to the same location. (I'm guessing that you posted only a portion of your kernel. Check whether you have other statements like that. It would actually be useful if you posted the entire kernel code.)
Since you always map and unmap the same portion of the buffers and always check the first element (of mapped_flag), and since the kernel has already completed its computation at that moment (see the first point), it is at least expected that you always read the same value.