How to execute process memory? - c

I want to inject my code into another process and then execute the injected code.
I can inject the code like this:
#include <Windows.h>
#define t_process_name example
BYTE code[7] = { 0x6A, 0x00 , 0, 0, 0, 0, 0};
/*
0:push 0
2:jmp 521595
*/
int main(){
HWND hWnd = NULL;
DWORD pid = NULL;
HANDLE hProcess = NULL;
BYTE* space = NULL;
BYTE* dst = 0x521595;
int i = 0;
hWnd = FindWindowA(NULL, t_process_name);
GetWindowThreadProcessId(hWnd, &pid);
hProcess = OpenProcess(PROCESS_ALL_ACCESS, NULL, pid);
space = VirtualAllocEx(hProcess, NULL, sizeof(code), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
code[2] = 0xe9;
*(DWORD*)(code + 3) = (DWORD)(dst - (space+2)) - 5;
WriteProcessMemory(hProcess, space, code, sizeof(code), NULL);
// and...
}
But I don't know how to execute the injected code (push 0, jmp 521595).
Is there an API or some other way to do this?

If you know what you want to patch, try diablo2oo2's Universal Patcher (Windows only). It can create a loader/patcher for your application. That said, you're better off editing the source code directly.
Edit: obviously this doesn't inject into other processes, but if I may ask, why would you do that?

If you know the exact assembly that needs to be executed, you can do that with inline assembly (the asm keyword):
http://www.codeproject.com/Articles/15971/Using-Inline-Assembly-in-C-C

Related

Inconsistent behavior with Win32 Console API between console emulator

Can someone tell me what I am doing wrong in the following snippet?
The problem is that this snippet, which is supposed to print "Hello, World!" in color, works exactly as expected when run in cmder with cmd.exe as the shell, but is completely broken when run in the native cmd.exe terminal emulator or the native PowerShell emulator.
Specifically, the background color is changed, and the reset doesn't work.
#include <windows.h>
#include <io.h>
#include <conio.h>
#include <fileapi.h>
#include <assert.h>
#include <stdio.h>
typedef struct {
HANDLE h;
HANDLE hin;
WORD savedAttr;
} Terminal;
#define FOREGROUND_RGB (FOREGROUND_RED | FOREGROUND_BLUE | FOREGROUND_GREEN)
#define BACKGROUND_RGB (BACKGROUND_RED | BACKGROUND_BLUE | BACKGROUND_GREEN)
#define TRY_OS(expr) if ((expr) == 0) { osError(); }
#define TRY_OR(expr) if ((expr) == 0)
static WORD get_text_attributes(Terminal term)
{
CONSOLE_SCREEN_BUFFER_INFO c;
TRY_OR(GetConsoleScreenBufferInfo(term.h, &c)) {
return 0x70;
}
return c.wAttributes;
}
Terminal getHandle(FILE* in, FILE* out)
{
Terminal term;
if ((in == stdin) && (out == stdout)) {
// grab the current console even if stdin/stdout have been redirected
term.h =
CreateFile("CON", GENERIC_WRITE, FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
0, 0);
term.hin =
CreateFile("CON", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0,
0);
// term.h = GetStdHandle(STD_OUTPUT_HANDLE);
// term.hin = GetStdHandle(STD_INPUT_HANDLE);
} else {
term.h = (HANDLE) _get_osfhandle(_fileno(out));
term.hin = (HANDLE) _get_osfhandle(_fileno(in));
}
term.savedAttr = get_text_attributes(term);
return term;
}
void setFG(Terminal term, int color)
{
WORD attr = get_text_attributes(term);
attr &= ~FOREGROUND_RGB; // clear FG color
attr &= ~FOREGROUND_INTENSITY; // clear FG intensity
attr |= color;
SetConsoleTextAttribute(term.h, attr);
}
void setBG(Terminal term, int color)
{
WORD attr = get_text_attributes(term);
attr &= ~BACKGROUND_RGB; // clear BG color
attr &= ~BACKGROUND_INTENSITY; // clear BG intensity
attr |= color;
SetConsoleTextAttribute(term.h, attr);
}
int main()
{
Terminal term = getHandle(stdin, stdout);
setFG(term, FOREGROUND_RED);
WriteConsole(term.h, "Hello, ", 7, NULL, NULL);
setFG(term, FOREGROUND_BLUE);
WriteConsole(term.h, "World !\r\n", 9, NULL, NULL);
// reset style
SetConsoleTextAttribute(term.h, term.savedAttr);
return 0;
}
I guess I am doing something wrong in my use of the API and cmder is more lenient about it, but I wrote all of this code by reading the official Microsoft docs, so I am a bit confused.
I suggest you try using the Multi-Byte Character Set instead of the Unicode Character Set.
Or you could use wide-character literals, like:
term.h =
CreateFile(L"CON", GENERIC_WRITE, FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
0, 0);
term.hin =
CreateFile(L"CON", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0,
.
WriteConsole(term.h, L"Hello, ", 7, NULL, NULL);
.
WriteConsole(term.h, L"World !\r\n", 9, NULL, NULL);
EDIT:
Results of running with cmd: (screenshot not preserved)
Results of running with Visual Studio: (screenshot not preserved)

When or why does clEnqueueNDRangeKernel return Null as event?

Hi, I am trying to execute example code from the book Seven Concurrency Models in Seven Weeks. The author uses a MacBook, while I am using a Dell XPS with Windows 10.
My program crashes because timing_event is still null after I call clEnqueueNDRangeKernel().
cl_event timing_event;
size_t work_units = NUM_ELEMENTS;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &work_units,
NULL, 0, NULL,&timing_event);
The docs state the following about the event parameter:
event
Returns an event object that identifies this particular kernel
execution instance. Event objects are unique and can be used to
identify a particular kernel execution instance later on. If event is
NULL, no event will be created for this kernel execution instance and
therefore it will not be possible for the application to query or
queue a wait for this particular kernel execution instance.
Can somebody explain why this happens on my Dell and not on the author's MacBook?
I found the solution. The problem did not come from clEnqueueNDRangeKernel(); it happened earlier, in clBuildProgram(program, 0, NULL, NULL, NULL, NULL);. I retrieved the build information with clGetProgramBuildInfo(). The problem was that my multiply_arrays.cl file wasn't UTF-8 encoded.
For everyone who is new to OpenCL: most OpenCL functions return a status integer that maps to a specific error code. If a function does not return a status code (for example, because it returns the created object instead), you can pass it a pointer to a status variable. See the example functions I linked below. This is very helpful for debugging your program.
Returns Status Code
Status Code by Reference
main.cpp
/***
* Excerpted from "Seven Concurrency Models in Seven Weeks",
* published by The Pragmatic Bookshelf.
* Copyrights apply to this code. It may not be used to create training material,
* courses, books, articles, and the like. Contact us if you are in doubt.
* We make no guarantees that this code is fit for any purpose.
* Visit http://www.pragmaticprogrammer.com/titles/pb7con for more book information.
***/
#ifdef __APPLE__
#include <OpenCL/cl.h>
#include <mach/mach_time.h>
#else
#include <CL/cl.h>
#include <Windows.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <inttypes.h>
#include <chrono>
#define NUM_ELEMENTS (100000)
char* read_source(const char* filename) {
FILE *h = fopen(filename, "r");
fseek(h, 0, SEEK_END);
size_t s = ftell(h);
rewind(h);
char* program = (char*)malloc(s + 1);
fread(program, sizeof(char), s, h);
program[s] = '\0';
fclose(h);
return program;
}
void random_fill(cl_float array[], size_t size) {
for (int i = 0; i < size; ++i)
array[i] = (cl_float)rand() / RAND_MAX;
}
int main() {
//Status for Errorhandling
cl_int status;
//Identify Platform
cl_platform_id platform;
clGetPlatformIDs(1, &platform, NULL);
//Get Id of GPU
cl_device_id device;
cl_uint num_devices = 0;
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &num_devices);
// Create Context
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
//Use context to create Command Queue
//Que enables us to send commands to the gpu device
cl_command_queue queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
//Load Kernel
char* source = read_source("multiply_arrays.cl");
cl_program program = clCreateProgramWithSource(context, 1,
(const char**)&source, NULL, &status);
free(source);
// Build Program
status = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
size_t len;
char *buffer;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
buffer = (char *) malloc(len);
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, len, buffer, NULL);
printf("%s\n", buffer);
//Create Kernel
cl_kernel kernel = clCreateKernel(program, "multiply_arrays", &status);
// Create Arrays with random Numbers
cl_float a[NUM_ELEMENTS], b[NUM_ELEMENTS];
random_fill(a, NUM_ELEMENTS);
random_fill(b, NUM_ELEMENTS);
//uint64_t startGPU = mach_absolute_time();
auto start = std::chrono::high_resolution_clock::now();
//Create Readonly input Buffers with value from a and b
cl_mem inputA = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
sizeof(cl_float) * NUM_ELEMENTS, a, NULL);
cl_mem inputB = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
sizeof(cl_float) * NUM_ELEMENTS, b, NULL);
//Create Output buffer write Only
cl_mem output = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
sizeof(cl_float) * NUM_ELEMENTS, NULL, NULL);
//set Kernel Arguments
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputA);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &inputB);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &output);
cl_event timing_event;
size_t work_units = NUM_ELEMENTS;
status = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &work_units,
NULL, 0, NULL,&timing_event);
cl_float results[NUM_ELEMENTS];
//Calculate Results and copy from output buffer to results
clEnqueueReadBuffer(queue, output, CL_TRUE, 0, sizeof(cl_float) * NUM_ELEMENTS,
results, 0, NULL, NULL);
//uint64_t endGPU = mach_absolute_time();
auto finish = std::chrono::high_resolution_clock::now();
//printf("Total (GPU): %lu ns\n\n", (unsigned long)(endGPU - startGPU));
std::cout << "Total(GPU) :"<< std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count() << "ns\n";
cl_ulong starttime;
clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_START,
sizeof(cl_ulong), &starttime, NULL);
cl_ulong endtime;
clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_END,
sizeof(cl_ulong), &endtime, NULL);
printf("Elapsed (GPU): %lu ns\n\n", (unsigned long)(endtime - starttime));
clReleaseEvent(timing_event);
clReleaseMemObject(inputA);
clReleaseMemObject(inputB);
clReleaseMemObject(output);
clReleaseKernel(kernel);
clReleaseProgram(program);
clReleaseCommandQueue(queue);
clReleaseContext(context);
//uint64_t startCPU = mach_absolute_time();
start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < NUM_ELEMENTS; ++i)
results[i] = a[i] * b[i];
//uint64_t endCPU = mach_absolute_time();
finish = std::chrono::high_resolution_clock::now();
//printf("Elapsed (CPU): %lu ns\n\n", (unsigned long)(endCPU - startCPU));
std::cout << "Elapsed (CPU) :" << std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count() << "ns\n";
return 0;
}
multiply_arrays.cl
__kernel void multiply_arrays(__global const float* inputA,
__global const float* inputB,
__global float* output) {
int i = get_global_id(0);
output[i] = inputA[i] * inputB[i];
}
//ö

OpenCL - Writing to the Buffer is zero?

I have written a kernel that should do nothing except add one to each component of a float3:
__kernel void GetCellIndex(__global Particle* particles) {
int globalID = get_global_id(0);
particles[globalID].position.x += 1;
particles[globalID].position.y += 1;
particles[globalID].position.z += 1;
};
with following struct (in the kernel)
typedef struct _Particle
{
cl_float3 position;
}Particle;
My problem is that when I write my array of particles to the GPU, every component is zero. Here is the necessary code:
(Particle*) particles = new Particle[200];
for (int i = 0; i < 200; i++)
{
particles[i].position.x = 5f;
}
cl_Particles = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(Particle)*200, NULL, &err);
if (err != 0)
{
std::cout << "CreateBuffer does not work!" << std::endl;
system("Pause");
}
clEnqueueWriteBuffer(queue, cl_Particles, CL_TRUE, 0, sizeof(Particle) * 200, &particles, 0, NULL, NULL);
//init of kernel etc.
err = clSetKernelArg(kernel, 0, sizeof(Particle) * 200, &cl_Particles);
if (err != 0) {
std::cout << "Error: setKernelArg 0 does not work!" << std::endl;
system("Pause");
}
and this is my struct on the CPU:
typedef struct _Particle
{
cl_float4 position;
}Particle;
Can someone help me with this problem?
(Any clue is worth discussing.)
Thanks
Your code snippet contains some typical C programming errors. First,
(Particle*) particles = new Particle[200];
does not declare a new variable particles as a pointer to Particle. It must be:
Particle *particles = new Particle[200];
Next, in your call to
clEnqueueWriteBuffer(queue, cl_Particles, CL_TRUE, 0, sizeof(Particle) * 200, &particles, 0, NULL, NULL);
you passed a pointer to the particles pointer as the 6th parameter (ptr). Here you must instead pass a pointer to the host memory region that contains the data. Thus, change &particles to particles:
clEnqueueWriteBuffer(queue, cl_Particles, CL_TRUE, 0, sizeof(Particle) * 200, particles, 0, NULL, NULL);
The setup of the kernel arguments is also wrong. Here, you must pass the OpenCL buffer created with clCreateBuffer, and the size argument must be the size of that buffer handle, not the size of the data. Thus, replace
err = clSetKernelArg(kernel, 0, sizeof(Particle) * 200, &cl_Particles);
with:
err = clSetKernelArg(kernel, 0, sizeof(cl_Particles), &cl_Particles);
As clCreateBuffer returns a value of type cl_mem, the expression sizeof(cl_Particles) evaluates to the same as sizeof(cl_mem). I recommend always calling sizeof() on the variable, so that you need to change the data type in only one place: the variable declaration.
On my platform, cl_float3 is the same as cl_float4. This might not be true on every platform, so you should always use the same type in the host code and in the kernel code. Also, in your kernel code you must use the type float4 instead of cl_float4.
I hope I got the C calls right, because I actually tested this with the C++ code below. The snippet contains the fixed C calls as comments:
Particle *particles = new Particle[200];
for (int i = 0; i < 200; i++)
{
//particles[i].position.x = 5f;
particles[i].position.s[0] = 5.0f; // due to VC++ compiler
}
//cl_mem cl_Particles = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(Particle)*200, NULL, &err); // FIXED
cl::Buffer cl_Particles(context, CL_MEM_READ_WRITE, sizeof(Particle)*200, NULL, &err);
checkErr(err, "Buffer::Buffer()");
//err = clEnqueueWriteBuffer(queue, cl_Particles, CL_TRUE, 0, sizeof(Particle) * 200, particles, 0, NULL, NULL); // FIXED
err = queue.enqueueWriteBuffer(cl_Particles, CL_TRUE, 0, sizeof(Particle) * 200, particles, NULL, NULL);
checkErr(err, "CommandQueue::enqueueWriteBuffer()");
//init of kernel
cl::Kernel kernel(program, "GetCellIndex", &err);
checkErr(err, "Kernel::Kernel()");
//err = clSetKernelArg(kernel, 0, sizeof(cl_Particles), &cl_Particles); // FIXED
err = kernel.setArg(0, sizeof(cl_Particles), &cl_Particles);
checkErr(err, "Kernel::setArg()");

WIN32 C Properly using SetWindowLongPtr and GetWindowLongPtr

Yes this is a homework assignment and I'm completely stumped.
So I've made a struct and two windows:
typedef struct thingy {
int count;
TCHAR* MSG;
COLORREF colour; };
The windows have:
wndclass.lpfnWndProc = WndProc;
wndclass.cbClsExtra = sizeof(thingy*);
wndclass.cbWndExtra = sizeof(thingy*);
I need one window to display 0 and the next to display 1, using this struct stored in the class extra memory via SetWindowLongPtr and GetWindowLongPtr / SetClassLongPtr and GetClassLongPtr.
Count of course has to be initialized to 0 for the FIRST window but not for the second, and I have no idea how to do this. Only one WndProc can be used.
static thingy* mythingy = (thingy*)GetWindowLongPtr(hwnd, 0);
char buf[128];
int num = GetClassLongPtr(hwnd, 0);
static boolean set = false;
case WM_CREATE:
if (!set) {
mythingy = (thingy*)malloc(sizeof(thingy));
mythingy->count = 0;
mythingy->colour = RGB(0, 0, 0);
mythingy->MSG = TEXT("Hello Windows!");
set = true;
}
if (lParam != NULL) {
SetClassLongPtr(hwnd, 0, (LONG)mythingy->count);
}
mythingy->count++;
SetWindowLongPtr(hwnd, 0, (LONG)mythingy);
return 0;
case WM_PAINT:
hdc = BeginPaint(hwnd, &ps);
GetClientRect(hwnd, &rect);
DrawText(hdc, mythingy->MSG, -1, &rect,
DT_SINGLELINE | DT_CENTER | DT_VCENTER);
sprintf_s(buf, "%d", num);
TextOut(hdc, 0, 0, LPCWSTR(buf), 1);
EndPaint(hwnd, &ps);
return 0;
Right now both windows display 1, and I'm struggling to see why it isn't doing what I want; I can't find anything on Google about how to use these two functions or when I need to call them.
Window: 0x000a0528
count = 0
Add to class data
Window: 0x001f099a
count = 1
Add to class data
From the paint method I get the data and both of them are 1.
A Window Class
... is a set of attributes that the system uses as a template to create a window. Every window is a member of a window class.
Since the Windows API is exposed as a flat C interface, there is no inheritance at the language level. The phrase "is a member of" is implemented by sharing the class memory across window instances of that class. Consequently, every call to GetClassLongPtr accesses the same shared memory.
In contrast, each window can reserve cbWndExtra bytes of memory that are attributed to the specific window instance. This memory is private to each window and can store per-window data.
To implement your requirements you need to store the common information (current count of windows) in the window class' extra memory (cbClsExtra), and keep the per-window data (index, message, and color) in the window instance's extra memory (cbWndExtra).
Apply the following changes to your code:
// Total count of windows stored as an integer:
wndclass.cbClsExtra = sizeof(int);
In the WM_CREATE-handler, set the per-window data, increment the total count, and store it away:
case WM_CREATE:
{
int count = (int)GetClassLongPtr(hwnd, 0);
// Allocate new per-window data object:
thingy* mythingy = (thingy*)malloc(sizeof(thingy));
mythingy->count = count;
mythingy->colour = RGB(0, 0, 0);
mythingy->MSG = TEXT("Hello Windows!");
// Store the per-window data:
SetWindowLongPtr(hwnd, 0, (LONG_PTR)mythingy);
// Increment total count and store it in the class extra memory:
++count;
SetClassLongPtr(hwnd, 0, (LONG_PTR)count);
}
return DefWindowProc(hwnd, msg, wParam, lParam);
In the WM_PAINT-handler, access the per-window data:
case WM_PAINT:
{
PAINTSTRUCT ps;
hdc = BeginPaint(hwnd, &ps);
RECT rect;
GetClientRect(hwnd, &rect);
// Retrieve per-window data:
thingy* mythingy = (thingy*)GetWindowLongPtr(hwnd, 0);
DrawText(hdc, mythingy->MSG, -1, &rect, DT_SINGLELINE | DT_CENTER | DT_VCENTER);
char buf[128];
int len = sprintf_s(buf, sizeof(buf), "%d", mythingy->count);
TextOutA(hdc, 0, 0, buf, len);
EndPaint(hwnd, &ps);
return 0;
}
Note: All error handling has been elided for brevity. Character-encoding issues have not really been addressed either (char vs. wchar_t). Likewise, resource management is missing. You'd probably want to deallocate memory in a WM_NCDESTROY handler. The code assumes that only windows of a single window class are created.

getting wrong output for Parallel Floyd Warshall Algorithm in OpenCL

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
/*#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else*/
#include <CL/cl.h>
//#endif
#define DATA_SIZE 16
using namespace std;
const char *ProgramSource =
"__kernel void floydWarshallPass(__global uint * pathDistanceBuffer,const unsigned int numNodes, __global uint * result, const unsigned int pass)\n"\
"{\n"\
"int xValue = get_global_id(0);\n"\
"int yValue = get_global_id(1);\n"\
"int k = pass;\n"\
"int oldWeight = pathDistanceBuffer[yValue * 4 + xValue];\n"\
"int tempWeight = (pathDistanceBuffer[yValue * 4 + k] + pathDistanceBuffer[k * 4 + xValue]);\n"\
"if (tempWeight < oldWeight)\n"\
"{\n"\
"pathDistanceBuffer[yValue * 4 + xValue] = tempWeight;\n"\
"result[yValue * 4 + xValue] = tempWeight;\n"\
"}\n"\
"}\n"\
"\n";
int main(void)
{
cl_context context;
cl_context_properties properties[3];
cl_kernel kernel;
cl_command_queue command_queue;
cl_program program;
cl_int err;
cl_uint num_of_platforms=0;
cl_platform_id platform_id;
cl_device_id device_id;
cl_uint num_of_devices=0;
cl_mem inputA, inputB, output;
cl_int numNodes;
size_t global;
float inputDataA[16] = {0,2,3,4,5,0,7,8,9,10,0,12,13,14,15,0};
float results[16]={0};
int i,j;
numNodes = 16;
if(clGetPlatformIDs(1, &platform_id, &num_of_platforms) != CL_SUCCESS)
{
printf("Unable to get platform id\n");
return 1;
}
// try to get a supported GPU device
if (clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_CPU, 1, &device_id, &num_of_devices) != CL_SUCCESS)
{
printf("Unable to get device_id\n");
return 1;
}
// context properties list - must be terminated with 0
properties[0]= CL_CONTEXT_PLATFORM;
properties[1]= (cl_context_properties) platform_id;
properties[2]= 0;
// create a context with the GPU device
context = clCreateContext(properties,1,&device_id,NULL,NULL,&err);
// create command queue using the context and device
command_queue = clCreateCommandQueue(context, device_id, 0, &err);
// create a program from the kernel source code
program = clCreateProgramWithSource(context,1,(const char **) &ProgramSource, NULL, &err);
// compile the program
if (clBuildProgram(program, 0, NULL, NULL, NULL, NULL) != CL_SUCCESS)
{
printf("Error building program\n");
return 1;
}
// specify which kernel from the program to execute
kernel = clCreateKernel(program, "floydWarshallPass", &err);
// create buffers for the input and ouput
inputA = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL);
output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL);
// load data into the input buffer
clEnqueueWriteBuffer(command_queue, inputA, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputDataA, 0, NULL, NULL);
clEnqueueWriteBuffer(command_queue, output, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputDataA, 0, NULL, NULL);
// set the argument list for the kernel command
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputA);
clSetKernelArg(kernel, 1, sizeof(cl_int), (void *)&numNodes);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &output);
global=DATA_SIZE;
// enqueue the kernel command for execution
for(cl_uint sh=0; sh<16; sh++)
{
clSetKernelArg(kernel, 3, sizeof(cl_uint), (void *)&sh);
clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
//clEnqueueReadBuffer(command_queue, output, CL_TRUE, 0, sizeof(float)*DATA_SIZE, results, 0, NULL, NULL);
//clEnqueueWriteBuffer(command_queue, inputA, CL_TRUE, 0, sizeof(float) * DATA_SIZE, results, 0, NULL, NULL);
//clEnqueueWriteBuffer(command_queue, output, CL_TRUE, 0, sizeof(float) * DATA_SIZE, results, 0, NULL, NULL);
//clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputA);
//clSetKernelArg(kernel, 1, sizeof(cl_int), (void *)&numNodes);
//clSetKernelArg(kernel, 2, sizeof(cl_mem), &output);
clFinish(command_queue);
}
clFinish(command_queue);
// copy the results from out of the output buffer
clEnqueueReadBuffer(command_queue, output, CL_TRUE, 0, sizeof(float) *DATA_SIZE, results, 0, NULL, NULL);
// print the results
printf("output: ");
for(i=0;i<16; i++)
{
printf("%f ",results[i]);
}
// cleanup - release OpenCL resources
clReleaseMemObject(inputA);
//clReleaseMemObject(inputB);
clReleaseMemObject(output);
clReleaseProgram(program);
clReleaseKernel(kernel);
clReleaseCommandQueue(command_queue);
clReleaseContext(context);
return 0;
}
I am getting -0.00000 as output for every node.
P.S. I am running my code on CL_DEVICE_TYPE_CPU because on the GPU it fails with an error that it cannot get a device ID.
Please give some guidance on how to get the correct output.
I think your question is a little too broad; you should have narrowed your code down a bit. I'll try to help with some errors I found in your code, but I didn't debug or compile it, so the issues I describe here are only starting points.
Why are you calling get_global_id with parameter 1 in your kernel? In your clEnqueueNDRangeKernel call you specified only one work-item dimension, so get_global_id(1) queries a non-existent dimension. If you want to translate a one-dimensional coordinate into two coordinates, use a transformation such as the one below:
int id = get_global_id(0);
int x = id % size->width;
int y = id / size->width;
Pay attention when you use sizeof(float) to measure the size of data types: they may not be the same size inside the OpenCL implementation. Use sizeof(cl_float) instead.
Maybe you are not getting any GPU because you don't have the proper drivers installed on your computer. Go to the GPU vendor's website and look for OpenCL runtime drivers.
Take a look at these pages from the OpenCL specification:
get_global_id
OpenCL data types

Resources