I am getting an error when I try to run a c file which does some basic writes to a serial port. I am trying to run it asynchronously because the writes sometimes take a long time to transfer. My original version had it running synchronously with WriteFile() commands which worked fine. I am new to using OVERLAPPED and would appreciate and input concerning it.
The error I am getting is:
Debug Assertion Failed!
<path to dbgheap.c>
Line: 1317
Expression: _CrtIsValidHeapPointer(pUserData)
when the second write function is called.
In main:
{
//initialized port (with overlapped), DBC, and timeouts
result = write_port(outPortHandle, 128);
result = write_port(outPortHandle, 131);
}
static void CALLBACK write_compl(DWORD dwErrorCode, DWORD dwNumberOfBytesTransfered, LPOVERLAPPED lpOverlapped) {
//write completed. check for errors? if so throw an exception maybe?
printf("write completed--and made it to callback function\n");
}
int write_port(HANDLE hComm,BYTE* lpBuf) {
OVERLAPPED osWrite = {0};
// Create this write operation's OVERLAPPED structure's hEvent.
osWrite.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if (osWrite.hEvent == NULL)
// error creating overlapped event handle
return 0;
// Issue write.
if (!WriteFileEx(hComm, &lpBuf, 1, &osWrite, &write_compl )) {
if (GetLastError() != ERROR_IO_PENDING) {
// WriteFile failed, but isn't delayed. Report error and abort.
printf("last error: %ld",GetLastError());
return 0; //failed, return false;
}
else {
// Write is pending.
WaitForSingleObjectEx(osWrite.hEvent, 50, TRUE); //50 ms timeout
return -1; //pending
}
}
else {
return 1; //finished
}
}
That was not the full code, sorry. I was using an array of BYTEs as well, not constants. But system("pause")'s were causing my debug assertion failed errors, and after carefully looking through my code, when the WriteFileEx() was successful, it was never setting an alert/timeout on the event in the overlapped structure, so the callback function would never get called. I fixed these problems though.
I just need help with the handling/accessing a single BYTE in a structure which is allocated when a ReadFileEx() function is called (for storing the BYTE that is read so it can be handled). I need to know how to access that BYTE storage using an offset and make the overlapped structure null. Would making the overlapped structure null be as simple as setting the handle in it to INVALID_HANDLE_VALUE?
I think you have a couple of issues:
You are passing an integer as a pointer (your compiler should warn against this or preferably refuse to compile the code):
result = write_port(outPortHandle, 128);
Compare this to the definition of write_port:
int write_port(HANDLE hComm,BYTE* lpBuf) {
The above statements doesn't match. Later on you then pass a pointer to the lpBuf pointer to the WriteFileEx function by taking the address of the BYTE* -> "&lpBuf". This will not result in what you think it will do.
Even if you fix this, you will still have potential lifetime issues whenever the write is successfully queued but won't complete within the 50 ms timeout.
When using overlapped I/O, you need to make sure that the read/write buffer and the overlapped structure remain valid until the I/O is completed, cancelled or the associated device is closed. In your code above you use a pointer to an OVERLAPPED struct that lives on the stack in your call to WriteFileEx. If WriteFileEx does not complete within 50 ms, the pending I/O will have a reference to a non-existing OVERLAPPED struct and you will (hopefully) have an access violation (or worse, silently corrupted stack data somewhere in your app).
The canonical way of handling these lifetime issues (if performance is not a big issue), is to use a custom struct that includes an OVERLAPPED struct and some storage for the data to be read/written. Allocate the struct when posting the write and deallocate the struct from the I/O completion routine. Pass the address of the included OVERLAPPED struct to WriteFileEx, and use e.g. offsetof to get the address to the custom struct from the OVERLAPPED address in the completion routine.
Also note that WriteFileEx does not actually use the hEvent member, IIRC.
EDIT: Added code sample, please note:
I haven't actually tried to compile the code, there might be typos or other problems with the code.
It's not the most efficient way of sending data (allocating/deallocating a memory block for each byte that is sent). It should be easy to improve, though.
#include <stddef.h>
#include <assert.h>
#include <windows.h>
// ...
typedef struct _MYOVERLAPPED
{
OVERLAPPED ol;
BYTE buffer;
} MYOVERLAPPED, *LPMYOVERLAPPED;
// ...
static void CALLBACK write_compl(DWORD dwErrorCode, DWORD dwNumberOfBytesTransfered, LPOVERLAPPED lpOverlapped)
{
if (NULL == lpOverlapped)
{
assert(!"Should never happen");
return;
}
LPBYTE pOlAsBytes = (LPBYTE)lpOverlapped;
LPBYTE pMyOlAsBytes = pOlAsBytes - offsetof(MYOVERLAPPED, ol);
LPMYOVERLAPPED pMyOl = (LPMYOVERLAPPED)pOlAsBytes;
if ((ERROR_SUCCESS == dwErrorCode) &&
(sizeof(BYTE) == dwNumberOfBytesTransfered))
{
printf("written %uc\n", pMyOl->buffer);
}
else
{
// handle error
}
free(pMyOl);
}
int write_port(HANDLE hComm, BYTE byte) {
LPMYOVERLAPPED pMyOl = (LPMYOVERLAPPED)malloc(sizeof(MYOVERLAPPED));
ZeroMemory(pMyOl, sizeof(MYOVERLAPPED));
pMyOl->buffer = byte;
// Issue write.
if (!WriteFileEx(hComm, &pMyOl->buffer, sizeof(BYTE), pMyOl, &write_compl )) {
if (GetLastError() != ERROR_IO_PENDING) {
// WriteFile failed, but isn't delayed. Report error and abort.
free(pMyOl);
printf("last error: %ld",GetLastError());
return 0; //failed, return false;
}
else {
return -1; //pending
}
}
else {
free(pMyOl);
return 1; //finished
}
}
result = write_port(outPortHandle, 128);
result = write_port(outPortHandle, 131);
The lpBuf argument have to be pointers to buffers, not constants.
e.g.
char buffer;
buffer = 128;
result = write_port(outPortHandle, &buffer);
buffer = 131;
result = write_port(outPortHandle, &buffer);
What you really want to do is also pass a buffer length.
e.g.
char buffer[] = { 128, 131 };
result = write_port(outPortHandle, &buffer, sizeof(buffer));
int write_port(HANDLE hComm,BYTE* lpBuf, size_t length) {
...
// Issue write.
if (!WriteFileEx(hComm, &lpBuf, length, &osWrite, &write_compl )) {
...
Related
Let's consider this very nice and easy to use remux sample by horgh.
I'd like to achieve the same task: convert an RTSP H264 encoded stream to a fragmented MP4 stream.
This code does exactly this task.
However I don't want to write the mp4 onto disk at all, but I need to get a byte buffer or array in C with the contents that would normally written to disk.
How is that achievable?
This sample uses vs_open_output to define the output format and this function needs an output url.
If I would get rid of outputting the contents to disk, how shall I modify this code?
Or there might be better alternatives as well, those are also welcomed.
Update:
As szatmary recommended, I have checked his example link.
However as I stated in the question I need the output as buffer instead of a file.
This example demonstrates nicely how can I read my custom source and give it to ffmpeg.
What I need is how can open the input as standard (with avformat_open_input) then do my custom modification with the packets and then instead writing to file, write to a buffer.
What have I tried?
Based on szatmary's example I created some buffers and initialization:
uint8_t *buffer;
buffer = (uint8_t *)av_malloc(4096);
format_ctx = avformat_alloc_context();
format_ctx->pb = avio_alloc_context(
buffer, 4096, // internal buffer and its size
1, // write flag (1=true, 0=false)
opaque, // user data, will be passed to our callback functions
0, // no read
&IOWriteFunc,
&IOSeekFunc
);
format_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
AVOutputFormat * const output_format = av_guess_format("mp4", NULL, NULL);
format_ctx->oformat = output_format;
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
Then of course I have created 'IOWriteFunc' and 'IOSeekFunc':
static int IOWriteFunc(void *opaque, uint8_t *buf, int buf_size) {
printf("Bytes read: %d\n", buf_size);
int len = buf_size;
return (int)len;
}
static int64_t IOSeekFunc (void *opaque, int64_t offset, int whence) {
switch(whence){
case SEEK_SET:
return 1;
break;
case SEEK_CUR:
return 1;
break;
case SEEK_END:
return 1;
break;
case AVSEEK_SIZE:
return 4096;
break;
default:
return -1;
}
return 1;
}
Then I need to write the header to the output buffer, and the expected behaviour here is to print "Bytes read: x":
AVDictionary * opts = NULL;
av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov", 0);
av_dict_set_int(&opts, "flush_packets", 1, 0);
avformat_write_header(output->format_ctx, &opts)
In the last line during execution, it always runs into segfault, here is the backtrace:
#0 0x00007ffff7a6ee30 in () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
#1 0x00007ffff7a98189 in avformat_init_output () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
#2 0x00007ffff7a98ca5 in avformat_write_header () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
...
The hard thing for me with the example is that it uses avformat_open_input.
However there is no such thing for the output (no avformat_open_ouput).
Update2:
I have found another example for reading: doc/examples/avio_reading.c.
There are mentions of a similar example for writing (avio_writing.c), but ffmpeg does not have this available (at least in my google search).
Is this task really this hard to solve? standard rtsp input to custom avio?
Fortunately ffmpeg.org is down. Great.
It was a silly mistake:
In the initialization part I called this:
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
However before this I already put the avio buffers into format_ctx:
format_ctx->pb = ...
Also, this line is unnecessary:
format_ctx = avformat_alloc_context();
Correct order:
AVOutputFormat * const output_format = av_guess_format("mp4", NULL, NULL);
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
format_ctx->pb = avio_alloc_context(
buffer, 4096, // internal buffer and its size
1, // write flag (1=true, 0=false)
opaque, // user data, will be passed to our callback functions
0, // no read
&IOWriteFunc,
&IOSeekFunc
);
format_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
format_ctx->oformat = output_format; //might be unncessary too
Segfault is gone now.
You need to write a AVIOContext implementation.
My situation is that MmCopyVirtualMemory almost always (%99 of the time) returns STATUS_PARTIAL_COPY.
(Im operating in a Ring0 Driver)
I've tried so many different things, like using different variables sizes and types, different addresses etc... It always returns STATUS_PARTIAL_COPY.
Nothing online has helped either, its not a really common error.
Error description:
{Partial Copy} Due to protection conflicts not all the requested bytes could be copied.
My way of reading a processes memory:
DWORD64 Read(DWORD64 SourceAddress, SIZE_T Size)
{
SIZE_T Bytes;
NTSTATUS Status = STATUS_SUCCESS;
DWORD64 TempRead;
DbgPrintEx(0, 0, "\nRead Address:%p\n", SourceAddress); // Always Prints Correct Address
DbgPrintEx(0, 0, "Read szAddress:%x\n", Size); // Prints Correct Size 8 bytes
Status = MmCopyVirtualMemory(Process, SourceAddress, PsGetCurrentProcess(), &TempRead, Size, KernelMode, &Bytes);
DbgPrintEx(0, 0, "Read Bytes:%x\n", Bytes); // Copied bytes - prints 0
DbgPrintEx(0, 0, "Read Output:%p\n", TempRead); // prints 0 as expected since it failed
if (!NT_SUCCESS(Status))
{
DbgPrintEx(0, 0, "Read Failed:%p\n", Status);
return NULL;
}
return TempRead;
}
Example of how I use it:
NetMan = Read(BaseAddr + NET_MAN, sizeof(DWORD64));
//BaseAddr is a DWORD64 and NetMan is also a DWORD64
I've tripped checked my code to many times, all of it appears to be right.
After investigating MiDoPoolCopy (MmCopyVirtualMemoy calls this) it seems that its failing during the move operation:
// MiDoPoolCopy function
// Probe to make sure that the specified buffer is accessible in
// the target process.
//Wont execute (supplied KernelMode)
if ((InVa == FromAddress) && (PreviousMode != KernelMode)){
Probing = TRUE;
ProbeForRead (FromAddress, BufferSize, sizeof(CHAR));
Probing = FALSE;
}
//Failing either here, copying inside the target process's address space to the buffer
RtlCopyMemory (PoolArea, InVa, AmountToMove);
KeUnstackDetachProcess (&ApcState);
KeStackAttachProcess (&ToProcess->Pcb, &ApcState);
//
// Now operating in the context of the ToProcess.
//
//Wont execute (supplied KernelMode)
if ((InVa == FromAddress) && (PreviousMode != KernelMode)){
Probing = TRUE;
ProbeForWrite (ToAddress, BufferSize, sizeof(CHAR));
Probing = FALSE;
}
Moving = TRUE;
//or failing here - moving from the Target Process to Source (target process->Kernel)
RtlCopyMemory (OutVa, PoolArea, AmountToMove);
Here's the SEH returning STATUS_PARTIAL_COPY (wrapped in try and except)
//(wrapped in try and except)
//
// If the failure occurred during the move operation, determine
// which move failed, and calculate the number of bytes
// actually moved.
//
*NumberOfBytesRead = BufferSize - LeftToMove;
if (Moving == TRUE) {
//
// The failure occurred writing the data.
//
if (ExceptionAddressConfirmed == TRUE) {
*NumberOfBytesRead = (SIZE_T)((ULONG_PTR)(BadVa - (ULONG_PTR)FromAddress));
}
}
return STATUS_PARTIAL_COPY;
MmCopyVirtualMemory (undocumented struct)
NTSTATUS NTAPI MmCopyVirtualMemory
(
PEPROCESS SourceProcess,
PVOID SourceAddress,
PEPROCESS TargetProcess,
PVOID TargetAddress,
SIZE_T BufferSize,
KPROCESSOR_MODE PreviousMode,
PSIZE_T ReturnSize
);
Here is the source for both MmCopyVirtualMemory and MiDoPoolCopy:
https://lacicloud.net/custom/open/leaks/Windows%20Leaked%20Source/wrk-v1.2/base/ntos/mm/readwrt.c
Any help would be greatly appreciated, I've been stuck on this for to long...
I know this is old, but because I had the same issue and fixed it, maybe someone else will stumble on this in the future.
Reading your code the issue is the target address you want to copy the data. You are using a single defined DWORD64, which is when running the function a simple variable (register or on stack) and not a memory region where you can write to.
The solution is to provide a correct buffer for which you allocated memory before with malloc.
Example (you want to read a DWORD64, based on your code with some adjustments):
DWORD64* buffer = malloc(sizeof(DWORD64))
Status = MmCopyVirtualMemory(Process, (void*)SourceAddress, PsGetCurrentProcess(), (void*)buffer, sizeof(DWORD64), UserMode, &Bytes);
Do not forget to free your buffer after using to prevent a memory leak.
I am trying to write a remapping target for usage with DM.
I followed instructions from several places (including this Answer) all essentially giving the same code.
This is ok, but not enough for me.
I need to modify "in transit" data of struct bio being remapped.
This means I need to make a deep-clone of the bio, including the data; apparently the provided functions (e.g.: bio_clone_bioset()) do not copy data at all, but point iovec's to the original pages/offsets.
I tried some variations of the following scheme:
void
mt_copy(struct bio *dst, struct bio *src) {
struct bvec_iter src_iter, dst_iter;
struct bio_vec src_bv, dst_bv;
void *src_p, *dst_p;
unsigned bytes;
unsigned salt;
src_iter = src->bi_iter;
dst_iter = dst->bi_iter;
salt = src_iter.bi_sector;
while (1) {
if (!src_iter.bi_size) {
break;
}
if (!dst_iter.bi_size) {
break;
}
src_bv = bio_iter_iovec(src, src_iter);
dst_bv = bio_iter_iovec(dst, dst_iter);
bytes = min(src_bv.bv_len, dst_bv.bv_len);
src_p = kmap_atomic(src_bv.bv_page);
dst_p = kmap_atomic(dst_bv.bv_page);
memcpy(dst_p + dst_bv.bv_offset, src_p + src_bv.bv_offset, bytes);
kunmap_atomic(dst_p);
kunmap_atomic(src_p);
bio_advance_iter(src, &src_iter, bytes);
bio_advance_iter(dst, &dst_iter, bytes);
}
}
static struct bio *
mt_clone(struct bio *bio) {
struct bio *clone;
clone = bio_clone_bioset(bio, GFP_KERNEL, NULL);
if (!clone) {
return NULL;
}
if (bio_alloc_pages(clone, GFP_KERNEL)) {
bio_put(clone);
return NULL;
}
clone->bi_private = bio;
if (bio_data_dir(bio) == WRITE) {
mt_copy(clone, bio);
}
return clone;
}
static int
mt_map(struct dm_target *ti, struct bio *bio) {
struct mt_private *mdt = (struct mt_private *) ti->private;
bio->bi_bdev = mdt->dev->bdev;
bio = mt_clone(bio);
submit_bio(bio->bi_rw, bio);
return DM_MAPIO_SUBMITTED;
}
This, however, does not work.
When I submit_bio() using the cloned bio I do not get the .end_io call and the calling task becomes blocked ("INFO: task mount:488 blocked for more than 120 seconds."). This with a READ request consisting of a single iovec (1024 bytes). In this case, of course the in buffers do not need copying because they should be overwritten; I need to copy back the incoming data unto the original buffers after the request has completed... but I don't get there.
I'm quite evidently missing some piece, but I'm unable to understand what.
Note: I didn't do any optimization (e.g.: use smarter allocation strategies) specifically because I need to get the basics first.
Note: I corrected a mistake (thanks #RuslanRLaishev), unfortunately ininfluent; see my own answer.
It's correct ?
if (bio_alloc_pages(**bio**, GFP_KERNEL)) {
bio_put(clone);
return NULL;
}
or
if (bio_alloc_pages(**clone**, GFP_KERNEL)) {
bio_put(bio);
return NULL;
}
It turns out bio_clone_bioset() and friends do not copy the callback address to call when request is over.
Trivial solution is to add clone->bi_end_io = bio->bi_end_io; before the end of mt_clone().
Unfortunately this is not enough to make the code functional because it turns out upper layers can spawn thousands of inflight requests (i.e.: requests queued and preprocessed before the previous ones complete) leading to memory starvation. Trying to slow upper layers by returning DM_MAPIO_REQUEUE does not seem to work (see: https://unix.stackexchange.com/q/410525/130498). This has nothing to do with current question, however.
I'm trying to make a function, ReadFileMany, which imitates ReadFile's interface, which is given a list of (offset, length) pairs to read from, and which reads all portions of the file asynchronously.
I'm doing this inside ReadFileMany by using RegisterWaitForSingleObject to direct a thread pool thread to wait for I/O to complete, so that it can call ReadFile again to read the next portion of the file, etc.
The trouble I'm running into is that I can't seem to be able to mimic a certain behavior of ReadFile.
Specifically, file handles can themselves be used like events, without the need for an event handle:
OVERLAPPED ov = {0};
DWORD nw;
if (ReadFile(hFile, buf, buf_size, &nw, &ov))
{
if (WaitForSingleObject(hFile, INFINITE) == WAIT_OBJECT_0)
{
...
}
}
but of course, if the user waits on a file handle, he might get a notification that an intermediate read has complete, rather than the final read.
So, inside ReadFileMany, I have no choice but to pass an hEvent parameter to ReadFile for all but the last portion.
The question is, is there any way to still allow the user to wait for the file handle to become signaled when all the portions have been read?
At first the answer seems obvious: just avoid passing an event handle when reading the last portion!
That works fine if the read will be successful, but not if there are errors. If ReadFile starts suddenly returning an error on the last read, I will need to set the file handle to a signaled state manually in order to allow the reader to wake up from his potential call to WaitForSingleObject(hFile).
But there seems to be no way to set a file handle to a signaled state without performing actual I/O... so this is where I get stuck: is there any way for me to write this function so that it behaves like ReadFile on the outside, or is it impossible to do it correctly?
Assuming you are trying to read multiple sections of a single file, I would simply allocate an array of OVERLAPPED structures, one for each requested section of the file, kick off all of the ReadFile() calls at one time, and then use WaitForMultipleObjects() to wait for all of the I/Os to finish, eg:
struct sReadInfo
{
OVERLAPPED ov;
LPVOID pBuffer;
DWORD dwNumBytes;
DWORD dwNumBytesRead;
bool bPending;
sReadInfo()
{
memset(&ov, 0, sizeof(OVERLAPPED));
pBuffer = NULL;
dwNumBytes = 0;
dwNumBytesRead = 0;
bPending = false;
ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if (!ov.hEvent) throw std::exception();
}
~sReadInfo()
{
CloseHandle(hEvent);
}
};
.
bool error = false;
try
{
std::vector<sReadInfo> ri(numSections);
std::vector<HANDLE> h;
h.reserve(numSections);
for(int i = 0; i < numSections; ++i)
{
ULARGE_INTEGER ul;
ul.QuadPart = ...; // desired file offset to read from
sReadInfo *r = &ri[i];
r->ov.Offset = ul.LowPart;
r->ov.OffsetHigh = ul.HighPart;
r->pBuffer = ...; // desired buffer to read into
r->dwNumBytes = ...; // desired number of bytes to read
if (!ReadFile(hFile, r->pBuffer, r->dwNumBytes, &r->dwNumBytesRead, &r->ov))
{
if (GetLastError() != ERROR_IO_PENDING)
throw std::exception();
r->bPending = true;
h.push_back(r->ov.hEvent);
}
}
if (!h.empty())
{
if (WaitForMultipleObjects(h.size(), &h[0], TRUE, INFINITE) != WAIT_OBJECT_0)
throw std::exception();
}
for (int i = 0; i < numSections; ++i)
{
sReadInfo *r = &ri[i];
if (r->bPending)
GetOverlappedResult(hFile, &r->ov, &r->dwNumBytesRead, FALSE);
// ...
}
}
catch (const std::exception &)
{
CancelIo(hFile);
return false;
}
return true;
I have some long source code that involves a struct definition:
struct exec_env {
cl_program* cpPrograms;
cl_context cxGPUContext;
int cpProgramCount;
int cpKernelCount;
int nvidia_platform_index;
int num_cl_mem_buffs_used;
int total;
cl_platform_id cpPlatform;
cl_uint ciDeviceCount;
cl_int ciErrNum;
cl_command_queue commandQueue;
cl_kernel* cpKernels;
cl_device_id *cdDevices;
cl_mem* cmMem;
};
The strange thing, is that the output of my program is dependent on the order in which I declare the components of this struct. Why might this be?
EDIT:
Some more code:
int HandleClient(int sock) {
struct exec_env my_env;
int err, cl_err;
int rec_buff [sizeof(int)];
log("[LOG]: In HandleClient. \n");
my_env.total = 0;
//in anticipation of some cl_mem buffers, we pre-emtively init some. Later, we should have these
//grow/shrink dynamically.
my_env.num_cl_mem_buffs_used = 0;
if ((my_env.cmMem = (cl_mem*)malloc(MAX_CL_BUFFS * sizeof(cl_mem))) == NULL)
{
log("[ERROR]:Failed to allocate memory for cl_mem structures\n");
//let the client know
replyHeader(sock, MALLOC_FAIL, UNKNOWN, 0, 0);
return EXIT_FAILURE;
}
my_env.cpPlatform = NULL;
my_env.ciDeviceCount = 0;
my_env.cdDevices = NULL;
my_env.commandQueue = NULL;
my_env.cxGPUContext = NULL;
while(1){
log("[LOG]: Awaiting next packet header... \n");
//read the first 4 bytes of the header 1st, which signify the function id. We later switch on this value
//so we can read the rest of the header which is function dependent.
if((err = receiveAll(sock,(char*) &rec_buff, sizeof(int))) != EXIT_SUCCESS){
return err;
}
log("[LOG]: Got function id %d \n", rec_buff[0]);
log("[LOG]: Total Function count: %d \n", my_env.total);
my_env.total++;
//now we switch based on the function_id
switch (rec_buff[0]) {
case CREATE_BUFFER:;
{
//first define a client packet to hold the header
struct clCreateBuffer_client_packet my_client_packet_hdr;
int client_hdr_size_bytes = CLI_PKT_HDR_SIZE + CRE_BUFF_CLI_PKT_HDR_EXTRA_SIZE;
//buffer for the rest of the header (except the size_t)
int header_rec_buff [(client_hdr_size_bytes - sizeof(my_client_packet_hdr.buff_size))];
//size_t header_rec_buff_size_t [sizeof(my_client_packet_hdr.buff_size)];
size_t header_rec_buff_size_t [1];
//set the first field
my_client_packet_hdr.std_header.function_id = rec_buff[0];
//read the rest of the header
if((err = receiveAll(sock,(char*) &header_rec_buff, (client_hdr_size_bytes - sizeof(my_client_packet_hdr.std_header.function_id) - sizeof(my_client_packet_hdr.buff_size)))) != EXIT_SUCCESS){
//signal the client that something went wrong. Note we let the client know it was a socket read error at the server end.
err = replyHeader(sock, err, CREATE_BUFFER, 0, 0);
cleanUpAllOpenCL(&my_env);
return err;
}
//read the rest of the header (size_t)
if((err = receiveAll(sock, (char*)&header_rec_buff_size_t, sizeof(my_client_packet_hdr.buff_size))) != EXIT_SUCCESS){
//signal the client that something went wrong. Note we let the client know it was a socket read error at the server end.
err = replyHeader(sock, err, CREATE_BUFFER, 0, 0);
cleanUpAllOpenCL(&my_env);
return err;
}
log("[LOG]: Got the rest of the header, packet size is %d \n", header_rec_buff[0]);
log("[LOG]: Got the rest of the header, flags are %d \n", header_rec_buff[1]);
log("[LOG]: Buff size is %d \n", header_rec_buff_size_t[0]);
//set the remaining fields
my_client_packet_hdr.std_header.packet_size = header_rec_buff[0];
my_client_packet_hdr.flags = header_rec_buff[1];
my_client_packet_hdr.buff_size = header_rec_buff_size_t[0];
//get the payload (if one exists)
int payload_size = (my_client_packet_hdr.std_header.packet_size - client_hdr_size_bytes);
log("[LOG]: payload_size is %d \n", payload_size);
char* payload = NULL;
if(payload_size != 0){
if ((payload = malloc(my_client_packet_hdr.buff_size)) == NULL){
log("[ERROR]:Failed to allocate memory for payload!\n");
replyHeader(sock, MALLOC_FAIL, UNKNOWN, 0, 0);
cleanUpAllOpenCL(&my_env);
return EXIT_FAILURE;
}
if((err = receiveAllSizet(sock, payload, my_client_packet_hdr.buff_size)) != EXIT_SUCCESS){
//signal the client that something went wrong. Note we let the client know it was a socket read error at the server end.
err = replyHeader(sock, err, CREATE_BUFFER, 0, 0);
free(payload);
cleanUpAllOpenCL(&my_env);
return err;
}
}
//make the opencl call
log("[LOG]: ***num_cl_mem_buffs_used before***: %d \n", my_env.num_cl_mem_buffs_used);
cl_err = h_clCreateBuffer(&my_env, my_client_packet_hdr.flags, my_client_packet_hdr.buff_size, payload, &my_env.cmMem);
my_env.num_cl_mem_buffs_used = (my_env.num_cl_mem_buffs_used+1);
log("[LOG]: ***num_cl_mem_buffs_used after***: %d \n", my_env.num_cl_mem_buffs_used);
//send back the reply with the error code to the client
log("[LOG]: Sending back reply header \n");
if((err = replyHeader(sock, cl_err, CREATE_BUFFER, 0, (my_env.num_cl_mem_buffs_used -1))) != EXIT_SUCCESS){
//send the header failed, so we exit
log("[ERROR]: Failed to send reply header to client, %d \n", err);
log("[LOG]: OpenCL function result was %d \n", cl_err);
if(payload != NULL) free(payload);
cleanUpAllOpenCL(&my_env);
return err;
}
//now exit if failed
if(cl_err != CL_SUCCESS){
log("[ERROR]: Error executing OpenCL function clCreateBuffer %d \n", cl_err);
if(payload != NULL) free(payload);
cleanUpAllOpenCL(&my_env);
return EXIT_FAILURE;
}
}
break;
Now what's really interesting is the call to h_clCreateBuffer. This function is as follows
int h_clCreateBuffer(struct exec_env* my_env, int flags, size_t size, void* buff, cl_mem* all_mems){
/*
* TODO:
* Sort out the flags.
* How do we store cl_mem objects persistantly? In the my_env struct? Can we have a pointer int the my_env
* struct that points to a mallocd area of mem. Each malloc entry is a pointer to a cl_mem object. Then we
* can update the malloced area, growing it as we have more and more cl_mem objects.
*/
//check that we have enough pointers to cl_mem. TODO, dynamically expand if not
if(my_env->num_cl_mem_buffs_used == MAX_CL_BUFFS){
return CL_MEM_OUT_OF_RANGE;
}
int ciErrNum;
cl_mem_flags flag;
if(flags == CL_MEM_READ_WRITE_ONLY){
flag = CL_MEM_READ_WRITE;
}
if(flags == CL_MEM_READ_WRITE_OR_CL_MEM_COPY_HOST_PTR){
flag = CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR;
}
log("[LOG]: Got flags. Calling clCreateBuffer\n");
log("[LOG]: ***num_cl_mem_buffs_used before in function***: %d \n", my_env->num_cl_mem_buffs_used);
all_mems[my_env->num_cl_mem_buffs_used] = clCreateBuffer(my_env->cxGPUContext, flag , size, buff, &ciErrNum);
log("[LOG]: ***num_cl_mem_buffs_used after in function***: %d \n", my_env->num_cl_mem_buffs_used);
log("[LOG]: Finished clCreateBuffer with id: %d \n", my_env->num_cl_mem_buffs_used);
//log("[LOG]: Finished clCreateBuffer with id: %d \n", buff_counter);
return ciErrNum;
}
The first time round the while loop, my_env->num_cl_mem_buffs_used is increased by 1. However, the next time round the loop, after the call to clCreateBuffer, the value of my_env->num_cl_mem_buffs_used gets reset to 0. This does not happen when I change the order in which I declare the members of the struct! Thoughts? Note that I've omitted the other case statements, all of which so similar things, i.e. updating the structs members.
Well, if your program dumps the raw memory content of the object of your struct type, then the output will obviously depend on the ordering of the fields inside your struct. So, here's one obvious scenario that will create such a dependency. There are many others.
Why are you surprised that the output of your program depends on that order? In general, there's nothing strange in that dependency. If you base your verdict of on the knowledge of the rest of the code, then I understand. But people here have no such knowledge and we are not telepathic.
It's hard to tell. Maybe you can post some code. If I had to guess, I'd say you were casting some input file (made of bytes) into this struct. In that case, you must have the proper order declared (usually standardized by some protocol) for your struct in order to properly cast or else risk invalidating your data.
For example, if you have file that is made of two bytes and you are casting the file to a struct, you need to be sure that your struct has properly defined the order to ensure correct data.
struct example1
{
byte foo;
byte bar;
};
struct example2
{
byte bar;
byte foo;
};
//...
char buffer[];
//fill buffer with some bits
(example1)buffer;
(example2)buffer;
//those two casted structs will have different data because of the way they are defined.
In this case, "buffer" will always be filled in the same manner, as per some standard.
Of course the output depends on the order. Order of fields in a struct matters.
An additional explanation to the other answers posted here: the compiler may be adding padding between fields in the struct, especially if you are on a 64 bit platform.
If you are not using binary serialization, then your best bet is an invalid pointer issue. Like +1 error, or some invalid pointer arithmetics can cause this. But it is hard to know without code. And it is still hard to know, even with the code. You may try to use some kind of pointer validation/tracking system to be sure.
other guesses
by changing the order you are having different uninitialized values in the struct. A pointer being zero or not zero for example
you somehow manage to overrun an item (by casting ) and blast later items. Different items get blasted depending on order
This may happen if your code uses on "old style" initializer as from C89. For a simple example
struct toto {
unsigned a;
double b;
};
.
.
toto A = { 0, 1 };
If you interchange the fields in the definition this remains a valid initializer, but your fields are initialized completely different. Modern C, AKA C99, has designated initializers for that:
toto A = { .a = 0, .b = 1 };
Now, even when reordering your fields or inserting a new one, your initialization remains valid.
This is a common error which is perhaps at the origin of the initializerphobia that I observe in many C89 programs.
You have 14 fields in your struct, so there is 14! possible ways the compiler and/or standard C library can order them during the output.
If you think from the compiler designer's point of view, what should the order be? Random is certainly not useful. The only useful order is the order in which the struct fields were declared by you.