OpenACC: Deep Copy and Unified Memory

I would like to clearly understand a situation I have often faced when accelerating an application with OpenACC. Let's say I have this loop:
#pragma acc parallel loop collapse(4)
for (k = KBEG; k <= KEND; k++){
for (j = JBEG; j <= JEND; j++){
for (i = IBEG; i <= IEND; i++){
for (nv = 0; nv < NVAR; nv++) A0[k][j][i][nv] =
data->A[k][j][i][nv];
}}}
Here data points to a structure of this type:
typedef struct Data_{
double ****A;
double ****B;
} Data;
I noticed that, with or without Unified Memory (-ta=tesla:managed), I get a runtime error: error 700: Illegal address during kernel execution.
I attributed this to the deep copy problem described in the literature: the implicit copy done by the compiler is a shallow copy of A, which holds an address in host memory, and not a copy of the data it points to. The device cannot read the host address, and this generates the error.
Is the deep copy problem the correct interpretation of my error?
Moreover, if I'm using Unified Memory and it is indeed a deep copy problem, shouldn't the device be able to read the address, since A resides, at least virtually, in the unified memory address space?
I can easily resolve the error by adding the directive:
#pragma acc enter data(data)
and adding present(data) to the parallel pragma. Note that I don't need to copy A and B manually.
I would like to understand the reason for both the problem and the solution.

Unified memory is only available for allocated (heap) memory. I'm assuming that "data" itself is not allocated? In that case, you do need to include it in a data region and should add the "present" clause so the compiler doesn't try to implicitly copy it.

Related

Is this the correct use of 'restrict' in C?

Currently I'm learning about parallel programming. I have the following loop that needs to be parallelized.
for(i=0; i<n/2; i++)
a[i] = a[i+1] + a[2*i];
If I run this sequentially there is no problem, but if I want to run it in parallel, a data recurrence occurs. To avoid this I want to store the information to 'read' in a separate variable, e.g. b.
So then the code would be:
b = a;
#pragma omp parallel for private(i)
for(i=0; i<n/2; i++)
a[i] = b[i+1] + b[2*i];
But here comes the part where I begin to doubt. The variable b will probably point to the same memory location as a, so the second code block will do exactly the same as the first code block, including the recurrence I'm trying to avoid.
I tried something with * restrict {variable}. Unfortunately I can't really find the right documentation.
My question:
Do I avoid data recurrence by writing the code as follows?
int *restrict b;
int *restrict a;
b = a;
#pragma omp parallel for private(i)
for(i=0; i<n/2; i++)
a[i] = b[i+1] + b[2*i];
If not, what is a correct way to achieve this goal?
Thanks,
Ter
In your proposed code:
int *restrict b;
int *restrict a;
b = a;
the assignment of a to b violates the restrict requirement. That requires that a and b do not point to the same memory, yet they clearly do point to the same memory.
It is not safe.
You'd have to make a separately allocated copy of the array to be safe. You could do that with:
int *b = malloc(n * sizeof(*b));
…error check…;
memmove(b, a, n *sizeof(*b));
…revised loop using a and b…
free(b);
I always use memmove() because it is always correct, dealing with overlapping copies. In this case, it would be legitimate to use memcpy() because the space allocated for b will be separate from the space for a. The system would be broken if the newly allocated space for b overlaps with a at all, assuming the pointer to a is valid. If there was an overlap, the trouble would be that a was allocated and freed — so a is a dangling pointer pointing to released memory (and should not be being used at all), and b was coincidentally allocated where the old a was previously allocated. On the whole, it's not a problem worth worrying about. (Using memmove() doesn't help if a is a dangling pointer, but it is always safe if given valid pointers, even if the areas of memory overlap.)

Why the access violation exception when freeing directly after allocating memory

While freeing some pointers, I get an access violation.
In order to know what's going on, I've decided to ask to free the pointers at an earlier stage in the code, even directly after memory has been allocated, and still it crashes.
It means that something is seriously wrong in the way my structures are handled in memory.
I know that in a previous version of the code, there was a keyword before the definition of some variables, but that keyword is lost (it was part of a #define clause I can't find back).
Does anybody know what's wrong in this piece of code or what the mentioned keyword should be?
typedef unsigned long longword;
typedef struct part_tag { struct part_tag *next;
__int64 fileptr;
word needcount;
byte loadflag,lock;
byte partdat[8192];
} part;
static longword *partptrs;
<keyword> part *freepart;
<keyword> part *firstpart;
void alloc_parts (void) {
part *ps;
int i;
partptrs = (longword*)malloc (number_of_parts * sizeof(longword)); // number... = 50
ps = (part*)&freepart;
for (i=0; i<number_of_parts; i++) {
ps->next = (struct part_tag*)malloc(sizeof(part));
partptrs[i] = (longword)ps->next;
ps = ps->next;
ps->fileptr = 0; ps->loadflag = 0; ps->lock = 0; ps->needcount = 0; // fill in "ps" structure
};
ps->next = nil;
firstpart = nil;
for (i=0; i<number_of_parts; i++) {
ps = (part*)partptrs[i];
free(ps); // <-- here it already crashes at the first occurrence (i=0)
};
}
Thanks in advance
In the comments somebody asks why I'm freeing pointers directly after allocating them. This is not how the program originally was written, but in order to know what's causing the access violation I've rewritten in that style.
Originally:
alloc_parts();
<do the whole processing>
free_parts();
In order to analyse the access violation I've adapted the alloc_parts() function into the source code excerpt I've written there. The point is that even directly after allocating memory, the freeing is going wrong. How is that even possible?
In the meantime I've observed another weird phenomenon:
While allocating the memory, the values of ps seem to be "complete" address values. While trying to free the memory, the values of ps only contain the last digits of the memory addresses.
Example of complete address : 0x00000216eeed6150
Example of address in freeing loop : 0x00000000eeed6150 // terminating digits are equal,
// so at least something is right :-)
This problem was caused by the longword type: it seems that this type was too small to hold entire memory addresses. I've replaced this by another type (unsigned long long) but the problem still persists.
Finally, after a long time of misery, the problem is solved:
The program was originally meant as a 32-bit application, which means that the original type unsigned long was sufficient to keep memory addresses.
However, this program gets compiled now as a 64-bit application, hence the mentioned type is not sufficiently large anymore to keep 64-bit memory addresses, hence another type has been used for solving this issue:
typedef intptr_t longword;
This solves the issue.
@Andrew Henle: sorry, I didn't realise that your comment contained the actual solution to this problem.

Can an address be assigned to a variable in C?

Is it possible to assign a variable the address you want, in the memory?
I tried to do so but I am getting an error as "Lvalue required as left operand of assignment".
int main() {
int i = 10;
&i = 7200;
printf("i=%d address=%u", i, &i);
}
What is wrong with my approach?
Is there any way in C in which we can assign an address we want, to a variable?
Not directly.
You can do this though: int *i = (int *)7200;
... and then use i (i.e. *i = 10), but you will most likely get a crash. This is only meaningful in low-level development (device drivers, etc.) with known memory addresses.
Assuming you are on an x86-type processor on a modern operating system, it is not possible to write to arbitrary memory locations; the CPU works in concert with the OS to protect memory so that one process cannot accidentally (or intentionally) overwrite another process's memory. Allowing this would be a security risk (see: buffer overflow). If you try anyway, you get a 'Segmentation fault' error as the OS/CPU prevents you from doing this.
For technical details on this, you want to start with 1, 2, and 3.
Instead, you ask the OS to give you a memory location you can write to, using malloc. In this case, the OS kernel (which is generally the only process that is allowed to write to arbitrary memory locations) finds a free area of memory and allocates it to your process. The allocation process also marks that area of memory as belonging to your process, so that you can read it and write it.
However, a different OS/processor architecture/configuration could allow you to write to an arbitrary location. In that case, this code would work:
#include <stdio.h>
int main(void) {
int *ptr;
ptr = (int*)7000;
*ptr = 10;
printf("Value: %i", *ptr);
}
C language provides you with no means for "attaching" a name to a specific memory address. I.e. you cannot tell the language that a specific variable name is supposed to refer to a lvalue located at a specific address. So, the answer to your question, as stated, is "no". End of story.
Moreover, formally speaking, there's no alternative portable way to work with specific numerical addresses in C. The language itself defines no features that would help you do that.
However, a specific implementation might provide you with means to access specific addresses. In a typical implementation, converting an integral value A to a pointer type creates a pointer that points to address A. By dereferencing such a pointer you can access that memory location.
Not portably. But some compilers (usually for the embedded world) have extensions to do it.
For example on IAR compiler (here for MSP430), you can do this:
static const char version[] @ 0x1000 = "v1.0";
This will put object version at memory address 0x1000.
You can do this on Windows with a mingw64 setup in Visual Studio Code; here is my code:
#include<stdio.h>
int main()
{
int *c;
c = (int *)0x000000000061fe14; // Point c at a fixed address; only valid if this process may access it
*c = 0; // Zero the int stored at that address
*c = 0x10; // Assign a hex value (a decimal integer also works, without '0x')
printf("%p\n",(void *)c); // Prints the address stored in the pointer c
printf("%x\n",*c); // Prints the assigned value 0x10 in hex
}
This was tested in the environment mentioned above. Hope this helps. Happy coding!
No.
Even if you could, 7200 is not a pointer (memory address), it's an int, so that wouldn't work anyway.
There's probably no way to determine which address a variable will have. But as a last hope for you, there is something called "pointer", so you can modify a value on address 7200 (although this address will probably be inaccessible):
int *i = (int *)7200;
*i = 10;
Use an ldscript/linker command file. This will, however, assign the address at link time, not run time.
Linker command file syntax depends largely on the specific compiler, so you will need to search for the linker command file documentation for your compiler.
Approximate pseudo syntax would be somewhat like this:
In linker command file:
.section start=0x1000 length=0x100 myVariables
In C file:
#pragma section myVariables
int myVar=10;
It's not possible, maybe possible with compiler extensions. You could however access memory at an address you want (if the address is accessible to your process):
uintptr_t addr = 7200; // uintptr_t (from <stdint.h>) is wide enough to hold an address
*((int*)addr) = newVal;
I think '&' in &i evaluates to the address of i, which is fixed at compile or link time and is, I think, a virtual address. So &i is not an lvalue according to your compiler. Use a pointer instead.

How does C treat struct assignment

Suppose I have a struct like that:
typedef struct {
char *str;
int len;
} INS;
And an array of that struct.
INS *ins[N] = { &item, &item, ... };
When I try to access its elements, not as pointers but as the structs themselves, are all the fields copied to a temporary local place?
for (int i = 0; i < N; i++) {
INS in = *ins[i];
// internally the above line would be like:
// in.str = ins[i]->str;
// in.len = ins[i]->len;
}
So as I add fields to the structure, does the assignment become a more expensive operation?
Correct, in is a copy of *ins[i].
Never mind your memory consumption, but your code will most likely not be correct: The object in dies at the end of the loop body, and any changes you make to in have no lasting effect!
The structure assignment behaves like a memcpy. Yes, it is more expensive for a larger structure. Paradoxically, the larger your structure becomes, the harder it is to measure the additional expense of adding another field.
Yes, structs have value semantics in C. So assigning one struct to another will result in a member-wise copy. Keep in mind that pointer members will still point to the same objects.
The compiler may optimize away the copy of the structure and instead either access members directly from the array to supply the values needed in your C code that uses the copy or may copy just the individual members you use. A good compiler will do this.
Storing values via pointers can interfere with this optimization. For example, suppose your routine also has a pointer to int, p. When the compiler processes your code INS in = *ins[i], it could “think” something like this: “Copying ins[i] is expensive. Instead, I will just remember that in is a copy, and I will fetch members for it later, when they are used.” However, if your code contains *p = 3, this could change ins[i], unless the compiler is able to deduce that p does not point into ins[i]. (There is a way to help the compiler make that deduction, with the restrict keyword.)
In summary: Operations that look expensive on the surface might be implemented efficiently by a good compiler. Operations that look cheap might be expensive (writing to *p breaks a big optimization). Generally, you should write code that clearly expresses your algorithm and let the compiler optimize.
To expand on how the compiler might optimize this. Suppose you write:
for (int i = 0; i < N; i++) {
INS in = *ins[i];
...
}
where the code in “...” accesses in.str and in.len but not any of the other 237 members you add to the INS struct. Then the compiler is free to, in effect, transform this code into:
for (int i = 0; i < N; i++) {
char *str = ins[i]->str;
int len = ins[i]->len;
...
}
That is, even though you wrote a statement that, on the surface, copies all of an INS struct, the compiler is only required to copy the parts that are actually needed. (Actually, it is not even required to copy those parts. It is only required to produce a program that gets the same results as if it had followed the source code directly.)

Why am i able to access other variables using array indexing?

Here len is at A[10] and i is at A[11]. Is there a way to catch these errors??
I tried compiling with gcc -Wall -W but no warnings are displayed.
int main()
{
int A[10];
int i, len;
len = sizeof(A) / sizeof(0[A]);
printf("Len = %d\n",len);
for(i = 0; i < len; ++i){
A[i] = i*19%7;
}
A[i] = 5;
A[i + 1] = 6;
printf("Len = %d i = %d\n",len,i);
return 0;
}
Output :
Len = 10
Len = 5 i = 6
You are accessing memory outside the bounds of the array; in C, there is no bounds checking done on array indices.
Accessing memory beyond the end of the array technically results in undefined behavior. This means that there are no guarantees about what happens when you do it. In your example, you end up overwriting the memory occupied by another variable. However, undefined behavior can also cause your application to crash, or worse.
Is there a way to catch these errors?
The compiler can catch some errors like this, but not many. It is often impossible to catch this sort of error at compile-time and report a warning.
Static analysis tools can catch other instances of this sort of error and are usually built to report warnings about code that is likely to cause this sort of error.
C does not generally do bounds-checking, but a number of people have implemented bounds-checking for C. For instance there is a patch for GCC at http://sourceforge.net/projects/boundschecking/. Of course bounds-checking does have some overhead, but it can often be enabled and disabled on a per-file basis.
The array allocation of A is adjacent in memory to i and len. Remember that when you address via an array, it's exactly like using pointers, and you're walking off the end of the array, bumping into the other stuff you put there.
C by default does not do bounds checking. You're expected to be careful as a programmer; in exchange you get speed and size benefits.
Usually external tools, like lint, will catch the problems via static code analysis. Some libraries, depending on compiler vendor, will add additional padding or memory protection to detect when you've walked off the end.
Lots of interesting, dangerous, and non-portable things reside in memory at "random spots." Most of the house keeping for heap memory allocations occur in memory locations before the one the compiler gives you.
The general rule is that if you didn't allocate or request it, don't mess with it.
i's location in memory is just past the end of A. That's not guaranteed with every compiler and every architecture, but most probably wouldn't have a reason to do it any other way.
Note that counting from 0 to 9, you have 10 elements.
Array indexing starts from 0. Hence the highest valid index is one less than the declared size: for int A[10], the valid indices are 0 to 9. You are overwriting memory beyond what is allowed.
These errors may not be reported as warnings, but you can use tools like Prevent, Sparrow, Klocwork, or Purify to find such "malpractices", if I may call them that.
The short answer is that local variables are allocated on the stack, and indexing is just like *(ptr + index). So it could happen that the room for int y[N] is adjacent to the room for another int x; e.g. x is located after the last y. So, y[N-1] is this last y, while y[N] is the int past the last y, and in this case, by accident, it happens that you get x (or whatever is there in your practical example). But what you get when going past the bounds of an array is by no means certain, so you can't rely on it. Even though undetected, it's an "index out of bounds" error, and a source of bugs.
