PTX arrays as operands not working

Section 6.4.2 of the PTX manual (version 2.3) (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_2.3.pdf) states:
Array elements can be accessed using an explicitly calculated byte
address, or by indexing into the array using square-bracket notation.
The expression within square brackets is either a constant integer, a
register variable, or a simple “register with constant offset”
expression, where the offset is a constant expression that is either
added or subtracted from a register variable. If more complicated
indexing is desired, it must be written as an address calculation
prior to use.
ld.global.u32 s, a[0];
ld.global.u32 s, a[N-1];
mov.u32 s, a[1]; // move address of a[1] into s
When I try this, I can only get the pointer-plus-byte-offset form to work, i.e. [a+0].
This code fails to load:
.reg .f32 f<1>;
.global .f32 a[10];
ld.global.f32 f0,a[0];
Whereas this loads fine:
.reg .f32 f<1>;
.global .f32 a[10];
ld.global.f32 f0,[a+0];
The problem with the byte-offset version is that it really is a byte offset, so one has to take the size of the element type into account: the second element is [a+4], whereas a[1] is supposed to work this out for you.
Ideas what's going wrong?
EDIT
And there is an even more severe issue involved here: the text above states that a register variable can be used to index the array, like:
ld.global.f32 f0,a[u0];
where u0 is presumably a .reg .u32 or some other compatible integer.
However, with the pointer plus byte offset method this is not possible. It is illegal to do something like:
mul.u32 u1,u0,4;
ld.global.f32 f0,[a+u1]; // here a reg variable is not allowed.
Now this is a severe limitation. One can, however, do another address calculation prior to the load statement, but this complicates things.

This does not seem to fit the PTX documentation you quoted, but you can multiply the index by the size of the items in your array. For instance, to get the 32-bit word at index 10 (the 11th word):
ld.const.u32 my_u32, [my_ptr + 10 * 4];

Related

How exactly array type is stored in C?

So I've been reading Brian W. Kernighan and Dennis M. Ritchie's "The C Programming Language" and everything was clear until I got to the section on arrays and pointers. The first thing we read is that, by definition, a[i] is converted by C to *(a+i). Okay, this is clear and logical. The next thing is that when we pass an array as a function parameter, we actually pass a pointer to the first element of that array. Then we find out that we can add integers to such a pointer, and that it is even valid to have a pointer to one past the last element of the array. But then it's written that we can subtract pointers only within the same array.
So how does C 'know' whether these two pointers point into the same array? Is there some metainformation associated with the array? Or does it just mean that this is undefined behavior and the compiler won't even generate a warning? Is an array stored in memory as just ordinary values of the element type, one after another, or is there something else?
One reason the C standard only defines subtraction for two pointers if they are in the same array is that some (mostly old) C implementations use a form of addressing in which an address consists of a base address plus an offset, and different arrays may have different base addresses.
In some machines, a full address in memory may consist of a base that is a number of segments or other blocks of some sort and an offset that is a number of bytes within that block. This was done because, for example, some early hardware would work with data in 16-bit pieces and was designed to work with 16-bit addresses, but later versions of hardware extending the same architecture would have larger addresses but would still use 16-bit pieces of data in order to keep some compatibility with previous software. So the newer hardware might have a 22-bit address space. Old software using just 16-bit addresses would still behave the same, but newer software could use an additional piece of data to specify different base addresses and thereby access all memory in the 22-bit address space.
In such a system, the combination of a base b and an offset o might refer to memory address 64·b + o. This gives access to the full 22-bit address space: with b = 65535 and o = 63, we have 64·b + o = 64·65535 + 63 = 4,194,303 = 2^22 − 1.
Observe that many locations can be accessed by multiple addresses. For example, b=17, o=40 refers to the same location as b=16, o=104 and as b=15, o=168 (all three give 1128). Although the formula for making a 22-bit address could have been designed as 65536·b + o (with b limited to 6 bits), which would have given each memory location a unique address, the overlapping formula was used because it gives programmers flexibility in choosing their base. Recall that these machines were largely designed around using 16-bit pieces of data. With the non-overlapping address scheme, you would have to calculate both the base and the offset whenever doing address arithmetic. With the overlapping address scheme, you can choose a base for an array you are working with, and then any address arithmetic requires calculating only with the offset part.
A C implementation for this architecture can easily support arrays of up to 65,536 bytes by setting one base address for the array and then doing arithmetic only with the offset part. For example, if we have an array A of 1000 int, and it is allocated starting at memory location 78,976 (equal to 1234·64), we can set b to 1234 and index the array with offsets from 0 to 1998 (999·2, since each int is two bytes in this C implementation).
Then, if we have a pointer p pointing to A[125], it is represented with (1234, 250), to point to offset 250 with base 1234. And if q points to A[55], it is represented with (1234, 110). To subtract these pointers, we ignore the base, subtract the offsets, and divide by the size of one element, so the result is (250-110)/2 = 70.
Now, if you have a pointer r pointing to element 13 in some other array B, it is going to have a different base, say 2345. So r would be represented with (2345, 26). Then, to subtract r from p, we need to subtract (2345, 26) from (1234, 250). In this case, you cannot ignore the bases; simply working with the offsets would give (250−26)/2 = 112, but these items are not 112 elements (or 224 bytes) apart.
The compiler could be made to do the math by subtracting the bases, multiplying by 64, adding that to the difference of the offsets, and then dividing by the element size. But then it would be doing math to subtract pointers that is completely unnecessary in the intended uses of pointer arithmetic. So the C standard committee decided a compiler should not be required to support this, and the way to specify that is to say that the behavior is undefined when you subtract pointers to elements of different arrays.
... it's written that we can subtract pointers only in the same array.
So how does C 'know' if these two pointers point to the same array?
C does not know that. It is the programmer's responsibility to stay within the limits.
int arr[100];
int *p1 = arr + 30;
int *p2 = arr + 50;
//both p1 and p2 point into arr
p2 - p1; //ok
p1 - p2; //ok
int *p3 = &(int){42}; // C99 compound literal, unrelated to arr
//p3 does not point into arr
p3 - p1; //nope! undefined behavior

Finding the difference between the addresses of elements in an array

I have an exam revision question on pointer arithmetic, and one part, where we subtract the addresses of two array elements, is not making sense to me. (Well, one of them is actually a pointer that equals the array itself.) I understand the individual outputs for each address, and in this case the difference between the two addresses is 16, given that an int is 4 bytes on this OS.
What I don't understand is why the subtraction gives 4.
My logic would be that they are 4 positions apart in the array, but this doesn't make sense to me.
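#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int oddNums[5] = {1, 3, 5, 7, 9};
    int *ip = oddNums;
    /* %p expects a void *, and a pointer difference has type ptrdiff_t (%td) */
    printf("&oddNums[4] %p - ip %p = %td\n", (void *)&oddNums[4], (void *)ip, &oddNums[4] - ip);
    /* the original run printed addresses 2686740 and 2686724 (16 bytes apart) and a difference of 4 */
    return EXIT_SUCCESS;
}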
Subtraction returns 4 because it returns its result in terms of sizeof(<array-element>). This is done to make subtraction an inverse of addition, which also operates in terms of array element size.
Recall that if a is an array and i is an integer, then a+i is the same as &a[i], so the addition must consider the size of the element. In order to follow the rules of the math, the subtraction must divide out the size of an element as well.
This makes pointer arithmetic a lot easier, because addition and subtraction take care of the size of the array element. Without this rule, one would need to keep dividing or multiplying the results of addition or subtraction by the size of an element in order to get the address of the desired element or to get the offset. This is error-prone, and it is also hard to read. Finally, it would create a maintenance nightmare when you change the element size from one byte to several bytes and whoever coded the algorithm has forgotten to multiply or divide by sizeof.
Pointer subtraction is defined to give the difference between the two pointers as a number of elements.
It's similar to adding a pointer to an integer: it means to advance the pointer by that number of elements.
Make sure you are thinking of "pointer" as something that tells you where to find an object of a certain type. (As opposed to thinking of it as an integer representing a memory address).

Subtle differences in C pointer addresses

What is the difference between:
*((uint32_t*)(p) + 4);
*(uint32_t*)(p+4);
or is there even a difference in the value?
My intuition is that in the latter example the value starts at index 4 of the array that p points to and takes the 4 bytes starting there, while in the first example it takes one byte every 4 indices. Is this intuition correct?
The p+4 expression computes the address by adding 4*sizeof(*p) bytes to the value of p, while (uint32_t*)(p) + 4 casts first and therefore adds 4*sizeof(uint32_t) bytes. If the size of *p is the same as that of uint32_t, there is no difference between the results of these two expressions.
Given that p is an int pointer, and assuming that int on your system is 32-bit, your two expressions produce the same result.

+= Operator Chaining (with a dash of UB)

I understand there is no sequence point here before the semicolon, but is there a plausible explanation for the dereferenced pointer to use the old value 2 in the expression?
Or can it be simply put down as undefined behaviour?
int i=2;
int *x=&i;
*x+=*x+=i+=7;
Result:
i= 13
It is "simply" undefined behavior.
That said, the compiler probably emits code that reads the value of i once then performs all the arithmetic, then stores the new value of i.
The obvious way to find out the real explanation would be to go look at the assembly generated by the compiler.
Although the behaviour is undefined, this particular result can plausibly be explained by the way the compiler breaks down the expression and pushes the intermediate results onto the stack. The two *x operands are evaluated first (both equal 2) and pushed onto the stack. Then i has 7 added to it and equals 9. Then the second *x, which still equals 2, is popped off the stack and added, to make 11. Then the first *x is popped off the stack and added to the 11 to make 13.
Look up Reverse Polish Notation for hints on what is going on here.

C array address confusion

Say we have the following code:
#include <stdio.h>

int main(void){
int a[3]={1,2,3};
printf(" E: %p\n", (void *)a);
printf(" &E[2]: %p\n", (void *)&a[2]);
printf("&E[2]-E: 0x%tx\n", &a[2] - a); // the difference has type ptrdiff_t
return 0;
}
When compiled and run, the results are as follows:
E: 0xbf8231f8
&E[2]: 0xbf823200
&E[2]-E: 0x2
I understand the result of &E[2], which is the array's address plus 8, since it is indexed by 2 and of type int (4 bytes on my 32-bit system), but I can't figure out why the last line is 2 instead of 8.
In addition, what type should the last line be: an integer or an integer pointer?
Is it the C type system (some kind of casting) that causes this quirk?
You have to remember what the expression a[2] really means. It is exactly equivalent to *(a+2). So much so, that it is perfectly legal to write 2[a] instead, with identical effect.
For that to work and make sense, pointer arithmetic takes into account the type of the thing pointed at. But that is taken care of behind the scenes. You get to simply use natural offsets into your arrays, and all the details just work out.
The same logic applies to pointer differences, which explains your result of 2.
Under the hood, in your example the index is multiplied by sizeof(int) to get a byte offset which is added to the base address of the array. You expose that detail in your two prints of the addresses.
When subtracting pointers of the same type the result is number of elements and not number of bytes. This is by design so that you can easily index arrays of any type. If you want number of bytes - cast the addresses to char*.
When you increment a pointer by 1 (p+1), the pointer points to the next element: sizeof(Type) bytes are added to the address in p (if Type is int, sizeof(int) bytes).
The same logic holds for p-1, except that the bytes are subtracted.
If you apply those principles here, in simple terms:
&a[2] can be represented as (a+2)
&a[2] - a ==> (a+2) - (a) ==> 2
So, behind the scenes,
&a[2] - &a[0]
==> { ((char *)a + 2*sizeof(int)) - ((char *)a + 0) } / sizeof(int)
==> 2*sizeof(int) / sizeof(int) ==> 2
The expression &E[2]-E is doing pointer subtraction, not integer subtraction. Pointer subtraction (when both pointers point to data of the same type) returns the difference of the addresses divided by the size of the type they point to. The result has type ptrdiff_t, a signed integer type.
To answer your "update" question, once again pointer arithmetic (this time pointer addition) is being performed. It's done this way in C to make it easier to "index" a chunk of contiguous data pointed to by the pointer.
You may be interested in the "Pointer Arithmetic In C" question and its answers.
Basically, the + and - operators take the element size into account when used on pointers.
When adding and subtracting pointers in C, the arithmetic is done in units of the size of the data type rather than in bytes.
If you have an int pointer and add the number 2 to it, it will advance 2 * sizeof(int). In the same manner, if you subtract two int pointers, you will get the result in units of sizeof(int) rather than the difference of the absolute addresses.
(Having pointers using the size of the data type is quite convenient, so that you for example can simply use p++ instead of having to specify the size of the type every time: p+=sizeof(int).)
Re: "In addition, what type should the last line be? An integer, or an integer pointer?"
An integer (a number), by the same token that Today - April 1 is a number of days, not a date.
If you want to see the byte difference, you'll have to cast to a type that is 1 byte in size, like this:
printf("&E[2]-E:\t0x%tx\n", (char *)&a[2] - (char *)&a[0]);
