Why does the indexing in an array start with zero in C and not with 1?
In C, the name of an array is essentially a pointer [but see the comments], a reference to a memory location, and so the expression array[n] refers to a memory location n elements away from the starting element. This means that the index is used as an offset. The first element of the array is contained exactly in the memory location that array refers to (0 elements away), so it should be denoted as array[0].
For more info:
http://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html
This question was posted over a year ago, but here goes...
About the above reasons
While Dijkstra's article (previously referenced in a now-deleted answer) makes sense from a mathematical perspective, it isn't as relevant when it comes to programming.
The decision taken by the language specification and compiler designers is based on the decision made by computer system designers to start counting at 0.
The probable reason
Quoting from "On Holy Wars and a Plea for Peace" by Danny Cohen.
IEEE Link
IEN-137
For any base b, the first b^N
non-negative integers are represented by exactly N digits (including
leading zeros) only if numbering starts at 0.
This can be tested quite easily. In base-2, take 2^3 = 8
The 8th number is:
8 (binary: 1000) if we start count at 1
7 (binary: 111) if we start count at 0
111 can be represented using 3 bits, while 1000 will require an extra bit (4 bits).
Why is this relevant
Computer memories have 2^N cells addressed by N bits. Now if we start counting at 1, 2^N cells would need N+1 address lines. The extra bit is needed to access exactly 1 address (1000 in the above case). Another way to solve it would be to leave the last address inaccessible, and use N address lines.
Both are sub-optimal solutions, compared to starting count at 0, which would keep all addresses accessible, using exactly N address lines!
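The bit-count argument above can be checked with a minimal sketch. The function names bits_needed and address_lines are made up for this illustration and do not come from Cohen's paper.

```c
#include <assert.h>

/* Number of binary digits needed to write the non-negative value v
   (at least 1, so that 0 is written as "0"). */
unsigned bits_needed(unsigned v)
{
    unsigned bits = 1;
    while (v >>= 1)
        bits++;
    return bits;
}

/* Address lines needed for `cells` memory cells when the first cell
   is numbered `first` (0 or 1): enough bits for the largest address. */
unsigned address_lines(unsigned cells, unsigned first)
{
    assert(cells > 0);
    return bits_needed(first + cells - 1);
}
```

With 8 cells, address_lines(8, 0) evaluates to 3 and address_lines(8, 1) evaluates to 4, matching the 111 versus 1000 comparison.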
Conclusion
The decision to start counting at 0 has since permeated all digital systems, including the software running on them, because it makes it simpler for the code to translate to what the underlying system can interpret. If it weren't so, there would be one unnecessary translation operation between the machine and the programmer for every array access. It makes compilation easier.
Quoting from the paper:
Who's on first? Zero or one?
People start counting from the number one. The very word first is abbreviated as 1st, which indicates one. This, however, is a very modern notation. The older concepts do not necessarily support this relationship. In English and French the word first is not derived from the word one, but from an old word for prince, which means foremost. Similarly, the English word second is not derived from the number two but from an old word which means "to follow." Obviously, there is a close relation between third and three, fourth and four, and so on. These relationships occur in other language families, also. In Hebrew, for example, first is derived from the word head, meaning "the foremost." The Hebrew word for second is derived from the word two, this relationship of ordinal and cardinal names holds for all the other numbers. For a very long time, people have counted from one, not from zero. As a matter of fact, the inclusion of zero as a full-fledged member of the set of all numbers is a relatively modern concept, even though it is one of the most important numbers mathematically. It has many important properties, such as being a multiple of any integer. A nice mathematical theorem states that for any basis b the first bⁿ positive integers are represented by exactly n digits (leading zeros included). This is true if and only if the count starts with zero (hence, 0 through bⁿ-1), not with one (for 1 through bⁿ). This theorem is the basis of computer memory addressing. Typically, 2ⁿ cells are addressed by an N-bit addressing scheme. A count starting from one rather than zero would cause the loss of either one memory cell or an additional address line. Since either price is too expensive, computer engineers agree to use the mathematical notation that starts with zero. Good for them! This is probably the reason why all memories start at address-0, even those of systems that count bits from B1 up. The designers of the 1401 were probably ashamed to have address-0. They hid it from the users and pretended that the memory starts at address-1. Communication engineers, like most people, start counting from one. They never have to suffer the loss of a memory cell, for example. Therefore, they happily count one-to-eight, not zero-to-seven, as computer people do. ref
Because 0 is the distance from the pointer to the head of the array to the array's first element.
Consider:
int foo[5] = {1,2,3,4,5};
To access the first element we do:
foo[0]
But foo decays to a pointer, and the above access has an analogous pointer-arithmetic form:
*(foo + 0)
These days pointer arithmetic isn't used as frequently. Way back when though, it was a convenient way to take an address and move X "ints" away from that starting point. Of course if you wanted to just stay where you are, you just add 0!
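A small sketch of that equivalence, assuming a helper named element_at that is invented here for illustration. In C, a[i] is defined as *(a + i), so both forms read the same element.

```c
#include <assert.h>
#include <stddef.h>

/* Read element i of an int array through explicit pointer arithmetic.
   The subscript form a[i] means exactly *(a + i). */
int element_at(const int *a, size_t len, size_t i)
{
    assert(i < len);      /* stay inside the array */
    return *(a + i);      /* i elements away from the start */
}
```

For the foo array above, element_at(foo, 5, 0) and foo[0] both give 1: adding 0 leaves the pointer where it already is.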
Because 0-based index allows...
array[index]
...to be implemented as...
*(array + index)
If the index were 1-based, the compiler would need to generate *(array + index - 1), and this "-1" could cost extra work on each access unless the compiler manages to fold it into the base address.
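To make the extra step concrete, here is a sketch of what a 1-based accessor would have to do. get_one_based is a hypothetical function written for this illustration; C itself has no such feature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-based accessor: valid indexes run from 1 to len. */
int get_one_based(const int *a, size_t len, size_t index)
{
    assert(index >= 1 && index <= len);
    return *(a + index - 1);   /* the extra "- 1" mentioned above */
}
```

Whether that subtraction costs anything at run time depends on the compiler and the target, so treat this as an illustration of the translation step rather than a performance measurement.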
Because it made the compiler and linker simpler (easier to write).
Reference:
"...Referencing memory by an address and an offset is represented directly in hardware on virtually all computer architectures, so this design detail in C makes compilation easier"
and
"...this makes for a simpler implementation..."
The array index always starts with zero. Let us assume the base address is 2000. Now arr[i] = *(arr+i). If i = 0, this means *(2000+0), which refers to the base address, i.e. the address of the first element in the array. The index is treated as an offset, so by default the index starts from zero.
For the same reason that, when it's Wednesday and somebody asks you how many days til Wednesday, you say 0 rather than 1, and that when it's Wednesday and somebody asks you how many days until Thursday, you say 1 rather than 2.
I am from a Java background. I have presented my answer to this question in the diagram below, which I drew on a piece of paper and which is self-explanatory.
Main Steps:
Creating Reference
Instantiation of Array
Allocation of Data to array
Also note that when an array is just instantiated, zero is allocated to all the blocks by default until we assign values to them.
The array starts with zero because the first address will be pointing to the reference (i.e. X102+0 in the image).
Note: the blocks shown in the image are a memory representation.
The most elegant explanation I've read for zero-based numbering is an observation that values aren't stored at the marked places on the number line, but rather in the spaces between them. The first item is stored between zero and one, the next between one and two, etc. The Nth item is stored between N-1 and N. A range of items may be described using the numbers on either side. Individual items are by convention described using the number below them. If one is given a range (X,Y), identifying individual items using the number below means that one can identify the first item without using any arithmetic (it's item X) but one must subtract one from Y to identify the last item (Y-1). Identifying items using the number above would make it easier to identify the last item in a range (it would be item Y), but harder to identify the first (X+1).
Although it wouldn't be horrible to identify items based upon the number above them, defining the first item in the range (X,Y) as being the one above X generally works out more nicely than defining it as the one below (X+1).
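The "spaces between the marks" view can be sketched as a range type. The struct and function names below are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A range described by the marks on either side: the items sit between
   mark x and mark y, and each item is named by the mark below it. */
struct range { size_t x, y; };

/* Number of items between the two marks. */
size_t range_count(struct range r) { assert(r.x <= r.y); return r.y - r.x; }

/* First item: no arithmetic needed, it is simply x. */
size_t range_first(struct range r) { assert(r.x < r.y); return r.x; }

/* Last item: one below the upper mark. */
size_t range_last(struct range r) { assert(r.x < r.y); return r.y - 1; }
```

For the range (3,7) this gives 4 items, the first named 3 and the last named 6.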
It is because the address has to point to the right element in the array. Let us assume the below array:
let arr = [10, 20, 40, 60];
Let us now consider the start of the address being 12 and the size of the element be 4 bytes.
address of arr[0] = 12 + (0 * 4) => 12
address of arr[1] = 12 + (1 * 4) => 16
address of arr[2] = 12 + (2 * 4) => 20
address of arr[3] = 12 + (3 * 4) => 24
If it were not zero-based, the same formula would put our first element at address 16, which is wrong, as its location is 12.
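The same address arithmetic as a minimal C sketch. The function name element_address is an assumption for this example, and addresses are modelled as plain numbers rather than real pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element `index`, given the base address and the element
   size, with zero-based indexing: base + index * size. */
size_t element_address(size_t base, size_t elem_size, size_t index)
{
    assert(elem_size > 0);
    return base + index * elem_size;
}
```

With base 12 and 4-byte elements, index 0 gives 12 and index 3 gives 24. Feeding index 1 for the "first" element would give 16 instead.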
The technical reason might derive from the fact that the pointer to an array holds the address of the array's first element. If you access that pointer with an index of one, the program would normally add that value of one to the pointer before reading the content, which would skip the first element and is not what you want, of course.
Try to access a pixel on a screen using X,Y coordinates on a 1-based matrix. The formula gets needlessly complex. Why is it complex? Because you end up converting the X,Y coordinates into one number, the offset. Why do you need to convert X,Y to an offset? Because that's how memory is organized inside computers: as a continuous stream of memory cells (arrays). How do computers deal with array cells? Using offsets (displacements from the first cell, a zero-based indexing model).
So at some point in the code you need (or the compiler needs) to convert the 1-based formula to a 0-based formula, because that's how computers deal with memory.
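A sketch of the two offset formulas, assuming a row-major pixel buffer with `width` pixels per row. The function names are made up for this comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of pixel (x, y) in a row-major buffer, zero-based coordinates. */
size_t offset_zero_based(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return y * width + x;
}

/* The same pixel addressed with one-based coordinates: both coordinates
   have to be shifted down before the offset can be formed. */
size_t offset_one_based(size_t x, size_t y, size_t width)
{
    assert(x >= 1 && x <= width && y >= 1);
    return (y - 1) * width + (x - 1);
}
```

On a 640-pixel-wide screen, offset_zero_based(2, 1, 640) and offset_one_based(3, 2, 640) both yield 642. The one-based version just carries two extra subtractions to get there.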
In an array, the index tells the distance from the starting element. The first element is at distance 0 from the starting element, and that's why arrays start from 0.
Suppose we want to create an array of size 5:
int array[5] = {2, 3, 5, 9, 8};
Let the 1st element of the array be stored at location 100,
and suppose that indexing starts from 1, not from 0.
Now we have to find the location of the 1st element with the help of the index
(remember that the location of the 1st element is 100).
Assuming the size of an integer is 4 bytes,
with index 1 the offset would be
index (1) * size of integer (4) = 4
so the position we would compute is
100 + 4 = 104
which is wrong, because the 1st element is at 100, not at 104.
Now suppose that indexing starts from 0.
Then the offset of the 1st element is index (0) * size of integer (4) = 0
and therefore
the location of the 1st element is 100 + 0 = 100
which is the actual location of the element.
This is why indexing starts at 0.
First of all, you need to know that arrays are internally treated as pointers, because the "name of the array itself contains the address of the first element of the array".
Example: int arr[2] = {5,4};
Suppose the array starts at address 100,
so the first element will be at address 100 and the second at 104.
Now,
suppose the array index started from 1, so
arr[1]:-
this can be written as an address expression like this (the additions below are in bytes; in C source the scaling by the element size is implicit):
arr[1] = *(arr + 1 * (size of a single element of the array));
Taking the size of int as 4 bytes,
arr[1] = *(arr + 1 * (4));
arr[1] = *(arr + 4);
Since the array name contains the address of its first element, arr = 100.
Now,
arr[1] = *(100 + 4);
arr[1] = *(104);
which gives
arr[1] = 4;
With this expression we are unable to access the element at address 100, which is the actual first element.
Now suppose the array index starts from 0, so
arr[0]:-
this will be resolved as
arr[0] = *(arr + 0 * (size of a single element of the array));
arr[0] = *(arr + 0 * 4);
arr[0] = *(arr + 0);
arr[0] = *(arr);
Again, the array name contains the address of its first element,
so
arr[0] = *(100);
which gives the correct result
arr[0] = 5;
Therefore the array index always starts from 0 in C.
Reference: all the details are in the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie.
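The byte-level steps above can be observed from C itself. This is a sketch: byte_offset is a name invented here, and the actual size of int depends on the platform, so the code uses sizeof(int) rather than a hard-coded 4.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from the start of the array to element i.
   arr + i is scaled by sizeof(int) by the compiler, so no explicit
   multiplication appears in the source. */
size_t byte_offset(const int *arr, size_t len, size_t i)
{
    assert(i <= len);   /* one past the end is still a valid pointer */
    return (size_t)((const char *)(arr + i) - (const char *)arr);
}
```

For int arr[2] = {5,4}, byte_offset(arr, 2, 0) is 0 and byte_offset(arr, 2, 1) is sizeof(int), which corresponds to addresses 100 and 104 in the walkthrough when int is 4 bytes.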
The array name acts as a constant pointer to the base address. When you use arr[i], the compiler treats it as *(arr+i). The index is a signed offset: in an 8-bit signed type, for example, -128 to -1 are negative and 0 to 127 are non-negative. The smallest non-negative offset is zero, and it designates the first element, so the array index starts with zero.
Related
Why does the indexing in an array start with zero in C and not with 1?
In C, the name of an array is essentially a pointer [but see the comments], a reference to a memory location, and so the expression array[n] refers to a memory location n elements away from the starting element. This means that the index is used as an offset. The first element of the array is exactly contained in the memory location that array refers (0 elements away), so it should be denoted as array[0].
For more info:
http://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html
This question was posted over a year ago, but here goes...
About the above reasons
While Dijkstra's article (previously referenced in a now-deleted answer) makes sense from a mathematical perspective, it isn't as relevant when it comes to programming.
The decision taken by the language specification & compiler-designers is based on the
decision made by computer system-designers to start count at 0.
The probable reason
Quoting from a Plea for Peace by Danny Cohen.
IEEE Link
IEN-137
For any base b, the first b^N
non-negative integers are represented by exactly N digits (including
leading zeros) only if numbering starts at 0.
This can be tested quite easily. In base-2, take 2^3 = 8
The 8th number is:
8 (binary: 1000) if we start count at 1
7 (binary: 111) if we start count at 0
111 can be represented using 3 bits, while 1000 will require an extra bit (4 bits).
Why is this relevant
Computer memory addresses have 2^N cells addressed by N bits. Now if we start counting at 1, 2^N cells would need N+1 address lines. The extra-bit is needed to access exactly 1 address. (1000 in the above case.). Another way to solve it would be to leave the last address inaccessible, and use N address lines.
Both are sub-optimal solutions, compared to starting count at 0, which would keep all addresses accessible, using exactly N address lines!
Conclusion
The decision to start count at 0, has since permeated all digital systems, including the software running on them, because it makes it simpler for the code to translate to what the underlying system can interpret. If it weren't so, there would be one unnecessary translation operation between the machine and programmer, for every array access. It makes compilation easier.
Quoting from the paper:
Who's on first? Zero or one?
People start counting from the number one. The very word first is abbreviated as 1st, which indicates one. This, however, is a very modern notation. The older concepts do not necessarily support this relationship. In English and French the word first is not derived from the word one, but from an old word for prince, which means foremost. Similarly, The English word second is not derived from the number two but from an old word which means "to follow." Obviously, there is a close relation between third and three, fourth and four, and so on. These relationships occur in other language families, also. In Hebrew, for example, first is derived from the word head, meaning "the foremost." The Hebrew word for second is derived from the word two, thisrelationship of ordinal and cardinal names holds for all the other numbers. For a very long time, people have counted from one, not from zero, As a matter of fact, the inclusion of zero as a full-fledged member of the set of all numbers is a relatively modern concept, even though it is one of the most important numbers mathematically. It has many important properties, such as being a multiple of any integer. A nice mathematical theorem states that for any basis b the first bⁿ positive integers are represented by exactly n digits (leading zeros included). This is true if and only if the count starts with zero (hence, 0 through bⁿ-1), not with one (for 1 through bⁿ). This theorem is the basis of computer memory ad dressing. Typically, 2ⁿ cells are addressed by an N-bit addressing scheme. A count starting from one rather than zero would cause the loss of either one memory cell or an additional address line. Since either price is too expensive, computer engineers agree to use the mathematical notation that starts with zero. Good for them! This is probably the reason why all memories start at address-0, even those of systems that count bits from B1 up. The designers of the 1401 were probably ashamed to have address-0. 
They hid it from the users and pretended that the memory starts at address-1. Communication engineers, like most people, start counting from one. They never have to suffer the loss of a memory cell, for example. Therefore, they happily count one-to-eight, not zero-to-seven, as computer people do. ref
Because 0 is how far from the pointer to the head of the array to the array's first element.
Consider:
int foo[5] = {1,2,3,4,5};
To access 0 we do:
foo[0]
But foo decomposes to a pointer, and the above access has analogous pointer arithmetic way of accessing it
*(foo + 0)
These days pointer arithmetic isn't used as frequently. Way back when though, it was a convenient way to take an address and move X "ints" away from that starting point. Of course if you wanted to just stay where you are, you just add 0!
Because 0-based index allows...
array[index]
...to be implemented as...
*(array + index)
If index were 1-based, compiler would need to generate: *(array + index - 1), and this "-1" would hurt the performance.
Because it made the compiler and linker simpler (easier to write).
Reference:
"...Referencing memory by an address and an offset is represented directly in hardware on virtually all computer architectures, so this design detail in C makes compilation easier"
and
"...this makes for a simpler implementation..."
Array index always starts with zero.Let assume base address is 2000. Now arr[i] = *(arr+i). Now if i= 0, this means *(2000+0)is equal to base address or address of first element in array. this index is treated as offset, so bydeafault index starts from zero.
For the same reason that, when it's Wednesday and somebody asks you how many days til Wednesday, you say 0 rather than 1, and that when it's Wednesday and somebody asks you how many days until Thursday, you say 1 rather than 2.
I am from a Java background. I Have presented answer to this question in the diagram below which i have written in a piece of paper which is self explanatory
Main Steps:
Creating Reference
Instantiation of Array
Allocation of Data to array
Also note when array is just instantiated .... Zero is allocated to
all the blocks by default until we assign value for it
Array starts with zero because first address will be pointing to the
reference (i:e - X102+0 in image)
Note: Blocks shown in the image is memory representation
The most elegant explanation I've read for zero-based numbering is an observation that values aren't stored at the marked places on the number line, but rather in the spaces between them. The first item is stored between zero and one, the next between one and two, etc. The Nth item is stored between N-1 and N. A range of items may be described using the numbers on either side. Individual items are by convention described using the numbers below it. If one is given a range (X,Y), identifying individual numbers using the number below means that one can identify the first item without using any arithmetic (it's item X) but one must subtract one from Y to identify the last item (Y-1). Identifying items using the number above would make it easier to identify the last item in a range (it would be item Y), but harder to identify the first (X+1).
Although it wouldn't be horrible to identify items based upon the number above them, defining the first item in the range (X,Y) as being the one above X generally works out more nicely than defining it as the one below (X+1).
It is because the address has to point to the right element in the array. Let us assume the below array:
let arr = [10, 20, 40, 60];
Let us now consider the start of the address being 12 and the size of the element be 4 bytes.
address of arr[0] = 12 + (0 * 4) => 12
address of arr[1] = 12 + (1 * 4) => 16
address of arr[2] = 12 + (2 * 4) => 20
address of arr[3] = 12 + (3 * 4) => 24
If it was not zero-based, technically our first element address in the array would be 16 which is wrong as it's location is 12.
The technical reason might derive from the fact that the pointer to a memory location of an array is the contents of the first element of the array. If you declare the pointer with an index of one, programs would normally add that value of one to the pointer to access the content which is not what you want, of course.
Try to access a pixel screen using X,Y coordinates on a 1-based matrix. The formula is utterly complex. Why is complex? Because you end up converting the X,Y coords into one number, the offset. Why you need to convert X,Y to an offset? Because that's how memory is organized inside computers, as a continuous stream of memory cells (arrays). How computers deals with array cells? Using offsets (displacements from the first cell, a zero-based indexing model).
So at some point in the code you need (or the compiler needs) to convert the 1-base formula to a 0-based formula because that's how computers deal with memory.
In array, the index tells the distance from the starting element. So, the first element is at 0 distance from the starting element. So, that's why array start from 0.
Suppose we want to create an array of size 5
int array[5] = [2,3,5,9,8]
let the 1st element of the array is pointed at location 100
and let we consider the indexing starts from 1 not from 0.
now we have to find the location of the 1st element with the help of index
(remember the location of 1st element is 100)
since the size of an integer is 4-bit
therefore --> considering index 1 the position would be
size of index(1) * size of integer(4) = 4
so the actual position it will show us is
100 + 4 = 104
which is not true because the initial location was at 100.
it should be pointing to 100 not at 104
this is wrong
now suppose we have taken the indexing from 0
then the position of 1st element should be the size of index(0) * size of integer(4) = 0
therefore -->
location of 1st element is 100 + 0 = 100
and that was the actual location of the element
this is why indexing starts at 0;
first of all you need to know that arrays are internally considered as pointers because the "name of array itself contains the address of the first element of array "
ex. int arr[2] = {5,4};
consider that array starts at address 100
so element first element will be at address 100 and second will be at 104
now,
consider that if array index starts from 1, so
arr[1]:-
this can be written in the pointers expression like this-
arr[1] = *(arr + 1 * (size of single element of array));
consider size of int is 4bytes, now,
arr[1] = *(arr + 1 * (4) );
arr[1] = *(arr + 4);
as we know array name contains the address of its first element so arr = 100
now,
arr[1] = *(100 + 4);
arr[1] = *(104);
which gives,
arr[1] = 4;
because of this expression we are unable to access the element at address 100 which is official first element,
now consider array index starts from 0, so
arr[0]:-
this will be resolved as
arr[0] = *(arr + 0 + (size of type of array));
arr[0] = *(arr + 0 * 4);
arr[0] = *(arr + 0);
arr[0] = *(arr);
now, we know that array name contains the address of its first element
so,
arr[0] = *(100);
which gives correct result
arr[0] = 5;
therefore array index always starts from 0 in c.
reference: all details are written in book "The C programming language by brian kerninghan and dennis ritchie"
Array name is a constant pointer pointing to the base address.When you use arr[i] the compiler manipulates it as *(arr+i).Since int range is -128 to 127,the compiler thinks that -128 to -1 are negative numbers and 0 to 128 are positive numbers.So array index always starts with zero.
Why does the indexing in an array start with zero in C and not with 1?
In C, the name of an array is essentially a pointer [but see the comments], a reference to a memory location, and so the expression array[n] refers to a memory location n elements away from the starting element. This means that the index is used as an offset. The first element of the array is exactly contained in the memory location that array refers (0 elements away), so it should be denoted as array[0].
For more info:
http://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html
This question was posted over a year ago, but here goes...
About the above reasons
While Dijkstra's article (previously referenced in a now-deleted answer) makes sense from a mathematical perspective, it isn't as relevant when it comes to programming.
The decision taken by the language specification & compiler-designers is based on the
decision made by computer system-designers to start count at 0.
The probable reason
Quoting from a Plea for Peace by Danny Cohen.
IEEE Link
IEN-137
For any base b, the first b^N
non-negative integers are represented by exactly N digits (including
leading zeros) only if numbering starts at 0.
This can be tested quite easily. In base-2, take 2^3 = 8
The 8th number is:
8 (binary: 1000) if we start count at 1
7 (binary: 111) if we start count at 0
111 can be represented using 3 bits, while 1000 will require an extra bit (4 bits).
Why is this relevant
Computer memory addresses have 2^N cells addressed by N bits. Now if we start counting at 1, 2^N cells would need N+1 address lines. The extra-bit is needed to access exactly 1 address. (1000 in the above case.). Another way to solve it would be to leave the last address inaccessible, and use N address lines.
Both are sub-optimal solutions, compared to starting count at 0, which would keep all addresses accessible, using exactly N address lines!
Conclusion
The decision to start count at 0, has since permeated all digital systems, including the software running on them, because it makes it simpler for the code to translate to what the underlying system can interpret. If it weren't so, there would be one unnecessary translation operation between the machine and programmer, for every array access. It makes compilation easier.
Quoting from the paper:
Who's on first? Zero or one?
People start counting from the number one. The very word first is abbreviated as 1st, which indicates one. This, however, is a very modern notation. The older concepts do not necessarily support this relationship. In English and French the word first is not derived from the word one, but from an old word for prince, which means foremost. Similarly, The English word second is not derived from the number two but from an old word which means "to follow." Obviously, there is a close relation between third and three, fourth and four, and so on. These relationships occur in other language families, also. In Hebrew, for example, first is derived from the word head, meaning "the foremost." The Hebrew word for second is derived from the word two, thisrelationship of ordinal and cardinal names holds for all the other numbers. For a very long time, people have counted from one, not from zero, As a matter of fact, the inclusion of zero as a full-fledged member of the set of all numbers is a relatively modern concept, even though it is one of the most important numbers mathematically. It has many important properties, such as being a multiple of any integer. A nice mathematical theorem states that for any basis b the first bⁿ positive integers are represented by exactly n digits (leading zeros included). This is true if and only if the count starts with zero (hence, 0 through bⁿ-1), not with one (for 1 through bⁿ). This theorem is the basis of computer memory ad dressing. Typically, 2ⁿ cells are addressed by an N-bit addressing scheme. A count starting from one rather than zero would cause the loss of either one memory cell or an additional address line. Since either price is too expensive, computer engineers agree to use the mathematical notation that starts with zero. Good for them! This is probably the reason why all memories start at address-0, even those of systems that count bits from B1 up. The designers of the 1401 were probably ashamed to have address-0. 
They hid it from the users and pretended that the memory starts at address-1. Communication engineers, like most people, start counting from one. They never have to suffer the loss of a memory cell, for example. Therefore, they happily count one-to-eight, not zero-to-seven, as computer people do. ref
Because 0 is how far from the pointer to the head of the array to the array's first element.
Consider:
int foo[5] = {1,2,3,4,5};
To access 0 we do:
foo[0]
But foo decomposes to a pointer, and the above access has analogous pointer arithmetic way of accessing it
*(foo + 0)
These days pointer arithmetic isn't used as frequently. Way back when though, it was a convenient way to take an address and move X "ints" away from that starting point. Of course if you wanted to just stay where you are, you just add 0!
Because 0-based index allows...
array[index]
...to be implemented as...
*(array + index)
If index were 1-based, compiler would need to generate: *(array + index - 1), and this "-1" would hurt the performance.
Because it made the compiler and linker simpler (easier to write).
Reference:
"...Referencing memory by an address and an offset is represented directly in hardware on virtually all computer architectures, so this design detail in C makes compilation easier"
and
"...this makes for a simpler implementation..."
Array index always starts with zero.Let assume base address is 2000. Now arr[i] = *(arr+i). Now if i= 0, this means *(2000+0)is equal to base address or address of first element in array. this index is treated as offset, so bydeafault index starts from zero.
For the same reason that, when it's Wednesday and somebody asks you how many days til Wednesday, you say 0 rather than 1, and that when it's Wednesday and somebody asks you how many days until Thursday, you say 1 rather than 2.
I am from a Java background. I Have presented answer to this question in the diagram below which i have written in a piece of paper which is self explanatory
Main Steps:
Creating Reference
Instantiation of Array
Allocation of Data to array
Also note when array is just instantiated .... Zero is allocated to
all the blocks by default until we assign value for it
Array starts with zero because first address will be pointing to the
reference (i:e - X102+0 in image)
Note: Blocks shown in the image is memory representation
The most elegant explanation I've read for zero-based numbering is an observation that values aren't stored at the marked places on the number line, but rather in the spaces between them. The first item is stored between zero and one, the next between one and two, etc. The Nth item is stored between N-1 and N. A range of items may be described using the numbers on either side. Individual items are by convention described using the numbers below it. If one is given a range (X,Y), identifying individual numbers using the number below means that one can identify the first item without using any arithmetic (it's item X) but one must subtract one from Y to identify the last item (Y-1). Identifying items using the number above would make it easier to identify the last item in a range (it would be item Y), but harder to identify the first (X+1).
Although it wouldn't be horrible to identify items based upon the number above them, defining the first item in the range (X,Y) as being the one above X generally works out more nicely than defining it as the one below (X+1).
It is because the address has to point to the right element in the array. Let us assume the below array:
let arr = [10, 20, 40, 60];
Let us now consider the start of the address being 12 and the size of the element be 4 bytes.
address of arr[0] = 12 + (0 * 4) => 12
address of arr[1] = 12 + (1 * 4) => 16
address of arr[2] = 12 + (2 * 4) => 20
address of arr[3] = 12 + (3 * 4) => 24
If it was not zero-based, technically our first element address in the array would be 16 which is wrong as it's location is 12.
The technical reason might derive from the fact that a pointer to an array holds the address of the first element of the array. If the first element had an index of one, a program would add that one to the pointer before accessing the content and land on the second element, which is not what you want, of course.
Try to access a pixel on a screen using X,Y coordinates on a 1-based matrix. The formula gets needlessly complex. Why is it complex? Because you end up converting the X,Y coordinates into one number, the offset. Why do you need to convert X,Y to an offset? Because that's how memory is organized inside computers: as a contiguous sequence of memory cells (an array). How do computers deal with array cells? Using offsets (displacements from the first cell, a zero-based indexing model).
So at some point in the code you need (or the compiler needs) to convert the 1-based formula to a 0-based formula, because that's how computers deal with memory.
In an array, the index tells the distance from the starting element. The first element is at distance 0 from the start, which is why array indices start at 0.
Suppose we want to create an array of size 5:
int array[5] = {2, 3, 5, 9, 8};
Let the 1st element of the array be at location 100,
and suppose that indexing starts from 1, not from 0.
Now we have to find the location of the 1st element with the help of its index
(remember, the location of the 1st element is 100).
Since the size of an integer is 4 bytes,
with index 1 the offset would be
index (1) * size of integer (4) = 4
so the position it gives us is
100 + 4 = 104
which is wrong, because the element is actually at 100, not 104.
Now suppose indexing starts from 0.
Then the offset of the 1st element is index (0) * size of integer (4) = 0
therefore
the location of the 1st element is 100 + 0 = 100
and that is the actual location of the element.
This is why indexing starts at 0.
First of all, you need to know that arrays are internally handled through pointers, because the name of an array itself yields the address of the first element of the array.
e.g. int arr[2] = {5,4};
Consider that the array starts at address 100,
so the first element will be at address 100 and the second will be at 104 (assuming a 4-byte int).
now,
consider that if array index starts from 1, so
arr[1]:-
this can be written as a pointer expression like this (working in byte addresses):
arr[1] = *(arr + 1 * (size of a single element of the array));
consider that the size of an int is 4 bytes, now,
arr[1] = *(arr + 1 * (4) );
arr[1] = *(arr + 4);
as we know array name contains the address of its first element so arr = 100
now,
arr[1] = *(100 + 4);
arr[1] = *(104);
which gives,
arr[1] = 4;
With this expression we are unable to access the element at address 100, which is the actual first element.
Now consider that the array index starts from 0, so
arr[0]:-
this will be resolved as
arr[0] = *(arr + 0 * (size of a single element of the array));
arr[0] = *(arr + 0 * 4);
arr[0] = *(arr + 0);
arr[0] = *(arr);
now, we know that array name contains the address of its first element
so,
arr[0] = *(100);
which gives correct result
arr[0] = 5;
Therefore array indices always start from 0 in C.
Reference: all the details are in the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie.
An array name behaves like a constant pointer to the base address. When you use arr[i], the compiler treats it as *(arr+i). With i = 0 this is *arr, the element stored at the base address itself, so array indices start with zero.
I am relatively new to C and am just learning about ways that memory is stored during a program. Can someone please explain why the following code:
#include <stdio.h>
int main(int argc, char** argv){
    float x[3][4];
    printf("%p\n%p\n%p\n%p\n", (void *)&(x[0][0]), (void *)&(x[2][0]), (void *)&(x[2][4]), (void *)&(x[3][0]));
    return 0;
}
outputs this:
0x7fff5386fc40
0x7fff5386fc60
0x7fff5386fc70
0x7fff5386fc70
Why would the first 3 be different places in memory but the last be the same as the third?
Why is there a gap of 0x20 between the first two, but a gap of 0x10 between the second and third? The distance between &(x[2][0]) and &(x[2][4]) doesn't seem like half the distance between &(x[0][0]) and &(x[2][0]).
Thanks in advance.
When you declare an array of size n, the indices range from 0 to n - 1. So x[2][4] and x[3][0] are actually stepping outside the bounds of your arrays.
If you weren't already aware, the multidimensional array you declared is actually an array of arrays.
Your compiler is laying out each array one after the other in memory. So, in memory, your elements are laid out in this order: x[0][0], x[0][1], x[0][2], x[0][3], x[1][0], x[1][1], and so on.
It looks like you already understand how pointers work, so I'll gloss over that. The reason the last two elements are the same is because x[2][4] is out of bounds, so it's referring to the next slot in memory after the end of the x[2] array. That would be the first element of the x[3] array, if there was one, which would be x[3][0].
Now, since x[3][0] refers to an address that you don't have a variable mapping to, it's entirely possible that dereferencing it could cause a segmentation fault. In the context of your program, there just happens to be something stored at 0x7fff5386fc70; in other words, you got lucky.
This is due to pointer arithmetic.
Your array is flat, which means that data are stored in a linear way, each one after the other in memory. First [0][0] then [0][1], etc.
The address of [x][y] is calculated as (x*4+y)*float_size+starting_address.
So the gap between [0][0] and [2][0] is 8*float_size. The difference is 20 in hexadecimal, which is 32 in decimal, so float_size is 4.
Between the second and third you have ((2*4+4)-(2*4))*float_size, which is 16 in decimal, so 10 in hexadecimal. This is exactly half of the previous gap because it is the size of one row (the 4 elements of the third row), whereas the previous gap is the size of two rows (the 8 elements of the first and second rows).
Arrays are linear data structures. Irrespective of their dimension, whether 1-dimensional, 2-dimensional or 3-dimensional, they are arranged linearly.
Your x[3][4] will be stored in memory as consecutive fixed sized cells like :
| (0,0) | (0, 1) | (0,2) | (0,3) | (1,0) | (1,1) | (1,2) | (1,3) | (2,0) | (2,1) | (2,2) | (2,3) |
The x[0][0] notation is matrix notation. At compile time, it is converted to pointer arithmetic. The element offset is calculated like this:
offset of x[i][j] = y * i + j, where y (the number of columns) is 4 in your case.
Calculated this way, the addresses you printed are exactly what one would expect.
Array elements in C are stored contiguously, in row-major order. So, in your example, &x[row][column] is exactly equal to &x[0][0] + ((row*4)+column)*sizeof(float) (when those addresses are converted to a number of bytes, which is what you're outputting).
The third address you're printing has the second index out of bounds (valid values 0 to 3), and the fourth has the first index out of bounds (valid values 0 to 2). It just happens that the values you've chosen work out to the same location in memory, because the rows are laid out in memory end-to-end.
There are 8 elements between &(x[0][0]) and &(x[2][0]). The actual difference in memory is multiplied by sizeof(float) which, for your compiler, is 4. 4*8 is 32 which, when printed as hex, is 0x20, is the difference you're seeing.
If you picked values of row and column where ((row*4)+column) was 12 (=3*4) or more, your code would be computing the address of something outside the array. Attempting to use such a pointer (e.g. setting the value at that address) would give undefined behaviour. Both of your out-of-bounds expressions land at element offset 12, exactly one past the last element; you only printed that address and never read or wrote through it.
I want to know how to find the memory address of a specific index in a three-dimensional array by hand, without writing code. For example, given array[5][5][6], I want to find the memory location of the element array[2][2][2] if the first element of the array is at address 500.
It depends on whether the language you're using specifies row- or column-major order for its array addressing/allocation.
Since you mentioned C in your comments, which uses the row-major scheme, I'll describe that first.
Suppose you have a 3-dimensional array, int arr[X][Y][Z]; where X,Y, and Z are natural numbers.
An example might be a grid split over several pages, where the data of interest has a particular page, row and column.
We're interested in the data stored at a given point, (a,b,c) (i.e. arr[a][b][c]). In a row major system, the point (a,b,c) denotes the ath page, the bth row, and the cth column.
To compute the offset of a given point in that array, using a row-major system, use the following computation:
offset = a*Y*Z + b*Z + c
So, treating the array as a flat sequence of elements starting at &arr[0][0][0]:
arr[a][b][c] == *(&arr[0][0][0] + a*Y*Z + b*Z + c)
In a column-major system, the importance of the indices is reversed. So point (a,b,c) would describe the ath column in the bth row on the cth page.
Accordingly, the offset becomes:
offset = a + b*X + c*X*Y
i.e.:
arr[a][b][c] == *(arr + a + b*X + c*X*Y)
Why does any of this matter? Locality. It determines how you'll loop over the array. In a row-major system, the last index is contiguous, in that you should iterate over it first before increasing the other ones in order to not jump all over the memory space. The opposite is true in a column-major system.
Suppose we have a 6x7x8 array: int arr[6][7][8]; Let's say that arr[0][0][0] is at address 0x1000 and, for simplicity, that each element occupies one address unit. In a row-major system, arr[0][0][1] will be at 0x1001, immediately adjacent to arr[0][0][0].
In a column-major system, though, arr[0][0][1] will be at 0x102A, nowhere near (relatively speaking) arr[0][0][0]. If we were dealing with a large array and unaware of the majority system our language implemented, we could waste a lot of time by leaping all over the memory space unnecessarily.
I don't understand why the address of an element of a 2-dimensional Mat structure at a given point is computed as:
addr(M_{i,j}) = M.data + M.step[0]*i + M.step[1]*j
And why does the following hold?
M.step[i] >= M.step[i+1] (in fact, M.step[i] >= M.step[i+1]*M.size[i+1] )
For example, if we have a 2-dimensional array of size 5x10, the way I know to compute the address for the point (4,7) is the following:
Address = 4 + 7*5
Could someone shed some light on it??
Best regards,
1) The address you are talking about is an index into the array, not an address in computer memory. For example, if you have an array that occupies memory from 10000 to 20000, then the address of the pixel at point (0,0) is 10000, not 0.
2) An image may have more than one channel, and pixel values may use more than one byte. For example, if you have a matrix with 3 channels and the pixel values are ints (i.e. 4 bytes), then step[1] is 3x4=12 bytes. The address of the pixel at (0,5) in such an array will be 10000 + step[0] x 0 + 12 x 5.
3) Your computation also misses the fact that the matrix may not be contiguous in memory, i.e. there may be a gap between the end of one row and the beginning of the next. This is also incorporated in step[0].
Just a recommendation: don't bother too much with all those step computations. If you need to access random pixels in an image, use the function 'at()', and if you work on the rows sequentially, use 'ptr()' to get a pointer to the beginning of the row. This will save you a lot of computations and potential bugs.