Why is the value of this matrix element unknown? - c

This is a question on my exercise book:
If we write int a[][3]={{0},{1},{2}};, the value of the element a[1][2] will be ____.
The key says its value cannot be known.
Since the statement is not guaranteed to be written outside a function, the matrix should not simply be treated as a global variable, whose elements are all initialized to 0. However, I suppose the initializer {{0},{1},{2}} is equivalent to {{0,0,0},{1,0,0},{2,0,0}}, so a[1][2] should be 0. Who is wrong, the key or me?
PS: I wrote this code:
#include <stdio.h>
int main()
{
int a[][3]={{0},{1},{2}};
printf("%d",a[1][2]);
return 0;
}
And its output is exactly 0.

You are correct: the remaining elements are initialized to their default values, 0 in this case.
The relevant quote from the standard:
6.7.9 Initialization
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
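The quoted rule can be checked with a small sketch. The function names element_1_2 and same_as_full_initializer are made up for this illustration; the arrays have automatic storage, which is the case the exercise book is worried about.

```c
#include <assert.h>
#include <string.h>

/* Returns a[1][2], an element that no initializer names explicitly. */
int element_1_2(void)
{
    int a[][3] = {{0}, {1}, {2}};   /* automatic storage, inside a function */
    return a[1][2];
}

/* Returns 1 when the short initializer produces the same contents as the
   fully written-out one, 0 otherwise. */
int same_as_full_initializer(void)
{
    int a[][3]  = {{0}, {1}, {2}};
    int b[3][3] = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}};

    assert(sizeof a == sizeof b);   /* the row count of a is deduced as 3 */
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions only read elements that the initialization rule covers, so the result does not depend on what happened to be on the stack before the call.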

Your answer is right and the key is wrong. The array members that you didn't initialize explicitly are implicitly initialized to 0, and the C standard guarantees this regardless of whether the array is global or declared inside a function.
C11, 6.7.9
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.

The problem is that C has lax rules for how the braces are interpreted: the braces do not specify how many items there are in each array. So you end up with an array of type int [3][3], which may or may not be what you expected.
According to the rules of array initialization, the items in each array that are not initialized explicitly get initialized as if they had static storage duration. That is, to zero.
So you are correct and you can easily prove it by printing the raw contents of the memory, like this:
#include <stdio.h>
#include <inttypes.h>
#include <string.h>
void mem_dump (int a[3][3], size_t size);
int main()
{
int a[][3]={{0},{1},{2}};
printf("a is initialized like this:\n");
mem_dump(a, sizeof(a));
printf("\n");
int rubbish[3][3];
memset(rubbish, 0xAA, sizeof(rubbish)); // fill up with some nonsense character
memcpy(a, rubbish, sizeof(a)); // a guaranteed to contain junk.
printf("a is now filled with junk:\n");
mem_dump(a, sizeof(a));
printf("\n");
memcpy(a, (int[][3]){{0},{1},{2}}, sizeof(a)); // copy back the initialized values
printf("a now contains the initialized values once more:\n");
mem_dump(a, sizeof(a));
return 0;
}
void mem_dump (int a[3][3], size_t size)
{
for (size_t i=0; i<size; i++)
{
printf("%.2" PRIx8 " ", ((uint8_t*)a)[i] );
if( (i+1) % sizeof(int[3]) == 0) // if divisible by the size of a sub array
printf("\n");
}
}
Output:
a is initialized like this:
00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 00 00 00 00 00 00
02 00 00 00 00 00 00 00 00 00 00 00
a is now filled with junk:
aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa aa aa aa aa aa aa aa aa
aa aa aa aa aa aa aa aa aa aa aa aa
a now contains the initialized values once more:
00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 00 00 00 00 00 00
02 00 00 00 00 00 00 00 00 00 00 00

Both are right.
If you don't initialize a local non-static variable, it will have an indeterminate value. But you do initialize the variable a: that is what the "assignment" does. And if you initialize an array with fewer values than it has been declared to have, the rest are initialized to zero.

Related

Increment pointer address and value in C

#include <stdio.h>
int main (void){
int num1=2;
int *pnum=NULL;
pnum=&num1;
*pnum++;
printf("%d",*pnum);
}
Why does this code print the address, not the value? Doesn't * dereference pnum?
Check the precedence of operators: postfix unary operators bind tighter than prefix unary operators, so *pnum++ is equivalent to *(pnum++), not to (*pnum)++.
pnum++ increments the pointer pnum and returns the old value of pnum. Incrementing a pointer makes it point to the next element of an array. Any variable can be treated as an array of one element, so pnum points to the element after the first in the one-element array located where num1 is, which I'll call num1_array[1]. It is valid for a pointer to point to the end of an array, i.e. one position past the last element. It is not valid to dereference that pointer: that would be an array overflow. But it is valid to calculate the pointer. Constructing an invalid pointer in C is undefined behavior, even if you don't dereference it; however this pointer is valid.
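To illustrate the point about one-past-the-end pointers, here is a minimal sketch (count_equal is a hypothetical helper, not part of the question). It computes and compares the end pointer but never dereferences it, which is the usage the paragraph above describes as valid.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many of the n ints starting at first are equal to wanted.
   end points one past the last element: it is only compared against,
   never dereferenced. */
size_t count_equal(const int *first, size_t n, int wanted)
{
    assert(first != NULL);
    const int *end = first + n;
    size_t hits = 0;

    for (const int *p = first; p != end; ++p)
        if (*p == wanted)
            ++hits;
    return hits;
}
```

A single variable can be passed as a one-element array, for example count_equal(&num1, 1, 2); inside the function, end is then the same pointer value that pnum holds after pnum++.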
*pnum++ dereferences the old value of pnum. Since that was a pointer to num1, this expression is perfectly valid and its value is the value of num1. At this point, any halfway decent compiler would warn that the value is unused. If you didn't see this message, configure your compiler to print more warnings: unfortunately, many compilers default to accepting bad code rather than signal the badness. For example, with GCC or Clang:
$ gcc -Wall -Wextra -Werror a.c
a.c: In function ‘main’:
a.c:6:5: error: value computed is not used [-Werror=unused-value]
6 | *pnum++;
| ^~~~~~~
cc1: all warnings being treated as errors
The call to printf receives the argument *pnum. We saw before that at this point, pnum points to the end of the one-element array num1_array[1]. This pointer is valid, but since it points to the end of an object, dereferencing has undefined behavior. In practice, this usually either crashes or prints some garbage value that happens to be in a particular memory location. When you're debugging a program, there are tools that can help by making it more likely that an invalid pointer will cause a crash rather than silently using a garbage value. For example, with GCC or Clang, you can use AddressSanitizer:
$ export ASAN_OPTIONS=symbolize=1
$ gcc -Wall -Wextra -fsanitize=address a.c && ./a.out
a.c: In function ‘main’:
a.c:6:5: warning: value computed is not used [-Wunused-value]
6 | *pnum++;
| ^~~~~~~
=================================================================
==2498121==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff15ae3e74 at pc 0x55d593978366 bp 0x7fff15ae3e30 sp 0x7fff15ae3e20
READ of size 4 at 0x7fff15ae3e74 thread T0
#0 0x55d593978365 in main (/tmp/stackoverflow/a.out+0x1365)
#1 0x7f525a1380b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#2 0x55d59397818d in _start (/tmp/stackoverflow/a.out+0x118d)
Address 0x7fff15ae3e74 is located in stack of thread T0 at offset 36 in frame
#0 0x55d593978258 in main (/tmp/stackoverflow/a.out+0x1258)
This frame has 1 object(s):
[32, 36) 'num1' (line 3) <== Memory access at offset 36 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/tmp/stackoverflow/a.out+0x1365) in main
Shadow bytes around the buggy address:
0x100062b54770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b54780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b54790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b547a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b547b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100062b547c0: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1[04]f3
0x100062b547d0: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b547e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b547f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b54800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100062b54810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==2498121==ABORTING
This trace tells you:
That there was a buffer overflow in a local variable (stack-buffer-overflow).
That the overflowing access was an attempt to read 4 bytes (READ of size 4).
Some more information about the problematic address ([32, 36) 'num1'). You can see that the program tried to access memory just after num1.
The address of the problematic instruction (#0 0x55d593978365). You can set a breakpoint there in a debugger to examine what the program might be doing.
On most platforms, given your program, num1 is a variable on the stack, and the end of num1 is the address of the previous variable on the stack. This could be anything, depending on the details of how your compiler accesses memory. One of the many things this could be is pnum, if pnum and num1 happen to have the same size on your platform (this is typically the case on 32-bit platforms) and the compiler decides to put pnum just before num1 on the stack (this depends heavily on the compiler, the optimization level, and fine details of the program). So it is plausible for your program to print the address of pnum: not because *pnum somehow didn't invoke the dereference operator, but because your program has made pnum point to itself.
Postfix operators always have higher precedence than prefix operators in C. So *pnum++ is equivalent to *(pnum++) -- it increments the pointer, not the value pointed at.
You need (*pnum)++ or ++*pnum if you want to increment the pointed at value.
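The two groupings side by side, as a small sketch (sum_ints and increment_target are illustrative names only). The first function shows a case where *p++ is exactly what is wanted, because p walks over a real array; the second increments the pointed-at value and leaves the pointer alone.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints. *p++ parses as *(p++): read the current element,
   then advance the pointer to the next one. */
int sum_ints(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    int sum = 0;

    while (n-- > 0)
        sum += *p++;
    return sum;
}

/* Increments the int that p points at; p itself is not changed. */
int increment_target(int *p)
{
    assert(p != NULL);
    (*p)++;
    return *p;
}
```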
You need to put parentheses around *pnum. Otherwise the address stored in the pointer is changed. pnum++ is pointer arithmetic: it advances the pointer by the size of the pointed-to type, in this case sizeof(int). The pointer then no longer points to num1, and the program prints "garbage", as mentioned by Eugene Sh. in the comments, because printf then dereferences the incremented pointer.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int num1 = 2;
int *pnum = &num1;
(*pnum)++; /* important */
printf("%d\n", *pnum);
return EXIT_SUCCESS;
}
When you write *pnum++; you change the address that pnum holds. After it has been incremented by one, we don't really know what it points to. As suggested above, it now references garbage.

Segmentation fault(core dumped ) error while reading a python generated binary array in C [duplicate]

I am trying to load a 2D array created by numpy and read the elements in C, but I get a Segmentation fault (core dumped) error when running it. The code goes along the lines of
#include <stdio.h>
#include <string.h>
int main(){
char *file;
FILE *input;
int N1, N2, ii, jj;
float element;
strcpy(file, "/home/caesar/Desktop/test.bin");
input = fopen(file, "rb");
fread(&N1, sizeof(int), 1, input);
fread(&N2, sizeof(int), 1, input);
float memoryarray[N1][N2];
for(ii= 0; ii<N1; ii++){
for(jj=0; jj<N2; jj++){
fread(&element, sizeof(float), 1, input);
memoryarray[ii][jj]= element;
}
}
printf("%f", memoryarray[2][3]);
fclose(input);
return 0;
}
This is the starting point for a task where I will have to read elements from matrices of size 400*400*400 or so. The idea is to read all elements from the file, store them in memory, and then read from memory by index when necessary; for example, here I am trying to access and print the element in the second row, third column.
P.S: I am quite new to pointers.
Dear all, I tried the methods you suggested and modified the code. The segmentation fault is gone, but the output is either all zeros or just plain garbage values.
I ran the executable three times and the outputs I got were
Output1: -0.000000
Output 2: 0.000000
Output 3 : -97341413674450944.000000
My array contains integers btw
Here is the modified version of the code
#include <stdio.h>
#include <string.h>
void main(){
const char file[] ="/home/caesar/Desktop/test.bin";
FILE *input;
int N1, N2, ii, jj;
float element;
//strcpy(file, "/home/caesar/Desktop/test.bin");
input = fopen(file, "r");
fread(&N1, sizeof(int), 1, input);
fread(&N2, sizeof(int), 1, input);
float memoryarray[N1][N2];
for(ii= 0; ii<N1; ii++){
for(jj=0; jj<N2; jj++){
fread(&element, sizeof(float), 1, input);
memoryarray[ii][jj]= element;
}
}
printf("%f", memoryarray[1][2]);
fclose(input);
}
Also, here is the hex dump of the file that I am trying to open. Some of you asked me to verify whether fopen() is working or not; I checked, and it is working.
00000000 00 00 40 40 00 00 40 40 01 00 00 00 00 00 00 00 |..@@..@@........|
00000010 02 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 |................|
*
00000030 04 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00000040 05 00 00 00 00 00 00 00 06 00 00 00 00 00 00 00 |................|
00000050
So here is my problem in brief. I have multidimensional arrays of double precision floats written to a file using python. I want to take those files and access the elements whenever necessary by using the index of the elements to get the values. Any C code to do so would solve my problem.
Here is the python code i am using to write the file
import numpy as np
with open('/home/caesar/Desktop/test.bin', 'wb') as myfile:
    N = np.zeros(2, dtype=np.float32, order="C")
    N[0] = 3
    N[1] = 3
    a = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
    N.astype(np.float32).tofile(myfile)
    b = np.asarray(a)
    b.tofile(myfile)
strcpy(file, "/home/caesar/Desktop/test.bin");
This writes to a garbage memory address.
You should either declare file as an array of suitable size, like this:
char file[100];
or
initialize the char pointer directly with the path like this (and get rid of the strcpy):
const char *file = "/home/caesar/Desktop/test.bin";
or the best, as per common consensus (refer comments):
input = fopen("/home/caesar/Desktop/test.bin", "rb");
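As for the garbage values in the follow-up: the hex dump shows the two dimensions stored as 4-byte floats (00 00 40 40 is 3.0f) followed by 8-byte integers, so reading int dimensions and float elements cannot match the file. Below is a minimal sketch of a reader for that layout. The name load_matrix is hypothetical, and the layout (two float32 values, then rows*cols int64 values in row-major order, in the machine's own byte order) is an assumption taken from the dump and the Python code, not something guaranteed for other files.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the assumed layout: two float32 dimensions, then rows*cols
   int64 values. Returns a malloc'd buffer (the caller frees it) or NULL
   on any error. This sketch does not check rows*cols for overflow. */
int64_t *load_matrix(const char *path, size_t *rows, size_t *cols)
{
    assert(path != NULL && rows != NULL && cols != NULL);

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return NULL;                      /* always check fopen */

    float dims[2];
    int64_t *data = NULL;

    if (fread(dims, sizeof dims[0], 2, in) == 2 &&
        dims[0] >= 1 && dims[1] >= 1) {
        *rows = (size_t)dims[0];
        *cols = (size_t)dims[1];
        size_t count = *rows * *cols;

        data = malloc(count * sizeof *data);  /* heap, not a VLA on the stack */
        if (data != NULL && fread(data, sizeof *data, count, in) != count) {
            free(data);                   /* short read: file is truncated */
            data = NULL;
        }
    }
    fclose(in);
    return data;
}
```

Element (i, j) is then data[i * cols + j]. Allocating on the heap also matters for the 400*400*400 case mentioned in the question, which is likely too large for a local array.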

call a vararg function with an array?

In this example below, I would like to pass to a function that receive variable number of arguments the content of an array.
In other terms, I would like to pass to printf the content of foo by value and thus, pass these arguments on the stack.
#include <stdarg.h>
#include <stdio.h>
void main()
{
int foo[] = {1,2,3,4};
printf("%d, %d, %d, %d\n", foo);
}
I know this example looks stupid because I can use printf("%d, %d, %d, %d\n", 1,2,3,4);. Just imagine I'm calling void bar(char** a, ...) instead and the array is something I receive from RS232...
EDIT
In other words, I would like to avoid this:
#include <stdarg.h>
#include <stdio.h>
void main()
{
int foo[] = {1,2,3,4};
switch(sizeof(foo))
{
case 1: printf("%d, %d, %d, %d\n", foo[0]); break;
case 2: printf("%d, %d, %d, %d\n", foo[0], foo[1]); break;
case 3: printf("%d, %d, %d, %d\n", foo[0], foo[1], foo[2]); break;
case 4: printf("%d, %d, %d, %d\n", foo[0], foo[1], foo[2], foo[3]); break;
...
}
}
I would like to pass to printf the content of foo by value and thus, pass these arguments on the stack.
You cannot pass an array by value. Not by "normal" function call, and not by varargs either (which is, basically, just a different way of reading the stack).
Whenever you use an array as argument to a function, what the called function receives is a pointer.
The easiest example for this is the char array, a.k.a. "string".
#include <string.h>
int main()
{
char buffer1[100];
char buffer2[] = "Hello";
strcpy( buffer1, buffer2 ); // destination first: copy "Hello" into the larger buffer
}
What strcpy() "sees" is not two arrays, but two pointers:
char * strcpy( char * restrict s1, const char * restrict s2 )
{
// Yes I know this is a naive implementation in more than one way.
char * rc = s1;
while ( ( *s1++ = *s2++ ) );
return rc;
}
(This is why the size of the array is only known in the scope the array was declared in. Once you pass it around, it's just a pointer, with no place to put the size information.)
The same holds true for passing an array to a varargs function: What ends up on the stack is a pointer to the (first element of) the array, not the whole array.
You can pass an array by reference and do useful things with it in the called function if:
you pass the (pointer to the) array and a count of elements (think argc / argv), or
caller and callee agree on a fixed size, or
caller and callee agree on the array being "terminated" in some way.
Standard printf() does the last one for "%s" and strings (which are terminated by '\0'), but is not equipped to do so with, as in your example, an int[] array. So you would have to write your own custom printme().
In no case are you passing the array "by value". If you think about it, it wouldn't make much sense to copy all elements to the stack for larger arrays anyway.
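A minimal sketch of the first option, passing a pointer together with an element count. The name format_ints and the choice to write into a caller-supplied buffer (rather than print directly) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes count ints as "1, 2, 3" into buf, which has room for size bytes.
   Returns the number of characters written (excluding the terminating
   '\0'), or -1 if the output does not fit. */
int format_ints(char *buf, size_t size, const int *values, size_t count)
{
    assert(buf != NULL && (values != NULL || count == 0));
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(buf + used, size - used, "%s%d",
                         i == 0 ? "" : ", ", values[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;                  /* truncated or encoding error */
        used += (size_t)n;
    }
    return (int)used;
}
```

With int foo[] = {1,2,3,4}; the call format_ints(out, sizeof out, foo, sizeof foo / sizeof foo[0]) works for any array length, because the count travels with the pointer instead of being encoded in a format string.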
As already said, you cannot pass an array by value through va_arg directly. It is possible, though, if the array is packed inside a struct. It is not portable, but one can do some things when the implementation is known.
Here is an example that might help.
void call(size_t siz, ...);
struct xx1 { int arr[1]; };
struct xx10 { int arr[10]; };
struct xx20 { int arr[20]; };
void call(size_t siz, ...)
{
va_list va;
va_start(va, siz);
struct xx20 x = va_arg(va, struct xx20);
printf("HEXDUMP:%s\n", HEXDUMP(&x, siz));
va_end(va);
}
int main(void)
{
struct xx10 aa = { {1,2,3,4,5,[9]=-1}};
struct xx20 bb = { {[10]=1,2,3,4,5,[19]=-1}};
struct xx1 cc = { {-1}};
call(sizeof aa, aa);
call(sizeof bb, bb);
call(sizeof cc, cc);
}
Will print following (HEXDUMP() is one of my debug functions, it's obvious what it does).
HEXDUMP:
0x7fff1f154160:01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00 ................
0x7fff1f154170:05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7fff1f154180:00 00 00 00 ff ff ff ff ........
HEXDUMP:
0x7fff1f154160:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7fff1f154170:00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7fff1f154180:00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 ................
0x7fff1f154190:03 00 00 00 04 00 00 00 05 00 00 00 00 00 00 00 ................
0x7fff1f1541a0:00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff ................
Tested on Linux x86_64 compiled with gcc 5.1 and Solaris SPARC9 compiled with gcc 3.4
I don't know if it is helpful, but maybe it's a start. As can be seen, using the biggest struct in the function's va_arg makes it possible to handle smaller arrays if the size is known.
But be careful, it is probably full of undefined behaviour (for example, if you call the function with a struct whose array is smaller than 4 ints, it doesn't work on Linux x86_64 because the struct is passed in registers, not as an array on the stack, but on your embedded processor it might work).
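For comparison, here is a sketch of the variant that stays within the standard: the struct type read with va_arg is the same type the caller passes, so nothing depends on the calling convention. The names int4 and sum_all are invented for this example, and the fixed size of 4 is an assumption; this is the "caller and callee agree on a fixed size" case rather than a general solution.

```c
#include <assert.h>
#include <stdarg.h>

/* Wrapping the array in a struct lets it be passed by value. */
struct int4 { int v[4]; };

/* Sums the elements of count struct int4 arguments. Each va_arg call
   yields a copy of the caller's struct, array included. */
int sum_all(int count, ...)
{
    assert(count >= 0);
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        struct int4 s = va_arg(ap, struct int4);  /* same type as passed */
        for (int j = 0; j < 4; j++)
            total += s.v[j];
    }
    va_end(ap);
    return total;
}
```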
Short answer: No, you can't do it, it's impossible.
Slightly longer answer: Well, maybe you can do it, but it's super tricky. You are basically trying to call a function with an argument list that is not known until run time. There are libraries that can help you dynamically construct argument lists and call functions with them; one library is libffi: https://sourceware.org/libffi/.
See also question 15.13 in the C FAQ list: How can I call a function with an argument list built up at run time?
See also these previous Stackoverflow questions:
C late binding with unknown arguments
How to call functions by their pointers passing multiple arguments in C?
Calling a variadic function with an unknown number of parameters
OK, look at this example from my code. It is one simple way.
void my_printf(char const * frmt, ...)
{
va_list argl;
unsigned char const * tmp;
unsigned char chr;
va_start(argl,frmt);
while ((chr = (unsigned char)*frmt) != (char)0x0) {
frmt += 1;
if (chr != '%') {
dbg_chr(chr);
continue;
}
chr = (unsigned char)*frmt;
frmt += 1;
switch (chr) {
...
case 'S':
tmp = va_arg(argl,unsigned char const *);
dbg_buf_str(tmp,(uint16_t)va_arg(argl,int));
break;
case 'H':
tmp = va_arg(argl,unsigned char const *);
dbg_buf_hex(tmp,(uint16_t)va_arg(argl,int));
break;
case '%': dbg_chr('%'); break;
}
}
va_end(argl);
}
Here dbg_chr(uint8_t byte) writes a byte to the USART and enables the transmitter.
Use example:
#define TEST_LEN 0x4
uint8_t test_buf[TEST_LEN] = {'A','B','C','D'};
my_printf("This is hex buf: \"%H\"",test_buf,TEST_LEN);
As mentioned above, variadic arguments can be passed as a struct-packed array:
void logger(char * bufr, uint32_t * args, uint32_t argNum) {
memset(buf, 0, sizeof buf);
struct {
uint32_t ar[16];
} argStr;
for(uint8_t a = 0; a < argNum; a += 1)
argStr.ar[a] = args[a];
snprintf(buf, sizeof buf, bufr, argStr);
/* no strcat needed: snprintf already NUL-terminates buf, and '\0' passed as a pointer argument is a null pointer */
pushStr(buf, strlen(buf));
}
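Tested and works with the GNU C compiler. Passing a struct where the format string expects integers is not covered by the C standard, so this depends on the target's calling convention.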

Is accessing a global array outside its bound undefined behavior?

I just had an exam in my class today: reading C code and input, with the required answer being what will appear on the screen if the program actually runs. One of the questions declared a[4][4] as a global variable, and at one point the program tries to access a[27][27], so I answered something like "Accessing an array outside its bounds is undefined behavior", but the teacher said that a[27][27] will have a value of 0.
Afterwards, I tried some code to check whether "all uninitialized global variables are set to 0" is true or not. Well, it seems to be true.
So now my question:
Seems like some extra memory had been cleared and reserved for the code to run. How much memory is reserved? Why does a compiler reserve more memory than it should, and what is it for?
Will a[27][27] be 0 in all environments?
Edit :
In that code, a[4][4] is the only global variable declared and there are some more local ones in main().
I tried that code again in DevC++. All of them are 0. But that is not true in VSE, in which most values are 0 but some have a random value, as Vyktor has pointed out.
You were right: it is undefined behavior and you cannot count on it always producing 0.
As for why you are seeing zero in this case: modern operating systems allocate memory to processes in relatively coarse-grained chunks called pages that are much larger than individual variables (at least 4KB on x86). When you have a single global variable, it will be located somewhere on a page. Assuming a is of type int[4][4] and ints are four bytes on your system, a[27][27] will be located 540 bytes ((27 * 4 + 27) * 4) from the beginning of a. So as long as a is near the beginning of the page, accessing a[27][27] will be backed by actual memory and reading it won't cause a page fault / access violation.
Of course, you cannot count on this. If, for example, a is preceded by nearly 4KB of other global variables then a[27][27] will not be backed by memory and your process will crash when you try to read it.
Even if the process does not crash, you cannot count on getting the value 0. If you have a very simple program on a modern multi-user operating system that does nothing but allocate this variable and print that value, you probably will see 0. Operating systems set memory contents to some benign value (usually all zeros) when handing over memory to a process so that sensitive data from one process or user cannot leak to another.
However, there is no general guarantee that arbitrary memory you read will be zero. You could run your program on a platform where memory isn't initialized on allocation, and you would see whatever value happened to be there from its last use.
Also, if a is followed by enough other global variables that are initialized to non-zero values then accessing a[27][27] would show you whatever value happens to be there.
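What the language does guarantee can be separated from what it does not with a small sketch. The accessor get_checked is a hypothetical helper: it returns elements inside the 4x4 array, which are zero because a has static storage duration, and refuses any index outside it instead of reading whatever lies beyond.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 4

static int a[ROWS][COLS];   /* static storage: every element starts at 0 */

/* Stores a[row][col] in *out and returns 1 if the indices are in bounds;
   returns 0 and touches nothing otherwise. */
int get_checked(size_t row, size_t col, int *out)
{
    assert(out != NULL);
    if (row >= ROWS || col >= COLS)
        return 0;           /* a[27][27] ends up here: no value to report */
    *out = a[row][col];
    return 1;
}
```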
Accessing an array out of bounds is undefined behavior, which means the results are unpredictable so this result of a[27][27] being 0 is not reliable at all.
clang tells you this very clearly if we use -fsanitize=undefined:
runtime error: index 27 out of bounds for type 'int [4][4]'
Once you have undefined behavior the compiler can really do anything at all; we have even seen examples where gcc has turned a finite loop into an infinite loop based on optimizations around undefined behavior. Both clang and gcc can, in some circumstances, generate an undefined-instruction opcode when they detect undefined behavior.
As for why it is undefined behavior, the question "Why is out-of-bounds pointer arithmetic undefined behaviour?" provides a good summary of reasons. For example, the resulting pointer may not be a valid address, the pointer could now point outside the assigned memory pages, you could be working with memory-mapped hardware instead of RAM, etc.
Most likely the segment where static variables are stored is much larger than the array you are allocating, or the region you are stomping through just happens to be zeroed out, so you are just lucky in this case; but again, this is completely unreliable behavior. Most likely your page size is 4k and the access to a[27][27] is within that bound, which is probably why you are not seeing a segmentation fault.
What the standard says
The draft C99 standard tells us this is undefined behavior in section 6.5.6 Additive operators, which covers pointer arithmetic, which is what an array access comes down to. It says:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined. If the result points one past the last element of the
array object, it shall not be used as the operand of a unary *
operator that is evaluated.
and the standard's definition of undefined behavior tells us that the standard imposes no requirements on the behavior, and notes that possible behavior is unpredictable:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, [...]
Here is the quote from the standard, that specifies what is undefined behavior.
J.2 Undefined behavior
An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]) (6.5.6).
Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that points just beyond the array object and is used as
the operand of a unary * operator that is evaluated (6.5.6).
In your case the array subscript is completely outside of the array. Relying on the value being zero is completely unreliable.
Furthermore, the behavior of the entire program is in question.
I just ran your code from Visual Studio 2012 and got a result like this (different on each run):
Address of a: 00FB8130
Address of a[4][4]: 00FB8180
Address of a[27][27]: 00FB834C
Value of a[27][27]: 0
Address of a[1000][1000]: 00FBCF50
Value of a[1000][1000]: <<< Unhandled exception at 0x00FB3D8F in GlobalArray.exe:
0xC0000005: Access violation reading location 0x00FBCF50.
When you look at the Modules window, you see that your application module's memory range is 00FA0000-00FBC000. And unless you have CRT checks turned on, nothing controls what you do inside your memory (as long as you don't violate memory protection).
So you got 0 at a[27][27] purely by chance. When you open the memory view at position 00FB8130 (a), you will probably see something like this:
0x00FB8130 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8180 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
0x00FB8190 c0 90 45 00 b0 e9 45 00 00 00 00 00 00 00 00 00 À.E.°éE.........
0x00FB81A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB81B0 00 00 00 00 80 5c af 0f 00 00 00 00 00 00 00 00 ....€\¯.........
0x00FB81C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
..........
0x00FB8330 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x00FB8340 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ <<<<
0x00FB8350 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
.......... ^^ ^^ ^^ ^^
It's possible that with your compiler you will always get 0 for that code because of how it uses memory, but just a few bytes away you can find another variable.
For example with memory shown above a[6][0] points to address 0x00FB8190 which contains integer value of 4559040.
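The addresses in the listing are consistent with plain row-major arithmetic, assuming a 4-byte int as on that Visual Studio build. The helper element_offset below is a name made up for this sketch; it only computes the byte offset and never touches the array, so it involves no undefined behavior itself.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [row][col] from the start of an array whose
   rows have cols elements of elem_size bytes each. Pure arithmetic:
   nothing is dereferenced. */
size_t element_offset(size_t row, size_t col, size_t cols, size_t elem_size)
{
    assert(cols > 0 && elem_size > 0);
    return (row * cols + col) * elem_size;
}
```

For int a[4][4] this gives 0x50 for a[4][4], 0x60 for a[6][0], 0x21C for a[27][27] and 0x4E20 for a[1000][1000]. Added to the base address 00FB8130, these are exactly 00FB8180, 00FB8190, 00FB834C and 00FBCF50, the last of which lies beyond the module range that ends at 00FBC000.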
Then get your teacher to explain this one.
I don't know if this will work on your system but playing about with blatting memory AFTER the array a with non-zero'd bytes gives a different result for a[27][27].
On my system, when I printed contents of a[27][27] it was 0xFFFFFFFF. ie -1 converted to unsigned is all bits set in twos complement.
#include <stdio.h>
#include <string.h>
#define printer(expr) { printf(#expr" = %u\n", expr); }
unsigned int d[8096];
int a[4][4]; /* assuming an int is 4 bytes, next 4 x 4 x 4 bytes will be initialised to zero */
unsigned int b[8096];
unsigned int c[8096];
int main() {
/* make sure next bytes do not contain zero'd bytes */
memset(b, -1, 8096*4);
memset(c, -1, 8096*4);
memset(d, -1, 8096*4);
/* lets check normal access */
printer(a[0][0]);
printer(a[3][3]);
/* Now we disrespect the machine - undefined behaviour shall result */
printer(a[27][27]);
return 0;
}
This is my output:
a[0][0] = 0
a[3][3] = 0
a[27][27] = 4294967295
I saw questions in the comments about viewing memory in Visual Studio. The easiest way is to add a breakpoint somewhere in your code (to halt execution), then go to the Debug > Windows > Memory menu and select, for example, Memory 1. You then find the memory address of your array a. In my case the address was 0x0130EFC0, so you enter 0x0130EFC0 in the address field and press Enter. This shows the memory at that location.
Eg in my case.
0x0130EFC0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ..................................
0x0130EFE2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff ..............................ÿÿÿÿ
0x0130F004 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x0130F026 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
0x0130F048 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
The zeros are of course the array a, which has a byte size of 4 x 4 x sizeof(int) (4 in my case) = 64 bytes. The bytes from address 0x0130F000 onward are 0xFF each (from the contents of b, c, or d).
Note that:
0x130EFC0 + 64 = 0x130EFC0 + 0x40 = 0x130F000
which is the start of all those ff bytes you see. Probably array b.
For common compilers, accessing an array beyond its bounds can give predictable results only in very special cases, and you should not rely on that. Example :
int a[4][4];
int b[4][4];
Provided the compiler places b directly after a, there are no alignment problems, and you request neither aggressive optimisation nor sanitizer checks, a[6][1] would in practice be b[2][1]. But please never do that in production code!
On a particular system, your teacher may be correct -- that may be how your particular compiler and operating system would behave.
On a generic system (i.e. without "insider" knowledge) then your answer is correct: this is UB.
First of all, the C language has no bounds checking. In fact it checks almost nothing. This is the joy and the doom of C.
Now, going back to the issue: overflowing memory does not necessarily trigger a segfault.
Let's have a closer look at how it works.
When you start a program, or enter a subroutine, the processor saves on the stack the address to return to when the function ends.
The stack is set up by the OS when it allocates memory for the process, and it covers a range of legal memory that you can read or write as you like, not only to store return addresses.
The common practice used by compilers to create local (automatic) variables is to reserve some space on the stack and use that space for the variables. Look at the following well-known 32-bit assembler sequence, called the prologue, which you will find at the entry of most functions:
push ebp ;save the caller's frame pointer on the stack
mov ebp,esp ;make the current stack address the new frame pointer
sub esp,4 ;reserve 4 bytes on the stack for a 4-char array
Considering that the stack grows downwards, in the opposite direction to the data, the memory layout is:
0x0.....1C [Parameters (if any)] ;former function
0x0.....18 [Return Address]
0x0.....14 EBP
0x0.....10 0x0......x ;Local DWORD parameter
0x0.....0C [Parameters (if any)] ;our function
0x0.....08 [Return Address]
0x0.....04 EBP
0x0.....00 0, 'c', 'b', 'a' ;our string of 3 chars plus final nul
This is known as a stack frame.
Now consider the string of four bytes starting at 0x0.....00 and ending at 0x0.....03. If we write more than 3 chars plus the terminating nul into the array, we overwrite, in sequence: the saved copy of EBP, the return address, the parameters, the local variables of the previous function, then its EBP, its return address, etc.
The most spectacular effect is that, on function return, the CPU tries to jump back to a wrong address, generating a segfault. The same behaviour can occur if some of the local variables are pointers: in that case we will try to read or write wrong locations, again triggering the segfault.
When a segfault may not happen:
When the overflowed variable is not on the stack, or you have so many local variables that you overwrite them without touching the return address (and they are not pointers).
Another case is when the compiler reserves guard space between the local variables and the return address; then the buffer overflow does not reach the address.
Another possibility is accessing array elements at random: an out-of-range index can go beyond the stack space and land on other data, but with luck we don't touch the elements mapped where the return address is saved (anything can happen...).
When can overflowing variables that are not on the stack cause a segfault?
When overflowing array bounds or pointers.
I hope this information is useful...

How can I change char array to int using C?

I am studying C now, and I am parsing a raw registry file and reading it.
I have a problem now:
000011E0 00 00 00 00 60 01 00 00 B9 01 00 00 00 00 00 00
000011F0 20 C0 26 00 FF FF FF FF 00 00 00 00 FF FF FF FF
00001200 10 FC 00 00 FF FF FF FF 4C 00 01 00 00 00 00 00
This is the hex dump of the REGISTRY file.
fseek(fp,0x11F0,SEEK_SET);
char tmp[4];
int now = ftell(fp);
for(int i = 0 ; i < 4 ; i++){
tmp[i] = fgetc(fp);
}
I made this tmp array, but I need the value 0x0026c020.
How can I convert this array to that value? Or please suggest a better algorithm.
Thanks.
If you know for a fact that the value is stored with the same endianness as the host OS architecture, you can just do:
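int value;
memcpy(&value, tmp, sizeof value); /* needs <string.h>; assumes sizeof(int) == 4; avoids the alignment and aliasing problems of *(int *)tmp */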
However, you should not read the bytes in reverse order: that alters the endianness and will result in an incorrect value. Try this:
int value;
if (fread(&value, sizeof(value), 1, fp) != 1) {
/* Could not read, handle error. */
}
/* value is set, inspect it */
To convert a string to an integer, standard functions are already available; one of them is
strtoul().
Note that strtoul() parses text digits such as "0026c020", not raw bytes read from a binary file.

Resources