C - distance between 2 pointers

How can I know the distance in bytes between 2 pointers?
For example:
I would like to know how many bytes there are between p2 and p1 ( in this case 3) because to reach p2 with p1 I have to do 3 steps...
step1 p1 is in B
step2 p1 is in C
step3 p1 is in D
so i need that return to me 3
I'm asking this type of question because I'm implementing the lz77 algorithm

You could try with:
ptrdiff_t bytes = ((char *)p2) - ((char *)p1);
But this only works as expected if the pointers you subtract point to the same single piece of memory or within it. For example:
This will not work as expected:
char *p1 = malloc(3); // "ABC", piece 1
char *p2 = malloc(3); // "DEF", piece 2
char *p3 = malloc(3); // "GHI", piece 3
ptrdiff_t bytes = p3 - p1; // ABC ... DEF ... GHI
// ^ ^
// p1 p3
// Or:
// GHI ... ABC ... DEF
// ^ ^
// p1 p3
// Gives on my machine 32
printf("%td\n", bytes);
Because:
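The malloc implementation may place bookkeeping data and alignment padding between separate allocations, which affects the resulting byte count.
It is not guaranteed that p1 < p2 < p3, so your result could be negative.
Strictly speaking, subtracting pointers that do not point into the same array object is undefined behavior in C, so the printed value is not something to rely on.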
However this will work:
char *p1 = malloc(9); // "ABCDEFGHI", one piece of memory
char *p2 = p1 + 3; // this is within the same piece as above
char *p3 = p2 + 3; // this too
ptrdiff_t bytes = p3 - p1; // ABC DEF GHI
// ^ ^
// p1 p3
// Gives the expected 6
printf("%td\n", bytes);
Because:
The allocated 9 bytes are always one contiguous piece of memory. Therefore p1 < p2 < p3 always holds, and since any padding or bookkeeping bytes sit before or after the piece, the subtraction gives the expected result.
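For the lz77 use case, the same idea can be wrapped in a small helper. This is only a sketch: the name byte_distance is made up for illustration, and it assumes both pointers refer to the same buffer (for example the sliding window).

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from `from` to `to`.
 * Assumption: both pointers point into (or one past the end of) the same
 * object; otherwise the subtraction is undefined behavior. */
ptrdiff_t byte_distance(const void *from, const void *to)
{
    assert(from != NULL && to != NULL);
    /* Casting to char * makes the subtraction count bytes, not elements. */
    return (const char *)to - (const char *)from;
}
```

Calling byte_distance(p1, p2) with p1 at 'B' and p2 at 'E' inside one char buffer gives 3; swapping the arguments gives -3.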

Another way:
(p2-p1)*sizeof(*p1)
This works only when p1 and p2 point into the same block of memory, i.e. memory obtained from a single call to one of the malloc family of functions (or the same array).
This is valid:
int* p1 = malloc(sizeof(int)*20);
int* p2 = p1+10;
int sizeInBytes = (p2-p1)*sizeof(*p1);
This is not valid:
int* p1 = malloc(sizeof(int)*20);
int* p2 = malloc(sizeof(int)*10);
int sizeInBytes = (p2-p1)*sizeof(*p1); // Undefined behavior
Update, in response to a comment by @chux
According to the draft C Standard (ISO/IEC 9899:201x):
6.5.6 Additive operators
...
9 When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
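As a rough illustration of the quoted rule, the sketch below separates the element difference (what the subtraction itself yields) from the byte count. The function names are hypothetical, and both functions assume their arguments point into the same int array, with one past the last element allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Difference of subscripts: how many ints lie between first and last. */
ptrdiff_t element_count(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}

/* The same span expressed in bytes; requires first <= last. */
size_t span_bytes(const int *first, const int *last)
{
    assert(first != NULL && last != NULL && first <= last);
    return (size_t)(last - first) * sizeof *first;
}
```

With int a[20], element_count(a, a + 20) is 20 even though a + 20 points one past the last element, which the standard explicitly permits for subtraction.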

Related

How can an empty array be involved in a calculation?

#include <stdio.h>
int f( int *x ) {
*x = 35;
}
int main( int argc, char *argv[] ) {
int p[32];
int *q = p + 5;
f( q );
printf( "%d", p[5] );
return 0;
}
Can you please explain why the output is 35?
I tried to output the value of p by using printf("%d", p) and right after int p[32], it returned a -2077686688.
I think it just simply because I didn't assign any value to the p[32] array yet, so it just returned a random number.
However, the part that confuses me the most is *q = p + 5
How can an array do that?
Since there is no value in the p array, how can it just return its size in this expression?
In your code, int p[32] sets aside an array of size 32 that can be referenced using p. When you define q to be p + 5, you are assigning q to be a pointer to the 6th (1-indexed) element in memory starting from wherever p points to.
When you pass q to f(), the value at q is set to 35 from whatever was there before (uninitialized memory). Since q points to the same location as p[5], p[5] will be 35 since that is the value set at the location in memory by f().
C is a low-level programming language and the behavior around using memory through C variables might be a bit confusing to programmers coming from languages with higher level abstractions.
Let's break down your main function first.
int p[32];
When a function is called in your program, it is given a region of the memory assigned to your process, taken from what is called the stack. With this statement, you're telling the compiler that your function (main) needs space for 32 integers on the stack. Further statements you make with the variable p will operate on this space reserved for the 32 integers.
Note that you're not telling the compiler anything about how this memory is initialized. So the bytes reserved for the 32 integers hold whatever happened to be there before your function was called; their values are indeterminate.
Let's look at the next one.
int *q = p + 5;
This is very similar, but now you are asking for some memory on the stack with a size that can fit "a pointer to an integer". A pointer is the C abstraction for a bare "memory address with a type". So this space will be used to store an address in memory, and that address will refer to another location in RAM that is intended to store integers.
You are also telling the compiler to initialize the stack space for q, with the value of p + 5. Unlike the space for the 32 integers above (p), the space for q will be initialized right after your function is called.
The expression p + 5 is applying what is called "pointer arithmetic". This is used to take an address in RAM, and go up or down based on whatever offset we need. Remember, p was an array and arrays in C work like pointers (addresses) when they take part in pointer arithmetic. Thus, what p + 5 really means is the "address that is 5 integers after the first address p points to". This ends up being the "pointer to the sixth element of p" (first being p[0]), in other words, the address of p[5].
f(q);
In this statement, you are passing the address stored in q, which is the address of the sixth element of p. The function f in turn assigns 35 to the location in RAM pointed to by this address, hence changing the integer that is accessed by p[5] to 35.
Right at this point, p[5] is the only element within p that has an initialized value. All other integers in p will continue to store what they held before main was called during the initialization of your program.
printf( "%d", p[5] );
When the execution returns back to main, the integer that can be accessed by p[5] is now set to 35, and that is exactly what you expect to see with this printf statement.
#include <stdio.h>
int main(void) {
int p[32] = { 0 };
// initializing array with zeros; no more random values in the array!
int *q = p + 5;
// &p has type int (*)[32] and &p[0] has type int *, so compare them as void *
if ((void *) &p == (void *) &p[0] && &3[p] == &p[3]) {
printf("Sick sad C world\n");
}
/* C does have arrays, but they are a thin layer over memory;
* try to compare a JS Array and a C array.
* See this: https://stackoverflow.com/a/381549/10972945
* So: p[0] == *(p + 0) == *p
* In most expressions an array converts to the address of its first element. */
printf(
"p located at %p\n"
"p + 1 located at %p\n"
"p + 5 located at %p\n"
"Size of int (in bytes) is %zu\n",
(void*) p,
(void*) (p + 1),
(void*) (p + 5),
sizeof(int)
);
/* Try to run this code and subtract the addresses, one from another.
p located at 0x7ffee3e04750
p + 1 located at 0x7ffee3e04754
p + 5 located at 0x7ffee3e04764
Size of int (in bytes) is 4
See:
address of (p + 1) - address of p
== 0x7ffee3e04754 - 0x7ffee3e04750
== 4 == sizeof(int)
address(p + 5) - address(p)
== 0x7ffee3e04764 - 0x7ffee3e04750
== 0x14 == 20 == 5 * sizeof(int)
*/
}

Is there an error in this code?

A question asks me whether this code contains any errors.
The compiler doesn't give me any errors, but this code contains some constructs that I don't know.
The code is this:
int* mycalloc(int n) {
int *p = malloc(n*sizeof(int)), *q; //what does the ", *q"?
for (q=p; q<=p+n; ++q) *q = 0;
return p;
}
The possible solutions are:
The program is correct
There is an error at line 1
There is an error at line 2
There is an error at line 3
There is an error at line 4
There is no compile-time error in the above code, but at run time it has undefined behavior (and may crash) because of q<=p+n: the last iteration writes to p[n], one element past the allocated memory. q is simply a pointer to int used to walk the array.
It should be
for (q=p; q<p+n; ++q) /* this also works when n is zero, or you can add a separate check for that case; this may be what the interviewer was after */
*q = 0;
What your function (mycalloc) is doing is allocating space for n integers and
initializing them to 0.
This could have been done so:
int *mymalloc(size_t n)
{
int *arr = malloc(n * sizeof *arr);
if(arr == NULL)
return NULL;
memset(arr, 0, n * sizeof *arr);
return arr;
}
or better
int *mymalloc(size_t n)
{
return calloc(n, sizeof(int));
}
Your function does this by looping through the array using a
pointer q. Let me explain.
int *p = malloc(n*sizeof(int)), *q;
It declares two int* (pointers to int) variables, p and q. p is
initialized with the value returned by malloc and q is left uninitialized.
It's the same as doing:
int *p;
int *q;
p = malloc(n*sizeof(int));
but in one line.
The next part is the interesting one:
for (q=p; q<p+n; ++q)
*q = 0;
First I corrected the condition and wrote it in two lines.
You can read this loop as follows:
initialize q with the same value as p, i.e. p and q point to the start
of the allocated memory
end the loop when q is pointing beyond the allocated memory
at the end of the loop, do ++q, which makes q go to the next int
In the loop do *q = 0, equivalent to q[0] = 0, thus setting the integer to
0 that is pointed to by q.
Let us think about the memory layout. Let's say n = 5. In my graphic, ?
represents an unknown value.
BEFORE THE LOOP
b = the start of the allocated memory aka malloc return value
si = size of an integer in bytes, mostly 4
                                                  (beyond the limits)
b+0       b+1*si    b+2*si    b+3*si    b+4*si    b+5*si
+---------+---------+---------+---------+---------+
|  ????   |  ????   |  ????   |  ????   |  ????   |
+---------+---------+---------+---------+---------+
^
|
p
In the first iteration, q is set to p and *q = 0 is executed. It's the same as
doing p[0] = 0.
FIRST ITERATION
b = the start of the allocated memory aka malloc return value
si = size of an integer in bytes, mostly 4
                                                  (beyond the limits)
b+0       b+1*si    b+2*si    b+3*si    b+4*si    b+5*si
+---------+---------+---------+---------+---------+
|    0    |  ????   |  ????   |  ????   |  ????   |
+---------+---------+---------+---------+---------+
^
|
p,q
This is how the memory looks after *q = 0. Before the next iteration
runs, q++ is executed:
BEFORE SECOND ITERATION, `q++`
b = the start of the allocated memory aka malloc return value
si = size of an integer in bytes, mostly 4
                                                  (beyond the limits)
b+0       b+1*si    b+2*si    b+3*si    b+4*si    b+5*si
+---------+---------+---------+---------+---------+
|    0    |  ????   |  ????   |  ????   |  ????   |
+---------+---------+---------+---------+---------+
^         ^
|         |
p         q
Now *q = 0 is executed, which is the same as p[1] = 0:
SECOND ITERATION
b = the start of the allocated memory aka malloc return value
si = size of an integer in bytes, mostly 4
                                                  (beyond the limits)
b+0       b+1*si    b+2*si    b+3*si    b+4*si    b+5*si
+---------+---------+---------+---------+---------+
|    0    |    0    |  ????   |  ????   |  ????   |
+---------+---------+---------+---------+---------+
^         ^
|         |
p         q
Then the loop continues, and by now you get the point. This is why in your code the
condition of the loop q <= p+n is wrong: it would go one step farther
than it needs to and would write a 0 beyond the limits.
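Putting the corrected condition together, a version of the function might look like the sketch below. The names zeroed_ints and all_zero are hypothetical; the NULL check and the overflow guard are additions that the question's code does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates n ints and sets each one to 0 by walking a pointer over
 * the half-open range [p, p + n). The caller releases it with free(). */
int *zeroed_ints(size_t n)
{
    if (n > SIZE_MAX / sizeof(int))
        return NULL;                    /* n * sizeof(int) would overflow */
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;                    /* allocation failed */
    for (int *q = p; q < p + n; ++q)    /* q < p + n, not q <= p + n */
        *q = 0;
    return p;
}

/* Returns 1 if the first n ints at p are all zero, 0 otherwise. */
int all_zero(const int *p, size_t n)
{
    assert(p != NULL);
    for (const int *q = p; q < p + n; ++q)
        if (*q != 0)
            return 0;
    return 1;
}
```

With n = 5 the loop writes p[0] through p[4] and stops when q reaches p + 5, which may be formed and compared against but never dereferenced.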
Your loop is using pointer arithmetic. Pointer arithmetic is similar to regular
arithmetic (i.e. addition, subtraction with natural numbers), but it takes
the size of the object into consideration.
Consider this code
int p[] = { 1, 2, 3, 4, 5};
int *q = p;
p is an array of int of dimension 5. The common size of int is 4 bytes, which
means that the array p needs 20 bytes of memory. The first 4 bytes are for
p[0], the next 4 for p[1], etc. q is a pointer to int pointing at the
first element of the array p. In fact this code is equivalent to
int p[] = { 1, 2, 3, 4, 5};
int *q = &(p[0]);
That is what people call array decay, meaning that you can access the array as
if it were a pointer. For pointer arithmetic there is almost no distinction
between the two.
What is pointer arithmetic then?
This: p+2. This will get you a pointer that is 2 spaces after p. Note that
I'm using the word space and not byte, and that's because depending on the type
of p, the number of bytes will be different. Mathematically what the compiler
is doing is calculating the address from
address where p is pointing + 2x(number of bytes for an int)
because the compiler knows the type of the pointer.
That's why you can also have expressions like p++ when p is a pointer. It is
doing p = p + 1 which is p = &(p[1]);.
b is the base address where the memory starts

memory
address  b+0       b+1*si    b+2*si    b+3*si    b+4*si
         +---------+---------+---------+---------+---------+
         |    1    |    2    |    3    |    4    |    5    |
         +---------+---------+---------+---------+---------+
pointer  p         p+1       p+2       p+3       p+4
(pointer arithmetic)
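To make the diagram concrete, here is a small sketch that walks an int array with a pointer instead of an index. The function name sum_ints is invented for this example; it assumes first points to at least n valid ints.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints starting at first, advancing one element per step. */
int sum_ints(const int *first, size_t n)
{
    assert(first != NULL);
    int total = 0;
    /* ++it moves to the next int, i.e. sizeof(int) bytes further on */
    for (const int *it = first; it != first + n; ++it)
        total += *it;
    return total;
}
```

For int p[] = { 1, 2, 3, 4, 5 }, sum_ints(p, 5) visits p, p+1, ..., p+4 and yields 15, while sum_ints(p + 2, 3) starts at the third element and yields 12.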

Pointer address in array of int pointers

I'm quite new in C language, so this "problem" is very confusing for me.
I wanted to create 2D array using array of int pointers (rows) which points to arrays of ints (columns) in one block of memory. I did it and it works but I'm not sure why after I checked something.
I've used malloc to allocate 48 bytes (2x4 array) in the heap (I'm on x86-64 machine):
int **a;
a = (int **)malloc(sizeof(int*) * 2 + sizeof(int) * 2 * 4);
Now lets assume that this is the whole 48 bytes in memory. I wanted 2 row's array so I needed 2 pointers to arrays of ints - a[0], a[1]:
----------------------------------------------------------------
| a[0] | a[1] | |
----------------------------------------------------------------
^
|
I assumed that all pointers are 8 bytes long and that address of a[2] (arrow) is the place where I can start storing my values (arrays of ints). So I did...
int *addr = (int*)&a[2];
a[0] = addr;
addr += 4;
a[1] = addr;
This is working perfectly fine, I can easily fill and print 2D array. Problem is that when I was writing int *addr = (int*)&a[2]; I was sure that this will be the address of a[0] plus 2 * 8 bytes, but it wasn't. I've checked it at another example with this simple code:
int *p;
int **k;
p = (int*) malloc(30);
k = (int**) malloc(30);
printf("&p = %p %p %p\n", &p[0], &p[1], &p[2]);
printf("&k = %p %p %p\n", &k[0], &k[1], &k[2]);
Output:
&p = 0x14d8010 0x14d8014 0x14d8018 <-- ok (int = 4 bytes)
&k = 0x14d8040 0x14d8048 0x14d8050 <-- something wrong in my opinion (ptrs = 8 bytes)
My question is: why is the third address in the array of pointers 0x14d8050 and not 0x14d8056? I think it might be because 0x14d8056 is not the best address for ints, but why is that, and why does it happen only when dealing with an array of pointers? I've checked this on an x86 machine, and there the pointers have "normal" values:
&p = 0x8322008 0x832200c 0x8322010
&k = 0x8322030 0x8322034 0x8322038
I know this might be an obvious or even stupid question for someone so please at least share some links with information about this behavior. Thank you.
Numbers prefixed by 0x are written in hexadecimal.
Thus, 0x14d8048 + 8 == 0x14d8050 is expected (0x48 is 72 in decimal, and 72 + 8 = 80 = 0x50).
As timrau said in his comment, 0x14d8048 + 8 is not 0x14d8056 but 0x14d8050, because the addresses are hexadecimal.
Concerning your 2D array, I'm not sure why it worked, but that is not the usual way to create one.
There are two common ways to create a 2D array. The first and simplest one is static, and it goes like this: int a[2][4];
The second one, the one you tried, is dynamic. It is slightly more complicated and goes like this:
int **a;
int i;
a = malloc(2 * sizeof(int *));
for(i = 0 ; i < 2 ; i++)
a[i] = malloc(4 * sizeof(int));
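If the goal is a single block of memory, as in the question, one possible alternative is a pointer to an array of 4 ints, which needs no separate row pointers. This is only a sketch: row_t, COLS and alloc_matrix are names invented here, and it assumes the number of columns is known at compile time.

```c
#include <assert.h>
#include <stdlib.h>

#define COLS 4

typedef int row_t[COLS];   /* one row: an array of COLS ints */

/* Allocates a rows x COLS matrix of zeros in one contiguous block.
 * Returns NULL on failure; release the matrix with a single free(). */
row_t *alloc_matrix(size_t rows)
{
    assert(rows > 0);
    row_t *m = calloc(rows, sizeof *m);   /* sizeof *m == COLS * sizeof(int) */
    return m;
}
```

After row_t *a = alloc_matrix(2); the elements are accessed as a[i][j], and a + 1 advances by one whole row, i.e. COLS * sizeof(int) bytes.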

Adding `int` to address causes int to be added 4 times

For a course about the functioning of operating systems, we had to write a malloc/free implementation for a specific sized struct. Our idea was to store the overhead, like the start and end of the specified (static) memory block our code has to work in, in the first few addresses of that block.
However, something went wrong with the calculation of the last memory slot; We're adding the size of all usable memory slots to the address of the first usable memory slot, to determine what the last slot is. However, when adding the int sizeofslots to the address of currentslot, it actually adds sizeofslots 4 times to this address. Here's the relevant code:
/* Example memory block*/
/* |------------------------------------------------------------------------------|*/
/* | ovr | 1 t | 0 | 1 t | 1 t | 1 t | 0 | 0 | 0 | 0 | 0 | 0 | 0 |*/
/* |------------------------------------------------------------------------------|*/
/* ovr: overhead, the variables `currentslot`, `firstslot` and `lastslot`.
* 1/0: Whether or not the slot is taken.
* t: the struct
*/
/* Store the pointer to the last allocated slot at the first address */
currentslot = get_MEM_BLOCK_START();
*currentslot = currentslot + 3*sizeof(void *);
/* The first usable memory slot after the overhead */
firstslot = currentslot + sizeof(void *);
*firstslot = currentslot + 3*sizeof(void *);
/* The total size of all the effective memory slots */
int sizeofslots = SLOT_SIZE * numslots;
/* The last usable slot in our memory block */
lastslot = currentslot + 2*sizeof(void*);
*lastslot = firstslot + sizeofslots;
printf("%p + %i = %p, became %p\n", previous, sizeofslots, previous + (SLOT_SIZE*numslots), *lastslot);
We figured it had something to do with integers being 4 bytes, but we still don't get what is happening here; Can anyone explain it?
C's pointer arithmetic always works like this; addition and subtraction are always in units of the item being pointed at, not in bytes.
Compare it to array indexing: as you might know, the expression a[i] is equivalent to *(a + i), for any pointer a and integer i. Thus, it must be the case that the addition happens in terms of the size of each element of a.
To work around it, cast the pointer to (char *) before the addition.
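The cast can be hidden in a small helper, sketched below. The name advance_bytes is hypothetical, and the caller is assumed to stay within the bounds of the underlying memory block.

```c
#include <assert.h>
#include <stddef.h>

/* Advances base by nbytes bytes, regardless of the pointed-to type. */
void *advance_bytes(void *base, size_t nbytes)
{
    assert(base != NULL);
    /* char has size 1, so adding to a char * counts in bytes */
    return (char *)base + nbytes;
}
```

For a void ** like currentslot, advance_bytes(currentslot, 3 * sizeof(void *)) lands on the fourth pointer-sized slot, whereas currentslot + 3 * sizeof(void *) would skip 3 * sizeof(void *) whole slots.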
When you add an integer to a pointer, it advances by that many strides (i.e. myPointer + x advances by x * sizeof(*myPointer) bytes). If this didn't happen, it would be possible to end up with unaligned integers, which on many processor architectures is a fault and will cause some funky behaviour, to say the least.
Take the following as an example
char* foo = (char*)0x0; // Foo = 0
foo += 5; // foo = 5
short* bar = (short*)0x0; // Bar = 0; (we assume two byte shorts)
bar += 5; // Bar = 0xA (10)
int* foobar = (int*)0x0; // foobar = 0; (we assume four byte ints)
foobar += 2; // foobar = 8;
char (*myArr)[8] = (char (*)[8])0x0; // A pointer to an array of 8 chars, starting at 0
myArr += 2; // myArr = 0x10 (16). This is because sizeof(char[8]) = 8;
Example
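#include <stdio.h>
const int MAX = 3;
int main (void)
{
    int var[] = {10, 100, 200};
    int i, *ptr;
    /* let us have array address in pointer */
    ptr = var;
    for ( i = 0; i < MAX; i++)
    {
        /* %p expects a void *; printing a pointer with %x is undefined behavior */
        printf("Address of var[%d] = %p\n", i, (void *)ptr );
        printf("Value of var[%d] = %d\n", i, *ptr );
        /* move to the next location */
        ptr++;
    }
    return 0;
}
Output (the addresses differ between runs and machines, and %p usually prints them with a 0x prefix):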
Address of var[0] = bfb7fe3c
Value of var[0] = 10
Address of var[1] = bfb7fe40
Value of var[1] = 100
Address of var[2] = bfb7fe44
Value of var[2] = 200
You can deduce from the example that a pointer advances by a number of bytes equal to the size of the type it points to. Here that is sizeof(int), which is 4 in this output. Similarly, a char pointer advances by 1 byte.

pointer to pointer to structure with malloc and strdup

My main intention is to pass a pointer to a structure, to a function, which will allocate memory and fill all its values. After returning back i will print it on screen.
Structure looks like this.
struct isolock{
char *name;
unsigned int state;};
typedef struct isolock lock;
Please note i can not change declaration of this structure. My main function is very simple like this.
int main(){
int i = 0;
isolock *lock = NULL;
fillArray(&lock);
//print struct contents
for(i=0;(lock+i)->name;i++){
printf("name = %s status = %u \n", (lock+i)->name,(lock+i)->state);
}
}
The fillArray function is supposed to allocate n + 1 memory blocks for my structure, where n is the actual number of locks present, and the last (n+1)th block is filled with zero. That way, while printing from main, I can just check for this condition without worrying about the length of the array I have allocated.
My problem is in fillArray function.
static const char *array[] = {"SWMGMT_LOCK","OTHER_LOCK"};
void fillArray(isolock **dlock){
int i = 0;
if (*dlock == NULL){
*dlock = (isolock *)malloc((sizeof (*dlock)) * 3); //allocate space for holding 3 struct values
for(i=0;i<2;i++){
(*(dlock) + i)->name = strdup(array[i]);
(*(dlock) + i)->state = (unsigned int)(i+1);
}
(*(dlock) + i)->name = 0; //LINE100
(*(dlock) + i)->state = 0;
}
}
o/p:
name = status = 1
name = OTHER_FP_LOCK status = 2
Actually this LINE100 causes problem. It actually replaces already filled structure entry i.e., this line makes dlock[0]->name to be filled with 0. whereas dlock[1] remains unchanged.
My gdb log shows something like this.
All these are logs are taken after allocation and filling values.
(gdb) p (*(dlock)+0)
$10 = (isolock *) 0x804b008
(gdb) p (*(dlock)+1)
$11 = (isolock *) 0x804b010
(gdb) p (*(dlock)+2) <== Note here 1
$12 = (isolock *) 0x804b018
(gdb) p *(*(dlock)+0)
$13 = {name = 0x804b018 "SWMGMT_LOCK", state = 1} <== Note here 2
(gdb) p *(*(dlock)+1)
$14 = {name = 0x804b028 "OTHER_FP_LOCK", state = 2}
(gdb) n
33 (*(dlock) + i)->state = 0;
(gdb)
35 }
(gdb) p *(*(dlock)+2)
$15 = {name = 0x0, state = 0}
(gdb) p (*(dlock)+2)
$16 = (isolock *) 0x804b018
From note 1 and note 2 its very clear that, strdup has returned already allocated memory location by malloc i.e., first call to strdup has returned the address of dlock[2]. how could this happen. because of this, (*dlock+2)->name = 0 has caused dlock[0]->name to be filled with 0.
To easily explain this problem,
Malloc has returned me three addresses. for easy understanding lets say, {1000,1008,1010}
Two times i ve called strdup which has returned {1010,1018}
this 1010 and 1018 are stored in char *name of lock[0] and lock[1] respectively.
Can someone tell me, am i doing something wrong in this code or is this problem of strdup ( allocating a already allocated block of memory)
NOTE: When i have changed char *name to char name[20] and instead of strdup, i used strcpy and it worked perfectly.
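The likely cause of the error is your allocation:
*dlock = (isolock *)malloc((sizeof (*dlock)) * 3);
Here dlock is a pointer to a pointer, so sizeof(*dlock) is the size of a pointer, which is not the same as sizeof(struct isolock) (or sizeof(**dlock)). Your gdb log shows the entries 8 bytes apart (0x804b008, 0x804b010, 0x804b018), so three structs need 24 bytes. A struct of 8 bytes holding a char * and an unsigned int implies 4-byte pointers, so only 12 bytes were requested. The third entry therefore lies outside the allocated block, in memory that malloc was free to hand to strdup. Allocate with sizeof(**dlock) instead.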
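A corrected fillArray could look roughly like this. It is a sketch under a few assumptions: the typedef is named isolock (the question uses that name as a type), the helper copy_string stands in for strdup so the code does not depend on POSIX, and the names lock_names and LOCK_COUNT are invented here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct isolock {
    char *name;
    unsigned int state;
};
typedef struct isolock isolock;   /* assumed typedef name */

static const char *const lock_names[] = {"SWMGMT_LOCK", "OTHER_LOCK"};
enum { LOCK_COUNT = sizeof lock_names / sizeof lock_names[0] };

/* Copies s into freshly allocated memory (portable stand-in for strdup). */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Allocates LOCK_COUNT entries plus one terminator entry whose name is NULL. */
void fillArray(isolock **dlock)
{
    assert(dlock != NULL);
    if (*dlock != NULL)
        return;
    /* sizeof **dlock is the size of one struct, not the size of a pointer */
    isolock *locks = malloc((LOCK_COUNT + 1) * sizeof **dlock);
    if (locks == NULL)
        return;
    for (size_t i = 0; i < LOCK_COUNT; i++) {
        locks[i].name = copy_string(lock_names[i]);
        locks[i].state = (unsigned int)(i + 1);
    }
    locks[LOCK_COUNT].name = NULL;   /* terminator checked by the print loop */
    locks[LOCK_COUNT].state = 0;
    *dlock = locks;
}
```

The printing loop in main can stay as it is; it stops at the entry whose name is NULL. The sketch does not free the names or the array, which a complete program would do.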
