Pointer Arithmetic and Read/Write in C

I am writing some very low-level C code to emulate a "file system" for a homework project. The file system is made of fixed-size blocks (1024B) but a "file" is allowed to span multiple non-consecutive blocks.
I need to be able to write any kind of void *buf to a file with a function that mimics write. Let's say we have the following signature: myWrite(int blockNum, void *buf, int nbytes).
Let's say nbytes is greater than 1024 so I have to write to multiple blocks. So I'd need to do something like:
int remainder = nbytes - 1024;
myWrite(firstBlock, buf, 1024);
myWrite(nextBlock, (buf + 1024), remainder);
myWrite calls the standard write function underneath, passing in similar arguments.
The problem is with the pointer arithmetic, C doesn't like when I do pointer arithmetic on a void*. I get an EINVAL or EARGS (depending on Mac vs. Linux) saying that C doesn't like the pointer I passed to the write system call since it was produced via void pointer arithmetic.
The problem is, I don't know what kind of data type I will be writing. Sometimes it's a char * and other times it's something custom like a my_type * that represents a struct from the program code.
Is there any way around this? I need to have a generalized write like this. My current implementation works most of the time, but sometimes it fails.

Stick to using void *-pointers.
When the need for byte wise pointer arithmetic arises just cast the void *-pointer to char * (as sizeof(char) is defined to be equal to 1):
int a[2] = {47, 11};
void * p = &a[0];
printf("%d\n", *((int *)p)); /* Prints 47. */
p = ((char *) p) + sizeof(a[0]); /* Increments p by sizeof(int) bytes. */
printf("%d\n", *((int *)p)); /* Prints 11. */
Also, as a note: to type integers describing amounts of memory or indexes, use size_t, as it is guaranteed to be wide enough to hold the size of any object on the machine the code is compiled for.

The problem is with the pointer arithmetic, C doesn't like when I do pointer arithmetic on a void*
That's correct: C prohibits you from adding integers to void*, because it does not know how to convert the number to a number of bytes by which to advance the pointer. Normally, the multiplier can be derived from the type the pointer points to. void, however, is an incomplete type with no size, hence C prohibits arithmetic on void*.
C does not mind if you do pointer arithmetic on other pointer types where the size is known. You can convert your void* pointer to a uint8_t* pointer, a char* pointer, or any other pointer to a type where sizeof(*ptr) is equal to 1, and do the cast right before the arithmetic operation:
myWrite(nextBlock, ((char*)buf) + 1024, remainder);
The functions themselves should continue taking void*.
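As a rough sketch of how the cast fits into a multi-block write, the helper below splits a buffer into 1024-byte chunks and advances through it with a byte pointer. The names write_block, write_spanning, disk and NUM_BLOCKS are hypothetical, and an in-memory array stands in for the real block device so the example is self-contained; the asker's myWrite would take the place of write_block.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 1024 /* block size from the question */
#define NUM_BLOCKS 8    /* hypothetical size of the toy "disk" */

/* Hypothetical in-memory stand-in for the block device. */
static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];

/* Copy at most BLOCK_SIZE bytes into one block; returns bytes written. */
static size_t write_block(size_t block_num, const void *buf, size_t nbytes)
{
    if (block_num >= NUM_BLOCKS)
        return 0;
    if (nbytes > BLOCK_SIZE)
        nbytes = BLOCK_SIZE;
    memcpy(disk[block_num], buf, nbytes);
    return nbytes;
}

/* Write nbytes from buf across the listed (possibly non-consecutive) blocks. */
static size_t write_spanning(const size_t *blocks, size_t nblocks,
                             const void *buf, size_t nbytes)
{
    const unsigned char *p = buf; /* byte-wise view of the caller's data */
    size_t written = 0;

    for (size_t i = 0; i < nblocks && written < nbytes; i++) {
        size_t chunk = nbytes - written;
        if (chunk > BLOCK_SIZE)
            chunk = BLOCK_SIZE;
        /* p + written is ordinary arithmetic on unsigned char *, in bytes. */
        size_t n = write_block(blocks[i], p + written, chunk);
        written += n;
        if (n < chunk)
            break; /* short write: stop and report what was written */
    }
    return written;
}
```

The public functions keep taking void *, and the conversion to a byte pointer happens once inside the helper, so callers can pass a char * or a my_type * without casting.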

Related

Difference between type casting (char*) and (size_t)

I was trying to find the difference of two pointers by subtraction, but one is int * and other is char *. As a result it gave me an error, as I expected, because of incompatible pointer type.
int main() {
char * ca="test";
int *ia=malloc(12);
*ia=45;
printf("add char * =%p, add int = %p \n", ca, ia);
printf("add ca-va * =%p\n", ca-ia);
return(0);
}
test3.c:22:35: error: invalid operands to binary - (have ‘char *’ and ‘int *’)
However, when I type cast int* to size_t I was successfully able to subtract the address. Can someone explain what exactly size_t did here?
int main() {
char * ca="test";
int *ia=malloc(12);
*ia=45;
printf("add char * =%p, add int = %p \n", ca, ia);
printf("add ca-va * =%p\n", (ca-(size_t)ia));
return(0);
}
You have 2 problems here:
The difference between two pointer values is counted in units of the data type the pointers point to. This cannot work if you have two different data types.
Pointer arithmetic is only allowed within the same data object. You may only subtract pointers that point to the same array or to one block of dynamically allocated memory.
This is not the case in your code.
Subtracting pointers that do not match those criteria doesn't make much sense anyway.
The compiler is right to complain.
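For contrast, here is a minimal sketch of pointer subtraction that does satisfy both rules: the two pointers have the same type and point into the same array. The function names are made up for illustration.

```c
#include <stddef.h>

/* Number of int elements from `from` to `to`; both must point into the
 * same array (or one past its end). */
static ptrdiff_t elements_between(const int *from, const int *to)
{
    return to - from;
}

/* The same distance measured in bytes, by viewing both as char pointers. */
static ptrdiff_t bytes_between(const int *from, const int *to)
{
    return (const char *)to - (const char *)from;
}
```

For an int array a, elements_between(&a[1], &a[4]) is 3, while bytes_between(&a[1], &a[4]) is 3 * sizeof(int).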
This is just pointer arithmetic.
For some pointer ptr and integer offset, ptr - offset means the address offset elements before ptr. Note that this is elements (whatever the pointer points to), not bytes. You can also use addition here. ptr[i] is shorthand for *(ptr + i).
For two pointers of the same type (e.g. both char*), ptr1 - ptr2 means the number of elements between the 2 pointers. e.g. if ptr1 - ptr2 == 5, then ptr2 + 5 == ptr1.
For two pointers of different types (e.g. char* and int*) ptr1 - ptr2 doesn't make any sense.
In your first piece of code the error occurs because you're trying to subtract pointers of different types. The second piece of code works because your cast is causing it to use the ptr - offset version. But this is certainly not what you actually want, because a pointer was converted to an offset and the result is a pointer.
What you probably want is something that Paul Hankin mentioned in a comment:
intptr_t pc = (intptr_t)ca;
intptr_t pa = (intptr_t)ia;
printf("add ca-va = %" PRIdPTR "\n", pc - pa);
This converts the pointers into integer types capable of holding an address and then does the subtraction. You will need to #include <inttypes.h> to get PRIdPTR (inttypes.h internally includes stdint.h which provides intptr_t).
size_t is an integer type. When a pointer is converted to an integer type, the result is implementation-defined (if it can fit in the destination type; otherwise the behavior is not defined by the C standard).
Per a non-normative note in the C standard, “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.” On machines with simple memory address schemes, the result of converting a pointer to an integer is typically the memory address. The remainder of this answer will assume we have such a C implementation.
Thus, if ca points to an array of char at address 9678, and ia points to some allocated memory at 4444, the result of converting ia to size_t would be 4444. Then, when 4444 is subtracted from ca, we are not subtracting two pointers but rather are subtracting an integer from a pointer. In general, the behavior of this is not defined by the C standard, because you are only allowed to add and subtract integers to pointers within the bounds of one array, and 4444 is far outside of ca in this example. However, what the compiler may do is simply convert the integer to the size of the pointed-to elements and then subtract the result from the address. Since ca points to char, and the size of char is one byte, converting 4444 to the size of 4444 char elements is simply 4444 bytes. Then 9678−4444 is 5234, so the result is a pointer that points to address 5234.
When you need to convert a pointer to an integer, there is a better type for this, uintptr_t, defined in the <stdint.h> header. (Comments have pointed out intptr_t, but you should use the unsigned version unless there is a specific reason to use the signed version.) Then, if you convert both pointers to uintptr_t, as with (uintptr_t) ca - (uintptr_t) ia, you will avoid the problem of the first pointer possibly pointing to some type whose size is not one byte. The result on machines with flat memory address spaces will typically be the difference between the two addresses.
Since implementation-defined and undefined behavior are involved here, this is not something you can rely on, and you should not manipulate pointers this way in normal code.
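A small sketch of the uintptr_t approach, assuming a C implementation with a flat address space where the converted value is the memory address. The helper name address_distance is hypothetical, and the result is only as meaningful as the implementation-defined conversion allows.

```c
#include <stdint.h>

/* Absolute distance in bytes between two addresses, computed on integers.
 * Assumption: pointer-to-uintptr_t conversion yields the flat address. */
static uintptr_t address_distance(const void *a, const void *b)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    return ua > ub ? ua - ub : ub - ua;
}
```

To print the result, include <inttypes.h> and use the PRIuPTR format macro rather than %p, since the value is an integer and not a pointer.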

Understanding code with use of the malloc() function and pointers

I'm trying to document some code to improve my knowledge of pointers and general ANSI C ability.
...
int static_store = 30;
const char * pcg = "String Literal";
int main()
{
int auto_store = 40;
char auto_string[] = "Auto char Array";
int * pi;
char * pcl;
pi = (int *) malloc(sizeof(int));
*pi = 35;
pcl = (char *) malloc(strlen("Dynamic String") + 1);
strcpy(pcl, "Dynamic String");
...
From first looks, two pointers are initialised, pi and pcl, of type int and char respectively.
But the lines after are confusing to me and I don't understand what is happening. I can see that malloc is being called to allocate memory the size of int (40). Is it assigning memory to the variables pi and pcl?
Is it assigning memory to the variables pi and pcl?
Yes, malloc is allocating memory for these pointers to point to.
pi is allocated memory equal to sizeof(int) (which may vary between platforms) and pcl has been allocated memory equal to the length of the string plus 1 (plus 1 for the null character).
From first looks, two pointers are initialised, pi and pcl, of type int and char respectively
They are declared, not initialized.
Note: please don't cast the return value of malloc.
Two pointers are declared (pi and pcl). At declaration they are not initialized.
pi is then pointed to a block of heap-allocated memory that can hold 1 int (the size of this is platform dependent, but it would usually be 4 bytes), allocated with the function malloc. Somewhere this memory will have to be explicitly freed with the function free; failing to do so will be a memory leak.
The int value 35 is then stored at that memory location. *pi can be read as "what the pointer pi points to"; it is effectively the same as pi[0].
pcl is then pointed to a block of heap-allocated memory that is large enough to hold 14 char plus a '\0' char (i.e. 15 bytes) using the function malloc (as above, at some point this memory must be freed).
The 15 characters "Dynamic String\0" are then put in that memory using the function strcpy.
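The steps above, including the eventual free, can be sketched as two small helpers. The names make_int and copy_string are made up for this illustration, and the NULL checks are an addition that the original snippet does not have.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate room for one int on the heap and store value in it.
 * Returns NULL if the allocation fails. The caller must free the result. */
static int *make_int(int value)
{
    int *pi = malloc(sizeof *pi);
    if (pi != NULL)
        *pi = value; /* same as pi[0] = value */
    return pi;
}

/* Allocate strlen(src) + 1 bytes (the +1 holds the '\0') and copy src in.
 * Returns NULL if the allocation fails. The caller must free the result. */
static char *copy_string(const char *src)
{
    char *pcl = malloc(strlen(src) + 1);
    if (pcl != NULL)
        strcpy(pcl, src);
    return pcl;
}
```

A caller would write int *pi = make_int(35); and char *pcl = copy_string("Dynamic String");, use the values, and finish with free(pi); and free(pcl); to avoid the leak mentioned above.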
The line:
pi = (int *) malloc(sizeof(int))
actually allocates memory for one int variable. The line afterwards puts the value 35 into that variable.
The line:
pcl = (char *) malloc(strlen("Dynamic String") + 1)
creates a dynamically allocated char array (which can hold a string). The size of that array is the length of the string ("Dynamic String") plus one. The next line copies the string "Dynamic String" into the allocated array. The plus one is needed because each string in C ends with the char '\0', which marks the end of the string.
The malloc function reserves a block of memory in the heap (the dynamic memory pool), and returns a pointer to the first element of that block of memory. That memory is reserved until you call free, or the program exits.
In the call
pi = (int *) malloc(sizeof(int));
malloc reserves a block of memory large enough to store a single int value, and the pointer to that memory block is assigned to pi. You do not need to cast the result of malloc [1], and it's actually considered bad practice [2]. A better way to write that would be
pi = malloc( sizeof *pi );
The expression *pi has type int, so sizeof *pi is equivalent to sizeof (int) [3]. The advantage of using sizeof *pi over sizeof (int) (as well as dropping the cast) is that if you ever change the type of pi (from int * to long *, for example), you won't have to change anything in the malloc call; it will always allocate the right amount of memory regardless of the type of pi. It's one less maintenance headache to worry about.
Similarly, the call
pcl = (char *) malloc(strlen("Dynamic String") + 1);
reserves enough memory to hold the contents of "Dynamic String" (the +1 is necessary for the string terminator) and assigns the pointer to that memory to pcl. Again, this would be better written as
pcl = malloc( strlen("Dynamic String") + 1 ); // no cast
sizeof (char) is 1 by definition, so you don't need an explicit sizeof *pcl in the call above; however, if you ever decide to change the type of pcl from char * to wchar_t *, it would be good to have it in place, although you'd still have to change the string literal and change how you compute the length, so it's not maintenance-free.
The general form of a malloc call is
T *p = malloc( num_elements * sizeof *p ); // where num_elements > 1
or
T *p;
...
p = malloc( num_elements * sizeof *p );
There is a similar function named calloc that will zero out the allocated memory block:
T *p = calloc( num_elements, sizeof *p );
or
T *p;
...
p = calloc( num_elements, sizeof *p );
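As a concrete instance of the general forms, here is a short sketch with T chosen as int for malloc and long for calloc. The function names are hypothetical.

```c
#include <stdlib.h>

/* malloc form: n ints, uninitialized until the loop fills them in. */
static int *make_squares(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)(i * i);
    return p;
}

/* calloc form: n longs, every byte zeroed by calloc itself. */
static long *make_zeroed(size_t n)
{
    long *p = calloc(n, sizeof *p);
    return p; /* NULL on failure */
}
```

Neither call names the element type in the size expression, so changing int to another type only requires touching the declaration of p.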
1. In C, anyway; C++ does require the cast, since C++ does not allow implicit conversions between void * and other pointer types. But you shouldn't be using malloc in C++ code, anyway.
So why do people still cast the result of malloc in C? It's largely a holdover from pre-ANSI days (more than a quarter of a century ago), when malloc returned char *, which cannot be assigned to a different pointer type without a cast. The 1989 standard introduced the void * type, which is essentially a "generic" pointer that can be assigned to other pointer types without an explicit cast.
2. Under the C89 standard, if the compiler sees a function call without having seen a declaration for that function, it will assume that the function returns int. Thus, if you forget to include stdlib.h or otherwise don't have a declaration for malloc in scope, the compiler will assume it returns an int value and generate the machine code for the call accordingly. However, int and pointer types aren't compatible, and normally the compiler will issue a diagnostic if you try to assign one to the other. By using the cast, however, you suppress that diagnostic, and you may run into runtime problems later on.
The 1999 standard did away with implicit int typing for functions, so that's not really a problem anymore, but the cast is still unnecessary and leads to maintenance problems.
3. sizeof is an operator, not a function; parentheses are only required if the operand is a type name like int or double or struct blah.

Void pointers pretending to be void double pointers

I've been doing some thinking. I haven't found anything directly answering this question, but I think I know the answer; I just want some input from some more experienced persons.
Knowns:
A void pointer points to just a memory address. It includes no type information.
An int pointer points to a memory address containing an int. It will read whatever is in the memory address pointed to as an integer, regardless of what was stuffed into the address originally.
Question:
If a void double pointer void ** foo were to point to a dynamically allocated array of void pointers
void ** foo = malloc(sizeof(void *) * NUM_ELEMENTS);
is it true, as I am supposing, that because of the unique nature of void pointers actually lacking any sort of type information that instead of void ** foo an equivalent statement would be
void * bar = malloc(sizeof(void *) * NUM_ELEMENTS);
and that when I use indirection to access by assigning a specific type, such as with
(It was pointed out that I can't dereference void pointers. For clarity to the purpose of the question the next line is changed to be appropriate to that information)
int ** fubar = bar;
that I would get an appropriate pointer from the single void pointer which is just acting like a double pointer?
Or is this all just in my head?
It is permissible to assign the result of malloc to a void * object and then later assign it to an int ** object. This is because the return value of malloc has type void * anyway, and it is guaranteed to be suitable for assignment to a pointer to any type of object with a fundamental alignment requirement.
However, this code:
#define NUM_ELEMENTS 1000
void *bar = malloc(sizeof(void *) * NUM_ELEMENTS);
int **fubar = bar;
*fubar = 0;
is not guaranteed by the C standard to work; it may have undefined behavior. The reason for this is not obvious. The C standard does not require different types of pointers to have the same size. A C implementation may set the size of an int * to one million bytes and the size of a void * to four bytes. In this case, the space allocated for 1000 void * would not be enough to hold one int *, so the assignment to *fubar has undefined behavior. Generally, one would implement C in such a way only to prove a point. However, similar errors are possible on a smaller scale: There are C implementations in which pointers of different types have different sizes.
A pointer to an object type may be converted to a pointer to another object type provided the pointer has alignment suitable for the destination type. If it does, then converting it back yields a pointer with the original value. Thus, you may convert pointers to void * to pointers to void and back, and you may convert pointers to void * to pointers to int * and back, provided the alignments are suitable (which they will be if the pointers were returned by malloc and you are not using custom objects with extended alignments).
In general, you cannot write using a pointer to an object type and then read the same bytes using a pointer to a different object type. This violates aliasing rules. An exception is when one of the pointers is to a character type. Also, many C implementations do support such aliasing, but it may require setting command-line options to enable such support.
This prohibition on aliasing includes reinterpreting pointers. Consider this code:
int a;
int *b = &a;
void **c = (void **) &b;
void *d = *c;
int *e = (int *) d;
In the fourth line, c points to the bytes that b occupies but *c tries to interpret those bytes as a void *. This is not guaranteed to work, so the value that d gets is not necessarily a pointer to a, even when it is converted to int * as in the last line.
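By contrast, converting the pointer value itself (rather than reinterpreting the bytes of the pointer object) is well defined. A minimal sketch, with a made-up function name:

```c
/* Convert an int * to void * and back by value. The C standard guarantees
 * the result compares equal to the original pointer. */
static int *round_trip(int *b)
{
    void *d = b; /* value conversion: int * -> void * */
    int *e = d;  /* value conversion: void * -> int * */
    return e;
}
```

Here no object is read through a pointer of the wrong type: b is read as an int *, and d is read as a void *.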
Under the C Standard, the behavior of the code you gave is undefined because you allocated an array of void pointers and then tried to use it as an array of int pointers. There is nothing in the Standard that requires these two kinds of pointer to have the same size or alignment. Now if you had said
void * bar = malloc(sizeof(int*) * NUM_ELEMENTS);
int ** fubar = bar;
Then all would be fine.
Now on the vast majority of machines, an int* and a void* will actually have the same size and alignment. So your code ought to work fine in practice.
Additionally, these two are not equivalent:
void ** foo = malloc(sizeof(void *) * NUM_ELEMENTS);
void * bar = malloc(sizeof(void *) * NUM_ELEMENTS);
This is because foo can be dereferenced at any element to get a void pointer, while bar cannot. For example, this program is correct and prints 00000000 on my 32-bit machine:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
void **a = calloc(10, sizeof(void*));
printf("%p\n", a[0]);
return 0;
}
One other point is that you seem to be thinking that type information is explicit in the pointer at the machine level. This is not true (at least for the vast majority of implementations). The type of C pointers is normally represented only while the program is being compiled. By the time compilation is done, explicit type information is normally lost except in debugging symbol tables, which are not runnable code. (There are some minor exceptions to this. And for C++ the situation is very different.)
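To size such a table by the type it will actually hold, as suggested earlier, one option is to let sizeof work from the destination pointer. This is only a sketch; make_int_ptr_table is a hypothetical name.

```c
#include <stdlib.h>

/* Allocate a table of n int pointers, each set to NULL.
 * sizeof *table is sizeof(int *), whatever that is on this platform. */
static int **make_int_ptr_table(size_t n)
{
    int **table = malloc(n * sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        table[i] = NULL; /* explicit: all-bits-zero need not be a null pointer */
    return table;
}
```

Because the size comes from *table, the allocation stays correct even on an implementation where int * and void * differ in size.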

casting of an address

You have an operating system, which has 2 functions dealing with memory allocation:
void *malloc( int sz ) // allocates a memory block sz bytes long
void free( void *addr ) // frees a memory block starting at addr
// (previously allocated by malloc)
Using these functions, implement the following 2 functions:
void *malloc_aligned( int sz ) // allocates a memory block sz bytes long,
// aligned to an address divisible by 16
void free_aligned( void *addr ) // frees a memory block starting at addr
// (previously allocated by malloc_aligned)
in the solution there is the following part:
void * aligned_malloc(size_t size){
unsigned char *res=malloc(size+16);
unsigned char offset=16-((long)res%16);
What I don't understand is: Why do we need to use unsigned char and why and what we achieve using 16-((long)res%16); and what is the purpose of (long)res in this case?
You can't do pointer arithmetic on "void *", because void has no size.
When adding to a pointer or subtracting from it, it's always done in units of sizeof(*p). Meaning: if you add one to an int pointer, its value grows by sizeof(int), which is 4 on most current platforms. So when you add to a void pointer, it should grow by the size of a void. But void has no size.
However, some compilers are willing to do arithmetic on void *, and they treat it like char *. With these compilers, you could implement these functions without casting. But it isn't standard.
Another point is that not all operators are applicable for pointers. Addition and subtraction are, but multiplication, division and modulus are not. So if you want to test the low bits of a pointer, to know if it's aligned, you cast it to a long.
Why long? The assumption is that long is as large as a pointer, which is true on 64-bit Linux, but not on 64-bit Windows, where long is 32 bits wide. The right type is uintptr_t. However, if you're only interested in the low bits, it doesn't matter if you lose the high bits while casting. So a cast to int would have worked too.
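Putting these points together, one possible way to complete the exercise is sketched below. The question only shows the first two lines of the solution, so the rest is an assumption: here the offset is stored in the byte just before the aligned address so that free_aligned can recover the original pointer. It uses size_t and uintptr_t instead of int and long.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 16

void *malloc_aligned(size_t size)
{
    if (size > SIZE_MAX - ALIGNMENT)
        return NULL; /* size + ALIGNMENT would overflow */

    /* unsigned char * so that pointer arithmetic is in bytes. */
    unsigned char *raw = malloc(size + ALIGNMENT);
    if (raw == NULL)
        return NULL;

    /* offset is in 1..16, so there is always at least one spare byte
     * in front of the aligned address. */
    unsigned char offset =
        (unsigned char)(ALIGNMENT - ((uintptr_t)raw % ALIGNMENT));
    unsigned char *aligned = raw + offset;

    aligned[-1] = offset; /* assumption: remember the offset for free_aligned */
    return aligned;
}

void free_aligned(void *addr)
{
    if (addr == NULL)
        return;
    unsigned char *aligned = addr;
    free(aligned - aligned[-1]); /* step back to what malloc returned */
}
```

The modulus cannot be applied to a pointer directly, which is why raw is converted to an integer first; the extra 16 bytes guarantee that size bytes remain available after the adjustment.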

Why does my homespun sizeof operator need a char* cast?

Below is the program to find the size of a structure without using sizeof operator:
struct MyStruct
{
int i;
int j;
};
int main()
{
struct MyStruct *p=0;
int size = ((char*)(p+1))-((char*)p);
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
return 0;
}
Why is typecasting to char * required?
If I don't use the char* pointer, the output is 1 - why?
Because pointer arithmetic works in units of the type pointed to. For example:
int* p_num = malloc(10 * sizeof(int));
int* p_num2 = p_num + 5;
Here, p_num2 does not point five bytes beyond p_num, it points five integers beyond p_num. If on your machine an integer is four bytes wide, the address stored in p_num2 will be twenty bytes beyond that stored in p_num. The reason for this is mainly so that pointers can be indexed like arrays. p_num[5] is exactly equivalent to *(p_num + 5), so it wouldn't make sense for pointer arithmetic to always work in bytes, otherwise p_num[5] would give you some data that started in the middle of the second integer, rather than giving you the sixth integer as you would expect.
In order to move a specific number of bytes beyond a pointer, you need to cast the pointer to point to a type that is guaranteed to be exactly 1 byte wide (a char).
Also, you have an error here:
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
You have two format specifiers but only one argument after the format string.
If I don't use the char* pointer, the output is 1 - WHY?
Because operator- obeys the same pointer arithmetic rules that operator+ does. You advanced the address by sizeof(MyStruct) bytes when you added one to the pointer, but without the cast the byte difference is divided by sizeof(MyStruct) again in the operator- for pointers, which gives 1.
Why not use the built in sizeof() operator?
Because you want the size of your struct in bytes, and pointer arithmetic implicitly uses type sizes.
int* p;
p + 5; // the address advances by 5 * sizeof(int) bytes
By casting to char* you circumvent this behavior.
Pointer arithmetic is defined in terms of the size of the type of the pointer. This is what allows (for example) the equivalence between pointer arithmetic and array subscripting -- *(ptr+n) is equivalent to ptr[n]. When you subtract two pointers, you get the difference as the number of items they're pointing at. The cast to pointer to char means that it tells you the number of chars between those addresses. Since C makes char and byte essentially equivalent (i.e. a byte is the storage necessary for one char) that's also the number of bytes occupied by the first item.
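The two results discussed in these answers can be reproduced with a small sketch. It uses a real two-element array instead of a null pointer so that the arithmetic stays within one object; the function names are made up.

```c
#include <stddef.h>

struct MyStruct {
    int i;
    int j;
};

/* With the char * casts: the distance is counted in bytes. */
static ptrdiff_t size_in_bytes(void)
{
    struct MyStruct pair[2];
    return (char *)&pair[1] - (char *)&pair[0];
}

/* Without the casts: the distance is counted in struct MyStruct elements. */
static ptrdiff_t size_in_elements(void)
{
    struct MyStruct pair[2];
    return &pair[1] - &pair[0];
}
```

size_in_bytes() equals sizeof(struct MyStruct), while size_in_elements() is always 1, which is the output the asker saw without the cast.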

Resources