Pointer syntax query

Pointer syntax query - c

I can't seem to understand the difference between the following to pointer notations, can someone please guide me?
typedef struct some_struct struct_name;
struct_name this;
char buf[50];
this = *((some_struct *)(buf));
Now I tried to play around a bit and did the above thing like:
struct some_struct * this;
char buf[50];
this=(struct some_struct *)buf;
As far as I am concerned I think both the implementations should generate the same result, Can someone guide me whether there is a difference between the two and if yes can some one point it out?
Thanks.

In your first snippet, this is not a pointer, it's an instance of some_struct. The assignment you made did a shallow copy (i.e. memcpy()) of what's in buf as if it were an instance of some_struct as well.
In the second snippet, this is a pointer, and it's just pointed to the address of buf.
So, basically to sum up, first snippet this is not a pointer and the struct is copied into it. In the second, it's a pointer and assigned to the same memory as buf (i.e. not a copy).

In the second one, "this" will point to the first memory location of "buf". In the first example, you will either get a compiler error (I don't think you can assign structs in C with =, I could be wrong though), or the contents of buf (up to sizeof(struct_name)) will be copied into this, which resides on the stack.

Both approaches have their problems.
alignment: your buf might not be properly aligned for a variable of the structure type. If so this will produce undefined behavior (UB): in the best case it aborts your program, but it may make much worse things than that.
initialization: in the first cases you access uninitialized memory for reading. In the best case that gives you unspecific data, that is some random bytes. In the worst case, char is a signed integer type on your platform and you hit a trap representation for char => UB as above. (Your second case will encounter the same problem, once you try to access the object at the other end of the pointer.)
How to avoid all that:
Always initialize your variables. A simple = { 0 } should do in all cases.
never use char as a generic type for bytes but use unsigned char
never cast a byte buffer of arbitrary alignment to another data type. If needed, do it the other way round, cast a struct object to unsigned char.

Related

Dereferencing in C

I've just started to learn C so please be kind.
From what I've read so far regarding pointers:
int * test1; //this is a pointer which is basically an address to the process
//memory and usually has the size of 2 bytes (not necessarily, I know)
float test2; //this is an actual value and usually has the size of 4 bytes,
//being of float type
test2 = 3.0; //this assigns 3 to `test2`
Now, what I don't completely understand:
*test1 = 3; //does this assign 3 at the address
//specified by `pointerValue`?
test1 = 3; //this says that the pointer is basically pointing
//at the 3rd byte in process memory,
//which is somehow useless, since anything could be there
&test1; //this I really don't get,
//is it the pointer to the pointer?
//Meaning, the address at which the pointer address is kept?
//Is it of any use?
Similarly:
*test2; //does this has any sense?
&test2; //is this the address at which the 'test2' value is found?
//If so, it's a pointer, which means that you can have pointers pointing
//both to the heap address space and stack address space.
//I ask because I've always been confused by people who speak about
//pointers only in the heap context.

Great question.
Your first block is correct. A pointer is a variable that holds the address of some data. The type of that pointer tells the code how to interpret the contents of the address being held by that pointer.
The construct:
*test1 = 3
Is called the deferencing of a pointer. That means, you can access the address that the pointer points to and read and write to it like a normal variable. Note:
int *test;
/*
* test is a pointer to an int - (int *)
* *test behaves like an int - (int)
*
* So you can thing of (*test) as a pesudo-variable which has the type 'int'
*/
The above is just a mnemonic device that I use.
It is rare that you ever assign a numeric value to a pointer... maybe if you're developing for a specific environment which has some 'well-known' memory addresses, but at your level, I wouldn't worry to much about that.
Using
*test2
would ultimately result in an error. You'd be trying to deference something that is not a pointer, so you're likely to get some kind of system error as who knows where it is pointing.
&test1 and &test2 are, indeed, pointers to test1 and test2.
Pointers to pointers are very useful and a search of pointer to a pointer will lead you to some resources that are way better than I am.

It looks like you've got the first part right.
An incidental thought: there are various conventions about where to put that * sign. I prefer mine nestled with the variable name, as in int *test1 while others prefer int* test1. I'm not sure how common it is to have it floating in the middle.
Another incidental thought: test2 = 3.0 assigns a floating-point 3 to test2. The same end could be achieved with test2=3, in which case the 3 is implicitly converted from an integer to a floating point number. The convention you have chosen is probably safer in terms of clarity, but is not strictly necessary.
Non-incidentals
*test1=3 does assign 3 to the address specified by test.
test1=3 is a line that has meaning, but which I consider meaningless. We do not know what is at memory location 3, if it is safe to touch it, or even if we are allowed to touch it.
That's why it's handy to use something like
int var=3;
int *pointy=&var;
*pointy=4;
//Now var==4.
The command &var returns the memory location of var and stores it in pointy so that we can later access it with *pointy.
But I could also do something like this:
int var[]={1,2,3};
int *pointy=&var;
int *offset=2;
*(pointy+offset)=4;
//Now var[2]==4.
And this is where you might legitimately see something like test1=3: pointers can be added and subtracted just like numbers, so you can store offsets like this.
&test1 is a pointer to a pointer, but that sounds kind of confusing to me. It's really the address in memory where the value of test1 is stored. And test1 just happens to store as its value the address of another variable. Once you start thinking of pointers in this way (address in memory, value stored there), they become easier to work with... or at least I think so.
I don't know if *test2 has "meaning", per se. In principle, it could have a use in that we might imagine that the * command will take the value of test2 to be some location in memory, and it will return the value it finds there. But since you define test2 as a float, it is difficult to predict where in memory we would end up, setting test2=3 will not move us to the third spot of anything (look up the IEEE754 specification to see why). But I would be surprised if a compiler would allow such thing.
Let's look at another quick example:
int var=3;
int pointy1=&var;
int pointy2=&pointy1;
*pointy1=4; //Now var==4
**pointy2=5; //Now var==5
So you see that you can chain pointers together like this, as many in a row as you'd like. This might show up if you had an array of pointers which was filled with the addresses of many structures you'd created from dynamic memory, and those structures contained pointers to dynamically allocated things themselves. When the time comes to use a pointer to a pointer, you'll probably know it. For now, don't worry too much about them.

First let's add some confusion: the word "pointer" can refer to either a variable (or object) with a pointer type, or an expression with the pointer type. In most cases, when people talk about "pointers" they mean pointer variables.
A pointer can (must) point to a thing (An "object" in standards parlance). It can only point to the right kind of thing; a pointer to int is not supposed to point to a float object. A pointer can also be NULL; in that case there is no thing to point to.
A pointertype is also a type, and a pointer object is also an object. So it is allowable to construct a pointer to pointer: the pointer-to-pointer just stores the addres of the pointer object.
What a pointer can not be:
It cannot point to a value: p = &4; is impossible. 4 is a literal value, which is not stored in an object, and thus has no address.
the same goes for expressions: p = &(1+4); is impossible, because the expression "1+4" does not have a location.
the same goes for return value p = &sin(pi); is impossible; the return value is not an object and thus has no address.
variables marked as "register" (almost distinct now) cannot have an address.
you cannot take the address of a bitfield, basically because these can be smaller than character (or have a finer granularity), hence it would be possible that different bitmasks would have the same address.
There are some "exceptions" to the above skeletton (void pointers, casting, pointing one element beyond an array object) but for clarity these should be seen as refinements/amendments, IMHO.

Questions about typecasting

I have questions about typecasting. This is just a dummy program shown here. The actual code is too big to be posted.
typedef struct abc
{
int a;
}abc_t;
main()
{
abc_t *MY_str;
char *p;
MY_str = (abc_t *)p;
}
Whenever I run the quality analysis check tool, I get a level 2 warning:
Casting to different object pointer type. REFERENCE - ISO:C90-6.3.4 Cast Operators - Semantics <next> Msg(3:3305) Pointer cast to stricter alignment. <next>
Can anyone please tell me how to resolve this issue?

Simple - your static analysis tool (which, btw?) has decided that a char* does not have a particular alignment requirement (it could point anywhere in memory) whereas an abc_t* likely has a word alignment requirement (int must be on a 4/8 byte boundary).
In reality, as the char* is on the stack, it will be word aligned on most architectures. Your tool cannot see this.

In your implementation (and probably many others) each int must be at an address that is divisible by sizeof int, which is often 4.
On the other hand, a char can be at any address.
It's like assigning 3.25 to an int variable. That's also not possible.
So when you have a bad pointer, you will probably get an exception from your machine, and technically this code invokes undefined behavior.

a char* can be aligned on any byte boundary, which means if you cast it to a structure, the alignment requirements of that struct might not be met (such as 16 byte boundaries required for SIMD types).

Your code is invalid C. If you find yourself doing something like this, it's probably the result of a greater misunderstanding. For instance I'm guessing you want to read an abc_t object from a file/socket/etc. and you're used to passing a char pointer to the read/recv/whatever function. Instead you should just declare an object of type abc_t and pass its address to whatever reading function you're using.

Copying one structure to another

I know that I can copy the structure member by member, instead of that can I do a memcpy on structures?
Is it advisable to do so?
In my structure, I have a string also as member which I have to copy to another structure having the same member. How do I do that?

Copying by plain assignment is best, since it's shorter, easier to read, and has a higher level of abstraction. Instead of saying (to the human reader of the code) "copy these bits from here to there", and requiring the reader to think about the size argument to the copy, you're just doing a plain assignment ("copy this value from here to here"). There can be no hesitation about whether or not the size is correct.
Also, if the structure is heavily padded, assignment might make the compiler emit something more efficient, since it doesn't have to copy the padding (and it knows where it is), but mempcy() doesn't so it will always copy the exact number of bytes you tell it to copy.
If your string is an actual array, i.e.:
struct {
char string[32];
size_t len;
} a, b;
strcpy(a.string, "hello");
a.len = strlen(a.string);
Then you can still use plain assignment:
b = a;
To get a complete copy. For variable-length data modelled like this though, this is not the most efficient way to do the copy since the entire array will always be copied.
Beware though, that copying structs that contain pointers to heap-allocated memory can be a bit dangerous, since by doing so you're aliasing the pointer, and typically making it ambiguous who owns the pointer after the copying operation.
For these situations a "deep copy" is really the only choice, and that needs to go in a function.

Since C90, you can simply use:
dest_struct = source_struct;
as long as the string is memorized inside an array:
struct xxx {
char theString[100];
};
Otherwise, if it's a pointer, you'll need to copy it by hand.
struct xxx {
char* theString;
};
dest_struct = source_struct;
dest_struct.theString = malloc(strlen(source_struct.theString) + 1);
strcpy(dest_struct.theString, source_struct.theString);

If the structures are of compatible types, yes, you can, with something like:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
The only thing you need to be aware of is that this is a shallow copy. In other words, if you have a char * pointing to a specific string, both structures will point to the same string.
And changing the contents of one of those string fields (the data that the char * points to, not the char * itself) will change the other as well.
If you want a easy copy without having to manually do each field but with the added bonus of non-shallow string copies, use strdup:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
dest_struct->strptr = strdup (source_struct->strptr);
This will copy the entire contents of the structure, then deep-copy the string, effectively giving a separate string to each structure.
And, if your C implementation doesn't have a strdup (it's not part of the ISO standard), get one from here.

You can memcpy structs, or you can just assign them like any other value.
struct {int a, b;} c, d;
c.a = c.b = 10;
d = c;

In C, memcpy is only foolishly risky. As long as you get all three parameters exactly right, none of the struct members are pointers (or, you explicitly intend to do a shallow copy) and there aren't large alignment gaps in the struct that memcpy is going to waste time looping through (or performance never matters), then by all means, memcpy. You gain nothing except code that is harder to read, fragile to future changes and has to be hand-verified in code reviews (because the compiler can't), but hey yeah sure why not.
In C++, we advance to the ludicrously risky. You may have members of types which are not safely memcpyable, like std::string, which will cause your receiving struct to become a dangerous weapon, randomly corrupting memory whenever used. You may get surprises involving virtual functions when emulating slice-copies. The optimizer, which can do wondrous things for you because it has a guarantee of full type knowledge when it compiles =, can do nothing for your memcpy call.
In C++ there's a rule of thumb - if you see memcpy or memset, something's wrong. There are rare cases when this is not true, but they do not involve structs. You use memcpy when, and only when, you have reason to blindly copy bytes.
Assignment on the other hand is simple to read, checks correctness at compile time and then intelligently moves values at runtime. There is no downside.

You can use the following solution to accomplish your goal:
struct student
{
char name[20];
char country[20];
};
void main()
{
struct student S={"Wolverine","America"};
struct student X;
X=S;
printf("%s%s",X.name,X.country);
}

You can use a struct to read write into a file.
You do not need to cast it as a `char*.
Struct size will also be preserved.
(This point is not closest to the topic but guess it:
behaving on hard memory is often similar to RAM one.)
To move (to & from) a single string field you must use strncpy
and a transient string buffer '\0' terminating.
Somewhere you must remember the length of the record string field.
To move other fields you can use the dot notation, ex.:
NodeB->one=intvar;
floatvar2=(NodeA->insidebisnode_subvar).myfl;
struct mynode {
int one;
int two;
char txt3[3];
struct{char txt2[6];}txt2fi;
struct insidenode{
char txt[8];
long int myl;
void * mypointer;
size_t myst;
long long myll;
} insidenode_subvar;
struct insidebisnode{
float myfl;
} insidebisnode_subvar;
} mynode_subvar;
typedef struct mynode* Node;
...(main)
Node NodeA=malloc...
Node NodeB=malloc...
You can embed each string into a structs that fit it,
to evade point-2 and behave like Cobol:
NodeB->txt2fi=NodeA->txt2fi
...but you will still need of a transient string
plus one strncpy as mentioned at point-2 for scanf, printf
otherwise an operator longer input (shorter),
would have not be truncated (by spaces padded).
(NodeB->insidenode_subvar).mypointer=(NodeA->insidenode_subvar).mypointer
will create a pointer alias.
NodeB.txt3=NodeA.txt3
causes the compiler to reject:
error: incompatible types when assigning to type ‘char[3]’ from type ‘char *’
point-4 works only because NodeB->txt2fi & NodeA->txt2fi belong to the same typedef !!
A correct and simple answer to this topic I found at
In C, why can't I assign a string to a char array after it's declared?
"Arrays (also of chars) are second-class citizens in C"!!!

Which of these options is good practice for assigning a string value to a variable in C?

I have this snippet of C code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
typedef struct Date {
int date;
char* month;
int year;
} Date_t;
typedef Date_t* pDate_t;
void assignMonth(pDate_t birth)
{
//1)
birth->month = "Nov";
//2)
//birth->month = malloc(sizeof(char) * 4);
//birth->month = strcpy(birth->month, "Nov");
}
int main()
{
Date_t birth;
birth.date = 13;
assignMonth(&birth);
birth.year = 1969;
printf("%d %s %d\n",birth.date, birth.month, birth.year);
return 0;
}
In the function assignMonth I have two possibilities for assigning month. Both give me the same result in the output, so what is the difference between them? I think that the second variant is the good one, am I wrong? If yes, why? If not, why?
Thanks in advance for any help.
P.S. I'm interested in what is going on in memory in both cases.

It depends on what you want to do with birth.month later. If you have no intention of changing it, then the first is better (quicker, no memory cleanup requirement required, and each Date_t object shares the same data). But if that is the case, I would change the definition of month to const char *. In fact, any attempt to write to *birth.month will cause undefined behaviour.
The second approach will cause a memory leak unless you remember to free(birth.month) before birth goes out of scope.

You're correct, the second variant is the "good" one.
Here's the difference:
With 1, birth->month ends up pointing to the string literal "Nov". It is an error to try to modify the contents of birth->month in this case, and so birth->month should really be a const char* (many modern compilers will warn about the assignment for this reason).
With 2, birth->month ends up pointing to an allocated block of memory whose contents are "Nov". You are then free to modify the contents of birth->month, and the type char* is accurate. The caveat is that you are also now required to free(birth->month) in order to release this memory when you are done with it.
The reason that 2 is the correct way to do it in general, even though 1 seems simpler in this case, is that 1 in general is misleading. In C, there is no string type (just sequences of characters), and so there is no assignment operation defined on strings. For two char*s, s1 and s2, s1 = s2 does not change the string value pointed to by s1 to be the same as s2, it makes s1 point at exactly the same string as s2. This means that any change to s1 will affect the contents of s2, and vice-versa. Also, you now need to be careful when deallocating that string, since free(s1); free(s2); will double-free and cause an error.
That said, if in your program birth->month will only ever be one of several constant strings ("Jan", "Feb", etc.) variant 1 is acceptable, however you should change the type of birth->month to const char* for clarity and correctness.

Neither is correct. Everyone's missing the fact that this structure is inherently broken. Month should be an integer ranging from 1 to 12, used as an index into a static const string array when you need to print the month as a string.

I suggest either:
const char* month;
...
birth->month = "Nov";
or:
char month[4];
...
strcpy(birth->month, "Nov");
avoiding the memory allocation altogether.

With option 1 you never allocate memory to store "Nov", which is okay because it's a static string. A fixed amount of memory was allocated for it automatically. This will be fine so long as it's a string that appears literally in the source and you never try to modify it. If you wanted to read a value in from the user, or from a file, then you'd need to allocate first.

Seperate to your question; why is struct Date typedefed? it has a type already - "struct Date".
You can use an incomplete type if you want to hide the structure decleration.
In my experience, people seem to typedef because they think they should - without actually thinking about the effect of doing so.

In the first case your cannot do something like birth->month[i]= 'c'. In other words you cannot modify the string literal "Mov" pointed to by birth->month because it is stored in the read only section of memory.
In the second case you can modify the contents of p->month because "Mov" resides on the heap. Also you need to deallocate the allocated memory using free in this case.

For this example, it doesn't matter too much. If you have a lot of Date_t variables (in, say, an in-memory database), the first methow will lead to less memory usage over-all, with the gotcha that you should not, under any circumstances, change any of the characters in the strings, as all "Nov" strings would be "the same" string (a pointer to the same 4 chars).
So, to an extent, both variants are good, but the best one would depend on expected usage pattern(s).

Pointer initialization and string manipulation in C

I have this function which is called about 1000 times from main(). When i initialize a pointer in this function using malloc(), seg fault occurs, possibly because i did not free() it before leaving the function. Now, I tried free()ing the pointer before returning to main, but its of no use, eventually a seg fault occurs.
The above scenario being one thing, how do i initialize double pointers (**ptr) and pointer to array of pointers (*ptr[])?
Is there a way to copy a string ( which is a char array) into an array of char pointers.
char arr[]; (Lets say there are fifty such arrays)
char *ptr_arr[50]; Now i want point each such char arr[] in *ptr_arr[]
How do i initialize char *ptr_arr[] here?
What are the effects of uninitialized pointers in C?
Does strcpy() append the '\0' on its own or do we have to do it manually? How safe is strcpy() compared to strncpy()? Like wise with strcat() and strncat().
Thanks.

Segfault can be caused by many things. Do you check the pointer after the malloc (if it's NULL)? Step through the lines of the code to see exactly where does it happen (and ask a seperate question with more details and code)
You don't seem to understand the relation of pointers and arrays in C. First, a pointer to array of pointers is defined like type*** or type**[]. In practice, only twice-indirected pointers are useful. Still, you can have something like this, just dereference the pointer enough times and do the actual memory allocation.
This is messy. Should be a separate question.
They most likely crash your program, BUT this is undefined, so you can't be sure. They might have the address of an already used memory "slot", so there might be a bug you don't even notice.

From your question, my advice would be to google "pointers in C" and read some tutorials to get an understanding of what pointers are and how to use them - there's a lot that would need to be repeated in an SO answer to get you up to speed.
The top two hits are here and here.

It's hard to answer your first question without seeing some code -- Segmentation Faults are tricky to track down and seeing the code would be more straightforward.
Double pointers are not more special than single pointers as the concepts behind them are the same. For example...
char * c = malloc(4);
char **c = &c;
I'm not quite sure what c) is asking, but to answer your last question, uninitialized pointers have undefined action in C, ie. you shouldn't rely on any specific result happening.
EDIT: You seem to have added a question since I replied...
strcpy(..) will indeed copy the null terminator of the source string to the destination string.

for part 'a', maybe this helps:
void myfunction(void) {
int * p = (int *) malloc (sizeof(int));
free(p);
}
int main () {
int i;
for (i = 0; i < 1000; i++)
myfunction();
return 0;
}
Here's a nice introduction to pointers from Stanford.

A pointer is a special type of variable which holds the address or location of another variable. Pointers point to these locations by keeping a record of the spot at which they were stored. Pointers to variables are found by recording the address at which a variable is stored. It is always possible to find the address of a piece of storage in C using the special & operator. For instance: if location were a float type variable, it would be easy to find a pointer to it called location_ptr
float location;
float *location_ptr,*address;
location_ptr = &(location);
or
address = &(location);
The declarations of pointers look a little strange at first. The star * symbol which stands in front of the variable name is C's way of declaring that variable to be a pointer. The four lines above make two identical pointers to a floating point variable called location, one of them is called location_ptr and the other is called address. The point is that a pointer is just a place to keep a record of the address of a variable, so they are really the same thing.
A pointer is a bundle of information that has two parts. One part is the address of the beginning of the segment of memory that holds whatever is pointed to. The other part is the type of value that the pointer points to the beginning of. This tells the computer how much of the memory after the beginning to read and how to interpret it. Thus, if the pointer is of a type int, the segment of memory returned will be four bytes long (32 bits) and be interpreted as an integer. In the case of a function, the type is the type of value that the function will return, although the address is the address of the beginning of the function executable.
Also get more tutorial on C/C++ Programming on http://www.jnucode.blogspot.com

You've added an additional question about strcpy/strncpy.
strcpy is actually safer.
It copies a nul terminated string, and it adds the nul terminator to the copy. i.e. you get an exact duplicate of the original string.
strncpy on the other hand has two distinct behaviours:
if the source string is fewer than 'n' characters long, it acts just as strcpy, nul terminating the copy
if the source string is greater than or equal to 'n' characters long, then it simply stops copying when it gets to 'n', and leaves the string unterminated. It is therefore necessary to always nul-terminate the resulting string to be sure it's still valid:
char dest[123];
strncpy(dest, source, 123);
dest[122] = '\0';