Arrays and structures in C both store data in contiguous memory. Why, then, does C not allow direct copying of arrays using "=", whereas it is allowed for structures?
Example:
int a[3] = {1,2,3};
int b[3];
b = a; // why is this not allowed.
struct book b1, b2;
b1.page = 100;
b1.price = 10.0;
b2 = b1; // Why is this allowed
For the first question
You cannot assign to an array as a whole; you can only write to its individual elements.
You can use a for loop to copy a into b, or call memcpy(b, a, sizeof b);
With structs, the compiler does the equivalent of the memcpy for you.
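Both ways of copying an array can be sketched as small helpers. This is a minimal illustration, not the only way to do it; the names copy_loop and copy_bytes are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copy n ints one element at a time, as a plain for loop would. */
void copy_loop(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy n ints as raw bytes; dst and src must not overlap. */
void copy_bytes(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}
```

With the arrays from the question, either copy_loop(b, a, 3) or copy_bytes(b, a, 3) leaves b holding 1, 2, 3.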
Correct me if I am wrong.
When you write b = a, both operands are arrays. In this context a is converted to a pointer to the location where the first element of the array is stored (this is why printf("%d",*a); will print 1), while b, being an array, is not something that can be assigned to as a whole, so the assignment is rejected.
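One way to see that an array is not itself a pointer, even though it converts to one in most expressions, is to compare what sizeof reports. A small sketch, with hypothetical function names chosen for this example:

```c
#include <stddef.h>

static int a[3] = {1, 2, 3};

/* sizeof applied to the array itself yields the size of all three elements. */
size_t size_of_array(void)
{
    return sizeof a;
}

/* Once a has been converted to a pointer, only the pointer's size is seen. */
size_t size_of_decayed(void)
{
    int *p = a; /* a converts to &a[0] here */
    return sizeof p;
}

/* Dereferencing the converted pointer gives the first element. */
int first_element(void)
{
    return *a;
}
```

Here size_of_array() is 3 * sizeof(int), size_of_decayed() is sizeof(int *), and first_element() is 1.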
As for why structures can be assigned: b1 and b2 in the above example are simply variables of type struct book, and variables can be assigned. When one is assigned to the other, the contents are copied; the two do not refer to the same memory location. This example might explain what I am saying more clearly:
#include<stdio.h>
typedef struct{int a;}num;
int main()
{
num b,c;
b.a = 10;
c=b;
b.a =11;
printf("%d\n",(c.a));
return 0;
}
The output is 10. This shows that b and c in this example do not share the same memory. Hope this helps.
Assignment requires that the type and therefore size of whatever is being assigned is known to the compiler. So an assignment of form
a = b;
requires that the types of a and b are both known to the compiler. If the types are the same (e.g. both a and b are of type int) then the compiler can simply copy b into a by whatever instructions it deems are most efficient. If the types are different, but an implicit promotion or type conversion is allowed, then the assignment is also possible after doing a promotion. For example, if a is of type long and b is of type short, then b will be implicitly promoted to long and the result of that promotion stored in a.
This doesn't work for arrays, because the size of an array (calculated as the size of its elements multiplied by number of elements) is not necessarily known. One compilation unit (aka source file) may have a declaration (possibly by including a header file)
extern int a[];
extern int b[];
void some_func()
{
a = b;
}
which tells the compiler that a and b are arrays of int, but that they will be defined (which includes giving them a size) by another compilation unit. Another compilation unit may then do:
extern int a[];
int a[] = {3,1,4,2,3}; /* definition of a */
and a third compilation unit may similarly define b as an array of 27 elements.
Once the object files are linked into a single executable, the usages of a and b in all compilation units are associated, and all operations on them refer to the same definitions.
The problem with this comes about because the separate compilation model is a core feature of C. So the compiler, when chewing on the first compilation unit above, has no information about the size of the arrays since it has no visibility of other compilation units, and is required to succeed or diagnose errors without referring to them. Since there is no information about the number of elements in either array available to the first compilation unit, there is no way to work out how many elements to copy from one array to another. The handling of this in C is that the assignment a = b is a diagnosable error in the function some_func().
There are alternative approaches (and some other programming languages handle such cases differently) but they are generally associated with other trade-offs.
These considerations don't generally affect struct types, since their size is known at compile time. So, if a and b are of the same struct type, the assignment a = b is possible, and can be implemented by (say) a call of memcpy().
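Because the size of a struct type is known wherever the type is complete, assignment and a memcpy() of sizeof bytes have the same visible effect on the members. A minimal sketch, assuming a struct book with the page and price members used in the question; the two helper names are hypothetical:

```c
#include <string.h>

struct book {
    int page;
    double price;
};

/* Plain struct assignment: the compiler knows sizeof(struct book). */
void copy_by_assignment(struct book *dst, const struct book *src)
{
    *dst = *src;
}

/* The same copy spelled out by hand with memcpy. */
void copy_by_memcpy(struct book *dst, const struct book *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

How the compiler actually implements the assignment (inline moves, a memcpy call, or something else) is up to the implementation.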
Note: I am making some deliberate over-simplification in the explanation above, such as not considering the case of structs with flexible array members (from C99). Discussing such cases would make the discussion above more complicated, without changing the core considerations.
Related
Is the following program valid? (In the sense of being well-defined by the ISO C standard, not just happening to work on a particular compiler.)
struct foo {
int a, b, c;
};
int f(struct foo *p) {
// should return p->c
char *q = ((char *)p) + 2 * sizeof(int);
return *((int *)q);
}
It follows at least some of the rules for well-defined use of pointers:
The value being loaded, is of the same type that was stored at the address.
The provenance of the calculated pointer is valid, being derived from a valid pointer by adding an offset, that gives a pointer still within the original storage instance.
There is no mixing of element types within the struct, that would generate padding to make an element offset unpredictable.
But I'm still not sure it's valid to explicitly calculate and use element pointers that way.
C is a low-level programming language. This code is well-defined but probably not portable.
It is not portable because it makes assumptions about the layout of the struct. In particular, you might run into fields being 64-bit aligned on a 64-bit platform where int is 32 bits.
A better way of doing it is to use the offsetof macro.
The C standard allows there to be arbitrary padding between elements of a struct (but not at the beginning of one). Real-world compilers won’t insert padding into a struct like that one, but the DeathStation 9000 is allowed to. If you want to do that portably, use the offsetof() macro from <stddef.h>.
*(int*)((char*)p + offsetof(struct foo, c))
is guaranteed to work. A difference, such as offsetof(struct foo, c) - offsetof(struct foo, b), is also well-defined. (Although, since offsetof() returns an unsigned value, it’s defined to wrap around to a large unsigned number if the difference underflows.)
In practice, of course, use &p->c.
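The offsetof-based access can be sketched as a complete function. Note that in C the type has to be spelled struct foo unless a typedef exists; get_c is a hypothetical name used only for this example.

```c
#include <stddef.h>

struct foo {
    int a, b, c;
};

/* Return p->c by computing the member's address from its byte offset. */
int get_c(struct foo *p)
{
    char *q = (char *)p + offsetof(struct foo, c);
    return *(int *)q;
}
```

For a struct foo v = {1, 2, 3}, get_c(&v) yields the same value as v.c, whatever padding the implementation inserts.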
An expression like the one in your original question is guaranteed to work for array elements, however, so long as you do not overrun your buffer. You can also generate a pointer one past the end of an array and compare that pointer to a pointer within the array, but dereferencing such a pointer is undefined behavior.
I think it likely that at least some authors of the Standard intended to allow a compiler given something like:
struct foo { unsigned char a[4], b[4]; } x;
int test(int i)
{
x.b[0] = 1;
x.a[i] = 2;
return x.b[0];
}
to generate code that would always return 1 regardless of the value of i. On the flip side, I think it is extremely likely that nearly all of the Committee would have intended that a function like:
struct foo { char a[4], b[4]; } x;
void put_byte(int);
void test2(unsigned char *p, int sz)
{
for (int i=0; i<sz; i++)
put_byte(p[i]);
}
be capable of outputting all of the bytes in x in a single invocation.
Clang and gcc will assume that any construct which applies the [] operator to a struct or union member will only be used to access elements of that member array, but the Standard defines the behavior of arrayLValue[index] as equivalent to (*((arrayLValue)+index)), and would define the address of x.a's first element, converted to unsigned char*, as equivalent to the address of x, cast to that type. Thus, if code calls test2((unsigned char*)&x, sizeof x), the expression p[i] would be equivalent to x.a[i], which clang and gcc would only support for subscripts in the range 0 to 3.
The only way I see of reading the Standard as satisfying both viewpoints would be to treat support for even the latter construct as a "quality of implementation" issue outside the Standard's jurisdiction, on the assumption that quality implementations would support constructs like the latter with or without a mandate, and there was thus no need to write sufficiently detailed rules to distinguish those two scenarios.
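The whole-object byte access discussed above can be sketched as follows. This is only an illustration of the pattern, with the pointer derived from the whole object rather than from the member x.a; copy_object_bytes is a hypothetical helper, and the example takes no position on how the Standard's wording should be read.

```c
#include <stddef.h>

struct foo {
    unsigned char a[4], b[4];
};

/* Copy the sz bytes of any object into out, reading them as unsigned char. */
void copy_object_bytes(const void *obj, size_t sz, unsigned char *out)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < sz; i++)
        out[i] = p[i];
}
```

A caller would pass &x and sizeof x, then find the first byte of x.b at index offsetof(struct foo, b) of the output buffer.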
I want to have a variably sized struct, but I want to embed an instance of the struct with a certain size into another struct. Here's the idea:
struct grid {
size_t width, height;
int items[ /* width * height */ ];
};
struct grid_1x1 {
size_t width, height;
int items[1];
};
struct grid_holder {
struct grid_1x1 a, b;
};
int main(void)
{
struct grid_holder h = {
.a = { .width = 1, .height = 1, .items = { 0 } },
.b = { .width = 1, .height = 1, .items = { 0 } },
};
struct grid *a = (struct grid *)&h.a, *b = (struct grid *)&h.b;
}
If all my code assumes that the items member of struct grid has width * height elements, is it alright to cast a and b as I have above?
In other words, does a flexible array member with one element always have the same offset and size as a fixed-size array member with one element, given that the structs are otherwise identical? I'd like an answer based on the C99 standard. If the offsets might differ, is there another way to achieve my goal stated at the beginning?
No, it is not alright: the behavior is not defined by the C standard.
The rule in C 2018 6.5 paragraph 7 (or C 1999 6.5 paragraph 7) about which types may be used to access an object is not just about how the objects are laid out and represented. So the premise of the sentence in the question, “In other words, does a flexible array member with one element always have the same offset and size as a fixed-size array member with one element, given that the structs are otherwise identical?”, is incorrect. Having the same offset and size, even having identical structure definitions, does not make structures compatible for aliasing.
Different structures are different types deliberately. Consider these two types:
typedef struct { double real, imaginary; } Complex;
typedef struct { double x, y; } Coordinates;
These structures have identical definitions (except for the member names, but the following holds even if their names were identical), but they are different and incompatible types according to the C standard. This means that in a routine such as:
double foo(Complex *a, Coordinates *b)
{
a->real = 3; a->imaginary = 4;
b->x = 5; b->y = 6;
return sqrt(a->real*a->real + a->imaginary*a->imaginary);
}
the compiler is permitted to optimize the last statement to return 5; on the basis that b->x = 5; b->y = 6; cannot have changed a because a and b cannot be pointing to the same object, or, if they are, the behavior of b->x = 5; b->y = 6; is not defined.
So the C rules about aliasing are about compatible types plus various exceptions for particular cases. They are not primarily about how structures are laid out.
In contrast to the above example with different-but-identically-defined structures, when we have multiple pointers to the same structure type, the compiler cannot assume that a and b are not aliases (different names) for the same object. In:
double foo(Complex *a, Complex *b)
{
a->real = 3; a->imaginary = 4;
b->real = 5; b->imaginary = 6;
return sqrt(a->real*a->real + a->imaginary*a->imaginary);
}
the compiler cannot assume the return value is 5 because a and b may point to the same object, in which case b->real = 5; b->imaginary = 6; changes the contents of a.
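The same-type case can be checked directly. The sketch below is a variant of the function above that returns the squared magnitude instead of calling sqrt (so it needs no math library); norm_squared is a hypothetical name.

```c
typedef struct { double real, imaginary; } Complex;

/* Same stores as foo above, but returns the squared magnitude of *a. */
double norm_squared(Complex *a, Complex *b)
{
    a->real = 3; a->imaginary = 4;
    b->real = 5; b->imaginary = 6;
    return a->real * a->real + a->imaginary * a->imaginary;
}
```

Called with two distinct objects it returns 25; called with the same object for both parameters it returns 61, because the stores through b overwrite what was stored through a. Both calls are well-defined since a and b have the same type.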
There are two separate issues you need to worry about:
1. The Standard allows implementations to place arbitrary amounts of padding between structure members, provided only that the total amount of padding before any structure member is affected only by the types of that member and preceding members. For this purpose, arrays of different sizes are considered different types. At least in theory, some implementations targeting weird architectures might vary the padding before an array based upon its size. For example, on a platform where addresses identify 32-bit words but there are instructions to read and write 8-bit chunks within them, an implementation given struct x1 { long l; char a,b[4], c;}; could decide to pad the start of b so the whole thing fits in a single word, even if that same implementation given struct x2 { long l; char a,b[5], c;}; would not add such padding (since parts of b would be split between two words regardless). I'm unaware of any implementations that actually do such things, but the Committee would likely have expected that the only time such laxity would matter would be if compilers were being developed and used on such platforms, and in that case people working with such platforms would be better able than the Committee to judge the pros and cons of different padding approaches.
2. Although the Common Initial Sequence rule was by all indications intended to allow a pointer to one structure type to be used to inspect any part of a Common Initial Sequence of other structure types (such ability is documented in the 1974 C Reference Guide, and after unions were added to the language, compilers would have had to go out of their way to support such usage with unions without also supporting it with structure pointers), the authors of clang and gcc regard as broken any code that would rely upon such treatment, and actively refuse to support such code except by use of the -fno-strict-aliasing flag.
I'd regard the first issue as purely theoretical, but the second issue means that any code which attempts to use a pointer to access multiple separately-declared structure types needs the -fno-strict-aliasing option when building with gcc or clang. That shouldn't be a problem in itself, but anyone whose code might be built with clang or gcc needs to ensure that the people using those compilers are aware of the need for the -fno-strict-aliasing (i.e. "don't be obtuse") flag. So far as I can tell, compilers that are designed for paying customers support the constructs usefully even when using -fstrict-aliasing because supporting them is useful and not difficult, but the maintainers of gcc and clang are ideologically opposed to such support.
Does a language feature allow the compiler to check the type of a variable in memory, or is type checking based only on the keyword used for the variable type?
For example:
unsigned short n = 3;
int *p = &n;
Both int and short use 4 bytes in memory, but the compiler cannot implicitly convert from a short * to an int *. How does the compiler know that n isn't a valid address for p in this case?
Does a language feature allow the compiler to check the type of a variable in memory, or is type checking based only on the keyword used for the variable type?
This is a very confused question. The compiler is the thing that implements the language. Therefore a language feature might require the compiler to do certain things (in order to make the feature work), but it doesn't allow the compiler to do things.
A "variable in memory" is a run-time concept. The compiler is only involved at compile time: It translates code in some language (the source language) into another language (the target language, typically assembler code / machine code). It emits instructions that (when executed) reserve memory and use this memory to store values. But at run-time, when the program is actually executed, the compiler is no longer part of the picture, so it can't check anything.
In C, types are checked at compile time. The compiler knows the types of literals (e.g. 42 is an int and "hello" is a char [6]), and it knows the type of everything you declare (because it has to parse the declarations), including variables. Type checking and type conversion rules are unrelated to the sizes of types.
For example:
short int a = 42;
double b = a; // OK, even though commonly sizeof a == 2 and sizeof b == 8
On the other hand:
signed char c;
char *p = &c; // error, even though commonly char and signed char have the same
// size, representation, and range of possible values
It is perfectly possible to type-check C without actually generating any code.
Every expression has a type, ultimately derived from the types of the variables and literals that appear in it. The type of &n is unsigned short*, and that cannot be used to initialize a variable of type int*. This has nothing to do with examining memory, so it works regardless of context other than the variable types.
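That the type of an expression is fixed at compile time can be made visible with C11's _Generic, which selects a branch based purely on the static type of its operand. This is a sketch that assumes a C11 compiler; the enum, the KIND_OF macro, and the function names are all invented for the example.

```c
enum kind { KIND_INT, KIND_USHORT, KIND_INT_PTR, KIND_USHORT_PTR, KIND_OTHER };

/* Chosen by the compiler from the operand's type; no memory is inspected. */
#define KIND_OF(x) _Generic((x),           \
    int:              KIND_INT,            \
    unsigned short:   KIND_USHORT,         \
    int *:            KIND_INT_PTR,        \
    unsigned short *: KIND_USHORT_PTR,     \
    default:          KIND_OTHER)

enum kind kind_of_n(void)
{
    unsigned short n = 3;
    return KIND_OF(n);   /* n has type unsigned short */
}

enum kind kind_of_address_of_n(void)
{
    unsigned short n = 3;
    return KIND_OF(&n);  /* &n has type unsigned short *, not int * */
}

enum kind kind_of_literal(void)
{
    return KIND_OF(42);  /* the literal 42 has type int */
}
```

The value stored in n plays no part in any of these results; only the declared types do.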
I want to know what is the difference between
int a;
and
struct node{
int a;
};
struct node b;
Are they the same?
No, they are not the same. The operations the language permits on a and b are different because their types are different. Even when they contain the same bit patterns, the interpretation of those bit patterns may differ.
The compiler may take a different path to return an int versus a struct node from a function, or when passing them as function arguments.
Essentially, a and b.a behave the same way. The compiler may sometimes choose to optimize a single-member struct by treating it as the type of its member.
Differences
1. Compile time: Type of a and type of b are different
2. Compile time: a = 42 (OK), b = 42 (Error)
3. Run time: Compiler may choose to use different strategies while copying a and b to a different variable of same type.
4. Section 6.7.2.1 of the C99 standard says "There may be unnamed padding within a structure object, but not at its beginning.", which means sizeof a is allowed to differ from sizeof b.
Extra notes
Single-member structs are almost never required, except when:
1. Other members are conditionally compiled out (to keep the code manageable).
2. You plan to pass an array by value as a function parameter, return an array from a function, or copy an array using the assignment operator.
3. You want to restrict operations. [For example, you don't want an employee ID to be added, subtracted, etc., but assignment is OK.]
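The array case in the list above can be sketched like this: wrapping the array in a single-member struct lets it be returned, passed by value, and assigned. The type struct int3 and the function names are hypothetical.

```c
/* A fixed-size array wrapped in a struct so it can be copied as a unit. */
struct int3 {
    int v[3];
};

/* Returning the struct returns a copy of the whole array. */
struct int3 make_sequence(int start)
{
    struct int3 r = { { start, start + 1, start + 2 } };
    return r;
}

/* The parameter is a copy; changes to s would not affect the caller. */
int sum3(struct int3 s)
{
    return s.v[0] + s.v[1] + s.v[2];
}
```

After struct int3 a = make_sequence(1); struct int3 b; b = a; the two objects are independent copies, so a later write to a.v[0] leaves b.v[0] unchanged.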
No, they are not the same.
int a; --> a variable named a of type int.
struct node b; --> a variable named b of type struct node.
Here the struct contains only int a, but that's not always the case. These two variables are
indeed of different data types:
their representations are different.
their access methods are different.
In your example both hold just a single int. But a structure is mainly used to access more than one variable through a common variable name.
struct node {
int a;
float b;
char c;
};
struct node b;
So now, using the variable b, we can access members of three different data types. This is the main advantage of a structure.
A structure member is accessed like a normal variable: b.a;
If b were instead a pointer to the structure, this would be b->a. Note that you have to allocate memory for the pointer to point to.
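The pointer form, including the allocation it needs, might look like the following sketch. node_new is a hypothetical helper; the caller is responsible for calling free() on the result.

```c
#include <stdlib.h>

struct node {
    int a;
    float b;
    char c;
};

/* Allocate a node on the heap and fill it in through the pointer with ->.
   Returns NULL if the allocation fails. */
struct node *node_new(int a, float b, char c)
{
    struct node *p = malloc(sizeof *p);
    if (p != NULL) {
        p->a = a;
        p->b = b;
        p->c = c;
    }
    return p;
}
```

With struct node *p = node_new(1, 2.5f, 'x'); the members are then read as p->a, p->b and p->c, whereas a plain struct node variable uses the dot form.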
int array[5][3];
(obviously) creates a multi-dimensional C array of 5 by 3. However,
int x = 5;
int array[x][3];
does not. I've always thought it would. What don't I understand about C arrays? If they only allow a constant to define the length of a C array, is there a way to get around this in some way?
In ANSI C (aka C89), all array dimensions must be compile-time integer constants (this excludes variables declared as const). The one exception is that the first array dimension can be written as an empty set of brackets in certain contexts, such as function parameters, extern declarations, and initializations. For example:
// The first parameter is a pointer to an array of char with 5 columns and an
// unknown number of rows. It's equivalent to 'char (*array_param)[5]', i.e.
// "pointer to array 5 of char" (this only applies to function parameters).
void some_function(char array_param[][5])
{
array_param[2][3] = 'c'; // Accesses the char at offset 2*5 + 3 from the start
}
// Declare a global 2D array with 5 columns and an unknown number of rows
extern char global_array[][5];
// Declare a 3x2 array. The first dimension is determined by the number of
// initializer elements
int my_array[][2] = {{1, 2}, {3, 4}, {5, 6}};
C99 added a new feature called variable-length arrays (VLAs), where array dimensions are allowed to be non-constant, but only for arrays declared on the stack (i.e. those with automatic storage). Global arrays (i.e. those with static storage) cannot be VLAs. For example:
void some_function(int x)
{
// Declare VLA on the stack with x rows and 5 columns. If the allocation
// fails because there's not enough stack space, the behavior is undefined.
// You'll probably crash with a segmentation fault/access violation, but
// when and where could be unpredictable.
int my_vla[x][5];
}
Note that C11 makes VLAs optional. Objective-C is based on C99 and supports VLAs. C++ does not have VLAs, although many C/C++ compilers such as g++ which support VLAs in their C implementation also support VLAs in C++ as an extension.
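If VLAs are unavailable or the array may be too large for the stack, one alternative is to allocate the rows on the heap through a pointer to a fixed-size row, which keeps the usual array[i][j] syntax. A minimal sketch; the typedef row3 and the function alloc_rows are hypothetical names.

```c
#include <stdlib.h>

/* One row of the table: the column count is a compile-time constant. */
typedef int row3[3];

/* Allocate `rows` rows of 3 ints each. Returns NULL on failure
   (and possibly for rows == 0); the caller must free() the result. */
row3 *alloc_rows(size_t rows)
{
    return malloc(rows * sizeof(row3));
}
```

After row3 *array = alloc_rows(x); a successful allocation can be indexed as array[i][j] for i below x and j below 3. Unlike a failed VLA allocation, a failed malloc can be detected by checking for NULL.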
int x = 5;
int array[x][3];
Yes, it does. It's a C99 variable length array. Be sure to switch to C99 mode and be sure to have array declared at block or function scope. Variable length arrays cannot be declared at file scope.
Try:
const int x=5;
int array[x][3];
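As you said, x has to be a constant; otherwise, think about what would happen if you changed the value of x in the middle of the program: what would the dimension of the array be then?
By declaring it const, you get a compile error if you try to change the value of x. (Note that in C, unlike C++, a const int is still not a constant expression, so in C89 this declaration is still rejected and in C99 the array is still a variable-length array.)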
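For a dimension that is a genuine compile-time constant in every C dialect, an enum constant (or a #define) can be used instead. A minimal sketch; ROWS and row_count are hypothetical names.

```c
#include <stddef.h>

/* An enum constant is an integer constant expression, so it is allowed as an
   array dimension even at file scope and even in C89. */
enum { ROWS = 5 };

static int array[ROWS][3];

/* Number of rows, computed by the compiler from the array type. */
size_t row_count(void)
{
    return sizeof array / sizeof array[0];
}
```

Here row_count() is 5, and the array is an ordinary fixed-size array rather than a VLA.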