This might be a bit of a long question. I was testing some character arrays in C and came across this code.
char t[10];
strcpy(t, "abcd");
printf("%d\n", strlen(&t[5]));
printf("Length: %d\n", strlen(t));
Now apparently strlen(&t[5]) yields 3 while strlen(t) returns 4.
I know that string length is 4, this is obvious from inserting four characters. But why does strlen(&t[5]) return 3?
My guess is that
String: a | b | c | d | 0 | 0 | 0 | 0 | 0 | \0
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
strlen(&t[5]) looks at the length of a string composed of positions 6, 7 and 8 (because the 10th character is a NULL terminating character, right)?
OK, then I did some experimentation and modified the code a bit.
char t[10];
strcpy(t, "abcdefghij");
printf("%d\n", strlen(&t[5]));
printf("Length: %d\n", strlen(t));
Now this time strlen(&t[5]) yields 5 while strlen(t) is 10, as expected. If I understand character arrays correctly, the state should now be
String: a | b | c | d | e | f | g | h | i | j | '\0'
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
so why does strlen(&t[5]) return 5 this time? I've declared a character array of length 10, so shouldn't the result, by the same logic applied above, be 4?
Also, shouldn't I be running into some compiler errors, since the NULL terminating character is actually in the 11th spot? I'm new to C and would very much appreciate anyone's help.
First let me tell you, your "assumption"
String: a | b | c | d | 0 | 0 | 0 | 0 | 0 | \0
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
is not correct. Based on your code, the values are only "guaranteed" up to index 4, not beyond that.
For the first case, in your code
printf("%d\n", strlen(&t[5]));
is wrong for two reasons:
you ought to use %zu for a size_t type.
&t[5] does not point to a valid string.
Either (or both) of the above causes undefined behavior, so no output can be justified.
To elaborate, with a definition like
char t[10];
strcpy(t, "abcd");
you have index 0 to 3 populated for t, and index 4 holds the null-terminator. The content of t[5] onward is indeterminate.
Thus, &t[5] is not a pointer to the first element of a string, so it cannot be used as an argument to strlen().
It may run out of bounds in search of the null-terminator, perform an invalid memory access and, as a side effect, produce a segmentation fault.
It may find a null-terminator (just another garbage value) within bounds and report a "seemingly" valid length.
Both are equally likely and unlikely, really. UB is UB; there's no justifying it.
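As a minimal sketch of how the first experiment can be made well defined, the buffer can be zero-initialized so that every element past the copied string is a known '\0'. The function name tail_length_zeroed is made up for this illustration.

```c
#include <string.h>

/* Returns strlen(&t[5]) for a zero-initialized buffer that holds "abcd". */
size_t tail_length_zeroed(void)
{
    char t[10] = {0};      /* all ten elements start out as '\0' */
    strcpy(t, "abcd");     /* writes indices 0..4 (four letters plus '\0') */
    return strlen(&t[5]);  /* t[5] is '\0', so this is a valid empty string */
}
```

Printed with the matching conversion, printf("%zu\n", tail_length_zeroed()); outputs 0, because t[5] is itself a terminator.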
Then, for the second case, where you say
char t[10];
strcpy(t, "abcdefghij");
is, once again, accessing memory out of bounds.
You have altogether 10 array elements to store a string, so you can have 9 char elements plus one null-terminator (to qualify the char array as a string).
However, you're attempting to store 10 char elements plus a null character (in strcpy()), so you're off by one, accessing out-of-bounds memory and invoking UB.
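One possible way to avoid the off-by-one is to let snprintf() enforce the destination size. The sketch below is only an illustration; copy_bounded is a hypothetical helper name, not something from the question.

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst, truncating if necessary so dst is always terminated.
   Returns 1 if the whole string fit, 0 if it was truncated or an error occurred. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    /* snprintf returns the length it wanted to write, excluding the '\0' */
    int n = snprintf(dst, dstsize, "%s", src);
    return n >= 0 && (size_t)n < dstsize;
}
```

With char t[10], copy_bounded(t, sizeof t, "abcdefghij") returns 0 and leaves the 9-character string "abcdefghi" in t; with an 11-element buffer the whole string fits.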
char t[10]; is not initialized so it just contains garbage values 1). strcpy(t, "abcd"); overwrites the first 5 characters with the string "abcd" and a null terminator.
However, &t[5] points at the first character after the null termination, which remains garbage. If you invoke strlen from there, anything can happen, since the pointer passed is not likely pointing at a null terminated string.
1) Garbage = indeterminate values. Assuming a sane 2's complement system, the address of the buffer t is taken, so the code does not invoke undefined behavior until the point where strlen starts reading outside the bounds of the array t. Reference.
Problem 1:
My guess is that
String: a | b | c | d | 0 | 0 | 0 | 0 | 0 | \0
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
This assumption is wrong.
The array is not initialized to hold 0 values but contains some "random" garbage.
After copying "abcd" the upper half of the array (t[5] etc.) is still untouched resulting in a "random" length of the string due to undefined behaviour.
Problem 2:
If I understand character arrays correctly, the state should now be
String: a | b | c | d | e | f | g | h | i | j | '\0'
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Again wrong.
Your array only holds 10 characters. They are at index 0..9. Index 10 is out of bounds.
Your copy operation might result in this layout or it might as well just crash while writing out of bounds.
But this is not checked by the compiler. If you run into problems then it will be during runtime.
I'm piddling around with an Arduino and I have next to no programming experience in C.
In looking through some example code I came across this array variable declaration:
byte myArray[][6] = {"0"};
I get that this is declaring an array with unspecified rows and 6 columns.
What I don't understand is the {"0"}.
Upon the execution of this line of code, what will this variable contain?
Thanks!
The expression will initialize an array that looks like this:
                  myArray[0][0]
                    ^
                    |   +----> myArray[0][1]
                    |   |
                  +---+----+---+---+---+---+
myArray[0] -----> |'0'|'\0'|   |   |   |   |
                  +---+----+---+---+---+---+
As you don't specify the first dimension and you only initialize one row, it defaults to byte myArray[1][6].
If you were to initialize your array with, for instance:
byte myArray[][6] = {"0", "1"};
Then it would be:
                  myArray[0][0]
                    ^
                    |   +----> myArray[0][1]
                    |   |
                  +---+----+---+---+---+---+
myArray[0] -----> |'0'|'\0'|   |   |   |   |
                  +---+----+---+---+---+---+
myArray[1] -----> |'1'|'\0'|   |   |   |   |
                  +---+----+---+---+---+---+
                    ^   |
                    |   |
        myArray[1][0]   |
                        +--->myArray[1][1]
In this case, because you initialize 2 lines, it defaults to byte myArray[2][6].
The string literal "0" is equivalent to the compound literal (char[]){ '0', '\0' }. So the declaration is equivalent to:
byte myArray[][6] = { { '0', '\0' } };
So the resulting array will be one row that contains an ASCII 0 (or a 0 appropriate to whatever the target character set is) followed by 5 \0 or NUL bytes.
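The claims above can be checked with a small sketch. It assumes byte is an alias for unsigned char (as on Arduino); the helper names row_count and row0_is_zero_padded are invented for this example.

```c
#include <stddef.h>

typedef unsigned char byte;   /* assumption: mirrors Arduino's byte type */

static byte myArray[][6] = {"0"};

/* Number of rows the compiler deduced from the initializer. */
size_t row_count(void)
{
    return sizeof myArray / sizeof myArray[0];
}

/* Returns 1 if row 0 is the character '0' followed by five zero bytes. */
int row0_is_zero_padded(void)
{
    if (myArray[0][0] != '0')
        return 0;
    for (size_t i = 1; i < 6; i++)
        if (myArray[0][i] != 0)
            return 0;
    return 1;
}
```

Here row_count() yields 1 and row0_is_zero_padded() yields 1, since elements without an explicit initializer are set to zero.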
I want to store a program state in a file. So I have a mmapped file that I perform operations on and then save it and maybe use it later.
This is fine for simple things but if I want a long lived data structure that requires dynamic memory allocation, I need a memory allocator that I can force to allocate within the pages I have mmapped.
I'm fairly certain I can't do this with the standard C malloc. I've looked at jemalloc, but I couldn't see anything there that does it. I don't know if I'm going the wrong way with this, but is there any way to specify the location/size of the heap before it is used?
For something like this you don't really want dynamic memory allocation. What you want instead is an array which uses an index value of the pointed to element instead of an actual pointer.
Suppose you wanted to implement a binary tree. You can model it as follows:
struct tree {
int free;
int value;
int left;
int right;
};
The left and right fields contain the indexes of the nodes to the left and to the right of the given node, with the value -1 indicating no such node (i.e. it is equivalent to a NULL pointer in this context).
The free field can be used as a flag to determine whether a given element of the array is currently in use. If a node is marked with free equal to 1, the left field points to the next free node, making it easy to find free nodes.
Node 0 is special in that it is the start of the free list, and the right field points to the root node of the tree.
Then the following tree:
       7
     /   \
    3     10
   / \   /  \
  1   4 8    12
Can be modeled as follows:
     free   value   left   right
   -------------------------------
 0 |  1   |   0   |  8   |   1   |
   -------------------------------
 1 |  0   |   7   |  2   |   3   |
   -------------------------------
 2 |  0   |   3   |  4   |   5   |
   -------------------------------
 3 |  0   |  10   |  6   |   7   |
   -------------------------------
 4 |  0   |   1   |  -1  |  -1   |
   -------------------------------
 5 |  0   |   4   |  -1  |  -1   |
   -------------------------------
 6 |  0   |   8   |  -1  |  -1   |
   -------------------------------
 7 |  0   |  12   |  -1  |  -1   |
   -------------------------------
 8 |  1   |   0   |  9   |  -1   |
   -------------------------------
 9 |  1   |   0   |  -1  |  -1   |
   -------------------------------
Such a tree can either be memory-mapped, or kept in memory using malloc / realloc to manage the size.
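A rough sketch of how the index-based layout could be used follows. It assumes the tree is ordered as a binary search tree (the example tree happens to be one) and that the free list ends with -1; tree_find and tree_alloc are hypothetical helper names.

```c
#include <stddef.h>

struct tree {
    int free;
    int value;
    int left;
    int right;
};

/* Looks up value, assuming binary-search-tree ordering.
   Returns the index of the matching node, or -1 if there is none. */
int tree_find(const struct tree *nodes, int value)
{
    int i = nodes[0].right;              /* node 0's right field is the root */
    while (i != -1) {
        if (value == nodes[i].value)
            return i;
        i = (value < nodes[i].value) ? nodes[i].left : nodes[i].right;
    }
    return -1;
}

/* Takes one node off the free list and marks it as in use.
   Returns its index, or -1 if no free node is left. */
int tree_alloc(struct tree *nodes)
{
    int i = nodes[0].left;               /* node 0's left field heads the free list */
    if (i == -1)
        return -1;
    nodes[0].left = nodes[i].left;       /* unlink the node from the free list */
    nodes[i].free = 0;
    nodes[i].left = -1;
    nodes[i].right = -1;
    return i;
}
```

Because every link is an index rather than an address, the array stays valid no matter where the file ends up being mapped.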
If your data structure holds any kind of string, you'll want your structure to contain fixed size character arrays instead of pointers so that they serialize correctly.
#include <stdio.h>
#include <stdint.h>
typedef struct s{
uint8_t a[1];
uint16_t b;
uint8_t c[1];
}s_t;
int main()
{
printf("size = %zu", sizeof(s_t));
return 0;
}
I am not sure why the output of this program is 6 bytes and not 5. Why does the compiler pad an extra byte after the last member? It also seems that if you make the last member an array of length 3, the padding makes the size 8. I am unable to explain this, since it does not happen when the struct contains only the two arrays.
Here is an illustration of the alignment that the compiler generates:
Bytes:
+-----+---------------+
| 0 | a[1] |
+-----+---------------+
| 1 | N/A (padding) |
+-----+---------------+
| 2 | b |
+-----+---------------+
| 3 | b |
+-----+---------------+
| 4 | c |
+-----+---------------+
As 16-bit quantities:
+---+------+----+
| 0 | a[1] | |
+---+------+----+
| 2 | b |
+---+------+----+
| 4 | c | |
+---+------+----+
Processors like to fetch 16-bit quantities from even addresses.
When they are on odd addresses, the computer may have to make 2 16-bit fetches, and extract the unaligned data out of them.
The easy method to eliminate this extra fetch is to add padding bytes so that 16-bit quantities align to even addresses.
A rule of thumb is to place the larger items first, then the smaller.
Applying this rule:
+---+------+
| 0 | b |
+---+------+
| 2 | a[1] |
+---+------+
| 3 | c |
+---+------+
The rule eliminates the need for an extra padding byte.
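The trailing byte asked about in the question follows from the same requirement: the size of a struct is rounded up to a multiple of its strictest member alignment, so that b stays aligned in every element of an array of s_t. The sketch below measures both kinds of padding with offsetof; the type s_reordered_t and the function names are made up for this illustration, and the actual numbers are implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the question. */
typedef struct {
    uint8_t  a[1];
    uint16_t b;
    uint8_t  c[1];
} s_t;

/* Same members, largest first. */
typedef struct {
    uint16_t b;
    uint8_t  a[1];
    uint8_t  c[1];
} s_reordered_t;

/* Padding bytes the compiler inserted between a and b. */
size_t padding_before_b(void)
{
    return offsetof(s_t, b) - (offsetof(s_t, a) + sizeof(((s_t *)0)->a));
}

/* Padding bytes the compiler appended after the last member c. */
size_t padding_after_c(void)
{
    return sizeof(s_t) - (offsetof(s_t, c) + sizeof(((s_t *)0)->c));
}
```

On a typical platform where uint16_t needs 2-byte alignment, both functions return 1 (hence the size of 6), while sizeof(s_reordered_t) is 4.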
Let's say we have this:
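#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    uint32_t* value = malloc(sizeof(uint32_t));
    uint32_t array[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    if (value == NULL) return 1;
    /* step 8 bytes (not 8 elements) past the start, then read a uint32_t */
    *value = *(uint32_t*)((char*)array + 8);
    printf("Value is: %" PRIu32 "\n", *value);
    free(value);
    return 0;
}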
The value in this case would be 3.
Why exactly is that?
If we cast a uint32_t pointer to char *, does that mean one char counts as 4 bytes of the uint32_t, and therefore
array[9] = {0, 4, !!8!!, 12, 16, 20, 24, 28, 32};
Could someone try to explain this?
When you initialize an array, each initializer sets an element of the array regardless of how many bytes each element takes up.
Your machine is probably using little-endian byte ordering. That means that array looks like this in memory:
-----------------------------------------------------------------
| 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | ...
-----------------------------------------------------------------
|      [0]      |      [1]      |      [2]      |      [3]      | ...
Each value of type uint32_t is 4 bytes long with the least significant byte first.
When you do (char*)array that casts array (converted to a pointer) to a char *, so any pointer arithmetic on a char * increases the address by the size of a char, which is 1.
So (char*)array + 8 points here:
(char*)array + 8 ------------------
                                  v
-----------------------------------------------------------------
| 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | ...
-----------------------------------------------------------------
|      [0]      |      [1]      |      [2]      |      [3]      | ...
That pointer is then converted to a uint32_t * and dereferenced, so it reads the value 3.
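The byte-offset arithmetic can be sketched as a small helper. This version copies the bytes out with memcpy instead of dereferencing a cast pointer, which is a swapped-in technique that also works when the offset is not a multiple of 4; read_u32_at is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Reads the uint32_t stored byte_offset bytes past base. */
uint32_t read_u32_at(const void *base, size_t byte_offset)
{
    uint32_t v;
    /* char pointer arithmetic moves in units of one byte */
    memcpy(&v, (const char *)base + byte_offset, sizeof v);
    return v;
}
```

For the array in the question, read_u32_at(array, 8) returns 3, the same as array[8 / sizeof(uint32_t)], regardless of byte order.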
The array you created, array[9], takes 36 bytes (9 elements of 4 bytes each). How the bytes of each value, such as 3, are laid out within an element varies with the platform's byte order.
After you cast it to char *, the same memory is seen as a sequence of individual bytes.
Now if you add 8, the pointer moves to byte position 8, that is, just past the bytes of the value 2. That is because (char*)array + 8 is treated as type + 8. Here the type is char, so it moves only 8 bytes.
Then the pointer to byte 8 is cast back to uint32_t * and the first value there is stored in *value. So the result is 3.
#include <stdio.h>
void main()
{
int arr[3][2]={2,3,4,5,6,7};
printf("%d\n",arr);
printf("%d\n",arr[1]);
printf("%d",arr[1][2]);
}
The above code when compiled in Borland Turbo C++ gives the output
8682
8686
6
I don't understand how this program works. I understand that while printing arr it returns the base address as 8682 and arr[1] returns next address location 8686 (integer is 4 bytes) but why is arr[1][2] not flashing an error as arr[1][2] is out of bounds?
Strictly speaking, it's undefined behavior. However, if you look at how array indices are treated, you will see why it's working.
arr
|
v
+-----+-----+-----+-----+-----+-----+
|  2  |  3  |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+
arr[0]
|
v
+-----+-----+-----+-----+-----+-----+
|  2  |  3  |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+
            arr[1]
            |
            v
+-----+-----+-----+-----+-----+-----+
|  2  |  3  |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+
            arr[1][0]
            |
            v
+-----+-----+-----+-----+-----+-----+
|  2  |  3  |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+
                        arr[1][2]
                        |
                        v
+-----+-----+-----+-----+-----+-----+
|  2  |  3  |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+
why arr[1][2] is not flashing an error as arr[1][2] is out of bound
If my memory serves me right, C won't complain about array out-of-bounds errors. It will instead simply allow you to go out of bounds.
In C, errors are not thrown for out of bounds array accesses - in fact it doesn't even make any check! Instead the system will just access whatever happens to be at that spot in memory (one of the major dangers of C programs).
Here's an explanation of what is happening here. Take this code here:
arr[1][2]
What happens internally actually looks like this:
*((int *)arr + 1 * 2 + 2) // the first 2 is the second dimension size of the array; the offset is 4
The 2D array internally is stored as a one-dimensional array with each row coming after the previous row:
int arr[3][2] = {2, 3, 4, 5, 6, 7};
// is laid out in memory the same as
int arr[6] = {2, 3, 4, 5, 6, 7};
The math from arr[1][2] works out to accessing index 4 of that flat array (1 * 2 + 2 = 4), which is why you get 6 as the value.
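The offset calculation, together with the bounds check that C does not perform for you, might be sketched like this. Both function names are invented for the example, and the dimensions 3 and 2 are hard-coded to match the array in the question.

```c
#include <stddef.h>

/* Row-major offset of element [row][col] in an array with ncols columns. */
size_t flat_index(size_t row, size_t col, size_t ncols)
{
    return row * ncols + col;
}

/* Bounds-checked read from a 3x2 array.
   Returns 1 and stores the element in *out, or returns 0 for a bad index. */
int get_checked(int arr[3][2], size_t row, size_t col, int *out)
{
    if (row >= 3 || col >= 2)
        return 0;
    *out = arr[row][col];
    return 1;
}
```

flat_index(1, 2, 2) and flat_index(2, 0, 2) both give 4, which shows that the out-of-bounds arr[1][2] happens to land on the valid element arr[2][0]; get_checked rejects the former and returns 6 for the latter.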