Unexpected results when using memcpy - c

Hi, I came across this simple C program, but I can't understand how it works:
#include <string.h>
#include <stdio.h>
char *a = "\0hey\0\0"; /* 6 */
char *b = "word\0up yo"; /* 10 */
char *c = "\0\0\0\0"; /* 4 */
int main(void)
{
char z[20];
char *zp = z;
memcpy(zp, a, strlen(a)+1);
memcpy(zp, b, strlen(b)+1);
memcpy(zp, c, strlen(c)+1);
/* now z contains all 20 bytes, including 8 NULLs */
int i;
for(i = 0; i < 20; i++){
if (z[i] == 0){
printf("\\0");
}
printf("%c", z[i]);}
return 0;
}
I was expecting the output of printing z to be:
\0hey\0\0\0word\0up yo\0\0\0
But instead I am getting:
\0ord\0\0\0\0\0\0\0\0\0\0\0\0???Z
Finally, when I print a instead of z, I get the right output.
Can anyone explain to me why this happens? Thanks in advance.
EDIT: How could I concatenate such strings?

Strings in C are zero-terminated, and the functions in the standard C library assume this property. In particular, strlen returns the number of characters that precede the first zero byte. In your example, strlen(a) is 0, because the very first character of a is zero.
The code will have the following effect:
memcpy(zp, a, strlen(a)+1);
Now zp contains \0: strlen(a) is 0, so exactly 1 byte is copied.
memcpy(zp, b, strlen(b)+1);
Now zp contains word\0: five characters copied.
memcpy(zp, c, strlen(c)+1);
Now just the first character of zp is overwritten, so it contains \0ord\0.
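To make the three steps above easy to check, here is a small sketch that replays them on a 20-byte buffer. The function name replay and the '?' pre-fill are my own additions for illustration (the original z is uninitialized, which is why the question's output ends in garbage).

```c
#include <string.h>

/* Replays the three memcpy calls from the question on a 20-byte buffer.
 * The buffer is pre-filled with '?' (an assumption made for illustration)
 * so that bytes no call ever touches are easy to spot. */
void replay(char z[20])
{
    const char *a = "\0hey\0\0";
    const char *b = "word\0up yo";
    const char *c = "\0\0\0\0";

    memset(z, '?', 20);
    memcpy(z, a, strlen(a) + 1);  /* strlen(a) == 0: copies 1 byte,  z = \0         */
    memcpy(z, b, strlen(b) + 1);  /* strlen(b) == 4: copies 5 bytes, z = w o r d \0 */
    memcpy(z, c, strlen(c) + 1);  /* strlen(c) == 0: copies 1 byte,  z = \0 o r d \0 */
}
```

After the call, only the first five bytes of z have been written; everything from z[5] onward still holds the fill character.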
> Finally, when I print a instead of z I get the right output. Can anyone explain to me why this happens?
That's because a, b, and c happen to be laid out one after another in memory. When you print "20 bytes starting from the start of a", you're actually reading memory past the last byte of a, and that memory happens to contain b. The same goes for b and c. Note that this layout is by no means guaranteed: reading past the end of the object a char * points to is undefined behaviour.
> How could I concatenate such strings?
In general, there is no way to find the length of such "strings" at run time. I would not call them strings at all, since "string" has a specific meaning in C: it refers to a zero-terminated sequence of characters, while yours are simply regions of memory.
However, since you know the sizes at compile time, you can use them. To avoid magic numbers in the code, use char arrays instead of char pointers, because then you can use the sizeof operator. Note, however, that every string literal in C is implicitly zero-terminated, so sizeof counts one extra byte! To fit the result in the 20-byte buffer, you'll want to use sizeof(x) - 1:
char a[] = "\0hey\0\0"; /* 6 */
char b[] = "word\0up yo"; /* 10 */
char c[] = "\0\0\0\0"; /* 4 */
memcpy(zp, a, sizeof(a) - 1);
zp += sizeof(a) - 1;
memcpy(zp, b, sizeof(b) - 1);
zp += sizeof(b) - 1;
memcpy(zp, c, sizeof(c) - 1);
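If you want the same idea with a bounds check, one possible sketch is below. The helpers append_bytes and build are hypothetical names of mine, not part of any library; the point is only that each region carries an explicit length and the destination capacity is checked before every copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends n bytes from src at offset *pos of a buffer
 * with capacity cap. Returns 0 on success, -1 if the bytes would not fit. */
int append_bytes(char *dst, size_t cap, size_t *pos, const void *src, size_t n)
{
    if (*pos > cap || n > cap - *pos)
        return -1;
    memcpy(dst + *pos, src, n);
    *pos += n;
    return 0;
}

/* Concatenates the three regions from the question into z.
 * Returns the number of bytes written, or 0 if z is too small. */
size_t build(char *z, size_t cap)
{
    static const char a[] = "\0hey\0\0";    /* 6 bytes + implicit terminator */
    static const char b[] = "word\0up yo";  /* 10 bytes + implicit terminator */
    static const char c[] = "\0\0\0\0";     /* 4 bytes + implicit terminator */
    size_t pos = 0;

    /* sizeof x - 1 drops the implicit terminator of each literal. */
    if (append_bytes(z, cap, &pos, a, sizeof a - 1) != 0 ||
        append_bytes(z, cap, &pos, b, sizeof b - 1) != 0 ||
        append_bytes(z, cap, &pos, c, sizeof c - 1) != 0)
        return 0;
    return pos;
}
```

With a 20-byte buffer, build fills it exactly; with anything smaller it reports failure instead of writing out of bounds.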

Related

Do arrays end with NULL in C programming?

I am a beginner in C and I was asked to calculate the size of an array without using the sizeof operator. So I tried out this code, but it only works for an odd number of elements. Do all arrays end with NULL, just like strings?
#include <stdio.h>
void main()
{
int a[] = {1,2,3,4,5,6,7,8,9};
int size = 0;
for (int i = 0; a[i] != '\0'; i++)
{
size++;
}
printf("size=%d\n", size);
}
No, in general there is no default sentinel value for arrays.
As a special case, a char array whose contents end with a null terminator (value 0) holds a string. That is a convention for character data only, not a property of arrays in general.
> So I tried out this code, but it only works for odd number of elements.
Try your code with this array -
int a[] = {1,2,0,4,5,6,7,8,9}; /* 3 replaced with 0 */
and you will find that the output is size=2. Why?
Because of the for loop condition, a[i] != '\0'.
So what happens when the loop condition a[i] != '\0' is evaluated?
'\0' is an integer character constant; its type is int and its value is 0. When a[i] is 0, the condition becomes false and the loop exits.
In your program, no element of a has the value 0, so the condition is true for every element. The loop keeps iterating, your program ends up accessing the array beyond its bounds, and that is undefined behaviour.
> Do all arrays end with NULL just like string.
The answer is no. In C, neither arrays nor strings end with NULL. A string is a one-dimensional array of characters terminated by, and including, the first null character '\0'.
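If you do want an int array to carry its own terminator, you have to add one yourself and pick a value that can never occur as real data. A minimal sketch, assuming -1 is such a value (END_MARK and count_until_mark are names I made up for illustration):

```c
#include <stddef.h>

#define END_MARK (-1)  /* assumed sentinel: must never appear as a real element */

/* Counts the elements that precede the sentinel. The array MUST contain
 * END_MARK, otherwise the loop runs past the end (undefined behaviour). */
size_t count_until_mark(const int *a)
{
    size_t n = 0;
    while (a[n] != END_MARK)
        n++;
    return n;
}
```

Note that a 0 inside the data no longer stops the count, because 0 is not the sentinel here.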
To calculate the number of elements without using sizeof, you need the total number of bytes occupied by the array and the size (in bytes) of one element. Once you have both, divide the total by the element size.
#include <stdio.h>
#include <stddef.h>
int main (void) {
int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
ptrdiff_t size = ((char *)(&a + 1) - (char *)&a) / ((char *)(a + 1) - (char *)a);
printf("size = %td\n", size);
return 0;
}
Output:
# ./a.out
size = 9
Additional:
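'\0' and NULL are not the same: '\0' is an integer character constant with value 0, whereas NULL is a null pointer constant meant for pointers.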

in C, why do I have " "s": initialization requires a brace-enclosed initializer list"?

DISCLAIMER: this is just a piece of the whole algorithm, but since I encountered a lot of errors, I decided to divide and conquer, so I'm starting with the following code. (Goal of the code: create a string from the remainders of the division by 10 (n%10). It's not reversed yet, because I wanted to check this code first.)
(I'm working in C, in the Visual Studio environment.)
I have to implement a function like atoi (which converts a string to a number), but I want to do the opposite (from a number to a string). But I have a problem:
the compiler pointed out that in the lines with malloc, I should have initialized the string first (initialization requires a brace-enclosed initializer list),
but I have done that: I initialized the string to a constant (in the 2nd line, I've written "this is the test seed"), because I need to work with a string. So I initialized it, and then I malloc it to write the values of (unsigned int n).
This is how my program is supposed to work:
(1) the function takes an unsigned int constant (n),
(2) the function creates a "prototype" of the array (the zero-terminated string),
(3) then, I've created a for-loop without a check condition because I added it inside the loop body,
(4) now, the basic idea is this: at each step, the loop uses i to allocate 1 sizeof(char) (so 1 position) to store the i-th remainder of the n/10 division. n takes a different value at every step (n /= 10; so n takes the value of the division), and if n/10 is equal to zero, I have reached the end of the loop because every remainder is in the string. Therefore, I put in a break statement to leave the for-loop.
Finally, the function is supposed to return the pointer to the 0-th position of the string.
So, to sum up, my main question is:
why do I get " "s": initialization requires a brace-enclosed initializer list"? (The compiler reported it twice.) That's not how a string is supposed to be initialized (with curly braces "{}"). A string is initialized with " " instead, am I wrong?
char* convert(unsigned int n) {
char s[] = "this is the test seed";
for (unsigned int i = 0; ; i++) {
if (i == 0) {
char s[] = malloc (1 * sizeof(char));
}
if (i != 0) {
char s[] = malloc(i * sizeof(char));
}
if ((n / 10) == 0) {
break;
}
s[i] = n % 10;
n /= 10;
}
return s;
}
char s[] is an array, and therefore needs a brace-enclosed initializer list (or a character string literal). In the C99 standard, see section 6.7.8 (with 6.7.8.14 being the additional special case of a string literal for an array of character type). char s[] = malloc(...); is neither a brace-enclosed initializer list nor a string literal, and the compiler is correctly reporting that as an error.
The reason is that char s[] = ...; declares an array, so the compiler needs to determine the length of the array from the initializer at compile time.
Perhaps you want char *s = malloc(...) instead, since scalars (for example, pointers) can be initialized with a single expression (see section 6.7.8.11).
Unrelated to your actual question, the code you've written is flawed, since you're returning a pointer to a local array (the first s), which no longer exists once the function returns. To avoid memory problems when you're coding, avoid mixing stack-allocated memory, statically allocated strings (e.g. string literals), and malloc-ed memory. If you mix these together, you'll never know what you can or can't do with the memory (for example, you won't be sure whether you need to free it).
A complete working example:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
char *convert(unsigned n) {
// Count digits of n (including edge-case when n=0).
int len = 0;
for (unsigned m=n; len == 0 || m; m /= 10) {
++len;
}
// Allocate and null-terminate the string.
char *s = malloc(len+1);
if (!s) return s;
s[len] = '\0';
// Assign digits to our memory, lowest on the right.
while (len > 0) {
s[--len] = '0' + n % 10;
n /= 10;
}
return s;
}
int main(int argc, char **argv) {
unsigned examples[] = {0, 1, 3, 9, 10, 100, 123, 1000000, 44465656, UINT_MAX};
for (int i = 0; i < sizeof(examples) / sizeof(*examples); ++i) {
char *s = convert(examples[i]);
if (!s) {
return 2;
}
printf("example %d: %u -> %s\n", i, examples[i], s);
free(s);
}
return 0;
}
It can be run like this (note the very useful -fsanitize options, which are invaluable, especially if you're just beginning to program in C):
$ gcc -fsanitize=address -fsanitize=leak -fsanitize=undefined -o convert -Wall convert.c && ./convert
example 0: 0 -> 0
example 1: 1 -> 1
example 2: 3 -> 3
example 3: 9 -> 9
example 4: 10 -> 10
example 5: 100 -> 100
example 6: 123 -> 123
example 7: 1000000 -> 1000000
example 8: 44465656 -> 44465656
example 9: 4294967295 -> 4294967295
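If you would rather not hand malloc-ed memory to the caller at all, one alternative is to let the caller supply the buffer. This is only a sketch of that design, not a replacement for the answer above; convert_into is a hypothetical name, and it leans on the standard snprintf to do the digit work.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical alternative to convert(): writes the decimal form of n into
 * buf, which holds cap bytes and is owned by the caller (nothing to free).
 * Returns 0 on success, -1 if buf is too small for the digits plus '\0'. */
int convert_into(unsigned n, char *buf, size_t cap)
{
    int len = snprintf(buf, cap, "%u", n);
    return (len >= 0 && (size_t)len < cap) ? 0 : -1;
}
```

A caller would typically declare something like char buf[16]; on the stack and check the return value before using buf.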

realloc() for pointer to structure in C

I've run across this problem: I implemented a function which converts a string into a structure. I've got this structure:
typedef struct {
unsigned a, b;
unsigned c, d;
} struct_t;
The heading of the function is the following:
struct_t * string_to_struct (char * g)
\retval p pointer to new structure created; \retval NULL if the conversion is not successful. The conversion is not successful for strings such as "5 8 10 10" (I get a segmentation fault) but is successful for strings such as "5 6 6 7" or "4 5 6 8". I think the problem lies in the allocation of memory for the pointer to structure p.
I thought about first allocating memory for p in this way:
p = (struct_t*)malloc(sizeof(struct_t));
Then I thought about reallocating memory for p to make room for the string (some strings are about 8 bytes and everything works fine, but if the string is about 10 bytes I get a segmentation fault, because allocating memory for p as above makes room for only 8 bytes). I would use realloc() to enlarge the memory that holds the structure to the size of the string, but I don't know how to do it properly.
Here is how I attempted to implement the function, without using realloc():
struct_t * string_to_struct (char * g){
struct_t * p; /* pointer to new structure*/
int n;
n = sizeof(g);
if(sizeof(g) > sizeof(struct_t))
p = (struct_t*)malloc(n*sizeof(struct_t));
else
p = (struct_t*)malloc(sizeof(struct_t));
if(g[0] == '\0' ) /* trivial */
return NULL;
else
(*p).a = g[0];
(*p).b = g[2];
(*p).c = g[4];
(*p).d = g[6];
if((*p).a <= (*p).c && (*p).b <= (*p).d) /* check, the elements of the structure must satisfy those relations.*/
return p;
else
return NULL; /* convertion not successful */
}
But it's not working. Thanks in advance for any help.
First, this bit of logic is unnecessary:
int n;
n = sizeof(g);
if(sizeof(g) > sizeof(struct_t))
p = (struct_t*)malloc(n*sizeof(struct_t));
else
p = (struct_t*)malloc(sizeof(struct_t));
The correct amount of memory to allocate for one instance of your struct is always sizeof(struct_t). The length of the string doesn't matter. Also, sizeof(g) is giving you the size of the pointer, not the length of the string. You get the string length with strlen(g). You can replace the above with:
p = malloc(sizeof(struct_t));
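To see the difference between sizeof(g) and strlen(g) for yourself, here is a tiny sketch (both function names are made up for illustration):

```c
#include <stddef.h>
#include <string.h>

/* Inside a function, g is just a pointer, so sizeof g is the size of the
 * pointer itself (commonly 4 or 8 bytes), whatever the string contains. */
size_t pointer_size(const char *g)
{
    return sizeof g;
}

/* strlen counts the characters before the terminating '\0'. */
size_t text_length(const char *g)
{
    return strlen(g);
}
```

For "5 8 10 10" the second function gives 9, while the first gives the same value for every input.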
The main problem is here:
(*p).a = g[0];
(*p).b = g[2];
(*p).c = g[4];
(*p).d = g[6];
What this does is store the ASCII value (assuming your system uses ASCII) of a particular character in the string into your struct as an integer. Given your example input of "5 6 6 7", g[0] contains the character '5'. This has an ASCII value of 53, so p->a has the value 53. When your input is all single digit numbers, the indexes you use correspond to where the digits are in the string, so you end up with the ASCII values of each digit. And because the ASCII values of the characters '0' to '9' are consecutive, the comparisons you do work as expected.
When you use a string like "5 8 10 10", the above assumption about the location of the digits breaks. a gets '5' (53), b gets '8' (56), c gets '1' (49), and d gets a space (ASCII 32). Then your check fails: (*p).a <= (*p).c is already false because 53 is greater than 49 (and (*p).b <= (*p).d would fail too, since 56 is greater than 32), so your function returns NULL. You're probably getting a segfault because the calling function isn't checking whether NULL was returned.
To parse the string correctly, use strtok to break the string up into tokens, then use atoi or strtol to convert each substring to an integer.
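A rough sketch of such a parser follows. Note that it swaps strtok for strtol's end pointer to walk through the string, so the input is not modified; that is my choice, not something the question requires. The error handling policy (NULL on any malformed field) is also an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

typedef struct {
    unsigned a, b;
    unsigned c, d;
} struct_t;

/* Sketch: parses four whitespace-separated non-negative integers from g.
 * Returns a malloc-ed struct_t (the caller must free it), or NULL if the
 * input is malformed, if a <= c && b <= d does not hold, or if malloc fails. */
struct_t *string_to_struct(const char *g)
{
    unsigned v[4];
    const char *p = g;

    if (g == NULL)
        return NULL;

    for (int i = 0; i < 4; i++) {
        char *end;
        errno = 0;
        long x = strtol(p, &end, 10);   /* skips leading whitespace itself */
        if (end == p || errno == ERANGE || x < 0 || (unsigned long)x > UINT_MAX)
            return NULL;                /* no digits, overflow, or negative */
        v[i] = (unsigned)x;
        p = end;                        /* continue after the parsed number */
    }

    if (!(v[0] <= v[2] && v[1] <= v[3]))
        return NULL;                    /* relation from the question violated */

    struct_t *s = malloc(sizeof *s);    /* allocate only after validation */
    if (s == NULL)
        return NULL;
    s->a = v[0];
    s->b = v[1];
    s->c = v[2];
    s->d = v[3];
    return s;
}
```

Because the numbers are converted as whole tokens, "5 8 10 10" now yields 5, 8, 10, 10 instead of character codes, and nothing is allocated on the failure paths.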

Subtracting two strings in C

Well, I was actually looking at strcmp() and was confused about how it works. Anyway, I wrote this code
#include <stdio.h>
main()
{
char a[5] = "ggod";
char b[5] = "ggod";
int c = 0;
c = b - a;
printf("%d value", c);
}
and I get the output as
16
Can anyone explain why it is 16?
What you have subtracted there are not two strings but two char * values (each array decays to a pointer to its first element). c holds the difference between the addresses of b and a, which can be pretty much anything; strictly speaking, subtracting pointers into two different arrays is undefined behaviour. Here it just means that there happen to be 16 bytes between the start of the first array and the start of the second one on your stack.
c = b - a;
This is pointer arithmetic. An array name used in an expression evaluates to the starting address of the array, so c holds the difference between the two locations pointed to by b and a.
If you print those values with %p, you will see what happens in your case.
The printed values look like this: a==0x7fff042f3710 b==0x7fff042f3720
c = b-a ==> c = 0x7fff042f3720 - 0x7fff042f3710 ==> c = 0x10 // in decimal the value is 16
Try printing them:
printf("%p %p\n", (void *)a, (void *)b);
c=b-a;
If you change the size of an array, the difference changes:
char a[120]="ggod";
char b[5]="ggod";
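For contrast, pointer subtraction is well defined when both pointers point into the same array, where it gives a distance in elements. A small sketch (index_of is a name I picked for illustration):

```c
#include <stddef.h>
#include <string.h>

/* Index of the first occurrence of ch in s, or -1 if it does not occur.
 * hit and s point into the same string, so hit - s is well defined;
 * the result of a pointer subtraction has type ptrdiff_t. */
ptrdiff_t index_of(const char *s, int ch)
{
    const char *hit = strchr(s, ch);
    return hit ? hit - s : -1;
}
```

To print such a value, use the %td conversion, which is the one meant for ptrdiff_t.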
b is an array object,
and a is also an array object.
In an expression, an array name is converted to the address of its first element,
so b-a is a difference between 2 addresses and not between the 2 strings "ggod"-"ggod".
If you want to compare 2 strings, you can use strcmp().
strcmp() returns 0 if the 2 strings are the same and a non-zero value if they differ.
Here is an example of using strcmp():
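A minimal sketch of such a comparison (the helper name same_text is made up for illustration and is not taken from the original answer):

```c
#include <string.h>

/* Returns 1 if the two strings have the same contents, 0 otherwise.
 * strcmp compares character by character, not addresses. */
int same_text(const char *x, const char *y)
{
    return strcmp(x, y) == 0;
}
```

With char a[5] = "ggod"; and char b[5] = "ggod"; the call same_text(a, b) gives 1, even though a and b live at different addresses.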

Is it ok to iterate backwards to one before the beginning of an array

If, for example, I have a pointer to a string, move it to the last character of the string, and iterate backwards to the beginning using *p--, so that I end up at the position one before the start of the array, is this OK? Or will I get an access violation? I am only moving the pointer, not accessing through it. It seems to work in my code, so I'm wondering whether it is bad practice or not.
Here is a sample. The line with *next-- = rem + 'A'; is the one I am asking about:
#include <stdio.h> /* printf */
#include <string.h> /* strlen, strcpy */
#include <stdlib.h> /* malloc/free */
#include <math.h> /* pow */
/* AAAAA (or whatever length) = 0, to ZZZZZ. base 26 numbering system */
static void getNextString(const char* prev, char* next) {
int count = 0;
char tmpch = 0;
int length = strlen(prev);
int i = 0;
while((tmpch = *prev++) != 0) {
count += (tmpch - 'A') * (int)pow(26.0, length - i - 1);
++i;
}
/* assume all strings are uppercase eg AAAAA */
++count;
/*if count above ZZZ... then reset to AAA... */
if( count >= (int)pow(26.0, length))
count = 0;
next += (length-1); /* seek to last char in string */
while(i-- > 0) {
int rem = count % 26;
count /= 26;
*next-- = rem + 'A'; /*pntr positioned on 1 before array on last iteration - is OK? */
}
}
int main(int argc, char* argv[])
{
int buffsize = 5;
char* buff = (char*)malloc(buffsize+1);
strcpy(buff, "AAAAA");
int iterations = 100;
while(--iterations){
getNextString(buff, buff);
printf("iteration: %d buffer: %s\n", iterations, buff);
}
free(buff);
return 0;
}
According to the following C-FAQ question/answer, and I quote:
Pointer arithmetic is defined only as long as the pointer points
within the same allocated block of memory, or to the imaginary
``terminating'' element one past it; otherwise, the behavior is
undefined, even if the pointer is not dereferenced.
So my answer would be no, it is not OK to iterate before the beginning of an array.
There are references to the C standards as well:
K&R2 Sec. 5.3 p. 100, Sec. 5.4 pp. 102-3, Sec. A7.7 pp. 205-6
ISO Sec. 6.3.6 (C89) or 6.5.6/8 (C99)
Rationale Sec. 3.2.2.3
As long as you don't try to read or write from that address, it won't cause a violation. This is because the value in a pointer is just another number.
The only reason your code is working is that your length happens to be less than or equal to the initial value in i.
I personally would not want to rely on this, since I know I'd forget about that particular condition, and I'd make some modification that broke it. So while it technically works, it's not really a good idea.
[expr.add], ¶5
If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
So it's UB, since the result does not point to any valid element of the array.
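One way to write the backwards loop so that no pointer before the start is ever formed is to begin one past the end (which is allowed) and decrement before storing. A sketch of the digit-filling part only, with fill_base26 being a hypothetical name and not the asker's function:

```c
#include <stddef.h>

/* Fills buf[0..len-1] with the base-26 digits of count ('A' = 0), least
 * significant digit last. The pointer only ever takes values from buf + len
 * (one past the end, a valid pointer value) down to buf itself. */
void fill_base26(char *buf, size_t len, unsigned count)
{
    char *p = buf + len;
    while (p != buf) {
        *--p = (char)('A' + count % 26);  /* decrement first, then store */
        count /= 26;
    }
}
```

The loop ends with p equal to buf, so the last decrement lands on the first element instead of stepping past it.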
Comment, actually (on FelixCQ's answer):
I could understand why obtaining an out-of-range pointer in a loop could be dangerous because of possible loop unrolling and out-of-order evaluation, so that the pointer could get dereferenced before the terminating condition is evaluated, as in this simple example:
for (char* tmp = s+len; tmp >= s; tmp--) sum += *tmp;
However, if this is the reason for UB, then
for (int i = len; i >= 0; i--) sum += s[i];
has exactly the same problem! Or am I missing something?

Resources