Clarification of NULL assignment to char *


I came across an example of non-portable C code where a char pointer is passed as an argument to a variadic C function. The example is described in the image below. The part highlighted in blue is not entirely clear and appears to be wrong. In particular, I have two questions:
Assuming that NULL is a 32-bit int 0 on a system, wouldn't the compiler implicitly convert the 32-bit int 0 to a 64-bit 0 when it encounters char *string = NULL? If not, are we saying that every expression like char *string = NULL is non-portable and must always be replaced with an explicit cast like char *string = (char *)NULL in portable C?
If NULL is a 32-bit int 0 and char *string is 64 bits wide, why would printf run out of bits to print, as the blue highlight suggests? Shouldn't printf get the full 64 bits, since it was passed string and not NULL?
Source of the screenshot: https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?contentId=87152357#content/view/87152357

The referenced article is wrong and should be disregarded.
"Assuming that NULL is a 32-bit int 0 on a system, wouldn't the compiler implicitly convert the 32-bit int 0 to a 64-bit 0 when it encounters char *string = NULL?"
An assignment automatically converts the right operand to the type of the left operand. So char *string = NULL will convert the NULL value to char *, not to “64-bit 0”.
"If not, are we saying that every expression like char *string = NULL is non-portable and must always be replaced with an explicit cast like char *string = (char *)NULL in portable C?"
No, char *string = NULL is portable C code; it is strictly conforming.
"If NULL is a 32-bit int 0 and char *string is 64 bits wide, why would printf run out of bits to print, as the blue highlight suggests? Shouldn't printf get the full 64 bits, since it was passed string and not NULL?"
The code referenced, char* string = NULL; followed by printf("%s %d\n", string, 1);, does not pass NULL to printf. It passes string to printf, and the prior assignment converts NULL to char *. So printf is passed a char * that has the value of a null pointer. This will not cause any problem in interpreting the variable arguments to printf. (It is, however, improper to pass a null pointer for the %s conversion.)
If the call were instead printf("%s", NULL);, then there is a problem. Arguments corresponding to the ... part of a variable-argument function are not automatically converted to a parameter type. They are processed by the default argument promotions, which largely promote narrow integer types to int and promote float to double, but they will not convert an int to any type of pointer. Thus, if NULL is defined as 0, then printf("%s", NULL); passes an int where a char * is expected, and this may cause various misinterpretations of the arguments.
In consequence, never use the NULL macro as a direct argument to a function with a variable argument list. Using a pointer variable that has been assigned from NULL is okay.

Related

Dereferencing char pointer returns int ? why? [duplicate]

For example, the following code produces an error and a warning when compiled, and prints an int when %s is changed to %d.
Warning:
format %s expects argument of type char *, but argument 2 has type int
void stringd() {
    char *s = "Hello";
    printf("dereferenced s is %s", *s);
}

*s is an expression of type char since it's the dereference operator applied to a pointer-to-char [1]. As a result, it gets promoted to an int when passed to printf; in order to print a null-terminated string, you need to pass the pointer to the first character (i.e. just s).
[1] Even though s is not declared as a pointer to const char, you should not try to modify the characters it points to, as string literals may be placed in read-only memory on some architectures/environments; see this discussion for more details.

The variable s is a pointer to the first out of a series of characters which are consecutive in memory (colloquially referred to as a "string", though it's not quite the same). It's a pointer to a character, thus char *.
Dereferencing s (by doing *s) gives you the first of those characters, 'H', whose type is now just char. One layer of indirection was stripped away.
Thus, the issue is that you're trying to pass a character (char) where a string (char *) was expected. char * was expected because you used the %s conversion specifier in your format string to printf. Instead, you should use %c, which expects a single character.
The mistake here is actually quite grave. If you were allowed to pass this 'H' where a char * was expected, you would end up with the ASCII code of 'H' (0x48) being passed where a pointer was expected. printf would be none the wiser, and would try to dereference that value, treating 0x48 like a pointer to the beginning of a string. Of course, that's probably not a valid memory location in your program, so that should segfault pretty reliably, if it were allowed to happen.

Explanation of integer and register constant character in function

char *firstmatch(char *s1, char *s2) {
    char *temp;
    temp = s1;
    do {
        if (strchr(s2, *temp) != 0)
            return temp;
        temp++;
    } while (*temp != 0);
    return 0;
}
char *strchr(register const char *s, int c) {
    do {
        if (*s == c) {
            return (char*)s;
        }
    } while (*s++);
    return (0);
}
I am new to programming and I have been given this code, which finds the first character in a string s1 that is also in string s2. The task is to understand the C code and convert it into assembly code. Right now my focus is just on understanding what the C code is doing, and I am having difficulty with pointers. I can work my way through the firstmatch() function, but I am confused by the char *strchr() function. I don't understand what the point of int c is in relation to a constant character pointer. I'd appreciate it if somebody could help explain it.

The function strchr() in your code sample is an incomplete implementation of the Standard C library function that locates the first occurrence of a character in a C string, if any.
The argument has type int for historical reasons: in early versions of the language, function arguments would be typed only if the implicit type int did not suffice. Character arguments were passed as int values, so typing the argument differently was unnecessary.
The register keyword is obsolete: early C compilers were not as advanced as current ones and the programmer could help code generators determine which variables to store in CPU registers by adorning their definitions with the register keyword. Modern compilers are more efficient and usually beat programmers at this game, hence this keyword is mostly ignored nowadays.
Note however that this implementation behaves differently from the Standard function: the value of c must be converted to char before the comparison. As noted by chux, all functions in <string.h> treat bytes in C strings and memory blocks as unsigned chars for comparison purposes.
Here is a more readable version with the correct behavior:
#include <string.h>
char *strchr(const char *str, int c) {
    const unsigned char *s = (const unsigned char *)str;
    do {
        if (*s == (unsigned char)c) {
            return (char *)s;
        }
    } while (*s++ != '\0');
    return NULL;
}

The int c argument might as well be char c. The type of *temp is char.
The strchr function takes a pointer into a null-terminated string and a char, and returns either a pointer to the next occurrence of that char or a null pointer if it reaches the terminator at the end of the string without finding it.

strchr() receives a pointer to (think, memory address of) the first (or the only) character in a sequence.
The function extracts a character from memory using that pointer s and sees if its value matches the value of c. If there's a match, it returns the pointer.
If there's no match, it advances the pointer to the next character in the sequence (that is, increments the memory address by 1) and repeats.
If there's no match and the value of the character from memory is 0, NULL is returned.
The pointer being to a const char implies that memory isn't going to be written to, but may be read from. Indeed, the function never tries to write using the pointer.
So, you read chars from memory and compare them to an int. In most expressions chars implicitly convert to signed int (if such a conversion is generally possible without loss of any value of type char) or unsigned int (otherwise). See integer promotions on this. If after this both sides of the == operator are signed ints, everything is trivial, just compare those. If one is unsigned int (the promoted *s character) while the other one is signed int (c), the signed one is converted to unsigned (see the same linked article for the logic/rules), after which both sides of == have the same type (unsigned int) and are comparable (this is one of the key ideas of C, most binary operators convert their inputs to a common type and produce the result of that common type).
Simply put, in C you can compare different arithmetic types and the compiler will insert necessary (per the language rules) conversions. That said, not all conversions preserve value (e.g. a conversion from signed int to unsigned int doesn't preserve negative values, however they are converted in a well-defined manner) and that may be surprising (e.g. -1 > 1u evaluates to 1, which seems absurd to anyone knowing a bit of math), especially to the ones new to the language.
The real question here seems "Why isn't c defined as char?".
If one inspects the standard C library functions, they'll find that values of type char are (almost?) never passed or returned, although passing or returning pointers to char is quite common. Individual characters are typically passed by means of the int type. The reason for this is probably that, as mentioned above, char would convert to int or unsigned int in an expression anyway, so some additional conversions (back to char and then again to int) may be avoided.

The char *s1 parameter points to a string in C.
The 0 is the integer value of '\0', the character that terminates a string in C. Characters are small integers: in ASCII, the letter 'A' has the value 65. This should answer your question about int c: passing the character as an int makes no behavioral difference for this code.
Now suppose you had the strings "hello" and "meh"; in memory they would be laid out as:
s1 -> ['h', 'e', 'l', 'l', 'o', '\0']
s2 -> ['m', 'e', 'h', '\0']
So you call:
firstmatch("hello", "meh")
temp is assigned the address of the first character of "hello".
Now you call
strchr("meh", 'h')
*temp in this scenario is equivalent to temp[0], which is 'h'.
Inside strchr, the loop goes through each letter of "meh", starting from 'm'.
First iteration:
'm' == 'h' -> false, therefore proceed to the next letter (*s++)
Second iteration:
'e' == 'h' -> false, therefore proceed to the next letter (*s++)
Third iteration:
'h' == 'h' -> true, therefore return a pointer to that character, which is not 0.
This returns us to the if condition inside firstmatch.
Since strchr returned a non-zero pointer, the if condition passes and firstmatch returns temp, which points to the 'h' in "hello".
Suppose strchr had found no match and returned 0; then temp would be incremented to the next letter in s1, which would be 'e', and the same procedure would be repeated.
Finally, (*temp != 0) means that if we reach the '\0' at the end of "hello", the loop stops and the function returns 0, indicating that the two strings have no letter in common.
Read about pointer arithmetic in C/C++ if you don't understand why *temp == temp[0]. Likewise, *temp++ yields the current character and then advances temp to the next one.

Why is the character d printed when I assign a string literal to a char variable?

#include <stdio.h>
int main(void) {
    char a = "any"; //any string
    printf("%c", a);
    getch();
}
Why does d (for %c) or 100 (for %d) always get printed? What's happening?

char a="any"; does not declare a string. Your compiler should throw an error or a warning. You need
const char *a = "any";
Now
printf("%c", *a);
will print the character a, because the pointer a points to the first element of the string literal "any".

Your code has issues. On
char a="any";
the string literal "any" evaluates to a pointer, but you are saving it in a (small) integer type. Because the value of a is (part of) an address, it will have an essentially arbitrary value. On my machine it prints "T" (if I remove the getch() line, because that doesn't compile).
GCC gives the following warning:
warning: initialization makes integer from pointer without a cast
and whatever compiler you are using probably tells you something similar. You should really take this warning seriously.

"something between double quotes"
is a string literal: it allocates storage for the string and evaluates to a pointer to that string in memory. You need to store this pointer using char *a = "abc"; now a is a pointer to the string abc.
printf("%s",a) will print the entire string abc.
For a single character, use single quotes; in this case char a = 'b'; is correct.
printf("%c",a) will print b.
printf("%d",a) will print the ASCII value of 'b', which is 98.

In the statement
char a = "any";
the string literal "any" evaluates to the address of its first character, which in C has type char * (even though the characters must not be modified). Assigning a pointer to a char variable implicitly converts the pointer to an integer. GCC emits the following warning, which is enabled by default:
warning: initialization makes integer from pointer without a cast [enabled by default]
What happens next is that, typically, the least significant byte of the converted integer value is assigned to the char variable a. This value depends on where the literal happens to be stored and cannot be predicted. On my machine, it happened to be the ASCII code for the character /. On your machine, it happened to be 100, the ASCII code for the character d. It is just pure chance. Nothing else.
You should assign a string literal to a const char * as:
const char *a = "any";
and print the string pointed to by a by the %s conversion specifier as:
printf("%s\n", a);
Also note that getch is not a C standard library function, so your program may not compile on a Linux machine. You can use getchar for a similar effect, though.

NULL implementation and pointer assignment

I have a couple of questions that came up while I was reading and writing code.
I have already found another approach, but these questions are still on my mind and I haven't found a good answer, so here goes:
1. Can I assign a pointer to a variable in a portable way, even if information is lost? Consider this situation: I have a function that can return NULL on error, or a char. Supposing that NULL is all zeros, the char will be nul; is this correct?
void * function_returning_null_or_char ();
2. Is an int variable supposed to be able to hold a pointer? (I read that somewhere.)
3. My implementation of NULL is #define NULL (void *) 0. Would it be better if it were #define NULL 0? Then it could be assigned to both variables and pointers, like a char or an int *. (See question 1.)
4. When I set a pointer to NULL and then cast it to short, char, or another type, will the value be 0? Or will it be the special value of NULL truncated to fit that type?
I think all the questions can be summed up as:
Is the compiler smart enough to convert a NULL pointer to zeros when it is cast to a variable?
Example code:
int *p = NULL;
char c = (char) p; // This works, I don't know why, and c is equal to '\0';
char c = NULL; // This works because the compiler is smart enough to convert NULL to '\0' even though stddef.h defines NULL as (void *)0;
char c = (char) function_returning_null_or_char (); // If this function returns NULL, what is the value of c? If NULL has a special value, will c have a special value too, or will it be 0?

It doesn't make sense to have a function that can return either NULL (a pointer value) or a char value. There is a null character '\0', but it's not related to the null pointer. There are various ways you can have a function return either a char value or some indication that there is no valid value. For example, return a structure.
No, an int variable is not supposed to hold a pointer. An int variable holds an integer value; to hold a pointer, use a pointer variable. (You can convert between integer and pointer types, but doing so makes sense far less often than you might think.)
NULL is defined for you by the implementation, in <stddef.h> and several other standard headers. Do not try to redefine it yourself.
Converting a null pointer value to an integer type yields an implementation-defined value. If you care what that value is, you're probably doing something wrong.
Pointers are not integers; don't try to treat them as integers. The representation of a null pointer is implementation-defined; it's commonly all-bits-zero, but there's no guarantee of that.
Take a look at the comp.lang.c FAQ, particularly sections 4 (Pointers) and 5 (Null Pointers).

C: why printing a null char with %s prints "(null)"?

Why does printing a null char ('\0', 0) with %s actually print the "(null)" string?
Like this code:
char null_byte = '\0';
printf("null_byte: %s\n", null_byte);
...printing:
null_byte: (null)
...and it even runs without errors under Valgrind, all I get is the compiler warning warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’ [-Wformat] (note: I'm using gcc 4.6.3 on 32bit Ubuntu)

It's undefined behavior, but it happens that on your implementation:
the int value of 0 that you pass is read by %s as a null pointer
the handling of %s by printf has special-case code to identify a null pointer and print (null).
Neither of those is required by the standard. The part that is required[*], is that a char used in varargs is passed as an int.
[*] Well, it's required given that on your implementation all values of char can be represented as int. If you were on some funny implementation where char is unsigned and the same width as int, it would be passed as unsigned int. I think that funny implementation would conform to the standard.

Well, for starters, you're doing it wrong. '\0' is a character and should be printed with %c and not %s. I don't know if this is intentional for experimentation purposes.
The actual binary value of \0 is, well, 0. Because printf is variadic, that 0 is passed as an int and then read back as a char * pointer, which here happens to look like a null pointer. Dereferencing it would crash, but your C library's printf prevents that with special treatment of null pointers for %s.
Valgrind doesn't flag it because printf never dereferences the null pointer: it checks for it at run time and prints the "(null)" text instead, so no invalid memory access occurs. To catch the type mismatch itself, you need compile-time diagnostics such as the -Wformat warning, or a static analyzer.

null_byte contains 0. When you use %s in printf, you are trying to print a string, which is the address of a char (a char *). What you do in your code is pass the address 0 (NULL) as the address of your string, which is why the output is (null).
The compiler warned you that you passed the wrong type for the %s conversion. Try printf("null_byte: %s\n", &null_byte);

Your printf statement is trying to print a string and therefore interprets the value of null_byte as a char * that is null. Take heed of the warning. Either do this
printf("null_byte: %s\n", &null_byte);
or this
printf("null_byte: %c\n", null_byte);

Because printf is variadic, the default argument promotions are performed on null_byte, so it gets promoted (converted) to int, value 0.
printf then reads a char * pointer, and the 0 int is interpreted as a null pointer. Your C standard library has a feature that null strings are printed as (null).

Adding an implementation example to other answers, in XV6, which is an educational re-implementation of Unix v6, if you pass a zero value to %s, it prints (null):
void printf(int fd, const char *fmt, ...) {
    uint *ap = (uint*)(void*)&fmt + 1;
    ...
    if(state == '%') {
        ...
        if(c == 's') {
            s = (char*)*ap;
            ap++;
            if(s == 0)
                s = "(null)";
            while(*s != 0){
                putc(fd, *s);
                s++;
            }
        }
    }
}
