Check if string can be mutated in C

Say I have this function:
void f(char *s) {
s[0] = 'x';
}
This function will sometimes cause errors and sometimes not. For example,
char *s = "test";
f(s); // Error
char t[] = "test";
f(t); // Success
Inside function f, is it possible to determine whether or not s[0] = 'x'; will cause an error before doing it?

The responsibility is on the caller to comply with requirements of the function that the argument be changeable, not the reverse.
const char *s = "test"; // tell compiler s is immutable
f(s); // diagnostic: f() requires a non-const argument (a warning by default in gcc, an error with -Werror)
char t[] = "test";
f(t); // Success
The only way to stop the compiler from rejecting f(s) in the above is to either remove the const from the declaration of s, or to cast the const'ness away. Except in exceedingly rare circumstances, both are positive indicators of a problem.
Note: it is an anomaly in the language that s can be declared without the const qualifier. Get into the practice of using const where needed (e.g. when initialising a pointer using a string literal). A lot of program bugs are eliminated that way.

Given the feedback I've received, it sounds like I should instead document whether or not the function requires its arguments to be mutable. For example:
#include <stdio.h>
#include <string.h>
char* f(char*);
void g(char*);
int main(int argc, char *argv[]) {
// s is immutable.
char *s = "test";
puts(f(s));
// t is mutable.
char t[] = "test";
g(t);
puts(t);
}
/*
* Returns a copy of the given string where the first character is replaced with 'x'.
* Parameter s must contain at least one character.
*/
char* f(char *s) {
char *t = strdup(s);
t[0] = 'x';
return t;
}
/*
* Replaces the first character of the given string with 'x'.
* Parameter s must contain at least one character and be mutable.
*/
void g(char *s) {
s[0] = 'x';
}
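Part of that documentation can be pushed into the signatures themselves: a const char * parameter promises not to write through the pointer, while a plain char * parameter signals that the buffer must be writable. The following is only a sketch of that idea. The names with_first_replaced and replace_first are hypothetical, and it uses malloc instead of strdup on the assumption that a POSIX strdup may not be available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name. Returns a heap-allocated copy of s with its first
 * character replaced by c, or NULL if allocation fails.
 * s is const-qualified, so string literals are safe to pass.
 * The caller must free() the result. */
char *with_first_replaced(const char *s, char c) {
    assert(s != NULL && s[0] != '\0');   /* needs at least one character */
    size_t size = strlen(s) + 1;         /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, size);
    copy[0] = c;
    return copy;
}

/* Hypothetical name. Replaces the first character of s in place.
 * The non-const parameter tells callers the buffer must be writable. */
void replace_first(char *s, char c) {
    assert(s != NULL && s[0] != '\0');
    s[0] = c;
}
```

With these signatures, passing a const char * to replace_first draws a compiler diagnostic, whereas with_first_replaced accepts literals and arrays alike.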

Related

Returning string initialized from scalar of volatiles behaves very strange?

I am trying to make reverse engineering more difficult with string literals like in the first block of code. I am initializing it as a scalar with volatiles. It is volatile so the compiler won't optimize it and turn it into a plain string literal upon compilation.
#include <stdio.h>
static const volatile char a = 'a', b = 'b', c = 'c', d = 'd', e = 'e', f = 'f';
inline const char *_GetString(void) {
return (const char[]){a, b, c, d, e, f, 0};
}
const char *GetString(void) {
const char *x = _GetString();
puts(x);
return x;
}
int main(int argc, char *argv[]) {
puts(GetString());
return 0;
}
The preceding does not print abcdef twice. However this does:
#include <stdio.h>
const char *_GetString(void) {
return "abcdef";
}
const char *GetString(void) {
const char *x = _GetString();
puts(x);
return x;
}
int main(int argc, char *argv[]) {
puts(GetString());
return 0;
}
Why does this happen? How can I return a string from the function in this manner, that does not act odd, but still preserve the function being inline and difficult to reverse engineer?
Compound literals, like the one inside of _GetString in your first code snippet, have the same lifetime as any variable declared in the same scope. So when the function returns, it returns a pointer to the first element in the array, i.e. a pointer to a local variable. This means that when the function returns the pointer value returned is no longer valid, and attempting to use it invokes undefined behavior.
The second piece of code works because it returns the address of the first element of a string literal, and the lifetime of string literals is that of the entire program, so the pointer is still valid.
This function:
inline const char *_GetString(void)
{
return (const char[]){a, b, c, d, e, f, 0};
}
creates a local array (with a lifetime limited to the lifetime of its enclosing block, which ends when the execution of this function ends in this case), and then returns its address. That's undefined behavior.
Regarding the obfuscation attempt, if your goal is to store an actual password inside the executable, I strongly recommend that you avoid doing so. If your program is able to access a password in its text form, a hacker will be able to do it too; the only thing you should be storing in that case would be a cryptographic hash of the password (after salting it).
This works, although the function is no longer inline. It returns a string and does not contain any string literals.
#include <stdio.h>
#include <string.h>
static const volatile char a = 'a', b = 'b', c = 'c', d = 'd', e = 'e', f = 'f';
const char *GetString(void) {
static char array[7];
strcpy(array, (const char[]){a, b, c, d, e, f, 0});
return array;
}
int main() {
puts(GetString());
return 0;
}
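Another option, sketched here on the assumption that the caller can supply the storage, is to have the function fill a caller-provided buffer. That avoids both the dangling compound literal and the shared static array. The name get_string_into and its size parameter are hypothetical, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const volatile char a = 'a', b = 'b', c = 'c', d = 'd', e = 'e', f = 'f';

/* Hypothetical helper: writes the string into buf, which has room for
 * size bytes. Returns buf on success, or NULL if buf is too small. */
char *get_string_into(char *buf, size_t size) {
    assert(buf != NULL);
    /* Automatic array, built at run time from the volatile characters. */
    const char chars[] = {a, b, c, d, e, f, '\0'};
    if (size < sizeof chars) {
        return NULL;
    }
    memcpy(buf, chars, sizeof chars);
    return buf;
}
```

The caller owns the buffer, so the returned pointer stays valid for as long as the caller's array does, and separate callers do not overwrite each other's result.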

C literals, where are these stored

Consider the following code:
#include <stdio.h>
void f(const char * str) {
str = "java";
}
void main (int argc, char * argv[]) {
const char *str = "erlang";
f(str);
printf("%s\n", str);
}
The output is "erlang" and I don't quite know why..
My current knowledge says that string literals "erlang" and "java" are both stored in the process address space, within section "constants". And according to this, the function f should change the pointer to point to "java", but this doesn't happen. Could someone please explain what is going on here?
Because function arguments are passed by value in C, modifying an argument in the callee does not affect the caller's local variables.
Use pointers to modify caller's local variables.
#include <stdio.h>
void f(const char ** str) { /* add * to declare pointer */
*str = "java"; /* add * to access what is pointed */
}
int main (int argc, char * argv[]) { /* use standard signature */
const char *str = "erlang";
f(&str); /* add & to get a pointer pointing at str */
printf("%s\n", str);
}
C has copy by value. When str is passed as an argument to f, it is copied first, and that very copy is actually passed to f. Assigning "java" to that copy doesn't do anything to the original str in main.
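The rule is not specific to strings or pointers. A minimal sketch with a plain int shows the same effect; the function names set_copy and set_through_pointer are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* n is a copy of the caller's value; the assignment is lost on return. */
void set_copy(int n) {
    n = 42;
    (void)n;   /* silence "set but not used" warnings */
}

/* n points at the caller's variable; writing *n changes that variable. */
void set_through_pointer(int *n) {
    assert(n != NULL);
    *n = 42;
}
```

Calling set_copy(value) leaves value unchanged, while set_through_pointer(&value) stores 42 in it. Replacing int with const char * gives exactly the situation in the question.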
Because you are passing the pointer by value, the function only changes its own copy. You will see java as the output if you pass a pointer to str instead, like this:
#include <stdio.h>
void f(const char ** str) {
*str = "java";
}
int main (int argc, char * argv[]) {
const char *str = "erlang";
f(&str);
printf("%s\n", str);
}
output:
rabi@rabi-VirtualBox:~/rabi/c$ gcc ptr1.c
rabi@rabi-VirtualBox:~/rabi/c$ ./a.out
java
Function parameters are the function's local variables. You can imagine the function definition and its call in the following way (I changed the name of the parameter from str to s for clarity):
void f(/*const char * s*/) {
const char *s = str;
s = "java";
}
//...
const char *str = "erlang";
f(str);
Changes to the local variable s have no effect on the original variable str used as the argument. The variable str itself is unchanged.
You should pass arguments by reference if you are going to change them in a function. For example
#include <stdio.h>
void f( const char ** str )
{
*str = "java";
}
int main( void )
{
const char *str = "erlang";
f( &str );
printf( "%s\n", str );
}
The program output is
java
Take into account that according to the C Standard function main shall have return type int.
Could someone please explain what is going on here?
Many good answers already, yet I thought I'd try a detailed walk-through with OP using slightly modified code.
Consider what happens with f("Hello World"). "Hello World" is a string literal. It initializes a char array. When an array is passed to a function or assigned to a pointer, it is converted to the address of the first element of the array. f() receives a copy of the address of 'H' in its str. #1 prints "Hello World". str is re-assigned to the address of 'j'. #2 prints "java". The function ends without affecting "Hello World".
With str = "erlang", str receives the address of the 'e'. #3 prints "erlang". On the function call, the value of main()'s str is copied to the f()'s str. #1 prints "erlang". Like before, str is re-assigned to the address of 'j'. #2 prints "java". The function ends without affecting main()'s str. #4 prints "erlang".
#include <stdio.h>
void f(const char * str) {
printf("f() before str='%s'\n", str); // #1
str = "java";
printf("f() after str='%s'\n", str); // #2
}
int main(void) {
f("Hello World");
puts("");
const char *str = "erlang";
printf("main() before str='%s'\n", str); // #3
f(str);
printf("main() after str='%s'\n", str); // #4
return 0;
}
Output
f() before str='Hello World'
f() after str='java'
main() before str='erlang'
f() before str='erlang'
f() after str='java'
main() after str='erlang'
As to OP's question:
C literals, where are these stored?
The location of a string literal is not defined in C. It might use the "process address space, within section constants", it might not. What is important is that an array is formed and the address of the first character is given in assignment to a const char *. Further detail: writing to this address is undefined behavior (UB), it may "work", fail, seg-fault, etc.
This would be more obvious if you changed the name of the argument for f...
#include <stdio.h>
void f(const char * foo) {
foo = "java";
}
int main (int argc, char * argv[]) {
const char *str = "erlang";
f(str);
printf("%s\n", str);
}
foo is a different variable to str. It has a different name, a different scope, and can contain a different value. The changes to foo won't propagate to str. If you wanted to modify str from within f, you would need to make f look like this:
void f(const char **foo) {
*foo = "java";
}
... and pass a pointer to str to f like so: f(&str);.
Did you happen to notice how I changed void main to int main? There are only two signatures for main entry points (excluding the equivalents) that are guaranteed by the standard to be portable:
int main(void) { /* ... */ }
... and ...
int main(int argc, char *argv[]) { /* ... */ }
Either way, main always returns int (or equivalent). This shouldn't inconvenience you too much, as in C99 (any half-decent compiler that's newer than fifteen years old) and C11 there's this little gem which allows you to omit return 0; from main:
If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.
So if anything, your code using an int main entry point is not just portable but also one byte shorter than your code using a non-portable void main entry point.

error: initializer element is not a compile-time constant

I have been looking for answers but could not find anything to make this code run. I get av[1] highlighted by the compiler in the main function when declaring:
static char const *str = av[1];
Here is the code I tried to run with gcc:
#include <stdio.h>
#include <stdlib.h>
char *ft_strjoin(char const *s1, char const *s2);
void fct(char **av)
{
static char const *str = av[1];
str = ft_strjoin(av[1], av[1]);
printf("%s\n", str);
}
int main(int ac, char **av)
{
fct(&av[1]);
fct(&av[1]);
fct(&av[1]);
fct(&av[1]);
fct(&av[1]);
fct(&av[1]);
}
I found this interesting but I still don't get it and don't know how to run this code.
Quoting C11, §6.7.9, Initialization
All the expressions in an initializer for an object that has static or thread storage duration
shall be constant expressions or string literals.
In your code,
static char const *str = av[1];
av[1] is not a compile time constant value (i.e., not a constant expression). Hence the error.
You need to remove static from str to avoid the issue.
static variables need to be initialised with compile-time constants (constant expressions). av[1] is evaluated at run time, which is why you are getting the error message.
You could simulate that behaviour by writing:
static const char *str;
static bool already; /* bool requires <stdbool.h>; starts out false */
if ( !already )
{
str = av[1];
already = true;
}
However this would be redundant compared to the solution of:
const char *str;
because you immediately overwrite that value anyway with the return value of your function.
(Also you pass the same argument in every call, so even if you used str, it still doesn't need to be static).
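If the value really does need to be captured only on the first call, a null sentinel can stand in for the separate flag. This is a sketch under the assumption that the function is never called with a null pointer and is not called from several threads at once; the name remember_first is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: remembers the pointer passed on the first call
 * and returns that same pointer on every later call. */
const char *remember_first(const char *candidate) {
    /* NULL is a constant expression, so it is a valid static initializer. */
    static const char *saved = NULL;
    assert(candidate != NULL);
    if (saved == NULL) {
        saved = candidate;   /* runs only on the first call */
    }
    return saved;
}
```

The static object is initialised with a constant, which satisfies the rule quoted above, and the run-time value is assigned afterwards.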

Why does the compiler complain about the assignment?

While compiling the following code, the compiler produces the warning:
assignment discards ‘const’ qualifier from pointer target type
#include<stdio.h>
int main(void)
{
char * cp;
const char *ccp;
cp = ccp;
}
And this code is OK (no warning). Why?
#include<stdio.h>
int main(void)
{
char * cp;
const char *ccp;
ccp = cp;
}
Edit: Then why isn't this ok?
int foo(const char **p)
{
// blah blah blah ...
}
int main(int argc, char **argv)
{
foo(argv);
}
Because adding constness is a "safe" operation (you are restricting what you can do to the pointed object, which is no big deal), while removing constness is not (you promised not to touch the pointed object through that pointer, and now you are trying to take back your promise).
As for the additional question, it's explained in the C-Faq: http://c-faq.com/ansi/constmismatch.html. Simply put, allowing that conversion would allow another kind of "unsafe" behavior:
int give_me_a_string(const char **p)
{
const char *str="asd";
*p=str; // p points to a pointer to const char, so storing
// a const char * in *p is allowed
}
int main()
{
char *p;
give_me_a_string(&p); //< not actually allowed in C
p[0]='a'; // wooops - I'm allowed to edit str, which I promised
// not to touch
}
In the first case, you're taking a pointer to data that must not be modified (const), and assigning it to a pointer that allows modification of its data. Bad and dangerous.
In the second case, you're taking a non-const pointer and assigning it to a pointer that can cause less trouble than the original. You're not opening yourself up to any harmful, illegal or undefined actions.
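The safe direction is what makes read-only parameters convenient: a function that only inspects a string can take const char * and still accept writable buffers. A minimal sketch, with the hypothetical name count_char:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how often c occurs in s.
 * s is only read, so the parameter is a pointer to const char. */
size_t count_char(const char *s, char c) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c) {
            ++count;
        }
    }
    return count;
}
```

Passing a char array to count_char converts char * to const char * implicitly and without a warning, because the conversion only adds a restriction.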

c pass a string to function then return a string

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *abc = "abc";
char *new_str;
new_str = getStr(&abc);
printf("%s", abc);
}
char *getStr(char *str)
{
printf(str);
return str;
}
What's wrong with the code above?
A bunch of small things:
You're passing &abc to getStr(). &abc is a pointer to the variable that is holding your string. Its type is a pointer to a pointer, and that's incompatible with the char *str argument of getStr().
Your getStr() is defined after it is used. You either need to move its definition to before main() or add a prototype before main().
In C, a string literal like "abc" has type char[4], which is not const-qualified, but modifying it is undefined behavior. Pointing a plain char * at it is therefore dubious (it lets you write code that modifies a string literal, which is not allowed); const char * is the safer choice.
Here is a working version:
#include <stdio.h>
#include <stdlib.h>
char *getStr(char *str);
int main(void)
{
char *abc = "abc";
char *new_str;
new_str = getStr(abc);
// printf("%s", abc); not really needed: getStr already prints the string
}
char *getStr(char *str) {
printf("%s", str);
return str;
}
Here's a list of problems with the old one:
No prototype for your function getStr.
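You shouldn't call printf(str): str is then used as the format string, which misbehaves if it contains a % character. Pass "%s" as the format and str as the argument instead.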
You need to change: new_str = getStr(&abc) to new_str = getStr(abc)
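To illustrate the point about the format string: when the text is passed as an argument to "%s", any % characters in it are treated as ordinary data. The sketch below uses snprintf so that the result can be inspected; the helper name copy_verbatim is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies str into out verbatim, treating it as data
 * rather than as a format string. Returns the value snprintf returns. */
int copy_verbatim(char *out, size_t size, const char *str) {
    assert(out != NULL && str != NULL);
    return snprintf(out, size, "%s", str);
}
```

Called with the text "100%s", this stores the five characters unchanged, whereas using the same text as the format itself would make the function look for a string argument that was never passed.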
A C string is a pointer to a character that begins a sequence of characters that end with a null byte. The variable abc perfectly fits this definition.
Also, abc is of type pointer to character. You are passing the address of abc, ie getStr will be receiving the address of a pointer to a character--so getStr's only argument should be of type pointer to pointer to character. The types do not match up.
EDIT: Also, getStr is called before it's declared. Your compiler may allow this, but it's bad practice for many reasons. You should declare it or define it before it is used. If you are using gcc as your compiler, always use
gcc -ansi -Wall -pedantic
With these three flags, gcc compiles the code as ANSI C (C89), enables most common warnings, and reports code that strays from the standard; for the issues above it would either warn or refuse to compile.
Your getStr function takes a char *, but that's not what you're passing it - you're passing the address of the pointer, instead of passing it the actual pointer.
Change:
new_str = getStr(&abc);
to:
new_str = getStr(abc);
and I think it'll work.
