Perform special kind of initialization of string in C - c

As you know in C, we can initialize string variables like this:
char text[1024] =
"Hello "
"World";
But what if I have a function that returns the word "World"?
char text[1024] =
"Hello "
World();
It seems to me that's not possible in C.
Please confirm.

What you want is not possible.
The left operand of the assignment operator must be a modifiable lvalue, which an array isn't.
From the C11-Standard:
6.5.16/2
An assignment operator shall have a modifiable lvalue as its left operand.
The only exception to this is during initialisation when using literals as R-value:
char text[1024] = "Hello ""World";
From the C11-Standard:
6.7.9/14
An array of character type may be initialized by a character string literal or UTF-8 string
literal, optionally enclosed in braces. Successive bytes of the string literal (including the
terminating null character if there is room or if the array is of unknown size) initialize the
elements of the array.

If World() is something that always returns "World", then define it as a macro:
#define World "World"
And then do:
char text[1024] =
"Hello "
World; //Without parentheses
EDIT
The string concatenation you are relying on is done at compile time, and only for adjacent string literals. What you are actually looking for is a runtime concatenation of two strings, which can be performed in several ways. The simplest uses the strcat function, but the initialization then has to be performed explicitly by a function:
char text[1024];
void init_text() {
strcpy(text, "Hello ");
strcat(text, World()); //World() defined somewhere else
}
Alternative using sprintf :
void init_text() {
sprintf(text, "Hello %s", World());
}
Then in the main function, call init_text() at the beginning:
int main() {
init_text();
...
}
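Putting the pieces above together, a minimal self-contained sketch might look like the following. World() here is only a hypothetical stand-in for the function from the question (its real definition is not shown), and snprintf is used rather than sprintf so that an overlong result is truncated instead of overflowing the buffer.

```c
#include <stdio.h>

char text[1024];

/* Hypothetical stand-in: the question does not show how World() is defined. */
const char *World(void)
{
    return "World";
}

/* Builds the string at run time; returns 0 on success, -1 if it did not fit. */
int init_text(void)
{
    int n = snprintf(text, sizeof text, "Hello %s", World());
    return (n >= 0 && (size_t)n < sizeof text) ? 0 : -1;
}
```

Calling init_text() once at the start of main is then enough; checking its return value catches the case where the combined string would not fit in text.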

It is not possible in standard C to initialize something with some runtime specific behavior. So the standard portable way is to initialize the data by calling a function at the beginning of main, as answered by Claudix.
However, if you are using a recent GCC (or Clang/LLVM) you could instead, on some systems (including Linux and probably other POSIX systems), use the constructor function attribute. So you would declare:
static void init_text(void) __attribute__((constructor));
and define init_text like in Claudix's answer without having to call it in your main : since it has the constructor attribute, it would be called magically before main, or during dlopen(3) if it appears inside a dynamically linked plugin or library.
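As a rough sketch of the constructor approach, assuming GCC or Clang on a platform that supports the attribute (it is a compiler extension, not standard C), the declaration and definition could be combined like this. World() is again a hypothetical placeholder for the function from the question.

```c
#include <stdio.h>

char text[1024];

/* Hypothetical stand-in for the World() function from the question. */
static const char *World(void)
{
    return "World";
}

/* GCC/Clang extension: ask for init_text to be run before main() starts. */
static void init_text(void) __attribute__((constructor));

static void init_text(void)
{
    snprintf(text, sizeof text, "Hello %s", World());
}
```

With this in place, main can read text without calling init_text itself. The order in which several constructors run is a separate concern that this sketch does not address.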
A more portable trick might be to have a function returning that text which will initialize it during its first call. So instead of using text you would call get_my_text() everywhere (perhaps by putting #define text get_my_text() in a header, but I don't recommend doing so for readability reasons, so replace every occurrence of text by get_my_text() ...), and define it as:
const char*get_my_text() {
static char textbuf[1024];
if (textbuf[0]) {
// already initialized, so return it
return textbuf;
}
snprintf(textbuf, sizeof(textbuf), "Hello %s", World());
return textbuf;
}
Beware that this trick is not reliable in multi-threaded programs: two threads might run get_my_text at exactly the same time, which is a data race. In a multi-threaded application, use e.g. pthread_once.
You could even define get_my_text as a static inline function in your header file.
PS Always prefer snprintf(3) to sprintf to avoid buffer overflows. Also notice that in standard C++ any static or global data with some given constructor is initialized before main ... hence the name of GCC function attribute...
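For the multi-threaded case mentioned above, one possible shape of get_my_text built on pthread_once is sketched below. It assumes a POSIX system (some toolchains need -pthread when compiling and linking), and World() and the helper name init_textbuf are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>

static char textbuf[1024];
static pthread_once_t textbuf_once = PTHREAD_ONCE_INIT;

/* Hypothetical stand-in for the World() function from the question. */
static const char *World(void)
{
    return "World";
}

/* Runs exactly once, no matter how many threads call get_my_text(). */
static void init_textbuf(void)
{
    snprintf(textbuf, sizeof textbuf, "Hello %s", World());
}

const char *get_my_text(void)
{
    /* pthread_once guarantees init_textbuf has completed before returning. */
    pthread_once(&textbuf_once, init_textbuf);
    return textbuf;
}
```

Unlike the textbuf[0] check, this does not rely on the buffer contents to decide whether initialization has happened, so it also behaves correctly if the resulting string happens to be empty.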

Related

How to use strset() in linux using c language

I can’t use strset function in C. I'm using Linux, and I already imported string.h but it still does not work. I think Windows and Linux have different keywords, but I can’t find a fix online; they’re all using Windows.
This is my code:
char hey[100];
strset(hey,'\0');
ERROR:
warning: implicit declaration of function 'strset'; did you mean 'strsep'? [-Wimplicit-function-declaration]
strset(hey, '\0');
^~~~~~
strsep
First of all strset (or rather _strset) is a Windows-specific function, it doesn't exist in any other system. By reading its documentation it should be easy to implement though.
But you also have a secondary problem, because you pass an uninitialized array to the function, which expects a pointer to the first character of a null-terminated string. This could lead to undefined behavior.
The solution to both problems is to initialize the array directly instead:
char hey[100] = { 0 }; // Initialize all of the array to zero
If your goal is to "reset" an existing null-terminated string to all zeroes then use the memset function:
char hey[100];
// ...
// Code that initializes hey, so it becomes a null-terminated string
// ...
memset(hey, 0, sizeof hey); // Set all of the array to zero
Alternatively if you want to emulate the behavior of _strset specifically:
memset(hey, 0, strlen(hey)); // Set all of the string (but not including
// the null-terminator) to zero
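If the Windows function is needed under its general form (filling with any character, not just zero), a small portable replacement can be sketched as follows. The name my_strset is hypothetical; the behavior modeled is the documented one of _strset: overwrite every character before the terminator and return the string.

```c
#include <string.h>

/* Hypothetical portable stand-in for the Windows-only _strset:
 * sets every character of s (excluding the terminating '\0') to c. */
char *my_strset(char *s, int c)
{
    memset(s, c, strlen(s));
    return s;
}
```

As with the original, the argument must already be a null-terminated string, so this does not solve the uninitialized-array problem described above.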
strset is not a standard C function. You can use the standard function memset. It has the following declaration
void *memset(void *s, int c, size_t n);
For example
memset( hey, '\0', sizeof( hey ) );

Why are strings in C declared with 'const'?

For example, why not:
char *s= "example";
instead of:
const char *s= "example";
I understand that const makes it unchangeable, but why do I receive an error when compiling the first?
Additionally, how does the concept apply to
int * x;
vs
const int *x;
I see the second used a lot more; is it good practice to use "const int *"?
There's no requirement to use const, but it's a good idea.
In C, a string literal is an expression of type char[N], where N is the length of the string plus 1 (for the terminating '\0' null character). But attempting to modify the array that corresponds to the string literal has undefined behavior. Many compilers arrange for that array to be stored in read-only memory (not physical ROM, but memory that's marked read-only by the operating system). (An array expression is, in most contexts, converted to a pointer expression referring to the initial element of the array object.)
It would have made more sense to make string literals const, but the const keyword did not exist in old versions of C, and it would have broken existing code. (C++ did make string literals const).
This:
char *s= "example"; /* not recommended */
is actually perfectly valid in C, but it's potentially dangerous. If, after this declaration, you do:
s[0] = 'E';
then you're attempting to modify the string literal, and the behavior is undefined.
This:
const char *s= "example"; /* recommended */
is also valid; the char* value that results from evaluating the string literal is safely and quietly converted to const char*. And it's generally better than the first version because it lets the compiler warn you if you attempt to modify the string literal (it's better to catch errors at compile time than at run time).
If you get an error on your first example, then it's likely that you're inadvertently compiling your code as C++ rather than as C -- or that you're using gcc's -Wwrite-strings option or something similar. (-Wwrite-strings makes string literals const; it can improve safety, but it can also cause gcc to reject, or at least warn about, valid C code.)
With Visual Studio 2015 at warning level 4, this compiles and runs whether compiled as C or C++:
#include <stdio.h>
char *s1= "example\n";
const char *s2= "example\n";
int main(int argc, char **argv)
{
printf(s1); // prints "example"
s1[2] = 'x';
printf(s1); // prints "exxmple"
printf(s2);
return 0;
}
If I add this line, it will fail to compile as C or C++ with every compiler I know of:
s2[2] = 'x'; // produces compile error
This is the error the const keyword is designed to avoid. It simply tells the compiler not to allow assignments to the object pointed to.
It doesn't matter if your pointer points to char or int or anything else. The const keyword has the same effect on all pointers, and that's to make it impossible (well, very hard) to assign to the thing declared const.
A string literal used as a value compiles to an array of char that should not be modified. Attempting to modify it invokes undefined behavior. For historical reasons of backward compatibility, its type is char [] although it really should be const char []. You can enable extra compiler warnings to change this and instruct the compiler to consider such strings to be const.
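To make the distinction concrete, here is a small sketch contrasting a pointer to a string literal with an array initialized from one. The variable and function names are made up for illustration.

```c
/* Pointer to the string literal itself: treat the characters as read-only. */
const char *literal = "example";

/* Array initialized from the literal: a separate, writable copy. */
char editable[] = "example";

/* Modifying the array copy is well defined; the literal is left untouched. */
const char *capitalize_copy(void)
{
    editable[0] = 'E';
    return editable;
}
```

Writing through literal would be rejected by the compiler thanks to const, whereas the same write through a plain char * would compile and then have undefined behavior.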

C Aligning string literals for a specific use case

I'm trying to align string literals in a specific way, as how I'm using it in my code is fairly specific. I don't want to have to assign it to a variable, for instance many of my functions are using it as a direct argument. And I want it to work both in local scope or global scope.
Usage example:
char *str = ALIGNED_STRING("blah"); //what I want
foo(ALIGNED_STRING("blah")); //what I want
_Alignas(16) char str[] = "blah"; //not what I want (but would correctly align the string)
The ideal solution would be (_Alignas(16) char[]){ "blah" } or, failing that, the GCC/Clang alignment extension (__attribute__((aligned(16))) char[]){ "blah" }, but neither works (they're ignored and the default alignment for the type is used).
So my next thought was to align it myself, and then my functions that use the string could then fix it up correctly. e.g. #define ALIGNED_STRING(str) (char*)(((uintptr_t)(char[]){ "xxxxxxxxxxxxxxx" str } + 16 - 1) & ~(16 - 1)) (where the string containing 'x' would represent data needed to understand where the real string can be found, that's easy but just for the example assume the 'x' is fine). Now that works fine in local scope, but fails in the global scope. Since the compiler complains about it not being a compile-time constant (error: initializer element is not a compile-time constant); I would've thought it would work but it seems only addition and subtraction are valid operations on the pointer at compile-time.
So I'm wondering if there's any way to achieve what I want to do? At the moment I'm just using the latter example (padding and manually aligning) and avoiding using it in the global scope (but I would really like to). And the best solution would avoid needing to make runtime adjustments (as using the alignment qualifier would), but that doesn't seem possible unless I apply it to a variable (but as mentioned that's not what I want to do).
I was able to get close to OP's need with a compound literal (a C99 feature; _Alignas itself requires C11).
#include <stdio.h>
#include <stddef.h>
void bar(const char *s) {
printf("%p %s\n", (void*)s, s);
}
// v-- compound literal --------------------------v
#define ALIGNED_STRING(S) (struct { _Alignas(16) char s[sizeof S]; }){ S }.s
int main() {
char s[] = "12";
bar(s);
char t[] = "34";
bar(t);
bar(ALIGNED_STRING("asdfas"));
char *u = ALIGNED_STRING("agsdas");
bar(u);
}
Output
0x28cc2d 12
0x28cc2a 34
0x28cc30 asdfas // 16 Aligned
0x28cc20 agsdas // 16 Aligned
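Rather than reading the alignment off printed addresses, it can also be checked in code. The sketch below reuses the ALIGNED_STRING macro from the answer and adds a hypothetical is_aligned helper; it assumes a C11 compiler and a platform where converting a pointer to uintptr_t yields its numeric address, which is implementation-defined but holds on common systems.

```c
#include <stdint.h>

/* Macro from the answer above: wraps the literal in an over-aligned struct. */
#define ALIGNED_STRING(S) (struct { _Alignas(16) char s[sizeof S]; }){ S }.s

/* Hypothetical helper: returns 1 if p is a multiple of `alignment` bytes. */
int is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Note that inside a function the compound literal has automatic storage duration, so the pointer it yields is only valid until the enclosing block ends.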

Know if const qualifier is used

Is there any way in C to find if a variable has the const qualifier? Or if it's stored in the .rodata section?
For example, if I have this function:
void foo(char* myString) {...}
different actions should be taken in these two different function calls:
char str[] = "abc";
foo(str);
foo("def");
In the first case I can modify the string; in the second one I can't.
Not in standard C, i.e. not portably.
myString is just a char* in foo, all other information is lost. Whatever you feed into the function is automatically converted to char*.
And C does not know about ".rodata".
Depending on your platform you could check the address in myString (if you know your address ranges).
You can't distinguish them using the language alone. In other words, this is not possible without resorting to features specific to the compiler you're using, which are unlikely to be portable. A few important remarks though:
In the first case you COULD modify the string, but you MUST NOT. If you want a mutable string, use initialization instead of assignment.
char *str1 = "abc"; // NOT OK, should be const char *
const char *str2 = "abc"; // OK, but not mutable
char str3[] = "abc"; // OK, using initialization, you can change its contents
#include<stdio.h>
void foo(char *mystr)
{
int a;
/*code goes here*/
#ifdef CHECK
int local_var;
printf(" strings address %p\n",(void *)mystr);
printf("local variables address %p \n",(void *)&local_var);
puts("");
puts("");
#endif
return;
}
int main()
{
char a[]="hello";
char *b="hello";
foo(a);
foo(b);
foo("hello");
}
On compiling with gcc -DCHECK prog_name.c and executing on my Linux machine, I get the following output:
strings address 0xbfdcacf6
local variables address 0xbfdcacc8
strings address 0x8048583
local variables address 0xbfdcacc8
strings address 0x8048583
local variables address 0xbfdcacc8
For the first case, where the string is defined and initialized in the "proper C way for mutable strings", the difference between the addresses is 0x2E (46 bytes).
In the second case, where the string is defined as char *b="hello", the difference between the addresses is 0xB7D82745. That's bigger than the size of my stack, so I am pretty sure the string is not on the stack. Hence the only place where you can find it is the .rodata section.
The third case is similar.
PS: As mentioned above, this isn't portable, but the original question hardly leaves any scope for portability by mentioning .rodata :)
GCC provides the __builtin_constant_p builtin function, which enables you to determine whether an expression is constant or not at compile-time:
Built-in Function: int __builtin_constant_p (exp)
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the `-O' option.
So I guess you should rewrite your foo function as a macro in such a case:
#define foo(x) \
(__builtin_constant_p(x) ? foo_on_const(x) : foo_on_var(x))
With that macro, foo("abc") would end up calling foo_on_const("abc") and foo(str) would end up calling foo_on_var(str).
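A compilable sketch of that dispatch is shown below. It assumes GCC or a compatible compiler; foo_on_const and foo_on_var are hypothetical handlers that only report which branch was taken, since the answer names them without defining them. Which branch a given argument takes depends on the compiler and optimization level, so the handlers here deliberately avoid writing to the string.

```c
/* Hypothetical handlers; the answer above only names them. */
const char *foo_on_const(const char *s) { (void)s; return "const"; }
const char *foo_on_var(char *s)         { (void)s; return "var"; }

/* Dispatch on whether the compiler can prove the argument is constant. */
#define foo(x) \
    (__builtin_constant_p(x) ? foo_on_const(x) : foo_on_var(x))
```

Because a result of 0 from __builtin_constant_p does not prove the argument is writable, a real foo_on_var should still not assume it may modify what it is given.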

Defining const pointer to a const string

I read Ulrich Drepper's blog and came across two entries that look conflicting.
In the first one (string in global space) Ulrich states that the string should be defined as:
const char _pcre_ucp_names[] = "blabla";
while in the second one (string in function) he argues it should be declared as:
static const char _pcre_ucp_names[] = "blabla";
Can you explain what the better way to declare a string is?
UPD:
First of all, I removed the C++ tag: this question is valid for C as well as for C++. So I don't think answers that explain what static means in class/function/file scope are relevant.
Read the articles before answering. The articles deal with memory usage: where the actual data is stored (in the .rodata or the .data section), whether the string needs to be relocated (if we're talking about Unix/Linux shared objects), and whether it is possible to change the string or not.
UPD2
In the first one it's said that, for a global variable, the following form:
(1) const char *a = "...";
is less good than
(2) const char a[] = "..."
Why? I always thought that (1) is better, since (2) actually replicates the string we assign to it, while (1) only points to the string we assign.
It depends—if you need the string to be visible to other source files in a project, you can't declare it static. If you only need to access it from the file where it's defined, then you probably want to use static.
The blog post you mention was talking about something different, though:
#include <stdio.h>
#include <string.h>
int main(void)
{
const char s[] = "hello"; /* Notice this variable is inside a function */
strcpy (s, "bye");
puts (s);
return 0;
}
In that case, static means something different: this creates a variable that persists across multiple calls to the same function. His other example showed a global variable, outside of a function.
EDIT:
To clarify, since you edited your question, the reason you don't want to use const char *a = "string" is that you create an extra writable pointer. This means that, while you can't change the characters of the string, you can still make the pointer point to an entirely different string. See below:
#include <stdio.h>
const char *hello = "hello";
int main( int argc , char const *argv[] )
{
hello = "goodbye";
puts(hello);
return 0;
}
That example compiles and runs. If hello is supposed to be constant, this is surely not what you want. Of course, you can also get around this by writing this:
const char * const hello = "hello";
You still have two variables where you only needed one though -- hello is a pointer to a string constant, where if it's an array there isn't that extra pointer in the way.
Declaring it static means (if at global, file level) that it won't be visible outside this translation unit, or (if inside a scope) that it will retain its value between executions of the scope. It has nothing to do with the "constness" of the data.
While this is indeed a const string, it's neither a pointer nor a const pointer nor is the second one a declaration.
Both define (and initialize) a constant array of characters.
The only difference is that the first one will be visible and accessible from other translation units (proper declarations assumed), while the second one won't.
Note that, in C++, instead of making variables and constants static, you could put them into an unnamed namespace. Then, too, they are inaccessible from other translation units.
On the
const char *abc = "..."; and <br/>
const char def[] = "..."
part of the question...
The only difference to my knowledge is that the array-style definition is not demoted to a pointer when using the sizeof operator.
sizeof(abc) == size of pointer type
sizeof(def) == size of string (including \0)
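The sizeof difference can be checked with a short sketch like this one; the accessor function names are invented for the example.

```c
#include <stddef.h>

const char *abc = "hello";   /* pointer to a string literal */
const char def[] = "hello";  /* array holding the characters themselves */

/* sizeof on the pointer form gives the size of a pointer. */
size_t size_of_pointer_form(void) { return sizeof abc; }

/* sizeof on the array form gives the string length plus the '\0'. */
size_t size_of_array_form(void) { return sizeof def; }
```

For "hello" the array form is 6 bytes, while the pointer form is whatever size a char pointer has on the platform.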
Is it for use at global (file) level, within a class, or within a function? The meaning of static differs.
For a file level: It depends on the scope you want (either global or limited to the file). No other difference.
For a class: It's best with static if you're not going to change it. Because a const can still be set in the constructor, space for a pointer has to be allocated inside each object if it's not static. If it is static, there is no need for a pointer in each object.
For a function: Doesn't really change anything important I think. In the non static case, a pointer will be allocated on the stack and initialized to point in .rodata at each function call. In the other case, it's more like a global variable but with limited scope.
