pointer to a string literal - rewrite/reassign - c

Is it fine to use char pointer(char *retStatus) in below way? Like assigning/rewriting values whenever required without allocating memory? I tested it and it is working fine but would like to know is this a good approach to assign error messages to char * and then copy/concat to other static or allocated memory pointer.
void fn(char *status, size_t maxLen)
{
char *retStatus = NULL;
...
...
if(failure)
{
retStatus = "error1";
if((strlen(retStatus) + strlen(status)) < maxLen)
{
strcat_s(status, maxLen, retStatus);
}
}
...
...
if(failure)
{
retStatus = "error2";
if((strlen(retStatus) + strlen(status)) < maxLen)
{
strcat_s(status, maxLen, retStatus);
}
}
}
int main()
{
char status[10] = { 0 };
size_t statusMaxLen = sizeof(status) / sizeof(status[0]);
fn(status, statusMaxLen);
return 0;
}

Is it fine to use char pointer(char *retStatus) in below way? Like
assigning/rewriting values whenever required without allocating
memory?
A string literal represents an array of char with static storage duration, with the (rather substantial) restriction that any attempt to modify the array contents produces undefined behavior. You may use string literals in any way that you may use any other char array, subject to the restriction that you do not attempt to modify them.
With that said, it is better form to avoid assigning string literals to variables of type char *, or passing them as function arguments corresponding to parameters of that type. Instead, limit yourself to pointers of type const char *, which convey the relevant restriction explicitly.
I tested it and it is working fine but would like to know is
this a good approach to assign error messages to char * and then
copy/concat to other static or allocated memory pointer.
The particular combination of assignment followed by non-mutating access is allowed, and will work reliably, but again, it would be better to use a variable of type const char * instead of a variable of type char *. Do note, however, that it is possible to get yourself into trouble that way if you're not careful. For example, sizeof("error1") is unlikely to be equal to (retStatus = "error1", sizeof(retStatus)).
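A minimal sketch of the points above, not the asker's actual code. The helper names `append_status`, `literal_size` and `pointer_size` are hypothetical. It uses `memcpy` rather than `strcat_s`, because `strcat_s` belongs to the optional Annex K and is not available everywhere. The messages are passed as `const char *`, and the last two functions show why `sizeof` on the pointer differs from `sizeof` on the literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append msg to status only if the result,
   terminator included, fits in maxLen bytes. Returns 1 on success. */
int append_status(char *status, size_t maxLen, const char *msg)
{
    assert(status != NULL && msg != NULL);

    size_t used = strlen(status);
    size_t need = strlen(msg);
    if (used + need >= maxLen) {
        return 0;                           /* would not fit; status unchanged */
    }
    memcpy(status + used, msg, need + 1);   /* copies the '\0' as well */
    return 1;
}

/* sizeof applied to the literal gives the array size: 6 characters plus '\0'. */
size_t literal_size(void)
{
    return sizeof("error1");
}

/* sizeof applied to the pointer gives the size of the pointer itself. */
size_t pointer_size(void)
{
    const char *retStatus = "error1";
    return sizeof(retStatus);
}
```

With a `char status[10]` buffer, the first call with "error1" succeeds and a second call with "error2" is rejected, because twelve characters plus the terminator do not fit in ten bytes.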

This is a valid and indeed a clever way of using pointers. The approach works fine for the given example.
The only possible issue with the approach, in the context of big and long running programs, is with the intended lifetime of the variable.
If memory is allocated explicitly using malloc, then it can also be released with free whenever the variable is no longer required. Explicit memory management of this kind can reduce the memory footprint of a long-running program.
In the current approach, the string literals have static storage duration, so the memory they occupy persists for the entire running time of the program.
If it is desirable to have the strings persist throughout the program runtime, then the approach shown is fine.
If conservation of memory is a crucial requirement, then using malloc and free is a recommended approach.

Related

Is it valid to use "restrict" when there is the potential for reallocating memory (changing the pointer)?

I am attempting some optimization of code, but it is hard to wrap my head around whether "restrict" is useful in this situation or if it will cause problems.
I have a function that is passed two strings (char*) as well as an int (int*).
The second string is copied into memory following the first string, at the position indicated by the int. If this would overrun the allocation of memory for the first string, it must reallocate memory for the first string before doing so. A new pointer is created with the new allocation, and then the original first string pointer is set equal to it.
char* concatFunc (char* restrict first_string, char* const restrict second_string, int* const restrict offset) {
size_t block = 200000;
size_t len = strlen(second_string);
char* result = first_string;
if(*offset+len+1>block){
result = realloc(result,2*block);
}
memcpy(result+*offset,second_string,len+1);
*offset+=len;
return result;
}
The above function is repeatedly called by other functions that are also using the restrict keyword.
char* addStatement(char* restrict string_being_built, ..., int* const restrict offset){
char new_statement[30] = "example additional text";
string_being_built = concatFunc(string_being_built,&new_statement,offset);
}
So in the concatFunc the first_string is restricted (meaning memory pointed to will not be changed from anywhere else). But then if I am reallocating a pointer that is a copy of that, is that going to cause undefined behavior or is the compiler smart enough to accommodate that?
Basically: What happens when you restrict a pointer parameter, but then change the pointer.
What happens when you restrict a pointer parameter, but then change the pointer.
It depends on how the pointer was changed - and in this case, memcpy() risks UB.
With char* result = first_string;, result inherits the restrict of char* restrict first_string.
After result = realloc(result,2*block);, one of two things is true: either result is the same as before, or result points to new memory. In both cases, accessing via result does not collide with accessing through second_string or offset.
Yet can the compiler know that the newly assigned result has one of those two properties of realloc()? After all, realloc() might be a user-defined function, and the compiler should not assume that result still has the restrict property.
Thus memcpy() is in peril.
is the compiler smart enough to accommodate that?
I do not see how it can, other than by warning about the memcpy() usage.
Of course OP can use memmove() instead of memcpy() to avoid the concern.
As I see it, a simplified example would be:
char* concatFunc (char* restrict first_string, char* restrict second_string) {
int block = rand();
first_string = foo(first_string, block);
// first_string at this point may equal second_string,
// breaking the memcpy() contract
memcpy(first_string, second_string, block);
return first_string;
}
Or even simpler
char* concatFunc (char* /* no restrict */ first_string, char* restrict second_string) {
return memcpy(first_string, second_string, 2);
}
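As a sketch of the memmove() suggestion, here is one way the append step could be written without any restrict promises. The `struct strbuf` type and the `strbuf_append` function are hypothetical and do not come from the question. Unlike the original, this version tracks the real capacity instead of assuming a fixed block size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; none of these names come from the question. */
struct strbuf {
    char  *data;   /* heap block, or NULL before the first append */
    size_t len;    /* bytes used, not counting the terminator */
    size_t cap;    /* bytes allocated */
};

/* Append src to buf, growing the block when needed.
   Returns 1 on success, 0 if realloc fails (buf is then left unchanged). */
int strbuf_append(struct strbuf *buf, const char *src)
{
    assert(buf != NULL && src != NULL);

    size_t n = strlen(src);
    if (buf->len + n + 1 > buf->cap) {
        size_t new_cap = buf->cap ? buf->cap : 16;
        while (new_cap < buf->len + n + 1) {
            new_cap *= 2;
        }
        char *p = realloc(buf->data, new_cap);
        if (p == NULL) {
            return 0;
        }
        buf->data = p;
        buf->cap  = new_cap;
    }
    /* memmove tolerates overlapping regions. src must still not point
       into a block that realloc has just moved. */
    memmove(buf->data + buf->len, src, n + 1);
    buf->len += n;
    return 1;
}
```

Keeping the pointer, length and capacity in one structure means the caller never holds a stale copy of the pointer after a reallocation.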

Is it possible to define a pointer without a temp/aux variable? (Or would this be bad C-coding?)

I'm trying to understand C-pointers. As background, I'm used to coding in both C# and Python3.
I understand that pointers can be used to save the address of a variable (writing something like type* ptr = &var;) and that incrementing a pointer is equivalent to incrementing the index into an array of objects of that type. But what I don't understand is whether or not you can use pointers and dereferenced objects of the type (e.g. int) without referencing an already-defined variable.
I couldn't think of a way to do this, and most of the examples of C/C++ pointers all seem to use them to reference a variable. So it might be that what I'm asking is either impossible and/or bad coding practice. If so, it would be helpful to understand why.
For example, to clarify my confusion, if there is no way to use pointers without using predefined hard-coded variables, why would you use pointers at all instead of the basic object directly, or arrays of objects?
There is a short piece of code below to describe my question formally.
Many thanks for any advice!
// Learning about pointers and C-coding techniques.
#include <stdio.h>
/* Is there a way to define the int-pointer age WITHOUT the int variable auxAge? */
int main() // no command-line params being passed
{
int auxAge = 12345;
int* age = &auxAge;
// *age is an int, and age is an int* (i.e. age is a pointer-to-an-int, just an address to somewhere in memory where data defining some int is expected)
// do stuff with my *age int e.g. "(*age)++;" or "*age = 37;"
return 0;
}
Yes, you can use dynamic memory (also known as "heap") allocation:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int * const integer = malloc(sizeof *integer);
    if (integer != NULL)
    {
        *integer = 4711;
        printf("forty seven eleven is %d\n", *integer);
        free(integer);
        // At this point we can no longer use the pointer, the memory is not ours any more.
    }
    return 0;
}
This asks the C library to allocate some memory and return a pointer to it. Allocating sizeof *integer bytes makes the allocation fit an int exactly, and we can then use *integer to dereference the pointer, which works pretty much exactly like using an int variable directly.
There are many good reasons to use pointers in C. One of them is that C only passes arguments by value; you cannot pass by reference. Passing a pointer to an existing variable therefore saves you the overhead of copying it to the stack. As an example, let's assume this very large structure:
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
struct very_large_structure {
    uint8_t kilobyte[1024];
};
And now assume a function which needs to use this structure:
bool has_zero(struct very_large_structure structure) {
    for (size_t i = 0; i < sizeof(structure); i++) {
        if (0 == structure.kilobyte[i]) {
            return true;
        }
    }
    return false;
}
So for this function to be called, the whole structure has to be copied to the stack, and that can be an unacceptable requirement, especially on embedded platforms where C is widely used.
If you pass the structure via a pointer, you copy only the pointer itself to the stack, typically a 32-bit or 64-bit number:
bool has_zero(const struct very_large_structure *structure) {
    for (size_t i = 0; i < sizeof(*structure); i++) {
        if (0 == structure->kilobyte[i]) {
            return true;
        }
    }
    return false;
}
This is by no means the only or the most important use of pointers, but it clearly shows why pointers are important in C.
But what I don't understand is whether or not you can use pointers and dereferenced objects of the type (e.g. int) without referencing an already-defined variable.
Yes, there are two cases where this is possible.
The first case occurs with dynamic memory allocation. You use the malloc, calloc, or realloc functions to allocate memory from a dynamic memory pool (the "heap"):
int *ptr = malloc( sizeof *ptr ); // allocate enough memory for a single `int` object
*ptr = some_value;
The second case occurs where you have a fixed, well-defined address for an I/O channel or port or something:
char *port = (char *) 0xDEADBEEF;
although this is more common in embedded systems than general applications programming.
EDIT
Regarding the second case, chapter and verse:
6.3.2.3 Pointers
...
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
Parameters to a function in C are always pass by value, so changing a parameter value in a function isn't reflected in the caller. You can however use pointers to emulate pass by reference. For example:
void clear(int *x)
{
*x = 0;
}
int main()
{
int a = 4;
printf("a=%d\n", a); // prints 4
clear(&a);
printf("a=%d\n", a); // prints 0
return 0;
}
You can also use pointers to point to dynamically allocated memory:
int *getarray(int size)
{
int *array = malloc(size * sizeof *array);
if (!array) {
perror("malloc failed");
exit(1);
}
return array;
}
These are just a few examples.
Most common reason: because you wish to modify the contents without passing them around.
Analogy:
If you want your living room painted, you don't want to place your house on a truck trailer, move it to the painter, let him do the job and then haul it back. It would be expensive and time consuming. And if your house is too wide to get hauled around on the streets, the truck might crash. You would rather tell the painter which address you live at, have him go there and do the job.
In C terms, if you have a big struct or similar, you'll want a function to access this struct without making a copy of it, passing the copy to the function, and then copying the modified contents back into the original variable.
// BAD CODE, DON'T DO THIS
typedef struct { ... } really_big;
really_big rb;
rb = do_stuff(rb);
...
really_big do_stuff (really_big thing) // pass by value, return by value
{
    thing.something = ...; // thing is a struct here, not a pointer
    ...
    return thing;
}
This makes a copy of rb called thing. It is placed on the stack, wasting lots of memory and needlessly increasing the stack space used, increasing the possibility of stack overflow. And copying the contents from rb to thing takes lots of execution time. Then when it is returned, you make yet another copy, from thing back to rb.
By passing a pointer to the struct, none of the copying takes place, but the end result is the very same:
void do_stuff (really_big* thing)
{
thing->something = ...;
}
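To make the comparison concrete, here is a small compilable sketch. The members of `really_big` are elided in the answer, so the `something` and `payload` fields below are hypothetical stand-ins, as are the value 42 and the name `do_stuff_by_value`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the original elides the members of really_big. */
typedef struct {
    int  something;
    char payload[4096];
} really_big;

/* Modifies the caller's object in place; only an address is passed. */
void do_stuff(really_big *thing)
{
    assert(thing != NULL);
    thing->something = 42;      /* arbitrary value for the sketch */
}

/* By-value variant for comparison: works on a copy and returns a copy. */
really_big do_stuff_by_value(really_big thing)
{
    thing.something = 42;
    return thing;
}
```

Calling `do_stuff(&rb)` changes `rb` itself, whereas `do_stuff_by_value(rb)` leaves `rb` untouched unless the returned copy is assigned back to it.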

Function that returns an array of strings C

Is there a way to return an array of strings from a function without using dynamic memory allocation? The function goes something like this:
char** modify(char original[1000][1000]){
char result[1000][1000];
// some operations are applied to the original
// the original is copied to the result
return result;
}
In C, an object has one of four storage durations (also called lifetimes): static, thread, automatic, and allocated (C 2018 6.2.4 1).
Objects with automatic duration are automatically created inside a function and cease to exist when execution of the function ends, so you cannot use an object of this kind, created inside your function, to return a value.
Objects with allocated storage duration persist until freed, but you have asked to exclude those.
Thread storage duration is either likely not applicable to your situation or is effectively equivalent to static storage duration, which I will discuss below.
This means your options are:
Let the caller pass you an object in which to return data. That object may have any storage duration—your function does not need to know since it will neither allocate nor release it. If you do this, the caller must provide an object large enough to return the data. If this size is not known in advance, you can either provide a separate function to calculate it (which the caller will then use to allocate the necessary space) or incorporate that into your function as a special mode in which it provides the size required without providing the data yet.
Use an object with static storage duration. Since this object is created when the program starts, you cannot adjust the size within your function. You must build a size limit into the program. A considerable problem with this approach is the function has only one object to return, so only one can be in use at a time. This means that, once the function is called, it should not be called again until the caller has finished using the data in the object. This is both a severe limitation in program design and an opportunity for bugs, so it is rarely used.
Thus, a typical solution looks like this:
size_t HowMuchSpaceIsNeeded(char original[1000][1000])
{
… Calculate size.
return SizeNeeded;
}
void modify(char destination[][1000], char original[1000][1000])
{
… Put results in destination.
}
A variation for safety is:
void modify(char destination[][1000], size_t size, char original[1000][1000])
{
if (size < amount needed)
… Report error (possibly by return value, or program abort).
… Put results in destination.
}
Then the caller does something like:
size_t size = HowMuchSpaceIsNeeded(original);
char (*results)[1000] = malloc(size);
if (!results)
… Report error.
modify(results, size, original);
… Work with results.
free(results);
As Davistor notes, a function can return an array embedded in a structure. In terms of C semantics, this avoids the object lifetime problem by returning a value, not an object. (The entire contents of the structure is the value of the structure.) In terms of actual hardware implementation, it is largely equivalent to the caller-passes-an-object method above. (The reasoning here is based on the logic of how computers work, not on the C specification: In order for a function to return a value that requires a lot of space to represent, the caller must provide the required space to the called function.) Generally, the caller will allocate space on the stack and provide that to the called function. This may be faster than a malloc, but it may also use a considerable amount of stack space. Usually, we avoid using sizable amounts of stack space, to avoid overflowing the stack.
Although you cannot return an array type in C, you can return a struct containing one:
#include <string.h>
#define NSTRINGS 100
#define STR_LEN 100
typedef struct stringtable {
char table[NSTRINGS][STR_LEN];
} stringtable;
stringtable modify ( const stringtable* const input )
{
stringtable result;
memcpy( &result, input, sizeof(result) );
return result;
}
I would generally recommend that you use Eric Postpischil’s solution, however. One way this might not be efficient is if you need to write to a specific variable or location. In that case, you could pass in its address, but here, you would need to create a large temporary array and copy it.
You cannot return a pointer to memory allocated inside a function without dynamic allocation. In your case, you will allocate result[1000][1000] on the stack in a zone which will be deallocated once the function returns. Besides dynamic allocation, you have the option of passing a buffer as an argument to your function:
void modify(char original[1000][1000], char result[][1000]) { ... }
Now the result matrix has to be allocated outside the modify function and its lifetime will not depend on the function's lifetime. Basically you pass the function an already allocated matrix where the result will be written.
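A minimal sketch of this caller-provides-the-buffer pattern. The small `ROWS` and `COLS` dimensions and the upper-casing operation are assumptions made for illustration; the question does not say what modify actually does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ROWS 4      /* small hypothetical dimensions for the sketch */
#define COLS 16

/* Writes an upper-cased copy of each string in original into result.
   Both arrays are owned by the caller, so nothing local is returned. */
void modify(char original[][COLS], char result[][COLS], size_t rows)
{
    assert(original != NULL && result != NULL);

    for (size_t r = 0; r < rows; r++) {
        size_t c = 0;
        for (; c + 1 < COLS && original[r][c] != '\0'; c++) {
            result[r][c] = (char)toupper((unsigned char)original[r][c]);
        }
        result[r][c] = '\0';    /* always terminate the output row */
    }
}
```

The caller declares both `char original[ROWS][COLS]` and `char result[ROWS][COLS]` with whatever storage duration suits it and calls `modify(original, result, ROWS)`.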
You can't return pointers to local variables, because the lifetime of the memory they point to ends when the function returns.
Basically, result decays to a pointer to the first element of the stack-allocated array, so returning it and dereferencing it later results in undefined behavior.
To bypass this issue, there are a few workarounds.
One of them, which I have seen in a couple of projects but don't recommend because it is unsafe, is this:
char (*modify(char original[1000][1000]))[1000]{
    // `result` is a static array whose lifetime is equal to the lifetime of the program.
    // Calling modify more than once will overwrite `result`.
    static char result[1000][1000];
    return result;
}
Another approach is to receive the result pointer as a function argument, so that the caller allocates the storage for it.
void modify(char original[1000][1000], char (*result)[1000]){
result[0][1] = 42;
//...
}
int main(void) {
char result[1000][1000];
modify(someOriginal, result);
}
Anyway, I recommend that you read a decent book about the C language and how computer memory works.
You can use a linked list that starts with the first string and ends with the last string.

Find the size of reserved memory for a character array in C

I'm trying to learn C and, as a start, I set out to write a strcpy for my own practice. As we know, the original strcpy easily allows for security problems, so I gave myself the task of writing a "safe" strcpy.
The path I've chosen is to check whether the source string (character array) actually fits in the destination memory. As I've understood it, a string in C is nothing more than a pointer to a character array, 0x00 terminated.
So my challenge is how to find how much memory the compiler actually reserved for the destination string?
I tried:
sizeof(dest)
but that doesn't work, since it will return (as I later found out) the size of dest which is actually a pointer and on my 64 bit machine, will always return 8.
I also tried:
strlen(dest)
but that doesn't work either because it will just return the length until the first 0x0 is encountered, which doesn't necessarily reflect the actual memory reserved.
So this all sums up to the following question: How to find out how much memory the compiler reserved for my destination "string"?
Example:
char s[80] = "";
int i = someFunction(s); // should return 80
What is "someFunction"?
Thanks in advance!
Once you pass a char pointer to the function you are writing, you lose the knowledge of how much memory is allocated to s. You will need to pass this size as an argument to the function.
You can use sizeof to check at compile time:
char s[80] = "";
int i = sizeof s ; // should return 80
Note that this fails if s is a pointer:
char *s = "";
int j = sizeof s; /* probably 4 or 8. */
Arrays are not pointers. To keep track of the size allocated for a pointer, the program simply must keep track of it. Also, you cannot pass an array to a function. When you use an array as an argument to a function, the compiler converts that to a pointer to the first element, so if you want the size to be available to the called function, it must be passed as a parameter. For example:
char s[ SIZ ] = "";
foo( s, sizeof s );
So this all sums up to the following question: How to find out how much memory the compiler reserved for my destination "string"?
There is no portable way to find out how much memory is allocated. You have to keep track of it yourself.
The implementation must keep track of how much memory was malloced to a pointer, and it may make something available for you to find out. For example, glibc's malloc.h exposes
size_t malloc_usable_size (void *__ptr)
that gives you access to roughly that information. However, it doesn't tell you how much you requested, only how much is usable. Of course, that only works with pointers you obtained from malloc (and friends). For an array, you can only use sizeof where the array itself is in scope.
char s[80] = "";
int i = someFunction(s); // should return 80
In an expression s is a pointer to the first element of the array s. You cannot deduce the size of an array object with the only information of the value of a pointer to its first element. The only thing you can do is to store the information of the size of the array after you declare the array (here sizeof s) and then pass this information to the functions that need it.
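Putting that advice together, a bounded copy could look like the sketch below. The name `safe_strcpy`, its return convention, and the choice to empty the destination on failure are all assumptions for illustration, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy: returns 1 if src (with its terminator) fits
   into dest, 0 if it does not; on failure dest is set to the empty string. */
int safe_strcpy(char *dest, size_t dest_size, const char *src)
{
    assert(dest != NULL && src != NULL);

    size_t need = strlen(src) + 1;      /* characters plus '\0' */
    if (need > dest_size) {
        if (dest_size > 0) {
            dest[0] = '\0';
        }
        return 0;
    }
    memcpy(dest, src, need);
    return 1;
}
```

The caller passes the size from the place where the array is still in scope, for example `safe_strcpy(s, sizeof s, "hello")` with `char s[80]`.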
There's no portable way to do it. However, the implementation certainly needs to know this information internally. Unix-based OSes, like Linux and OS X, provide functions for this task:
// OS X
#include <malloc/malloc.h>
size_t allocated = malloc_size(somePtr);
// Linux
#include <malloc.h>
size_t allocated = malloc_usable_size(somePtr);
// Maybe Windows...
size_t allocated = _msize(somePtr);
A way to tag the memory returned by malloc is to always malloc an extra sizeof(size_t) bytes. Store the requested size (the malloced size minus sizeof(size_t)) at the start of the block, and hand out the address malloc returned plus sizeof(size_t). With the length stored just before the pointer, you have the basis for your new set of functions.
When you pass two of these sorts of pointers into your new-special strcpy, you can subtract sizeof(size_t) off the pointers, and access the sizes directly. That lets you decide if the memory can be copied safely.
If you are doing strcat, then the two sizes, along with calculating the strlens means you can do the same sort of check to see if the results of the strcat will overflow the memory.
It's doable.
It's probably more trouble than it's worth.
Consider what happens if you pass in a character pointer that was not allocated with malloc.
The assumption is that the size is before the pointer. That assumption is false.
Attempting to access the size in that case is undefined behavior. If you are lucky, you may get a signal.
One other implication of that sort of implementation is that when you go to free the memory, you have to pass in exactly-the-pointer-that-malloc-returned. If you don't get that right, heap corruption is possible.
Long story short...
Don't do it that way.
For situations where you are using character buffers in your program, you can do some smoke and mirrors to get the effect that you want. Something like this.
char input[] = "test";
char output[3];
if (sizeof(output) >= sizeof(input))
{
    memcpy(output,input,sizeof(input)); // sizeof(input) already counts the '\0'
}
else
{
    printf("Overflow detected value <%s>\n",input);
}
One can improve the error message by wrapping the code in a macro.
#define STRCPYX(output,input) \
    do { \
        if (sizeof(output) >= sizeof(input)) \
        { \
            memcpy(output,input,sizeof(input)); \
        } \
        else \
        { \
            printf("STRCPYX would overflow %s with value <%s> from %s\n", \
                   #output, input, #input); \
        } \
    } while (0)
char input[] = "test";
char output[3];
STRCPYX(output,input);
While this does give you what you want, the same sort of risks apply.
char *input = "testing 123 testing";
char output[9];
STRCPYX(output,input);
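Here input is a pointer, so sizeof(input) is 8 (on a typical 64-bit system) while sizeof(output) is 9. The size check is satisfied, only 8 bytes are copied, and output ends up as "testing " with no terminating null character.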
C was not designed to protect the programmer from doing things incorrectly.
It is kind of like you are attempting to paddle upriver :)
It is a good exercise to think about.
Although arrays and pointers can appear to be interchangeable, they differ in one important aspect: an array has a size. However, because an array "decays" to a pointer when passed to a function, the size information is lost.
The point is that at some point you know the size of the object - because you allocated it or declared it to be a certain size. The C language makes it your responsibility to retain and disseminate that information as necessary. So after your example:
char s[80] = ""; // sizeof(s) here is 80, because an array has size
int i = someFunction(s, sizeof(s)) ; // You have to tell the function how big the array is.
There is no "magic" method of determining the size of the array within someFunction(), because that information is discarded (for reasons of performance and efficiency - C is relatively low level in this respect, and does not add code or data that is not explicit); if the information is needed, you must explicitly pass it.
One way in which you can pass a string and retain size information, and even pass the string by copy rather than by reference is to wrap the string in a struct thus:
typedef struct
{
char s[80] ;
} charArray_t ;
then
charArray_t s ;
int i = someFunction( &s ) ;
with a definition of someFunction() like:
int someFunction( charArray_t* s )
{
return sizeof( s->s ) ;
}
You don't really gain much by doing that, however; you just avoid the additional parameter. In fact you lose some flexibility, because someFunction() now only takes a fixed array length defined by charArray_t, rather than any array. Sometimes such restrictions are useful. One feature of this approach is that you can pass by copy like this:
int i = someFunction( s ) ;
then
int someFunction( charArray_t s )
{
return sizeof( s.s ) ;
}
since structures, unlike arrays, can be passed this way. You can equally return by copy as well. It can be somewhat inefficient, but sometimes the convenience and safety outweigh the inefficiency.

C char* pointers pointing to same location where they definitely shouldn't

I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck with something weird. On one part of my code, I initialize a char array in a function, and it is by default pointing to the same location with one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
Ok, at the part I've marked with [*], there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval is pointing to the same address as my argument input. It's not even possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems so obvious to me that they shouldn't point to the same location (and a valid one, of course; they aren't NULL). When the code goes on, it literally messes up everything; I get random characters and shapes in the console and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-const size are a C99 feature and are not supported by C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify that the caller should call free() on it when done. These solutions all suck because they are prone to leaks, truncation, or overflow. :/
Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
Try changing retval to a character pointer and allocating your buffer using malloc().
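A minimal sketch of that suggestion, assuming the elided part of the original function simply returns the concatenated string. The ownership rule (the caller frees the result) is a design choice made here, not something stated in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding input followed by dir,
   or NULL if allocation fails. The caller must free() the result. */
char *subdir(const char *input, const char *dir)
{
    assert(input != NULL && dir != NULL);

    size_t in_len  = strlen(input);
    size_t dir_len = strlen(dir);
    char *retval = malloc(in_len + dir_len + 1);   /* one terminator only */
    if (retval == NULL) {
        return NULL;
    }
    memcpy(retval, input, in_len);
    memcpy(retval + in_len, dir, dir_len + 1);     /* includes the '\0' */
    return retval;
}
```

Because the buffer comes from the heap, it stays valid after subdir returns, which is exactly what the variable-length array could not provide.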
Pass the two string arguments as char * or const char *.
Rather than returning char *, you should just pass another parameter with a string pointer that you already malloc'd space for.
Return bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly don't forget to free the memory since you're having to malloc space for the string on the heap...
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//retstr is not a const like the other two
bool subdir(const char *input, const char *dir,char *retstr){
    strcpy(retstr, input);
    strcat(retstr, dir);
    return true;
}
int main(void)
{
    char h[]="Hello ";
    char w[]="World!";
    char *greet=malloc(strlen(h)+strlen(w)+1); //Size of the result plus room for the terminator!
    if (greet == NULL)
        return 1;
    subdir(h,w,greet);
    printf("%s",greet);
    free(greet); //Release the buffer once we are done with it
    return 0;
}
This will print: "Hello World!" added together by your function.
Also, when you create a string on the fly that must outlive the function, you must malloc it. A variable-length array such as char greet[totallen]; does compile in C99, but it lives on the stack of the function that declares it, so it cannot be returned to the caller.

Resources