C Programming: Find Length of a Char* with Null Bytes - c

If I have a character pointer that contains NULL bytes is there any built in function I can use to find the length or will I just have to write my own function? Btw I'm using gcc.
EDIT:
Should have mentioned the character pointer was created using malloc().

If you have a pointer then the ONLY way to know the size is to store the size separately or have a unique value which terminates the string. (typically '\0') If you have neither of these, it simply cannot be done.
EDIT: since you have specified that you allocated the buffer using malloc then the answer is the paragraph above. You need to either remember how much you allocated with malloc or simply have a terminating value.
If you happen to have an array (like: char s[] = "hello\0world";) then you could resort to sizeof(s). But be very careful, the moment you try it with a pointer, you will get the size of the pointer, not the size of an array. (but strlen(s) would equal 5 since it counts up to the first '\0').
In addition, arrays decay to pointers when passed to functions. So if you pass the array to a function, you are back to square one.
NOTE:
void f(int *p) {}
and
void f(int p[]) {}
and
void f(int p[10]) {}
are all the same. In all 3 versions, p is a pointer, not an array.
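The difference between sizeof and strlen, and the effect of decay, can be sketched as follows. The function names are made up for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof applied to the array itself: 12 bytes, i.e. "hello", '\0',
   "world", and the implicit trailing '\0'. */
size_t size_of_array(void)
{
    char s[] = "hello\0world";
    return sizeof s;
}

/* strlen stops at the first '\0', so it reports 5 for the same data. */
size_t length_by_strlen(void)
{
    char s[] = "hello\0world";
    return strlen(s);
}

/* Inside a callee the argument is only a pointer, so sizeof yields the
   pointer size no matter how large the caller's array was. */
size_t size_seen_by_callee(const char *p)
{
    assert(p != NULL);
    return sizeof p;
}
```

Passing a 12-byte array to size_seen_by_callee gives sizeof(char *) (commonly 8 on 64-bit targets), which is why the size has to travel alongside the pointer.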

How do you know where the string ends, if it contains NULL bytes as part of it? Certainly no built in function can work with strings like that. It'll interpret the first null byte as the end of the string.
If you want the length, you'll have to store it yourself. Keep in mind that no standard library string functions will work correctly on strings like these.

You'll need to keep track of the length yourself.
C strings are null terminated, meaning that the first null character signals the end of the string. All builtin string functions rely on this, so if you have a buffer that can contain NULLs as part of the data then you can't use them.
Since you're using malloc then you may need to keep track of two sizes: the size of your allocated buffer, and how many characters within that buffer constitute valid data.
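One way to keep track of both sizes is to bundle them with the pointer. This is only a minimal sketch; the struct and function names (byte_buf, byte_buf_init, byte_buf_append, byte_buf_free) are hypothetical, not part of any standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a malloc'd buffer plus the two sizes mentioned above. */
struct byte_buf {
    unsigned char *data; /* storage obtained from malloc() */
    size_t cap;          /* number of bytes allocated */
    size_t len;          /* number of valid bytes; may include '\0' bytes */
};

/* Allocate cap bytes. Returns 0 on success, -1 if malloc fails. */
int byte_buf_init(struct byte_buf *b, size_t cap)
{
    assert(b != NULL);
    b->data = malloc(cap ? cap : 1); /* avoid a zero-size request */
    if (b->data == NULL)
        return -1;
    b->cap = cap;
    b->len = 0;
    return 0;
}

/* Append n raw bytes. Returns 0 on success, -1 if they would not fit. */
int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    assert(b != NULL && b->data != NULL);
    if (n > b->cap - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Release the storage and reset the bookkeeping. */
void byte_buf_free(struct byte_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->cap = b->len = 0;
}
```

After appending the 11 bytes of "hello\0world", len is 11 even though strlen on the same data would stop at 5.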

Related

Modifying string in function C

Let's say I want to modify char array using function.
I am always seeing people using malloc, calloc, or pointers to modify int, char, or 2D arrays.
Am I right in saying that a string can be returned from a function only if I use malloc, create that array pointer, and return it? Then why not get or alter the string by passing it as a function parameter?
Isn't my demonstration, which takes a char array as a parameter, easier than allocating and freeing? Is my concept wrong, or is there a reason I never see people passing arrays to functions? I only see code that passes "char *array", not "char array[]", and that uses malloc etc., even though this way of altering a char array looks easy to me. Am I missing something?
#include <stdio.h>
void change(char array[]){
array[0]='K';
}
int main(){
char array[]="HEY";
printf("%s\n", array);
change(array);
printf("%s\n",array );
return 0;
}
If you only need to change existing characters in the string, and the string will be in a variable, and you don't mind the side-effect of your original string being modified, then your solution may be acceptable and indeed easier. But:
What if you want to get a modified string, but also want to retain the original? To avoid destroying an arbitrary-sized original, you need to malloc space, make a copy, and modify that.
And what if you want to extend the string? If your change is to add " YOU" to the string, it can't modify the original because there's no space for it--it'll cause a buffer overflow, since there's only 4 bytes allocated for "HEY" (three letters plus the null terminator). Again, the solution involves mallocing space to work with.
Functions that make changes using your technique typically need a size or length parameter to avoid overflowing the array and causing a crash and a potential security risk. But although that avoids the overflow, there's still the question of what happens if there's not enough space: Silently drop some data? Pass back a flag or special value to indicate there wasn't enough space, and expect the caller to handle it? In the long run, it ends up easier to write it right the first time, and malloc/calloc the space and deal with having to free it up later and all that.
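A function with a size parameter that reports "not enough space" back to the caller might look like the sketch below. append_checked is a hypothetical helper, and it assumes buf already holds a terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append suffix to the string already in buf.
   bufsize is the total size of buf in bytes.
   Returns 0 on success, -1 if the result would not fit; in that case
   buf is left unchanged. Assumes buf is '\0'-terminated on entry. */
int append_checked(char *buf, size_t bufsize, const char *suffix)
{
    assert(buf != NULL && suffix != NULL);
    size_t used = strlen(buf);
    size_t extra = strlen(suffix);
    /* Need used + extra + 1 <= bufsize (the +1 is the terminator). */
    if (used >= bufsize || extra >= bufsize - used)
        return -1;
    memcpy(buf + used, suffix, extra + 1); /* copies the terminator too */
    return 0;
}
```

With char array[] = "HEY" (4 bytes), appending " YOU" returns -1 instead of overflowing; with char array[16] = "HEY" it succeeds. The caller still has to decide what to do on failure, which is the trade-off described above.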

A C function that returns a char array Vs a function working with 2 char arrays

I'm a C beginner so my apologies if this doubt is too obvious.
What would be considered the most efficient way to solve this problem: Imagine that you have a char array ORIG and after working with it, you should have a new char array DEST. So, if I wanted to create a function for this goal, what would the best approach be:
A function that takes only one char array parameter ( argument ORIG ) and returning a char array DEST or
A void function that takes two char array arguments and does its job changing DEST as wished?
Thanks!
This very much depends on the nature of your function.
In your first case, the function has to allocate storage for the result (or return a pointer to some static object, but this wouldn't be thread-safe). This can be the right thing to do, e.g. for a function that duplicates a string, like POSIX' strdup(). This also means the caller must use free() on the result when it is no longer needed.
The second case requires the caller to provide the storage. This is often the idiomatic way to do these things in C, because in this case, the caller could just write
char result[256];
func(result, "bla bla");
and therefore use an automatic object to hold the result. It also has the benefit that you can use the actual return value to signal errors.
Both are valid ways of doing it, but I'd suggest using the latter, since it means you can load the contents into any block of memory, while a returned array has to live on the heap and, by design, must be freed.
Again, both are valid ways of doing things, and this is just a guideline. What should be done usually depends on the situation.
It depends.
If you know that the length of DEST will be the same as the length of ORIG, I would go for the 2nd approach, because then you won't have to dynamically allocate memory for DEST inside the function (and remember to free it outside the function).
If the length is different, you have to dynamically allocate memory, and you can do so in two ways:
1. Like your first approach: to return an array from a function in C, you have to allocate a new array and return its address (a pointer).
2. The function can receive two arguments: one is ORIG and the second is a double pointer to RES. Because the function receives a double pointer, it can allocate an array inside and return it via the argument.
Option 1 is the cleaner way in terms of code, and easier to use in terms of user experience (the user being the caller).
Good luck!
In option 1 you will have to dynamically allocate (malloc) the output array. Which means you have a potential for a memory leak. In option 2 the output array is provided for you, so there is no chance of a leak, but there is a chance that the output array is not of sufficient size and you will get a buffer overrun when writing to it.
Both methods are acceptable. There might be a small performance difference between them, but it's really down to your choice.
Personally, being a cautious programmer, I would go for option 3:
/* Returns 0 on success, 1 on failure
Requires : inputSize <= outputSize
input != output
input != null
output != null
*/
int DoStuff (char* output, size_t outputSize, char* input, size_t inputSize);
(Sorry if that's not proper C, it's been decades:) )
(Edited in accordance with Felix Palmen's points.)
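To make the "option 3" shape concrete, here is a minimal sketch of one function with that parameter layout. The name upper_copy and the upper-casing job are assumptions chosen for illustration, and input is declared const here, which the original prototype does not do. Pointer preconditions are checked with assert, while a too-small output buffer is reported through the return value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: copy inputSize bytes from input to output,
   upper-casing each byte along the way.
   Returns 0 on success, 1 if output is too small.
   Requires: input != NULL, output != NULL, input != output. */
int upper_copy(char *output, size_t outputSize,
               const char *input, size_t inputSize)
{
    assert(output != NULL && input != NULL);
    assert(input != output);
    if (inputSize > outputSize)
        return 1;
    for (size_t i = 0; i < inputSize; i++)
        output[i] = (char)toupper((unsigned char)input[i]);
    return 0;
}
```

Because both lengths are passed explicitly, the function works on raw bytes and does not depend on a '\0' terminator in either buffer.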

Portable way to check if a char* pointer is a null-terminated string

I have a C function that takes in a char* pointer. One of the function's preconditions is that the pointer argument is a null-terminated string
void foo(char *str) {
int length = strlen(str);
// ...
}
If str isn't a pointer to a null-terminated string, then strlen crashes. Is there a portable way to ensure that a char* pointer really does point to a null-terminated string?
I was thinking about using VirtualQuery to find lowest address after str that's not readable, and if we haven't seen a null-terminator between the beginning of str and that address, then str doesn't point to a null-terminated string.
No, there is no portable way to do that. A null-terminated string can be arbitrarily long (up to SIZE_MAX bytes) -- and so can a char array that isn't null-terminated. A function that takes a char* argument has no way of knowing how big a chunk of valid memory it points to, if any. A check would have to traverse memory until it finds a null character, which means that if there is no null character in the array, it will go past the end of it, causing undefined behavior.
That's why the standard C library functions that take string pointers as arguments have undefined behavior if the argument doesn't point to a string. (Checking for a NULL pointer would be easy enough, but that would catch only one error case at the cost of slower execution for valid arguments.)
EDIT : Responding to your question's title:
Portable way to check if a char* pointer is a null-terminated string
a pointer cannot be a string. It may or may not be a pointer to a string.
To prove null termination of a string, you don't just have to prove that a null char exists, you have to prove that it exists at exactly the right spot (no later, but also no earlier). To do that you need to know the intended content or at least length of the string, at which point it is very simple to do the verification...
Consider e.g. a device w/o virtual memory: That means you can iterate over the whole address space without triggering any kind of interrupts.
If your stack is at a higher address than the heap and your compiler puts a copy of '\0' on the stack (instead of only keeping it in a register or using it as an immediate value), you are suddenly guaranteed that any string on the heap will be weakly zero-terminated in the sense that you will always be able to consider the '\0' that your verification code put on the stack as the zero-terminator.
The other answers are correct, but here's another way of thinking about it.
If the pointer points to a buffer of n chars, none of which are '\0', then as soon as you try to examine the n + 1 character, you're in the realm of undefined behavior. So, to scan to see if there's a '\0', it's not enough to know some upper bound of where the end of the buffer is, you have to know exactly where the end of the buffer is.
C doesn't give you a way to know that, other than to require that the caller provide it to you. VirtualQuery (assuming it were portable) is not enough because there may be other objects immediately after the buffer in memory. While it might appear to work on many implementations, the fact that you're relying on undefined behavior means that it's necessarily non-portable.
The best you can do is put an upper bound on the size of the string with the strn functions. So if you are writing a library call and don't trust the caller, document your call noting that strings cannot be above a specific reasonable size and check:
#define MAXNAME 32
if (strnlen(sketchyName,MAXNAME)==MAXNAME) return ERROR;
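Note that strnlen comes from POSIX rather than ISO C. A roughly equivalent bounded check can be sketched with memchr from the standard library. checked_name_length is a hypothetical name, and the caller must still guarantee that either a terminator or MAXNAME readable bytes exist at name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXNAME 32

/* Hypothetical helper: returns the string length if a '\0' occurs within
   the first MAXNAME bytes of name, otherwise -1.
   This bounds the scan; it cannot prove the memory itself is valid. */
long checked_name_length(const char *name)
{
    assert(name != NULL);
    const char *end = memchr(name, '\0', MAXNAME);
    return end ? (long)(end - name) : -1;
}
```

A return value of -1 plays the role of the strnlen(...) == MAXNAME test: the name is rejected as too long or unterminated.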
As others have pointed out, there is no portable way to do this. The reason is that it isn't useful.
Normal semantics are to check for NULL only, and assume that if a non-NULL pointer is passed, it's valid. After all, there's likely to be a null byte somewhere after your pointer. The only other possibility is that you run into unmapped memory. It's more likely, however, that even with a bogus pointer, you find a null byte. That means a bogus 2000-character string will still get past the check.

Working with Pointers and Strcpy in C

I'm fairly new to the concept of pointers in C. Let's say I have two variables:
char *arch_file_name;
char *tmp_arch_file_name;
Now, I want to copy the value of arch_file_name to tmp_arch_file_name and add the word "tmp" to the end of it. I'm looking at them as strings, so I have:
strcpy(&tmp_arch_file_name, &arch_file_name);
strcat(tmp_arch_file_name, "tmp");
However, when strcat() is called, both of the variables change and are the same. I want one of them to change and the other to stay intact. I have to use pointers because I use the names later for the fopen(), rename() and delete() functions. How can I achieve this?
What you want is:
strcpy(tmp_arch_file_name, arch_file_name);
strcat(tmp_arch_file_name, "tmp");
You are just copying the pointers (and other random bits until you hit a 0 byte) in the original code, that's why they end up the same.
As shinkou correctly notes, make sure tmp_arch_file_name points to a buffer of sufficient size (it's not clear if you're doing this in your code). Simplest way is to do something like:
char buffer[256];
char* tmp_arch_file_name = buffer;
Before you use pointers, you need to allocate memory. Assuming that arch_file_name is assigned a value already, you should calculate the length of the result string, allocate memory, do strcpy, and then strcat, like this:
char *arch_file_name = "/temp/my.arch";
// Add lengths of the two strings together; add one for the \0 terminator:
char * tmp_arch_file_name = malloc((strlen(arch_file_name)+strlen("tmp")+1)*sizeof(char));
strcpy(tmp_arch_file_name, arch_file_name);
// ^ this and this ^ are pointers already; no ampersands!
strcat(tmp_arch_file_name, "tmp");
// use tmp_arch_file_name, and then...
free(tmp_arch_file_name);
First, you need to make sure those pointers actually point to valid memory. As they are, they're either NULL pointers or arbitrary values, neither of which will work very well:
char *arch_file_name = "somestring";
char tmp_arch_file_name[100]; // or malloc
Then you cpy and cat, but with the pointers, not pointers-to-the-pointers that you currently have:
strcpy (tmp_arch_file_name, arch_file_name); // i.e., no "&" chars
strcat (tmp_arch_file_name, "tmp");
Note that there is no bounds checking going on in this code - the sample doesn't need it since it's clear that all the strings will fit in the allocated buffers.
However, unless you totally control the data, a more robust solution would check sizes before blindly copying or appending. Since it's not directly related to the question, I won't add it in here, but it's something to be aware of.
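As a rough idea of what such a size check could look like, the sketch below builds the name with snprintf, which never writes more than the given size. make_tmp_name is a hypothetical helper, not something from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write name followed by "tmp" into dst.
   Returns the length of the result on success, or -1 if dst (dstsize
   bytes) is too small or snprintf reports an error. On failure dst may
   hold a truncated string and should not be used. */
int make_tmp_name(char *dst, size_t dstsize, const char *name)
{
    assert(dst != NULL && name != NULL);
    int n = snprintf(dst, dstsize, "%stmp", name);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;
    return n;
}
```

With a 32-byte buffer and "/temp/my.arch" as input this yields "/temp/my.archtmp"; with a 4-byte buffer it returns -1 rather than overflowing.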
The & operator is the address-of operator, that is it returns the address of a variable. However using it on a pointer returns the address of where the pointer is stored, not what it points to.

what is the correct way to define a string in C?

what is the correct way to define a string in C?
using:
char string[10];
or
char *string="???";
If I use an array, I can use any pointer to point to it and then manipulate it.
It seems like using the second one will cause trouble because we didn't allocate memory for that. I am taught that array is just a pointer value, I thought these two are the same before.
Until I did something like string* = *XXXX, and realize it didn't work like a pointer.
As @affenlehrer points out, how you "define" a string depends on how you want to use it. In reality, 'defining' a string in C really just amounts to putting it in quotes somewhere in your program. You should probably read more about how memory works and is allocated in C, but if you write:
char *ptr = "???";
What happens is that the compiler will take the string "???" (which is really four bytes of data, three '?'s followed by one zero byte for the NUL terminator). It will insert that at some static place in your program (typically a read-only data section such as .rodata, not .bss, which holds zero-initialized data), and when your program starts running, the value of ptr will be initialized to point to that location in memory. This means you have a pointer to four bytes of memory that you must not modify: writing to a string literal is undefined behavior, and if you try to write outside of those bytes, your program is doing something bad (and probably violating memory safety).
On the other hand, if you write
char string[10];
Then this basically tells the compiler to go allocate some space in your program of 10 bytes, and make the variable 'string' point to it. It depends where you put this: if you put it inside a function, then you will have a stack allocated buffer of 10 bytes. If you manipulate this buffer inside a function, and then don't do anything with the pointer afterwards, you're all fine. However, if you pass back the address of string -- or use the pointer in any way -- after the function returns, you're in the wrong. This is because, after the function returns, you lose all of the stack allocated variables.
There are even more ways to create strings in C (e.g. using malloc). What is your usecase? Basically you need a place in memory where the data is stored (on the stack, on the heap, static as in your second example) and then a character pointer to the first character of your string. Most string related functions will "see" the end of the string by the trailing '\0', in some other cases (mostly general purpose data related functions) you also have to provide the length of the string.
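The storage options above can be contrasted in a short sketch. Both function names are hypothetical. The point is which pointers may safely outlive the function that produced them: a literal and a heap block can, a local array cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string literal has static storage duration, so returning a pointer
   to it is fine; the const reminds callers not to write through it. */
const char *literal_name(void)
{
    return "???";
}

/* Returns a writable heap copy of src, or NULL if malloc fails.
   The copy stays valid after this function returns; the caller frees it.
   Returning the address of a local char array instead would be an error,
   because that array stops existing when the function returns. */
char *copy_to_heap(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src) + 1;   /* +1 for the '\0' terminator */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, src, n);
    return copy;
}
```

A caller would typically check the result of copy_to_heap for NULL, modify the copy freely, and call free() on it when done.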
