difference between strlen(string) and strlen( *string) - c

Let's say I have an array of strings that are all of same size.
char strings[][MAX_LENGTH];
what would be the difference between strlen(strings) and strlen(*strings)?
I know that strings by itself would be the address of the first string in the array,
but what is *strings?

First, don't do this. C will allow you to do lots of things that are a bad idea. This doesn't mean you ought to do it. :)
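The compiler should complain about strlen(strings), because strlen expects a const char * and strings decays to a char (*)[MAX_LENGTH]. In practice, though, the two calls end up returning the same value. The reason is that with this definition: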
char strings[][MAX_LENGTH];
The allocation for this will end up being one contiguous block. Within that block of memory, there are no "structures" or management devices that can be used to identify where individual strings start and stop. This creates an interesting situation.
Effectively, *strings and strings both refer to precisely the same memory location; they differ only in type. This means that calling strlen on either one of them will return the length of the first string in the array, counted up to its first null terminator.
However, I must reiterate... Don't do this.
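To make the "same address, different type" point concrete, here is a minimal sketch. MAX_LENGTH is given an assumed value of 16 (the question does not state one), and the helper names same_start_address and first_string_length are hypothetical, not from the question.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LENGTH 16  /* assumed value; the question does not give one */

/* Returns 1 if the whole array, its first row, and its first char
   all start at the same address. */
int same_start_address(char (*strings)[MAX_LENGTH])
{
    const void *whole = strings;          /* type: char (*)[MAX_LENGTH] */
    const void *row   = *strings;         /* type: char[MAX_LENGTH], decays to char * */
    const void *first = &strings[0][0];   /* type: char * */
    return whole == row && row == first;
}

/* The type-correct call: *strings (same as strings[0]) decays to char *. */
size_t first_string_length(char (*strings)[MAX_LENGTH])
{
    return strlen(*strings);
}
```

Passing *strings (or strings[0]) keeps the types honest, so no cast or warning is involved.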

Related

Using character vs. pointer for an "array of words"

Let's say I have a paragraph and I want to split up all the words and put them in an array. What would be a better way to do it (for this example, let's assume 100 words all under length 20chars):
// character array
char our_array[100][20];
strcpy(our_array[0], "Hello");
strcpy(our_array[1], "Something");
Or:
// string (pointer) array
char *newer_string[100];
newer_string[0] = "Hello";
newer_string[1] = "Something";
Why would one be preferable over the other? And is one more common in practice than the other?
Option 1 will allocate a fixed 2D array of 100*20 chars. In this case, the strings are stored in the 2D array itself. This has the following features.
Fixed storage per string. If e.g. one name is 50 characters long, the array needs to be 100*50.
The array is mutable. i.e. you can change the elements easily.
No requirement of heap memory allocation (malloc/calloc)
If there is a need of sorting the names, this method requires copying the whole strings around and is inefficient.
Option 2, as you have shown it, only works for string literals known at compile time. If you want to read the strings from a file or from the user, you need to allocate the memory dynamically. Something like shown below.
char *newer_string[100];
char stringtemp[101]; // size this to the maximum string length you need to support
size_t len;
int i;
for (i = 0; i < 100; i++)
{
    // the width (100) leaves room for the terminator; stop when input runs out
    if (scanf("%100s", stringtemp) != 1) { break; }
    len = strlen(stringtemp);
    newer_string[i] = malloc(len + 1); // +1 for the terminating '\0'
    if (newer_string[i] == NULL) { /* handle memory error */ }
    strcpy(newer_string[i], stringtemp);
}
The features here are
More efficient memory storage, e.g. if one string is long, only that array element uses more memory
Needs dynamic memory allocation. So you also need to take care of free
Easier to sort. For a sorting algorithm, you need to only swap the pointers newer_string[i] and newer_string[i+1]
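The sorting point can be sketched with the standard qsort, which moves only the pointers and never copies the characters. The names compare_words and sort_words are hypothetical, and the array is assumed to hold valid, null-terminated strings.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements,
   so each argument is really a pointer to a char pointer. */
static int compare_words(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Sorts an array of string pointers in place; only pointers are swapped. */
void sort_words(const char **words, size_t count)
{
    qsort(words, count, sizeof *words, compare_words);
}
```

With the 2D array from option 1, the same call would work with an element size of 20, but qsort would then move 20 bytes per swap instead of one pointer.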
Version 2 does not occupy all the memory at once (as Version 1 would).
Also the length of your strings can be arbitrary, you are not bound to a specific length (e.g. 20).
You may get a small memory overhead in Version 2 (due to the pointers you need to save), but this is only really true if at least nearly all words are used and nearly all of them have the specified length.
In general, I would always recommend Version 2 (Array of strings/char pointers).
It is even easier to replace strings in this 1D-Array than in the 2D Version.
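As a rough sketch of what replacing a word looks like in the pointer version: the helpers duplicate_word and replace_word below are hypothetical, and they assume every slot is either NULL or was obtained from malloc. They must not be used on slots that point at string literals, because free would then be called on memory that was never allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of word, or NULL if allocation fails. */
char *duplicate_word(const char *word)
{
    size_t size = strlen(word) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, word, size);
    return copy;
}

/* Replaces *slot with a copy of new_word.
   Assumes *slot is NULL or came from malloc. Returns 0 on success, -1 on failure. */
int replace_word(char **slot, const char *new_word)
{
    char *copy = duplicate_word(new_word);
    if (copy == NULL)
        return -1;        /* the old word is left untouched */
    free(*slot);          /* free(NULL) is a no-op */
    *slot = copy;
    return 0;
}
```

The new word can be any length, which is the flexibility the fixed 100*20 array does not have.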
It depends on what you want to do with that variable. This is definitely not written in stone, and there are few hard guidelines.
The first option...
char our_array[100][20];
strcpy(our_array[0], "Hello");
strcpy(our_array[1], "Something");
...has the advantage that each element of our_array is actually an array of char. So you can modify that data. Those strings are not read-only.
On the other hand, you are limited to strings of 19 characters (plus the terminator), and it's quite easy to fumble that. Because you are using strcpy() to initialize that array of strings, any error you make will not be detected by the compiler.
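One way to guard against that 19-character limit is to check the length before copying. This is only a sketch: WORD_SIZE and store_word are hypothetical names, with WORD_SIZE set to 20 to match the declaration above.

```c
#include <string.h>

#define WORD_SIZE 20  /* matches the second dimension of our_array */

/* Copies src into dest only if it fits, terminator included.
   Returns 0 on success, -1 if src is too long (dest is left unchanged). */
int store_word(char dest[WORD_SIZE], const char *src)
{
    if (strlen(src) >= WORD_SIZE)
        return -1;
    strcpy(dest, src);   /* safe: the length was checked above */
    return 0;
}
```

A call such as store_word(our_array[0], "Hello") then reports an oversized word at run time instead of silently writing past the row.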
The other option...
char *newer_string[100];
newer_string[0] = "Hello";
newer_string[1] = "Something";
...has the advantage that each element of the array is a pointer. The strings are kept in read-only, static storage. They occupy less space, and you can easily change newer_string[i] to point to something else. However, you cannot modify that data.

C - is there a way to work with strings which have NULL character in the middle

Is it possible to have strings with NULL character somewhere except the end and work with them? Like get their size, use strcat, etc?
I have some ideas:
1) Write your own function for getting length (or something else), which is going to iterate over a string. If it meets a NULL char, it is going to check the next char of the string. If it is not NULL - continue counting chars. But it may (and WILL!) eventually lead to situation when you are reading memory OUTSIDE of the char array. So it is a bad idea.
2) Use sizeof(array)/sizeof(type), eg sizeof(input)/sizeof(char). That is going to work pretty good I think.
Do you have any other ideas on how this can be done? Maybe there are some function which I am not aware of (C newbie alert :))?
The only really safe method I can think of is to use "Pascal"-type strings (that is, something that has a string header and assorted other data associated with it).
Something like this:
typedef struct {
int len, allocated;
char *data;
} my_string;
You would then have to implement pretty much every string manipulation function yourself. Keeping both the "length of the string" and "the size of the allocation" allows you to have an allocation that's larger than the current contents. This can make repeated string concatenation cheaper (it allows an amortized O(1) append).
You can have an array of char, either statically or dynamically allocated, that contains a zero byte in the middle, but only the part up to and including the zero can be considered a "string" in the standard C sense. Only that part will be recognized or considered by the standard library's string functions.
You can use a different terminator -- say two zeroes in a row -- and write your own string functions, but that just pushes off the problem. What happens when you need two zeroes in the middle of your string? In any case, you need to exercise even more care in this case than in the ordinary string case to ensure that your custom strings are properly terminated. You also have to be certain to avoid using them with the standard string functions.
If your special strings are stored in char array of known size then you can get the length of the overall array via sizeof, but that doesn't tell you what portion of the array contains meaningful data. It also doesn't help with any of the other string functions you might want to perform, and it does nothing for you if your handle on the pseudo-strings is a char *.
If you are contemplating custom string functions anyway, then you should consider string objects that have an explicit length stored with them. For example:
struct my_string {
unsigned allocated, length;
char *contents;
};
Your custom functions then handle objects of that type, being certain to do the right thing with the length member. There is no explicit terminator, so these strings can contain any char value. Also, you can be certain not to mix these up with standard strings.
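A minimal sketch of one such custom function, an append that tolerates embedded zero bytes, might look like this. It reuses the struct layout above but with size_t members instead of unsigned (an assumption of this sketch). The function names my_string_append and my_string_free are hypothetical, the doubling growth policy is just one common choice, and overflow of the size arithmetic is not checked.

```c
#include <stdlib.h>
#include <string.h>

struct my_string {
    size_t allocated, length;   /* capacity in bytes, bytes in use */
    char *contents;             /* not null-terminated */
};

/* Appends n raw bytes (zero bytes allowed).
   Returns 0 on success, -1 if memory could not be allocated. */
int my_string_append(struct my_string *s, const void *bytes, size_t n)
{
    if (n == 0)
        return 0;
    if (n > s->allocated - s->length) {
        size_t needed = s->length + n;
        size_t grown = s->allocated ? s->allocated * 2 : 16;
        if (grown < needed)
            grown = needed;
        char *p = realloc(s->contents, grown);
        if (p == NULL)
            return -1;          /* the original buffer is still valid */
        s->contents = p;
        s->allocated = grown;
    }
    memcpy(s->contents + s->length, bytes, n);
    s->length += n;
    return 0;
}

/* Releases the buffer and resets the object to the empty state. */
void my_string_free(struct my_string *s)
{
    free(s->contents);
    s->contents = NULL;
    s->allocated = s->length = 0;
}
```

An object starts out as struct my_string s = {0, 0, NULL}; and because lengths are always passed explicitly, memcpy and memcmp take the place of strcpy and strcmp.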
As long as you store the length of the array of chars then you can have strings with nul characters or even without a terminating nul.
struct MyString
{
int length;
char* buffer;
};
And then you would have to write all your equivalent functions for managing the string.
The bstring library http://bstring.sourceforge.net and Microsoft's BSTR (which uses wide chars) are existing libraries that work in this way and also offer some compatibility with C-style strings.
pros - getting the length of the string is quick
cons - the strings need to be dynamically allocated.

Working with Pointers and Strcpy in C

I'm fairly new to the concept of pointers in C. Let's say I have two variables:
char *arch_file_name;
char *tmp_arch_file_name;
Now, I want to copy the value of arch_file_name to tmp_arch_file_name and add the word "tmp" to the end of it. I'm looking at them as strings, so I have:
strcpy(&tmp_arch_file_name, &arch_file_name);
strcat(tmp_arch_file_name, "tmp");
However, when strcat() is called, both of the variables change and are the same. I want one of them to change and the other to stay intact. I have to use pointers because I use the names later for the fopen(), rename() and delete() functions. How can I achieve this?
What you want is:
strcpy(tmp_arch_file_name, arch_file_name);
strcat(tmp_arch_file_name, "tmp");
You are just copying the pointers (and other random bits until you hit a 0 byte) in the original code, that's why they end up the same.
As shinkou correctly notes, make sure tmp_arch_file_name points to a buffer of sufficient size (it's not clear if you're doing this in your code). Simplest way is to do something like:
char buffer[256];
char* tmp_arch_file_name = buffer;
Before you use pointers, you need to allocate memory. Assuming that arch_file_name is assigned a value already, you should calculate the length of the result string, allocate memory, do strcpy, and then strcat, like this:
char *arch_file_name = "/temp/my.arch";
// Add lengths of the two strings together; add one for the \0 terminator:
char * tmp_arch_file_name = malloc((strlen(arch_file_name)+strlen("tmp")+1)*sizeof(char));
strcpy(tmp_arch_file_name, arch_file_name);
// ^ this and this ^ are pointers already; no ampersands!
strcat(tmp_arch_file_name, "tmp");
// use tmp_arch_file_name, and then...
free(tmp_arch_file_name);
First, you need to make sure those pointers actually point to valid memory. As they are, they're either NULL pointers or arbitrary values, neither of which will work very well:
char *arch_file_name = "somestring";
char tmp_arch_file_name[100]; // or malloc
Then you cpy and cat, but with the pointers, not pointers-to-the-pointers that you currently have:
strcpy (tmp_arch_file_name, arch_file_name); // i.e., no "&" chars
strcat (tmp_arch_file_name, "tmp");
Note that there is no bounds checking going on in this code - the sample doesn't need it since it's clear that all the strings will fit in the allocated buffers.
However, unless you totally control the data, a more robust solution would check sizes before blindly copying or appending. Since it's not directly related to the question, I won't add it in here, but it's something to be aware of.
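For what such a size check could look like, here is a hedged sketch using snprintf, which never writes more than the given buffer size and reports how long the full result would have been. The function name make_tmp_name is hypothetical.

```c
#include <stdio.h>

/* Writes arch_file_name followed by "tmp" into dest.
   Returns 0 on success, -1 if the result would not fit in dest_size bytes. */
int make_tmp_name(char *dest, size_t dest_size, const char *arch_file_name)
{
    int written = snprintf(dest, dest_size, "%stmp", arch_file_name);
    if (written < 0 || (size_t)written >= dest_size)
        return -1;   /* encoding error or truncated output */
    return 0;
}
```

A caller would pass a real buffer, for example char tmp_arch_file_name[100]; followed by make_tmp_name(tmp_arch_file_name, sizeof tmp_arch_file_name, arch_file_name), and check the return value before using the name.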
The & operator is the address-of operator, that is it returns the address of a variable. However using it on a pointer returns the address of where the pointer is stored, not what it points to.

C Programming: Find Length of a Char* with Null Bytes

If I have a character pointer that contains NULL bytes is there any built in function I can use to find the length or will I just have to write my own function? Btw I'm using gcc.
EDIT:
Should have mentioned the character pointer was created using malloc().
If you have a pointer then the ONLY way to know the size is to store the size separately or have a unique value which terminates the string. (typically '\0') If you have neither of these, it simply cannot be done.
EDIT: since you have specified that you allocated the buffer using malloc then the answer is the paragraph above. You need to either remember how much you allocated with malloc or simply have a terminating value.
If you happen to have an array (like: char s[] = "hello\0world";) then you could resort to sizeof(s). But be very careful, the moment you try it with a pointer, you will get the size of the pointer, not the size of an array. (but strlen(s) would equal 5 since it counts up to the first '\0').
In addition, arrays decay to pointers when passed to functions. So if you pass the array to a function, you are back to square one.
NOTE:
void f(int *p) {}
and
void f(int p[]) {}
and
void f(int p[10]) {}
are all the same. In all 3 versions, p is a pointer, not an array.
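The difference between the three measurements can be seen in a small sketch. The function names here are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* sizeof on the array itself: all 11 visible bytes plus the final '\0'. */
size_t array_size_in_scope(void)
{
    char s[] = "hello\0world";
    return sizeof s;
}

/* sizeof on a pointer parameter: the size of the pointer, not the data. */
size_t size_seen_through_pointer(const char *p)
{
    return sizeof p;
}

/* strlen stops at the first '\0', so it only sees "hello". */
size_t length_up_to_first_nul(void)
{
    char s[] = "hello\0world";
    return strlen(s);
}
```

Here array_size_in_scope() yields 12, length_up_to_first_nul() yields 5, and size_seen_through_pointer() yields sizeof(char *) no matter what it is given.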
How do you know where the string ends, if it contains NULL bytes as part of it? Certainly no built in function can work with strings like that. It'll interpret the first null byte as the end of the string.
If you want the length, you'll have to store it yourself. Keep in mind that no standard library string functions will work correctly on strings like these.
You'll need to keep track of the length yourself.
C strings are null terminated, meaning that the first null character signals the end of the string. All builtin string functions rely on this, so if you have a buffer that can contain NULLs as part of the data then you can't use them.
Since you're using malloc then you may need to keep track of two sizes: the size of your allocated buffer, and how many characters within that buffer constitute valid data.
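Once the length is tracked separately, the mem* family from string.h (memcpy, memcmp, memchr) can stand in for the str* functions, since those take an explicit byte count. As a small sketch, the hypothetical function below counts the zero bytes inside a buffer whose length the caller supplies.

```c
#include <stddef.h>
#include <string.h>

/* Counts zero bytes in the first length bytes of data.
   The caller supplies the length; nothing here relies on a terminator. */
size_t count_zero_bytes(const char *data, size_t length)
{
    size_t count = 0;
    const char *end = data + length;
    const char *p = data;

    /* memchr searches a bounded range, so it never runs past the buffer. */
    while (p < end && (p = memchr(p, '\0', (size_t)(end - p))) != NULL) {
        count++;
        p++;   /* continue just after the zero byte that was found */
    }
    return count;
}
```

For a buffer obtained from malloc, the length argument would be the "valid data" size mentioned above, not the allocated size.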

How to test if a char array is null?

I've been doing (in C)
char array[100];
if (array == NULL)
something;
which is very wrong (which I have finally learned since my program doesn't work). What is the equivalent where I could test a new array to see if nothing has been put in it yet?
Also, how do you make an array empty/clean it out?
I know there are other posts out on this topic out there, but I couldn't find a straightforward answer.
An array declared with
char array[100]
always has 100 characters in it.
By "cleaning out" you may mean assigning a particular character to each slot, such as the character '\0'. You can do this with a loop, or one of several library calls to clear memory or move memory blocks.
Look at memset -- it can "clear" or "reset" your array nicely.
If you are working with strings, which are special char arrays terminated with a zero, then in order to test for an empty array, see this SO question. Otherwise if you have a regular character array not intended to represent text, write a loop to make sure all entries in the array are your special blank character, whatever you choose it to be.
You can also declare your character array like so:
char* array = malloc(100);
or even
char* array = NULL;
but that is a little different. In this case the array being NULL means "no array has been allocated" which is different from "an array has been allocated but I have not put anything in it yet."
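Assuming the array is used as a C string, the two operations asked about could be sketched as follows. The helper names is_empty_string and clear_buffer are hypothetical, and "empty" is taken to mean "the first character is the terminator".

```c
#include <stddef.h>
#include <string.h>

/* A C string is empty when its very first character is the terminator. */
int is_empty_string(const char *s)
{
    return s[0] == '\0';
}

/* Sets every byte of the buffer to zero, which also makes it an empty string. */
void clear_buffer(char *buf, size_t size)
{
    memset(buf, 0, size);
}
```

Note that a local char array[100]; starts with indeterminate contents, so it needs a call like clear_buffer(array, sizeof array) (or an initializer such as char array[100] = "";) before is_empty_string gives a meaningful answer.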
