Coming from a Java background I'm learning C, but I find those vague compiler error messages increasingly frustrating. Here's my code:
/*
* PURPOSE
* Do case-insensitive string comparison.
*/
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int compareString(char cString1[], char cString2[]);
char strToLower(char cString[]);
int main() {
// Declarations
char cString1[50], cString2[50];
int isEqual;
// Input
puts("Enter string 1: ");
gets(cString1);
puts("Enter string 2: ");
gets(cString2);
// Call
isEqual = compareString(cString1, cString2);
if (isEqual == 0)
printf("Equal!\n");
else
printf("Not equal!\n");
return 0;
}
// WATCH OUT
// This method *will* modify its input arrays.
int compareString(char cString1[], char cString2[]) {
// To lowercase
cString1 = strToLower(cString1);
cString2 = strToLower(cString2);
// Do regular strcmp
return strcmp(cString1, cString2);
}
// WATCH OUT
// This method *will* modify its input arrays.
char strToLower(char cString[]) {
// Declarations
int iTeller;
for (iTeller = 0; cString[iTeller] != '\0'; iTeller++)
cString[iTeller] = (char)tolower(cString[iTeller]);
return cString;
}
This generates two warnings.
assignment makes pointer from integer without a cast
cString1 = strToLower(cString1);
cString2 = strToLower(cString2);
return makes integer from pointer without a cast
return cString;
Can someone explain these warnings?
C strings are not anything like Java strings. They're essentially arrays of characters.
You are getting the warning because strToLower is declared to return a char, and a char is an integer type in C. You then assign that value to cString1, which is declared as char[] but, as a function parameter, is really a char * pointer. Hence "makes pointer from integer".
Your strToLower makes all its changes in place, there is no reason for it to return anything, especially not a char. You should "return" void, or a char*.
On the call to strToLower, there is also no need for assignment, you are essentially just passing the memory address for cString1.
In my experience, strings in C are the hardest part to learn for anyone coming from a Java/C# background. People can get along with memory allocation (since even in Java you often allocate arrays). If your eventual goal is C++ and not C, you may prefer to focus less on C strings, make sure you understand the basics, and just use std::string from the C++ standard library.
strToLower's return type should be char* not char
(or it should return nothing at all, since it doesn't re-allocate the string)
1) Don't use gets! You're introducing a buffer-overflow vulnerability. Use fgets(..., stdin) instead.
2) In strToLower you're returning a char instead of a char-array. Either return char* as Autopulated suggested, or just return void since you're modifying the input anyway. As a result, just write
strToLower(cString1);
strToLower(cString2);
3) To compare strings case-insensitively, you can use strcasecmp (POSIX: Linux and macOS) or stricmp/_stricmp (Windows). Neither is part of standard C.
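To make point 1 concrete, here is a minimal sketch of reading a line with fgets instead of gets. The helper name read_line is made up for illustration. Note that fgets keeps the trailing newline, so the sketch strips it before the string is compared.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original code): read one line into buf,
   storing at most size-1 characters, and drop the newline that fgets keeps.
   Returns 1 on success, 0 on end-of-file or read error. */
int read_line(FILE *in, char *buf, int size) {
    if (fgets(buf, size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* cut at the first newline, if any */
    return 1;
}
```

In the question's main this would be called as read_line(stdin, cString1, sizeof cString1), so input longer than the array is cut off instead of overflowing it.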
As others already noted, in one case you are attempting to return cString (which is a char * value in this context - a pointer) from a function that is declared to return a char (which is an integer). In another case you do the reverse: you are assigning a char return value to a char * pointer. This is what triggers the warnings. You certainly need to declare your return values as char *, not as char.
Note BTW that these assignments are in fact constraint violations from the language point of view (i.e. they are "errors"), since it is illegal to mix pointers and integers in C like that (aside from integral constant zero). Your compiler is simply too forgiving in this regard and reports these violations as mere "warnings".
What I also wanted to note is that in several answers you might notice the relatively strange suggestion to return void from your functions, since you are modifying the string in-place. While it will certainly work (since you indeed are modifying the string in-place), there's nothing really wrong with returning the same value from the function. In fact, it is a rather standard practice in C language where applicable (take a look at the standard functions like strcpy and others), since it enables "chaining" of function calls if you choose to use it, and costs virtually nothing if you don't use "chaining".
That said, the assignments in your implementation of compareString look completely superfluous to me (even though they won't break anything). I'd either get rid of them
int compareString(char cString1[], char cString2[]) {
// To lowercase
strToLower(cString1);
strToLower(cString2);
// Do regular strcmp
return strcmp(cString1, cString2);
}
or use "chaining" and do
int compareString(char cString1[], char cString2[]) {
return strcmp(strToLower(cString1), strToLower(cString2));
}
(this is when your char * return would come in handy). Just keep in mind that such "chained" function calls are sometimes difficult to debug with a step-by-step debugger.
As an additional, unrelated note, I'd say that implementing a string comparison function in such a destructive fashion (it modifies the input strings) might not be the best idea. A non-destructive function would be of much greater value in my opinion. Instead of performing an explicit conversion of the input strings to lower case, it is usually a better idea to implement a custom char-by-char case-insensitive string comparison function and use it instead of calling the standard strcmp.
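A minimal sketch of such a non-destructive comparison might look like the following. The name compare_ignore_case is an assumption for illustration, and the result follows the strcmp convention (negative, zero, positive).

```c
#include <ctype.h>

/* Hypothetical helper: compares two strings character by character,
   ignoring case, without modifying either input.
   Returns <0, 0 or >0, like strcmp. */
int compare_ignore_case(const char *a, const char *b) {
    for (;; a++, b++) {
        /* tolower expects a value representable as unsigned char (or EOF),
           so convert before calling it */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;   /* first differing character decides */
        if (ca == '\0')
            return 0;         /* both strings ended at the same place */
    }
}
```

Because both parameters are const char *, this also works on string literals, which the in-place version cannot safely accept.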
You don't need these two assignments:
cString1 = strToLower(cString1);
cString2 = strToLower(cString2);
you are modifying the strings in place.
Warnings are because you are returning a char, and assigning to a char[] (which is equivalent to char*)
You are returning char, and not char*, which is the pointer to the first character of an array.
If you want to return a new character array instead of modifying in place, you can take an already allocated buffer (char*) as a parameter, or an uninitialized pointer. In the latter case you must allocate enough characters for the new string, and remember that in C parameters are ALWAYS passed by value, so you must use a char** parameter when the function allocates the array itself. Of course, the caller must free that pointer later.
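As a rough sketch of the char** variant described above (the function name lower_copy and its return convention are assumptions for illustration):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates a lower-cased copy of src and stores the
   new pointer in *out. Returns 0 on success, -1 if allocation fails.
   The caller owns the result and must free() it. */
int lower_copy(const char *src, char **out) {
    size_t len = strlen(src);
    char *copy = malloc(len + 1);          /* +1 for the terminator */
    if (copy == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        copy[i] = (char)tolower((unsigned char)src[i]);
    copy[len] = '\0';
    *out = copy;                           /* hand the buffer to the caller */
    return 0;
}
```

The char** is needed because assigning to a plain char* parameter would only change the function's local copy of the pointer.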
strToLower should return a char * instead of a char. Something like this would do.
char *strToLower(char *cString)
char cString1[]
As a function parameter, this is really a pointer to the first element of the caller's array. Note that the array is not passed by value: only its address is.
char strToLower(...)
However, this returns a char. So your assignment
cString1 = strToLower(cString1);
has different types on each side of the assignment operator: you're actually assigning a char (an integer type) to what is really a pointer. C compilers have traditionally accepted this implicit conversion with only a warning, but the resulting pointer is rubbish and any further access through it causes undefined behaviour.
The solution is to make strToLower return char*.
Related
Thanks in advance for any help. I'm making a simple program in C and I've already declared two strings called: "message1" & "message2", how would I go about changing the contents of these strings? I initially fill them with "empty" for the check that happens in the segment of code shown below:
char message1[32] = "empty";
…
if(message1 != "empty");
{
printf("\n[USER 1]: %s", message1);
message1 = "empty";
}
After this check, if message1 contains anything but the original value, it will then print said value and then reset message1 to its original value of "empty". However, this is obviously not the case. I've Googled for the answer and am very confused.
You seem to be confusing array semantics with pointer semantics.
If your string (which is a data format) is stored in an array, you can change individual characters. But you cannot assign to an array. So this line
message1 = "empty";
becomes a constraint violation. Instead you should use strcpy(), or better snprintf() (don't be tempted to use strncpy() -- it is not a safer strcpy()). Also you cannot meaningfully compare the array and a string literal. So this line
if(message1 != "empty");
also doesn't make any sense. Use the strcmp() function. (Also, the semicolon is probably not what you want, because it terminates the statement, so the following compound statement is not controlled by the if statement.)
If you don't need to modify the individual characters, then you can just use a char * to point to the start of a string. Then the assignment is valid. The comparison, however, compares addresses rather than contents: it only appears to work if the compiler merges identical string literals into one object, which the standard permits but does not require, so strcmp() is still the right tool.
Two things need to be changed.
1.
if (message1 != "empty");
Use strcmp() to compare strings. In C the equality and relational operators compare addresses, not string contents. strcmp() returns non-zero when the strings differ:
if (strcmp(message1,"empty"))
{
....
}
2.
message1 = "empty";
You can't assign a string to an array in C. Use strcpy for that.
strcpy(message1,"empty");
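Putting both fixes together, the check-print-reset step could be sketched roughly like this. The helper name flush_message is made up, and snprintf is used for the reset so the write is bounded by the array size.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if message holds anything other than "empty",
   print it and reset it to "empty". size is the capacity of the array.
   Returns 1 if a message was printed, 0 otherwise. */
int flush_message(char *message, size_t size) {
    if (strcmp(message, "empty") == 0)   /* compare contents, not addresses */
        return 0;
    printf("\n[USER 1]: %s", message);
    snprintf(message, size, "%s", "empty");  /* bounded copy back into the array */
    return 1;
}
```

With the question's declaration it would be called as flush_message(message1, sizeof message1).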
I'm writing a parser in C, and I've got some code that looks like this:
char *consume_while(parser *self, int(*test)(char)) {
char *result;
while (eof(self) && (*test)(next_char(self))) {
// append the return value from the function consumed_char(self)
// onto the "result" string defined above.
}
return result;
}
But I'm kinda new to the whole string manipulation aspect of C, so how would I append the character returned from the function consumed_char(self) to the result char pointer? I've seen people using the strcat function, but that won't work as it takes two constant char pointers, but I'm dealing with a char* and a char. In Java it would be something like this:
result += consumed_char(self);
What's the equivalent in C?
Thanks :)
In C, strings do not exist as a type, they are just char arrays with a null-terminating character. This means, assuming your buffers are big enough and filled with zeroes, it can be as simple as:
result[strlen(result)] = consumed_char(self);
if not, your best bet is to use strcat and change your consumed_char function to return a char *.
That being said, writing a parser without basic understanding of C-style strings, is, to say the least, quite ambitious.
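When the final length is not known in advance, one common approach is a small growable buffer that is reallocated as needed. The following is only a sketch: the strbuf type and strbuf_append function are hypothetical names, not part of the question's parser.

```c
#include <stdlib.h>

/* Hypothetical growable string buffer. Start with {NULL, 0, 0}. */
typedef struct {
    char *data;   /* NUL-terminated once something has been appended */
    size_t len;   /* number of characters, excluding the terminator */
    size_t cap;   /* allocated size in bytes */
} strbuf;

/* Appends one character, growing the buffer if needed.
   Returns 0 on success, -1 if memory could not be allocated. */
int strbuf_append(strbuf *sb, char c) {
    if (sb->len + 2 > sb->cap) {               /* need room for c and '\0' */
        size_t new_cap = sb->cap ? sb->cap * 2 : 16;
        char *p = realloc(sb->data, new_cap);  /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;                         /* old buffer is still valid */
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';                  /* keep it a valid C string */
    return 0;
}
```

Inside the loop this would stand in for Java's +=, for example strbuf_append(&sb, consumed_char(self)), and the function could finally return sb.data for the caller to free.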
I often want to do something like this:
unsigned char urlValid[66] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
...or:
unsigned char listOfChar[4] = "abcd";
...that is, initialize a character array from a string literal and ignore the null terminator from that literal. It is very convenient, plus I can do things like sizeof urlValid and get the right answer.
But unfortunately it gives the error initializer-string for array of chars is too long.
Is there a way to either:
Turn off errors and warnings for this specific occurrence (ie, if there's no room for null terminator when initialising a char array)
Do it better, maintaining convenience and readability?
You tagged your question as both C and C++. In reality in C language you would not receive this error. Instead, the terminating zero simply would not be included into the array. I.e. in C it works exactly as you want it to work.
In C++ you will indeed get the error. In C++ you'd probably have to accommodate the terminating zero and remember to subtract 1 from the result of sizeof.
Note also that, as @Daniel Fischer suggested in the comments, you can "decouple" definition from initialization
char urlValid[66];
memcpy(urlValid, "ab...", sizeof urlValid);
thus effectively simulating the behavior of C language in C++.
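For the C case, a small sketch of how such an unterminated array can be used, assuming the file is compiled as C and not C++. The helper is_url_char is a made-up name. Since no terminator is stored, the lookup uses memchr with an explicit length instead of strchr, and some compilers may still warn about the missing terminator.

```c
#include <string.h>

/* Exactly 66 characters: in C the terminating NUL is simply not stored. */
static const unsigned char urlValid[66] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

/* Hypothetical helper: returns non-zero if c is one of the listed characters.
   memchr is given the array size, so it never reads past the 66 bytes. */
int is_url_char(unsigned char c) {
    return memchr(urlValid, c, sizeof urlValid) != NULL;
}
```

Here sizeof urlValid is 66, with no subtraction needed.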
Well, in C++ you should always use std::string. It's convenient and not prone to memory leaks etc.
You can, however, initialize an array without specifying the size:
char urlValid[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
This works since the compiler can deduce the correct size from the string literal. Another advantage is that you don't have to change the size if the literal changes.
Edit: You should not use unsigned char for strings.
Initialise with an actual array of chars?
char urlValid[] = {'a','b','c','d','e','f',...};
there are two simple solutions.
You can either add an extra element into the array like this:
unsigned char urlValid[67] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
or leave out the size of the array altogether:
unsigned char urlValid[] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck with something weird. On one part of my code, I initialize a char array in a function, and it is by default pointing to the same location with one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
At the line I've marked with [*], there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval points to the same address as my argument input. That shouldn't even be possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems obvious to me that they shouldn't point to the same location (a valid one, of course; they aren't NULL). As the code goes on, it literally messes up everything: I get random characters and shapes in console and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
What cnicutar said.
I hate people who do this, so I hate me ... but ... Variable-length arrays are a C99 feature (optional since C11) and are not part of standard C++. Of course GCC has an extension that allows them in C++ too.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and document that callers must call free() on it when done. These solutions all suck because they are prone to leaks or truncation or overflow. :/
Well, your fundamental problem is that you are returning a pointer to the stack-allocated VLA. You can't do that. A pointer to a local variable is only valid until the function that declares it returns. Your code results in undefined behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
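A minimal sketch of the heap-allocating variant, under the assumption that the elided part of the function does nothing beyond the concatenation shown in the question. The name subdir_alloc is made up to keep it apart from the original.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: returns a newly allocated string holding
   input followed by dir, or NULL if allocation fails.
   The caller must free() the result. */
char *subdir_alloc(const char *input, const char *dir) {
    size_t in_len = strlen(input);
    size_t dir_len = strlen(dir);
    char *retval = malloc(in_len + dir_len + 1);  /* +1: one terminator only */
    if (retval == NULL)
        return NULL;
    memcpy(retval, input, in_len);
    memcpy(retval + in_len, dir, dir_len + 1);    /* also copies dir's '\0' */
    return retval;
}
```

Unlike the VLA, this buffer stays valid after the function returns, until the caller passes it to free().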
Try changing retval to a character pointer and allocating your buffer using malloc().
Pass the two string arguments as char * or const char *.
Rather than returning char *, you should just pass another parameter with a string pointer that you already malloc'd space for.
Return bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly don't forget to free the memory since you're having to malloc space for the string on the heap...
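#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// retstr is not const like the other two: the result is written into it
bool subdir(const char *input, const char *dir, char *retstr){
strcpy(retstr, input);
strcat(retstr, dir);
return true;
}
int main()
{
char h[]="Hello ";
char w[]="World!";
char *greet=malloc(strlen(h)+strlen(w)+1); // size of the result plus room for the terminator
if (greet == NULL)
return 1; // allocation failed
subdir(h,w,greet);
printf("%s",greet);
free(greet); // release the heap buffer once we are done with it
return 0;
}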
This will print: "Hello World!" added together by your function.
Also, when the length of a string is only known at run time and the string has to outlive the function that builds it, you must malloc it. A C99 variable-length array such as char greet[totallen]; does compile, but it lives on the stack and is gone once the function returns.
I've seen people's code as:
char *str = NULL;
and I've seen this is as well,
char *str;
I'm wondering, what is the proper way of initializing a string? And when are you supposed to initialize a string with and without NULL?
You're supposed to set it before using it. That's the only rule you have to follow to avoid undefined behaviour. Whether you initialise it at creation time or assign to it just before using it is not relevant.
Personally speaking, I prefer to never have variables set to unknown values myself so I'll usually do the first one unless it's set in close proximity (within a few lines).
In fact, with C99, where you don't have to declare locals at the tops of blocks any more, I'll generally defer creating it until it's needed, at which point it can be initialised as well.
Note that variables are given default values under certain circumstances (for example, if they're static storage duration such as being declared at file level, outside any function).
Local variables do not have this guarantee. So, if your second declaration above (char *str;) is inside a function, it may have rubbish in it and attempting to use it will invoke the afore-mentioned, dreaded, undefined behaviour.
The relevant part of the C99 standard 6.7.8/10:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
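A small sketch of the rules quoted above (all names are made up for illustration). Objects with static storage duration start out as null or zero without an initializer, while the automatic variable is given an explicit value because it has no such guarantee.

```c
#include <stddef.h>

static char *file_scope_ptr;   /* static storage duration: starts as a null pointer */

/* Hypothetical check: returns 1 if the defaults are what the standard promises. */
int static_defaults_are_zero(void) {
    static int counter;        /* static storage duration: starts at 0 */
    char *local = NULL;        /* automatic: indeterminate unless set explicitly */
    return file_scope_ptr == NULL && counter == 0 && local == NULL;
}
```

Reading local without the = NULL would be exactly the indeterminate-value case the standard text describes.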
I'm wondering, what is the proper way of initializing a string?
Well, since the second snippet defines an uninitialized pointer to string, I'd say the first one. :)
In general, if you want to play it safe, it's good to initialize to NULL all pointers; in this way, it's easy to spot problems derived from uninitialized pointers, since dereferencing a NULL pointer will yield a crash (actually, as far as the standard is concerned, it's undefined behavior, but on every machine I've seen it's a crash).
However, you should not confuse a NULL pointer to string with an empty string: a NULL pointer to string means that that pointer points to nothing, while an empty string is a "real", zero-length string (i.e. it contains just a NUL character).
char * str=NULL; /* NULL pointer to string - there's no string, just a pointer */
const char * str2 = ""; /* Pointer to a constant empty string */
char str3[] = "random text to reach 15 characters ;)"; /* String allocated (presumably on the stack) that contains some text */
*str3 = 0; /* str3 is emptied by putting a NUL in first position */
This is a general question about C variables, not just char pointers.
It is considered best practice to initialize a variable at the point of declaration, i.e.
char *str = NULL;
is a Good Thing. This way you never have variables with unknown values. For example, if later in your code you do
if(str != NULL)
doBar(str);
What will happen? If str was never initialized, it is in an unknown (and almost certainly not NULL) state, so the check tells you nothing.
Note that static variables will be initialized to zero / NULL for you. It's not clear from the question whether you are asking about locals or statics.
Global variables are initialized with default values by a compiler, but local variables must be initialized.
An uninitialized pointer should be considered undefined, so to avoid errors caused by using an undefined value it's always better to use
char *str = NULL;
also because
char *str;
leaves a pointer that points to some arbitrary location, which will most likely cause problems if you use it before making it point at valid memory. You need to allocate it anyway (or copy another pointer into it).
This means that you can choose:
if you know that you will allocate it shortly after its declaration you can avoid setting it to NULL (this is a sort of rule of thumb)
in any other case, if you want to be sure, just do it. The only real problem occurs if you try to use it without having initialized it.
It depends entirely on how you're going to use it. In the following, it makes more sense not to initialize the variable:
int count;
while ((count = function()) > 0)
{
}
Don't initialise all your pointer variables to NULL on declaration "just in case".
The compiler will usually warn you (when warnings are enabled) if you try to use a pointer variable that has not been initialised, except when you pass it by address to a function (and you usually do that in order to give it a value).
Initialising a pointer to NULL is not the same as initialising it to a sensible value, and initialising it to NULL just disables the compiler's ability to tell you that you haven't initialised it to a sensible value.
Only initialise pointers to NULL on declaration if you get a compiler warning if you don't, or you are passing them by address to a function that expects them to be NULL.
If you can't see both the declaration of a pointer variable and the point at which it is first given a value in the same screen-full, your function is too big.
static const char str[] = "str";
or
static char str[] = "str";
Because free() doesn't do anything if you pass it a NULL value you can simplify your program like this:
char *str = NULL;
if ( somethingorother() )
{
str = malloc ( 100 );
if ( NULL == str )
goto error;
}
...
error:
cleanup();
free ( str );
If for some reason somethingorother() returns 0 and you haven't initialized str, you will free some random address, possibly causing a failure.
I apologize for the use of goto, I know some find it offensive. :)
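A compilable sketch of the same cleanup pattern, with the placeholder calls replaced by a made-up function and parameter so that it can actually be built:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: need_buffer stands in for somethingorother().
   Returns 0 on success, -1 on failure. */
int process(int need_buffer) {
    int status = -1;
    char *str = NULL;            /* NULL so the cleanup below is always safe */
    if (need_buffer) {
        str = malloc(100);
        if (str == NULL)
            goto cleanup;        /* allocation failed */
        strcpy(str, "scratch data");
    }
    status = 0;
cleanup:
    free(str);                   /* free(NULL) is defined to do nothing */
    return status;
}
```

Both paths, with and without the allocation, go through the same free call, which is the point of initializing str to NULL.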
Your first snippet is a variable definition with initialization; the second snippet is a variable definition without initialization.
The proper way to initialize a string is to provide an initializer when you define it. Initializing it to NULL or something else depends on what you want to do with it.
Also be aware of what you call "string". C has no such type: usually "string" in a C context means "array of [some number of] char". You have pointers to char in the snippets above.
Assume you have a program that wants the username in argv[1] and copies it to the string "name". When you define the name variable you can keep it uninitialized, or initialize it to NULL (if it's a pointer to char), or initialize with a default name.
#include <string.h> /* strcpy */
int main(int argc, char **argv) {
char name_uninit[100];
char *name_ptr = NULL;
char name_default[100] = "anonymous";
if (argc > 1) {
strcpy(name_uninit, argv[1]); /* beware buffer overflow */
name_ptr = argv[1];
strcpy(name_default, argv[1]); /* beware buffer overflow */
}
/* ... */
/* name_uninit may be unusable (and untestable) if there were no command line parameters */
/* name_ptr may be NULL, but you can test for NULL */
/* name_default is a definite name */
}
By proper, do you mean bug free? Well, it depends on the situation. But there are some rules of thumb I can recommend.
Firstly, note that strings in C are not like strings in other languages.
They are pointers to a block of characters, the end of which is marked with a 0 byte, the NUL terminator; hence "null-terminated string".
So for example, if you're going to do something like this:
char* str;
gets(str);
or interact with str in any way, then it's a monumental bug. The reason is that, as I have just said, C strings are not strings as in other languages. They are just pointers: char* str has the size of a pointer and always will, and here it does not point at any storage yet.
Therefore, what you need to do is allocate some memory to hold a string.
/* this allocates 100 characters for a string
(including the null), remember to free it with free() */
char* str = (char*)malloc(100);
str[0] = 0;
/* so does this, automatically freed when it goes out of scope */
char str[100] = "";
However, sometimes all you need is a pointer.
e.g.
/* This declares the string (not initialized) */
char* str;
/* use the string from earlier and assign the allocated/copied
buffer to our variable */
str = strdup(other_string);
In general, it really depends on how you expect to use the string pointer.
My recommendation is either to use the fixed-size array form, if you're only going to use the string within that function and it is relatively small, or to initialize the pointer to NULL. You can then explicitly test for a NULL string, which is useful when it's passed into a function.
Beware that the array form can also be a problem with functions that rely solely on the NUL terminator to find the end of the string: strcpy and strcat, for example, don't care how big your buffer is. Therefore consider an alternative such as BSD's strlcpy and strlcat, or strcpy_s and strcat_s (Windows).
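Where neither of those families is available, a bounded copy can also be sketched with the standard snprintf. The helper name copy_bounded and its return convention are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst, writing at most dst_size bytes
   and always NUL-terminating. Returns 1 if the whole string fit,
   0 if it was truncated (or dst_size was 0). */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0)
        return 0;
    int n = snprintf(dst, dst_size, "%s", src);  /* n = length src would need */
    return n >= 0 && (size_t)n < dst_size;
}
```

The return value lets the caller decide what to do about truncation instead of silently overflowing the array.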
Many functions expect you to pass in a proper address as well. So again, be aware that
char* str = NULL;
strcmp(str, "Hello World");
will most likely crash, because passing NULL to strcmp is undefined behaviour.
You have tagged this as C, but if anyone is using C++ and reads this question then switch to using std::string where possible and use the .c_str() member function on the string where you need to interact with an API that requires a standard null terminated c string.