How can I check whether a string contains a character using assert? For example, I want to do something with my string only if it contains '+'. How can I do this using assert in C?
You can use strchr along with the assert macro.
char x[20] = "hello+";
assert(strchr(x,'+') != NULL);
strchr(x,'+') returns NULL if the character is not found in the string, so you can use its return value in the assert.
According to assert man page
If expression is false (i.e., compares equal to zero), assert()
prints an error message to standard error and terminates the program
by calling abort(3). The error message includes the name of the file
and function containing the assert() call, the source code line
number of the call, and the text of the argument; something like:
And according to strchr man page
The strchr() and strrchr() functions return a pointer to the matched
character or NULL if the character is not found. The terminating
null byte is considered part of the string, so that if c is specified
as '\0', these functions return a pointer to the terminator.
Related
I understand that strings in C are just character arrays. So I tried the following code, but it gives strange results, such as garbage output or program crashes:
#include <stdio.h>
int main (void)
{
char str [5] = "hello";
puts(str);
}
Why doesn't this work?
It compiles cleanly with gcc -std=c17 -pedantic-errors -Wall -Wextra.
Note: This post is meant to be used as a canonical FAQ for problems stemming from a failure to allocate room for a NUL terminator when declaring a string.
A C string is a character array that ends with a null terminator.
All characters have a symbol table value. The null terminator is the symbol value 0 (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere.
Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character. Your example does not do this; it only allocates room for the 5 characters of "hello". Correct code should be:
char str[6] = "hello";
Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:
char str[5+1] = "hello";
But you can also use this and let the compiler do the counting and pick the size:
char str[] = "hello"; // Will allocate 6 bytes automatically
When allocating memory for a string dynamically in run-time, you also need to allocate room for the null terminator:
char input[n] = ... ;
...
char* str = malloc(strlen(input) + 1);
If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes.
The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: '\0'. This is 100% equivalent to writing 0, but the \ serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as if(str[i] == '\0') will check if the specific character is the null terminator.
Please note that the term null terminator has nothing to do with null pointers or the NULL macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as NUL with one L, not to be confused with NULL or null pointers. See answers to this SO question for further details.
The "hello" in your code is called a string literal. This is to be regarded as a read-only string. The "" syntax means that the compiler will append a null terminator in the end of the string literal automatically. So if you print out sizeof("hello") you will get 6, not 5, because you get the size of the array including a null terminator.
It compiles cleanly with gcc
Indeed, not even a warning. This is because of a subtle detail/flaw in the C language that allows a character array to be initialized with a string literal that contains exactly as many characters as there is room in the array, in which case the null terminator is silently discarded (C17 6.7.9/14). The language purposely behaves like this for historical reasons, see Inconsistent gcc diagnostic for string initialization for details. Also note that C++ is different here and does not allow this trick/flaw to be used.
From the C Standard (7.1.1 Definitions of terms)
1 A string is a contiguous sequence of characters terminated by and
including the first null character. The term multibyte string is
sometimes used instead to emphasize special processing given to
multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial
(lowest addressed) character. The length of a string is the number of
bytes preceding the null character and the value of a string is the
sequence of the values of the contained characters, in order.
In this declaration
char str [5] = "hello";
the string literal "hello" has the internal representation like
{ 'h', 'e', 'l', 'l', 'o', '\0' }
so it has 6 characters including the terminating zero. Its elements are used to initialize the character array str, which reserves space for only 5 characters.
The C Standard (opposite to the C++ Standard) allows such an initialization of a character array when the terminating zero of a string literal is not used as an initializer.
However as a result the character array str does not contain a string.
If you want the array to contain a string, you could write
char str [6] = "hello";
or just
char str [] = "hello";
In the last case, the size of the character array is determined by the number of initializers in the string literal, which is 6.
Can all strings be considered an array of characters (Yes), can all character arrays be considered strings (No).
Why Not? and Why does it matter?
In addition to the other answers explaining that the length of a string is not stored anywhere as part of the string and the references to the standard where a string is defined, the flip-side is "How do the C library functions handle strings?"
While a character array can hold the same characters, it is simply an array of characters unless the last character is followed by the nul-terminating character. That nul-terminating character is what allows the array of characters to be considered (handled as) a string.
All functions in C that expect a string as an argument expect the sequence of characters to be nul-terminated. Why?
It has to do with the way all string functions work. Since the length isn't included as part of an array, string functions scan forward in the array until the nul-character (e.g. '\0' -- equivalent to decimal 0) is found. See ASCII Table and Description. Whether you are using strcpy, strchr, strcspn, etc., all string functions rely on the nul-terminating character being present to define where the end of that string is.
A comparison of two similar functions from string.h will emphasize the importance of the nul-terminating character. Take for example:
char *strcpy(char *dest, const char *src);
The strcpy function simply copies bytes from src to dest until the nul-terminating character is found telling strcpy where to stop copying characters. Now take the similar function memcpy:
void *memcpy(void *dest, const void *src, size_t n);
The function performs a similar operation, but does not consider or require the src parameter to be a string. Since memcpy cannot simply scan forward in src copying bytes to dest until a nul-terminating character is reached, it requires an explicit number of bytes to copy as a third parameter. This third parameter provides memcpy with the same size information strcpy is able to derive simply by scanning forward until a nul-terminating character is found.
(which also emphasizes what goes wrong in strcpy (or any function expecting a string) if you fail to provide the function with a nul-terminated string -- it has no idea where to stop and will happily race off across the rest of your memory segment invoking Undefined Behavior until a nul-character just happens to be found somewhere in memory -- or a Segmentation Fault occurs)
That is why functions expecting a nul-terminated string must be passed a nul-terminated string and why it matters.
Intuitively...
Think of an array as a variable (holds things) and a string as a value (can be placed in a variable).
They are certainly not the same thing. In your case the variable is too small to hold the string, so the string gets cut off. ("quoted strings" in C have an implicit null character at the end.)
However it's possible to store a string in an array that is much larger than the string.
Note that the usual assignment and comparison operators (= == < etc.) don't work as you might expect. But the strxyz family of functions comes pretty close, once you know what you're doing. See the C FAQ on strings and arrays.
I am writing a simple program to convert a number(+ve,32-bit) from binary to decimal. Here's my code:
#include <stdio.h>
int main()
{
int n=0,i=0;
char binary[33];
gets(binary);
for (i = 0; i < 33 && binary[i] != '\0'; i++) /* && instead of the comma operator, which discards i < 33 */
n=n*2+binary[i]-'0';
printf("%d",n);
}
If I remove binary[i]!='\0', it gives a wrong answer due to garbage values, but if I keep it, it gives the correct answer. My question is: does the gets function automatically add a '\0' (NUL) character at the end of the string, or is this just a coincidence?
Yes it does, writing past the end of binary[33] if it needs to.
Never use gets: it has no way to prevent a buffer overrun.
See Why is the gets function so dangerous that it should not be used? for details.
When gets was last supported (though deprecated) by the C standard, it had the following description (§ 7.19.7.7, The gets function):
The gets function reads characters from the input stream pointed to by stdin, into the
array pointed to by s, until end-of-file is encountered or a new-line character is read.
Any new-line character is discarded, and a null character is written immediately after the last character read into the array.
This means that if the string read from stdin was exactly as long as, or longer than, the array pointed to by s, gets would still (try to) append the null character to the end of the string.
Even if you are on a compiler or C standard revision that supports gets, don't use it. fgets is much safer since it requires the size of the buffer being written to as a parameter, and will not write past its end. Another difference is that, unlike gets, it leaves the newline in the buffer.
I know that in C, you always have to add a null terminator \0 so that the processor knows where a word ends.
But I have a hard time understanding when you have to do it. For example, this code works for me without it:
char connectcmd[50]={0};
sprintf(connectcmd,"AT+CWJAP=\"%s\",\"%s\"",MYSSID,MYPASS);
How is that possible?
When do you really have to add them?
sprintf always terminates its output with a null character, so there is no need to add one manually.
From the C11 standard:
7.21.6.6 The sprintf function
[...]A null character is written at the end of the characters written; it is not counted as part of the returned value. If copying takes place between objects that overlap, the behavior is undefined.
sprintf writes a null-terminated string to connectcmd, regardless of its initial contents. This works as long as you don't try to write beyond the bounds of the buffer.
On top of that, when you say this:
char connectcmd[50]={0};
you initialize all 50 elements of connectcmd to zero, which is the value of the null terminator \0. So it would be null-terminated even if you wrote characters to it manually, as long as you write fewer than 50 non-null characters.
for example this code works for me without it
It does not (work without it).
The string literal "AT+CWJAP=\"%s\",\"%s\"" has a null terminator at the end (like every string literal). sprintf copies that null terminator to connectcmd as well.
When do you really have to add them?
When you're manually building a string, or using a library function whose documentation explicitly states that it isn't going to add a terminating null.
In the following program, strtok() works as expected in the major part but I just can't comprehend the reason behind one finding. I have read about strtok() that:
To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.
Source: http://www.cplusplus.com/reference/cstring/strtok/
And as we know, strtok() places a \0 at the end of each token. But in the following program, the last delimiter is a dot(.), after which there is Toad between that dot and the quotation mark ("). Now the dot is a delimiter in my program, but there is no delimiter after Toad, not even a white space (which is a delimiter in my program). Please clear the following confusion arising from this premise:
Why is strtok() considering Toad as a token even though it is not between 2 delimiters? This is what I read about strtok() when it encounters a null character (\0):
Once the terminating null character of str has been found in a call to strtok, all subsequent calls to this function with a null pointer as the first argument return a null pointer.
Source: http://www.cplusplus.com/reference/cstring/strtok/
Nowhere does it say that once a null character is encountered, a pointer to the beginning of the token is returned (we don't even have a token here, as we didn't get an end of the token: no delimiter character was found after the scan began from the beginning of the token (i.e. from 'T' of Toad), we only found a null character, not a delimiter). So why is the part between the last delimiter and the quotation mark of the argument string considered a token by strtok()? Please explain this.
Code:
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] =" Falcon,eagle-hawk..;buzzard,gull..pigeon sparrow,hen;owl.Toad";
char * pch=strtok(str," ;,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ;,.-");
}
return 0;
}
Output:
Falcon
eagle
hawk
buzzard
gull
pigeon
sparrow
hen
owl
Toad
The standard's specification of strtok (7.24.5.8) is pretty clear. In particular paragraph 4 (emphasis added by me) is directly relevant to the question, if I understand that correctly:
3 The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.
4 The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.
In a call
char *where = strtok(string_or_NULL, delimiters);
the token (a pointer to which is) returned - if any - extends from the first non-delimiter character found from the starting position (inclusive) until the next delimiter character (exclusive), if one exists, or the end of the string, if no later delimiter character exists.
The linked description doesn't explicitly mention the case of a token extending until the end of the string, as opposed to the standard, so it is incomplete in that respect.
Going to the description in POSIX for strtok(), the description says:
char *strtok(char *restrict s1, const char *restrict s2);
A sequence of calls to strtok() breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a byte from the string pointed to by s2. The first call in the sequence has s1 as its first argument, and is followed by calls with a null pointer as their first argument. The separator string pointed to by s2 may be different from call to call.
The first call in the sequence searches the string pointed to by s1 for the first byte that is not contained in the current separator string pointed to by s2. If no such byte is found, then there are no tokens in the string pointed to by s1 and strtok() shall return a null pointer. If such a byte is found, it is the start of the first token.
The strtok() function then searches from there for a byte that is contained in the current separator string. If no such byte is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token shall return a null pointer. If such a byte is found, it is overwritten by a NUL character, which terminates the current token. The strtok() function saves a pointer to the following byte, from which the next search for a token shall start.
Note the second sentence of the third paragraph:
If no such byte is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token shall return a null pointer.
This clearly states that in the example in the question, Toad is indeed a token. One way to think of it is that the list of delimiters always includes the NUL '\0' at the end of the delimiter string.
Having diagnosed that, note that strtok() is not a good function to use — it is not thread safe or reentrant. On Windows, you can use strtok_s() instead; on Unix, you can usually use strtok_r(). These are better functions because they don't store internally the pointer at which the search is to resume.
Because strtok() is not reentrant, you cannot call a function that uses strtok() from inside a function that itself uses strtok() while it is using strtok(). Also, any library function that uses strtok() must be clearly identified as doing so because it cannot be called from a function that is using strtok(). So, using strtok() makes life hard.
The other problem with the strtok() family of functions (and with strsep(), which is related) is that they overwrite the delimiter; you can't find out what the delimiter was after the tokenizer has tokenized the string. This can matter in some applications, such as parsing shell command lines, where it matters whether the delimiter is a pipe or a semicolon or an ampersand (or ...). So shell parsers usually don't use strtok(), despite the number of questions on SO about shells where the parser does use strtok().
Generally, you should steer clear of plain strtok(), and it is up to you to decide whether strtok_r() or strtok_s() is appropriate for your purposes.
Because cplusplus.com isn't telling you the whole story. Cppreference.com has a better description.
Cplusplus.com also fails to mention that strtok is not thread-safe, and only documents the strtok function of the C++ programming language, whereas cppreference.com does mention the thread safety issue and documents the strtok functions of both the C and the C++ programming languages.
Are you perhaps just mis-reading the description?
Once the terminating null character of str has been found in a call to
strtok, all subsequent calls to this function with a null pointer
as the first argument return a null pointer.
Given 'subsequent', I'm reading this as every call to strtok after the one that discovered \0, not necessarily the current one itself. So, the definition is consistent with behavior (and with what you would expect from strtok).
strtok breaks a string into a sequence of tokens, separated by the given delimiters.
Delimiters only separate tokens; they do not necessarily terminate them on both sides.
Is there a C function which can do the equivalent of find_first_not_of, receiving a string to search and a set of characters and returning the first character in the string that's not of the set?
The strspn function will get you most of the way there. (You just have to massage the return value a bit.)