How does C know the end of my string?

I have a program in which I want to remove the spaces from a string. I wanted an elegant way to do it, and I found the following code in a forum (I've changed it a little to make it more readable):
char* line_remove_spaces (char* line)
{
char *non_spaced = line;
int i;
int j = 0;
for (i = 0; i <= strlen(line); i++)
{
if ( line[i] != ' ' )
{
non_spaced[j] = line[i];
j++;
}
}
return non_spaced;
}
As you can see, the function takes a string and, reusing the same memory, keeps only the non-space characters. It works!
However, according to Wikipedia, a string in C is a "null-terminated string". I always thought so, and everything was fine. But here is the problem: we never put a null character at the end of the non_spaced string, and yet somehow the compiler knows that the string ends at the last character written to non_spaced. How does it know?

This does not happen by magic. You have in your code:
for (i = 0; i <= strlen(line); i++)
^^
The loop index i runs up to and including strlen(line), and at that index the character array holds the nul character, so it gets copied as well. As a result, the new string has a nul character at exactly the right index.
If you had written
for (i = 0; i < strlen(line); i++)
^^
then you would have to add the nul character yourself:
for (i = 0; i < strlen(line); i++)
{
if ( line[i] != ' ' )
{
non_spaced[j] = line[i];
j++;
}
}
// put nul character (non_spaced and line point to the same buffer)
non_spaced[j] = '\0';
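A variant of the question's function that calls strlen() only once (rather than on every iteration) while keeping the `<=` bound, so the '\0' at index len is copied like any other non-space character. This is a minimal sketch; the name `remove_spaces_once` is hypothetical.

```c
#include <string.h>

/* Removes ' ' characters from line in place.
   strlen() is evaluated once; the loop runs up to and including index len,
   so the terminating '\0' is copied just like any other non-space character. */
char *remove_spaces_once(char *line)
{
    size_t len = strlen(line);
    size_t j = 0;
    for (size_t i = 0; i <= len; i++)
    {
        if (line[i] != ' ')
            line[j++] = line[i];
    }
    return line;
}
```

For example, a buffer holding " a b  c " ends up holding "abc", and a buffer of only spaces ends up as the empty string.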

Others have answered your question already, but here is a faster, and perhaps clearer version of the same code:
void line_remove_spaces (char* line)
{
char* non_spaced = line;
while(*line != '\0')
{
if(*line != ' ')
{
*non_spaced = *line;
non_spaced++;
}
line++;
}
*non_spaced = '\0';
}

The loop uses <= strlen so you will copy the null terminator as well (which is at i == strlen(line)).

You could try it. Debug it while it is processing a string containing only one space: " ". Watch carefully what happens to the index i.

How do you know that it "knows"? The most likely scenario is that you're simply getting lucky with undefined behavior, and that there happens to be a '\0' character after the valid bytes of line.
It's also highly likely that you're not noticing spaces at the end, which might be printed before the stray, lucky '\0' is reached.
A few other points:
There's no need to write this using indexing.
It's not very efficient to call strlen() on each loop iteration.
You might want to use isspace() to remove more whitespace characters.
Here's how I would write it, using isspace() and pointers:
#include <ctype.h>
char * remove_spaces(char *str)
{
char *ret = str, *put = str;
for(; *str != '\0'; str++)
{
if(!isspace((unsigned char) *str))
*put++ = *str;
}
*put = '\0';
return ret;
}
Note that this does terminate the space-less version of the string, so the returned pointer is guaranteed to point at a valid string.

The string parameter of your function is null-terminated, right?
And in the loop, the null character of the original string also gets copied into the returned non-spaced string. So the non-spaced string is actually null-terminated as well!
To the compiler, the null character is just another byte of data with no special treatment; it is the string functions that use it as a convenient marker for detecting the end of a string.
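To see that '\0' is an ordinary byte that library functions simply search for, here is a sketch of how a strlen-style function can find the end of a string. The name `my_strlen` is hypothetical; the real strlen() may be implemented very differently internally, but it returns the same result.

```c
#include <stddef.h>

/* Counts bytes until the first '\0'; nothing else marks the end of a C string. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

For `char buf[] = "ab\0cd";`, my_strlen(buf) is 2 while sizeof buf is 6: the array still holds all the bytes, but the scan stops at the first '\0'.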

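If you use <= strlen(line), the loop also visits index strlen(line). strlen() itself does not count the '\0', but that index is exactly where the '\0' sits, so it gets copied and your program works. You can confirm this by stepping through the loop in a debugger.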

Related

Cannot assign chars from one string to another

So I have a function that takes a string, and strips out special format characters, and assigns it to another string for later processing.
A sample call would be:
act_new("$t does $d");
It should strip out the $t and the $d and leave the second string as " does ", but it's not assigning anything. I am getting back into programming after quite a few years of inactivity, and this is someone else's code (a MUD codebase, Rom), but I feel like I am missing something fundamental about pointer assignments. Any tips?
(This is truncated code, the rest has no operations on str or point until much later)
void act_new(const char *format)
{
const char *str;
char *point;
str = format;
while ( *str != '\0' ) {
if ( *str != '$' ) {
*point++ = *str++;
continue;
}
}
}
You need to increment str every time through the loop, not only when you assign to point. Otherwise you end up in an infinite loop when the character doesn't match the if condition.
You also want to skip the character after $, so you have to increment str twice when you encounter $.
The code is simpler if you use a for loop and array indexing rather than pointer arithmetic.
size_t len = strlen(format);
for (size_t i = 0; i < len; i++) {
if (format[i] == '$') {
i++; // extra increment to skip character after $
} else {
*point++ = format[i];
}
}
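For comparison, the question's while loop can also be repaired without switching to indexing: advance str on every path, skip the character after a '$' only when it is not the terminator, and terminate the output. This sketch assumes the caller passes an output buffer of at least strlen(format) + 1 bytes; the function name `strip_format_codes` and the `out` parameter are hypothetical.

```c
#include <string.h>

/* Copies format into out, dropping each '$' and the character after it.
   Assumption of this sketch: out has room for strlen(format) + 1 bytes. */
void strip_format_codes(const char *format, char *out)
{
    const char *str = format;
    char *point = out;
    while (*str != '\0') {
        if (*str == '$') {
            str++;                 /* skip the '$' itself */
            if (*str != '\0')
                str++;             /* skip the code letter, unless '$' was last */
            continue;
        }
        *point++ = *str++;         /* ordinary character: copy it and advance both */
    }
    *point = '\0';                 /* terminate the destination string */
}
```

With this sketch, "$t does $d" becomes " does ", and a trailing "$" (as in "abc$") is dropped without reading past the terminator.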
There are a few problems with your code, as pointed out in the comments:
point is not initialized (garbage pointer value)
continue doesn't do anything
infinite loop if a $ is encountered
When writing the function, keep in mind that after a $ you should skip one extra character only if the $ is not the last character of the string (not counting the '\0').
Since you know how many times you need to loop, a for loop is better suited and, as a bonus, you don't have to explicitly check if the character after a $ is '\0' when skipping an extra character in format (renamed src below). Also, don't forget to terminate the destination string.
This code will take care of those things for you:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void act_new(const char *src)
{
const size_t length = strlen(src);
char * const dst = malloc(length + 1); // room for every character plus the '\0'
if(dst == NULL)
return; // Error handling left out
char *point = dst;
for(size_t i = 0; i < length; ++i)
{
if(src[i] == '$')
{
++i; // skip the character after '$' as well
continue;
}
*point++ = src[i];
}
*point = '\0'; //Terminate string properly
printf("%s\n", dst);
free(dst);
}

How to return empty string from a function in C?

How should I return an empty string from a function? I tried using lcp[i] = ' ' but it creates an error. Then I used lcp[i] = 0 and it returned an empty string. However, I do not know if it's right.
Also, is it necessary to use free(lcp) in the caller function? Since I could not free and return at the same time.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LEN 50
char *find_LCP(char str1[], char str2[]);
char *find_LCP(char str1[], char str2[]){
char * lcp = malloc(MAX_LEN * sizeof(char));
int a = strlen(str1);
int b = strlen(str2);
int min = a < b ? a : b;
for(int i = 0; i < min; i++){
if(str1[i] == str2[i])
lcp[i] = str1[i];
else
lcp[i] = 0;
}
return lcp;
}
int main()
{
char str1[MAX_LEN], str2[MAX_LEN];
char * lcp;
printf("Enter first word > ");
scanf("%s", str1);
printf("Enter second word > ");
scanf("%s", str2);
lcp = find_LCP(str1, str2);
printf("\nLongest common prefix: '%s'\n", lcp);
free(lcp);
return 0;
}
An "empty" string is just a string with the first byte zero, so you can write:
s[0] = 0;
However, it is not clear what you are trying to do. The LCP of "foo" and "fob" is "fo", not the empty string.
You can also return as soon as you find the first non-matching character, no need to go until the end.
Further, you can simply pass the output string as a parameter and have lcp be an array. That way you avoid both malloc and free:
char lcp[MAX_LEN];
...
find_LCP(lcp, str1, str2);
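A sketch of the output-parameter version described above, which also stops at the first mismatch. It assumes lcp points to an array with room for the shorter input plus one byte (true for the question's MAX_LEN buffers); this signature differs from the question's and is illustrative only.

```c
#include <string.h>

/* Writes the longest common prefix of str1 and str2 into lcp.
   The loop stops at the first mismatch or at the end of either string,
   and lcp is always NUL-terminated, so an immediate mismatch yields "". */
void find_LCP(char *lcp, const char *str1, const char *str2)
{
    size_t i = 0;
    while (str1[i] != '\0' && str1[i] == str2[i])
    {
        lcp[i] = str1[i];
        i++;
    }
    lcp[i] = '\0';
}
```

In main, this would be called as `char lcp[MAX_LEN]; find_LCP(lcp, str1, str2);` with no malloc() or free() needed.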
If you want to empty a string without using a for loop, you can do
lcp[0] = 0;
but clearing the string inside a for loop, the way you did it, is also valid.
There are other ways to empty the string character by character in a for loop, for example:
lcp[i] = '\0';
which is the right way to clear a string letter by letter, as you are trying to do with your loop.
But if you don't want to use a loop and simply want to clear a string, you can do this:
memset(buffer,0,strlen(buffer));
but this only zeroes the bytes up to the first NUL character.
If buffer is declared as an array (not a pointer), you can clear all of it with:
memset(buffer,0,sizeof(buffer));
Your program has a bug: if you supply two identical strings (or one string is a prefix of the other), lcp[i] = 0; never executes, which means your function will return a string that is not NUL-terminated. This causes undefined behavior when you use that string in the printf in main.
The fix for this is easy, NUL-terminate the string after the loop:
int i;
for (i = 0; i < min; i++){
if(str1[i] == str2[i])
lcp[i] = str1[i];
else
break;
}
lcp[i] = 0;
As for the answer to the question, an empty string is one which has the NUL-terminator right at the start. We've already handled that as we've NUL-terminated the string outside the loop.
Also, is it necessary to use free(lcp) in the caller function?
In this case, it is not required as the allocated memory will get freed when the program exits, but I'd recommend keeping it because it is good practice.
As the comments say, you can use calloc instead of malloc which fills the allocated memory with zeros so you don't have to worry about NUL-terminating.
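On the free() question: if a function returns allocated memory on some paths, the caller's code stays simplest when it always does, even for an empty result, so that free() is always correct. A minimal sketch of allocating an empty string; the name `make_empty_string` is hypothetical.

```c
#include <stdlib.h>

/* Returns a 1-byte heap buffer holding the empty string, or NULL if
   allocation fails. The caller must free() the result. */
char *make_empty_string(void)
{
    char *s = malloc(1);
    if (s != NULL)
        s[0] = '\0';   /* an empty string is just a leading '\0' */
    return s;
}
```

calloc(1, 1) would produce the same result, since it zero-fills the byte.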
In the spirit of code golf: there's no need to calculate string lengths. Pick either string and iterate through it until the current character is either the null character or differs from the corresponding character in the other string. Store that index, then copy that many bytes.
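#include <stdlib.h>
#include <string.h>
char *getlcp(const char *s1, const char *s2) {
size_t i = 0;
while (s1[i] == s2[i] && s1[i] != '\0') ++i;
char *lcp = calloc(i + 1, sizeof(*lcp)); // calloc zero-fills, so lcp[i] is already '\0'
if (lcp == NULL) return NULL;
memcpy(lcp, s1, i);
return lcp;
}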
P.S. If you don't care about preserving one of the input strings, you can simplify the code even further: just return the index (the position just past the last character of the common prefix) from the function, then put '\0' at that index in one of the strings.
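A sketch of that variant, assuming the first string lives in writable memory: compute the length of the common prefix, then cut the string there. `lcp_length` is a hypothetical name.

```c
#include <stddef.h>

/* Returns the length of the common prefix of s1 and s2,
   i.e. the index of the first position where they differ or one ends. */
size_t lcp_length(const char *s1, const char *s2)
{
    size_t i = 0;
    while (s1[i] != '\0' && s1[i] == s2[i])
        i++;
    return i;
}
```

Usage: `char a[] = "foobar"; a[lcp_length(a, "foozle")] = '\0';` leaves a holding "foo", with no allocation at all.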

Printing a string due to a new line

Is there an efficient (in terms of performance) way to print an arbitrary string, but only up to the first newline character in it (excluding the newline itself)?
Example:
char *string = "Hello\nWorld\n";
printf(foo(string + 6));
Output:
World
If you are concerned about performance this might help (untested code):
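#include <alloca.h> /* alloca() is non-standard; <alloca.h> is its header on glibc and most Unix systems */
#include <stdio.h>
#include <string.h>
void MyPrint(const char *str)
{
size_t len = strlen(str);
char *temp = alloca(len + 1); /* room for every character plus the '\0' */
size_t i;
for (i = 0; i < len; i++)
{
char ch = str[i];
if (ch == '\n')
break;
temp[i] = ch;
}
temp[i] = 0; /* i <= len, so this stays inside temp */
puts(temp); /* note: puts() appends its own newline */
}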
strlen is fast, alloca is fast, and copying the string up to the first \n is fast. puts is faster than printf, but it is most likely far slower than the three operations mentioned before put together.
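#include <stdio.h>
#include <string.h>
/* Writes in to stdout up to (not including) the first delim; writes nothing if delim is absent. */
size_t writetodelim(char const *in, int delim)
{
char const *end = strchr(in, delim);
if (!end)
return 0;
return fwrite(in, 1, end - in, stdout);
}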
This can be generalized somewhat (pass the FILE* to the function), but it's already flexible enough to terminate the output on any chosen delimiter, including '\n'.
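The generalization mentioned above might look like the following sketch. Passing the FILE * comes from the text; writing the whole string when the delimiter is absent is an added design choice of this sketch (the version above writes nothing in that case). The name `fwritetodelim` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Writes in to out up to (not including) the first delim character.
   If delim does not occur, the whole string is written.
   Returns the number of bytes written, as reported by fwrite(). */
size_t fwritetodelim(FILE *out, const char *in, int delim)
{
    const char *end = strchr(in, delim);
    if (end == NULL)
        end = in + strlen(in);   /* no delimiter: stop at the terminating '\0' */
    return fwrite(in, 1, (size_t)(end - in), out);
}
```

Calling `fwritetodelim(stdout, "Hello\nWorld\n", '\n')` writes "Hello"; passing stderr or an opened file works the same way.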
Warning: Do not use printf without a format specifier to print a variable string (or a string through a variable pointer). Use puts() instead, or printf("%s", string).
C strings are terminated by '\0' (NUL), not by newline. So, the functions print until the NUL terminator.
You can, however, use your own loop with putchar(). Whether that carries a performance penalty would have to be measured. Normally printf does much the same thing inside the library and might even be slower, since it has to handle many additional cases (such as parsing the format string), so your own loop might very well be faster.
for ( char *sp = string + 6 ; *sp != '\0'; sp++ ) {
if ( *sp == '\n' ) break; // newline will not be printed
putchar(*sp);
}
(Move the if-line to the end of the loop if you want newline to be printed.)
An alternative would be to limit the length of the string to print, but that would require finding the next newline before calling printf.
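Finding the next newline first is cheap with strcspn(), and printf's precision field (`%.*s`) then limits how many characters are printed, so no copy is needed. A sketch; the function name is hypothetical, and it assumes the line is shorter than INT_MAX characters because the precision argument is an int.

```c
#include <stdio.h>
#include <string.h>

/* Prints s up to (not including) the first '\n', or all of s if there is none.
   Returns printf's result: the number of characters written, or a negative value on error. */
int print_until_newline(const char *s)
{
    size_t n = strcspn(s, "\n");          /* length of the leading segment without '\n' */
    return printf("%.*s", (int)n, s);     /* precision caps the characters printed */
}
```

With the question's example, `print_until_newline(string + 6)` prints "World".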
I don't know if it is fast enough, but there is a way to build a string containing the source string up to the first newline character using only one standard function, sscanf with a scanset:
char *string = "Hello\nWorld\nI love C"; // Example of your string
static char newstr[256]; // Large enough for the result; static, so it starts zero-filled
sscanf(string + 6, "%255[^\n]", newstr); // Reads up to (not including) the first '\n'; plain %s would also stop at spaces
printf("%s\n", newstr); // Print the result
I guess there is no more efficient way than simply looping over your string until you find the first \n in it. As Olaf mentioned, a string in C ends with a terminating \0, so if you want to use printf to print the string you need to make sure it contains the terminating \0, or you could use putchar to print the string character by character.
If you want to provide a function creating a string up to the first found new line you could do something like that:
#include <stdio.h>
#define MAX 256
/* Copies string into ret up to (not including) the first '\n'.
ret must hold MAX bytes; at most MAX - 1 characters are copied. */
void foo(const char* string, char *ret)
{
size_t i;
for (i = 0; i < MAX - 1 && string[i] != '\0'; i++)
{
if (string[i] == '\n') break;
ret[i] = string[i];
}
ret[i] = '\0'; /* terminate right after the last copied character */
}
int main(void)
{
const char* string = "Hello\nWorld\n";
char ret[MAX];
foo(string, ret);
printf("%s\n", ret);
foo(string+6, ret);
printf("%s\n", ret);
return 0;
}
This will print
Hello
World
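Another fast way, if you can modify the string and the newline character is truly unwanted:
char *nl = strchr(string, '\n'); // string must point to writable memory, not a string literal
if (nl != NULL)
*nl = '\0';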

Reading from textfile in C, finding newline and blank space in pattern

I am reading from a text file that follows a pattern. I am currently reading the file using tokens. There are many lines, and the pattern breaks if there are any spaces or line breaks after the pattern is finished. This is what I have tried so far:
char newLine[10];
strcpy(newLine, "\n");
int stringValue;
....
*readfromfile*
{
...
stringValue = strcmp(token, newLine);
if(stringValue == 0)
{
...
So if there is a newline or a blank space after the line, I want the if statement to succeed. Does anyone know how to solve this? Is it that token doesn't acquire the characters " " and "\n"? If so, what can I do to solve it?
Just trim(newline) before calling strcmp(). trim() will remove any blanks at the beginning or the end of the string (note that trim() is not a standard C function, so you have to write it yourself).
You can find an example in C in this question.
There's another way: if there are blanks at the end of newLine that you don't want strcmp() to compare, use strncmp() instead (with the length of token).
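If the goal is for the if statement to succeed whenever the token contains nothing but spaces or line breaks, comparing against "\n" is too strict. strspn() can check that every character of the token belongs to a set of blank characters. A sketch; `is_blank_token` is a hypothetical helper, and the set " \t\r\n" is an assumption about which characters count as blank.

```c
#include <stdbool.h>
#include <string.h>

/* True if token consists only of spaces, tabs, carriage returns, or newlines.
   An empty token also counts as blank. */
bool is_blank_token(const char *token)
{
    return strspn(token, " \t\r\n") == strlen(token);
}
```

In the reading loop, `if (is_blank_token(token)) { ... }` would then replace the strcmp() test.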
As per your comment, you are reading the buffer first and then want to remove characters; in that case, do as in the example below (assuming your buffer size is 1024):
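#include <string.h>
/* Removes every ' ' and '\n' from string in place and returns the new length.
Assumes string points to a buffer of at least 1024 bytes. */
int RemoveTokens(char *string)
{
char buf[1024] = {0};
int i = 0;
int j = 0;
strncpy(buf, string, sizeof buf - 1); // the last byte of buf stays '\0'
memset(string, 0x00, 1024);
for (i=0; i<1024 && buf[i] != '\0'; i++)
{
if (' ' != buf[i] && '\n' != buf[i])
{
string[j] = buf[i];
j++;
}
}
return j;
}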
Hope this helps you do it easily. :D

Writing String.trim() in C [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Painless way to trim leading/trailing whitespace in C?
Trim a string in C
I was writing a String trim method in C, and this is the code I came up with. I think it does the job of eliminating leading and trailing whitespace; however, I wish the code could be cleaner. Can you suggest improvements?
void trim(char *String)
{
int i=0;j=0;
char c,lastc;
while(String[i])
{
c=String[i];
if(c!=' ')
{
String[j]=c;
j++;
}
else if(lastc!= ' ')
{
String[j]=c;
j++;
}
lastc = c;
i++;
}
Does this code look clean?
It doesn't look clean. Assuming the first character is a space, you're using lastc with an undefined value. You're leaving one space at the end (if there's a space at the end, when it's hit c will be a space and lastc won't).
You're also not terminating the string. Assuming you fix the uninitialized lastc problem, you'll transform "  abc" into "abcbc", since the string is never shortened at any point.
The code also collapses multiple spaces inside the string. This isn't what you described; is it desired behavior?
It often makes your code more readable if you make judicious use of the standard library functions - for example, isspace() and memmove() are particularly useful here:
#include <string.h>
#include <ctype.h>
void trim(char *str)
{
char *start, *end;
/* Find first non-whitespace */
for (start = str; *start; start++)
{
if (!isspace((unsigned char)start[0]))
break;
}
/* Find start of last all-whitespace */
for (end = start + strlen(start); end > start + 1; end--)
{
if (!isspace((unsigned char)end[-1]))
break;
}
*end = 0; /* Truncate last whitespace */
/* Shift from "start" to the beginning of the string */
if (start > str)
memmove(str, start, (end - start) + 1);
}
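The trim() above leaves interior spaces alone. If the question's collapsing of interior runs of spaces is in fact wanted, a single pass can do it: drop leading spaces, write one space only when a non-space character follows a run, and terminate at the end. A sketch that, like the question's code, treats only ' ' as a space; the name `collapse_spaces` is hypothetical.

```c
#include <string.h>

/* In place: removes leading and trailing ' ' and replaces each interior
   run of ' ' with a single space. The write index j never passes the
   read index i, so overwriting the buffer as we go is safe. */
void collapse_spaces(char *s)
{
    size_t j = 0;
    int pending_space = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == ' ') {
            if (j > 0)
                pending_space = 1;   /* remember the run; leading spaces are dropped */
        } else {
            if (pending_space) {
                s[j++] = ' ';        /* one space stands in for the whole run */
                pending_space = 0;
            }
            s[j++] = s[i];
        }
    }
    s[j] = '\0';                     /* a trailing run is never written */
}
```

For example, "  a   b  " becomes "a b". Swapping the `s[i] == ' '` test for `isspace((unsigned char)s[i])` would extend it to tabs and newlines.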
There are several problems with that code. It only checks for spaces, not tabs or newlines. You are copying the entire non-whitespace part of the string. And you are using lastc before setting it.
Here's an alternate version (compiled but not tested):
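#include <ctype.h>
#include <string.h>
/* Cuts off trailing whitespace and returns a pointer to the first
non-whitespace character; returns NULL if string is all whitespace. */
char *trim(char *string)
{
char *start;
int len = strlen(string);
int i;
/* Find the first non whitespace char */
for (i = 0; i < len; i++) {
if (! isspace((unsigned char)string[i])) {
break;
}
}
if (i == len) {
/* string is all whitespace */
return NULL;
}
start = &string[i];
/* Remove trailing white space, starting at the last character (string[len] is the '\0') */
for (i = len - 1; i >= 0; i--) {
if (isspace((unsigned char)string[i])) {
string[i] = '\0';
} else {
break;
}
}
return start;
}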
There are some problems: lastc could be used uninitialized. You could also use a for loop instead of a while loop, for example. Furthermore, trim/strip functions usually remove spaces, tabs, and newlines.
Here's a solution using pointers that I wrote quite a while ago:
#include <string.h>
void trim(char *str)
{
char *ptr = str;
/* Skip leading whitespace */
while(*ptr == ' ' || *ptr == '\t' || *ptr == '\r' || *ptr == '\n') ++ptr;
/* Find the terminating '\0' */
char *end = ptr;
while(*end) ++end;
/* Back up over trailing whitespace; end stops one past the last kept character */
while(end > ptr && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r' || end[-1] == '\n')) --end;
memmove(str, ptr, end-ptr);
str[end-ptr] = 0;
}
Here is my solution.
Short, simple, clean, commented, and lightly tested.
It uses the "isspace" classification function, so you can easily change your definition of "white space" to be trimmed.
#include <ctype.h>
#include <string.h>
void trim(char* String)
{
int dest;
int src=0;
int len = strlen(String);
// Advance src to the first non-whitespace character.
while(isspace((unsigned char)String[src])) src++;
int newlen = len - src;
// Copy the rest of the string, including its '\0', to the "front" of the buffer.
for(dest=0; src<=len; dest++, src++)
{
String[dest] = String[src];
}
// Working backwards from the new last character, set all trailing whitespace to '\0'.
for(dest=newlen-1; dest>=0 && isspace((unsigned char)String[dest]); --dest)
{
String[dest] = '\0';
}
}
Instead of comparing a character with the space character ' ', I'd use the isspace() function, which is declared in <ctype.h>.
I don't know about clean, but I find it hard to follow. If I needed to do this I'd initially think of it in two phases:
Figure out how many characters to drop from the beginning, then memmove the rest of the string (including the null terminator) to the start address. (You might not need the memmove if you are allowed to return a different start pointer, but if so you need to be very careful with memory hygiene.)
Figure out how many characters to drop from the (new) end and set a new null terminator there.
I might then look more closely at a one-pass solution like you seem to be trying to implement, but only if there was a speed problem.
By the way, you probably want to use isspace() rather than checking only for space.
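The two-phase approach described above can be sketched as follows. It uses isspace(), as suggested, and keeps the original start pointer by shifting the text with memmove(); the name `trim_two_phase` is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Trims leading and trailing whitespace from s in place. */
void trim_two_phase(char *s)
{
    /* Phase 1: count leading whitespace, then shift the rest
       (including its '\0') to the start of the buffer. */
    size_t lead = 0;
    while (s[lead] != '\0' && isspace((unsigned char)s[lead]))
        lead++;
    size_t len = strlen(s + lead);
    memmove(s, s + lead, len + 1);

    /* Phase 2: move the terminator back over trailing whitespace. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';
}
```

memmove() is required rather than memcpy() because the source and destination ranges overlap.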
