Linear search vs strlen - c

In one of my assignments, I was required to use linear search to find the last char of a string and point a pointer at it. It is just a simple string, e.g. "blah blah blah". To do this I used
int length = strlen(string);
to find the length, then used a for loop
for (i=1;i<length;i++){
if (string[i]==0){;
end_pointer = &string[i-1];
}
Is there any difference between using linear search for 0 to set the pointer as opposed to using length:
end_pointer = &string[length-1];

I think what your professor is really looking for is:
int i = 0;
while( '\0' != string[i] ) i++;
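That is the search. For best efficiency, assign the pointer once, after the loop has completed, rather than on every iteration (this assumes the string is not empty, otherwise i - 1 is out of range):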
char * end_pointer = &string[i - 1];

I think I need to explain how strings are stored in C.
Since strings can, in general, have arbitrary length, there needs to be a way to represent that length along with the content of the string. The two most trivial ways of representing the length are as follows.
explicitly keep track of length
use a special token to denote the end of string
The C language went with the second option. Specifically, '\0' is used to denote the end of the string. So if you have a char * p, then that is a pointer† that points to the first character; the second character in the string is p[1] == *(p+1), and so on.
So how do you get the length of the string? In the first method of representing strings (NOT the C way), it's already explicitly available. With C strings, you have to start at the beginning and count how many characters there are until the special token ('\0' in C). This is called a linear search for the end of string token.
strlen implements such a linear search, but it sounds like you are not supposed to use it. Regardless, strlen doesn't actually give you the pointer to the end of the string; you would have to compute it as
char *endPtr = string + strlen(string);
In this case, endPtr will actually point to the null-termination character, which is just past the end of the string. This is a common paradigm in C for specifying ranges: the start of the range (string in this case) is usually inclusive, and the end of the range (endPtr in this case) is usually exclusive.
† char * could just be a pointer to a single char and not necessarily a string, but that doesn't concern us here.
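As a rough illustration of the half-open range described above, the sketch below finds the terminator with a hand-written linear search and then walks the range from the start (inclusive) to the end (exclusive). The names find_end and count_char are made up for this example and are not library functions.

```c
#include <stddef.h>

/* Linear search for the terminator. Returns a pointer to the '\0' itself,
   which is the exclusive end of the range. Hypothetical helper. */
const char *find_end(const char *string)
{
    const char *p = string;
    while (*p != '\0') {
        ++p;
    }
    return p;
}

/* Counts occurrences of c in the half-open range [begin, end). */
size_t count_char(const char *begin, const char *end, char c)
{
    size_t n = 0;
    for (const char *p = begin; p != end; ++p) {
        if (*p == c) {
            ++n;
        }
    }
    return n;
}
```

For a string s, find_end(s) - s is the same value strlen(s) returns, and the last character sits at find_end(s) - 1, provided the string is not empty.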

The difference between using a linear search for '\0' to set the pointer as opposed to using length derived from strlen() is a slight potential efficiency change.
If one rolls their own code or uses the standard library function strlen(), it is still the same order of complexity, O(n). If anything, strlen() has the potential to be more efficient.
If the goal is to write your own code and point to the last char in a string (not the '\0'), handle "" as a special case; otherwise perform a simple loop:
char *LastCharPointer(char *string) {
if (*string == '\0') {
return NULL;
}
do {
string++;
} while (*string);
return string - 1;
}
If the goal is to point to the null character '\0':
char *NullCharPointer(char *string) {
while (*string) string++;
return string;
}

I am assuming the code you pasted above is not the actual code you wrote; it would be:
for( i = 0; i <= strlen( string ); i++ ) {
if( string[ i ] == '\0' ){
end_pointer = &string[i - 1];
}
}
You can do this in two ways:
char * end_pointer = &string[ strlen( string ) - 1 ];
or
for( i = 0; string[ i ] ; i++ );
char * end_pointer = &string[ i - 1 ];
Effectively, when you call strlen(), it runs in linear time to calculate the length. Once you have the length you can index into the string directly, or you can search for the terminating '\0' character yourself. All of this assumes that your string is null-terminated and not empty.
EDIT : The second option had a missing ";".

Related

CamelCase to snake_case in C without tolower

I want to write a function that converts CamelCase to snake_case without using tolower.
Example: helloWorld -> hello_world
This is what I have so far, but the output is wrong because I overwrite a character in the string here: string[i-1] = '_';.
I get hell_world. I don't know how to get it to work.
void snake_case(char *string)
{
int i = strlen(string);
while (i != 0)
{
if (string[i] >= 65 && string[i] <= 90)
{
string[i] = string[i] + 32;
string[i-1] = '_';
}
i--;
}
}
This conversion means, aside from converting a character from uppercase to lowercase, inserting a character into the string. This is one way to do it:
iterate from left to right,
if an uppercase character is found, use memmove to shift all characters from this position to the end of the string one position to the right, and then assign the to-be-inserted value to the current character,
stop when the null-terminator (\0) has been reached, indicating the end of the string.
Iterating from right to left is also possible, but since the choice is arbitrary, going from left to right is more idiomatic.
A basic implementation may look like this:
#include <stdio.h>
#include <string.h>
void snake_case(char *string)
{
for ( ; *string != '\0'; ++string)
{
if (*string >= 65 && *string <= 90)
{
*string += 32;
memmove(string + 1U, string, strlen(string) + 1U);
*string = '_';
}
}
}
int main(void)
{
char string[64] = "helloWorldAbcDEFgHIj";
snake_case(string);
printf("%s\n", string);
}
Output: hello_world_abc_d_e_fg_h_ij
Note that:
The size of the string to move is the length of the string plus one, to also move the null-terminator (\0).
I am assuming the function isupper is off-limits as well.
The array needs to be large enough to hold the new string, otherwise memmove will perform invalid writes!
The latter is an issue that needs to be dealt with in a serious implementation. The general problem of "writing a result of unknown length" has several solutions. For this case, they may look like this:
1. First determine how long the resulting string will be, reallocate the array, and only then modify the string. Requires two passes.
2. Every time an uppercase character is found, reallocate the string to its current size + 1. Requires only one pass, but frequent reallocations.
3. Same as 2, but whenever the array is too small, reallocate the array to twice its current size. Requires a single pass, and less frequent (but larger) reallocations. Finally, reallocate the array to the length of the string it actually contains.
In this case, I consider option 1 to be the best. Doing two passes is an option if the string length is known, and the algorithm can be split into two distinct parts: find the new length, and modify the string. I can add it to the answer on request.
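A two-pass approach along the lines of option 1 might look roughly like the sketch below. It is not the answer author's code: it returns a freshly allocated copy instead of reallocating the caller's array, it assumes ASCII like the code above, and snake_case_copy is a made-up name.

```c
#include <stdlib.h>

/* Returns a newly allocated snake_case copy of src, or NULL if malloc
   fails. The caller frees the result. Assumes ASCII input. */
char *snake_case_copy(const char *src)
{
    size_t len = 0;   /* length of the result, without the terminator */

    /* Pass 1: each uppercase letter costs one extra '_' in the output. */
    for (const char *p = src; *p != '\0'; ++p) {
        len += (*p >= 'A' && *p <= 'Z') ? 2 : 1;
    }

    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: copy, inserting '_' before each lowercased letter. */
    char *q = out;
    for (const char *p = src; *p != '\0'; ++p) {
        if (*p >= 'A' && *p <= 'Z') {
            *q++ = '_';
            *q++ = (char)(*p + 32);   /* 'A'..'Z' -> 'a'..'z' in ASCII */
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Because the output size is known before anything is written, no write can land outside the allocation, which is the problem the in-place memmove version has to guard against.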

Finding the Beginning of a string in C

To solve a question, I am looking for a way to stop a loop once it has reached the beginning of the string, assuming the loop starts from the end and decrements. Is there an alternative way to do this without finding the length of the string first and decrementing till the number is zero?
Please keep in mind the only functions I can use are malloc, free and write.
This is not possible, because there is nothing special about a string's contents at the beginning. C strings have a "sentinel value" at their end - '\0' - but the first character, and the byte in memory before the first character, can have any value.
is there an alternative way to do this without finding the length of the string first and decrementing till the number is zero?
Apparently you already know where the end of the string is. I suppose you must have a pointer to the terminator character, since you think you do not know the string length.
If finding the length of the string is a viable option at all, however, then you must already know where the beginning is, too. And if you know where the beginning is and you know where the end is, then you already know the length: it is end - beginning. But you do not need to keep a separate counter to iterate backward from the end of a string to the beginning, supposing that you do know where both the end and the beginning are. You can simply use pointer comparisons instead. For example:
int count_a_backwards(const char *beginning, const char *end) {
int count = 0;
for (const char *c = end; c > beginning; ) {
if (*--c == 'a') count += 1;
}
return count;
}
If in fact you do not know where the beginning of the string is, however, then you cannot identify it at all, at least not in the general case. Perhaps you can recognize the beginning if you have some kind of prior knowledge about the string's contents, or about its alignment, or some such, but in general, the beginning of a string cannot be recognized.
Please keep in mind the only functions I can use are malloc, free and write.
If you are using malloc, it returns a pointer to the first byte of the allocated memory. So if the allocated array contains a string, its beginning is known.
The task is then to find the end of the string.
You can either use the standard C function strlen or write your own loop that finds the end of the stored string.
So if you have two pointers, one that points to the beginning of a string and a second that points to the end of the same string, then traversing the string in reverse order is not hard.
Note that if you have a character array that contains a string like this
char s[] = "Hello";
then the expressions s, s + 1, s + 2 and so on each point to a string: "Hello", "ello", "llo" and so on, respectively.
You could find the beginning of a string from a pointer to its end, provided that the first element of the array contains a unique symbol that serves as a sentinel value. However, in general this is a very rare case.
Here is a demonstrative program that shows how you can traverse a string in the reverse order without using standard C string functions except a function that places a string in a dynamically allocated array.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
enum { N = 12 };
char *s = malloc( N );
strcpy( s, "Hello World" );
puts( s );
char *p = s;
while ( *p ) ++p;
while ( p != s ) putchar( *--p );
putchar( '\n');
free( s );
return 0;
}
The program output is
Hello World
dlroW olleH
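The rare sentinel case mentioned above can be sketched as follows. This assumes you control the allocation and reserve one extra byte in front of the text for a '\0' marker. All three function names are invented for illustration, and only malloc and free are used.

```c
#include <stdlib.h>

/* Allocates [ '\0' | text... | '\0' ] and returns a pointer to the first
   text character. The byte before it is a sentinel marking the beginning.
   Returns NULL if malloc fails. Hypothetical helper. */
char *make_guarded_copy(const char *text)
{
    size_t len = 0;
    while (text[len] != '\0') {
        ++len;
    }

    char *block = malloc(len + 2);   /* leading sentinel + text + terminator */
    if (block == NULL) {
        return NULL;
    }
    block[0] = '\0';
    for (size_t i = 0; i <= len; ++i) {   /* <= also copies the terminator */
        block[i + 1] = text[i];
    }
    return block + 1;
}

/* Given a pointer to the terminator, walks backwards to the beginning
   without knowing the length, stopping at the leading sentinel. */
char *find_beginning(char *end)
{
    char *p = end;
    while (p[-1] != '\0') {
        --p;
    }
    return p;
}

/* Releases a string obtained from make_guarded_copy. */
void free_guarded(char *s)
{
    free(s - 1);
}
```

This only works for strings created through make_guarded_copy. For an arbitrary char * the byte before the first character is not yours to read, which is why the general answer remains that the beginning cannot be detected.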

Check if a string is the substring of another string

I was having some problem when trying to check if a string is the substring of another string. Here is the expected output:
Enter a source string: abc
Enter the target string: abcde
findSubstring(): 1
Enter a source string: abcde
Enter the target string: cdef
findSubstring(): -1
And here is my code which used the strstr standard string library:
int main()
{
char sourceStr[40], targetStr[40];
printf("Enter a source string: ");
gets(sourceStr);
printf("Enter the target string: ");
gets(targetStr);
printf("findSubstring(): %d\n", findSubstring(sourceStr, targetStr));
return 0;
}
int findSubstring(char *s, char *t) {
if (strstr(s, t) != NULL) {
return 1;
}
return 0;
}
With this code, it works perfectly. However, I was told that I am not supposed to use any standard string library functions, so how should I modify it?
Sorry for posting the question with no error but I seriously need a head start, as I googled for quite a while and I still have no idea how to do it.
Thanks in advance.
The key here is that when working in C, strings do not exist as an actual data structure as they do in, say, C++'s STL (std::string) or Java's String objects.
Instead, you should treat them as a sequence of individual characters whose end is denoted by an agreed-upon convention: the null character (NUL), which has the value 0 and can be written as '\0' using an escape sequence.
This is why strings are passed around as pointers to char in C, which actually point to the first character in the sequence.
Therefore, you can use pointer arithmetic to check the subsequent characters, until you find a value of 0, which means the string has ended.
By using a snippet like this, you can check any given string character by character.
char * pointer_to_string; //Pointer to start of string
char * pointer_to_character = pointer_to_string; //Start at the first character
while (*pointer_to_character != '\0'){ // Repeat while we haven't found the end
char c = *pointer_to_character; // The character the pointer is pointing to.
//do what you need to with the character
pointer_to_character++; //Now it points to the next character
}
// We exit the loop once the end of the string is found
HOWEVER:
This means you must be careful, since this kind of string manipulation has its risks: you are depending on finding an actual null character that ends the string. If it is not present, the loop keeps reading past the end of the array, which is undefined behavior and in practice often leads to a segmentation fault and a crash.
In short, when using raw pointers in C, you have to be extra careful with what you do with the underlying memory. Using known libraries and not reinventing the wheel tends to be the best option, but since I'm inclined to believe the purpose of the assignment is learning about string representation and pointer arithmetic, we'll go with that.
With this, you should be able to figure out what you need to do to solve the problem.
Well, if you don't want to use standard library, here is one way to do it.
This is simple code that satisfies the purpose:
int FindString(char *Str,const char *SubStr)
{
size_t count = 0 , x , y ;
size_t Str_len = strlen( Str ) ;
size_t SubStr_len = strlen( SubStr );
size_t diff = Str_len - SubStr_len;
if( SubStr_len > Str_len )
return 0;
for( x = 0 ; x <= diff ; x++ )
{
for( y = 0 ; y < SubStr_len ; y++ )
{
if( Str[ x + y ] == SubStr[ y ] )
count++;
else
{
count = 0;
break;
}
}
if( count == SubStr_len )
return 1;
}
return 0;
}
Also, if you want a version that compares case-insensitively, notify me in a comment.
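The FindString function above still relies on strlen. A version that uses no library functions at all could look like the following sketch, which combines the character-by-character loop from the first answer with a nested comparison. The name contains is made up for this example.

```c
/* Returns 1 if needle occurs somewhere in haystack, 0 otherwise.
   An empty needle is treated as found, matching what strstr does. */
int contains(const char *haystack, const char *needle)
{
    if (*needle == '\0') {
        return 1;
    }
    for (const char *start = haystack; *start != '\0'; ++start) {
        const char *h = start;
        const char *n = needle;
        /* Advance both pointers while the characters agree. If haystack
           ends first, '\0' cannot equal a needle character, so we stop. */
        while (*n != '\0' && *h == *n) {
            ++h;
            ++n;
        }
        if (*n == '\0') {
            return 1;   /* reached the end of needle: full match */
        }
    }
    return 0;
}
```

For example, contains("abcde", "abc") returns 1 and contains("abcde", "cdef") returns 0. Mapping the "not found" case to -1, as in the expected output of the question, is a one-line change at the final return.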

How to add a character to the back of a char array when you obtain it with a gets() function in c?

I have an array of characters that I fill using gets().
char inname[30];
gets(inname);
How can I add another character to this array without knowing the length of the string in C? (By length I mean the part that holds actual letters, not the unused memory after it.)
Note: my buffer is long enough for what I want to ask the user (a filename; probably not many people have names longer than 29 characters).
Note that gets is prone to buffer overflow and should be avoided.
Reading a line of input:
char inname[30];
if (fgets(inname, sizeof(inname), stdin) == NULL) {
inname[0] = '\0'; // No input: treat it as an empty string
}
size_t len = strlen(inname);
// Remove trailing newline
if (len > 0 && inname[len-1] == '\n') {
len--;
inname[len] = '\0';
}
Appending to the string:
char *string_to_append = ".";
if (len + strlen(string_to_append) + 1 <= sizeof(inname)) {
// There is enough room to append the string
strcat(inname, string_to_append);
}
Optional way to append a single character to the string:
if (len + 1 < sizeof(inname)) {
// There is room to add another character
inname[len++] = '.'; // Add a '.' character to the string.
inname[len] = '\0'; // Don't forget to nul-terminate
}
As you have asked in comment, to determine the string length you can directly use
strlen(inname);
OR
you can loop through string in a for loop until \0 is found.
Now, after getting the length of the previous string, you can append a new string with
strcat(&inname[prevLength],"NEW STRING");
EDIT:
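To find the null char you can write a for loop like this (i is declared outside the loop so that it is still available afterwards):
int i;
for (i = 0; inname[i] != 0; i++)
{
//do nothing
}
Now you can use i directly to store any character at the end of the string, like:
inname[i] = your_char;
After this, increment i and store the null char (0) at inname[i] so that the string is terminated again.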
P.S.
Every string in C ends with a null character. The ASCII null char '\0' is equal to 0 in decimal.
You know that the final character of a C string is '\0', e.g. the array:
char foo[10]={"Hello"};
is equivalent to this array (the remaining four elements are also '\0'):
['H'] ['e'] ['l'] ['l'] ['o'] ['\0']
Thus you can iterate on the array until you find the '\0' character, and then you can substitute it with the character you want.
Alternatively, you can use the strcat function from the string.h library.
Short answer is you can't.
In C you must know the length of the string to append chars to it. The same applies in other languages, where it happens behind the scenes, but internally the same work must be done.
C strings are defined as sequences of bytes terminated by a special byte, the nul character, which has ASCII code 0 and is written '\0' in C.
You must find this byte, put the new character in its place, and then write the terminator after the appended character. To illustrate this, suppose you have
char hello[10] = "Hello";
then you want to append a '!' after the 'o' so you can just do this
size_t length;
length = strlen(hello);
/* move the '\0' one position after its current position */
hello[length + 1] = hello[length];
hello[length] = '!';
now the string is "Hello!".
Of course, you should take care that hello is large enough to hold one extra character. That is also not automatic in C, which is one of the things I love about working with it because it gives you maximum flexibility.
You can of course use some available functions to achieve this without worrying about moving the '\0'. For example, with
strcat(hello, "!");
you will achieve the same.
Both strlen() and strcat() are defined in string.h header.
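Putting the pieces from the answers together, a bounds-checked append might be sketched like this. The helper name append_char and its capacity parameter are assumptions made for the example: capacity is the total size of the array in bytes, as given by sizeof on the array.

```c
#include <stddef.h>

/* Appends c to the string in buf. Returns 1 on success, 0 if there is no
   room or if no terminator is found within capacity. Hypothetical helper. */
int append_char(char *buf, size_t capacity, char c)
{
    size_t len = 0;
    /* Linear search for the terminator, never reading past the array. */
    while (len < capacity && buf[len] != '\0') {
        ++len;
    }
    if (len + 1 >= capacity) {
        return 0;   /* need room for c and for the new terminator */
    }
    buf[len] = c;
    buf[len + 1] = '\0';
    return 1;
}
```

With char hello[10] = "Hello"; the call append_char(hello, sizeof hello, '!') returns 1 and leaves "Hello!" in the array. When the array is already full the call returns 0 and the string is left unchanged.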

C - start traversing from the middle of a string

Just double checking because I keep mixing up C and C++ or C# but say that I have a string that I was parsing using strcspn(). It returns the length of the string up until the first delimiter it finds. Using strncpy (is that C++ only or was that available in C also?) I copy the first part of the string somewhere else and have a variable store my position. Let's say strcspn returned 10 (so the delimiter is the 10th character)
Now, my code does some other stuff and eventually I want to keep traversing the string. Do I have to copy the second half of the string and then call strcspn() from the beginning? Can I just make a pointer, point it at the 11th character of my string and pass that to strcspn() (I guess something like char* pos = str[11])? Is there something simpler I'm just missing?
You can get a pointer to a location in the middle of the string and you don't need to copy the second half of the string to do it.
char * offset = str + 10;
and
char * offset = &str[10];
mean the same thing and both do what you want.
You mean str[9] for the 10th char, or str[10] for the 11th, but yes you can do that.
Just be careful that you are not accessing beyond the length of the string and beyond the size of memory allocated.
It sounds like you are performing tokenization. I would suggest using strtok directly instead; it would be cleaner, and it already handles both things you want to do (strcspn + strncpy, and continuing to parse after the delimiter).
You can call strcspn again with (str + 11) as the first argument, but make sure that the length of str is greater than 11.
n = strcspn(str, pattern);
while ((n+1) < strlen(str))
{
n2 = strcspn(str + n + 1, pattern); /* resume just past the delimiter at str[n] */
n += n2 + 1;
}
Note: char *pos = str[11] is wrong, because str[11] is a char, not a pointer. Use char *pos = str + 11; instead.
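To show the idea of resuming from the middle without copying, here is a small sketch that walks a string field by field by advancing a pointer and calling strcspn on it each time. The function name count_fields is made up, and counting fields is only a stand-in for whatever processing you do per field.

```c
#include <string.h>

/* Counts the non-empty fields in str, where any character in delims
   separates fields. The string is never copied or modified. */
size_t count_fields(const char *str, const char *delims)
{
    size_t fields = 0;
    const char *pos = str;

    while (*pos != '\0') {
        size_t n = strcspn(pos, delims);   /* length of the field at pos */
        if (n > 0) {
            ++fields;
        }
        pos += n;                          /* now at a delimiter or at '\0' */
        if (*pos != '\0') {
            ++pos;                         /* step over the delimiter */
        }
    }
    return fields;
}
```

Checking for '\0' before stepping over the delimiter is what keeps pos from running past the end of the string, which is the caution the answers above raise.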

Resources