CamelCase to snake_case in C without tolower - c

I want to write a function that converts CamelCase to snake_case without using tolower.
Example: helloWorld -> hello_world
This is what I have so far, but the output is wrong because I overwrite a character in the string here: string[i-1] = '_';.
I get hell_world. I don't know how to get it to work.
void snake_case(char *string)
{
int i = strlen(string);
while (i != 0)
{
if (string[i] >= 65 && string[i] <= 90)
{
string[i] = string[i] + 32;
string[i-1] = '_';
}
i--;
}
}

This conversion means, aside from converting a character from uppercase to lowercase, inserting a character into the string. This is one way to do it:
iterate from left to right,
if an uppercase character if found, use memmove to shift all characters from this position to the end the string one position to the right, and then assigning the current character the to-be-inserted value,
stop when the null-terminator (\0) has been reached, indicating the end of the string.
Iterating from right to left is also possible, but since the choice is arbitrary, going from left to right is more idiomatic.
A basic implementation may look like this:
#include <stdio.h>
#include <string.h>
void snake_case(char *string)
{
for ( ; *string != '\0'; ++string)
{
if (*string >= 65 && *string <= 90)
{
*string += 32;
memmove(string + 1U, string, strlen(string) + 1U);
*string = '_';
}
}
}
int main(void)
{
char string[64] = "helloWorldAbcDEFgHIj";
snake_case(string);
printf("%s\n", string);
}
Output: hello_world_abc_d_e_fg_h_ij
Note that:
The size of the string to move is the length of the string plus one, to also move the null-terminator (\0).
I am assuming the function isupper is off-limits as well.
The array needs to be large enough to hold the new string, otherwise memmove will perform invalid writes!
The latter is an issue that needs to be dealt with in a serious implementation. The general problem of "writing a result of unknown length" has several solutions. For this case, they may look like this:
First determine how long the resulting string will be, reallocating the array, and only then modifying the string. Requires two passes.
Every time an uppercase character is found, reallocate the string to its current size + 1. Requires only one pass, but frequent reallocations.
Same as 2, but whenever the array is too small, reallocate the array to twice its current size. Requires a single pass, and less frequent (but larger) reallocations. Finally reallocate the array to the length of the string it actually contains.
In this case, I consider option 1 to be the best. Doing two passes is an option if the string length is known, and the algorithm can be split into two distinct parts: find the new length, and modify the string. I can add it to the answer on request.

Related

C dealing with variable length string

I'm new to C, taking a university course.
In one of the tasks I'm given, I deal with strings. I take strings either entered by user or parsed from a file and then use a function on them to produce an answer (if a specific quality exists).
The string can be of variable length but it is acceptable to assume that their maximum length is 80 characters.
I created the program using a
char s[81];
and then filling up the same array with the different strings each time.
Since the string has to be null-terminated I just added a '\0' at index 80;
s[80] = '\0';
But then I got all kind of weird behaviors - Unrelated characters at the end of the string I entered. I assumed this is because there was space between the end of the 'real' characters and the '\0' character filled with garbage(?).
So what I did is I created a function:
void clean_string(char s[], int string_size) {
int index = 0;
while(index < string_size) {
s[index++] = '\0';
}
}
What I call clean, is just filling a string up with zero characters. I do this every time I am done dealing with a string and ready to accept a new one. Then I fill up the string again character by character and when ever I'll stop, the following character will be a '\0' for sure.
To not include any magic numbers in code (81 each time I call clean_string) I used the following:
#define STRING_LENGTH 81
That works for me. The strings show no strange behavior. But I wondered if this is considered bad practice. Are there problems with this approach?
Just emphasizing, I'm not asking for help in the assignment itself, but tips on how to approach these kind of situations better.
Rather than prefilling the entire array with zeros, it should be simple to just add a single zero after you've read all relevant characters.
For example:
char s[STRING_LENGTH];
int c;
int idx = 0;
while (((c = getchar()) != '\n') && (idx < STRING_LENGTH - 1) && (c != EOF)) {
s[idx++] = c;
}
s[idx] = 0;

How to add a character to the back of a char array when you obtain it with a gets() function in c?

I have an array of charracters where I put in information using a gets().
char inname[30];
gets(inname);
How can I add another character to this array without knowing the length of the string in c? (the part that are actual letters and not like empty memmory spaces of romething)
note: my buffer is long enough for what I want to ask the user (a filename, Probebly not many people have names longer that 29 characters)
Note that gets is prone to buffer overflow and should be avoided.
Reading a line of input:
char inname[30];
sscanf("%.*s", sizeof(inname), inname);
int len = strlen(inname);
// Remove trailing newline
if (len > 0 && inname[len-1] == '\n') {
len--;
inname[len] = '\0'
}
Appending to the string:
char *string_to_append = ".";
if (len + strlen(string_to_append) + 1) <= sizeof(inname)) {
// There is enough room to append the string
strcat(inname, string_to_append);
}
Optional way to append a single character to the string:
if (len < sizeof(inname) - 2) {
// There is room to add another character
inname[len++] = '.'; // Add a '.' character to the string.
inname[len] = '\0'; // Don't forget to nul-terminate
}
As you have asked in comment, to determine the string length you can directly use
strlen(inname);
OR
you can loop through string in a for loop until \0 is found.
Now after getting the length of prvious string you can append new string as
strcat(&inname[prevLength],"NEW STRING");
EDIT:
To find the Null Char you can write a for loop like this
for(int i =0;inname[i] != 0;i++)
{
//do nothing
}
Now you can use i direcly to copy any character at the end of string like:
inname[i] = Youe Char;
After this increment i and again copy Null char to(0) it.
P.S.
Any String in C end with a Null character termination. ASCII null char '\0' is equivalent to 0 in decimal.
You know that the final character of a C string is '\0', e.g. the array:
char foo[10]={"Hello"};
is equivalent to this array:
['H'] ['e'] ['l'] ['l'] ['0'] ['\0']
Thus you can iterate on the array until you find the '\0' character, and then you can substitute it with the character you want.
Alternatively you can use the function strcat of string.h library
Short answer is you can't.
In c you must know the length of the string to append char's to it, in other languages the same applies but it happens magically, and without a doubt, internally the same must be done.
c strings are defined as sequences of bytes terminated by a special byte, the nul character which has ascii code 0 and is represented by the character '\0' in c.
You must find this value to append characters before it, and then move it after the appended character, to illustrate this suppose you have
char hello[10] = "Hello";
then you want to append a '!' after the 'o' so you can just do this
size_t length;
length = strlen(hello);
/* move the '\0' one position after it's current position */
hello[length + 1] = hello[length];
hello[length] = '!';
now the string is "Hello!".
Of course, you should take car of hello being large enough to hold one extra character, that is also not automatic in c, which is one of the things I love about working with it because it gives you maximum flexibility.
You can of course use some available functions to achieve this without worrying about moving the '\0' for example, with
strcat(hello, "!");
you will achieve the same.
Both strlen() and strcat() are defined in string.h header.

Linear search vs strlen

In one of my assignments, I was required to use linear search to find the last char of a string and set it to a pointer, it is just a simple string eg "blah blah blah". To do this I used
int length = strlen(string);
to find the length, then used a for loop
for (i=1;i<length;i++){
if (string[i]==0){;
end_pointer = &string[i-1];
}
Is there any difference between using linear search for 0 to set the pointer as opposed to using length:
end_pointer = &string[length-1];
I think what your professor is really looking for is:
int i = 0;
while( '\0' != string[i] ) i++;
for the search
Assign after the looping has completed for best efficiency:
char * end_pointer = &string[i - 1];
I think I need to explain how strings are stored in C.
Since strings can, in general, have arbitrary length, there needs to be a way to represent that length along with the content of the string. The two most trivial ways of representing the length are as follows.
explicitly keep track of length
use a special token to denote the end of string
The C language went with the second option. Specifically, '\0' is used to denote the end of the string. So if you have a char * p, then that is a pointer† that points to the first character; the second character in the string is p[1] == *(p+1), and so on.
So how do you get the length of the string? In the first method of representing strings (NOT the C way), it's already explicitly available. With C strings, you have to start at the beginning and count how many characters there are until the special token ('\0' in C). This is called a linear search for the end of string token.
strlen implements such a linear search, but it sounds like you are not supposed to use it. Regardless, strlen doesn't actually give you the pointer to the end of the string; you would have to compute it as
char *endPtr = string + strlen(string);
In this case, endPtr will actually point to the null-termination character, which is just past the end of the string. This is a common paradigm in C for specifying ranges: the start of the range (string in this case) is usually inclusive, and the end of the range (endPtr in this case) is usually exclusive.
†
char * could just be a pointer to a single char and not necessarily a string, but that doesn't concern us here.
The difference between using a linear search for '\0' to set the pointer as opposed to using length derived from strlen() is a slight potential efficiency change.
If one rolls their own code or uses the standard library function strlen(), it is still the same order of complexity O(n). If anything, srtrlen() has potential of being more efficient.
If the goal is to create you own code and point to the last char in a string (not the '\0'), handle "" as a special case, otherwise perform a simple loop;
char *LastCharPointer(char *string) {
if (*string == '\0') {
return NULL;
}
do {
string++;
} while (*string);
return string - 1;
}
If the goal is to point to the null chanracter '\0':
char *NullCharPointer(char *string) {
while (*string) string++;
return string;
}
I am assuming the code you pasted above is not the actual code you wrote, it would be :
for( i = 0; i < strlen( string ); i++ ) {
if( string[ i ] ){
end_pointer = &string[i - 1];
}
}
You can do this in two ways :
char * end_pointer = &string[ strlen( string ) - 1 ]
or
for( i = 0; string[ i ] ; i++ );
char * end_pointer = &string[ i - 1 ]
Effectively when you call strlen( ), it runs in linear time to calculate the length. Once you have the length you can index into the string directly or, you could yourself search for the terminating '\0' character. All this works assuming that your string is null-terminated.
EDIT : The second option had a missing ";".

C—Infinite loop, I think?

I'm having an issue with a program in C and I think that a for loop is the culprit, but I'm not certain. The function is meant to take a char[] that has already been passed through a reverse function, and write it into a different char[] with all trailing white space characters removed. That is to say, any ' ' or '\t' characters that lie between a '\n' and any other character shouldn't be part of the output.
It works perfectly if there are no trailing white space characters, as in re-writing an exact duplicate of the input char[]. However, if there are any, there is no output at all.
The program is as follows:
#include<stdio.h>
#define MAXLINE 1000
void trim(char output[], char input[], int len);
void reverse(char output[], char input[], int len);
main()
{
int i, c;
int len;
char block[MAXLINE];
char blockrev[MAXLINE];
char blockout[MAXLINE];
char blockprint[MAXLINE];
i = 0;
while ((c = getchar()) != EOF)
{
block[i] = c;
++i;
}
printf("%s", block); // for debugging purposes
reverse(blockrev, block, i); // reverses block for trim function
trim(blockout, blockrev, i);
reverse(blockprint, blockout, i); // reverses it back to normal
// i also have a sneaking suspicion that i don't need this many arrays?
printf("%s", blockprint);
}
void trim(char output[], char input[], int len)
{
int i, j;
i = 0;
j = 0;
while (i <= len)
{
if (input[i] == ' ' || input[i] == '\t')
{
if (i > 0 && input[i-1] == '\n')
for (; input[i] == ' ' || input[i] == '\t'; ++i)
{
}
else
{
output[j] = input[i];
++i;
++j;
}
}
else
{
output[j] = input[i];
++i;
++j;
}
}
}
void reverse(char output[], char input[], int len)
{
int i;
for (i = 0; len - i >= 0; ++i)
{
output[i] = input[len - i];
}
}
I should note that this is a class assignment that doesn't allow the use of string functions, hence why it's so roundabout.
Change
for (i; input[i] == ' ' || input[i] == '\t'; ++i);
to
for (; i <= len && (input[i] == ' ' || input[i] == '\t'); ++i);
With the first method, if the whitespace is at the end, the loop will iterate indefinitely. Not sure how you didn't get an out of bounds access exception, but that's C/C++ for you.
Edit As Arkku brought up in the comments, make sure your character array is still NUL-terminated (the \0 character), and you can check on that case instead. Make sure you're not trimming the NUL character from the end either.
Declaring your main() function simply as main() is an obsolete style that should not be used. The function must be declared either as int main(void) or as int main(int argc, char *argv[]).
Your input process does not null-terminate your input. This means that what you're working with is not a "string", because a C string, by definition, is an array of char that the last element is a null character ('\0'). Instead, what you've got are simple arrays of char. This wouldn't be a problem as long as you're expecting that, and indeed your code is passing array lengths about, but you're also trying to print it with printf(), which requires C strings, not simple arrays of char.
Your reverse() function has an off-by-one error, because you aren't accounting for the fact that C arrays are zero-indexed, so what you're reversing is always one byte longer than your actual input.
What this means is that if you call reverse(output, input, 10), your code will start by assigning the value at input[10] to output[0], but input[10] is one past the end of your data, and since you didn't initialize your arrays before starting to fill them, that's an indeterminate value. In my testing, that indeterminate value happens, coincidentally, to have zero values much of the time, which means that output[0] gets filled with a null ('\0').
You need to be subtracting one more from the index into the input than you actually are. The loop-termination condition in the reverse() function is also wrong, in compensation, that condition should be len - i > 0, not len - i >= 0.
Your trim() function is unnecessarily complex. Additionally, it too has an incorrect loop condition to compensate for the off-by-one error in reverse(). The loop should be while ( i < len ), not while ( i <= len ).
Additionally, the trim() function has the ability to reduce the size of your data, but you don't provide a way to retain that information. (I see in the comments of Arkku's answer that you've corrected for this already. Good.)
Once you've fixed the issue with not keeping track of your data's size changes, and the off-by-one error which is copying indeterminate data (which happens, coincidentally, to be a null) from the end of the blockout array to the beginning of the blockprint array when you do the second reverse(), and you fix the incorrect <= condition in trim() and the incorrect >= condition in reverse(), and you null-terminate your byte array before passing it to printf(), your program will work.
(Moved from comments to an answer)
My guess is that the problem is outside this function, and is caused by the fact that in the described problem cases the output is shorter than the input. Since you are passing the length of the string as an argument, you need to calculate the length of the string after trim, because it may have changed...
For instance, passing an incorrect length to reverse can cause the terminating NUL character (and possibly some leftover whitespace) to end up at the beginning of the string, thus making the output appear empty.
edit: After seeing the edited question with the code of reverse included, in addition to the above problem, your reverse puts the terminating NUL as the first character of the reversed string, which causes it to be the empty string (in some cases your second reverse puts it back, so you don't see it without printing the output of the first reverse). Note that input[len] contains the '\0', not the last character of the string itself.
edit 2: Furthermore, you are not actually terminating the string in block before using it. It may be the case that the uninitialised array often happens to contain zeroes that serve to terminate the string, but for the program to be correct you absolutely need to terminate it with block[i] = '\0'; immediately after the input loop. Similarly you need ensure NUL-termination of the outputs of reverse and trim (in case of trim it seems to me that this already happens as a side-effect of having the loop condition i <= len instead of i < len, but it's not a sign of good code that it's hard to tell).

Determining the length of an array for memory efficiency

Write a function in C language that:
Takes as its only parameter a sentence stored in a string (e.g., "This is a short sentence.").
Returns a string consisting of the number of characters in each word (including punctuation), with spaces separating the numbers. (e.g., "4 2 1 5 9").
I wrote the following program:
int main()
{
char* output;
char *input = "My name is Pranay Godha";
output = numChar(input);
printf("output : %s",output);
getch();
return 0;
}
char* numChar(char* str)
{
int len = strlen(str);
char* output = (char*)malloc(sizeof(char)*len);
char* out = output;
int count = 0;
while(*str != '\0')
{
if(*str != ' ' )
{
count++;
}
else
{
*output = count+'0';
output++;
*output = ' ';
output++;
count = 0;
}
str++;
}
*output = count+'0';
output++;
*output = '\0';
return out;
}
I was just wondering that I am allocating len amount of memory for output string which I feel is more than I should have allocated hence there is some wasting of memory. Can you please tell me what can I do to make it more memory efficient?
I see lots of little bugs. If I were your instructor, I'd grade your solution at "C-". Here's some hints on how to turn it into "A+".
char* output = (char*)malloc(sizeof(char)*len);
Two main issues with the above line. For starters, you are forgetting to "free" the memory you allocate. But that's easily forgiven.
Actual real bug. If your string was only 1 character long (e.g. "x"), you would only allocate one byte. But you would likely need to copy two bytes into the string buffer. a '1' followed by a null terminating '\0'. The last byte gets copied into invalid memory. :(
Another bug:
*output = count+'0';
What happens when "count" is larger than 9? If "count" was 10, then *output gets assigned a colon, not "10".
Start by writing a function that just counts the number of words in a string. Assign the result of this function to a variable call num_of_words.
Since you could very well have words longer than 9 characters, so some words will have two or more digits for output. And you need to account for the "space" between each number. And don't forget the trailing "null" byte.
If you think about the case in which a 1-byte unsigned integer can have at most 3 chars in a string representation ('0'..'255') not including the null char or negative numbers, then sizeof(int)*3 is a reasonable estimate of the maximum string length for an integer representation (not including a null char). As such, the amount of memory you need to alloc is:
num_of_words = countWords(str);
num_of_spaces = (num_of_words > 0) ? (num_of_words - 1) : 0;
output = malloc(num_of_spaces + sizeof(int)*3*num_of_words + 1); // +1 for null char
So that's a pretty decent memory allocation estimate, but it will definitely allocate enough memory in all scenarios.
I think you have a few other bugs in your program. For starters, if there are multiple spaces between each word e.g.
"my gosh"
I would expect your program to print "2 4". But your code prints something else. Likely other bugs exist if there are leading or trailing spaces in your string. And the memory allocation estimate doesn't account for the extra garbage chars you are inserting in those cases.
Update:
Given that you have persevered and attempted to make a better solution in your answer below, I'm going to give you a hint. I have written a function that PRINTs the length of all words in a string. It doesn't actually allocate a string. It just prints it - as if someone had called "printf" on the string that your function is to return. Your job is to extrapolate how this function works - and then modify it to return a new string (that contains the integer lengths of all the words) instead of just having it print. I would suggest you modify the main loop in this function to keep a running total of the word count. Then allocate a buffer of size = (word_count * 4 *sizeof(int) + 1). Then loop through the input string again to append the length of each word into the buffer you allocated. Good luck.
void PrintLengthOfWordsInString(const char* str)
{
if ((str == NULL) || (*str == '\0'))
{
return;
}
while (*str)
{
int count = 0;
// consume leading white space
while ((*str) && (*str == ' '))
{
str++;
}
// count the number of consecutive non-space chars
while ((*str) && (*str != ' '))
{
count++;
str++;
}
if (count > 0)
{
printf("%d ", count);
}
}
printf("\n");
}
The answer is: it depends. There are trade-offs.
Yes, it's possible to write some extra code that, before performing this action, counts the number of words in the original string and then allocates the new string based on the number of words rather than the number of characters.
But is it worth it? The extra code would make your program longer. That is, you would have more binary code, taking up more memory, which may be more than you gain. In addition, it will take more time to run.
By the way, you have a memory leak in your program, which is more of a problem.
As long as none of the words in the sentence are longer than 9 characters, the length of your output array needs only to be the number of words in the sentence, multiplied by 2 (to account for the spaces), plus an extra one for the null terminator.
So for the string
My name is Pranay Godha
...you need only an array of length 11.
If any of the words are ten characters or more, you'll need to calculate how many extra char your array will need by determining the length of the numeric required. (e.g. a word of length 10 characters clearly requires two char to store the number 10.)
The real question is, is all of this worth it? Unless you're specifically required (homework?) to use the minimal space required in your output array, I'd be minded to allocate a suitably large array and perform some bounds checking when writing to it.

Resources