Writing a function to split a string - c

I'm trying to write a function to split a string (without using strtok) to learn how it works. I've come up with the following so far:
char ** split_string(char * string, char sep) {
// Allow single separators only for now
// get length of the split string array
int array_length = 0;
char c;
for (int i=0; (c=string[i]) != 0; i++)
if (c == sep) array_length ++;
// allocate the array
char * array[array_length + 1];
array[array_length] = '\0';
// add the strings to the array
for (int i=0, word=0; (c=string[i]) != 0;) {
if (c == sep) {
i=0;
word ++;
} else {
array[i][word] = c;
i++;
}
}
return array;
}
This is my first time working with a pointer to a pointer (a list of strings), so I'm a bit unclear how to do this, as you can probably tell from the above function.
How would this be properly done? Specifically, is the return type correct? How would you add the \0 to the end of the array?

The main mistake is that you never allocate space for the words before copying them; you must explicitly allocate space for each word in the destination array. The array itself also has to be allocated with malloc, because a local array such as char * array[array_length + 1] stops existing when the function returns. The return type char ** is fine. The following function does what you intended. To know the number of words, array_length is declared as a global variable so that the caller of split_string can use it; the array also ends with a NULL pointer, which answers your question about terminating it.
#include <stdlib.h>
#include <string.h>
int array_length = 0; // number of words found by the last call
char** split_string(char* str, char sep){
    array_length = 1; // n separators give n + 1 words
    for(int i = 0; str[i] != '\0'; ++i){
        if(str[i] == sep) ++array_length;
    }
    char** str_arr = malloc(sizeof(char*) * (array_length + 1)); // one extra slot for the terminating NULL
    for(int i = 0, j, k = 0; k < array_length; ++k){ // k indexes the destination array
        for(j = i; str[j] != sep && str[j] != '\0'; ++j); // from the current character, find the next separator
        str_arr[k] = malloc(j - i + 1); // room for the word plus its '\0'
        strncpy(str_arr[k], str + i, j - i); // copy the word to the allocated space
        str_arr[k][j - i] = '\0'; // strncpy does not add the terminator here
        i = j + 1; // move past the separator
    }
    str_arr[array_length] = NULL;
    return str_arr;
}
If you don't want to call malloc for each word yourself, you can use the POSIX function strndup, which takes a pointer to the first character and the number of characters to copy, allocates the memory, copies the word, adds the terminator and returns the pointer. So these three lines in the function
        str_arr[k] = malloc(j - i + 1);
        strncpy(str_arr[k], str + i, j - i);
        str_arr[k][j - i] = '\0';
can be replaced by a single line:
        str_arr[k] = strndup(str + i, j - i);
But as a beginner I would recommend the first method, for better understanding and debugging.
Note: The above function treats every separator as a word boundary, so consecutive separators produce empty strings. If you want a run of separators to count as one, you will have to tweak it a bit.
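The tweak mentioned in the note above could look like the following sketch. It assumes that empty fields should simply be dropped and that the caller wants a NULL-terminated array instead of a global counter. The names split_words and free_words are hypothetical, not part of the answer.

```c
#include <stdlib.h>
#include <string.h>

/* Splits str on sep, treating a run of separators as one boundary.
   Returns a NULL-terminated array of heap-allocated words,
   or NULL if an allocation fails. */
char **split_words(const char *str, char sep)
{
    /* First pass: a word starts at every non-separator character
       that is at index 0 or directly follows a separator. */
    size_t count = 0;
    for (size_t i = 0; str[i] != '\0'; ++i)
        if (str[i] != sep && (i == 0 || str[i - 1] == sep))
            ++count;

    char **words = malloc((count + 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    /* Second pass: copy each word into its own allocation. */
    size_t k = 0;
    for (size_t i = 0; str[i] != '\0'; ) {
        if (str[i] == sep) { ++i; continue; }   /* skip separators */
        size_t j = i;
        while (str[j] != sep && str[j] != '\0')
            ++j;
        words[k] = malloc(j - i + 1);
        if (words[k] == NULL) {                 /* undo on failure */
            while (k > 0) free(words[--k]);
            free(words);
            return NULL;
        }
        memcpy(words[k], str + i, j - i);
        words[k][j - i] = '\0';
        ++k;
        i = j;
    }
    words[k] = NULL;                            /* end-of-array marker */
    return words;
}

/* Frees an array returned by split_words. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t k = 0; words[k] != NULL; ++k)
        free(words[k]);
    free(words);
}
```

A caller can walk the result until it reaches the NULL pointer and then hand the array to free_words, so no global variable is needed.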

Related

Optimizing a do while loop/for loop in C

I was doing a LeetCode exercise that consists of deleting adjacent duplicate characters from a string until no two adjacent characters are equal. With some help I wrote code that solves most test cases, but the string length can be up to 10^5, and one test case exceeds the time limit, so I need some tips on how to optimize it.
My code:
char res[100000]; //up to 10^5
char * removeDuplicates(char * s){
//int that verifies if any char from the string can be deleted
int ver = 0;
//do while loop that reiterates to eliminate the duplicates
do {
int lenght = strlen(s);
int j = 0;
ver = 0;
//for loop that if there are duplicates adds one to ver and deletes the duplicate
for (int i = 0; i < lenght ; i++){
if (s[i] == s[i + 1]){
i++;
j--;
ver++;
}
else {
res[j] = s[i];
}
j++;
}
//copying the res string into the s to redo the loop if necessary
strcpy(s,res);
//clar the res string
memset(res, '\0', sizeof res);
} while (ver > 0);
return s;
}
The code can't pass a speed test whose string is close to the length limit (10^5). I won't put it here because it's a really big text, but if you want to check it, it is test case 104 of the LeetCode Daily Problem.
If it was me doing something like that, I would basically do it like a simple naive string copy, but keep track of the last character copied and if the next character to copy is the same as the last then skip it.
Perhaps something like this:
char result[1000]; // Assumes no input string will be longer than this
unsigned source_index; // Index into the source string
unsigned dest_index; // Index into the destination (result) string
// Always copy the first character
result[0] = source_string[0];
// Start with 1 for source index, since we already copied the first character
for (source_index = 1, dest_index = 0; source_string[source_index] != '\0'; ++source_index)
{
if (source_string[source_index] != result[dest_index])
{
// Next character is not equal to last character copied
// That means we can copy this character
result[++dest_index] = source_string[source_index];
}
// Else: Current source character was equal to last copied character
}
// Terminate the destination string
result[dest_index + 1] = '\0';
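Note that the copy loop above keeps one character from each run, while the loop in the question drops both characters of an equal pair and then looks again. Assuming the second behaviour is what the exercise wants, a single pass that treats the already-written prefix as a stack may be enough. This is only a sketch, and the name remove_adjacent_pairs is made up for illustration.

```c
#include <stddef.h>

/* Removes adjacent equal pairs in place, including pairs that only
   become adjacent after an earlier removal, in one pass.
   The kept prefix s[0..top) is used as a stack. */
char *remove_adjacent_pairs(char *s)
{
    size_t top = 0;                       /* number of characters kept */
    for (size_t i = 0; s[i] != '\0'; ++i) {
        if (top > 0 && s[top - 1] == s[i])
            --top;                        /* pair found: pop the match */
        else
            s[top++] = s[i];              /* no pair: push the character */
    }
    s[top] = '\0';
    return s;
}
```

Each character is pushed and popped at most once, so the work is linear in the string length, and no second buffer or repeated strlen and strcpy is needed.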

Right algorithm but wrong implementation?

I am a beginner in C with some experience in python and java. I want to solve a problem with C. The problem goes like this:
Take an input as a sentence with words separated by blank spaces only (assume lower case only), re-write the sentence with the following rules:
1) If a word occurs the first time, keep it the same.
2) If the word occurs twice, replace the second occurrence with the word being copied twice (e.g. two --> twotwo).
3) If the word occurs three times or more, delete all the occurrences after the second one.
Print the output as a sentence. The maximum lengths of the input sentence and each individual word is 500 chars and 50 chars.
Exemplary input: jingle bells jingle bells jingle all the way
Exemplary output: jingle bells jinglejingle bellsbells all the way
The approach I take is:
1) Read the input, separate each word and put them into an array of char pointers.
2) Use nested for loop to go through the array. For each word after the first word:
A - If there is no word before it that is equal to it, nothing happens.
B - If there is already one word before it that is equal to it, change the word as its "doubled form".
C - If there is already a "doubled form" of itself that exists before it, delete the word (set the element to NULL).
3) Print the modified array.
I am fairly confident about the correctness of this approach. However, when I actually write the code:
'''
int main()
{
char input[500];
char *output[500];
// Gets the input
printf("Enter a string: ");
gets(input);
// Gets the first token, put it in the array
char *token = strtok(input, " ");
output[0] = token;
// Keeps getting tokens and filling the array, untill no blank space is found
int i = 1;
while (token != NULL) {
token = strtok(NULL, " ");
output[i] = token;
i++;
}
// Processes the array, starting from the second element
int j, k;
char *doubled;
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
for (k = 0; k < j; k++) {
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = doubled;
}
if (strcmp(output[k], doubled) == 0) { // Situation C
output[j] = ' ';
}
}
}
// Convert the array to a string
char *result = output[0]; // Initialize a string with the first element in the array
int l;
char *blank_space = " "; // The blank spaces that need to be addded into the sentence
for (l = 1; l < 500; l++) {
if (output[l] != '\0'){ // If there is a word that exists at the given index, add it
strcat(result, blank_space);
strcat(result, output[l]);
}
else { // If reaches the end of the sentence
break;
}
}
// Prints out the result string
printf("%s", result);
return 0;
}
'''
I did a bunch of tests on each individual block. There are several issues:
1) When processing the array, strcmp, strcat, and strcpy in the loop seem to give Segmentation fault error reports.
2) When printing the array, the words did not show the order that they are supposed to do.
I am now frustrated because it seems that the issues are all coming from some internal structural defects of my code and they are very much related to the memory mechanism of C which I am not really too familiar with. How should I fix this?
One issue jumps out at me. This code is wrong:
char *doubled;
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
doubled doesn't point to any actual memory. So trying to copy data to where it points is undefined behavior and will almost certainly cause a SIGSEGV - and it will corrupt memory if it doesn't cause a SIGSEGV.
That needs to be fixed - you can't copy a string with strcpy() or strcat() to a pointer that doesn't point to actual memory.
This would be better, but still not ideal as no checking is done to ensure there's no buffer overflow:
char doubled[ 2000 ];
for (j = 1; j < 500; j++) {
strcpy(doubled, output[j]);
strcat(doubled, doubled); // Create the "doubled form"
This is also a problem with doubled defined like that:
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = doubled;
}
That just points output[j] at doubled. The next loop iteration will overwrite doubled, and the data that output[j] still points to will get changed.
This would address that problem:
if (strcmp(output[k], output[j]) == 0) { // Situation B
output[j] = strdup( doubled );
}
strdup() is a POSIX function that, unsurprisingly, duplicates a string. That string will need to be free()'d later, though, as strdup() is roughly equivalent to:
char *strdup( const char *input )
{
    char *duplicate = malloc( 1 + strlen( input ) );
    if ( duplicate != NULL )
        strcpy( duplicate, input );
    return( duplicate );
}
As pointed out, strcat(doubled, doubled); is also a problem. One possible solution:
memmove(doubled + strlen( doubled ), doubled, 1 + strlen( doubled ) );
That copies the contents of the doubled string, including its terminator, to the memory starting at the original '\0' terminator. Note that because the source and the destination overlap at that original '\0', you can't use strcpy( doubled + strlen( doubled ), doubled );. Nor can you use memcpy(), for the same reason.
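The memmove() call can be wrapped together with the missing overflow check. The helper below is a sketch; its name double_in_place and the cap parameter (the total size of the buffer in bytes) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Appends a copy of the string in buf to itself.
   cap is the total size of buf in bytes.
   Returns 1 on success, 0 if the doubled string would not fit. */
int double_in_place(char *buf, size_t cap)
{
    size_t len = strlen(buf);
    if (2 * len + 1 > cap)
        return 0;                         /* not enough room, buf untouched */
    /* Source and destination share the byte at buf[len],
       so memmove is required; memcpy would be undefined here. */
    memmove(buf + len, buf, len + 1);
    return 1;
}
```

With char doubled[ 2000 ] from above, the call would be double_in_place(doubled, sizeof doubled) after the strcpy().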
Your code invokes undefined behavior in several places by writing to memory that is not yet owned, and is likely the cause of the segmentation fault you are seeing. For example, in this segment:
char *result = output[0]; // Initialize a string with the first element in the array
int l;
char *blank_space = " "; // The blank spaces that need to be addded into the sentence
for (l = 1; l < 500; l++) {
if (output[l] != '\0'){ // If there is a word that exists at the given index, add it
strcat(result, blank_space);
strcat(result, output[l]);
By itself...
char *result = output[0]; //creates a pointer, but provides no memory.
...is not sufficient to receive content such as
strcat(result, blank_space);
strcat(result, output[l]);
It needs memory:
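char *result = malloc(501); // or more space if needed
if(result)
{
    strcpy(result, output[0]); // start with the first word
    // now use strcat to append the remaining words
    free(result); // release the memory once the result has been printed
}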
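Putting the points from both answers together, the whole task can be done while tokenizing, without storing doubled copies at all. The following is a sketch under stated assumptions: words are separated by single spaces as in the problem statement, the output buffer is supplied by the caller, and the names double_repeats and MAX_WORDS are invented for this example.

```c
#include <string.h>

#define MAX_WORDS 250   /* 500 input chars hold at most 250 words */

/* Rewrites the sentence in input into out.
   input is modified by strtok.
   out must hold at least 2 * strlen(input) + 1 bytes. */
void double_repeats(char *input, char *out)
{
    char *words[MAX_WORDS];   /* every token seen so far */
    int count = 0;
    out[0] = '\0';

    for (char *tok = strtok(input, " "); tok != NULL && count < MAX_WORDS;
         tok = strtok(NULL, " ")) {
        /* How many earlier tokens equal this one? */
        int seen = 0;
        for (int k = 0; k < count; k++)
            if (strcmp(words[k], tok) == 0)
                seen++;
        words[count++] = tok;

        if (seen >= 2)
            continue;                 /* third or later occurrence: drop */
        if (out[0] != '\0')
            strcat(out, " ");
        strcat(out, tok);
        if (seen == 1)
            strcat(out, tok);         /* second occurrence: write it twice */
    }
}
```

Because the tokens are compared in their original form and the doubling happens only when writing to out, there is no need for the doubled buffer, strdup(), or a comparison against the doubled form.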

Design a Character Searching Function, While Forced to Use strchr

Background Information
I was recently approached by a friend who was given a homework problem to develop a searching algorithm. Before anyone asks, I did think of a solution! However, my solution is not what the teacher is asking for...
Anyway, this is an introductory C programming course where the students have been asked to write a search function called ch_search that is supposed to search an array of characters to determine how many times a specific character occurs. The constraints are what I don't understand...
Constraints:
The arguments are: array to search, character to search for, and length of the array being searched.
The function must use a for-loop.
The algorithm must use the strchr function.
Okay, so the first two constraints I can understand... but the 3rd constraint is what really gets me... I was initially thinking that we could just use a for-loop to iterate through the string from the beginning to the end, simply counting each instance of the character. When the student originally described the problem to me, I came up with (although incorrect) the solution:
Proposed Solution
int ch_search(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
for (int i = 0; i < array_size; i++)
{
// count each character instance
if (array_to_search[i] == char_to_search_for)
{
// keep incrementing the count
count++;
}
}
return count;
}
Then I was told that I had to specifically use the character position function (and apparently it has to be strchr and not strrchr so we can't start at the end I guess?)... I just don't see how that wouldn't be overcomplicating this. I don't see how that would help at all, especially counting from the beginning... Even strrchr might make a little more sense to me. Thoughts?
It's true that, given the length of the array and the requirement to use a for loop, the most natural thing to do would be to iterate over every character of the source array. But you can also loop over the result of strchr like this:
int ch_search(char haystack[], char needle, int size)
{
int count = 0;
char *found;
for(; (found = strchr(haystack, needle)) != NULL; haystack = found + 1)
count++;
return count;
}
In this case you don't need the size of the array, but the assignment doesn't say that you have to use it. Obviously this solution requires the source to be '\0'-terminated.
I think the teacher wanted you to use strchr to navigate to the next occurrence of the char_to_search_for within a string:
int ch_search(char array_to_search[], char char_to_search_for, int array_size) {
int count = 0;
for (char *ptr = array_to_search ; ptr != &array_to_search[array_size] ; ptr++) {
ptr = strchr(ptr, char_to_search_for);
if (!ptr) {
break; // Character is not found
}
count++;
}
return count;
}
Note that array_to_search must be null-terminated in order to be used together with strchr solution above.
This sounds like your friend was given a trick question. The function gets an array of chars and the length of that array but is required to use strchr() even though that function only works on '\0' terminated strings (and no guarantee was given that the array is '\0' terminated).
You might think that it would be fine to use strchr() on the array anyway and then compare the returned pointer to the given length of the array to check if it went past the end of the array. But there are two problems with that:
If strchr() searches past the end of the array, then you already have Undefined Behavior before getting to the check. The program might have crashed before returning from strchr(), the returned pointer might be some total garbage or you might get a pointer to an address a bit further in memory than the end of the array.
Even if the returned pointer is just to an address a bit further in memory than the end of the array, then there is the problem that comparing two pointers (or subtracting them to find the distance between the pointed addresses) is Undefined Behavior unless they're both pointing to parts of the same memory object (or one position past the end of the object). In this instance it means that checking if the returned pointer is within the bounds of the array is only defined behavior if the returned pointer is within the bounds of the array (or one past the end) making the check a bit useless.
The only solution to that is to make sure that strchr() is working with a '\0' terminated string. For example:
int ch_search(char array_to_search[], char char_to_search_for, int array_size)
{
char *buffer = malloc(array_size + 1);
if (buffer == NULL) return -1; // malloc failed; -1 is an arbitrary error value chosen here
strncpy(buffer, array_to_search, array_size);
buffer[array_size] = '\0';
int count = 0;
for (char *i = buffer; (i = strchr(i, char_to_search_for)) != NULL; i++) {
count++;
}
free(buffer);
return count;
}
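For comparison only: if the strchr() requirement were dropped, the standard memchr() takes an explicit byte count, so neither a copy nor a terminator would be needed. This swaps in a different library function than the assignment asks for, and the name ch_search_memchr is hypothetical. The sketch assumes array_size is not negative.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of char_to_search_for in the first array_size
   bytes of array_to_search. The array need not be '\0' terminated. */
int ch_search_memchr(const char *array_to_search, char char_to_search_for,
                     int array_size)
{
    int count = 0;
    const char *p = array_to_search;
    const char *end = array_to_search + array_size;   /* one past the last byte */

    /* memchr scans at most (end - p) bytes, so it never leaves the array. */
    for (; p < end
           && (p = memchr(p, char_to_search_for, (size_t)(end - p))) != NULL;
         ++p)
        count++;
    return count;
}
```

It still uses a for loop and the same three arguments, which shows that the awkward part of the assignment is the terminator requirement of strchr() and not the looping.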
strchr is a very convenient function to search for a char in a string.
Find and read more about strchr. This is my favorite function ever!
The C library function char *strchr(const char *str, int c) searches for the first occurrence of the character c (an unsigned char) in the string pointed to by the argument str.
Declaration
Following is the declaration for strchr() function.
char *strchr(const char *str, int c)
Parameters
str − This is the C string to be scanned.
c − This is the character to be searched in str.
Return value
Function returns a pointer to the first occurrence of the character c in the string str, or NULL if the character is not found.
Constraints:
1) The arguments are: array to search, character to search for, and
length of the array being searched.
This constraint gives the length of the array to be searched. The given array has to contain '\0' at some point. However, the length of the search can be shorter, as specified by search_length.
The following compact solution takes this into account.
int ch_search(char array_to_search[], char char_to_search_for, int search_length)
{
int count = 0;
for(char *p = array_to_search; ;p++)
{
p = strchr(p, char_to_search_for);
if( p != NULL && (p - array_to_search < search_length) )
count++;
else
break;
}
return count;
}
Or equivalent ch_search2:
#include<stdio.h>
#include<string.h>
int ch_search(char array_to_search[], char char_to_search_for, int search_length)
{
int count = 0;
for(char *p = array_to_search; ;p++)
{
p = strchr(p, char_to_search_for);
if( p != NULL && (p - array_to_search < search_length) )
count++;
else
break;
}
return count;
}
// Your original function:
int ch_search1(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
for (int i = 0; i < array_size; i++){
// count each character instance
if (array_to_search[i] == char_to_search_for){
count++; // keep incrementing the count
}
}
return count;
}
int ch_search2(char array_to_search[], char char_to_search_for, int array_size)
{
int count = 0;
char *p = array_to_search;
for(;;)
{
p = strchr(p, char_to_search_for);
if( p != NULL )
{
if (p - array_to_search >= array_size) // we reached beyond
{
break;
}
else
{
count++;
p++;
}
}
else
break; // char not found
}
return count;
}
int main(void)
{
// the arr has to contain '\0' terminator but we can search within the specified length.
char arr[]={'1','1','2','2','1','1','3','3','3','1','4','4', '1','1','!','1','\0','1'};
char arr1[] = "zdxbab";
printf("count %d count %d \n",ch_search(arr , '1', 12),ch_search2(arr , '1', 12));
printf("count %d count %d \n",ch_search(arr1,'b',strlen(arr1)),ch_search2(arr1,'b',strlen(arr1)));
return 0;
}
Output:
count 5 count 5
count 2 count 2

First Not Repeating Character Code

Here is the question:
Write a solution that only iterates over the string once and uses O(1) additional memory, since this is what you would be asked to do during a real interview.
Given a string s, find and return the first instance of a non-repeating character in it. If there is no such character, return '_'.
And here is my code:
char firstNotRepeatingCharacter(char * s) {
int count;
for (int i=0;i<strlen(s);i++){
count=0;
char temp=s[i];
s[i]="_";
char *find= strchr(s,temp);
s[i]=temp;
if (find!=NULL) count++;
else return s[i];
}
if (count!=0) return '_';
}
I don't know what's wrong, but when given the input:
s: "abcdefghijklmnopqrstuvwxyziflskecznslkjfabe"
the output of my code is "g" instead of "d".
I thought the code should have escaped the loop and returned "d" as soon as "d" was found.
Thx in advance!!!
In your program, problem is in this statement-
s[i]="_";
You are assigning a string to a character type variable s[i]. Change it to -
s[i]='_';
At the bottom of your firstNotRepeatingCharacter() function, the return statement is under the if condition and compiler must be giving a warning for this as the function is supposed to return a char. Moreover, count variable is not needed. You could do something like:
char firstNotRepeatingCharacter(char * s) {
for (int i=0;i<strlen(s);i++){
char temp=s[i];
s[i]='_';
char *find= strchr(s,temp);
s[i]=temp;
if (find==NULL)
return s[i];
}
return '_';
}
But this code uses strchr inside a loop that already iterates over the string, so it does not meet the condition that the program should iterate over the string only once. You need to reconsider the solution.
You may use recursion to achieve your goal: iterate over the string recursively, count the occurrences of each character on the way down, and while the stack unwinds identify the first non-repeating character in the string. Its implementation:
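#include <stdio.h>
int ascii_arr[256] = {0}; // occurrence count for each character value
char firstNotRepeatingCharacter(char * s) {
    char result = '_'; // returned when no character is unique
    if (*s == '\0')
        return result;
    ascii_arr[(unsigned char)*s] += 1; // cast: plain char may be negative
    result = firstNotRepeatingCharacter(s+1);
    if (ascii_arr[(unsigned char)*s] == 1)
        result = *s; // an earlier unique character overrides a later one
    return result;
}
int main(void)
{
    char a[] = "abcdefghijklmnopqrstuvwxyziflskecznslkjfabe";
    printf ("First non repeating character: %c\n", firstNotRepeatingCharacter(a));
    return 0;
}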
In the above code, the firstNotRepeatingCharacter() function iterates over the string only once using recursion, and while the stack unwinds it identifies the first non-repeating character. I am using a global int array ascii_arr of length 256 to keep track of how often each character occurs.
Java Solution:
Time Complexity: O(n)
Space Complexity: constant, as it only uses an extra 26-element array to keep the count of each character in the input
Using Java's built-in utilities (with these, the time complexity is worse than O(n)):
char solution(String s) {
char[] c = s.toCharArray();
for (int i = 0; i < s.length(); i++) {
if (s.indexOf(c[i]) == s.lastIndexOf(c[i]))
return c[i];
}
return '_';
}
Using simple arrays. O(n)
char solution(String s) {
// maintain count of the chars in a constant space
int[] base = new int[26];
// convert string to char array
char[] input = s.toCharArray();
// linear loop to get count of all
for(int i=0; i< input.length; i++){
int index = input[i] - 'a';
base[index]++;
}
// just find first element in the input that is not repeated.
for(int j=0; j<input.length; j++){
int inputIndex = input[j]-'a';
if(base[inputIndex]==1){
System.out.println(j);
return input[j];
}
}
return '_';
}
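Since the question is about C, the counting-array idea from the last Java snippet might be written as follows. This is a sketch: the name first_not_repeating is invented here, it makes two passes over the string, not one, and it uses one counter per possible unsigned char value so that the input is not limited to lowercase letters.

```c
#include <limits.h>
#include <stddef.h>

/* Returns the first character of s that occurs exactly once,
   or '_' if there is no such character. */
char first_not_repeating(const char *s)
{
    /* Fixed-size table, independent of the input length: O(1) extra memory. */
    size_t counts[UCHAR_MAX + 1] = {0};

    /* First pass: count every character. */
    for (const char *p = s; *p != '\0'; ++p)
        counts[(unsigned char)*p]++;

    /* Second pass: the first character with a count of 1 wins. */
    for (const char *p = s; *p != '\0'; ++p)
        if (counts[(unsigned char)*p] == 1)
            return *p;

    return '_';
}
```

Unlike the strchr() version it does not modify the string, and unlike the recursive version it needs no global state and no stack depth proportional to the input.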

Returning the length of a char array in C

I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end I want to return the length of the new char array. I am coming from Java, so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i;
if(isspace(buf[0])){
buf[0] = "";
}
if(isspace(buf[len-1])){
buf[len-1] = "";
}
for(i = 0;i < len;i++){
if(isupper(buf[i])) {
buf[i]=tolower(buf[i]);
}
if(isspace(buf[i])) {
buf[i]=" ";
}
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
}
return strlen(*buf);
}
How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?
EDIT: I have made some corrections to my program based on the comments. Is it correct now?
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c funstions to analyze it */
int i = 0;
int j = 0;
if(isspace(buf[0])){
//buf[0] = "";
i++;
}
if(isspace(buf[len-1])){
//buf[len-1] = "";
i++;
}
for(i;i < len;i++){
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
//buf[i]="";
i++;
}
}
return strlen(buf);
}
The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:
int normalizeString(char* buf, int len) {
int readPosition, writePosition;
bool hadWhitespace = false;
for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
if(isspace(buf[readPosition])) {
if(!hadWhitespace) buf[writePosition++] = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
return writePosition;
}
Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:
int normalizeString(char* string) { //No length is passed, it is implicit in the null byte.
char* in = string, *out = string;
bool hadWhitespace = false;
for(; *in; in++) { //loop until the zero byte is encountered
if(isspace(*in)) {
if(!hadWhitespace) *out++ = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
*out = 0; //add a new zero byte
return out - string; //use pointer arithmetic to retrieve the new length
}
In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
if(isspace(buf[i])) {
buf[i]=" ";
}
This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
return strlen(*buf);
This would be return strlen(buf). *buf is a character, not a string.
The notations like:
buf[i]=" ";
buf[i]="";
do not do what you think/expect. You will probably need to create two indexes to step through the array — one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.
Warning: untested code.
int i, j;
for (i = 0, j = 0; i < len; i++)
{
if (isupper(buf[i]))
buf[j++] = tolower(buf[i]);
else if (isspace(buf[i]))
{
buf[j++] = ' ';
while (i+1 < len && isspace(buf[i+1]))
i++;
}
else
buf[j++] = buf[i];
}
buf[j] = '\0'; // Null terminate
You replace the arbitrary white space with a plain space using:
buf[i] = ' ';
You return:
return strlen(buf);
or, with the code above:
return j;
Several mistakes in your code:
You cannot assign buf[i] with a string, such as "" or " ", because the type of buf[i] is char and the type of a string is char*.
You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.
Here is one optional solution for the problem at hand:
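int normalize(char* buf)
{
    int i = 0; /* read index */
    int j = 0; /* write index */
    while (buf[i] != 0)
    {
        unsigned char c = (unsigned char)buf[i++];
        if (isspace(c))
        {
            buf[j++] = ' '; /* one space for the whole run */
            while (isspace((unsigned char)buf[i]))
                i++;
        }
        else
            buf[j++] = (char)tolower(c); /* non-letters pass through unchanged */
    }
    buf[j] = 0;
    return j;
}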
you should write:
return strlen(buf)
instead of:
return strlen(*buf)
The reason:
buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.
*buf will de-reference the pointer, resulting in a char - not what strlen expects.
Not much different than the others, but this assumes an array of unsigned char and not a C string.
tolower() does not itself need the isupper() test.
int normalize(unsigned char *buf, int len) {
int i = 0;
int j = 0;
int previous_is_space = 0;
while (i < len) {
if (isspace(buf[i])) {
if (!previous_is_space) {
buf[j++] = ' ';
}
previous_is_space = 1;
} else {
buf[j++] = tolower(buf[i]);
previous_is_space = 0;
}
i++;
}
return j;
}
#OP:
The posted code implies that leading and trailing spaces should either be shrunk to one character or eliminated entirely.
The above answer simply shrinks leading and trailing spaces to 1 ' '.
To eliminate trailing and leading spaces:
int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...
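Combining the trimming lines with the loop from the function above gives something like the following sketch. The name normalize_trimmed is made up for illustration. Like the original, it works on a length-delimited buffer and does not write a '\0' terminator.

```c
#include <ctype.h>

/* Lowercases letters, collapses each run of whitespace to one space,
   and drops leading and trailing whitespace.
   Returns the new length; no terminator is written. */
int normalize_trimmed(unsigned char *buf, int len)
{
    int i = 0;                  /* read index */
    int j = 0;                  /* write index */
    int previous_is_space = 0;

    while (len > 0 && isspace(buf[len - 1])) len--;   /* trim the tail */
    while (i < len && isspace(buf[i])) i++;           /* skip the head */

    while (i < len) {
        if (isspace(buf[i])) {
            if (!previous_is_space)
                buf[j++] = ' ';
            previous_is_space = 1;
        } else {
            buf[j++] = (unsigned char)tolower(buf[i]);
            previous_is_space = 0;
        }
        i++;
    }
    return j;
}
```

If the caller needs a C string afterwards, it can set buf[j] to '\0' itself, provided the buffer has room for one more byte than the returned length.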
