As part of a protocol I'm receiving a C string of the following format:
WORD * WORD
Where both WORDs are the same given string.
And, * - is any string of printable characters, NOT including spaces!
So the following are all legal:
WORD asjdfnkn WORD
WORD 234kjk2nd32jk WORD
And the following are illegal:
WORD akldmWORD
WORD asdm zz WORD
NOTWORD admkas WORD
NOTWORD admkas NOTWORD
Where (1) is missing the space before the trailing WORD; (2) has 3 or more spaces; (3) and (4) do not start or end with the correct string (WORD).
Of course, this could be implemented in a fairly straightforward way; however, I'm not sure that what I'm doing is the most efficient approach.
Note: WORD is pre-set for a whole run, however could change from run to run.
Currently I'm strncmping each string against "WORD ".
If that check passes, I manually (char-by-char) run over the string, looking for the second space char.
[If found] I then strcmp (all the way) with "WORD".
Would love to hear your solution, with an emphasis on efficiency, as I'll be running over millions of these in real time.
I'd say, have a look at the algorithms in Handbook of Exact String-Matching Algorithms, compare the complexities and choose the one that you like best, implement it.
Or you can use some ready-made implementations.
You have some really classical algorithms for searching strings inside another string here:
KMP(Knuth-Morris-Pratt)
Rabin-Karp
Boyer-Moore
Hope this helps :)
Have you profiled?
There's not much gain to be had here, since you're doing basic string comparisons. If you want to go for the last few percent of performance, I'd change out the str... functions for mem... functions.
char *bufp, *bufe; // pointer to buffer, one past end of buffer
if (bufe - bufp < wordlen * 2 + 2)
error();
if (memcmp(bufp, word, wordlen) || bufp[wordlen] != ' ')
error();
bufp += wordlen + 1;
char *datap = bufp;
char *datae = memchr(bufp, ' ', bufe - bufp);
if (!datae || bufe - datae != wordlen + 1) // the buffer must end right after " WORD"
error();
if (memcmp(datae + 1, word, wordlen))
error();
// Your data is in the range [datap, datae).
The performance gains are likely less than spectacular. You have to examine each character in the buffer since each character could be a space, and any character in the delimiters could be wrong. Changing a loop to memchr is slick, but modern compilers know how to do that for you. Changing a strncmp or strcmp to memcmp is also probably going to be negligible.
There is probably a tradeoff to be made between the shortest code and the fastest implementation. Choices are:
The regular expression ^WORD \S+ WORD$ (requires a regex engine)
strchr on "WORD " and a strrchr on " WORD" with a lot of messy checks (not really recommended)
Walking the whole string character by character, keeping track of the state you are in (scanning first word, scanning first space, scanning middle, scanning last space, scanning last word, expecting end of string).
Option 1 requires the least code but backtracks near the end, and Option 2 has no redeeming qualities. I think you can do option 3 elegantly. Use a state variable and it will look okay. Remember to manually enter the last two states based on the length of your word and the length of your overall string and this will avoid the backtracking that a regex will most likely have.
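A minimal sketch of option 3, under some assumptions: the length of the input is known up front, WORD is non-empty, and the only forbidden payload character is the space (a printable-character check is left out). The names frame_is_valid and frame_state are hypothetical. Because the total length is known, the position of the last space can be computed in advance, so the scanner enters the final states by index and never backtracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state names for the walk described above. */
enum frame_state { FIRST_WORD, FIRST_SPACE, MIDDLE, LAST_WORD };

bool frame_is_valid(const char *s, size_t len, const char *word, size_t wordlen)
{
    assert(s != NULL && word != NULL && wordlen > 0);
    /* Shortest legal input: WORD, space, one payload char, space, WORD. */
    if (len < 2 * wordlen + 3)
        return false;

    size_t tail = len - wordlen - 1;   /* index where the last space must be */
    enum frame_state state = FIRST_WORD;
    size_t i = 0;
    while (i < len) {
        switch (state) {
        case FIRST_WORD:
            if (memcmp(s, word, wordlen) != 0) return false;
            i = wordlen;
            state = FIRST_SPACE;
            break;
        case FIRST_SPACE:
            if (s[i] != ' ') return false;
            i++;
            state = MIDDLE;
            break;
        case MIDDLE:
            if (i == tail) {           /* entered by position, not by search */
                if (s[i] != ' ') return false;
                i++;
                state = LAST_WORD;
            } else {
                if (s[i] == ' ') return false;   /* stray space in the payload */
                i++;
            }
            break;
        case LAST_WORD:
            return memcmp(s + i, word, wordlen) == 0;
        }
    }
    return false;
}
```

Each input byte is examined at most once, which is the property this option is meant to buy over a backtracking regex.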
Do you know how long the string that is to be checked is? If not, you are somewhat limited in what you can do. If you do know how long the string is, you can speed things up a bit. You have not specified for sure that the '*' part has to be at least one character. You've also not stipulated whether tabs are allowed, or newlines, or ... is it only alphanumerics (as in your examples) or are punctuation and other characters allowed? Control characters?
You know how long WORD is, and can pre-construct both the start and end markers. The function error() reports an error (however you need it to be reported) and returns false. The test function might be bool string_is_ok(const char *string, int actstrlen);, returning true on success and false when there is a problem:
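#include <stdbool.h>
#include <string.h>
// Preset variables characterizing the search
// (enum constants, because a file-scope initializer in C must be a constant expression)
enum { wordlen = 4 };
enum { marklen = wordlen + 1 };
enum { minstrlen = 2 * marklen + 1 }; // Two blanks and one other character.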
static char bword[] = "WORD "; // Start marker
static char eword[] = " WORD"; // End marker
static char verboten[] = " "; // Forbidden characters
bool string_is_ok(const char *string, int actstrlen)
{
if (actstrlen < minstrlen)
return error("string too short");
if (strncmp(string, bword, marklen) != 0)
return error("string does not start with WORD");
if (strcmp(string + actstrlen - marklen, eword) != 0)
return error("string does not finish with WORD");
if (strcspn(string + marklen, verboten) != actstrlen - 2 * marklen)
return error("string contains verboten characters");
return true;
}
You probably can't reduce the tests by much if you want your guarantees. The part that would change most depending on the restrictions in the alphabet is the strcspn() line. That is relatively fast for a small list of forbidden characters; it will likely be slower as the number of characters forbidden is increased. If you only allow alphanumerics, you have 62 OK and 193 not OK characters, unless you count some of the high-bit set characters as alphabetic too. That part will probably be slow. You might do better with a custom function that takes a start position and length and reports whether all characters are OK. This could be along the lines of:
#include <stdbool.h>
static bool ok_chars[256] = { false };
static void init_ok_chars(void)
{
const unsigned char *ok = (const unsigned char *)"abcdefghijklmnopqrstuvwxyz...0123456789";
int c;
while ((c = *ok++) != 0)
ok_chars[c] = true;
}
static bool all_chars_ok(const char *check, int numchars)
{
int i;
for (i = 0; i < numchars; i++)
if (!ok_chars[(unsigned char)check[i]]) // cast: plain char may be negative
return false;
return true;
}
You can then use:
return all_chars_ok(string + marklen, actstrlen - 2 * marklen);
in place of the call to strcspn().
If your "stuffing" should contain only '0'-'9', 'A'-'Z' and 'a'-'z', and the text is in some encoding based on ASCII (like most Unicode-based encodings), then you can skip two comparisons in one of your loops, since only one bit differs between uppercase and lowercase characters.
Instead of
(ch>='0' && ch<='9') || (ch>='A' && ch<='Z') || (ch>='a' && ch<='z')
you get
ch2 = ch & ~('a' ^ 'A');
(ch>='0' && ch<='9') || (ch2>='A' && ch2<='Z')
But you had better look at the assembler code your compiler generates and do some benchmarking; depending on computer architecture and compiler, this trick could give slower code.
If branching is expensive compared to comparisons on your computer, you can also replace the logical operators (&&, ||) with their bitwise counterparts (&, |). But most modern compilers know this trick in most situations.
If, on the other hand, you test for any printable glyph from some large character encoding, then it is most likely less expensive to test for white-space glyphs rather than printable glyphs.
Also, compile specifically for the computer that the code will run on, and don't forget to turn off any generation of debugging code.
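As a hedged illustration of the case-folding trick above, here is a small sketch that assumes an ASCII-based character set. The function names are hypothetical. It also includes an exhaustive check over all 256 byte values against the plain six-comparison version, which is cheap to run before doing any benchmarking.

```c
#include <assert.h>
#include <stdbool.h>

/* Folded test: clearing bit 0x20 ('a' ^ 'A') maps 'a'..'z' onto 'A'..'Z'.
   Digits must be tested on the unfolded value, since folding changes them. */
bool is_ascii_alnum(unsigned char ch)
{
    unsigned char folded = (unsigned char)(ch & ~('a' ^ 'A'));
    return (ch >= '0' && ch <= '9') || (folded >= 'A' && folded <= 'Z');
}

/* Reference version with all six comparisons. */
bool is_ascii_alnum_plain(unsigned char ch)
{
    return (ch >= '0' && ch <= '9') ||
           (ch >= 'A' && ch <= 'Z') ||
           (ch >= 'a' && ch <= 'z');
}

/* Verifies that both versions agree for every possible byte value. */
void check_is_ascii_alnum(void)
{
    for (int c = 0; c <= 255; c++)
        assert(is_ascii_alnum((unsigned char)c) ==
               is_ascii_alnum_plain((unsigned char)c));
}
```

Whether the folded version is actually faster depends on the compiler and target, so it is worth measuring both.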
Added:
Don't make subroutine calls within your scan loops, unless it is worth it.
Whatever trick you use to speed up your loops, its benefit will diminish if you have to make a subroutine call within one of them. It is fine to use built-in functions that your compiler inlines into your code, but if you use something like an external regex library and your compiler is unable to inline those functions (gcc can do that, sometimes, if you ask it to), then making that subroutine call will shuffle a lot of memory around, in the worst case between different types of memory (registers, CPU buffers, RAM, hard disk, etc.), and may mess up CPU predictions and pipelines. Unless your text snippets are very long, so that you spend much time parsing each of them, and the subroutine is efficient enough to compensate for the cost of the call, don't do that. Some parsing functions use callbacks; that might be more efficient than making a lot of subroutine calls from your loops (since the function can scan several pattern matches in one sweep and bunch several callbacks together outside the critical loop), but that depends on how someone else has written that function, and basically it is the same thing as you making the call.
WORD is 4 characters, with uint32_t you could do a quick comparison. You will need a different constant depending on system endianness. The rest seems to be fine.
Since WORD can change you have to precalculate the uint32_t, uint64_t, ... you need depending on the length of the WORD.
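A rough sketch of the 4-byte comparison, assuming WORD is exactly 4 bytes long. The helper names pack4 and starts_with_word4 are hypothetical. Instead of hard-coding a constant per endianness, this variant builds the key from WORD at run time with memcpy, so the key and the input bytes are packed the same way on any machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Packs 4 bytes into a uint32_t in native byte order.
   memcpy avoids alignment and aliasing problems. */
static uint32_t pack4(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Checks that s begins with the 4-byte word (given as a packed key)
   followed by a single space. */
bool starts_with_word4(const char *s, size_t len, uint32_t word_key)
{
    assert(s != NULL);
    return len >= 5 && pack4(s) == word_key && s[4] == ' ';
}
```

The key would be computed once per run, for example `uint32_t key = pack4("WORD");`, and reused for every message. Compilers often turn a 4-byte memcmp into the same single comparison, so it is worth profiling before adopting this.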
Not sure from the description, but if you trust the source you could just chomp the first n+1 and last n+1 characters.
bool check_legal(
const char *start, const char *end,
const char *delim_start, const char *delim_end,
const char **content_start, const char **content_end
) {
const size_t delim_len = delim_end - delim_start;
const char *p = start;
if (start + delim_len + 1 + 0 + 1 + delim_len > end) // too short to hold both delimiters
return false;
if (memcmp(p, delim_start, delim_len) != 0)
return false;
p += delim_len;
if (*p != ' ')
return false;
p++;
*content_start = p;
while (p < end - 1 - delim_len && *p != ' ')
p++;
if (p + 1 + delim_len != end || *p != ' ') // the content must be followed by exactly " delim"
return false;
*content_end = p;
p++;
if (memcmp(p, delim_start, delim_len) != 0)
return false;
return true;
}
And here is how to use it:
const char *line = "who is who";
const char *delim = "who";
const char *start, *end;
if (check_legal(line, line + strlen(line), delim, delim + strlen(delim), &start, &end)) {
printf("this %.*s nice\n", (int) (end - start), start);
}
(It's all untested.)
Using the STL (C++), count the number of spaces; if there are not exactly two, the string is obviously wrong. Using std::find (from <algorithm>) you can get the positions of the two spaces and hence the middle word. Then check for WORD at the beginning and the end, and you are done.
This should return the true/false condition in O(n) time
int sameWord(char *str)
{
char *word1, *word2;
word1 = word2 = str;
// Word1, Word2 points to beginning of line where the first word is found
while (*word2 && *word2 != ' ') ++word2; // skip to first space
if (*word2 == ' ') ++word2; // skip space
// Word1 points to first word, word2 points to the middle-filler
while (*word2 && *word2 != ' ') ++word2; // skip to second space
if (*word2 == ' ') ++word2; // skip space
// Word1 points to first word, word2 points to the second word
// Now just compare that word1 and word2 point to identical strings.
while (*word1 != ' ' && *word2)
if (*word1++ != *word2++) return 0; //false
return *word1 == ' ' && (*word2 == 0 || *word2 == ' ');
}
I want to write a function that converts CamelCase to snake_case without using tolower.
Example: helloWorld -> hello_world
This is what I have so far, but the output is wrong because I overwrite a character in the string here: string[i-1] = '_';.
I get hell_world. I don't know how to get it to work.
void snake_case(char *string)
{
int i = strlen(string);
while (i != 0)
{
if (string[i] >= 65 && string[i] <= 90)
{
string[i] = string[i] + 32;
string[i-1] = '_';
}
i--;
}
}
This conversion means, aside from converting a character from uppercase to lowercase, inserting a character into the string. This is one way to do it:
iterate from left to right,
if an uppercase character is found, use memmove to shift all characters from this position to the end of the string one position to the right, and then assign the current character the to-be-inserted value,
stop when the null-terminator (\0) has been reached, indicating the end of the string.
Iterating from right to left is also possible, but since the choice is arbitrary, going from left to right is more idiomatic.
A basic implementation may look like this:
#include <stdio.h>
#include <string.h>
void snake_case(char *string)
{
for ( ; *string != '\0'; ++string)
{
if (*string >= 65 && *string <= 90)
{
*string += 32;
memmove(string + 1U, string, strlen(string) + 1U);
*string = '_';
}
}
}
int main(void)
{
char string[64] = "helloWorldAbcDEFgHIj";
snake_case(string);
printf("%s\n", string);
}
Output: hello_world_abc_d_e_fg_h_ij
Note that:
The size of the string to move is the length of the string plus one, to also move the null-terminator (\0).
I am assuming the function isupper is off-limits as well.
The array needs to be large enough to hold the new string, otherwise memmove will perform invalid writes!
The latter is an issue that needs to be dealt with in a serious implementation. The general problem of "writing a result of unknown length" has several solutions. For this case, they may look like this:
First determine how long the resulting string will be, reallocating the array, and only then modifying the string. Requires two passes.
Every time an uppercase character is found, reallocate the string to its current size + 1. Requires only one pass, but frequent reallocations.
Same as 2, but whenever the array is too small, reallocate the array to twice its current size. Requires a single pass, and less frequent (but larger) reallocations. Finally reallocate the array to the length of the string it actually contains.
In this case, I consider option 1 to be the best. Doing two passes is an option if the string length is known, and the algorithm can be split into two distinct parts: find the new length, and modify the string. I can add it to the answer on request.
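As a sketch of the two-pass idea (option 1), here is one possible variant. It differs from the description in one respect: rather than reallocating the caller's array, it returns a newly allocated copy that the caller must free. The name snake_case_dup is hypothetical, and the code assumes an ASCII-based character set, like the version above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated snake_case copy of src, or NULL if
   allocation fails. The caller owns the result and must free() it. */
char *snake_case_dup(const char *src)
{
    assert(src != NULL);

    /* Pass 1: one byte per character, plus one '_' per uppercase letter. */
    size_t len = 0;
    for (const char *p = src; *p != '\0'; ++p)
        len += (*p >= 'A' && *p <= 'Z') ? 2 : 1;

    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;

    /* Pass 2: copy, expanding each uppercase letter to '_' + lowercase. */
    char *out = dst;
    for (const char *p = src; *p != '\0'; ++p) {
        if (*p >= 'A' && *p <= 'Z') {
            *out++ = '_';
            *out++ = (char)(*p - 'A' + 'a');
        } else {
            *out++ = *p;
        }
    }
    *out = '\0';
    assert(strlen(dst) == len);   /* both passes must agree on the size */
    return dst;
}
```

Because the output size is computed before anything is written, no memmove is needed and the buffer cannot overflow.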
I'm working on a project in which I have two main files. Essentially, the program reads in a text file defining a dictionary with key-value mappings. Each key has a unique value and the file is formatted like this where each key-value pair is on its own line:
ipsum i%##!
fubar fubar
IpSum XXXXX24
Ipsum YYYYY211
Then the program reads in input from stdin, and if any of the "words" match the keys in the dictionary file, they get replaced with the value. There is a slight thing about upper and lower cases -- this is the order of "match priority"
The exact word is in the replacement set
The word with all but the first character converted to lower case is in the replacement set
The word converted completely to lower case is in the replacement set
Meaning if the exact word is in the dictionary, it gets replaced, but if not the next possibility (2) is checked and so on...
My program passes the basic cases we were provided but then the terminal shows
that the output vs reference binary files differ.
I went into both files (not C files, but binary files), and one was super long with tons of numbers and the other just had a line of random characters. So that didn't really help. I also reviewed my code and made some small tests but it seems okay? A friend recommended I make sure I'm accounting for the null terminator in processInput() and I already was (or at least I think so, correct me if I'm wrong). I also stored the result of getchar() in an int to properly check for EOF, and allocated extra space for the char array. I also tried vimdiff and got more confused. I would love some help debugging this, please! I've been at it all day and I'm very confused.
There are multiple issues in the processInput() function:
the loop should not stop when the byte read is 0, you should process the full input with:
while ((ch = getchar()) != EOF)
the test for EOF should actually be done differently so the last word of the file gets a chance to be handled if it occurs exactly at the end of the file.
the cast in isalnum((char)ch) is incorrect: you should pass ch directly to isalnum. Casting as char is actually counterproductive because it will turn byte values beyond CHAR_MAX to negative values for which isalnum() has undefined behavior.
the test if(ind >= cap) is too loose: if word contains cap characters, setting the null terminator at word[ind] will write beyond the end of the array. Change the test to if (cap - ind < 2) to allow for a byte and a null terminator at all times.
you should check that there is at least one character in the word to avoid calling checkData() with an empty string.
char key[ind + 1]; is useless: you can just pass word to checkData().
checkData(key, ind) is incorrect: you should pass the size of the buffer for the case conversions, which is at least ind + 1 to allow for the null terminator.
the cast in putchar((char)ch); is useless and confusing.
There are some small issues in the rest of the code, but none that should cause a problem.
Start by testing your tokeniser with:
$ ./a.out <badhash2.c >zooi
$ diff badhash2.c zooi
$
Does it work for binary files, too?:
$ ./a.out <./a.out > zooibin
$ diff ./a.out zooibin
$
Yes, it does!
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
void processInput(void);
int main(int argc, char **argv) {
processInput();
return 0;
}
void processInput() {
int ch;
char *word;
int len = 0;
int cap = 60;
word = malloc(cap);
while(1) {
ch = getchar(); // (1)
if( ch != EOF && isalnum(ch)) { // (2)
if(len+1 >= cap) { // (3)
cap += cap/2;
word = realloc(word, cap);
}
word[len++] = ch;
} else {
if (len) { // (4)
#if 0
char key[len + 1];
memcpy(key, word, len); key[len] = 0;
checkData(key, len);
#else
word[len] = 0;
fputs(word, stdout);
#endif
len = 0;
}
if (ch == EOF) break; // (5)
putchar(ch);
}
}
free(word);
}
I only repaired your tokeniser, leaving out the hash table and the search & replace stuff. It is now supposed to generate a verbatim copy of the input. (which is silly, but great for testing)
(1) If you want to allow binary input, you cannot use while((ch = getchar()) ...): a NUL in the input would cause the loop to end. You must also postpone testing for EOF, because there could still be a final word in your buffer.
(2) Treat EOF just like a space here: it could be the end of a word.
(3) You must reserve space for the NUL ('\0'), too.
(4) If len == 0 there is no word, so there is no need to look it up.
(5) We treated EOF just like a space, but we don't want to write it to the output. Time to break out of the loop.
So I am working away on the 'less comfortable' version of the initials problem in CS50, and after beginning with very verbose code I've managed to whittle it down to this:
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int c = 0;
int main(void)
{
string name = get_string();
int n = strlen(name);
char initials[10];
// first letter is always going to be the first initial
initials[0] = name[0];
// count through letters looking for spaces + add the first letter after a
// space to the initials array
for (int j = 0; j < n; j++)
{
if (name[j] == 32)
{
c += 1;
initials[c] += name[j+1];
}
}
// print out initials
for (int k = 0; k <= c; k++)
{
printf("%c", toupper(initials[k]));
}
printf("\n");
}
As it stands like that it passes, but I feel like I am copping out a little cos I just pick [10] out of the air for the initial array size which I know isn't good practice. To make it a little 'better' I've tried to run a 'for' loop to iterate through the name string and add up the number of spaces. I then want to make the array [spaces + 1] as if there are 2 spaces then there will be 3 initials. The code I am trying for that is:
string name = get_string();
int n = strlen(name);
for (int i = 0; i < n; i++)
{
if (name[i] == 32)
{
spaces +=1;
}
}
The thought is that I then make 'char initials[spaces + 1]' on the next line, but even before I can do that, compiling my code with just this 'for' loop returns a fail when I upload it for checking (although it compiles no problem). Even if I don't use any of the 'for' loops output the mere fact it is there gives me this error.
Where am I going wrong?
Any help on this would be much appreciated.
Thanks!
First of all, keep in mind that execution speed is most often more valuable than memory use. If you first go looking for spaces and only then allocate memory, you have to iterate through the array twice. This is an optimization of memory use at the cost of execution speed. So it might make more sense to just allocate a "large enough" array of, let's say, 100 characters and keep the code that you have.
I then want to make the array [spaces + 1] as if there are 2 spaces then there will be 3 initials
Keep in mind that C strings are null terminated, so you need to allocate room for the null terminator too, spaces + 1 + 1.
compiling my code with just this 'for' loop returns a fail when I upload it for checking (although it compiles no problem). Even if I don't use any of the 'for' loops output the mere fact it is there gives me this error.
What error? Does it compile or does it not compile? Your text is contradictory.
Make sure you initialize spaces to zero.
As a side note, never use "magic numbers" in C code. In if (name[i] == 32), the 32 is gibberish to anyone who can't recite the ASCII table from memory. In addition, it is non-portable to systems with other symbol tables that might not have the same index numbers. Instead write:
if (name[i] == ' ')
In my opinion, a good approach to cater for such situations is the one the library function snprintf uses: it requires you to pass in the string to fill and the size of that string. It ensures that the buffer isn't overrun and that the string is zero-terminated.
The function returns the number of characters that would have been written had the string been large enough. You can now do one of two things: guess a reasonable buffer size and accept that the string will be cut short occasionally, or call the function with a zero length, use the return value to allocate a char buffer, and then fill it with a second call.
Applying this approach to your initials problem:
int initials(char *ini, int max, const char *str)
{
int prev = ' '; // pretend there's a space before the string
int n = 0; // actual number of initials
while (*str) {
if (prev == ' ' && *str != ' ') {
if (n + 1 < max) ini[n] = *str;
n++;
}
prev = *str++;
}
if (n < max) {
ini[n] = '\0';
} else if (max > 0) {
ini[max - 1] = '\0'; // last valid index; ini[max] would be out of bounds
}
return n;
}
You can then either use the fixed-size buffer approach:
char *name = "Theodore Quick Brown Fox";
char ini[4];
initials(ini, sizeof(ini), name);
puts(ini); // prints "TQB", "F" is truncated
Or the two-step dynamic-size approach:
char *name = "Theodore Quick Brown Fox";
int n;
n = initials(NULL, 0, name);
char ini[n + 1];
initials(ini, sizeof(ini), name);
puts(ini); // prints "TQBF"
(Note that this implementation of initials will ignore multiple spaces and spaces at the end or at the beginning of the string. Your look-one-ahead function will insert spaces in these cases.)
You know your initials array can't be any bigger than the name itself; in fact, it needs at most about half as many characters (in the densest case, every other character is a space), plus one for the terminator. So use that as your size. The easiest way to do that is to use a variable-length array:
size_t n = strlen( name ); // strlen returns a size_t type, not int
char initials[(n+1)/2+1]; // at most (n+1)/2 initials, +1 for the terminator; this is not a
// *constant expression*, so this is a variable-length array.
memset( initials, 0, sizeof initials ); // since initials is a VLA, we can't use an initializer
// in the declaration.
The only problem is that VLA support may be iffy - VLAs were introduced in C99, but made optional in C2011.
Alternately, you can use a dynamically-allocated buffer:
#include <stdlib.h>
...
size_t n = strlen( name );
char *initials = calloc( (n+1)/2+1, sizeof *initials ); // calloc initializes memory to 0
/**
* code to find and display initials
*/
free( initials ); // release memory before you exit your program.
Although, if all you have to do is display the initials, there's really no reason to store them - just print them as you find them.
Like others have suggested, use the character constant ' ' instead of the ASCII code 32 for comparing against a space:
if ( name[j] == ' ' )
or use the isspace library function (which will return true for spaces, tabs, newlines, etc.):
#include <ctype.h>
...
if ( isspace( name[j] ) )
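To illustrate the "just print them as you find them" suggestion, here is a minimal sketch that needs no initials array at all. The name print_initials and its FILE * parameter are assumptions for the example. It uses isspace and toupper from <ctype.h>, with the argument converted to unsigned char as those functions require.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Prints the uppercase initial of every word in name to out, followed
   by a newline, and returns how many initials were printed. */
int print_initials(const char *name, FILE *out)
{
    assert(name != NULL && out != NULL);
    int count = 0;
    int at_word_start = 1;   /* treat the start of the string like a space */
    for (; *name != '\0'; ++name) {
        unsigned char ch = (unsigned char)*name;
        if (isspace(ch)) {
            at_word_start = 1;
        } else if (at_word_start) {
            fputc(toupper(ch), out);
            at_word_start = 0;
            count++;
        }
    }
    fputc('\n', out);
    return count;
}
```

Since nothing is stored, there is no array size to guess, and leading, trailing, or repeated spaces are skipped naturally.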
I'm given a string as such:
"Hello     World"
With 5 space characters in between the two words. I want to remove all but one of the spaces between the two words. However, my code seems to only work when there are 3 spaces or fewer. I'm using memmove to try and accomplish this.
Here is what I've tried:
int main(void) {
char * word = malloc(sizeof(char)*16);
strcpy(word,"Hello     World");
checkWords(word);
return 0;
}
void checkWords(char * word) {
int i;
for(i=0; i < strlen(word); i++) {
if(word[i] == ' ' )
memmove(&word[i],&word[i+1],strlen(word)+1);
}
printf("The string without spaces is %s\n",word);
}
The output here is "Hello  World"
Not "Hello World"
If I try input such as:
"Hello   World" gets me "Hello World" -->correct
"Hello  World" gets me "Hello World" -->correct
Anything greater than 3 spaces gets me incorrect output. (I want to have one space between the two words.)
There are four problems. One is undefined behaviour: Take a 1,000,000 byte string where just the last character is a space. You will be moving about one megabyte after the end of the string one byte forward. That's quite fatal.
One is just a bug: You examine every character position in the string only once. If you have two consecutive spaces, you move the second space into the position of the first one and leave it there.
And one is a performance issue: If you fix the first two problems, and then you take a 1,000,000 byte string consisting of spaces only, you will do one million memmoves each moving between 0 and 1 megabyte. That's 500 gigabyte moved. That takes time.
And another performance issue is calling strlen in a loop. If your string is one million bytes, you do one million calls to strlen, and each call will go through the whole string until the end, scanning the whole megabyte string for the trailing zero byte.
PS. I misinterpreted what you are trying to achieve: Your code will delete any single space, and delete one out of any pairs of spaces. So it leaves one space for every two spaces you had. So it won't work if there is one space. It works by coincidence for two or three spaces. And if you have lots of spaces, it will keep about half of them.
Copying the "right" side of the string potentially multiple times is inefficient. Use of strcpy() or memcpy() is not an efficient approach.
Repeatedly calling strlen() is also inefficient.
Suggest using two indexes and walk them through the string.
#include <stdio.h> // printf, size_t
void RemoveExtraSpaces(char * word) {
if (word[0]) {
size_t src = 1;
size_t dest = 1;
do {
if (word[src] != ' ' || word[src - 1] != ' ') {
word[dest] = word[src];
dest++;
}
} while (word[src++]);
}
printf("The string without spaces is `%s`\n", word);
}
Unclear what should happen with multiple spaces before the first word or after the last word. This code shrinks those to 1 space.
Post-accept variation: a slight simplification, inspired by @Joachim Pileborg's good answer.
void RemoveExtraSpaces2(char * word) {
size_t src = 0;
size_t dest = 0;
do {
while (word[src] == ' ' && word[src + 1] == ' ') { // skip every space but the last of a run
src++;
}
word[dest] = word[src];
dest++;
} while (word[src++]);
printf("The string without spaces is `%s`\n", word);
}
As this is too close to being a duplicate of Replace multiple spaces by single space in C, (BTW: which did not accept the best answer IMO), I am making this community wiki.
This works for me
for(i=0; i < strlen(word); i++) {
if(word[i] == ' ' )
{
while (word[i+1] == ' ' && word[i+1] != '\0')
memmove(&word[i],&word[i+1], strlen(word)-i);
}
}
Write a function in C language that:
Takes as its only parameter a sentence stored in a string (e.g., "This is a short sentence.").
Returns a string consisting of the number of characters in each word (including punctuation), with spaces separating the numbers. (e.g., "4 2 1 5 9").
I wrote the following program:
int main()
{
char* output;
char *input = "My name is Pranay Godha";
output = numChar(input);
printf("output : %s",output);
getch();
return 0;
}
char* numChar(char* str)
{
int len = strlen(str);
char* output = (char*)malloc(sizeof(char)*len);
char* out = output;
int count = 0;
while(*str != '\0')
{
if(*str != ' ' )
{
count++;
}
else
{
*output = count+'0';
output++;
*output = ' ';
output++;
count = 0;
}
str++;
}
*output = count+'0';
output++;
*output = '\0';
return out;
}
I was just wondering that I am allocating len amount of memory for output string which I feel is more than I should have allocated hence there is some wasting of memory. Can you please tell me what can I do to make it more memory efficient?
I see lots of little bugs. If I were your instructor, I'd grade your solution at "C-". Here are some hints on how to turn it into an "A+".
char* output = (char*)malloc(sizeof(char)*len);
Two main issues with the above line. For starters, you are forgetting to "free" the memory you allocate. But that's easily forgiven.
The actual bug: if your string is only 1 character long (e.g. "x"), you allocate only one byte. But you need to copy two bytes into the string buffer: a '1' followed by a null terminating '\0'. The last byte gets written to invalid memory. :(
Another bug:
*output = count+'0';
What happens when "count" is larger than 9? If "count" was 10, then *output gets assigned a colon, not "10".
Start by writing a function that just counts the number of words in a string. Assign the result of this function to a variable called num_of_words.
You could very well have words longer than 9 characters, so some words will need two or more digits in the output. You also need to account for the "space" between each number. And don't forget the trailing "null" byte.
If you think about the case in which a 1-byte unsigned integer can have at most 3 chars in a string representation ('0'..'255') not including the null char or negative numbers, then sizeof(int)*3 is a reasonable estimate of the maximum string length for an integer representation (not including a null char). As such, the amount of memory you need to alloc is:
num_of_words = countWords(str);
num_of_spaces = (num_of_words > 0) ? (num_of_words - 1) : 0;
output = malloc(num_of_spaces + sizeof(int)*3*num_of_words + 1); // +1 for null char
That's a generous memory allocation estimate rather than a tight one, but it will definitely allocate enough memory in all scenarios.
I think you have a few other bugs in your program. For starters, if there are multiple spaces between each word e.g.
"my gosh"
I would expect your program to print "2 4". But your code prints something else. Likely other bugs exist if there are leading or trailing spaces in your string. And the memory allocation estimate doesn't account for the extra garbage chars you are inserting in those cases.
Update:
Given that you have persevered and attempted to make a better solution in your answer below, I'm going to give you a hint. I have written a function that PRINTs the length of all words in a string. It doesn't actually allocate a string. It just prints it - as if someone had called "printf" on the string that your function is to return. Your job is to extrapolate how this function works - and then modify it to return a new string (that contains the integer lengths of all the words) instead of just having it print. I would suggest you modify the main loop in this function to keep a running total of the word count. Then allocate a buffer of size = (word_count * 4 *sizeof(int) + 1). Then loop through the input string again to append the length of each word into the buffer you allocated. Good luck.
void PrintLengthOfWordsInString(const char* str)
{
if ((str == NULL) || (*str == '\0'))
{
return;
}
while (*str)
{
int count = 0;
// consume leading white space
while ((*str) && (*str == ' '))
{
str++;
}
// count the number of consecutive non-space chars
while ((*str) && (*str != ' '))
{
count++;
str++;
}
if (count > 0)
{
printf("%d ", count);
}
}
printf("\n");
}
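One way the printing loop above could be turned into a function that returns a string is sketched below. It is an alternative to the size estimate discussed earlier, not the answer's own method: the same formatting routine runs twice, first with a NULL buffer to measure the exact size (snprintf with a size of 0 returns the length it would have written), then again to fill the allocation. The names format_word_lengths and word_lengths are hypothetical, and snprintf is assumed not to fail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes "len len ..." into buf (when buf is non-NULL) and returns the
   number of characters needed, excluding the terminator. */
static size_t format_word_lengths(const char *str, char *buf, size_t cap)
{
    size_t used = 0;
    while (*str != '\0') {
        int count = 0;
        while (*str == ' ')                      /* consume leading spaces */
            str++;
        while (*str != '\0' && *str != ' ') {    /* count non-space chars */
            count++;
            str++;
        }
        if (count > 0) {
            int n = snprintf(buf ? buf + used : NULL, buf ? cap - used : 0,
                             used ? " %d" : "%d", count);
            used += (size_t)n;
        }
    }
    return used;
}

/* Returns a newly allocated string of word lengths; caller must free(). */
char *word_lengths(const char *str)
{
    assert(str != NULL);
    size_t need = format_word_lengths(str, NULL, 0);   /* pass 1: measure */
    char *out = malloc(need + 1);
    if (out == NULL)
        return NULL;
    out[0] = '\0';                                     /* covers the no-words case */
    format_word_lengths(str, out, need + 1);           /* pass 2: fill */
    assert(strlen(out) == need);
    return out;
}
```

This allocates exactly the bytes required, handles multi-digit lengths, and ignores leading, trailing, and repeated spaces.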
The answer is: it depends. There are trade-offs.
Yes, it's possible to write some extra code that, before performing this action, counts the number of words in the original string and then allocates the new string based on the number of words rather than the number of characters.
But is it worth it? The extra code would make your program longer. That is, you would have more binary code, taking up more memory, which may be more than you gain. In addition, it will take more time to run.
By the way, you have a memory leak in your program, which is more of a problem.
As long as none of the words in the sentence are longer than 9 characters, the length of your output array needs only to be the number of words in the sentence, multiplied by 2 (to account for the spaces), plus an extra one for the null terminator.
So for the string
My name is Pranay Godha
...you need only an array of length 11.
If any of the words are ten characters or more, you'll need to calculate how many extra char your array will need by determining the length of the numeric required. (e.g. a word of length 10 characters clearly requires two char to store the number 10.)
The real question is, is all of this worth it? Unless you're specifically required (homework?) to use the minimal space required in your output array, I'd be minded to allocate a suitably large array and perform some bounds checking when writing to it.