I've got a function that does some work with strings. It has to keep the original string, so it copies it into a char array, makes the copy all upper-case and substitutes V for any w/W.
char* function(const char* text){
    int textLength = strlen(text);
    char text_copy[textLength];
    for(int i = 0; i < textLength; i++){
        if(text[i] == 'W' || text[i] == 'w')
            text_copy[i] = 'V';
        else
            text_copy[i] = toupper(text[i]);
    }
    return 'a';
}
It doesn't really matter what the function returns. However, whenever I try to printf("%s\n", text_copy); with some strings, it prints this:
belfast: BELFAST
please: PLEASE
aardvark: AARDVARK??
hello world: HELLO VORLD
taxxxiii: TAXXXIII???
swag: SVAG?
Why is it that some strings turn out fine and some don't? Thanks.
You need to null-terminate the copy.
char text_copy[textLength+1];
...
text_copy[textLength]='\0';
If you are returning it from your function (that isn't clear), you should allocate it with malloc instead: text_copy is a local array that ceases to exist when the function returns.
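For the malloc route, here is a minimal sketch that assumes the caller takes ownership of the returned buffer and frees it. The name make_upper_copy is hypothetical. It also casts to unsigned char before calling toupper, which <ctype.h> requires for possibly negative char values.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, upper-cased copy of text
   with every 'w'/'W' replaced by 'V'. The caller must free() the result.
   Returns NULL if the allocation fails. */
char *make_upper_copy(const char *text)
{
    size_t textLength = strlen(text);
    char *text_copy = malloc(textLength + 1); /* +1 for the '\0' */
    if (text_copy == NULL)
        return NULL;

    for (size_t i = 0; i < textLength; i++) {
        if (text[i] == 'W' || text[i] == 'w')
            text_copy[i] = 'V';
        else
            text_copy[i] = (char)toupper((unsigned char)text[i]);
    }
    text_copy[textLength] = '\0'; /* terminate the copy */
    return text_copy;
}
```

A caller would do char *s = make_upper_copy("hello world");, print s (HELLO VORLD), and then free(s).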
Why is it that some strings turn out fine and some don't?
Pure chance.
You only allocate enough space for the visible characters in the string, not for the terminating \0. For some of the strings you are just lucky that a null byte happens to be on the stack right after the character array.
Change your code like so...
int textLength = strlen(text);
char text_copy[textLength + 1]; // << enough space for the string and \0
for(int i = 0; i < textLength; i++){
    if(text[i] == 'W' || text[i] == 'w')
        text_copy[i] = 'V';
    else
        text_copy[i] = toupper(text[i]);
}
text_copy[textLength] = '\0'; // Make sure it is terminated properly.
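You can see the missing byte by comparing strlen, which counts only the visible characters, with sizeof applied to a string literal, which also counts the terminator. Here is a small sketch; bytes_needed is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: number of bytes needed to store a copy of s,
   i.e. its visible characters plus the terminating '\0'. */
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}

/* For the literal "swag":
   strlen("swag") == 4   (visible characters only)
   sizeof("swag") == 5   (includes the '\0')        */
```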
Related
I have built a function with the goal of taking text that is fed from elsewhere in the program and removing all whitespace and punctuation from it. I'm able to remove whitespace and punctuation, but the changes don't stay after they are made. For instance, I put the character array/string into a for-loop to remove whitespace and verify that the whitespace is removed by printing the current string to the screen. When I send the string through a loop to remove punctuation, though, it acts as though I did not remove whitespace from earlier. This is an example of what I'm talking about:
[Image: example of output to screen]
The function that I'm using is here.
//eliminates all punctuation, capital letters, and whitespace in plaintext
char *formatPlainText(char *plainText) {
    int length = strlen(plainText);
    //turn capital letters into lower case letters
    for (int i = 0; i < length; i++)
        plainText[i] = tolower(plainText[i]);
    //remove whitespace
    for (int i = 0; i < length; i++) {
        if (plainText[i] == ' ')
            plainText[i] = plainText[i++];
        printf("%c", plainText[i]);
    }
    printf("\n\n");
    //remove punctuation from text
    for (int i = 0; i < length; i++) {
        if (ispunct(plainText[i]))
            plainText[i] = plainText[i++];
        printf("%c", plainText[i]);
    }
}
Any help as to why the text is unchanged after it exits the loop would be appreciated.
Those for loops are not necessary. Your function can be modified as follows; I commented where I made the changes:
#include <ctype.h>
#include <stdio.h>

char* formatPlainText(char *plainText)
{
    char *start = plainText; // remember the start of the string so it can be returned
    char *dest = plainText;  // dest writes the modified version of plainText in place
    while ( *plainText )     // as long as *plainText is not '\0'
    {
        int k = tolower((unsigned char)*plainText);
        if( !ispunct(k) && k != ' ')  // skip ' ' and any punctuation mark
            *dest++ = (char)k;        // store the lower-case char in *dest and advance dest
        plainText++;
    }
    *dest = '\0'; // important: the loop stops before copying the terminator
    return start;
}
From main:
int main( void ){
    char str[] = "Practice ????? &&!!! makes ??progress!!!!!";
    char * res = formatPlainText(str);
    printf("%s\n", res); // prints "practicemakesprogress"
}
The code does convert the string to lower case, but the space and punctuation removal phases are broken: plainText[i] = plainText[i++]; has undefined behavior because you use i and modify it elsewhere in the same expression.
Furthermore, you do not return plainText from the function. Depending on how you use the function, this leads to undefined behavior if you store the return value to a pointer and later dereference it.
You can fix the problems by using 2 different index variables for reading and writing to the string when removing characters.
Note too that you should not use a length variable, because the string length changes in the second and third phases. Testing for the null terminator is simpler.
Also note that tolower() and ispunct() and other functions from <ctype.h> are only defined for argument values in the range 0..UCHAR_MAX and the special negative value EOF. char arguments must be cast as (unsigned char) to avoid undefined behavior on negative char values on platforms where char is signed by default.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
//eliminate all punctuation, capital letters, and whitespace in plaintext
char *formatPlainText(char *plainText) {
    size_t i, j;
    //turn capital letters into lower case letters
    for (i = 0; plainText[i] != '\0'; i++) {
        plainText[i] = tolower((unsigned char)plainText[i]);
    }
    printf("lowercase: %s\n", plainText);
    //remove whitespace
    for (i = j = 0; plainText[i] != '\0'; i++) {
        if (plainText[i] != ' ')
            plainText[j++] = plainText[i];
    }
    plainText[j] = '\0';
    printf("no white space: %s\n", plainText);
    //remove punctuation from text
    for (i = j = 0; plainText[i] != '\0'; i++) {
        if (!ispunct((unsigned char)plainText[i]))
            plainText[j++] = plainText[i];
    }
    plainText[j] = '\0';
    printf("no punctuation: %s\n", plainText);
    return plainText;
}
I have the following string function:
char * to_upper(const char * str) {
    char * upper = malloc(strlen(str)+1);
    int i;
    for (i=0; str[i] != 0; i++)
        upper[i] = toupper(str[i]);
    upper[i+1] = '\0';
    return upper;
}
However, when I call it, it adds a "?" to the end (probably an invalid character). If I change the last line, from upper[i+1] = '\0' to upper[i] = '\0', it works as expected. What is wrong then with code above?
Additionally, is this the right way to allocate for the string?
char * upper = malloc(strlen(str)+1);
Or should I instead do:
char upper[strlen(str)+1];
Update: my error above is because length starts at 1, index starts at 0. How should I initialize the string though?
Your code is fine, you just need to remove the +1 as you found out. The for loop ends when str[i] is equal to '\0', so it makes sense that upper[i] should then be set to '\0' as well.
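Your malloc initialization is fine. Don't use char upper[strlen(str)+1]; here: a local array ceases to exist when the function returns, so returning it would leave the caller with a dangling pointer.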
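If you would rather avoid malloc, you can let the caller own the storage and have the function fill it. That storage may be a local array in the caller, because it outlives the call. Here is a sketch of that alternative; to_upper_into is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical variant: writes the upper-case version of src into dest,
   which the caller provides. dest must have room for strlen(src) + 1 bytes. */
char *to_upper_into(char *dest, const char *src)
{
    size_t i;
    for (i = 0; src[i] != '\0'; i++)
        dest[i] = (char)toupper((unsigned char)src[i]);
    dest[i] = '\0'; /* i is the index of src's terminator */
    return dest;
}
```

For example, char buf[6]; to_upper_into(buf, "hello"); leaves "HELLO" in buf.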
I fixed the answer according to the comments:
char * to_upper(const char * str) {
    char* upper = malloc(strlen(str) + 1); // +1 for the terminating '\0'
    int i;
    for (i=0; str[i] != '\0'; i++)
        upper[i] = toupper((unsigned char)str[i]);
    upper[i] = '\0';
    return upper;
}
The comments showed me the error in both my logic and yours. The string's null terminator is already there, which is why the for loop exits when str[i] == '\0'. So i is exactly the index in upper that must be set to '\0'.
I am working with a pointer to an array of characters. This code is supposed to switch the cases of letters, delete digits, print two spaces instead of one, and print all other chars unchanged. It does everything fine except printing the other characters unchanged. This seems like it should be a non-issue, but I cannot figure it out. See anything that looks like it's wrong?
void compareDuplicates(FILE * ifp, char mode){
    /* size MAXCHARS+1 to hold full array + null terminate*/
    char newArray [MAXCHARS +1] = {0};
    char oldArray [MAXCHARS +1] = {0};
    char * newStr = newArray;
    char * oldStr = oldArray;
    char * tempStr;
    /* fill array, test for EOF*/
    while(fgets(newStr,MAXCHARS, ifp)){
        //if strings are the same, do not print anything
        if(strcmp(newStr, oldStr) !=0){
            //else print
            testStrings(newStr);
        }
        //set oldStr pointer to newStr, set newStr pointer to oldStr reset newStr memory
        //reset memory of newStr to all null chars
        tempStr = oldStr;
        oldStr = newStr;
        newStr = tempStr;
        memset(newStr,'\0',MAXCHARS);
    }
}
void testStrings(char * array1){
    int i = 0;
    char c;
    while(*(array1+i) != '\0'){
        if(*(array1+i) >= 'A' && *(array1+i) <= 'Z'){
            c = *(array1+i)+32;
            printf("%c",c);
        }
        else if(*(array1+i) >= 'a' && *(array1+i) <='z'){
            c = *(array1+i)-32;
            printf("%c",c);
        }
        else if(*(array1+i) == ' '){
            c = *(array1+i);
            printf("%c",c);
            printf("%c",c);
        }
        else if(*(array1+i) >= '0' || *(array1+i) <= '9'){
            i++;
            continue;
        }
        else{
            c = *(array1+i);
            printf("%c",c);
        }
        i++;
    }
    printf("\n");
}
For example, if given the lines:
CONSECUTIVE_LINE
CONSECUTIVE_LINE
CONSECUTIVE_LINE
123 REPEAT
123 REPEAT
232unique-line
the output will be:
consecutiveline
repeat
UNIQUELINE
This represents the deletion of consecutive duplicate lines, the changing of cases, two spaces in place of one and the deletion of digits. However, it will not print the underscores and other characters that are not targeted.
This test...
*(array1+i) >= '0' || *(array1+i) <= '9'
... will always yield true. Any character you check against is going to be more than '0' or less than '9' because '0' < '9'. You probably wanted to check a character is inside this range, which requires a && (logical AND), like you do in all the others.
As a side note, don't assume the character encoding is going to put the alphabetic characters in sequence. It's only guaranteed to be true for the digit characters. A better check would utilize isalpha and islower or isupper from the standard library's ctype.h header.
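Below is a sketch of the same rules written with <ctype.h> classifiers, so it no longer depends on ASCII ordering or the ±32 offset. To make it easy to check, this hypothetical transform_line writes into a caller-supplied buffer instead of printing and returns the number of characters written. out must have room for 2 * strlen(in) + 1 bytes, because each space becomes two.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical variant of testStrings: swaps letter case, drops digits,
   doubles spaces, and copies every other character unchanged. */
size_t transform_line(const char *in, char *out)
{
    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        if (isupper(c)) {
            out[j++] = (char)tolower(c);
        } else if (islower(c)) {
            out[j++] = (char)toupper(c);
        } else if (c == ' ') {
            out[j++] = ' ';
            out[j++] = ' ';
        } else if (isdigit(c)) {
            continue;               /* digits are deleted */
        } else {
            out[j++] = (char)c;     /* '_', '-', '\n', ... pass through */
        }
    }
    out[j] = '\0';
    return j;
}
```

With this version, "CONSECUTIVE_LINE" becomes "consecutive_line", "123 REPEAT" becomes "  repeat", and "232unique-line" becomes "UNIQUE-LINE".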
This should be relatively simple.
I've got a string/character pointer that looks like this
" 1001"
Notice the space before the 1. How do I remove this space while still retaining the integer after it (not converting to characters or something)?
The simplest answer is:
char *str = " 1001";
char *p = str+1; // this is what you need
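If there may be more than one leading blank (or a tab), the same pointer idea can skip all of them without copying anything. Here is a minimal sketch; skip_leading_space is a hypothetical name.

```c
#include <ctype.h>

/* Hypothetical helper: returns a pointer to the first non-whitespace
   character of str. Nothing is copied or modified. */
const char *skip_leading_space(const char *str)
{
    while (*str != '\0' && isspace((unsigned char)*str))
        str++;
    return str;
}
```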
If the space is at the beginning of the string, you can also do this:
char *str = " 1001";
char c[5];
sscanf(str, "%4s", c); // width 4 leaves room for the '\0' in c
printf("%s\n", c);
%s skips any leading whitespace before it starts storing characters.
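If what you ultimately need is the number itself rather than the text, strtol already skips leading whitespace. Here is a sketch, assuming the string holds a base-10 integer. parse_number is a hypothetical wrapper, and overflow checking via errno is left out.

```c
#include <stdlib.h>

/* Hypothetical wrapper: converts a string such as " 1001" to a long.
   strtol skips leading whitespace itself; *ok is set to 0 when no
   digits were found (strtol then leaves end == str). */
long parse_number(const char *str, int *ok)
{
    char *end;
    long value = strtol(str, &end, 10);
    *ok = (end != str);
    return value;
}
```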
One solution to this is found here: How to remove all occurrences of a given character from string in C?
I recommend removing the empty space character or any character by creating a new string with the characters you want.
You don't seem to be allocating memory so you don't have to worry about letting the old string die.
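Here is a minimal sketch of building a new string with only the characters you want. It assumes the caller provides a destination buffer with room for strlen(src) + 1 bytes. copy_without is a hypothetical name.

```c
/* Hypothetical helper: copies src into dest, leaving out every
   occurrence of ch, and terminates dest. */
void copy_without(char *dest, const char *src, char ch)
{
    while (*src != '\0') {
        if (*src != ch)
            *dest++ = *src;   /* keep this character */
        src++;
    }
    *dest = '\0';
}
```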
If it is a character pointer, I believe
char *new = old + 1; // points just past the leading space
should do the trick.
I'm guessing you're reading in a string representation of an integer from stdin and want to get rid of the whitespace? If you can't use the pointer tricks above and actually need to modify the memory, use the following functions.
You can also use sprintf to get the job done.
I'm sure there are more efficient ways to trim the string. Here is just an example.
#include <ctype.h>
#include <string.h>

// Removes leading whitespace by shifting the rest of the string left.
void trim_front(char * str)
{
    size_t i = 0;
    size_t index = 0;
    while(str[index] != '\0' && isspace((unsigned char)str[index]))
    {
        index++;
    }
    while(str[index] != '\0')
    {
        str[i] = str[index];
        i++;
        index++;
    }
    str[i] = '\0'; // terminate at the new, shorter length
}
// Removes trailing whitespace by moving the terminator back.
void trim_back(char * str)
{
    size_t len = strlen(str);
    while(len > 0 && isspace((unsigned char)str[len - 1]))
        len--;
    str[len] = '\0';
}
void trim(char * str)
{
    trim_front(str);
    trim_back(str);
}
We have a function longest, which returns the longest substring that consists of letters. Example:
longest("112****hel 5454lllllo454")
would return: lllllo
However, when I run the program it seems to return lllllo454. Here is the function:
char *longest(char *s){
    char *pMax = NULL;
    int nMax = 0;
    char *p = NULL;
    int n = 0;
    int inside = 0; //flag
    while(*s!='\0'){
        char c = *s;
        if((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')){
            if(inside == 0){
                n = 1;
                p = s;
                inside = 1;
            }
            else
                n++;
            if(inside == 1){
                if(n > nMax){
                    nMax = n;
                    pMax = p;
                    inside = 0;
                }
            }
        }//end isLetter if
        s++;
    }
    return pMax;
}
There's something I'm not seeing here...what do you guys think?
You are just returning a pointer to the first character in the longest substring. You don't actually add a string terminator after the end of the substring, so it continues to the end of the original string. You probably ought to copy the substring (only those characters in the sequence) to a new string and return a pointer to that.
char* newStr = malloc(nMax+1);
strncpy( newStr, pMax, nMax );
*(newStr+nMax) = '\0';
return newStr;
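Here is a complete sketch that combines a letter-run scan with that copy. It is a rewrite rather than the original logic: it ends the current run on any non-letter (the original never clears inside there), uses isalpha and memcpy, and takes a const char *. It returns NULL if there are no letters or malloc fails, and the caller must free the result.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the longest run of letters in s,
   or NULL if s contains no letters or allocation fails. */
char *longest(const char *s)
{
    const char *pMax = NULL;  /* start of the longest run so far */
    size_t nMax = 0;          /* its length */
    const char *p = NULL;     /* start of the current run */
    size_t n = 0;             /* length of the current run */

    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s)) {
            if (n == 0)
                p = s;        /* a new run begins here */
            n++;
            if (n > nMax) {
                nMax = n;
                pMax = p;
            }
        } else {
            n = 0;            /* any non-letter ends the current run */
        }
    }
    if (pMax == NULL)
        return NULL;

    char *newStr = malloc(nMax + 1);
    if (newStr == NULL)
        return NULL;
    memcpy(newStr, pMax, nMax);
    newStr[nMax] = '\0';      /* terminate the copy */
    return newStr;
}
```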
You are calculating nMax but not doing anything with that information. In C, a char* points to the start of a string of characters, which is terminated by a NUL character. Since you are not modifying the buffer passed to your function, the returned pointer points to the first 'l' and continues to the end of the original string.
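If you only need to print the result, another option is to keep the pointer and pass nMax as a printf precision. %.*s prints at most that many characters, so no terminator is needed after the run. Here is a sketch with hypothetical helper names. In "112****hel 5454lllllo454" the run "lllllo" starts at offset 15 and is 6 characters long.

```c
#include <stdio.h>

/* Prints the first n characters starting at p, without copying. */
void print_run(const char *p, int n)
{
    printf("%.*s\n", n, p);
}

/* Same idea into a buffer; returns snprintf's result (the number of
   characters that the formatted run occupies). */
int format_run(char *buf, size_t size, const char *p, int n)
{
    return snprintf(buf, size, "%.*s", n, p);
}
```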
You are returning the pointer to the first letter of the longest substring; you are not making a new string out of it. Thus when you print it out, it prints everything up to the null terminator of the original string.
See the strncpy function: http://www.cplusplus.com/reference/clibrary/cstring/strncpy/
This returns the largest substring that starts with a letter. The part of the function that sets inside, as follows:
if((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')){
will only be executed if c is a letter. Since you want the largest substring that includes a letter, you need this to be:
if(c != ' '){
Then, inside that block, use another variable, say containsLetter, that is set to true only if you encounter a letter before the next space.