How to remove punctuation from a String in C

I'm looking to remove all punctuation from a string and make all uppercase letters lowercase in C. Any suggestions?

Just a sketch of an algorithm using functions provided by ctype.h:
#include <ctype.h>
void remove_punct_and_make_lower_case(char *p)
{
    char *src = p, *dst = p;
    while (*src)
    {
        if (ispunct((unsigned char)*src))
        {
            /* Skip this character */
            src++;
        }
        else if (isupper((unsigned char)*src))
        {
            /* Make it lowercase */
            *dst++ = tolower((unsigned char)*src);
            src++;
        }
        else if (src == dst)
        {
            /* Increment both pointers without copying */
            src++;
            dst++;
        }
        else
        {
            /* Copy character */
            *dst++ = *src++;
        }
    }
    *dst = 0;
}
Standard caveats apply: completely untested; refinements and optimizations are left as an exercise for the reader.

Loop over the characters of the string. Whenever you meet a punctuation character (ispunct), don't copy it to the output string. Whenever you meet an "alpha char" (isalpha), use tolower to convert it to lowercase.
All the functions mentioned are declared in <ctype.h>.
You can either do it in place (by keeping separate read and write pointers into the string) or create a new string from it. Which one is better depends entirely on your application.
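The second option, building a new string, might look like the sketch below. The function name clean_copy is hypothetical, and returning NULL when malloc fails is an assumed convention; the caller owns the result and must free() it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src with
   punctuation removed and letters lowercased, or NULL if malloc fails. */
char *clean_copy(const char *src)
{
    /* The result can never be longer than the input */
    char *out = malloc(strlen(src) + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    for (; *src != '\0'; src++) {
        /* ctype functions expect values in the unsigned char range */
        unsigned char c = (unsigned char)*src;
        if (!ispunct(c))
            *dst++ = (char)tolower(c);
    }
    *dst = '\0';
    return out;
}
```

In the default "C" locale, clean_copy("Hello, World!") returns "hello world"; the original string is left untouched.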

The idiomatic way to do this in C is to have two pointers, a source and a destination, and to process each character individually, e.g.:
#include <ctype.h>
void reformat_string(char *src, char *dst) {
    for (; *src; ++src)
        if (!ispunct((unsigned char) *src))
            *dst++ = tolower((unsigned char) *src);
    *dst = 0;
}
src and dst can be the same string since the destination will never be larger than the source.
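Although it's tempting, avoid calling tolower(*src++). The C standard requires library functions implemented as macros to evaluate their arguments exactly once, but historically some implementations defined tolower as a macro that did not, so the increment could happen more than once.
Avoid solutions that search for characters to replace (using strchr or similar); they can turn a linear algorithm into a quadratic one.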

Here's a rough cut of an answer for you:
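#include <ctype.h>
#include <string.h>
void strip_punct(char * str) {
    size_t p = 0;
    size_t len = strlen(str);
    for (size_t i = 0; i < len; i++) {
        /* Keep non-punctuation characters, lowercased */
        if (!ispunct((unsigned char) str[i])) {
            str[p] = tolower((unsigned char) str[i]);
            p++;
        }
    }
    str[p] = '\0'; /* terminate the shortened string */
}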

Related

C : removing duplicated letters from string

I am trying to remove duplicated letters in each word of a string. (I haven't handled upper- versus lowercase letters yet.)
Input:
Ii feel good todday!!
thhis iss fixed
Output:
I fel god today!
this is fixed
I call this function from main, and I have to use the result in another function. That's why I pass it by reference.
int main(){
    char string[100];
    printf("Enter a string:");
    gets(string);
    dup_letters_rule(&string);
    return 0;
}
void dup_letters_rule(char *str_[]){
    char new_str_[100];
    int i=0, j=0;
    printf("Fixed duplicates:\n");
    while(*str_[i]!='\0'){
        if(*str_[i]== *str_[i+1] && *str_[i+1]!='\0'){
            while(*str_[i]==*str_[i+1] && *str_[i+1]!='\0'){
                i++;
            }
            *str_[i]=new_str_[j];
            j++;
            i++;
        }
        else{
            *str_[i]=new_str_[j];
            j++;
            i++;
        }
    }
    new_str_[j]='\0';
    puts(new_str_);
}
It works like this:
void dup_letters_rule(char *str_[]){
    char *new_str_=*str_, *temp=*str_;
    temp++;
    printf("Fixed duplicates:\n");
    while(*new_str_!='\0'){
        if(*new_str_== *temp && *temp!='\0'){
            while(*new_str_==*temp && *temp!='\0'){
                new_str_++;
                temp++;
            }
            putchar(*new_str_);
            new_str_++;
            temp++;
        }
        else{
            putchar(*new_str_);
            new_str_++;
            temp++;
        }
    }
}
But then I can't use the *str_ string in another function.
The code can be simplified.
We can keep an int holding the previous char seen, compare it against the current char, and only "copy it out" if they differ (i.e. we only need two pointers).
We also have to use tolower because Ii goes to I.
Although a second/output buffer could be used, the function can do the cleanup "in-place". Then, the caller can use the cleaned up buffer. This is what we'd normally want to do.
If the caller needs to keep the original string, it can save the original to a temp buffer and call the function with the temp buffer.
I had to refactor your code. I tested it against your sample input. It is annotated:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
void
dup_letters_rule(char *src)
{
    char *dst = src;
    int prev = -1;
    // rchr -- the "raw" char
    // lchr -- the result of tolower(rchr)
    // prev -- the previous value of lchr (starts with -1 to force output of
    // first char)
    for (int rchr = *src++; rchr != 0; rchr = *src++) {
        // get lowercase char
        int lchr = tolower((unsigned char) rchr);
        // output if _not_ a dup
        if (lchr != prev)
            *dst++ = rchr;
        // remember this char for the next iteration
        prev = lchr;
    }
    *dst = 0;
}
int
main(void)
{
    char *cp;
    char buf[1000];
    while (1) {
        cp = fgets(buf,sizeof(buf),stdin);
        if (cp == NULL)
            break;
        // get rid of newline
        buf[strcspn(buf,"\n")] = 0;
        // eliminate dups
        dup_letters_rule(buf);
        // output the clean string
        printf("%s\n",buf);
    }
    return 0;
}
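For the case mentioned earlier, where the caller must keep the original string, a minimal sketch is to copy into a temporary buffer and clean the copy. The wrapper name dup_letters_copy is hypothetical, the temp buffer is assumed to live on the heap, and dup_letters_rule is repeated (same logic as above) only so the block compiles on its own.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Same in-place logic as dup_letters_rule above, repeated so this
   block compiles on its own. */
static void dup_letters_rule(char *src)
{
    char *dst = src;
    int prev = -1;
    for (int rchr = *src++; rchr != 0; rchr = *src++) {
        int lchr = tolower((unsigned char)rchr);
        if (lchr != prev)
            *dst++ = (char)rchr;
        prev = lchr;
    }
    *dst = '\0';
}

/* Hypothetical wrapper: cleans a heap copy and leaves `orig` untouched.
   Returns NULL if allocation fails; the caller must free() the result. */
char *dup_letters_copy(const char *orig)
{
    size_t n = strlen(orig) + 1;   /* include the terminating '\0' */
    char *tmp = malloc(n);
    if (tmp == NULL)
        return NULL;
    memcpy(tmp, orig, n);          /* save the original into a temp buffer */
    dup_letters_rule(tmp);         /* clean the copy in place */
    return tmp;
}
```

With this split, dup_letters_copy("thhis iss fixed") returns "this is fixed" while the caller's string keeps its original contents.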
UPDATE:
Can I print the clean string in the dup_letters_rule function? – hamster
Sure, of course. We're the programmers, so we can do whatever we want ;-)
There is a maxim for functions: Do one thing well
In many actual (re)use cases, we don't want a simple, low-level function to do printing. That is the usual situation.
But, we could certainly add printing to the function. We'd move the printf from main into the function itself.
To get the best of both worlds, we can use two functions. One that just does the transformation. And, a second that calls the simple function and then prints the result.
Here's a slight change that illustrates that. I renamed my function and created dup_letters_rule with the printf embedded in it:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
void
dup_letters_rule_basic(char *src)
{
    char *dst = src;
    int prev = -1;
    // rchr -- the "raw" char
    // lchr -- the result of tolower(rchr)
    // prev -- the previous value of lchr (starts with -1 to force output of
    // first char)
    for (int rchr = *src++; rchr != 0; rchr = *src++) {
        // get lowercase char
        int lchr = tolower((unsigned char) rchr);
        // output if _not_ a dup
        if (lchr != prev)
            *dst++ = rchr;
        // remember this char for the next iteration
        prev = lchr;
    }
    *dst = 0;
}
void
dup_letters_rule(char *buf)
{
    dup_letters_rule_basic(buf);
    // output the clean string
    printf("%s\n",buf);
}
int
main(void)
{
    char *cp;
    char buf[1000];
    while (1) {
        cp = fgets(buf,sizeof(buf),stdin);
        if (cp == NULL)
            break;
        // get rid of newline
        buf[strcspn(buf,"\n")] = 0;
        dup_letters_rule(buf);
    }
    return 0;
}
UPDATE #2:
And why is it char *dst = src; rather than char *dst = *src;? – hamster
This is basic C. We want dst to have the same value/contents that src does. Just as if we did:
int x = 23;
int y = x;
If we do what you're suggesting, the compiler flags the statement:
bad.c: In function ‘dup_letters_rule_basic’:
bad.c:8:14: warning: initialization of ‘char *’ from ‘char’ makes pointer from integer without a cast [-Wint-conversion]
char *dst = *src;
^
Doing char *dst = *src [as you suggest] is using * in two different ways.
Doing char *dst says that dst is defined as a pointer to a char.
Doing *src here [which is the initializer for dst and is an expression], the * is the dereference operator. It says "fetch the value (a char) pointed to by src". Not what we want.
Perhaps this would be more clear if we didn't use an initializer. We use a definition (without an initializer) and set the initial value of dst with an assignment statement:
char *dst; // define a char pointer (has _no_ initial value)
dst = src; // assign the value of dst from the value of src
The assignment [statement] can occur anywhere after the definition and before the for loop/statement. Here's the first few lines of the function body:
char *dst;
int prev = -1;
dst = src;
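To see the two meanings of * side by side, here is a small illustrative function (the name show_pointer_copy is hypothetical) that copies the pointer, fetches a char through it, and then writes through the copy:

```c
#include <stdio.h>

/* Contrasts copying a pointer with dereferencing it. */
void show_pointer_copy(char *src)
{
    char *dst;          /* define a char pointer (no initial value yet) */
    dst = src;          /* dst now holds the same address as src */
    char first = *src;  /* '*' as an operator: fetch the char src points to */

    printf("same address: %s\n", dst == src ? "yes" : "no");
    printf("first char: %c\n", first);

    *dst = 'X';         /* writing through dst changes the string src sees */
    printf("string via src: %s\n", src);
}
```

Called with a writable buffer holding "hello", it prints "same address: yes", "first char: h" and "string via src: Xello", because dst and src refer to the same memory.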
To remove consecutive duplicate characters from a string in place, keep track of the position where the next kept character should be written, and compare each character with the previous one, ignoring case. The first character is always kept, because it has no predecessor. If the current character matches the previous one, skip it; otherwise, write it at the tracked position and advance the tracked position by one.
Its implementation:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
void remove_consecutive_dup_chars (char * pstr) {
    if (pstr == NULL) {
        printf ("Invalid input..\n");
        return;
    }
    /* Pointer tracking the position where the next kept character
     * is written, so consecutive duplicates get overwritten.
     */
    char * p = pstr;
    for (size_t i = 0; pstr[i] ; ++i) {
        /* Skip a char equal to its predecessor, ignoring case; the casts
         * keep tolower's argument in the unsigned char range. */
        if ((i) && (tolower ((unsigned char) pstr[i]) == tolower ((unsigned char) pstr[i - 1]))) {
            continue;
        }
        *p++ = pstr[i];
    }
    /* Add the null terminating character.
     */
    *p = '\0';
}
int main (void) {
    char buf[256] = {'\0'};
    strcpy (buf, "Ii feel good todday!!");
    remove_consecutive_dup_chars (buf);
    printf ("%s\n", buf);
    strcpy (buf, "thhis iss fixed");
    remove_consecutive_dup_chars (buf);
    printf ("%s\n", buf);
    strcpy (buf, "");
    remove_consecutive_dup_chars (buf);
    printf ("%s\n", buf);
    strcpy (buf, "aaaaaa zzzzzz");
    remove_consecutive_dup_chars (buf);
    printf ("%s\n", buf);
    return 0;
}
Output:
I fel god today!
this is fixed
a z

Char *strcat Implementation leading to Segmentation Fault

char *strcat(char*dest, char*src) {
    while (dest != '\0') {
        *dest++;
    }
    while (src != '\0') {
        *dest++ = *src++;
    }
    return dest;
}
I keep getting a segmentation fault on the line *dest++ = *src++. Any ideas on how to fix the problem?
Your code has 4 problems:
you are comparing the pointers to the null character instead of comparing the characters they point to. Since the pointer would have to be incremented an enormous number of times before it became null, if ever, you read and/or write far beyond the end of the buffers, into invalid memory, long before that happens; hence the crash.
you do not null-terminate the destination string.
you return the pointer to the end of the destination string instead of the original destination string. This might be a useful API, but you should use a different name for that.
the src pointer should be declared as const char * to conform to the standard declaration for this function and to allow passing pointers to constant strings as sources.
Here is a corrected version:
char *strcat(char *dest, const char *src) {
    char *saved = dest;
    while (*dest != '\0') {
        dest++;
    }
    while ((*dest++ = *src++) != '\0') {
        continue;
    }
    return saved;
}
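The third point notes that returning the end pointer can be a useful API under a different name. A sketch of such a variant follows; the name str_append_end is hypothetical (POSIX stpcpy uses a similar end-returning convention for plain copies). It assumes dest has room for both strings.

```c
/* Hypothetical variant of the corrected strcat that returns a pointer to
   the new terminating '\0' instead of the start of dest. The caller must
   ensure dest has room for both strings. */
char *str_append_end(char *dest, const char *src)
{
    while (*dest != '\0')             /* find the current end of dest */
        dest++;
    while ((*dest = *src) != '\0') {  /* copy src, including its '\0' */
        dest++;
        src++;
    }
    return dest;                      /* points at the new terminator */
}
```

Passing the returned pointer back in lets several appends run without rescanning the whole string each time: starting from "foo", appending "bar" and then "baz" through the returned pointer yields "foobarbaz".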
Okay: the Kernighan way:
char *strcat(char *dest, char *src)
{
    char *org = dest;
    while(*dest++){;}
    // BUG: this loop also steps past the '\0', so dest now points one char
    // beyond it and src gets appended after the terminator (see the update)
    while((*dest++ = *src++)){;}
    // at this moment, dest points just past the copied '\0'
    return org;
}
Update (courtesy of @chqrlie):
char *strcat(char *dest, const char *src)
{
    char *org = dest;
    for(; *dest; dest++) {;}
    // at this moment, dest MUST be pointing to '\0'
    while((*dest++ = *src++)) {;}
    // at this moment, dest points past the '\0', but who cares?
    return org;
}
dest and src will never compare equal to '\0' unless they are null pointers to begin with (strictly speaking, only after a huge number of increments, but you will access invalid memory and crash long before that).
You should use:
while(*dest != '\0'){
    dest++;
}
while(*src != '\0'){
    *dest++ = *src++;
}
to check the values underneath the pointers.
There are some other problems too:
the resulting string is not null-terminated.
a pointer to the end of the string is returned.
As mentioned by others: src should be a pointer to const too.
This should do it:
char *strcat(char *dest, const char *src)
{
    char *start_pos = dest;
    while(*dest != '\0')
        dest++;
    while(*src != '\0')
        *dest++ = *src++;
    *dest = '\0';
    return start_pos;
}
Minor detail: I would give this function a name other than the standard strcat(), since the names of standard library functions are reserved.

How do I make my program abort (without any lib)

I have an exercise in class where I have to reimplement the strcpy() function in C (from <string.h>). Here is my code:
char *ft_strcpy(char *dest, char *src)
{
    int i;
    i = 0;
    while (src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return (dest);
}
This code works, but when I try it with a dest buffer that is smaller than the src string, it gives unpredictable results and ends up overwriting src too. The original strcpy() function's answer to that is to abort the program in that situation. How can I make my program abort, given that I can't use the abort() function?
You must check your strings; maybe one of them is NULL.
When you write: while (str[i] ...)
you don't know whether str is NULL or not.
Do:
while (str && str[i] != '\0')
It's better.
You can also check that the allocation of dest succeeded, because if there is no memory for dest you can't store anything in it.
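I would suggest a compact implementation:
#include <string.h>
char * ft_strcpy(char * dest, const char * src) {
    /* Caution: strlen(dest) measures the string currently stored in dest,
       not the capacity of the buffer, so this is only a rough guard and
       requires dest to already hold a valid string. */
    if (strlen(dest) < strlen(src)) { /* abort(); or exit(1); */ }
    char * s = dest;
    while ((*dest++ = *src++));
    return s;
}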
Actual bounds checking is provided by strcpy_s, the safer version of strcpy from the optional Annex K of C11. You can find an implementation in the Safe C Library.
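A function cannot discover the capacity of a buffer from a plain pointer, so a bounds check needs the size passed in. The sketch below assumes that design and reports failure by returning NULL instead of aborting; the name ft_strcpy_checked and its parameter order are hypothetical (similar in spirit to strcpy_s or strlcpy, but not either of them).

```c
#include <stddef.h>

/* Hypothetical bounds-checked copy. The caller passes the real size of
   dest. Returns dest on success, or NULL (leaving dest as an empty
   string) when src does not fit, instead of aborting the program. */
char *ft_strcpy_checked(char *dest, size_t destsz, const char *src)
{
    size_t i = 0;

    if (dest == NULL || src == NULL || destsz == 0)
        return NULL;
    while (src[i] != '\0') {
        if (i + 1 >= destsz) {   /* no room for this char plus the '\0' */
            dest[0] = '\0';
            return NULL;
        }
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return dest;
}
```

The caller decides what to do on failure, e.g. `if (ft_strcpy_checked(buf, sizeof buf, src) == NULL) { /* report the error */ }`, which keeps the library-free requirement while still refusing to overflow.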

C Way to Extract Variables from Strings

I was wondering how C programmers usually extract data from a string. I have read a lot about strtok, but I personally dislike the way the function works; having to call it again with NULL as the parameter seems odd to me. I once stumbled upon this little piece of code, which I find pretty sleek:
sscanf(data, "%*[^=]%*c%[^&]%*[^=]%*c%[^&]", usr, pw);
This would extract data from a URL query string (only var1=value&var2=value).
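One caution about the snippet: %[^&] without a field width can overflow usr or pw. A sketch of the same template with widths and a return-value check follows; the 32-byte buffers and the function name parse_login are assumptions.

```c
#include <stdio.h>

/* Parses "key1=value&key2=value" into usr and pw, each assumed to be a
   char[32]. The widths (31) leave room for the terminating '\0'.
   Returns 1 on success, 0 if the input does not match the template. */
int parse_login(const char *data, char usr[32], char pw[32])
{
    int n = sscanf(data, "%*[^=]%*c%31[^&]%*[^=]%*c%31[^&]", usr, pw);
    return n == 2;   /* both values must have been converted */
}
```

For "user=bob&pw=secret" this fills usr with "bob" and pw with "secret"; for "user=bob" sscanf converts only one value, so the function returns 0.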
Is there a reason to use strtok over sscanf? Performance maybe?
IMHO the best way is the most readable and understandable one. Both sscanf and strtok are poor fits for extracting a user/password from a URL.
Instead, look for the boundaries of the strings you are looking for (in a URL: the slash, the at-sign, the colon, what have you) with strchr and strrchr, then memcpy from start to end to wherever you need the data and tack on a NUL. This also allows appropriate error handling should the string have an unexpected format.
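A sketch of that boundary-search approach, assuming URLs of the form "scheme://user:password@host/..." (the function names url_credentials and copy_span are hypothetical, and this sketch uses only strstr and strchr):

```c
#include <stddef.h>
#include <string.h>

/* Copies [start, end) into out (size bufsz) and tacks on a NUL.
   Returns 0 if the span does not fit. */
static int copy_span(const char *start, const char *end, char *out, size_t bufsz)
{
    size_t len = (size_t)(end - start);
    if (len + 1 > bufsz)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Extracts user and password from "scheme://user:password@host/...".
   Returns 1 on success, 0 on an unexpected format or a field that
   does not fit in bufsz bytes. */
int url_credentials(const char *url, char *user, char *pw, size_t bufsz)
{
    const char *start = strstr(url, "://");
    if (start == NULL)
        return 0;
    start += 3;                              /* first char of the user */
    const char *slash = strchr(start, '/');  /* end of the authority part */
    const char *at = strchr(start, '@');
    if (at == NULL || (slash != NULL && at > slash))
        return 0;                            /* no credentials present */
    const char *colon = strchr(start, ':');
    if (colon == NULL || colon > at)
        return 0;                            /* no password present */
    return copy_span(start, colon, user, bufsz) &&
           copy_span(colon + 1, at, pw, bufsz);
}
```

Each failure path returns early, which is the error handling the answer refers to: a URL without credentials simply yields 0 instead of garbage.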
They are each better or more convenient at certain kinds of tasks:
sscanf allows you to concisely specify a fairly complex template for parsing values out of a line of text, but it is very unforgiving. If your input text differs by even a character from your template, the scan will fail. For that reason, it's almost never the right tool to use for human-generated input, for example. It is most useful for scanning automatically generated output, e.g. server log lines.
strtok is much more flexible, but also much more verbose: parsing a line with only a few fields may take many lines of code. It is also destructive: it actually modifies the string that is passed to it, so you may need to make a copy of the data before invoking strtok.
strtok is a much simpler, low-level function, mostly used to tokenize strings with an unknown number of elements.
NULL is used to tell strtok to continue scanning the string from the last position, saving you some pointer manipulation and probably (internally to strtok) some initialization.
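A minimal sketch of that calling pattern, counting '&'-separated fields (the function name count_fields and the 256-byte copy buffer are assumptions). Because strtok writes '\0' into its argument, it works on a local copy:

```c
#include <string.h>

/* Counts the '&'-separated fields of a query string using strtok.
   Works on a local copy (assumed large enough: 256 bytes), because
   strtok modifies the string it scans. Returns -1 if too long. */
int count_fields(const char *query)
{
    char buf[256];
    int n = 0;

    if (strlen(query) >= sizeof buf)
        return -1;
    strcpy(buf, query);

    /* The first call passes the string; later calls pass NULL to
       continue from where the previous call stopped. */
    for (char *tok = strtok(buf, "&"); tok != NULL; tok = strtok(NULL, "&"))
        n++;
    return n;
}
```

Note that strtok treats a run of delimiters as one, so "a=1&&b=2" yields two fields, not three with an empty one in the middle.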
There's also the matter of readability: looking at the code snippet, it takes some time to understand what's going on.
sscanf's %[...] scansets offer only a very limited (though efficient to implement) subset of what regular expressions can do, so if you wanted to do something more complicated, you could not use sscanf.
That being said, strtok isn't reentrant: it keeps hidden state between calls, so it cannot be used for nested tokenizing or safely from multiple threads (POSIX provides strtok_r for that case).
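The reentrant variant keeps its position in a caller-supplied pointer, which allows nested loops that plain strtok cannot handle. A sketch assuming a POSIX system (on MSVC, strtok_s has the same shape); the function name print_pairs is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r on POSIX systems */
#include <stdio.h>
#include <string.h>

/* Splits "k1=v1&k2=v2" into pairs with two nested strtok_r loops.
   With plain strtok, the inner calls would overwrite the single hidden
   position the outer loop relies on. Modifies `query`.
   Returns the number of pairs printed. */
int print_pairs(char *query)
{
    char *outer_save, *inner_save;
    int n = 0;

    for (char *field = strtok_r(query, "&", &outer_save); field != NULL;
         field = strtok_r(NULL, "&", &outer_save)) {
        char *key = strtok_r(field, "=", &inner_save);
        if (key == NULL)
            continue;            /* field consisted only of '=' chars */
        char *value = strtok_r(NULL, "=", &inner_save);
        printf("%s -> %s\n", key, value ? value : "(none)");
        n++;
    }
    return n;
}
```

For a writable buffer holding "user=bob&pw=secret&flag" it prints "user -> bob", "pw -> secret" and "flag -> (none)".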
But generally speaking, the one that ends up running faster for a particular circumstance and is more elegant is often considered to be the most idiomatic for that circumstance.
I myself created a small header file with a few function definitions that can help, such as char **Split(src, sep) and int SplitLen(char **arr).
Here is the small header (about an hour's work); feel free to improve it in any way.
#include <stdio.h>   /* printf; the nonstandard <malloc.h> is not needed */
#include <string.h>
#include <stdlib.h>
#include <assert.h>
/* Returns a newly allocated copy of `length` chars starting at the
   1-based `position`; no bounds checking is done. */
char *substring(char *string, int position, int length)
{
    char *pointer;
    int c;
    pointer = malloc(length+1);
    if (pointer == NULL)
    {
        printf("Unable to allocate memory.\n");
        exit(EXIT_FAILURE);
    }
    for (c = 0 ; c < position -1 ; c++)
        string++;
    for (c = 0 ; c < length ; c++)
    {
        *(pointer+c) = *string;
        string++;
    }
    *(pointer+c) = '\0';
    return pointer;
}
/* Splits a_str on a_delim into a NULL-terminated array of strdup'ed
   tokens (strdup is POSIX; it became standard C only in C23).
   Note: strtok modifies a_str and merges adjacent delimiters, so input
   such as "a,,b" trips the final assert. */
char **Split(char *a_str, const char a_delim)
{
    char **result = 0;
    size_t count = 0;
    char *tmp = a_str;
    char *last_comma = 0;
    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }
    /* Add space for trailing token. */
    count += last_comma < (a_str + strlen(a_str) - 1);
    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;
    result = malloc(sizeof(char *) * count);
    if (result)
    {
        char delim[2] = { a_delim, '\0' }; // Fix for inconsistent splitting
        size_t idx = 0;
        char *token = strtok(a_str, delim);
        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        *(result + idx) = 0;
    }
    return result;
}
static int SplitLen(char **array)
{
    int i = 0;
    while (*array++ != 0)
        i++;
    return i;
}
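/* Returns the index of the first occurrence of ch in str, or -1.
   (The original version broke out of the inner loop unconditionally,
   because `break` was not inside the if, and could report a later,
   partial match.) */
int IndexOf(char *str, char *ch)
{
    size_t i, cnt;
    size_t len = strlen(str);
    size_t sublen = strlen(ch);
    if (sublen > len)
        return -1;
    for (i = 0; i + sublen <= len; i++)
    {
        /* Compare ch against str starting at position i */
        for (cnt = 0; cnt < sublen && str[i + cnt] == ch[cnt]; cnt++)
            ;
        if (cnt == sublen)
            return (int)i; /* first full match */
    }
    return -1;
}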
int IndexOfChar(char *str, char ch)
{
    int result = -1;
    int i = 0;
    for(; str[i] != '\0'; i++)   /* avoids calling strlen() on every pass */
    {
        if(str[i] == ch)
        {
            result = i;
            break;
        }
    }
    return result;
}
A brief explanation of the functions:
the substring function extracts a part of a string.
the IndexOf() function searches for a string inside the source string.
Others should be self-explanatory.
As mentioned earlier, this includes a Split function, which you can use instead of calling strtok directly.

function to transfer count number of characters from source string to destination string from right

I have written this function to copy the last count characters of a source string into a destination string (i.e. from the right). I pass the string as src, NULL as dst, and the count value.
If I pass the input string "Stack overflow" with count 4, I want the output string "flow". But my output string is always empty. Can you please tell me what is wrong with my logic?
char *Rprint(const char *src, char *dst, int count)
{
    int i = 0;
    char *ret = NULL;
    while(*src!= '\0')
        src++;
    dst = malloc(sizeof(char) * (count + 1));
    ret = dst;
    dst = dst + (count + 1);
    while(count)
    {
        *dst++ = *src--;
        count--;
    }
    *dst++ = '\0';
    //return ret;
    printf("String:%s \n", ret);
}
I expect you meant to do this:
*dst-- = *src--;
I don't like the way you are doing this, but that should get you on track without me suggesting that you completely rewrite your code.
You should not null-terminate the string afterwards, because you have already copied the terminator. You are copying your string from the end to the beginning (i.e. a reverse copy), but confusing it with the more usual forward copy.
Be careful with your loop condition. You might have an off-by-one error there. Same with adding count+1 to dst. I think you should only add count.
Oh, and don't forget to return a value from your function!
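Putting those points together, a reverse-copy version might look like the sketch below. It is not the asker's exact API: the name right_chars, allocating inside the function instead of taking dst, and clamping count to the string length are all assumptions. The caller must free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of the last
   `count` characters of src, filled from the right (reverse copy),
   or NULL if malloc fails. */
char *right_chars(const char *src, size_t count)
{
    size_t len = strlen(src);
    if (count > len)
        count = len;              /* can't take more than we have */

    char *dst = malloc(count + 1);
    if (dst == NULL)
        return NULL;

    /* Copy right to left: the first step moves src's '\0' into
       dst[count], the last moves src[len - count] into dst[0]. */
    for (size_t i = count + 1; i > 0; i--)
        dst[i - 1] = src[len - count + i - 1];
    return dst;                   /* return the start of the new string */
}
```

With this sketch, right_chars("Stack overflow", 4) returns "flow", and the terminator is copied as part of the loop, so no separate null-termination is needed.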
Here is working code, based on your original approach but with a few corrections.
#include <stdio.h>
void Rprint(const char [], char [], int );
int main(void)
{
    char buff[50] = "stack overflow";
    char cut [50];
    Rprint(buff,cut,5);
    puts(cut);   /* prints "rflow" */
    return 0;
}
/* Copies the last `count` characters of src into dst.
   Assumes count <= strlen(src) and that dst is large enough. */
void Rprint(const char src[], char dst[], int count)
{
    while(*src!= '\0')
        src++;
    src = src - count;
    while(count--)
        *(dst++) = *(src++);
    *(dst++) = '\0';
}
