Remove punctuation at beginning and end of a string

Remove punctuation at beginning and end of a string - c

I have a string and I want to remove all the punctuation from the beginning and the end of it only, but not the middle.
I have wrote a code to remove the punctuation from the first and last character of a string only, which is clearly very inefficient and useless if a string has 2 or more punctuations at the end.
Here is an example:
{ Hello ""I am:: a Str-ing!! }
Desired output
{ Hello I am a Str-ing }
Are there any functions that I could use? Thanks.
This is what I've done so far. I'm actually editing the string in a linked-list
if(ispunct(removeend->string[(strlen(removeend->string))-1]) != 0) {
removeend->string[(strlen(removeend->string))-1] = '\0';
}
else {}

Iterate over the string, use isalpha() to check each character, write the characters which pass into a new string.

char *rm_punct(char *str) {
char *h = str;
char *t = str + strlen(str) - 1;
while (ispunct(*p)) p++;
while (ispunct(*t) && p < t) { *t = 0; t--; }
/* also if you want to preserve the original address */
{ int i;
for (i = 0; i <= t - p + 1; i++) {
str[i] = p[i];
} p = str; } /* --- */
return p;
}

Iterate over the string, use isalpha() to check each character, after the first character that passes start writing into a new string.
Iterate over the new string backwards, replace all punctuation with \0 until you find a character which isn't punctuation.

#include <stdio.h>
#include <ctype.h>
#include <string.h>
char* trim_ispunct(char* str){
int i ;
char* p;
if(str == NULL || *str == '\0') return str;
for(i=strlen(str)-1; ispunct(str[i]);--i)
str[i]='\0';
for(p=str;ispunct(*p);++p);
return strcpy(str, p);
}
int main(){
//test
char str[][16] = { "Hello", "\"\"I", "am::", "a", "Str-ing!!" };
int i, size = sizeof(str)/sizeof(str[0]);
for(i = 0;i<size;++i)
printf("%s\n", trim_ispunct(str[i]));
return 0;
}
/* result:
Hello
I
am
a
Str-ing
*/

Ok, in a while iteration, call multiple times the strtok function to separate each single string by the character (white space). You could also use sscanf instead of strtok.
Then, for each string, you have to do a for cycle, but beginning from the end of the string up to the beginning.As soon as you encounter !isalpha(current character) put a \0 in the current string position. You have eliminated the tail's punctuation chars.
Now, do another for cycle on the same string. Now from 0 to strlen(currentstring). While is !isalpha(current character) continue. If isalpha put the current character in in a buffer and all the remaining characters. The buffer is the cleaned string. Copy it into the original string.
Repeat the above two steps for the others strtok's outputs. End.

Construct a tiny state machine. The cha2class() function divides the characters into equivalence classes. The state machine will always skip punctuation, except when it has alphanumeric characters on the left and the right; in that case it will be preserved. (that is the memmove() in state 3)
#include <stdio.h>
#include <string.h>
#define IS_ALPHA 1
#define IS_WHITE 2
#define IS_PUNCT 3
int cha2class(int ch);
void scrutinize(char *str);
int cha2class(int ch)
{
if (ch >= 'a' && ch <= 'z') return IS_ALPHA;
if (ch >= 'A' && ch <= 'Z') return IS_ALPHA;
if (ch == ' ' || ch == '\t') return IS_WHITE;
if (ch == EOF || ch == 0) return IS_WHITE;
return IS_PUNCT;
}
void scrutinize(char *str)
{
size_t pos,dst,start;
int typ, state ;
state = 0;
for (dst = pos = start=0; ; pos++) {
typ = cha2class(str[pos]);
switch(state) {
case 0: /* BOF, white seen */
if (typ==IS_WHITE) break;
else if (typ==IS_ALPHA) { start = pos; state =1; }
else if (typ==IS_PUNCT) { start = pos; state =2; continue;}
break;
case 1: /* inside a word */
if (typ==IS_ALPHA) break;
else if (typ==IS_WHITE) { state=0; }
else if (typ==IS_PUNCT) { start = pos; state =3;continue; }
break;
case 2: /* inside punctuation after whitespace: skip it */
if (typ==IS_PUNCT) continue;
else if (typ==IS_WHITE) { state=0; }
else if (typ==IS_ALPHA) {state=1; }
break;
case 3: /* inside punctuation after a word */
if (typ==IS_PUNCT) continue;
else if (typ==IS_WHITE) { state=0; }
else if (typ==IS_ALPHA) {
memmove(str+dst, str+start, pos-start); dst += pos-start;
state =1; }
break;
}
str[dst++] = str[pos];
if (str[pos] == '\0') break;
}
}
int main (int argc, char **argv)
{
char test[] = ".This! is... ???a.string?" ;
scrutinize(test);
printf("Result=%s\n", test);
return 0;
}
int main (int argc, char **argv)
{
char test[] = ".This! is... ???a.string?" ;
scrutinize(test);
printf("Result=%s\n", test);
return 0;
}
OUTPUT:
Result=This is a.string

Related

tempWord[0]='\0' Does not reset String somehow

I wrote a program in C, The expected result should be:
$ cat poem.txt
Said Hamlet to Ophelia,
I'll draw a sketch of thee,
What kind of pencil shall I use?
2B or not 2B?
$ ./censor Ophelia < poem.txt
Said Hamlet to CENSORED,
I'll draw a sketch of thee,
What kind of pencil shall I use?
2B or not 2B?
But I got this:
$ ./censor Ophelia < poem.txt
Said Hamlet tomlet CENSORED,
I'lllia drawlia arawlia sketcha ofetcha theecha,
Whatcha kindcha ofndcha pencila shallla Ihallla usellla?
2Bsellla orellla notllla 2Botllla?
I use tempWord to store every word and compare it with the word that needs to be censored. Then I use tempWord[0]='\0' to reset the temp String, so that I can do another comparison. But it seems not working. Can anyone help?
# include <stdio.h>
# include <string.h>
int compareWord(char *list1, char *list2);
int printWord(char *list);
int main(int argc, char *argv[]) {
int character = 0;
char tempWord[128];
int count = 0;
while (character != EOF) {
character = getchar();
if ((character <= 'z' && character >= 'a') ||
(character <= 'Z' && character >= 'A') ||
character == 39) {
tempWord[count] = character;
count++;
} else {
if (count != 0 && compareWord(tempWord, argv[1])) {
printf("CENSORED");
count = 0;
tempWord[0] = '\0';
}
if (count != 0 && !compareWord(tempWord, argv[1])) {
printWord(tempWord);
count = 0;
tempWord[0] = '\0';
}
if (count == 0) {
printf("%c", character);
}
}
}
return 0;
}
int printWord(char *list) {
// print function
}
int compareWord(char *list1, char *list2) {
// compareWord function
}

There are multiple issues in your code:
You do not test for end of file at the right spot: if getc() returns EOF, you should exit the loop immediately instead of processing EOF and exiting at the next iteration. The classic C idiom to do this is:
while ((character = getchar()) != EOF) {
...
For portability and readability, you should use isalpha() from <ctype.h> to check if the byte is a letter and avoid hardcoding the value of the value of the apostrophe as 39, use '\'' instead.
You have a potential buffer overflow when storing the bytes into the tempWord array. You should compare the offset with the buffer size.
You do not null terminate tempWord, hence the compareWord() function cannot determine the length of the first string. The behavior is undefined.
You do not check if a command line argument was provided.
The second test is redundant: you could just use an else clause.
You have undefined behavior when printing the contents of tempWord[] because of the lack of null termination. This explains the unexpected behavior, but you might have much worse consequences.
printWord just prints a C string, use fputs().
The compWord function is essentially the same as strcmp(a, b) == 0.
Here is a simplified and corrected version:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
char tempWord[128];
size_t count = 0;
int c;
while ((c = getchar()) != EOF) {
if (isalpha(c) || c == '\'') {
if (count < sizeof(tempWord) - 1) {
tempWord[count++] = c;
}
} else {
tempWord[count] = '\0';
if (argc > 1 && strcmp(tempWord, argv[1]) == 0) {
printf("CENSORED");
} else {
fputs(tempWord, stdout);
}
count = 0;
putchar(c);
}
}
return 0;
}
EDIT: chux rightfully commented that the above code does not handle 2 special cases:
words that are too long are truncated in the output.
the last word is omitted if it falls exactly at the end of file.
I also realized the program does not handle the case of long words passed on the command line.
Here is a different approach without a buffer that fixes these shortcomings:
#include <ctype.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
const char *word = (argc > 1) ? argv[1] : "";
int count = 0;
int c;
for (;;) {
c = getchar();
if (isalpha(c) || c == '\'') {
if (count >= 0 && (unsigned char)word[count] == c) {
count++;
} else {
if (count > 0) {
printf("%.*s", count, word);
}
count = -1;
putchar(c);
}
} else {
if (count > 0) {
if (word[count] == '\0') {
printf("CENSORED");
} else {
printf("%.*s", count, word);
}
}
if (c == EOF)
break;
count = 0;
putchar(c);
}
}
return 0;
}

tempWord[0] = '\0';
It will not reset the variable to null. It just assign the '\0' to the first position. But The values which are assigned are still in memory only. Only the first position is assigned to '\0'. So, to reset the character array try the below.
memset(tempWord, 0, 128);
Add the above line instead of your tempWord[0] = '\0'.
And also this will solves you don't need to add the '\0' at end of each word. This itself will work. But for the first time your have to reset the character array using the same memset function. Before entering to the loop you have to set the tempWord to null using the memset function.

Using tempWord[0]='\0' will not reset the whole array, just the first element. Looking at your code, there are 2 ways you could go forward, either reset the whole array by using memset:
memset(tempWord, 0, sizeof tempWord);
or
memset(tempWord, 0, 128);
(or you can only clear it by the size of last word, also it needs string.h which you have already included),
Or you could just set the element after the length of 'current word' to be '\0' (ex, if current word is the then set tempWord[3]='\0', since strlen checks the string till null char only) which can be placed before those 2 ifs checking if the strings are equal or not, your new while loop will look like this:
{
character = getchar();
if((character<='z' && character>='a')||(character<='Z' && character>='A')||character == 39)
{
tempWord[count]=character;
count++;
}else {
tempWord[count]='\0';
if(count!=0 && compareWord(tempWord, argv[1]))
{
printf("CENSORED");
count=0;
}
if(count!=0 && !compareWord(tempWord, argv[1]))
{
printWord(tempWord);
count=0;
}
if (count==0)
{
printf("%c", character);
}
}
}
(it works, tested)

K&R Exercise 1-19: Reverse Char Array

I'm able to reverse the array fine, but I can't get the program to terminate when I do CTRL+D(EOF) in terminal.
The only way I can get the program to terminate is if the very first thing I do after compiling is doing CTRL+D. But if I type in one string, then CTRL+D will not work after that.
I'm not quite sure where my error is.
#include <stdio.h>
#define MAXLINE 1000 // Maximum input.
// ----------------- reverseLine -----------------
// This method reads in chars to be put into an
// array to make a string. EOF and \n are the
// delimiters on the chars, then \0 is the
// delimiter for the string itself. Then the
// array is swapped in place to give the reverse
// of the string.
//------------------------------------------------
int reverseLine(char s[], int lim)
{
int c, i, newL;
// c is the individual chars, and i is for indices of the array.
for (i = 0; i < lim - 1 && (c=getchar()) != EOF && c != '\n'; ++i)
{
s[i] = c;
}
if (c == '\n') // This lets me know if the text ended in a new line.
{
newL = 1;
}
// REVERSE
int toSwap;
int end = i-1;
int begin = 0;
while(begin <= end) // Swap the array in place starting from both ends.
{
toSwap = s[begin];
s[begin] = s[end];
s[end] = toSwap;
--end;
++begin;
}
if (newL == 1) // Add the new line if it's there.
{
s[i] = '\n';
++i;
}
s[i] = '\0'; // Terminate the string.
return i;
}
int main()
{
int len;
char line[MAXLINE];
while ((len = reverseLine(line, MAXLINE)) > 0) // If len is zero, then there is no line to recored.
{
printf("%s", line);
}
return 0;
}
The only thing I can think of is the while loop in main checks if len > 0, so if I type EOF, maybe it can't make a valid comparison? But that wouldn't make sense as to why it works when that's the first and only thing I type.

Your program will never read the EOF because of this condition:
(c=getchar()) != EOF && c != '\n';
As soon as c is equal to '\n' the loop terminates and all the following characters are ignored. I think you should separate input from line reversing and make the usual checks on the reverse function parameters.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE_MAX (256U)
static bool linereverse(char *line);
static bool deletenewline(char *s);
int main(void)
{
char buff[SIZE_MAX];
bool success;
(void) fputs("Enter a string: ", stdout);
if( NULL == fgets(buff,(size_t) SIZE_MAX, stdin))
{
(void) fputs("Error: invalid input!\n",stderr);
return EXIT_FAILURE;
}
success = deletenewline(buff);
if(false == success)
{
(void) fputs("Error: cannot remove newline\n",stderr);
return EXIT_FAILURE;
}
success = linereverse(buff);
if(false == success)
{
(void) fputs("Error: cannot reverse the line");
return EXIT_FAILURE;
}
(void) fputs("The line reversed is: ", stdout);
(void) fputs(buff, stdout);
(void) puchar('\n');
return EXIT_SUCCESS;
}
static bool linereverse(char *line)
{
size_t i;
size_t j;
char tmp;
if(NULL == line)
{
return false;
}
i = 0;
j = strlen(line) - 1;
while(i < j)
{
tmp = line[i];
line[i] = line[j];
line[j] tmp;
++i;
--j;
}
return true;
}
static bool deletenewline(char *s)
{
char *p;
if(NULL == s)
{
return false;
}
p = strrchr(s,'\n');
if(NULL != p)
{
*p = '\0';
}
return true;
}

Tokenize String by using pointer

I'm trying to tokenize a sting and here is my attempt.
char new_str[1024];
void tokenize_init(const char str[]){//copy the string into global section
strcpy(new_str,str);
}
int i = 0;
char *tokenize_next() {
const int len = strlen(new_str);
for(; i <= len; i++) {
if ( i == len) {
return NULL;
}
if ((new_str[i] >= 'a' && new_str[i] <= 'z') ||
(new_str[i] >= 'A' && new_str[i] <= 'Z')) {
continue;
}else {
new_str[i] = '\0';
i = i + 1;
return new_str;
}
}
return NULL;
}
//main function
int main(void) {
char sentence[] = "This is a good-sentence for_testing 1 neat function.";
printf("%s\n", sentence);
tokenize_init(sentence);
for (char *nt = tokenize_next();
nt != NULL;
nt = tokenize_next())
printf("%s\n",nt);
}
However, it just print out the first word of the sentence(which is "This") and then stop. Can someone tell me why? My guess is my new_str is not persisent and when the main function recall tokenize_next() the new_str become just the first word of the sentence. Thanks in advance.

The reason that it only prints out "This" is because you iterate to the first non-letter character which happens to be a space, and you replace this with a null terminating character at this line:
new_str[i] = '\0';
After that, it doesn't matter what you do to the rest of the string, it will only print up to that point. The next time tokenize_next is called the length of the string is no longer what you think it is because it is only counting the word "This" and since "i" has already reached that amount the function returns and so does every successive call to it:
if ( i == len)
{
return NULL;
}
To fix the function you would need to somehow update your pointer to look past that character on the next iteration.
However, this is quite kludgy. You are much better off using one of the mentioned functions such as strtok or strsep
UPDATE:
If you cannot use those functions then a redesign of your function would be ideal, however, per your request, try the following modifications:
#include <string.h>
#include <cstdio>
char new_str[1024];
char* str_accessor;
void tokenize_init(const char str[]){//copy the string into global section
strcpy(new_str,str);
str_accessor = new_str;
}
int i = 0;
char* tokenize_next(void) {
const int len = strlen(str_accessor);
for(i = 0; i <= len; i++) {
if ( i == len) {
return NULL;
}
if ((str_accessor[i] >= 'a' && str_accessor[i] <= 'z') ||
(str_accessor[i] >= 'A' && str_accessor[i] <= 'Z')) {
continue;
}
else {
str_accessor[i] = '\0';
char* output = str_accessor;
str_accessor = str_accessor + i + 1;
if (strlen(output) <= 0)
{
str_accessor++;
continue;
}
return output;
}
}
return NULL;
}
//main function
int main(void) {
char sentence[] = "This is a good-sentence for_testing 1 neater function.";
printf("%s\n", sentence);
tokenize_init(sentence);
for (char *nt = tokenize_next(); nt != NULL; nt = tokenize_next())
printf("%s\n",nt);
}

Parse string into array based on spaces or "double quotes strings"

Im trying to take a user input string and parse is into an array called char *entire_line[100]; where each word is put at a different index of the array but if a part of the string is encapsulated by a quote, that should be put in a single index.
So if I have
char buffer[1024]={0,};
fgets(buffer, 1024, stdin);
example input: "word filename.txt "this is a string that shoudl take up one index in an output array";
tokenizer=strtok(buffer," ");//break up by spaces
do{
if(strchr(tokenizer,'"')){//check is a word starts with a "
is_string=YES;
entire_line[i]=tokenizer;// if so, put that word into current index
tokenizer=strtok(NULL,"\""); //should get rest of string until end "
strcat(entire_line[i],tokenizer); //append the two together, ill take care of the missing space once i figure out this issue
}
entire_line[i]=tokenizer;
i++;
}while((tokenizer=strtok(NULL," \n"))!=NULL);
This clearly isn't working and only gets close if the double quote encapsulated string is at the end of the input string
but i could have
input: word "this is text that will be user entered" filename.txt
Been trying to figure this out for a while, always get stuck somewhere.
thanks

The strtok function is a terrible way to tokenize in C, except for one (admittedly common) case: simple whitespace-separated words. (Even then it's still not great due to lack of re-entrance and recursion ability, which is why we invented strsep for BSD way back when.)
Your best bet in this case is to build your own simple state-machine:
char *p;
int c;
enum states { DULL, IN_WORD, IN_STRING } state = DULL;
for (p = buffer; *p != '\0'; p++) {
c = (unsigned char) *p; /* convert to unsigned char for is* functions */
switch (state) {
case DULL: /* not in a word, not in a double quoted string */
if (isspace(c)) {
/* still not in a word, so ignore this char */
continue;
}
/* not a space -- if it's a double quote we go to IN_STRING, else to IN_WORD */
if (c == '"') {
state = IN_STRING;
start_of_word = p + 1; /* word starts at *next* char, not this one */
continue;
}
state = IN_WORD;
start_of_word = p; /* word starts here */
continue;
case IN_STRING:
/* we're in a double quoted string, so keep going until we hit a close " */
if (c == '"') {
/* word goes from start_of_word to p-1 */
... do something with the word ...
state = DULL; /* back to "not in word, not in string" state */
}
continue; /* either still IN_STRING or we handled the end above */
case IN_WORD:
/* we're in a word, so keep going until we get to a space */
if (isspace(c)) {
/* word goes from start_of_word to p-1 */
... do something with the word ...
state = DULL; /* back to "not in word, not in string" state */
}
continue; /* either still IN_WORD or we handled the end above */
}
}
Note that this does not account for the possibility of a double quote inside a word, e.g.:
"some text in quotes" plus four simple words p"lus something strange"
Work through the state machine above and you will see that "some text in quotes" turns into a single token (that ignores the double quotes), but p"lus is also a single token (that includes the quote), something is a single token, and strange" is a token. Whether you want this, or how you want to handle it, is up to you. For more complex but thorough lexical tokenization, you may want to use a code-building tool like flex.
Also, when the for loop exits, if state is not DULL, you need to handle the final word (I left this out of the code above) and decide what to do if state is IN_STRING (meaning there was no close-double-quote).

Torek's parts of parsing code are excellent but require little more work to use.
For my own purpose, I finished c function.
Here I share my work that is based on Torek's code.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
size_t split(char *buffer, char *argv[], size_t argv_size)
{
char *p, *start_of_word;
int c;
enum states { DULL, IN_WORD, IN_STRING } state = DULL;
size_t argc = 0;
for (p = buffer; argc < argv_size && *p != '\0'; p++) {
c = (unsigned char) *p;
switch (state) {
case DULL:
if (isspace(c)) {
continue;
}
if (c == '"') {
state = IN_STRING;
start_of_word = p + 1;
continue;
}
state = IN_WORD;
start_of_word = p;
continue;
case IN_STRING:
if (c == '"') {
*p = 0;
argv[argc++] = start_of_word;
state = DULL;
}
continue;
case IN_WORD:
if (isspace(c)) {
*p = 0;
argv[argc++] = start_of_word;
state = DULL;
}
continue;
}
}
if (state != DULL && argc < argv_size)
argv[argc++] = start_of_word;
return argc;
}
void test_split(const char *s)
{
char buf[1024];
size_t i, argc;
char *argv[20];
strcpy(buf, s);
argc = split(buf, argv, 20);
printf("input: '%s'\n", s);
for (i = 0; i < argc; i++)
printf("[%u] '%s'\n", i, argv[i]);
}
int main(int ac, char *av[])
{
test_split("\"some text in quotes\" plus four simple words p\"lus something strange\"");
return 0;
}
See program output:
input: '"some text in quotes" plus four simple words p"lus something strange"'
[0] 'some text in quotes'
[1] 'plus'
[2] 'four'
[3] 'simple'
[4] 'words'
[5] 'p"lus'
[6] 'something'
[7] 'strange"'

I wrote a qtok function some time ago that reads quoted words from a string. It's not a state machine and it doesn't make you an array but it's trivial to put the resulting tokens into one. It also handles escaped quotes and trailing and leading spaces:
#include <stdio.h>
#include <ctype.h>
#include <assert.h>
// Strips backslashes from quotes
char *unescapeToken(char *token)
{
char *in = token;
char *out = token;
while (*in)
{
assert(in >= out);
if ((in[0] == '\\') && (in[1] == '"'))
{
*out = in[1];
out++;
in += 2;
}
else
{
*out = *in;
out++;
in++;
}
}
*out = 0;
return token;
}
// Returns the end of the token, without chaning it.
char *qtok(char *str, char **next)
{
char *current = str;
char *start = str;
int isQuoted = 0;
// Eat beginning whitespace.
while (*current && isspace(*current)) current++;
start = current;
if (*current == '"')
{
isQuoted = 1;
// Quoted token
current++; // Skip the beginning quote.
start = current;
for (;;)
{
// Go till we find a quote or the end of string.
while (*current && (*current != '"')) current++;
if (!*current)
{
// Reached the end of the string.
goto finalize;
}
if (*(current - 1) == '\\')
{
// Escaped quote keep going.
current++;
continue;
}
// Reached the ending quote.
goto finalize;
}
}
// Not quoted so run till we see a space.
while (*current && !isspace(*current)) current++;
finalize:
if (*current)
{
// Close token if not closed already.
*current = 0;
current++;
// Eat trailing whitespace.
while (*current && isspace(*current)) current++;
}
*next = current;
return isQuoted ? unescapeToken(start) : start;
}
int main()
{
char text[] = " \"some text in quotes\" plus four simple words p\"lus something strange\" \"Then some quoted \\\"words\\\", and backslashes: \\ \\ \" Escapes only work insi\\\"de q\\\"uoted strings\\\" ";
char *pText = text;
printf("Original: '%s'\n", text);
while (*pText)
{
printf("'%s'\n", qtok(pText, &pText));
}
}
Outputs:
Original: ' "some text in quotes" plus four simple words p"lus something strange" "Then some quoted \"words\", and backslashes: \ \ " Escapes only work insi\"de q\"uoted strings\" '
'some text in quotes'
'plus'
'four'
'simple'
'words'
'p"lus'
'something'
'strange"'
'Then some quoted "words", and backslashes: \ \ '
'Escapes'
'only'
'work'
'insi\"de'
'q\"uoted'
'strings\"'

I think the answer to your question is actually fairly simple, but I'm taking on an assumption where it seems the other responses have taken a different one. I'm assuming that you want any quoted block of text to be separated out on its own regardless of spacing with the rest of the text being separated by spaces.
So given the example:
"some text in quotes" plus four simple words p"lus something strange"
The output would be:
[0] some text in quotes
[1] plus
[2] four
[3] simple
[4] words
[5] p
[6] lus something strange
Given that this is the case, only a simple bit of code is required, and no complex machines. You would first check if there is a leading quote for the first character and if so tick a flag and remove the character. As well as removing any quotes at the end of the string. Then tokenize the string based on quotation marks. Then tokenize every other of the strings obtained previously by spaces. Tokenize starting with the first string obtained if there was no leading quote, or the second string obtained if there was a leading quote. Then each of the remaining strings from the first part will be added to an array of strings interspersed with the strings from the second part added in place of the strings they were tokenized from. In this way you can get the result listed above. In code this would look like:
#include<string.h>
#include<stdlib.h>
char ** parser(char * input, char delim, char delim2){
char ** output;
char ** quotes;
char * line = input;
int flag = 0;
if(strlen(input) > 0 && input[0] == delim){
flag = 1;
line = input + 1;
}
int i = 0;
char * pch = strchr(line, delim);
while(pch != NULL){
i++;
pch = strchr(pch+1, delim);
}
quotes = (char **) malloc(sizeof(char *)*i+1);
char * token = strtok(input, delim);
int n = 0;
while(token != NULL){
quotes[n] = strdup(token);
token = strtok(NULL, delim);
n++;
}
if(delim2 != NULL){
int j = 0, k = 0, l = 0;
for(n = 0; n < i+1; n++){
if(flag & n % 2 == 1 || !flag & n % 2 == 0){
char ** new = parser(delim2, NULL);
l = sizeof(new)/sizeof(char *);
for(k = 0; k < l; k++){
output[j] = new[k];
j++;
}
for(k = l; k > -1; k--){
free(new[n]);
}
free(new);
} else {
output[j] = quotes[n];
j++;
}
}
for(n = i; n > -1; n--){
free(quotes[n]);
}
free(quotes);
} else {
return quotes;
}
return output;
}
int main(){
char * input;
char ** result = parser(input, '\"', ' ');
return 0;
}
(May not be perfect, I haven't tested it)

Trim a string in C [duplicate]

This question already has answers here:
How do I trim leading/trailing whitespace in a standard way?
(40 answers)
Closed 5 years ago.
Briefly:
I'm after the equivalent of .NET's String.Trim in C using the win32 and standard C api (compiling with MSVC2008 so I have access to all the C++ stuff if needed, but I am just trying to trim a char*).
Given that there is strchr, strtok, and all manner of other string functions, surely there should be a trim function, or one that can be repurposed...
Thanks

There is no standard library function to do this, but it's not too hard to roll your own. There is an existing question on SO about doing this that was answered with source code.

This made me want to write my own - I didn't like the ones that had been provided. Seems to me there should be 3 functions.
char *ltrim(char *s)
{
while(isspace(*s)) s++;
return s;
}
char *rtrim(char *s)
{
char* back = s + strlen(s);
while(isspace(*--back));
*(back+1) = '\0';
return s;
}
char *trim(char *s)
{
return rtrim(ltrim(s));
}

You can use the standard isspace() function in ctype.h to achieve this. Simply compare the beginning and end characters of your character array until both ends no longer have spaces.
"spaces" include:
' ' (0x20) space (SPC)
'\t' (0x09) horizontal tab (TAB)
'\n' (0x0a) newline (LF)
'\v' (0x0b) vertical tab (VT)
'\f' (0x0c) feed (FF)
'\r' (0x0d) carriage return (CR)
although there is no function which will do all of the work for you, you will have to roll your own solution to compare each side of the given character array repeatedly until no spaces remain.
Edit:
Since you have access to C++, Boost has a trim implementation waiting for you to make your life a lot easier.

Surprised to see such implementations. I usually do trim like this:
char *trim(char *s) {
char *ptr;
if (!s)
return NULL; // handle NULL string
if (!*s)
return s; // handle empty string
for (ptr = s + strlen(s) - 1; (ptr >= s) && isspace(*ptr); --ptr);
ptr[1] = '\0';
return s;
}
It is fast and reliable - serves me many years.

/* Function to remove white spaces on both sides of a string i.e trim */
void trim (char *s)
{
int i;
while (isspace (*s)) s++; // skip left side white spaces
for (i = strlen (s) - 1; (isspace (s[i])); i--) ; // skip right side white spaces
s[i + 1] = '\0';
printf ("%s\n", s);
}

#include "stdafx.h"
#include <string.h>
#include <ctype.h>
char* trim(char* input);
int _tmain(int argc, _TCHAR* argv[])
{
char sz1[]=" MQRFH ";
char sz2[]=" MQRFH";
char sz3[]=" MQR FH";
char sz4[]="MQRFH ";
char sz5[]="MQRFH";
char sz6[]="M";
char sz7[]="M ";
char sz8[]=" M";
char sz9[]="";
char sz10[]=" ";
printf("sz1:[%s] %d\n",trim(sz1), strlen(sz1));
printf("sz2:[%s] %d\n",trim(sz2), strlen(sz2));
printf("sz3:[%s] %d\n",trim(sz3), strlen(sz3));
printf("sz4:[%s] %d\n",trim(sz4), strlen(sz4));
printf("sz5:[%s] %d\n",trim(sz5), strlen(sz5));
printf("sz6:[%s] %d\n",trim(sz6), strlen(sz6));
printf("sz7:[%s] %d\n",trim(sz7), strlen(sz7));
printf("sz8:[%s] %d\n",trim(sz8), strlen(sz8));
printf("sz9:[%s] %d\n",trim(sz9), strlen(sz9));
printf("sz10:[%s] %d\n",trim(sz10), strlen(sz10));
return 0;
}
char *ltrim(char *s)
{
while(isspace(*s)) s++;
return s;
}
char *rtrim(char *s)
{
char* back;
int len = strlen(s);
if(len == 0)
return(s);
back = s + len;
while(isspace(*--back));
*(back+1) = '\0';
return s;
}
char *trim(char *s)
{
return rtrim(ltrim(s));
}
Output:
sz1:[MQRFH] 9
sz2:[MQRFH] 6
sz3:[MQR FH] 8
sz4:[MQRFH] 7
sz5:[MQRFH] 5
sz6:[M] 1
sz7:[M] 2
sz8:[M] 2
sz9:[] 0
sz10:[] 8

I like it when the return value always equals the argument. This way, if the string array has been allocated with malloc(), it can safely be free() again.
/* Remove leading whitespaces */
char *ltrim(char *const s)
{
size_t len;
char *cur;
if(s && *s) {
len = strlen(s);
cur = s;
while(*cur && isspace(*cur))
++cur, --len;
if(s != cur)
memmove(s, cur, len + 1);
}
return s;
}
/* Remove trailing whitespaces */
char *rtrim(char *const s)
{
size_t len;
char *cur;
if(s && *s) {
len = strlen(s);
cur = s + len - 1;
while(cur != s && isspace(*cur))
--cur, --len;
cur[isspace(*cur) ? 0 : 1] = '\0';
}
return s;
}
/* Remove leading and trailing whitespaces */
char *trim(char *const s)
{
rtrim(s); // order matters
ltrim(s);
return s;
}

void ltrim(char str[PATH_MAX])
{
int i = 0, j = 0;
char buf[PATH_MAX];
strcpy(buf, str);
for(;str[i] == ' ';i++);
for(;str[i] != '\0';i++,j++)
buf[j] = str[i];
buf[j] = '\0';
strcpy(str, buf);
}

static inline void ut_trim(char * str) {
char * start = str;
char * end = start + strlen(str);
while (--end >= start) { /* trim right */
if (!isspace(*end))
break;
}
*(++end) = '\0';
while (isspace(*start)) /* trim left */
start++;
if (start != str) /* there is a string */
memmove(str, start, end - start + 1);
}

How about this... It only requires one iteration over the string (doesn't use strlen, which iterates over the string). When the function returns you get a pointer to the start of the trimmed string which is null terminated. The string is trimmed of spaces from the left (until the first character is found). The string is also trimmed of all trailing spaces after the last nonspace character.
char* trim(char* input) {
char* start = input;
while (isSpace(*start)) { //trim left
start++;
}
char* ptr = start;
char* end = start;
while (*ptr++ != '\0') { //trim right
if (!isSpace(*ptr)) { //only move end pointer if char isn't a space
end = ptr;
}
}
*end = '\0'; //terminate the trimmed string with a null
return start;
}
bool isSpace(char c) {
switch (c) {
case ' ':
case '\n':
case '\t':
case '\f':
case '\r':
return true;
break;
default:
return false;
break;
}
}

/* iMode 0:ALL, 1:Left, 2:Right*/
char* Trim(char* szStr,const char ch, int iMode)
{
if (szStr == NULL)
return NULL;
char szTmp[1024*10] = { 0x00 };
strcpy(szTmp, szStr);
int iLen = strlen(szTmp);
char* pStart = szTmp;
char* pEnd = szTmp+iLen;
int i;
for(i = 0;i < iLen;i++){
if (szTmp[i] == ch && pStart == szTmp+i && iMode != 2)
++pStart;
if (szTmp[iLen-i-1] == ch && pEnd == szTmp+iLen-i && iMode != 1)
*(--pEnd) = '\0';
}
strcpy(szStr, pStart);
return szStr;
}

Here's my implementation, behaving like the built-in string functions in libc (that is, it expects a c-string, it modifies it and returns it to the caller).
It trims leading spaces & shifts the remaining chars to the left, as it parses the string from left to right. It then marks a new end of string and starts parsing it backwards, replacing trailing spaces with '\0's until it finds either a non-space char or the start of the string. I believe those are the minimum possible iterations for this particular task.
// ----------------------------------------------------------------------------
// trim leading & trailing spaces from string s (return modified string s)
// alg:
// - skip leading spaces, via cp1
// - shift remaining *cp1's to the left, via cp2
// - mark a new end of string
// - replace trailing spaces with '\0', via cp2
// - return the trimmed s
//
char *s_trim(char *s)
{
char *cp1; // for parsing the whole s
char *cp2; // for shifting & padding
// skip leading spaces, shift remaining chars
for (cp1=s; isspace(*cp1); cp1++ ) // skip leading spaces, via cp1
;
for (cp2=s; *cp1; cp1++, cp2++) // shift left remaining chars, via cp2
*cp2 = *cp1;
*cp2-- = 0; // mark new end of string for s
// replace trailing spaces with '\0'
while ( cp2 > s && isspace(*cp2) )
*cp2-- = 0; // pad with '\0's
return s;
}

Not the best way but it works
char* Trim(char* str)
{
int len = strlen(str);
char* buff = new char[len];
int i = 0;
memset(buff,0,len*sizeof(char));
do{
if(isspace(*str)) continue;
buff[i] = *str; ++i;
} while(*(++str) != '\0');
return buff;
}

void inPlaceStrTrim(char* str) {
int k = 0;
int i = 0;
for (i=0; str[i] != '\0';) {
if (isspace(str[i])) {
// we have got a space...
k = i;
for (int j=i; j<strlen(str)-1; j++) {
str[j] = str[j+1];
}
str[strlen(str)-1] = '\0';
i = k; // start the loop again where we ended..
} else {
i++;
}
}
}

Easiest thing to do is a simple loop. I'm going to assume that you want the trimmed string returned in place.
char *
strTrim(char * s){
int ix, jx;
int len ;
char * buf
len = strlen(s); /* possibly should use strnlen */
buf = (char *) malloc(strlen(s)+1);
for(ix=0, jx=0; ix < len; ix++){
if(!isspace(s[ix]))
buf[jx++] = s[ix];
buf[jx] = '\0';
strncpy(s, buf, jx); /* always looks as far as the null, but who cares? */
free(buf); /* no good leak goes unpunished */
return s; /* modifies s in place *and* returns it for swank */
}
This gets rid of embedded blanks too, if String.Trim doesn't then it needs a bit more logic.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Remove punctuation at beginning and end of a string - c

Iterate over the string, use isalpha() to check each character, write the characters which pass into a new string.

char rm_punct(char str) { char h = str; char t = str + strlen(str) - 1; while (ispunct(p)) p++; while (ispunct(t) && p < t) { t = 0; t--; } / also if you want to preserve the original address / { int i; for (i = 0; i <= t - p + 1; i++) { str[i] = p[i]; } p = str; } / --- */ return p; }

Iterate over the string, use isalpha() to check each character, after the first character that passes start writing into a new string. Iterate over the new string backwards, replace all punctuation with \0 until you find a character which isn't punctuation.

Related

tempWord[0]='\0' Does not reset String somehow

K&R Exercise 1-19: Reverse Char Array

Tokenize String by using pointer

Parse string into array based on spaces or "double quotes strings"

Trim a string in C [duplicate]

Categories

Resources

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Remove punctuation at beginning and end of a string - c

Iterate over the string, use isalpha() to check each character, write the characters which pass into a new string.

char *rm_punct(char *str) { char *h = str; char *t = str + strlen(str) - 1; while (ispunct(*p)) p++; while (ispunct(*t) && p < t) { *t = 0; t--; } /* also if you want to preserve the original address */ { int i; for (i = 0; i <= t - p + 1; i++) { str[i] = p[i]; } p = str; } /* --- */ return p; }

Iterate over the string, use isalpha() to check each character, after the first character that passes start writing into a new string. Iterate over the new string backwards, replace all punctuation with \0 until you find a character which isn't punctuation.

Related

tempWord[0]='\0' Does not reset String somehow

K&R Exercise 1-19: Reverse Char Array

Tokenize String by using pointer

Parse string into array based on spaces or "double quotes strings"

Trim a string in C [duplicate]

Categories

Resources

char rm_punct(char str) { char h = str; char t = str + strlen(str) - 1; while (ispunct(p)) p++; while (ispunct(t) && p < t) { t = 0; t--; } / also if you want to preserve the original address / { int i; for (i = 0; i <= t - p + 1; i++) { str[i] = p[i]; } p = str; } / --- */ return p; }