Strip whitespace from a string in-place? - c

I saw this in a "list of interview questions". Got me wondering.
Not limited to whitespace necessarily, of course, easily generalized to "removing some specific character from a string, in-place".
My solution is:
void stripChar(char *str, char c = ' ') {
int x = 0;
for (int i=0;i<strlen(str);i++) {
str[i-x]=str[i];
if (str[i]==c) x++;
}
str[strlen(str)-x] = '\0';
}
I doubt there is one more efficient, but is there a solution that is more elegant?
edit: totally forgot I left strlen in there, it is most definitely not efficient

C doesn't have default arguments, and if you're programming in C++ you should use std::string and remove_if from <algorithm>.
You can definitely make this more efficient by eliminating the calls to strlen, which turn an O(N) algorithm into an O(N^2) algorithm and are totally unnecessary: you're scanning the string anyway, so just look for the NUL yourself.
You can also make this more C-idiomatic, by using two pointers instead of array indexing. I'd do it like this:
void strip_char(char *str, char strip)
{
char *p, *q;
for (q = p = str; *p; p++)
if (*p != strip)
*q++ = *p;
*q = '\0';
}
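Since C has no default arguments, one way to keep the convenience of the question's c = ' ' default is a thin wrapper. The following is only a sketch: strip_spaces is a hypothetical name that does not appear in the question, and strip_char is repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Same two-pointer filter as above: p reads, q writes. */
void strip_char(char *str, char strip)
{
    assert(str != NULL);
    char *p, *q;
    for (q = p = str; *p; p++)
        if (*p != strip)
            *q++ = *p;
    *q = '\0';
}

/* Hypothetical wrapper standing in for the C++ default argument c = ' '. */
void strip_spaces(char *str)
{
    strip_char(str, ' ');
}
```

Callers that want the "default" call strip_spaces(buf); everyone else calls strip_char(buf, c) directly.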

First of all, i<strlen(str) is always an inefficient idiom for looping over a string. The correct loop condition is simply str[i], i.e. loop until str[i] is the null terminator.
With that said, here's the simplest/most concise algorithm I know:
for (size_t i=0, j=0; s[j]=s[i]; j+=!isspace((unsigned char)s[i++]));
Note: My solution is for the question as written in the subject (whitespace) as opposed to the body (particular character). You can easily adapt it if needed.
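For readers who find the one-liner hard to follow, here is a rough expanded equivalent. The function name strip_whitespace is an assumption made for this sketch. The cast to unsigned char is there because passing a negative char value to isspace is undefined behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

void strip_whitespace(char *s)
{
    assert(s != NULL);
    size_t j = 0;                           /* next write position */
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Keep only characters that are not whitespace. */
        if (!isspace((unsigned char)s[i]))
            s[j++] = s[i];
    }
    s[j] = '\0';                            /* terminate the shortened string */
}
```

Replacing the isspace test with s[i] != c adapts it to the "particular character" variant from the question body.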

/* Removes the first character of s by shifting the rest, including the terminator, left. */
void remove_first(char* s){
    memmove(s, s + 1, strlen(s));
}
void RemoveWhitespace(char* InStr, char ch){
    if (InStr == NULL){
        return;
    }
    else if ((*InStr) == '\x0'){
        return;
    }
    else if ((*InStr) != ch){
        RemoveWhitespace(InStr + 1,ch);
    }
    else{
        while ((*InStr) == ch){
            remove_first(InStr);
        }
        /* The string has shifted left, so continue from the same position. */
        RemoveWhitespace(InStr,ch);
    }
}

Related

How to use two pointer to define a string isPalindrome?

Input: s = "A man, a plan, a canal: Panama"
Output: true
Explanation: "amanaplanacanalpanama" is a palindrome.
bool isPalindrome(char * s){
if(strlen(s) == 0) return true;
int m = 0;
for(int i = 0; i < strlen(s); i++)
if(isalnum(s[i])) s[m++] = tolower(s[i]);
int i = 0;
while(i<m)
if(s[i++] != s[--m]) return false;
return true;
}
My code's running time is 173 ms. My instructor suggested that I use two pointers to improve the performance and memory usage, but I have no idea where to start.
Just position the two pointers like this:
char* first = someString;
char* end = someString + strlen(someString) - 1;
For the string to be a palindrome, the characters that first and end point to must be the same,
e.g. char someString[] = "1331";
So in the first iteration *first == *end, i.e. both are '1'.
Now move the pointers towards each other until there is nothing left to compare or until they differ:
++first, --end;
Now *first and *end are both '3',
and so on. If the pointers meet or pass each other without finding a mismatch, the string is a palindrome.
Something like this
#include <stdio.h>
#include <string.h>
int palindrome(char* str)
{
char* start = str;
char* end = str + strlen(str) - 1;
for (; start < end; ++start, --end )
{
if (*start != *end)
{
return 0;
}
}
return 1;
}
int main()
{
printf("palindrome: %d\n", palindrome("1331"));
printf("palindrome: %d\n", palindrome("132331"));
printf("palindrome: %d\n", palindrome("74547"));
return 0;
}
You should add error checks, there are no error checks in the function.
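As a sketch of what such checks might look like: the version below rejects a NULL argument and handles strings shorter than two characters before forming the end pointer. The name palindrome_checked and the -1 return value for NULL are assumptions made for this example, not part of the answer above.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 for a palindrome, 0 otherwise, and -1 for a NULL argument
   (the -1 convention is an assumption of this sketch). */
int palindrome_checked(const char *str)
{
    if (str == NULL)
        return -1;
    size_t len = strlen(str);
    if (len < 2)
        return 1;                   /* empty and one-character strings */
    const char *start = str;
    const char *end = str + len - 1;
    for (; start < end; ++start, --end) {
        if (*start != *end)
            return 0;
    }
    return 1;
}
```

Handling len < 2 first also avoids computing str + strlen(str) - 1 for an empty string, which would point before the start of the array.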
My code's running time is 173 ms. My instructor suggested that I use two pointers to improve the performance and memory usage, but I have no idea where to start.
It's already running in O(n), so you cannot reduce the time complexity (except for the repeated call to strlen, see below), although there is some room for improving performance.
Your function does not declare any arrays and only uses a few variables, so the memory usage does not depend on the input size at all. The memory usage is already O(1) and very low, so it's not a real concern.
But if you want to do it with pointers, here is one:
bool isPalindrome(const char * s){
    if(*s == '\0') return true;
    const char *a = s;
    const char *b = s + strlen(s) - 1;
    while(a < b) {
        // Skip characters that are not alphanumeric, without letting the pointers cross
        while( a < b && !isalnum((unsigned char)*a) ) a++;
        while( a < b && !isalnum((unsigned char)*b) ) b--;
        // Perform the check
        if(tolower((unsigned char)*a) != tolower((unsigned char)*b)) return false;
        // Step to next character
        a++;
        b--;
    }
    // We're done when the pointers have met or passed each other
    return true;
}
When it comes to performance, your code has one real issue, and it is not solved by pointers: you're calling strlen on every iteration. The filtering loop does have to visit every character, but the length only needs to be computed once. (Your comparison loop already stops in the middle, so it does not check anything twice.)
for(int i = 0; i < strlen(s); i++)
should be
size_t len = strlen(s);
for(size_t i = 0; i < len; i++)
Another remark I have on your code is that it changes the input string. That's not necessary. If I have a function that is called isPalindrome I'd expect it to ONLY check if the string is a palindrome or not. IMO, the signature should be bool isPalindrome(const char * s)

Find number of string separators in a string

I am using the following to find the number of separators in a string:
const char * joined_string_buffer = "xi--len--xia";
const char * split_on = "--";
size_t len_joined_string = strlen(joined_string_buffer);
size_t len_split_on = strlen(split_on);
size_t num_separators_in_string = 0;
for (size_t i=0; i < len_joined_string; i++) {
    if (joined_string_buffer[i] == split_on[0]) {
        int has_match = 1;
        for (size_t j=1; j < len_split_on; j++) {
            if (joined_string_buffer[i+j] != split_on[j]) {
                has_match = 0;
                break; /* stop at the first mismatch; this also avoids reading past the terminator */
            }
        }
        if (has_match)
            num_separators_in_string ++;
    }
}
Is there a way to do the above in a built-in C function, or is it required that I write the code above?
From another question, Counting number of occurrences of a char in a string in C, it looks a bit simpler to do this for a char:
for (i=0; s[i]; s[i]=='.' ? i++ : *s++);
But is there something similar like this when splitting on a string (char-array) instead of a single char?
Just getting it done
You could do something like this:
const char * split = "--";
int i;
for(i=0; s[i]; strncmp(&s[i], split, strlen(split)) ? *s++ : i++);
Note that I flipped *s++ and i++ because strncmp returns 0 on equal. Also, you might want to modify it depending on how you want to handle a string like "xi---len----xia".
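On the "built-in function" part of the question: the standard library has no function that counts occurrences, but strstr finds the next one, so a short loop around it is enough. The sketch below counts non-overlapping occurrences, which is one possible answer to how "xi---len----xia" should be handled. The name count_separators and the choice to treat an empty separator as zero matches are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping occurrences of sep in s, scanning left to right. */
size_t count_separators(const char *s, const char *sep)
{
    assert(s != NULL && sep != NULL);
    const size_t len = strlen(sep);
    if (len == 0)
        return 0;                   /* assumption: an empty separator never matches */
    size_t count = 0;
    /* After each hit, resume the search just past the matched separator. */
    for (const char *p = strstr(s, sep); p != NULL; p = strstr(p + len, sep))
        count++;
    return count;
}
```

To count overlapping occurrences instead, resume the search at p + 1 rather than p + len.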
Making it readable
If you ask me, the above snippet looks a bit clunky and hard to understand. If you asked me what it does, I would need quite some time to figure it out. It has the look of "look what I can do".
Put it in a function
I would put it in a function like this to hide this terrible for loop for someone who is reading your code.
size_t no_splits(const char *s, const char *split)
{
size_t i;
for(i=0; s[i]; strncmp(&s[i], split, strlen(split)) ? *s++ : i++)
; // Put semicolon here to suppress warnings
return i;
}
Make the logic readable
But then again, when you have inserted the code in a well named function, the need to shorten down the code this much is basically gone. So for readability, I would rewrite it as:
size_t no_splits(const char *s, const char *split)
{
size_t i=0;
// Not only more readable, but also better for performance
const size_t len=strlen(split);
while(s[i]) {
if(strncmp(&s[i], split, len))
// The only reason to use * in above code was to suppress a warning
s++;
else
i++;
}
return i;
}
Note that in the last piece of code, I removed two things whose only purpose was to suppress warnings. I'm not saying that it's always wrong to do things only to suppress warnings, but when you do, that's a sign that you should consider redesigning your code instead. Even though it can be used differently, the usual way of using a for loop is for(<init>; <condition>; <inc/dec>), and diverging from this convention is often a bad idea. Not only because of readability, but also because it makes things harder for the optimizer: the optimizer recognizes common patterns and has rules to optimize them.
Change the logic to something more intuitive
Actually, I also think this alternating between incrementing s and i is very confusing. Here is a version that (to me) makes much more sense. Change the while loop to:
while(*s) {
if(strncmp(s, split, len) == 0)
i++;
s++;
}
And if you REALLY want to compress it, change to:
// Not while(*s++): that would advance s before the body runs and skip a match at the very start
for(; *s; s++)
if(strncmp(s, split, len) == 0)
i++;
Abusing the syntax
Here is an example of how you really can abuse the syntax of a for loop. It's a matrix multiplication that I wrote with empty for body:
// Do NOT do like this!!!
void matrix_multiply(int *A, int *B, int *C, int N)
{
for( int i=0,j=0,k=0;
k++<N ? 1 : ++j<N ? k=1 : ++i<N ? k=1+(j=0) : 0;
C[i*N + j] = A[i*N + k -1] * B[(k-1)*N + j] + (k==1 ? 0 : C[i*N + j])
);
}
And here is an example of insertion sort:
// Do NOT do like this either!!!
void insertionSort(int *list, int length)
{
for(int i=0, j=0, max=0;
++j<length ? 1 : i<length-1 ? max=j=i+++1+ 0*
(0*((0*(j=list[i-1])) ? 0 : ((0*(list[i-1]=list[max]))
? 0 : list[max]=j))) : 0;
list[j]>list[max] ? max=j : 0
);
}
The above snippets are basically your for loop taken to the absolute extreme.
Summary
In general, I would say that you should have very strong reasons to write the function no_splits in any other way than the more readable while-loop versions above. Performance is a valid reason, but first make sure that this code really is the bottleneck. And remember that short code does not imply fast code. If you really want to use the for loop instead, then at least put it in a function like I did, and give it a comment describing what it does. But this snippet is my final recommendation for your purpose:
// Count the number of occurrences of substr in str
size_t no_splits(const char *str, const char *substr)
{
size_t ret = 0;
const size_t len = strlen(substr);
for(; *str; str++)
if(strncmp(str, substr, len) == 0)
ret++;
return ret;
}
If you want to make it a bit quicker, especially for long substr, you can do like this:
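size_t no_splits(const char *str, const char *substr)
{
    size_t ret = 0;
    const size_t len = strlen(substr);
    const size_t str_len = strlen(str);
    // Guard first: str_len - len would wrap around for an unsigned type
    if(len == 0 || len > str_len)
        return 0;
    // Only test positions where the whole substr still fits
    for(size_t n = 0; n <= str_len - len; n++)
        if(strncmp(&str[n], substr, len) == 0)
            ret++;
    return ret;
}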

Reversing a string without two loops?

I came up with the following basic item to reverse a string in C:
void reverse(char in[], char out[]) {
    int string_length = 0;
    for(int i=0; in[i] != '\0'; i++) {
        string_length += 1;
    }
    for(int i=0; i < string_length ; i++) {
        /* the last valid index is string_length-1 */
        out[string_length-1-i] = in[i];
    }
    out[string_length] = '\0';
}
Is there a way to do this in one for loop or is it necessary to first use a for length to get the string length, and then do a second one to reverse it? Are there other approaches to doing a reverse, or is this the basic one?
Assuming you can't use functions to get the string length and you want to preserve the second loop, I'm afraid this is the shortest way.
Just as a side note, though: this code is not very safe. In for(int i=0; in[i] != '\0'; i++) you do not consider the case where the argument passed to in is not a valid C string, i.e. there is no \0 anywhere in the array that in points to. In that case the first for loop reads beyond the bounds of in (a buffer over-read), and the second for loop can write beyond the bounds of out (a buffer overflow). In functions like this you should ask the caller for the sizes of both arrays, in and out, and use them as upper bounds when accessing them.
As pointed out by Rishikesh Raje in the comments: the exit condition of the second for loop should be i < string_length rather than i <= string_length, because with i == string_length the loop would access out at a negative index.
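A minimal sketch of the "ask the caller for the sizes" advice might look like the following. The name reverse_bounded, the two size parameters and the 0 / -1 return convention are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reverses the string in `in` (an array of in_size bytes) into `out`
   (an array of out_size bytes). Returns 0 on success, or -1 if `in` has no
   terminator within in_size bytes or `out` is too small for the result. */
int reverse_bounded(const char *in, size_t in_size, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t len = 0;
    while (len < in_size && in[len] != '\0')   /* never read past in_size */
        len++;
    if (len == in_size || len >= out_size)     /* unterminated, or no room for NUL */
        return -1;
    for (size_t i = 0; i < len; i++)
        out[len - 1 - i] = in[i];
    out[len] = '\0';
    return 0;
}
```

A typical call would pass sizeof for both arrays, e.g. reverse_bounded(src, sizeof src, dst, sizeof dst), when both are true arrays in the calling scope.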
void reverse(char *in, char *out) {
static int index;
index = 0;
if (in == NULL || in[0] == '\0')
{
out[0] = '\0';
return;
}
else
{
reverse(in + 1, out);
out[index + 1] = '\0';
out[index++] = in[0];
}
}
With no loops.
This code is surely neither efficient nor robust, and it won't work in multithreaded programs either. But the OP just asked for an alternative method, with the stress on methods with fewer loops.
Are there other approaches to doing a reverse, or is this the basic one
Also, there was no real need to use static int, which is what prevents it from working in multithreaded programs. To get it working correctly in those cases:
int reverse(char *in, char *out) {
int index;
if (in == NULL || in[0] == '\0')
{
out[0] = '\0';
return 0;
}
else
{
index = reverse(in + 1, out);
out[index + 1] = '\0';
out[index++] = in[0];
return index;
}
}
You can always tweak two loops into one, more confusing version, by using some kind of condition to determine which phase in the algorithm you are in. Below code is untested, so most likely contains bugs, but you should get the idea...
void reverse(const char *in, char *out) {
if (*in == '\0') {
// handle special case
*out = *in;
return;
}
char *out_begin = out;
char *out_end = out; // must start at out; it is advanced once per input character
do {
if (out == out_begin) {
// we are still looking for where to start copying from
if (*in != '\0') {
// end of input not reached, just go forward
++in;
++out_end;
continue;
}
// else reached end of input, put terminating NUL to out
*out_end = '\0';
}
// if below line seems confusing, write it out as 3 separate statements.
*(out++) = *(--in);
} while (out != out_end); // end loop when out reaches out_end (which has NUL already)
}
However, this is exactly as many loop iterations so it is not any faster, and it is much less clear code, so don't do this in real code...
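As for "other approaches": if a separate output buffer is not required, a common alternative is to reverse the string in place by swapping from both ends. This is only a sketch; reverse_in_place is a hypothetical name, and the strlen call still makes one pass over the string, so it does not reduce the total amount of work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

void reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);         /* one pass to find the end */
    if (len < 2)
        return;                     /* nothing to swap */
    /* Swap the outermost pair, then move both pointers inwards. */
    for (char *lo = s, *hi = s + len - 1; lo < hi; lo++, hi--) {
        char tmp = *lo;
        *lo = *hi;
        *hi = tmp;
    }
}
```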

Remove substring of string

So I was doing a coding exercise on Talentbuddy (for those who know it), and I can't work out why I can't finish this one.
The exercise is to remove a substring from a string, given as input the string, the position P at which to start removing characters, and the number N of characters to remove.
Here is what I've done:
#include <stdio.h>
#include <unistd.h>
void remove_substring(char *s, int p, int n)
{
int idx;
idx = -1;
while (s[++idx] != '\0')
write(1, &s[idx == p - 1 ? idx + n : idx], 1);
}
When the input is "abcdefghi", P = 9 and N = 1, the expected result is "abcdefgh", exactly the same as the one I get with my function. But Talentbuddy keeps telling me that my output is wrong, and I don't think it (Talentbuddy) is wrong.
Maybe there is a blank space or something between the "h" and the '\0'.
But I can't figure it out, because when I add another write(1, "END", 3) at the end, it appears as "abcdefghEND".
If the question is exclusively about NUL-terminated strings,
why can't it be as simple as this (unless it is homework)?
void removesubstr( char *string, const char *substring )
{
    char *p = strstr(string, substring);
    if(p)
    {
        /* memmove rather than strcpy: source and destination overlap */
        const char *rest = p + strlen(substring);
        memmove(p, rest, strlen(rest) + 1);
    }
}
Your problem is that you write something for every original index, even if it should be suppressed. What you write looks like abcdefgh, but it is abcdefgh<nul>, where the terminal doesn't render the <nul>.
You are mixing two methods here. Either filter out the removed substring:
void remove_substring(char *s, int p, int n)
{
int i = 0;
p--; /* convert to C-style index */
while (s[i] != '\0') {
if (i < p || i >= p + n) putchar(s[i]);
i++;
}
}
or skip the substring by jumping over it:
void remove_substring(char *s, int p, int n)
{
int i = 0;
int l = strlen(s);
while (i < l) {
if (i + 1 == p) {
i += n;
} else {
putchar(s[i++]);
}
}
}
You're trying to do a bit of both.
(I've avoided the awkward combination of prefix increment and starting at minus 1. And I've used putchar instead of unistd's write. And the termination by length is so you don't inadvertently jump beyond the terminating <nul>.)
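If the string itself should be changed rather than printed, the same removal can be done in place with memmove. This is a sketch under stated assumptions: remove_substring_in_place is a hypothetical name, p is taken to be 1-based as in the exercise, and an n that reaches past the end is clamped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes n characters starting at 1-based position p, modifying s in place. */
void remove_substring_in_place(char *s, size_t p, size_t n)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (p == 0 || p > len)
        return;                         /* position outside the string */
    size_t start = p - 1;               /* convert to a C-style index */
    if (n > len - start)
        n = len - start;                /* clamp to the end of the string */
    /* memmove handles the overlap; the + 1 also moves the terminator. */
    memmove(s + start, s + start + n, len - start - n + 1);
}
```

With "abcdefghi", P = 9 and N = 1, the buffer ends up holding "abcdefgh", with no stray byte before the terminator.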

Effective way of checking if a given string is palindrome in C

I was preparing for my interview and started working through simple C programming questions. One question I came across was to check if a given string is a palindrome. I wrote code that checks whether a user-supplied string is a palindrome using pointers. I'd like to know if this is efficient in terms of runtime, or if there is any enhancement I could make. It would also be nice if anyone could suggest how to remove characters other than letters (like apostrophes and commas) when using pointers. I've added my function below. It accepts a pointer to the string as a parameter and returns an integer.
int palindrome(char* string)
{
char *ptr1=string;
char *ptr2=string+strlen(string)-1;
while(ptr2>ptr1){
if(tolower(*ptr1)!=tolower(*ptr2)){
return(0);
}
ptr1++;ptr2--;
}
return(1);
}
"how to remove other characters other than letters?"
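I think you don't want to actually remove them, just skip them, and you can use isalpha to do so. Also note that the condition ptr2 > ptr1 already works for both even-length strings such as abba and odd-length strings such as abcba, because the middle character of an odd-length string never needs to be compared. The code below uses ptr2 >= ptr1, which is equally correct: at worst it compares the middle character with itself.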
int palindrome(char* string)
{
size_t len = strlen(string);
// handle empty string and string of length 1:
if (len == 0) return 0;
if (len == 1) return 1;
char *ptr1 = string;
char *ptr2 = string + len - 1;
while(ptr2 >= ptr1) {
if (!isalpha(*ptr2)) {
ptr2--;
continue;
}
if (!isalpha(*ptr1)) {
ptr1++;
continue;
}
if( tolower(*ptr1) != tolower(*ptr2)) {
return 0;
}
ptr1++; ptr2--;
}
return 1;
}
You need to #include <ctype.h> for isalpha and tolower, and <string.h> for strlen.
How about doing like this if you want to do it using pointers only:
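#include <stdio.h>
#include <string.h>
int main(void)
{
    char str[100];
    char *p,*t;
    printf("Your string : ");
    /* fgets instead of gets, which cannot limit the input length */
    if(fgets(str, sizeof str, stdin) == NULL)
        return 1;
    str[strcspn(str, "\n")] = '\0';   /* drop the newline that fgets keeps */
    for(p=str ; *p!='\0' ; p++);      /* p ends up one past the last character */
    for(t=str ; p>t; )
    {
        if(*(p-1)==*t)
        {
            p--;
            t++;
        }
        else
            break;
    }
    if(t>=p)
        printf("\nPalindrome");
    else
        printf("\nNot a palindrome");
    return 0;
}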
#include <stdio.h>
#include <string.h>
int main(void)
{
    const char *p = "MALAYALAM";
    size_t count = 0;
    size_t len = strlen(p);
    /* count the positions whose character equals its mirror image */
    for(size_t i = 0; i < len; i++ )
    {
        if(p[i] == p[len - i - 1])
            count++;
    }
    printf("Count: %zu\n", count);
    if(count == len)
        printf("Palindrome\n");
    else
        printf("Not Palindrome\n");
    return 0;
}
I have actually experimented quite a lot with this kind of problem.
There is one optimisation that can be done (checking the string length is not one: odd-length strings such as "aba" can be palindromes, too):
Start using vectorised compares, but this only really gives you a performance gain if you expect a lot of palindromes. If the majority of your strings aren't palindromes, you are still best off with byte-by-byte comparisons. In fact, my vectorised palindrome checker ran 5% slower than the non-vectorised one just because palindromes were so rare in the input. The extra branch that decided between vectorised and non-vectorised made this big difference.
Here is a code draft of how you can do it vectorised:
// needs <stdint.h> and <string.h>
int palindrome(const char* string)
{
    size_t lo = 0, hi = strlen(string); // the part still unchecked is [lo, hi)
    // compare one word from each end while two full words still fit
    while (hi - lo >= 2 * sizeof(uintptr_t)) {
        uintptr_t a, b;
        memcpy(&a, string + lo, sizeof a); // memcpy avoids unaligned access and aliasing problems
        memcpy(&b, string + hi - sizeof b, sizeof b);
        if (a != bswap(b)) { // byte swap for your word length, x86 has an instruction for it, needs to be defined separately
            return(0);
        }
        lo += sizeof a; hi -= sizeof b;
    }
    // standard byte by byte comparison for what is left in the middle
    while (hi - lo >= 2) {
        if (string[lo] != string[hi - 1]) {
            return(0);
        }
        lo++; hi--;
    }
    return(1);
}
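The draft above leaves bswap undefined. As a portable stand-in, a byte-reversal helper can be written with plain shifts. This is only a sketch: bswap_word is a hypothetical name, and it is slower than a dedicated byte-swap instruction or compiler intrinsic unless the compiler recognises the pattern.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for bswap: reverses the byte order of a word. */
uintptr_t bswap_word(uintptr_t w)
{
    uintptr_t r = 0;
    for (size_t i = 0; i < sizeof w; i++) {
        r = (r << CHAR_BIT) | (w & UCHAR_MAX);  /* append the lowest byte of w */
        w >>= CHAR_BIT;                         /* expose the next byte */
    }
    return r;
}
```

Applying it twice gives back the original value, which makes for an easy sanity check.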

Resources