Find number of string separators in a string - c

I am using the following to find the number of separators in a string:
char * string = "xi--len--xia";
char * split_on = "--";
size_t len_joined_string = strlen(string);
size_t len_split_on = strlen(split_on);
size_t num_separators_in_string = 0;
for (size_t i=0; i < len_joined_string; i++) {
if (string[i] == split_on[0]) {
int has_match = 1;
for (size_t j=1; j < len_split_on; j++) {
if (string[i+j] != split_on[j]) {
has_match = 0;
break; // stop at the first mismatch so we never read past the terminator
}
}
if (has_match)
num_separators_in_string ++;
}
}
Is there a way to do the above in a built-in C function, or is it required that I write the code above?
From another question, Counting number of occurrences of a char in a string in C, it looks a bit simpler to do this for a char:
for (i=0; s[i]; s[i]=='.' ? i++ : *s++);
But is there something similar like this when splitting on a string (char-array) instead of a single char?

Just getting it done
You could do something like this:
const char * split = "--";
int i;
for(i=0; s[i]; strncmp(&s[i], split, strlen(split)) ? *s++ : i++);
Note that I flipped *s++ and i++ because strncmp returns 0 on equal. Also, you might want to modify it depending on how you want to handle a string like "xi---len----xia".
Making it readable
If you ask me, the above snippet looks a bit clunky and hard to understand. If you asked me what it does, I would need quite some time to figure it out. It has the look of "look what I can do".
Put it in a function
I would put it in a function like this to hide this terrible for loop for someone who is reading your code.
size_t no_splits(const char *s, const char *split)
{
size_t i;
for(i=0; s[i]; strncmp(&s[i], split, strlen(split)) ? *s++ : i++)
; // Put semicolon here to suppress warnings
return i;
}
Make the logic readable
But then again, when you have inserted the code in a well named function, the need to shorten down the code this much is basically gone. So for readability, I would rewrite it as:
size_t no_splits(const char *s, const char *split)
{
size_t i=0;
// Not only more readable, but also better for performance
const size_t len=strlen(split);
while(s[i]) {
if(strncmp(&s[i], split, len))
// The only reason to use * in above code was to suppress a warning
s++;
else
i++;
}
return i;
}
Note that in the last piece of code, I removed two things whose only purpose was to suppress warnings. I'm not saying that it's always wrong to do things only to suppress warnings, but when you do, it's a sign that you should consider redesigning your code instead. Even though it can be used differently, the usual way of using a for loop is for(<init>; <condition>; <inc/dec>), and diverging from this convention is often a bad idea. Not only does it hurt readability, it also makes things harder for the optimizer, because the optimizer recognizes common patterns and has rules to optimize them.
Change the logic to something more intuitive
Actually, I also think this alternating between incrementing s and i is very confusing. Here is a version that (to me) makes much more sense. Change the while loop to:
while(*s) {
if(strncmp(s, split, len) == 0)
i++;
s++;
}
And if you REALLY want to compress it, change to:
for(; *s; s++)
if(strncmp(s, split, len) == 0)
i++;
Note that the tempting while(*s++) form is not equivalent: it advances s before the comparison, so a match at the very start of the string is skipped.
Abusing the syntax
Here is an example of how you really can abuse the syntax of a for loop. It's a matrix multiplication that I wrote with empty for body:
// Do NOT do like this!!!
void matrix_multiply(int *A, int *B, int *C, int N)
{
for( int i=0,j=0,k=0;
k++<N ? 1 : ++j<N ? k=1 : ++i<N ? k=1+(j=0) : 0;
C[i*N + j] = A[i*N + k -1] * B[(k-1)*N + j] + (k==1 ? 0 : C[i*N + j])
);
}
And here is an example of insertion sort:
// Do NOT do like this either!!!
void insertionSort(int *list, int length)
{
for(int i=0, j=0, max=0;
++j<length ? 1 : i<length-1 ? max=j=i+++1+ 0*
(0*((0*(j=list[i-1])) ? 0 : ((0*(list[i-1]=list[max]))
? 0 : list[max]=j))) : 0;
list[j]>list[max] ? max=j : 0
);
}
The above snippets are basically your for loop taken to the absolute extreme.
Summary
In general, I would say that you need very strong reasons to write the function no_splits in any other way than the more readable while-loop versions above. Performance is a valid reason, but first make sure that this code really is the bottleneck. And remember that short code does not imply fast code. If you really want to use the for loop instead, then at least put it in a function like I did, and give it a comment describing what it does. But this snippet is my final recommendation for your purpose:
// Count the number of occurrences of substr in str
size_t no_splits(const char *str, const char *substr)
{
size_t ret = 0;
const size_t len = strlen(substr);
for(; *str; str++)
if(strncmp(str, substr, len) == 0)
ret++;
return ret;
}
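If you want to make it a bit quicker, especially for a long substr, you can skip the tail of str that is too short to hold a match:
size_t no_splits(const char *str, const char *substr)
{
size_t ret = 0;
const size_t len = strlen(substr);
const size_t slen = strlen(str);
// Nothing to count if substr is empty or longer than str
if(len == 0 || len > slen)
return 0;
// n is unsigned, so "n >= 0" would always be true; count down with n-- > 0 instead
for(size_t n = slen - len + 1; n-- > 0; )
if(strncmp(&str[n], substr, len) == 0)
ret++;
return ret;
}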
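As for the original question about a built-in: the standard library has no function that counts substrings directly, but strstr can do the searching. The following is only a sketch, not part of the answer above; the name count_separators is made up for illustration. It assumes you want non-overlapping matches, so that "xi---len----xia" counts three "--" separators, and it treats an empty separator as zero matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of sep in s.
   An empty separator is treated as "no matches" (an assumption). */
size_t count_separators(const char *s, const char *sep)
{
    const size_t len = strlen(sep);
    size_t count = 0;

    if (len == 0)
        return 0;

    /* After each match, resume the search just past the matched text. */
    for (const char *p = strstr(s, sep); p != NULL; p = strstr(p + len, sep))
        count++;

    return count;
}
```

If you want overlapping matches counted as well, resume the search at p + 1 instead of p + len.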

Related

How to use two pointer to define a string isPalindrome?

Input: s = "A man, a plan, a canal: Panama"
Output: true
Explanation: "amanaplanacanalpanama" is a palindrome.
bool isPalindrome(char * s){
if(strlen(s) == 0) return true;
int m = 0;
for(int i = 0; i < strlen(s); i++)
if(isalnum(s[i])) s[m++] = tolower(s[i]);
int i = 0;
while(i<m)
if(s[i++] != s[--m]) return false;
return true;
}
My code's running time is 173ms. My instructor suggested me to use two pointers to improve the performance and memory usage, but I have no idea where to start.
Just position the two pointers like this
char* first = someString;
char* end = someString + strlen(someString) - 1;
Now, for it to be a palindrome, what first and end point to must be the same,
e.g. char someString[] = "1331";
So in the first iteration *first == *end, i.e. '1'.
Now move the pointers towards each other until there is nothing left to compare or they differ:
++first, --end;
Now first and end both point to '3',
and so on. If the pointers meet or pass each other without finding a difference, it is a palindrome.
Something like this
#include <stdio.h>
#include <string.h>
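int palindrome(const char* str)
{
size_t len = strlen(str);
if (len == 0) return 1; // empty string: nothing to compare, and str - 1 must not be formed
const char* start = str;
const char* end = str + len - 1;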
for (; start < end; ++start, --end )
{
if (*start != *end)
{
return 0;
}
}
return 1;
}
int main()
{
printf("palindrome: %d\n", palindrome("1331"));
printf("palindrome: %d\n", palindrome("132331"));
printf("palindrome: %d\n", palindrome("74547"));
return 0;
}
You should add error checks, there are no error checks in the function.
My code's running time is 173ms. My instructor suggested me to use two pointers to improve the performance and memory usage, but I have no idea where to start.
It's already running in O(n), so you cannot reduce the time complexity (except for the repeated call to strlen, see below), although there is some room for improving performance.
Your function does not declare any arrays and only uses a few variables, so the memory usage does not depend on the input size at all. The memory usage is already O(1) and very low, so it's not a real concern.
But if you want to do it with pointers, here is one:
bool isPalindrome(const char * s){
size_t len = strlen(s);
// An empty string is a palindrome; this also avoids forming the pointer s-1
if(len == 0) return true;
const char *a = s;
const char *b = s + len - 1;
// We're done when the pointers meet or pass each other
while(a < b) {
// Skip characters that are not alphanumeric
if(!isalnum((unsigned char)*a)) { a++; continue; }
if(!isalnum((unsigned char)*b)) { b--; continue; }
// Perform the check
if(tolower((unsigned char)*a) != tolower((unsigned char)*b)) return false;
// Step to the next pair of characters
a++;
b--;
}
return true;
}
When it comes to performance, your code has one real issue, and pointers do not solve it: you're calling strlen on every iteration of the filtering loop. (The comparison loop already stops in the middle, because i and m move towards each other, so nothing is checked twice there.)
for(int i = 0; i < strlen(s); i++)
should be
size_t len = strlen(s);
for(size_t i = 0; i < len; i++)
Another remark I have on your code is that it changes the input string. That's not necessary. If I have a function that is called isPalindrome I'd expect it to ONLY check if the string is a palindrome or not. IMO, the signature should be bool isPalindrome(const char * s)

Reversing a string without two loops?

I came up with the following basic item to reverse a string in C:
void reverse(char in[], char out[]) {
int string_length = 0;
for(int i=0; in[i] != '\0'; i++) {
string_length += 1;
}
for(int i=0; i < string_length ; i++) {
out[string_length-1-i] = in[i];
}
out[string_length] = '\0';
}
Is there a way to do this in one for loop or is it necessary to first use a for length to get the string length, and then do a second one to reverse it? Are there other approaches to doing a reverse, or is this the basic one?
Assuming you can't use functions to get the string length and you want to preserve the second loop I'm afraid this is the shortest way.
Just as a side note, though: this code is not very safe. The loop for(int i=0; in[i] != '\0'; i++) does not consider the case where the argument passed to in is not a valid C string, i.e. there is no \0 anywhere in the array in points to. In that case the first for loop reads beyond the bounds of in (a buffer over-read), and the second for loop can then write beyond the bounds of out (a buffer overflow). In functions like this you should ask the caller for the sizes of both arrays, in and out, and use them as upper bounds when indexing.
As pointed out by Rishikesh Raje in the comments: you should also change the exit condition in the second for loop from i <= string_length to i < string_length as it will generate another buffer over-read when i == string_length as it will access out by a negative index.
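To make the suggestion about passing sizes concrete, here is a minimal sketch of a bounded variant. The name reverse_bounded, the parameter names and the 0 / -1 return convention are all assumptions made for illustration; they are not from the question.

```c
#include <stddef.h>

/* Hypothetical bounded variant: in_size and out_size are the capacities
   of the two arrays in bytes. Returns 0 on success, -1 on failure. */
int reverse_bounded(const char *in, size_t in_size, char *out, size_t out_size)
{
    size_t len = 0;

    /* Measure the string, but never look past in_size bytes. */
    while (len < in_size && in[len] != '\0')
        len++;

    if (len == in_size)      /* no terminator found inside the array */
        return -1;
    if (len + 1 > out_size)  /* no room for the result plus '\0' */
        return -1;

    for (size_t i = 0; i < len; i++)
        out[len - 1 - i] = in[i];
    out[len] = '\0';

    return 0;
}
```

It still uses two loops; the point is only that neither loop can run off the end of either array.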
void reverse(char *in, char *out) {
static int index;
index = 0;
if (in == NULL || in[0] == '\0')
{
out[0] = '\0';
return;
}
else
{
reverse(in + 1, out);
out[index + 1] = '\0';
out[index++] = in[0];
}
}
With no loops.
This code is surely not efficient and robust and also won't work for multithreaded programs. Also the OP just asked for an alternative method and the stress was on methods with lesser loops.
Are there other approaches to doing a reverse, or is this the basic one
Also, there was no real need to use a static int. It is what keeps the function from working in multithreaded programs. To get it working correctly in those cases:
int reverse(char *in, char *out) {
int index;
if (in == NULL || in[0] == '\0')
{
out[0] = '\0';
return 0;
}
else
{
index = reverse(in + 1, out);
out[index + 1] = '\0';
out[index++] = in[0];
return index;
}
}
You can always tweak two loops into one, more confusing version, by using some kind of condition to determine which phase in the algorithm you are in. Below code is untested, so most likely contains bugs, but you should get the idea...
void reverse(const char *in, char *out) {
if (*in == '\0') {
// handle special case
*out = *in;
return;
}
char *out_begin = out;
char *out_end = out; // must start at out; incrementing an uninitialized pointer is undefined behavior
do {
if (out == out_begin) {
// we are still looking for where to start copying from
if (*in != '\0') {
// end of input not reached, just go forward
++in;
++out_end;
continue;
}
// else reached end of input, put terminating NUL to out
*out_end = '\0';
}
// if below line seems confusing, write it out as 3 separate statements.
*(out++) = *(--in);
} while (out != out_end); // end loop when out reaches out_end (which has NUL already)
}
However, this is exactly as many loop iterations so it is not any faster, and it is much less clear code, so don't do this in real code...

How to "delete" every element in array by value in C

I have been trying to solve this problem for about 5 days..can't find any solution please send help. I am supposed to implement a function to "delete" every element in an array by value. Let's say my array is "Hello" and I want to delete every "l". So far I can only delete l once. By the way keep in mind I am not allowed to use pointers for this function...(we haven't learned that yet in my school) Here's my code:
#include <stdio.h>
#include <string.h>
void strdel(char array[], char c);
int main(void)
{
char source[40];
printf("\nStrdel test: ");
strcpy(source, "Hello");
printf("\nsource = %s", source);
strdel(source, 'l');
printf("\nStrdel: new source = %s", source);
return 0;
}
void strdel(char array[], char c)
{
int string_lenght;
int i;
for (string_lenght = 0; array[string_lenght] != '\0'; string_lenght++) {}
for (i = 0; i < string_lenght; i++) {
if (array[i] == c) {
for (i = i; array[i] != '\0'; ++i)
array[i] = array[i + 1];
}
}
}
Simply use 2 indexes, one for reading and one for writing. @Carl Norum
void strdel(char array[], char c) {
int read_index = 0;
int write_index = 0;
while (array[read_index] != '\0') {
if (array[read_index] != c) {
array[write_index] = array[read_index];
write_index++; // Only advance write_index when a character is copied
}
read_index++; // Always advance read_index
}
array[write_index] = '\0';
}
This has O(n) performance, much faster than the nested for() loops, which are O(n*n).
Details:
OP: By the way keep in mind I am not allowed to use pointers for this function.
Note that array in void strdel(char array[], char c) is a pointer, even though it might look like an array.
Using int for array indexing is OK for learners and for much code, yet size_t is better. int may lack the range needed. size_t is an unsigned type that is neither too narrow nor too wide for array indexing. This becomes important for very long strings.
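As a sketch of the size_t remark, here is the same read/write-index idea with size_t indexes. The name strdel_sz is made up here so that it does not clash with the strdel above.

```c
#include <stddef.h>

/* Hypothetical size_t variant of the two-index approach:
   removes every occurrence of c from array, in place. */
void strdel_sz(char array[], char c)
{
    size_t read_index = 0;
    size_t write_index = 0;

    while (array[read_index] != '\0') {
        if (array[read_index] != c) {
            /* Keep this character: copy it down to the write position. */
            array[write_index] = array[read_index];
            write_index++;
        }
        read_index++;
    }
    array[write_index] = '\0';
}
```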
Your problem is related to using the variable i in both loops. So once the inner loop is executed, outer loop will terminate right after.
Use another variable for the inner loop.
void strdel(char array[], char c)
{
int string_lenght;
int i, j;
for (string_lenght = 0; array[string_lenght] != '\0'; string_lenght++) {}
for (i = 0; i < string_lenght; i++) {
if (array[i] == c) {
for (j = i; array[j] != '\0'; ++j) // Use variable j instead of i
array[j] = array[j + 1];
--i; // Decrement i to "stay" at the same index
--string_lenght; // As one character was just removed
}
}
}
The above shows how to make the OP's approach work. For a better solution see the answer from @chux: https://stackoverflow.com/a/53487767/4386427

remove a specified number of characters from a string in C

I can't write workable code for a function that deletes N characters from the string S, starting from position P. How would you guys write such a function?
void remove_substring(char *s, int p, int n) {
int i;
if(n == 0) {
printf("%s", s);
}
for (i = 0; i < p - 1; i++) {
printf("%c", s[i]);
}
for (i = strlen(s) - n; i < strlen(s); i++) {
printf("%c", s[i]);
}
}
Example:
s: "abcdefghi"
p: 4
n: 3
output:
abcghi
But for a case like n = 0 and p = 1 it's not working!
Thanks a lot!
A few people have shown you how to do this, but most of their solutions are highly condensed, use standard library functions or simply don't explain what's going on. Here's a version that includes not only some very basic error checking but some explanation of what's happening:
void remove_substr(char *s, size_t p, size_t n)
{
// p is 1-indexed for some reason... adjust it.
// p == 0 is invalid and would wrap around (p is unsigned), so reject it.
if(p == 0)
return;
p--;
// ensure that we're not being asked to access
// memory past the current end of the string.
// Note that if p is already past the end of
// string then p + n will, necessarily, also be
// past the end of the string so this one check
// is sufficient. p + n == strlen(s) is still fine:
// it removes everything up to the end.
if(p + n > strlen(s))
return;
// Offset n to account for the data we will be
// skipping.
n += p;
// We copy one character at a time until we
// find the end-of-string character
while(s[n] != 0)
s[p++] = s[n++];
// And make sure our string is properly terminated.
s[p] = 0;
}
One caveat to watch out for: please don't call this function like this:
remove_substr("abcdefghi", 4, 3);
Or like this:
char *s = "abcdefghi";
remove_substr(s, 4, 3);
Doing so will result in undefined behavior, as string literals are read-only and modifying them is not allowed by the standard.
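One way to stay clear of the string literal problem is to copy the text into writable storage first. The sketch below repeats a version of remove_substr so that it is self-contained; the wrapper name remove_substr_safe, its parameters and its 0 / -1 return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as remove_substr above: p is 1-indexed, n is the count. */
void remove_substr(char *s, size_t p, size_t n)
{
    if (p == 0)
        return;
    p--;
    if (p + n > strlen(s))
        return;
    n += p;
    while (s[n] != '\0')
        s[p++] = s[n++];
    s[p] = '\0';
}

/* Hypothetical wrapper: copies src (which may be a read-only literal)
   into the caller's writable buffer and edits the copy.
   Returns 0 on success, -1 if the buffer is too small. */
int remove_substr_safe(const char *src, char *buf, size_t buf_size,
                       size_t p, size_t n)
{
    size_t len = strlen(src);

    if (len + 1 > buf_size)
        return -1;
    memcpy(buf, src, len + 1);  /* includes the terminating '\0' */
    remove_substr(buf, p, n);
    return 0;
}
```

For a one-off call, declaring the string as an array, char s[] = "abcdefghi";, and passing s to remove_substr works just as well, because the array is a writable copy of the literal.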
Strictly speaking, you didn't implement a removal of a substring: your code prints the original string with a range of characters removed.
Another thing to note is that according to your example, the index p is one-based, not zero-based like it is in C. Otherwise the output for "abcdefghi", 4, 3 would have been "abcdhi", not "abcghi".
With this in mind, let's make some changes. First, your math is a little off: the last loop should look like this:
for (i = p+n-1; i < strlen(s); i++) {
printf("%c", s[i]);
}
Demo on ideone.
If you would like to use C's zero-based indexing scheme, change your loops as follows:
for (i = 0; i < p; i++) {
printf("%c", s[i]);
}
for (i = p+n; i < strlen(s); i++) {
printf("%c", s[i]);
}
In addition, you should return from the if at the top, or add an else:
if(n == 0) {
printf("%s", s);
return;
}
or
if(n == 0) {
printf("%s", s);
} else {
// The rest of your code here
...
}
or remove the if altogether: it's only an optimization, your code is going to work fine without it, too.
Currently, your code would print the original string twice when n is 0.
If you would like to make your code remove the substring and return a result, you need to allocate the result, and replace printing with copying, like this:
char *remove_substring(char *s, int p, int n) {
// You need to do some checking before calling malloc.
// n == 0 is not special-cased: returning s itself would hand back memory the caller must not free.
size_t len = strlen(s);
if (n < 0 || p < 0 || p+n > len) return NULL;
size_t rlen = len-n+1;
char *res = malloc(rlen);
if (res == NULL) return NULL;
char *pt = res;
// Now let's use the two familiar loops,
// except printf("%c"...) will be replaced with *pt++ = ...
for (int i = 0; i < p; i++) {
*pt++ = s[i];
}
for (int i = p+n; (size_t)i < len; i++) {
*pt++ = s[i];
}
*pt='\0';
return res;
}
Note that this new version of your code returns dynamically allocated memory, which needs to be freed after use.
Here is a demo of this modified version on ideone.
Try copying the first part of the string, then the second
char result[10];
const char input[] = "abcdefg";
int n = 3;
int p = 4;
strncpy(result, input, p);
// strlen, not length; the +1 also copies the terminating '\0'
strncpy(result+p, input+p+n, strlen(input)-p-n+1);
printf("%s", result);
If you are looking to do this without the use of functions like strcpy or strncpy (which I see you said in a comment) then use a similar approach to how strcpy (or at least one possible variant) works under the hood:
void strnewcpy(char *dest, char *origin, int n, int p) {
// The assignment needs its own parentheses: && binds tighter than =
while(p-- && (*dest++ = *origin++))
;
origin += n;
while((*dest++ = *origin++))
;
}
metacode:
allocate a buffer for the destination
declare a pointer s to your source string
advance the pointer "p-1" positions in your source string and copy them on the fly to destination
advance "n" positions
copy rest to destination
What did you try? Doesn't strcpy(s+p, s+p+n) work?
Edit: Fixed to not rely on undefined behaviour in strcpy:
void remove_substring(char *s, int p, int n)
{
p--; // 1 indexed - why?
memmove(s+p, s+p+n, strlen(s+p+n) + 1); // move only the tail; +1 moves the terminating '\0' too
}
If your heart's really set on it, you can also replace the memmove call with a loop:
char *dst = s + p;
char *src = s + p + n;
size_t count = strlen(src) + 1; // measure once, before the copy changes the string; +1 for '\0'
for (size_t i = 0; i < count; i++)
*dst++ = *src++;
And if you do that, you can strip out the strlen call, too:
while ((*dst++ = *src++) != '\0');
But I'm not sure I recommend compressing it that much.

Strip whitespace from a string in-place?

I saw this in a "list of interview questions". Got me wondering.
Not limited to whitespace necessarily, of course, easily generalized to "removing some specific character from a string, in-place".
My solution is:
void stripChar(char *str, char c = ' ') {
int x = 0;
for (int i=0;i<strlen(str);i++) {
str[i-x]=str[i];
if (str[i]==c) x++;
}
str[strlen(str)-x] = '\0';
}
I doubt there is one more efficient, but is there a solution that is more elegant?
edit: totally forgot I left strlen in there, it is most definitely not efficient
C doesn't have default arguments, and if you're programming in C++ you should use std::string and remove_if from <algorithm>.
You definitely can make this more efficient, by eliminating the calls to strlen, which are turning an O(N) algorithm into an O(N^2) algorithm, and are totally unnecessary -- you're scanning the string anyway, so just look for the NUL yourself.
You can also make this more C-idiomatic, by using two pointers instead of array indexing. I'd do it like this:
void strip_char(char *str, char strip)
{
char *p, *q;
for (q = p = str; *p; p++)
if (*p != strip)
*q++ = *p;
*q = '\0';
}
First of all, i<strlen(str) is always an inefficient idiom for looping over a string. The correct loop condition is simply str[i], i.e. loop until str[i] is the null terminator.
With that said, here's the simplest/most concise algorithm I know:
for (size_t i=0, j=0; s[j]=s[i]; j+=!isspace(s[i++]));
Note: My solution is for the question as written in the subject (whitespace) as opposed to the body (particular character). You can easily adapt it if needed.
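As a sketch of that adaptation, here is the same index-based idea for one particular character instead of whitespace, written out a little more verbosely. The function name strip_char_indexed is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical adaptation of the one-liner above: removes every
   occurrence of c from s, in place. */
void strip_char_indexed(char *s, char c)
{
    size_t i = 0;  /* read position */
    size_t j = 0;  /* write position */

    /* Copy first, then decide whether to keep what was copied.
       The loop ends right after the terminating '\0' is copied. */
    while ((s[j] = s[i]) != '\0') {
        if (s[i] != c)
            j++;
        i++;
    }
}
```

Because j never gets ahead of i, the copy never overwrites a character that has not been read yet.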
void prepend(char* s,char ch){
int len = strlen(s);
memmove(s, s + 1, len - 1);
s[len - 1] = '\x0';
}
void RemoveWhitespace(char* InStr, char ch){
if (InStr == NULL){
return;
}
else if ((*InStr) == '\x0'){
return;
}
else if ((*InStr) != ch){
RemoveWhitespace(InStr + 1,ch);
}
else{
// Shift the rest of the string left until a different character lands here
while ((*InStr) == ch){
prepend(InStr,InStr[0]);
}
// Re-examine the same position: it now holds the first character after the run
RemoveWhitespace(InStr,ch);
}
}

Resources