Eject excess space from string in C

I need to write a function which will eject excess space from string in C.
Example:
char s[]=" abcde abcde ";
OUTPUT:
"abcde abcde"
Code:
#include <stdio.h>
#include <ctype.h>
char *eject(char *str) {
int i, x;
for (i = x = 0; str[i]; ++i)
if (!isspace(str[i]) || (i > 0 && !isspace(str[i - 1])))
str[x++] = str[i];
if(x > 0 && str[x-1] == ' ') str[x-1] = '\0';
return str;
}
int main() {
char s[] = " abcde abcde ";
printf("\"%s\"", eject(s));
return 0;
}
This code doesn't work for string " "
If this string is found program should print:
""
How to fix this?

Basically, you need to collapse each run of consecutive whitespace characters between the words into a single character and remove all leading and trailing whitespace from the input string.
You can do it in just one iteration. There is no need to write separate functions for removing the leading and trailing spaces of the input string, as shown in the other post.
You can do:
#include <stdio.h>
#include <ctype.h>
char * eject (char *str) {
if (str == NULL) {
printf ("Invalid input..\n");
return NULL;
}
/* Pointer to the position where the next character will be written
*/
char * p = str;
for (unsigned int i = 0; str[i] ; ++i) {
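/* Skip whitespace that is leading, trailing, or followed by more whitespace (not only by an identical character) */
if (isspace ((unsigned char) str[i]) && ((p == str) || (str[i + 1] == '\0') || isspace ((unsigned char) str[i + 1]))) {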
continue;
}
*p++ = str[i];
}
/* Add the null terminating character.
*/
*p = '\0';
return str;
}
int main (void) {
char s[] = " abcde abcde ";
printf("\"%s\"\n", eject(s));
char s1[] = " ";
printf("\"%s\"\n", eject(s1));
char s2[] = "ab yz ";
printf("\"%s\"\n", eject(s2));
char s3[] = " ddd xx jj m";
printf("\"%s\"\n", eject(s3));
char s4[] = "";
printf("\"%s\"\n", eject(s4));
return 0;
}
Output:
# ./a.out
"abcde abcde"
""
"ab yz"
"ddd xx jj m"
""

You could write two functions which trim leading and trailing whitespace characters.
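/* These helpers need <ctype.h> and <string.h>. */
void trim_front(char *src) {
size_t i = 0, j = 0;
while (isspace((unsigned char)src[i])) i++;
while (src[i] != '\0') src[j++] = src[i++];
src[j] = '\0';
}
void trim_back(char *src) {
size_t len = strlen(src);
/* Checking len > 0 keeps an empty or all-whitespace string from being read before its start. */
while (len > 0 && isspace((unsigned char)src[len - 1])) src[--len] = '\0';
}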
If you know you don't have to deal with trailing or leading spaces, your task becomes much simpler.
void reduce_spaces(char *src) {
size_t i = 0, j = 0;
for (; i < strlen(src); ++i) {
if (i == strlen(src) - 1 ||
(isspace(src[i]) && !isspace(src[i + 1])) ||
!isspace(src[i])) {
src[j++] = src[i];
}
}
src[j] = '\0';
}
And testing this:
int main(void) {
char s[] = " hello world ";
trim_front(s);
trim_back(s);
reduce_spaces(s);
printf(">%s<\n", s);
return 0;
}
% gcc test.c
% ./a.out
>hello world<
%
Of course, if you really want to, you can transplant the code from those functions into reduce_spaces, but decomposing a problem into multiple smaller problems can make things much easier.

A slightly more advanced answer, just for reference: suppose you were tasked with writing a professional library for use in real-world programs. Then one would first list all requirements that make sense:
It's good practice to treat strings as "immutable" - that is, build up a new string based on the old one rather than in-place replacement.
Take the destination string as parameter but also return a pointer to it (similar to strcpy etc functions).
In case of empty strings, set the destination string empty too.
Remove all "white space" not just the ' ' character.
Instead of always inserting a space character after each word, why not insert a variable delimiter? Might as well be something like , or ;.
No delimiter should be inserted after the last word.
The algorithm should only traverse the data once for performance reasons. That is, internal calls like strlen etc are unacceptable.
Byte by byte iteration is fine - we need not care about alignment.
Then we might come up with something like this:
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
char* trim_delimit (char* restrict dst, const char* restrict src, char delim)
{
char* start = dst;
*dst = '\0';
bool remove_spaces = true;
char* insert_delim_pos = NULL;
for(; *src != '\0'; src++)
{
if(remove_spaces)
{
if(!isspace(*src))
{
remove_spaces = false;
if(insert_delim_pos != NULL)
{
// we only get here if more words were found, not yet at the end of the string
*insert_delim_pos = delim;
insert_delim_pos = NULL;
}
}
}
if(!remove_spaces)
{
if(isspace(*src))
{
remove_spaces = true;
insert_delim_pos = dst; // remember where to insert delimiter for later
}
else
{
*dst = *src;
}
dst++;
}
}
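if(insert_delim_pos != NULL)
dst = insert_delim_pos; // trailing whitespace: end the string at the pending delimiter slot
*dst = '\0'; // the loop never writes a terminator, so add it here
return start;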
}
Test cases:
int main (void)
{
char s[]=" abcde abcde ";
char trimmed[100];
puts(trim_delimit(trimmed, s, ' '));
puts(trim_delimit(trimmed, "", ' '));
puts(trim_delimit(trimmed, s, ';'));
}
Output:
abcde abcde
abcde;abcde

Related

How to add space between the characters if two consecutive characters are equal in c?

I need to add a space if two consecutive characters are the same.
For example:
input:
ttjjjiibbbbhhhhhppuuuu
Output:
t tjjji ibbbbhhhhhp puuuu
If exactly two consecutive characters are the same, a space needs to be printed between them. If more than two consecutive characters are the same, no space is needed.
My code:
#include <stdio.h>
#include <string.h>
int main()
{
char s[100]="ttjjjiibbbbhhhhhppuuuu";
for(int i=0;i<strlen(s);i++){
if(s[i]!=s[i-1] && s[i]==s[i+1]){
s[i+1]=' ';
}
}
printf("%s",s);
}
my output:
t j ji b b h h hp u u
What mistake did I make?
Your primary mistake is writing to your input when the string needs to grow. That's not going to work well and is hard to debug.
This is typical of C Code: measure once, process once. Same-ish code appears twice.
Variables:
int counter = 0; /* must start at zero before the measuring pass */
char *ptr1;
char *ptr2;
char *t;
Step 1: measure
for (ptr1 = s; *ptr1; ptr1++)
{
++counter;
if (ptr1[0] == ptr1[1] && ptr1[0] != ptr1[2] && (ptr1 == s || ptr1[-1] != ptr1[0]))
++counter;
}
Step 2: copy and process
t = malloc(counter + 1);
for (ptr1 = s, ptr2 = t; *ptr1; ptr1++)
{
*ptr2++ = *ptr1;
if (ptr1[0] == ptr1[1] && ptr1[0] != ptr1[2] && (ptr1 == s || ptr1[-1] != ptr1[0]))
*ptr2++ = ' ';
}
ptr2[0] = '\0';
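For reference, the two steps above could be assembled into one function roughly like this. It is only a sketch: the names space_pairs() and starts_pair() are made up here, the repeated condition is factored into a helper, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 when s[i] starts a run of exactly two equal characters.
 * Must only be called with s[i] != '\0'. */
static int starts_pair(const char *s, size_t i)
{
    return s[i] == s[i + 1]               /* next char matches, so it is not '\0' */
        && s[i] != s[i + 2]               /* the run stops after two */
        && (i == 0 || s[i - 1] != s[i]);  /* the run did not start earlier */
}

/* Hypothetical helper: returns a malloc'd copy of s with a space inserted
 * inside every run of exactly two equal characters, or NULL on failure. */
char *space_pairs(const char *s)
{
    size_t needed = 0;

    /* Step 1: measure. */
    for (size_t i = 0; s[i] != '\0'; i++)
        needed += starts_pair(s, i) ? 2 : 1;

    char *t = malloc(needed + 1);
    if (t == NULL)
        return NULL;

    /* Step 2: copy and process. */
    size_t out = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        t[out++] = s[i];
        if (starts_pair(s, i))
            t[out++] = ' ';
    }
    t[out] = '\0';
    return t;
}
```

Called with the string from the question, this should give back t tjjji ibbbbhhhhhp puuuu.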
Another solution: calculate the length of each run of consecutive characters and handle the special case (length == 2).
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv) {
char s[100] = "ttjjjiibbbbhhhhhppuuuu";
char tmp_ch = s[0];
int cnt = 1;
for (int i = 1; i < strlen(s); i++) {
while (s[i] == tmp_ch) {
cnt++;
i++;
if (i == strlen(s)) {
break;
}
}
if (cnt == 2) {
putchar(tmp_ch);
putchar(' ');
putchar(tmp_ch);
} else {
for (int j = 0; j < cnt; j++) {
putchar(tmp_ch);
}
}
tmp_ch = s[i];
cnt = 1;
}
return 0;
}
Another approach is to use strspn() to get the number of consecutive characters as you work down the string. The prototype for strspn() is:
size_t strspn(const char *s, const char *accept);
Where strspn() returns the number of bytes in the initial segment of s which consist only of bytes from accept. (e.g. using the current character in a 2-character string as accept, it gives the number of times that character appears in sequence)
Tracking the number of characters returned and updating an offset from the beginning allows you to simply loop, letting strspn() do the work as you work through your string. All you are concerned with is when strspn() returns 2, identifying where two, and only two, of the same character are adjacent to one another.
You can do:
#include <stdio.h>
#include <string.h>
int main (void) {
char *input = "ttjjjiibbbbhhhhhppuuuu";
char chstr[2] = {0}; /* 2 char string for accept parameter */
size_t nchr = 0, offset = 0; /* no. chars returned, current offset */
*chstr = input[offset]; /* initialize with 1st char */
/* while not at end, get number of consecutive character(s) */
while (*chstr && (nchr = strspn (input + offset, chstr))) {
if (nchr == 2) { /* if 2 - add space */
putchar (input[offset]);
putchar (' ');
putchar (input[offset]);
}
else { /* otherwise, loop nchr times outputting char */
size_t n = nchr;
while (n--)
putchar(input[offset]);
}
offset += nchr; /* add nchr to offset */
*chstr = input[offset]; /* store next char in string */
}
putchar ('\n'); /* tidy up with newline */
}
Example Use/Output
$ ./bin/space_between_2
t tjjji ibbbbhhhhhp puuuu
Let me know if you have further questions concerning the use of strspn().

add additional letters in a string if there are two same letters beside each other

I'm trying to add an additional letter if there are two equal letters beside each other.
That's what I was thinking, but it doesn't put in an x between the two letters; instead of that, it copies one of the double letters, and now I have, for example, MMM instead of MXM.
for (index_X = 0; new_text[index_X] != '\0'; index_X++)
{
if (new_text[index_X] == new_text[index_X - 1])
{
double_falg = 1;
}
text[index_X] = new_text[index_X];
}
if (double_falg == 1)
{
for (counter_X = 0; text[counter_X] != '\0'; counter_X++)
{
transfer_X = counter_X;
if (text[transfer_X - 1] == text[transfer_X])
{
text_X[transfer_X] = 'X';
cnt_double++;
printf("%c\n", text[transfer_X]);
}
text_X[transfer_X] = text[transfer_X - cnt_double];
}
printf("%s\n", text_X);
}
If you're trying to create the modified array in text_X, copying data from new_text and putting an X between adjacent repeated letters (ignoring the possibility that the input contains XX), then you only need:
char new_text[] = "data with appalling repeats";
char text_X[SOME_SIZE];
int out_pos = 0;
for (int i = 0; new_text[i] != '\0'; i++)
{
text_X[out_pos++] = new_text[i];
if (new_text[i] == new_text[i+1])
text_X[out_pos++] = 'X';
}
text_X[out_pos] = '\0';
printf("Input: [%s]\n", new_text);
printf("Output: [%s]\n", text_X);
When wrapped in a basic main() function (and enum { SOME_SIZE = 64 };), that produces:
Input: [data with appalling repeats]
Output: [data with apXpalXling repeats]
To deal with repeated X's in the input, you could use:
text_X[out_pos++] = (new_text[i] == 'X') ? 'Q' : 'X';
It seems that your approach is more complicated than needed - too many loops and too many arrays involved. A single loop and two arrays should do.
The code below iterates over the original string with idx to track the position and uses the variable char_added to count how many extra chars have been added to the new array.
#include <stdio.h>
#define MAX_LEN 20
int main(void) {
char org_arr[MAX_LEN] = "aabbcc";
char new_arr[MAX_LEN] = {0};
int char_added = 0;
int idx = 1;
new_arr[0] = org_arr[0];
if (new_arr[0])
{
while(org_arr[idx])
{
if (org_arr[idx] == org_arr[idx-1])
{
new_arr[idx + char_added] = '*';
++char_added;
}
new_arr[idx + char_added] = org_arr[idx];
++idx;
}
}
puts(new_arr);
return 0;
}
Output:
a*ab*bc*c
Note: The code isn't fully tested. Also it lacks out-of-bounds checking.
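One way the missing out-of-bounds check could look is sketched below. The function name mark_repeats() and its return convention (0 on success, -1 when the destination is too small) are assumptions made for this example, not part of the code above.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes,
 * terminator included), inserting mark between adjacent equal characters.
 * Returns 0 on success, -1 if dst is too small. */
int mark_repeats(char *dst, size_t dst_size, const char *src, char mark)
{
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        /* This iteration writes one or two characters; one byte stays
         * reserved for the terminator. */
        size_t need = (i > 0 && src[i] == src[i - 1]) ? 2 : 1;
        if (out + need >= dst_size)
            return -1;
        if (need == 2)
            dst[out++] = mark;
        dst[out++] = src[i];
    }
    if (out >= dst_size)   /* only reachable for an empty src with dst_size 0 */
        return -1;
    dst[out] = '\0';
    return 0;
}
```

With "aabbcc" and '*' the result needs ten bytes, so a nine-byte buffer is rejected instead of being overrun.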
There is a lot left to be desired in your Minimal, Complete, and Verifiable Example (MCVE). That said, what you need to do is fairly straightforward. Take a simple example:
"ssi"
According to your statement, you need to add a character between the adjacent 's' characters. (You can use whatever you like for the separator, but if your input consists of normal ASCII characters, you can set the separator to the next ASCII character after the current one, or to the previous one if the current character is the last printable ASCII character, '~'.) See ASCII Table and Description.
For example, you could use memmove() to shift all characters beginning with the current character up by one and then set the current character to the replacement. You also need to track the current length so you don't write beyond your array bounds.
A simple function could be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024
char *betweenduplicates (char *s)
{
size_t len = strlen(s); /* get length to validate room */
if (!len) /* if empty string, nothing to do */
return s;
for (int i = 1; s[i] && len + 1 < MAXC; i++) /* loop until end, or out of room */
if (s[i-1] == s[i]) { /* adjacent chars equal? */
memmove (s + i + 1, s + i, len - i + 1); /* move current+ up by one */
if (s[i-1] != '~') /* not last ASCII char */
s[i] = s[i-1] + 1; /* set to next ASCII char */
else
s[i] = s[i-1] - 1; /* set to previous ASCII char */
len += 1; /* add one to len */
}
return s; /* convenience return so it can be used immediately if needed */
}
A short example program taking the string to check as the first argument could be:
int main (int argc, char **argv) {
char str[MAXC];
if (argc > 1) /* if argument given */
strcpy (str, argv[1]); /* copy to str */
else
strcpy (str, "mississippi"); /* otherwise use default */
puts (str); /* output original */
puts (betweenduplicates (str)); /* output result */
}
Example Use/Output
$ ./bin/betweenduplicated
mississippi
mistsistsipqpi
or when there is nothing to replace:
$ ./bin/betweenduplicated dog
dog
dog
Or checking the extremes:
$ ./bin/betweenduplicated "two  spaces  and  alligators  ~~"
two  spaces  and  alligators  ~~
two ! spaces ! and ! almligators ! ~}~
There are a number of ways to approach it. Let me know if you have further questions.

Store every string that start and end with a special words into an array in C

I have a long string and I want to store every string that starts and ends with a special word into an array, and then remove duplicate strings. In my long string, there is no space, comma, or any other separator between words, so I cannot use strtok. The start marker is start and the end marker is end. This is the code I have so far (but it doesn't work because it is using strtok()).
char buf[] = "start-12-3.endstart-12-4.endstart-13-3.endstart-12-4.end";
char *array[5];
char *x;
int i = 0, j = 0;
array[i] = strtok(buf, "start");
while (array[i] != NULL) {
array[++i] = strtok(NULL, "start");
}
//removeDuplicate(array[i]);
for (i = 0; i < 5; i++)
for (j = 0; j < 5; j++)
if (strcmp(array[i], array[j]) == 0)
x[i++] = array[i];
printf("%s", x[i]);
Example input:
start-12-3.endstart-12-4.endstart-13-3.endstart-12-4.end
Output equivalent to:
char *array[]= { "start-12-3.end", "start-12-4.end", "start-13-3.end" };
The second start-12-4.end string has been eliminated in the output.
I've also tried strstr, but it has some issues:
int main(int argc, char **argv)
{
char string[] = "This-one.testthis-two.testthis-three.testthis-two.test";
int counter = 0;
while (counter < 4)
{
char *result1 = strstr(string, "this");
int start = result1 - string;
char *result = strstr(string, "test");
int end = result - string;
end += 4;
printf("\n%s\n", result);
memmove(result, result1, end += 4);
counter++;
}
}
To put the strings into an array and remove duplicate strings, I've tried the following code, but it has issues:
int main(void)
{
char string[] = "this-one.testthis-two.testthis-three.testthis-two.test";
int counter = 0;
const char *b_token = "this";
const char *e_token = "test";
int e_len = strlen(e_token);
char *buffer = string;
char *b_mark;
char *e_mark;
char *a[50];
int i=0, j;
char *s;
while ((b_mark = strstr(buffer, b_token)) != 0 && (e_mark =strstr(b_mark, e_token)) != 0)
{
int length = e_mark + e_len - b_mark;
s = (char *) malloc(length);
strncpy(s, b_mark, length);
a[i]=s;
i++;
buffer = e_mark + e_len;
}
for (i=0; i<strlen(s); i++)
printf ("%s",a[i]);
free(s);
/*
//remove duplicate string
for (i=0; i<4; i++)
for (j=0; j<4; j++)
{
if (a[i] == NULL || a[j] == NULL || i == j)
continue;
if (strcmp (a[i], a[j]) == 0) {
free(a[i]);
a[i] = NULL;
}
printf("%s\n", a[i]);
*/
return 0;
}
Works with provided example of yours and tested in Valgrind for mem leaks, but might require further testing.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
unsigned tokens_find_amount( char const* const string, char const* const delim )
{
unsigned counter = 0;
char const* pos = string;
while( pos != NULL )
{
if( ( pos = strstr( pos, delim ) ) != NULL )
{
pos++;
counter++;
}
}
return counter;
}
void tokens_remove_duplicate( char** const tokens, unsigned tokens_num )
{
for( unsigned i = 0; i < tokens_num; i++ )
{
for( unsigned j = 0; j < tokens_num; j++ )
{
if( tokens[i] == NULL || tokens[j] == NULL || i == j )
continue;
if( strcmp( tokens[i], tokens[j] ) == 0 )
{
free( tokens[i] );
tokens[i] = NULL;
}
}
}
}
void tokens_split( char const* const string, char const* const delim, char** tokens )
{
unsigned counter = 0;
char const* pos, *lastpos;
lastpos = string;
pos = string + 1;
while( pos != NULL )
{
if( ( pos = strstr( pos, delim ) ) != NULL )
{
*(tokens++) = strndup( lastpos, (unsigned long )( pos - lastpos ));
lastpos = pos;
pos++;
counter++;
continue;
}
*(tokens++) = strdup( lastpos );
}
}
void tokens_free( char** tokens, unsigned tokens_number )
{
for( unsigned i = 0; i < tokens_number; ++i )
{
free( tokens[ i ] );
}
}
void tokens_print( char** tokens, unsigned tokens_number )
{
for( unsigned i = 0; i < tokens_number; ++i )
{
if( tokens[i] == NULL )
continue;
printf( "%s ", tokens[i] );
}
}
int main(void)
{
char const* buf = "start-12-3.endstart-12-4.endstart-13-3.endstart-12-4.end";
char const* const delim = "start";
unsigned tokens_number = tokens_find_amount( buf, delim );
char** tokens = malloc( tokens_number * sizeof( char* ) );
tokens_split( buf, delim, tokens );
tokens_remove_duplicate( tokens, tokens_number );
tokens_print( tokens, tokens_number );
tokens_free( tokens, tokens_number );
free( tokens );
return 0;
}
Basic splitting — identifying the strings
In a comment, I suggested:
Use strstr() to locate occurrences of your start and end markers. Then use memmove() (or memcpy()) to copy parts of the strings around. Note that since your start and end markers are adjacent in the original string, you can't simply insert extra characters into it — which is also why you can't use strtok(). So, you'll have to make a copy of the original string.
Another problem with strtok() is that it looks for any one of the delimiter characters — it does not look for the characters in sequence. But strtok() modifies its input string, zapping the delimiter it finds, which is clearly not what you need. Generally, IMO, strtok() is only a source of headaches and seldom an answer to a problem. If you must use something like strtok(), use POSIX strtok_r() or Microsoft's strtok_s(). Microsoft's function is essentially the same as strtok_r() except for the spelling of the function name. (The Standard C Annex K version of strtok_s() is different from both POSIX and Microsoft — see Do you use the TR 24731 'safe' functions?)
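To make the delimiter-set behaviour concrete, here is a small sketch. The helper split_with_strtok() is hypothetical and only wraps the standard strtok() loop so that the tokens can be inspected.

```c
#include <string.h>

/* Hypothetical helper: splits buf in place with strtok() and stores up to
 * max token pointers in out. Returns the number of tokens stored. */
int split_with_strtok(char *buf, const char *delims, char *out[], int max)
{
    int n = 0;

    /* strtok() treats delims as a set of single characters, not as a word. */
    for (char *tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}
```

With the buffer "start-east.end" and the delimiter string "start", every s, t, a and r is a delimiter, so the tokens come out as "-e" and ".end" rather than anything resembling the text after the word start.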
In another comment, I noted:
Use strstr() again, starting from where the start portion ends, to find the next end marker. Then, knowing the start of the whole section, and the start of the end and the length of the end, you can arrange to copy precisely the correct number of characters into the new string, and then null terminate if that's appropriate, or comma terminate. Something like:
if ((start = strstr(source, "start")) != 0 && (end = strstr(start, "end")) != 0)
then the data is between start and end + 2 (inclusive) in your source string. Repeat starting from the character after the end of 'end'.
You then said:
I've tried following code but it doesn't work fine; would u please tell me what's wrong with it?
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
char string[] = "This-one.testthis-two.testthis-three.testthis-two.test";
int counter = 0;
while (counter < 4)
{
char *result1 = strstr(string, "This");
int start = result1 - string;
char *result = strstr(string, "test");
int end = result - string;
end += 4;
printf("\n%s\n", result);
memmove(result, result1, end += 4);
counter++;
}
}
I observed:
The main problem appears to be searching for This with a capital T but the string only contains a single capital T. You should also look at Is there a way to specify how many characters of a string to print out using printf()?
Even assuming you fix the This vs this glitch, there are other issues.
You print the entire string.
You don't change the starting point for the search.
Your moving code adds 4 to end a second time.
You don't use start.
The code should print from result1, not result.
With those fixed, the code runs but produces:
testthis-two.testthis-three.testthis-two.test
testtestthis-three.testthis-two.test
testtthis-two.test
test?
and a core dump (segmentation fault).
Code identifying the strings
This is what I created, based on a mix of your code and my commentary:
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[] = "this-one.testthis-two.testthis-three.testthis-two.test";
int counter = 0;
const char *b_token = "this";
const char *e_token = "test";
int e_len = strlen(e_token);
char *buffer = string;
char *b_mark;
char *e_mark;
while ((b_mark = strstr(buffer, b_token)) != 0 &&
(e_mark = strstr(b_mark, e_token)) != 0)
{
int length = e_mark + e_len - b_mark;
printf("%d: %.*s\n", ++counter, length, b_mark);
buffer = e_mark + e_len;
}
return 0;
}
Clearly, this code does no moving of data, but being able to isolate the data to be moved is a key first step to completing that part of the exercise. Extending it to make copies of the strings so that they can be compared is fairly easy. If it is available to you, the strndup() function will be useful:
char *strndup(const char *s1, size_t n);
The strndup() function copies at most n characters from the string s1 always NUL terminating the copied string.
If you don't have it available, it is pretty straight-forward to implement, though it is more straight-forward if you have strnlen() available:
size_t strnlen(const char *s, size_t maxlen);
The strnlen() function attempts to compute the length of s, but never scans beyond the first maxlen bytes of s.
Neither of these is a standard C library function, but they're defined as part of POSIX (strnlen() and strndup()) and are available on BSD and Mac OS X; Linux has them, and probably other versions of Unix do too. The specifications shown are quotes from the Mac OS X man pages.
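If neither function is available, fallbacks along these lines are one possibility. The names my_strnlen() and my_strndup() are made up for this sketch so that they cannot clash with a system version; only standard C calls (memchr, memcpy, malloc) are used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallback: length of s, looking at no more than maxlen bytes. */
static size_t my_strnlen(const char *s, size_t maxlen)
{
    const char *end = memchr(s, '\0', maxlen);
    return end != NULL ? (size_t)(end - s) : maxlen;
}

/* Hypothetical fallback: copies at most n characters of s into freshly
 * allocated memory and always null terminates. Returns NULL on failure. */
char *my_strndup(const char *s, size_t n)
{
    size_t len = my_strnlen(s, n);
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

In the loop above, a call such as my_strndup(b_mark, length) would then produce a separate, null-terminated copy of each marked section, which the caller eventually has to free().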
Example output:
I called the program stst (for start-stop).
$ ./stst
1: this-one.test
2: this-two.test
3: this-three.test
4: this-two.test
$
There are multiple features to observe:
Since main() ignores its arguments, I removed the arguments (my default compiler options won't allow unused arguments).
I case-corrected the string.
I set up constant strings b_token and e_token for the beginning and end markers. The names are symmetric deliberately. This could readily be transplanted into a function where the tokens are arguments to the function, for example.
Similarly I created the b_mark and e_mark variables for the positions of the begin and end markers.
The name buffer is a pointer to where to start searching.
The loop uses the test I outlined in the comments, adapted to the chosen names.
The printing code determines how long the found string is and prints only that data. It prints the counter value.
The reinitialization code skips all the previously printed material.
Command line options for generality
You could generalize the code a bit by accepting command line arguments and processing each of those in turn if any are provided; you'd use the string you provide as a default when no string is provided. A next level beyond that would allow you to specify something like:
./stst -b beg -e end 'kalamazoo-beg-waffles-end-tripe-beg-for-mercy-end-of-the-road'
and you'd get output such as:
1: beg-waffles-end
2: beg-for-mercy-end
Here's code that implements that, using the POSIX getopt().
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv)
{
char string[] = "this-one.testthis-two.testthis-three.testthis-two.test";
const char *b_token = "this";
const char *e_token = "test";
int opt;
int b_len;
int e_len;
while ((opt = getopt(argc, argv, "b:e:")) != -1)
{
switch (opt)
{
case 'b':
b_token = optarg;
break;
case 'e':
e_token = optarg;
break;
default:
fprintf(stderr, "Usage: %s [-b begin][-e end] ['beginning-to-end...' ...]\n", argv[0]);
return 1;
}
}
/* Use string if no argument supplied */
if (optind == argc)
{
argv[argc-1] = string;
optind = argc - 1;
}
b_len = strlen(b_token);
e_len = strlen(e_token);
printf("Begin: (%d) [%s]\n", b_len, b_token);
printf("End: (%d) [%s]\n", e_len, e_token);
for (int i = optind; i < argc; i++)
{
char *buffer = argv[i];
int counter = 0;
char *b_mark;
char *e_mark;
printf("Analyzing: [%s]\n", buffer);
while ((b_mark = strstr(buffer, b_token)) != 0 &&
(e_mark = strstr(b_mark + b_len, e_token)) != 0)
{
int length = e_mark + e_len - b_mark;
printf("%d: %.*s\n", ++counter, length, b_mark);
buffer = e_mark + e_len;
}
}
return 0;
}
Note how this program documents what it is doing, printing out the control information. That can be very important during debugging — it helps ensure that the program is working on the data you expect it to be working on. The searching is better too; it works correctly with the same string as the start and end marker (or where the end marker is a part of the start marker), which the previous version did not (because this version uses b_len, the length of b_token, in the second strstr() call). Both versions are quite happy with adjacent end and start tokens, but they're equally happy to skip material between an end token and the next start token.
Example runs:
$ ./stst -b beg -e end 'kalamazoo-beg-waffles-end-tripe-beg-for-mercy-end-of-the-road'
Begin: (3) [beg]
End: (3) [end]
Analyzing: [kalamazoo-beg-waffles-end-tripe-beg-for-mercy-end-of-the-road]
1: beg-waffles-end
2: beg-for-mercy-end
$ ./stst -b th -e th
Begin: (2) [th]
End: (2) [th]
Analyzing: [this-one.testthis-two.testthis-three.testthis-two.test]
1: this-one.testth
2: this-th
$ ./stst -b th -e te
Begin: (2) [th]
End: (2) [te]
Analyzing: [this-one.testthis-two.testthis-three.testthis-two.test]
1: this-one.te
2: this-two.te
3: this-three.te
4: this-two.te
$
After update to question
You have to account for the trailing null byte by allocating enough space for length + 1 bytes. Using strncpy() is fine but in this context guarantees that the string is not null terminated; you must null terminate it.
Your duplicate elimination code, commented out, was not particularly good — too many null checks when none should be necessary. I've created a print function; the tag argument allows it to identify which set of data it is printing. I should have put the 'free' loop into a function. The duplicate elimination code could (should) be in a function; the string extraction code could (should) be in a function — as in the answer by pikkewyn. I extended the test data (string concatenation is wonderful in contexts like this).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void dump_strings(const char *tag, char **strings, int num_str)
{
printf("%s (%d):\n", tag, num_str);
for (int i = 0; i < num_str; i++)
printf("%d: %s\n", i, strings[i]);
putchar('\n');
}
int main(void)
{
char string[] =
"this-one.testthis-two.testthis-three.testthis-two.testthis-one.test"
"this-1-testthis-1-testthis-2-testthis-1-test"
"this-1-testthis-1-testthis-1-testthis-1-test"
;
const char *b_token = "this";
const char *e_token = "test";
int b_len = strlen(b_token);
int e_len = strlen(e_token);
char *buffer = string;
char *b_mark;
char *e_mark;
char *a[50];
int num_str = 0;
while ((b_mark = strstr(buffer, b_token)) != 0 && (e_mark = strstr(b_mark + b_len, e_token)) != 0)
{
int length = e_mark + e_len - b_mark;
char *s = (char *) malloc(length + 1); // Allow for null
strncpy(s, b_mark, length);
s[length] = '\0'; // Null terminate the string
a[num_str++] = s;
buffer = e_mark + e_len;
}
dump_strings("After splitting", a, num_str);
//remove duplicate strings
for (int i = 0; i < num_str; i++)
{
for (int j = i + 1; j < num_str; j++)
{
if (strcmp(a[i], a[j]) == 0)
{
free(a[j]); // Free the higher-indexed duplicate
a[j] = a[--num_str]; // Move the last element here
j--; // Examine the new string next time
}
}
}
dump_strings("After duplicate elimination", a, num_str);
for (int i = 0; i < num_str; i++)
free(a[i]);
return 0;
}
Testing with valgrind gives this a clean bill of health: no memory faults, no leaked data.
Sample output:
After splitting (13):
0: this-one.test
1: this-two.test
2: this-three.test
3: this-two.test
4: this-one.test
5: this-1-test
6: this-1-test
7: this-2-test
8: this-1-test
9: this-1-test
10: this-1-test
11: this-1-test
12: this-1-test
After duplicate elimination (5):
0: this-one.test
1: this-two.test
2: this-three.test
3: this-1-test
4: this-2-test
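As noted, the duplicate elimination could live in a function of its own. A minimal sketch, assuming every element was allocated with malloc() and that the order of the surviving strings does not matter; the name remove_duplicates() is chosen here for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: removes duplicate strings from a[0..n-1], freeing the
 * duplicates. Returns the new element count. The order of the surviving
 * strings may change. */
int remove_duplicates(char **a, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (strcmp(a[i], a[j]) == 0) {
                char *dup = a[j];
                a[j] = a[--n];   /* move the last element into the gap */
                free(dup);       /* release the higher-indexed duplicate */
                j--;             /* examine the moved element next time */
            }
        }
    }
    return n;
}
```

In main() the nested loops would then shrink to num_str = remove_duplicates(a, num_str);.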

How to reverse a sentence in C and Perl

If the sentence is
"My name is Jack"
then the output should be
"Jack is name My".
I did a program using strtok() to separate the words and then push them onto a stack,
popping them and concatenating.
Is there any other, more efficient way than this?
Is it easier to do in Perl?
Whether it is more efficient or not will be something you can test but in Perl you could do something along the lines of:
my $reversed = join( " ", reverse( split( / /, $string ) ) );
Perl makes this kind of text manipulation very easy, you can even test this easily on the shell:
echo "run as fast as you can" | perl -lne 'print join $",reverse split /\W+/'
or:
echo "all your bases are belong to us" | perl -lne '@a=reverse /\w+/g;print "@a"'
The strategy for C could be this:
1) Reverse the characters of the string. This puts the words in the right general position, albeit backward.
2) Reverse the characters of each word in the string.
We will need one function to reverse characters in a buffer:
/*
* Reverse characters in a buffer.
*
* If provided "My name is Jack", modifies the input to become
* "kcaJ si eman yM".
*/
void reverse_chars(char * buf, int cch_len)
{
char * front = buf, *back = buf + cch_len - 1;
while (front < back)
{
char tmp = *front;
*front = *back;
*back = tmp;
front ++;
back --;
}
}
To break the input buffer into words, we need a function that returns the number of non-space characters at the start of the buffer. (strtok() modifies the buffer and is harder to use in-place.)
int word_len(char *input)
{
char * p = input;
while (*p && !isspace(*p))
p++;
return p - input;
}
Finally, we will need a function which uses those two helpers to achieve the strategy described in the first paragraph.
/*
* Reverse words in a buffer.
*
* Given the input "My name is Jack", modifies the input to become
* "Jack is name My"
*/
void reverse_words(char *input)
{
int cch_len = strlen(input);
/* Part 1: Reverse the string characters. */
reverse_chars(input, cch_len);
char * p = input;
/* Part 2: Loop over one word at a time. */
while (*p)
{
/* Skip leading spaces */
while (*p && isspace(*p))
p++;
if (*p)
{
/* Advance one complete word. */
int cch_word = word_len(p);
reverse_chars(p, cch_word);
p += cch_word;
}
}
}
You've gotten a couple of versions in C, but they strike me as a bit more verbose than is probably really necessary. Absent a reason to do otherwise, I'd consider something like this:
#define MAX 32
char *words[MAX];
char word[256];
int pos = 0;
for (pos=0; pos<MAX && scanf("%255s", word) == 1; pos++) /* scanf returns EOF, which is nonzero, at end of input */
words[pos] = strdup(word);
while (--pos >= 0)
printf("%s ", words[pos]);
One possible "intermediate" level between C and Perl would be C++:
std::istringstream input("My name is Jack");
std::vector<std::string> words((std::istream_iterator<std::string>(input)),
std::istream_iterator<std::string>());
std::copy(words.rbegin(), words.rend(),
std::ostream_iterator<std::string>(std::cout, " "));
Here is a C idea that uses a little recursion to do the stacking for you:
void rev(char * x){
char * p;
if((p = strchr(x, ' ')) != NULL){
rev(p+1);
printf("%.*s ", (int)(p-x), x); /* the precision argument must be an int */
}
else{
printf("%s ", x);
}
}
Some fun with a little help from regexp and perl special variables :)
$_ = "My name is Jack";
unshift @_, "$1 " while /(\w+)/g;
print @_;
EDIT
And a killer (by now):
$,=' ';print reverse /\w+/g;
Little explanation: $, is special variable for print output separator. Of course you can do it in shorter way without this special var:
print reverse /\w+ ?/g;
but the result might not be as satisfactory as the example above.
Using reverse:
my @words = split / /, $sentence;
my $newSentence = join(' ', reverse @words);
It's probably a lot easier to do in Perl, but...
char *strrtok(char *str, const char *delim)
{
int i, j;
for (i = strlen(str) - 1; i > 0; i--)
{
// Sets the furthest set of contiguous delimiters to null characters
if (strchr(delim, str[i]))
{
j = i + 1;
while (i >= 0 && strchr(delim, str[i])) /* check the index before reading str[i] */
{
str[i] = '\0';
i--;
}
return &(str[j]);
}
}
return str;
}
This should work similarly to strtok() in reverse, but you continue to pass the pointer to the original string location rather than passing NULL after the first call. Also, you should get empty strings for start and end cases.
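To make the calling convention concrete, here is a minimal sketch of a caller. The helper name reverse_words_into and the rule "stop when the returned token is the start of the buffer" are assumptions added for illustration, since strrtok() as written gives no other end-of-input signal. The output buffer is assumed to hold at least strlen(s) + 1 bytes.

```c
#include <string.h>

/* strrtok() as posted above, with the index checked before it is used. */
char *strrtok(char *str, const char *delim)
{
    int i, j;
    for (i = (int)strlen(str) - 1; i > 0; i--) {
        if (strchr(delim, str[i])) {
            j = i + 1;
            /* Cut off the last run of delimiters. */
            while (i >= 0 && strchr(delim, str[i])) {
                str[i] = '\0';
                i--;
            }
            return &str[j];
        }
    }
    return str;
}

/* Hypothetical helper: writes the words of s into out in reverse order.
 * s is consumed (truncated) in the process. */
void reverse_words_into(char *s, char *out)
{
    out[0] = '\0';
    for (;;) {
        char *tok = strrtok(s, " ");   /* always pass the original pointer */
        strcat(out, tok);
        if (tok == s)                  /* first word reached, nothing left to cut */
            break;
        strcat(out, " ");
    }
}
```

Called on a writable copy of "My name is Jack", the helper fills out with the words in reverse order. A trailing delimiter in the input shows up as an empty first token, as noted above.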
C version:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "My name is Jack";
char t[100];
int i = 0, j = 0, k = 0;
for(i = strlen(s) - 1 ; i >= 0 ;i--)
{
if(s[i] == ' ' || i == 0)
{
j = i == 0 ? i : i + 1;
for(j = j; s[j] != '\0'; j++) t[k++] = s[j];
t[k++] = ' ';
s[i] = '\0';
}
}
t[k] = '\0';
printf("%s\n", t);
return 0;
}
C example
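/* Needs <stdlib.h> and <string.h>. Returns a newly allocated reversed copy
 * of str (character by character); the caller must free() it. */
char * srtrev (char * str) {
size_t l = strlen(str);
size_t i = 0;
char * rev = malloc(l + 1);
if (rev == NULL) return NULL;
while(l != 0)
{
rev[i++] = str[ --l];
}
rev[i] = '\0';
return rev;
}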

C - Largest String From a Big One

So pray tell, how would I go about getting the largest contiguous string of letters out of a string of garbage in C? Here's an example:
char *s = "(2034HEY!!11 th[]thisiswhatwewant44";
Would return...
thisiswhatwewant
I had this on a quiz the other day...and it drove me nuts (still is) trying to figure it out!
UPDATE:
My fault guys, I forgot to include the fact that the only function you are allowed to use is the strlen function. Thus making it harder...
Use strtok() to split your string into tokens, using all non-letter characters as delimiters, and find the longest token.
To find the longest token you will need to organise some storage for tokens - I'd use linked list.
As simple as this.
EDIT
OK, if strlen() is the only function allowed, you can first find the length of your source string, then loop through it and replace all non-letter characters with the NUL character ('\0'), which is basically what strtok() does.
Then you need to go through your modified source string a second time, advancing one token at a time, and find the longest one using strlen().
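A minimal sketch of that two-pass idea, using only strlen(). The function name longest_token_inplace is made up for illustration, "letter" is assumed to mean ASCII a-z and A-Z, and the input must be a writable array (not a string literal) because it is modified in place.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the longest letter run in s.
 * s is modified: every non-letter is overwritten with '\0'.
 * If s contains no letters, the result is an empty string. */
char *longest_token_inplace(char *s)
{
    size_t len = strlen(s);   /* original length, measured before cutting */
    size_t best_len = 0;
    char *best = s + len;     /* points at the terminator, i.e. "" */
    size_t i;

    /* Pass 1: turn every non-letter into a terminator. */
    for (i = 0; i < len; i++) {
        char c = s[i];
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
            s[i] = '\0';
    }

    /* Pass 2: hop from token to token, measuring each with strlen(). */
    i = 0;
    while (i < len) {
        size_t n = strlen(s + i);
        if (n > best_len) {
            best_len = n;
            best = s + i;
        }
        i += n + 1;           /* skip the token and the '\0' after it */
    }
    return best;
}
```

When two tokens tie for the longest, this sketch keeps the first one. Because each hop skips a whole token, the second pass calls strlen() once per token rather than once per character.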
This sounds similar to the standard UNIX 'strings' utility.
Keep track of the longest run of printable characters terminated by a NUL.
Walk through the bytes until you hit a printable character. Start counting. If you hit a non-printable character, stop counting and throw away the starting point. If you hit a NUL, check to see if the length of the current run is greater than the previous record holder. If so, record it, and start looking for the next string.
What defines the "good" substrings compared to the many others -- being lowercase alphas only? (i.e., no spaces, digits, punctuation, uppercase, &c)?
Whatever the predicate P that checks for a character being "good", a single pass over s applying P to each character lets you easily identify the start and end of each "run of good characters", and remember and pick the longest. In pseudocode:
longest_run_length = 0
longest_run_start = longest_run_end = null
status = bad
for i in (all indices over s):
    if P(s[i]): # current char is good
        if status == bad: # previous one was bad
            current_run_start = current_run_end = i
            status = good
        else: # previous one was also good
            current_run_end = i
    else: # current char is bad
        if status == good: # previous one was good -> end of run
            current_run_length = current_run_end - current_run_start + 1
            if current_run_length > longest_run_length:
                longest_run_start = current_run_start
                longest_run_end = current_run_end
                longest_run_length = current_run_length
            status = bad
# if a good run ends with end-of-string:
if status == good: # previous one was good -> end of run
    current_run_length = current_run_end - current_run_start + 1
    if current_run_length > longest_run_length:
        longest_run_start = current_run_start
        longest_run_end = current_run_end
        longest_run_length = current_run_length
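One possible C rendering of the pseudocode, as a sketch rather than a definitive version. The names char_pred, is_ascii_letter and longest_run are invented here, and the example predicate assumes an ASCII-like character set. It differs from the pseudocode in one detail: the string terminator is treated as a "bad" character, so the separate end-of-string check collapses into the loop.

```c
#include <stddef.h>

/* Hypothetical predicate type: returns nonzero for a "good" character. */
typedef int (*char_pred)(char c);

/* Example predicate (an assumption): ASCII letters only. */
int is_ascii_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Returns a pointer to the first longest run of good characters in s and
 * stores its length in *run_len. Returns NULL (length 0) if no character
 * of s satisfies P. */
const char *longest_run(const char *s, char_pred P, size_t *run_len)
{
    const char *longest_start = NULL;
    const char *current_start = NULL;  /* NULL plays the role of status == bad */
    size_t longest_len = 0;
    size_t i;

    for (i = 0; ; i++) {
        if (s[i] != '\0' && P(s[i])) {
            if (current_start == NULL)
                current_start = &s[i];          /* a new run begins */
        } else {
            if (current_start != NULL) {        /* a run just ended */
                size_t current_len = (size_t)(&s[i] - current_start);
                if (current_len > longest_len) {
                    longest_len = current_len;
                    longest_start = current_start;
                }
                current_start = NULL;
            }
            if (s[i] == '\0')
                break;                          /* terminator ends the scan */
        }
    }
    *run_len = longest_len;
    return longest_start;
}
```

The input is never modified, so this works on string literals too. The caller gets a start pointer plus a length and can print the run with a "%.*s" format (casting the length to int) or copy it out.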
Why use strlen() at all?
Here's my version which uses no function whatsoever.
#include <stddef.h> /* size_t and NULL, needed even without UNIT_TEST */
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#endif
/*
// largest_letter_sequence()
// Returns a pointer to the beginning of the largest letter
// sequence (including trailing characters which are not letters)
// or NULL if no letters are found in s
// Passing NULL in `s` causes undefined behaviour
// If the string has two or more sequences with the same number of letters
// the return value is a pointer to the first sequence.
// The parameter `len`, if not NULL, will have the size of the letter sequence
//
// This function assumes an ASCII-like character set
// ('z' > 'a'; 'z' - 'a' == 25; ('a' <= each of {abc...xyz} <= 'z'))
// and the same for uppercase letters
// Of course, ASCII works for the assumptions :)
*/
const char *largest_letter_sequence(const char *s, size_t *len) {
const char *p = NULL;
const char *pp = NULL;
size_t curlen = 0;
size_t maxlen = 0;
while (*s) {
if ((('a' <= *s) && (*s <= 'z')) || (('A' <= *s) && (*s <= 'Z'))) {
if (p == NULL) p = s;
curlen++;
if (curlen > maxlen) {
maxlen = curlen;
pp = p;
}
} else {
curlen = 0;
p = NULL;
}
s++;
}
if (len != NULL) *len = maxlen;
return pp;
}
#ifdef UNIT_TEST
void fxtest(const char *s) {
char *test;
const char *p;
size_t len;
p = largest_letter_sequence(s, &len);
if (len && (len < 999)) {
test = malloc(len + 1);
if (!test) {
fprintf(stderr, "No memory.\n");
return;
}
strncpy(test, p, len);
test[len] = 0;
printf("%s ==> %s\n", s, test);
free(test);
} else {
if (len == 0) {
printf("no letters found in \"%s\"\n", s);
} else {
fprintf(stderr, "ERROR: string too large\n");
}
}
}
int main(void) {
fxtest("(2034HEY!!11 th[]thisiswhatwewant44");
fxtest("123456789");
fxtest("");
fxtest("aaa%ggg");
return 0;
}
#endif
While I waited for you to post this as a question I coded something up.
This code iterates through a string passed to a "longest" function, and when it finds the first of a sequence of letters it sets a pointer to it and starts counting the length of it. If it is the longest sequence of letters yet seen, it sets another pointer (the 'maxStringStart' pointer) to the beginning of that sequence until it finds a longer one.
At the end, it allocates enough room for the new string and returns a pointer to it.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int isLetter(char c){
return ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') );
}
char *longest(char *s) {
char *newString = 0;
int maxLength = 0;
char *maxStringStart = 0;
int curLength = 0;
char *curStringStart = 0;
do {
//reset the current string length and skip this
//iteration if it's not a letter
if( ! isLetter(*s)) {
curLength = 0;
continue;
}
//increase the current sequence length. If the length before
//incrementing is zero, then it's the first letter of the sequence:
//set the pointer to the beginning of the sequence of letters
if(curLength++ == 0) curStringStart = s;
//if this is the longest sequence so far, set the
//maxStringStart pointer to the beginning of it
//and start increasing the max length.
if(curLength > maxLength) {
maxStringStart = curStringStart;
maxLength++;
}
} while(*s++);
//return null pointer if there were no letters in the string,
//or if we can't allocate any memory.
if(maxLength == 0) return NULL;
if( ! (newString = malloc(maxLength + 1)) ) return NULL;
//copy the longest string into our newly allocated block of
//memory (see my update for the strlen() only requirement)
//and null-terminate the string by putting 0 at the end of it.
memcpy(newString, maxStringStart, maxLength);
newString[maxLength] = 0; /* last valid index of a (maxLength + 1)-byte block */
return newString;
}
int main(int argc, char *argv[]) {
int i;
for(i = 1; i < argc; i++) {
printf("longest all-letter string in argument %d:\n", i);
printf(" argument: \"%s\"\n", argv[i]);
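char *result = longest(argv[i]);
/* longest() returns NULL when there are no letters; don't pass that to %s */
printf(" longest: \"%s\"\n\n", result ? result : "");
free(result);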
}
return 0;
}
This is my solution in simple C, without any data structures.
I can run it in my terminal like this:
~/c/t $ ./longest "hello there, My name is Carson Myers." "abc123defg4567hijklmnop890"
longest all-letter string in argument 1:
argument: "hello there, My name is Carson Myers."
longest: "Carson"
longest all-letter string in argument 2:
argument: "abc123defg4567hijklmnop890"
longest: "hijklmnop"
~/c/t $
The criteria for what constitutes a letter can easily be changed in the isLetter() function. For example:
return (
(c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c == '.') ||
(c == ' ') ||
(c == ',') );
would count periods, commas and spaces as 'letters' as well.
As per your update:
Replace memcpy(newString, maxStringStart, maxLength); with:
int i;
for(i = 0; i < maxLength; i++)
newString[i] = maxStringStart[i];
However, this problem is much easier to solve with the C standard library:
char *longest(char *s) {
int longest = 0;
int curLength = 0;
char *curString = 0;
char *longestString = 0;
char *tokens = " ,.!?'\"()#$%\r\n;:+-*/\\";
curString = strtok(s, tokens);
/* strtok() returns NULL when no tokens are left, so test before strlen() */
while( curString != NULL ) {
curLength = strlen(curString);
if( curLength > longest ) {
longest = curLength;
longestString = curString;
}
curString = strtok(NULL, tokens);
}
char *newString = 0;
if( longest == 0 ) return NULL;
if( ! (newString = malloc(longest + 1)) ) return NULL;
strcpy(newString, longestString);
return newString;
}
First, define "string" and define "garbage". What do you consider a valid, non-garbage string? Write down a concrete definition you can program - this is how programming specs get written. Is it a sequence of alphanumeric characters? Should it start with a letter and not a digit?
Once you get that figured out, it's very simple to program. Start with a naive method of looping over the "garbage" looking for what you need. Once you have that, look up useful C library functions (like strtok) to make the code leaner.
Another variant.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "(2034HEY!!11 th[]thisiswhatwewant44";
int len = strlen(s);
int i = 0;
int biggest = 0;
char* p = s;
while (p[0])
{
if (!((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')))
{
p[0] = '\0';
}
p++;
}
for (; i < len; i++)
{
if (s[i] && strlen(&s[i]) > biggest)
{
biggest = strlen(&s[i]);
p = &s[i];
}
}
printf("%s\n", p);
return 0;
}

Resources