I'm trying to divide a string of alphabetically sorted words char *str = "a/apple/arm/basket/bread/car/camp/element/..."
into an array of strings alphabetically like so:
arr[0] = "a/apple/arm"
arr[1] = "basket/bread"
arr[2] = "car/camp"
arr[3] = ""
arr[4] = "element"
...
I'm not very skilled in C, so my approach was going to be to declare:
char arr[26][100];
char curr_letter = 'a';
and then iterate over each char in the string looking for "/" followed by char != curr_letter, then strcpy that substring to the correct location.
I'm not sure if my approach is very good, let alone how to implement it properly.
Any help would be greatly appreciated!
So we basically loop through the string and check whether we have found the 'split character', and we also check that the next character is not the 'curr_letter'.
We keep track of the consumed length, the current length (used for memcpy later to copy the current string to the array).
When we find a position where we can add the current string to the array, we allocate space and copy the string to it as the next element in the array. We also add the current_length to consumed, and the current_length is reset.
We use due_to_end to find out if we have a / in the current string, and remove it accordingly.
Try:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
char *str = "a/apple/arm/basket/bread/car/camp/element/...";
char split_char = '/';
char nosplit_char = 'a';
char **array = NULL;
int num_elts = 0;
// read all the characters one by one, and add to array if
// your condition is met, or if the string ends
int current_length = 0; // holds the current length of the element to be added
int consumed = 0; // holds how much we already added to the array
for (int i = 0; i < strlen(str); i++) { // loop through string
current_length++; // increment first
int due_to_end = 0;
if ( ( str[i] == split_char // check if split character found
&& ( i != (strlen(str) - 1) // check if its not the end of the string, so when we check for the next character, we don't overflow
&& str[i + 1] != nosplit_char ) ) // check if the next char is not the curr_letter(nosplit_char)
|| (i == strlen(str) - 1 && (due_to_end = 1))) { // **OR**, check if end of string
array = realloc(array, (num_elts + 1) * sizeof(char *)); // allocate space in the array
array[num_elts] = calloc(current_length + 1, sizeof(char)); // allocate space for the string
memcpy(array[num_elts++], str + consumed, (due_to_end == 0 ? current_length - 1 : current_length)); // copy the string to the current array offset's allocated memory, and remove 1 character (slash) if this is not the end of the string
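if (!due_to_end) nosplit_char = str[i + 1]; // the next group starts with this letter, so don't split on it
consumed += current_length; // add what we consumed right now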
current_length = 0; // reset current_length
}
}
for (int i = 0; i < num_elts; i++) { // loop through all the elements for overview
printf("%s\n", array[i]);
free(array[i]);
}
free(array);
}
Yes, the approach that you specify in your question seems good, in principle. However, I see the following problem:
Using strcpy will require a null-terminated source string. This means if you want to use strcpy, you will have to overwrite the / with a null character. If you don't want to have to modify the source string by writing null characters into it, then an alternative would be to use the function memcpy instead of strcpy. That way, you can specify the exact number of characters to copy and you don't require the source string to have a null terminating character. However, this also means that you will somehow have to count the number of characters to copy.
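To illustrate the memcpy variant described above, here is a minimal sketch. The helper name copy_span and its parameters are my own invention, not something from the question. It copies an explicit number of characters and adds the terminator itself, so the source does not have to be null-terminated at that point.

```c
#include <stddef.h>
#include <string.h>

/* Copies the n characters starting at src into dst and null-terminates dst.
   Returns 0 on success, or -1 if dst (capacity dst_size) is too small.
   copy_span is a hypothetical helper name, not part of the question. */
int copy_span(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (n >= dst_size)
        return -1;              /* no room for n chars plus the '\0' */
    memcpy(dst, src, n);        /* src need not be null-terminated after n chars */
    dst[n] = '\0';              /* memcpy does not add the terminator itself */
    return 0;
}
```

With str from the question, copy_span(arr[0], sizeof arr[0], str, 11) would leave "a/apple/arm" in arr[0]. The caller still has to work out the count (11 here), which is the bookkeeping mentioned above.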
On the other hand, instead of using strcpy or memcpy, you could simply copy one character at a time from str into arr[0], until you encounter the next letter, and then copy one character at a time from str into arr[1], and so on. That solution may be simpler.
In accordance with the community guidelines for homework questions, I will not provide a full solution to your problem at this time.
EDIT: Since another answer has already provided a full solution which uses memcpy, I will now also provide a full solution, which uses the simpler approach mentioned above of copying one character at a time:
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define NUM_LETTERS 26
#define MAX_CHARS_PER_LETTER 99
int main( void )
{
//declare the input string
char *str =
"a/apple/arm/basket/bread/car/camp/element/"
"frog/glass/saddle/ship/water";
//declare array which holds all the data
//we must add 1 for the terminating null character
char arr[NUM_LETTERS][MAX_CHARS_PER_LETTER+1];
//this variable will store the current letter that we
//have reached
char curr_letter = 'a';
//this variable will store the number of chars that are
//already used in the current letter, which will be a
//number between 0 and MAX_CHARS_PER_LETTER
int chars_used = 0;
//this variable stores whether the next character is
//the start of a new word
bool new_word = true;
//initialize the arrays to contain empty strings
for ( int i = 0; i < NUM_LETTERS; i++ )
arr[i][0] = '\0';
//read one character at a time
for ( const char *p = str; *p != '\0'; p++ )
{
//determine whether we have reached the end of a word
if ( *p == '/' )
{
new_word = true;
}
else
{
//determine whether we have reached a new letter
if ( new_word && *p != curr_letter )
{
//write terminating null character to string of
//previous letter, overwriting the "/"
if ( chars_used != 0 )
arr[curr_letter-'a'][chars_used-1] = '\0';
curr_letter = *p;
chars_used = 0;
}
new_word = false;
}
//verify that buffer is large enough
if ( chars_used == MAX_CHARS_PER_LETTER )
{
fprintf( stderr, "buffer overflow!\n" );
exit( EXIT_FAILURE );
}
//copy the character
arr[curr_letter-'a'][chars_used++] = *p;
}
//the following code assumes that the string pointed to
//by "str" will not end with a "/"
//write terminating null character to string
arr[curr_letter-'a'][chars_used] = '\0';
//print the result
for ( int i = 0; i < NUM_LETTERS; i++ )
printf( "%c: %s\n", 'a' + i, arr[i] );
}
This program has the following output:
a: a/apple/arm
b: basket/bread
c: car/camp
d:
e: element
f: frog
g: glass
h:
i:
j:
k:
l:
m:
n:
o:
p:
q:
r:
s: saddle/ship
t:
u:
v:
w: water
x:
y:
z:
Here is another solution which uses strtok:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NUM_LETTERS 26
#define MAX_CHARS_PER_LETTER 99
int main( void )
{
//declare the input string
char str[] =
"a/apple/arm/basket/bread/car/camp/element/"
"frog/glass/saddle/ship/water";
//declare array which holds all the data
//we must add 1 for the terminating null character
char arr[NUM_LETTERS][MAX_CHARS_PER_LETTER+1];
//this variable will store the current letter that we
//have reached
char curr_letter = 'a';
//this variable will store the number of chars that are
//already used in the current letter, which will be a
//number between 0 and MAX_CHARS_PER_LETTER
int chars_used = 0;
//initialize the arrays to contain empty strings
for ( int i = 0; i < NUM_LETTERS; i++ )
arr[i][0] = '\0';
//find first token
char *p = strtok( str, "/" );
//read one token at a time
while ( p != NULL )
{
int len;
//determine whether we have reached a new letter
if ( p[0] != curr_letter )
{
curr_letter = p[0];
chars_used = 0;
}
//count length of string
len = strlen( p );
//verify that buffer is large enough to copy string
if ( chars_used + len >= MAX_CHARS_PER_LETTER )
{
fprintf( stderr, "buffer overflow!\n" );
exit( EXIT_FAILURE );
}
//add "/" if necessary
if ( chars_used != 0 )
{
arr[curr_letter-'a'][chars_used++] = '/';
arr[curr_letter-'a'][chars_used] = '\0';
}
//copy the word
strcpy( arr[curr_letter-'a']+chars_used, p );
//update number of characters used in buffer
chars_used += len;
//find next token
p = strtok( NULL, "/" );
}
//print the result
for ( int i = 0; i < NUM_LETTERS; i++ )
printf( "%c: %s\n", 'a' + i, arr[i] );
}
Could you help, please?
When I execute this code, I get this output:
AAAAABBBBBCCCCCBBBBBCOMP¬ıd┐╔ LENGTH 31
There are some weird characters after the letters, even though I allocated only 21 bytes.
#include <stdio.h>
#include <stdlib.h>
char * lineDown(){
unsigned short state[4] = {0,1,2,1};
char decorationUp[3][5] = {
{"AAAAA"},{"BBBBB"},{"CCCCC"}
};
char * deco = malloc(21);
int k;
int p = 0;
for(int j = 0; j < 4; j++){
k = state[j];
for(int i = 0; i < 5; i++){
*(deco+p) = decorationUp[k][i];
p++;
}
}
return deco;
}
int main(void){
char * lineDOWN = lineDown();
int k = 0;
char c;
do{
c = *(lineDOWN+k);
printf("%c",*(lineDOWN+k));
k++;
}while(c != '\0');
printf("LENGTH %d\n\n",k);
}
The function does not build a string because the result array does not contain the terminating zero though a space for it was reserved when the array was allocated.
char * deco = malloc(21);
So you need to append the terminating zero to the array before exiting the function
//...
*(deco + p ) = '\0';
return deco;
}
Otherwise this do-while loop
do{
c = *(lineDOWN+k);
printf("%c",*(lineDOWN+k));
k++;
}while(c != '\0');
will have undefined behavior.
But even if you append the terminating zero to the array, the loop will count the length of the stored string incorrectly, because it increments the variable k even when the current character is the terminating zero.
Instead, you should use a while loop. In that case the variable c becomes redundant. The loop can look like
while ( *( lineDOWN + k ) )
{
printf("%c",*(lineDOWN+k));
k++;
}
In this case this call
printf("\nLENGTH %d\n\n",k);
^^
will output the correct length of the string equal to 20.
And you should free the allocated memory before exiting the program
free( lineDOWN );
Some others wrote in their answers that the array decorationUp must be declared like
char decorationUp[3][6] = {
{"AAAAA"},{"BBBBB"},{"CCCCC"}
};
However, that is not necessary if you are not going to use the elements of the array as strings, and your program does not use them as strings.
Take into account that your program is full of magic numbers. Such a program is usually error-prone. Instead you should use named constants.
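As a rough sketch of what named constants could look like here: the names PATTERN_COUNT, PATTERN_LEN and STATE_COUNT are my own choice, not something from the question, and the NULL check on malloc is an addition as well.

```c
#include <stdlib.h>

enum {
    PATTERN_COUNT = 3,   /* number of decoration patterns (assumed name) */
    PATTERN_LEN   = 5,   /* characters per pattern (assumed name) */
    STATE_COUNT   = 4    /* patterns concatenated into one line (assumed name) */
};

/* Builds the decoration line as a null-terminated string.
   Returns NULL if allocation fails; the caller must free() the result. */
char *lineDown(void)
{
    static const unsigned short state[STATE_COUNT] = {0, 1, 2, 1};
    static const char decorationUp[PATTERN_COUNT][PATTERN_LEN + 1] = {
        "AAAAA", "BBBBB", "CCCCC"
    };

    char *deco = malloc(STATE_COUNT * PATTERN_LEN + 1);  /* +1 for '\0' */
    if (deco == NULL)
        return NULL;

    int p = 0;
    for (int j = 0; j < STATE_COUNT; j++)
        for (int i = 0; i < PATTERN_LEN; i++)
            deco[p++] = decorationUp[state[j]][i];

    deco[p] = '\0';      /* terminate so the result is a real string */
    return deco;
}
```

Changing the pattern length or the number of states now means editing one constant instead of hunting for every 4, 5 and 21 in the code.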
In
char decorationUp[3][5] = {
{"AAAAA"},{"BBBBB"},{"CCCCC"}
};
each string needs 6 characters to also hold the null character, even though in this case you do not use the elements as 'standard' strings but only as arrays of char. To get into the habit, always reserve a place for the ending null character.
You can do
char decorationUp[3][6] = {
{"AAAAA"},{"BBBBB"},{"CCCCC"}
};
Note that there is no need to give the first size; the compiler counts the elements for you.
Because main stops when it reads the null character, you also need to place it at the end of deco. You already allocate 21 bytes, which leaves room for it, but you never write it, so the loop in main keeps reading past the allocated block, which is undefined behavior.
Writing *(deco+p) is hard to read; prefer deco[p].
So for instance :
char * lineDown(){
unsigned short state[] = {0,1,2,1};
char decorationUp[][6] = {
{"AAAAA"},{"BBBBB"},{"CCCCC"}
};
char * deco = malloc(4*5 + 1); /* a formula to explain why 21 is better than 21 directly */
int k;
int p = 0;
for(int j = 0; j < 4; j++){
k = state[j];
for(int i = 0; i < 5; i++){
deco[p] = decorationUp[k][i];
p++;
}
}
deco[p] = 0;
return deco;
}
This program tokenizes a user input string, removes extra spaces, saves each word into a 2D array and then prints the tokens.
EXAMPLE:
input: " Hello world string house and car"
output and EXPECTED output:
token[0]: Hello
token[1]: world
token[2]: string
token[3]: house
token[4]: and
token[5]: car
THE PROBLEM:
The problem is that I achieved this by using the strlen() function when printing the tokens (the code is at the very bottom). I am not supposed to use any library other than stdio.h and stdlib.h, and strlen() is declared in string.h. I tried to use sizeof(arr) / sizeof(arr[0]); instead, but it does not work as I want. The result using sizeof is:
token[0]: Hello
token[1]: world
token[2]: string
token[3]: house
token[4]: and
token[5]: car
�oken[6]: ��
token[7]: �
token[8]: ����
token[9]: �
token[10]:
I WOULD LIKE TO HAVE THE EXPECTED OUTPUT WITHOUT USING STRLEN()
#include<stdio.h>
#include <stdlib.h>
#define TRUE 1
char tokenize(char *str, char array[10][20])
{
int n = 0, i, j = 0;
for(i = 0; TRUE; i++)//infinite loop until is the end of the string '\0'
{
if(str[i] != ' '){
//position 1, char 1
array[n][j++] = str[i];// if, it is not space, we save the character
}
else{
array[n][j++] = '\0';//end of the first word
n++;// position for next new word
j=0;// start writting char at position 0
}
if(str[i] == '\0')
break;
}
return 0;
}
//removes extra spaces
char* find_word_start(char* str){
/*also removes all extra spaces*/
char *result = (char*) malloc(sizeof(char) *1000);
int c = 0, d = 0;
// no space at beginning
while(str[c] ==' ') {
c++;
}
while(str[c] != '\0'){ // till end of sentence
result[d++] = str[c++]; //take non-space characters
if(str[c]==' ') { // take one space between words
result[d++] = str[c++];
}
while(str[c]==' ') { //
c++;
}
}
result[d-1] = '\0';
//print or return char?
return result;
free(result);
}
int main()
{
char str[]=" Hello world string dudes and dudas ";
//words, and chars in each word
char arr[10][20];
//call the method to tokenize the string
tokenize(find_word_start(str),arr);
int row = sizeof(arr) / sizeof(arr[0]);
/*----------------------------------------------------------------------*/
/*----------------------------------------------------------------------*/
for(int i = 0;i <= strlen(arr);i++)
/*----------------------------------------------------------------------*/
/*----------------------------------------------------------------------*/
printf("token[%d]: %s\n", i, arr[i]);
return 0;
}
Your code using strlen() may appear to work in this instance, but it is not correct.
strlen(arr) makes no semantic sense because arr is not a string. It happens in this case to return 5 because arr has the same address as arr[0], then you kludged it to work for the 6 word output by using the test i <= strlen(arr) in the for loop. The two values strlen(arr) and the number of strings stored in arr are not related.
The expression sizeof(arr) / sizeof(arr[0]) yields the compile-time constant number of arrays within the array of arrays arr (i.e. 10), not the number of valid strings assigned. It is your code's responsibility to keep track of that, either with a sentinel value such as an empty string or by maintaining a count of strings assigned.
I suggest you change tokenize to return the number of strings (currently it is inexplicably defined to return a char, but in fact only ever rather uselessly returns zero):
int tokenize( char* str, char array[][20] )
{
...
return n + 1 ; // with the loop as posted, n is the index of the last word, so the count is n + 1
}
Then:
int rows = tokenize( find_word_start(str), arr ) ;
for( int i = 0; i < rows; i++ )
{
printf( "token[%d]: %s\n", i, arr[i] ) ;
}
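For completeness, the sentinel alternative mentioned above could look roughly like this. It is only a sketch: the names tokenize_with_sentinel, count_tokens, MAX_WORDS and MAX_WORD_LEN are made up for illustration, and it assumes that silently truncating over-long words is acceptable. It needs no header at all, so it stays within the stdio.h/stdlib.h restriction.

```c
#define MAX_WORDS 10      /* rows in the array, as in the question */
#define MAX_WORD_LEN 20   /* chars per row, including the '\0' */

/* Splits str on spaces and stores an empty string after the last word as a
   sentinel. At most MAX_WORDS - 1 words are kept so the sentinel always fits. */
void tokenize_with_sentinel(const char *str, char array[MAX_WORDS][MAX_WORD_LEN])
{
    int n = 0;   /* index of the word being filled */
    int j = 0;   /* write position inside that word */

    for (int i = 0; str[i] != '\0' && n < MAX_WORDS - 1; i++) {
        if (str[i] != ' ') {
            if (j < MAX_WORD_LEN - 1)       /* truncate over-long words */
                array[n][j++] = str[i];
        } else if (j > 0) {                 /* a space ends the current word */
            array[n][j] = '\0';
            n++;
            j = 0;
        }
    }
    if (j > 0) {                            /* terminate the last word */
        array[n][j] = '\0';
        n++;
    }
    array[n][0] = '\0';                     /* sentinel: empty string */
}

/* Counts words by walking rows until the sentinel; no strlen() required. */
int count_tokens(char array[MAX_WORDS][MAX_WORD_LEN])
{
    int n = 0;
    while (n < MAX_WORDS && array[n][0] != '\0')
        n++;
    return n;
}
```

The printing loop then becomes for (int i = 0; arr[i][0] != '\0'; i++), or it can use count_tokens(arr) as its bound. Returning the count directly, as suggested above, is simpler; the sentinel is mainly useful when the array is passed around without a separate length.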
I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end I want to return the length of the new char array. I am coming from Java, so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i;
if(isspace(buf[0])){
buf[0] = "";
}
if(isspace(buf[len-1])){
buf[len-1] = "";
}
for(i = 0;i < len;i++){
if(isupper(buf[i])) {
buf[i]=tolower(buf[i]);
}
if(isspace(buf[i])) {
buf[i]=" ";
}
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
}
return strlen(*buf);
}
How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?
EDIT: I have made some corrections to my program based on the comments. Is it correct now?
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i = 0;
int j = 0;
if(isspace(buf[0])){
//buf[0] = "";
i++;
}
if(isspace(buf[len-1])){
//buf[len-1] = "";
i++;
}
for(i;i < len;i++){
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
//buf[i]="";
i++;
}
}
return strlen(buf);
}
The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:
int normalizeString(char* buf, int len) {
int readPosition, writePosition;
bool hadWhitespace = false;
for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
if(isspace(buf[readPosition])) {
if(!hadWhitespace) buf[writePosition++] = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
return writePosition;
}
Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:
int normalizeString(char* string) { //No length is passed, it is implicit in the null byte.
char* in = string, *out = string;
bool hadWhitespace = false;
for(; *in; in++) { //loop until the zero byte is encountered
if(isspace(*in)) {
if(!hadWhitespace) *out++ = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
*out = 0; //add a new zero byte
return out - string; //use pointer arithmetic to retrieve the new length
}
In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
if(isspace(buf[i])) {
buf[i]=" ";
}
This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
return strlen(*buf);
This should be return strlen(buf). *buf is a character, not a string.
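For reference, deleting a single character with memmove, as mentioned above, might look like the following sketch. The helper name remove_char_at is hypothetical, and it assumes s is a properly null-terminated string.

```c
#include <string.h>

/* Removes the character at index pos from the null-terminated string s by
   shifting the tail, including the terminator, one position to the left.
   Does nothing if pos is out of range. remove_char_at is a hypothetical name. */
void remove_char_at(char *s, size_t pos)
{
    size_t len = strlen(s);
    if (pos >= len)
        return;
    /* len - pos bytes: the characters after pos plus the '\0' */
    memmove(s + pos, s + pos + 1, len - pos);
}
```

memmove is used rather than memcpy because the source and destination regions overlap. Calling this once per removed character makes the whole normalization quadratic in the worst case, so for collapsing whitespace runs a single pass with separate read and write positions is usually the better fit.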
The notations like:
buf[i]=" ";
buf[i]="";
do not do what you think/expect. You will probably need to create two indexes to step through the array: one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.
Warning: untested code.
int i, j;
for (i = 0, j = 0; i < len; i++)
{
if (isupper(buf[i]))
buf[j++] = tolower(buf[i]);
else if (isspace(buf[i]))
{
buf[j++] = ' ';
while (i+1 < len && isspace(buf[i+1]))
i++;
}
else
buf[j++] = buf[i];
}
buf[j] = '\0'; // Null terminate
You replace the arbitrary white space with a plain space using:
buf[i] = ' ';
You return:
return strlen(buf);
or, with the code above:
return j;
Several mistakes in your code:
You cannot assign buf[i] with a string, such as "" or " ", because the type of buf[i] is char and the type of a string is char*.
You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.
Here is one optional solution for the problem at hand:
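int normalize(char* buf)
{
    char c;
    int i = 0;
    int j = 0;
    while (buf[i] != 0)
    {
        c = buf[i++];
        if (isspace((unsigned char)c))
        {
            buf[j++] = ' ';                         // write one space for the whole run
            while (isspace((unsigned char)buf[i]))  // skip the rest of the run
                i++;
        }
        else if (isupper((unsigned char)c))
            buf[j++] = tolower((unsigned char)c);
        else
            buf[j++] = c;                           // copy every other character unchanged
    }
    buf[j] = 0;
    return j;
}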
You should write:
return strlen(buf)
instead of:
return strlen(*buf)
The reason:
buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.
*buf will dereference the pointer, resulting in a char - not what strlen expects.
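To make it clearer why strlen needs the address and not a single char, here is roughly what such a function does. This is a simplified sketch under the name my_strlen (made up here), not the actual library implementation.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the terminating '\0'.
   my_strlen is a hypothetical stand-in for the library strlen. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* walk memory from s until the terminator */
        n++;
    return n;
}
```

Passing *buf would hand over the value of the first character instead of the place to start walking from, which is why the compiler complains about it.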
Not much different than the others, but this assumes an array of unsigned char and not a C string.
tolower() does not itself need the isupper() test.
int normalize(unsigned char *buf, int len) {
int i = 0;
int j = 0;
int previous_is_space = 0;
while (i < len) {
if (isspace(buf[i])) {
if (!previous_is_space) {
buf[j++] = ' ';
}
previous_is_space = 1;
} else {
buf[j++] = tolower(buf[i]);
previous_is_space = 0;
}
i++;
}
return j;
}
@OP:
The posted code implies that leading and trailing spaces should either be shrunk to 1 char or eliminated entirely.
The above answer simply shrinks leading and trailing spaces to 1 ' '.
To eliminate trailing and leading spaces:
int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...
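Putting the trimming fragment together with the loop from the answer gives something like the sketch below. The function name normalize_trimmed is my own; as before, buf is treated as a length-delimited array, so no '\0' is written.

```c
#include <ctype.h>

/* Normalizes buf[0..len-1] in place: drops leading and trailing whitespace,
   collapses every inner whitespace run to one ' ', and lowercases letters.
   Returns the new length. No '\0' is written, since buf is treated as a
   length-delimited array rather than a C string. */
int normalize_trimmed(unsigned char *buf, int len)
{
    int i = 0;
    int j = 0;
    int previous_is_space = 0;

    while (len > 0 && isspace(buf[len - 1])) len--;   /* trim the end */
    while (i < len && isspace(buf[i])) i++;           /* trim the start */

    while (i < len) {
        if (isspace(buf[i])) {
            if (!previous_is_space)
                buf[j++] = ' ';                       /* first space of a run */
            previous_is_space = 1;
        } else {
            buf[j++] = (unsigned char)tolower(buf[i]);
            previous_is_space = 0;
        }
        i++;
    }
    return j;
}
```

For the 18-byte input "  Hello \t\n WORLD  " this leaves "hello world" in the first 11 bytes and returns 11. If a C string is needed afterwards, the caller can write buf[j] = '\0' itself, since j never exceeds the original len.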
So pray tell, how would I go about getting the largest contiguous string of letters out of a string of garbage in C? Here's an example:
char *s = "(2034HEY!!11 th[]thisiswhatwewant44";
Would return...
thisiswhatwewant
I had this on a quiz the other day...and it drove me nuts (still is) trying to figure it out!
UPDATE:
My fault guys, I forgot to include the fact that the only function you are allowed to use is the strlen function. Thus making it harder...
Use strtok() to split your string into tokens, using all non-letter characters as delimiters, and find the longest token.
To find the longest token you will need to organise some storage for tokens - I'd use linked list.
As simple as this.
EDIT
Ok, if strlen() is the only function allowed, you can first find the length of your source string, then loop through it and replace all non-letter characters with the null character '\0' - basically that's what strtok() does.
Then you need to go through your modified source string a second time, advancing one token at a time, and find the longest one, using strlen().
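A minimal sketch of those two passes, assuming an ASCII-like character set and a writable string (a string literal must not be passed). The function name longest_token is made up for illustration.

```c
#include <string.h>

/* Overwrites every non-letter in s with '\0', then walks the resulting
   tokens and returns a pointer to the longest one (the first on ties).
   Returns a pointer to an empty string if s contains no letters.
   Assumes an ASCII-like character set; modifies s in place. */
char *longest_token(char *s)
{
    size_t total = strlen(s);       /* must be taken before s is modified */
    char *best = s + total;         /* the original terminator: an empty string */
    size_t best_len = 0;

    for (size_t i = 0; i < total; i++) {
        char c = s[i];
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
            s[i] = '\0';
    }

    size_t i = 0;
    while (i < total) {
        size_t n = strlen(s + i);   /* 0 where a separator used to be */
        if (n > best_len) {
            best_len = n;
            best = s + i;
        }
        i += n + 1;                 /* jump past the token and its terminator */
    }
    return best;
}
```

With char s[] = "(2034HEY!!11 th[]thisiswhatwewant44"; the returned pointer refers to "thisiswhatwewant". No storage for the tokens is needed, because each token is still sitting in the original buffer.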
This sounds similar to the standard UNIX 'strings' utility.
Keep track of the longest run of printable characters terminated by a NUL.
Walk through the bytes until you hit a printable character. Start counting. If you hit a non-printable character, stop counting and throw away the starting point. If you hit a NUL, check to see if the length of the current run is greater than the previous record holder. If so, record it, and start looking for the next string.
What defines the "good" substrings compared to the many others -- being lowercase alphas only? (i.e., no spaces, digits, punctuation, uppercase, &c)?
Whatever the predicate P that checks for a character being "good", a single pass over s applying P to each character lets you easily identify the start and end of each "run of good characters", and remember and pick the longest. In pseudocode:
longest_run_length = 0
longest_run_start = longest_run_end = null
status = bad
for i in (all indices over s):
    if P(s[i]):  # current char is good
        if status == bad:  # previous one was bad
            current_run_start = current_run_end = i
            status = good
        else:  # previous one was also good
            current_run_end = i
    else:  # current char is bad
        if status == good:  # previous one was good -> end of run
            current_run_length = current_run_end - current_run_start + 1
            if current_run_length > longest_run_length:
                longest_run_start = current_run_start
                longest_run_end = current_run_end
                longest_run_length = current_run_length
            status = bad
# if a good run ends with end-of-string:
if status == good:  # previous one was good -> end of run
    current_run_length = current_run_end - current_run_start + 1
    if current_run_length > longest_run_length:
        longest_run_start = current_run_start
        longest_run_end = current_run_end
        longest_run_length = current_run_length
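In C, the pseudocode above might be sketched as follows. The function name longest_run and the idea of passing P as a function pointer are my own choices. It also uses <ctype.h> style predicates, so it is an illustration of the algorithm rather than an answer to the strlen-only restriction, and it folds the end-of-string case into the loop by treating the terminator as a "bad" character.

```c
#include <ctype.h>
#include <stddef.h>

/* Finds the longest run of characters in s for which pred returns nonzero.
   Returns a pointer to its first character and stores its length in *out_len,
   or returns NULL (length 0) if no character qualifies. The first run wins
   ties. pred plays the role of P in the pseudocode, e.g. isalpha or islower. */
const char *longest_run(const char *s, int (*pred)(int), size_t *out_len)
{
    const char *best = NULL;
    size_t best_len = 0;
    const char *run_start = NULL;            /* NULL means status == bad */

    for (const char *p = s; ; p++) {
        int good = (*p != '\0') && pred((unsigned char)*p);
        if (good) {
            if (run_start == NULL)
                run_start = p;               /* a new run begins */
        } else if (run_start != NULL) {      /* a run just ended */
            size_t run_len = (size_t)(p - run_start);
            if (run_len > best_len) {
                best = run_start;
                best_len = run_len;
            }
            run_start = NULL;
        }
        if (*p == '\0')
            break;                           /* terminator was handled as bad */
    }
    *out_len = best_len;
    return best;
}
```

Calling longest_run(s, isalpha, &n) on the example string yields a pointer to "thisiswhatwewant44" with n set to 16; the caller prints or copies only the first n characters.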
Why use strlen() at all?
Here's my version which uses no function whatsoever.
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#endif
/*
// largest_letter_sequence()
// Returns a pointer to the beginning of the largest letter
// sequence (including trailing characters which are not letters)
// or NULL if no letters are found in s
// Passing NULL in `s` causes undefined behaviour
// If the string has two or more sequences with the same number of letters
// the return value is a pointer to the first sequence.
// The parameter `len`, if not NULL, will have the size of the letter sequence
//
// This function assumes an ASCII-like character set
// ('z' > 'a'; 'z' - 'a' == 25; ('a' <= each of {abc...xyz} <= 'z'))
// and the same for uppercase letters
// Of course, ASCII works for the assumptions :)
*/
const char *largest_letter_sequence(const char *s, size_t *len) {
const char *p = NULL;
const char *pp = NULL;
size_t curlen = 0;
size_t maxlen = 0;
while (*s) {
if ((('a' <= *s) && (*s <= 'z')) || (('A' <= *s) && (*s <= 'Z'))) {
if (p == NULL) p = s;
curlen++;
if (curlen > maxlen) {
maxlen = curlen;
pp = p;
}
} else {
curlen = 0;
p = NULL;
}
s++;
}
if (len != NULL) *len = maxlen;
return pp;
}
#ifdef UNIT_TEST
void fxtest(const char *s) {
char *test;
const char *p;
size_t len;
p = largest_letter_sequence(s, &len);
if (len && (len < 999)) {
test = malloc(len + 1);
if (!test) {
fprintf(stderr, "No memory.\n");
return;
}
strncpy(test, p, len);
test[len] = 0;
printf("%s ==> %s\n", s, test);
free(test);
} else {
if (len == 0) {
printf("no letters found in \"%s\"\n", s);
} else {
fprintf(stderr, "ERROR: string too large\n");
}
}
}
int main(void) {
fxtest("(2034HEY!!11 th[]thisiswhatwewant44");
fxtest("123456789");
fxtest("");
fxtest("aaa%ggg");
return 0;
}
#endif
While I waited for you to post this as a question I coded something up.
This code iterates through a string passed to a "longest" function, and when it finds the first of a sequence of letters it sets a pointer to it and starts counting the length of it. If it is the longest sequence of letters yet seen, it sets another pointer (the 'maxStringStart' pointer) to the beginning of that sequence until it finds a longer one.
At the end, it allocates enough room for the new string and returns a pointer to it.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int isLetter(char c){
return ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') );
}
char *longest(char *s) {
char *newString = 0;
int maxLength = 0;
char *maxStringStart = 0;
int curLength = 0;
char *curStringStart = 0;
do {
//reset the current string length and skip this
//iteration if it's not a letter
if( ! isLetter(*s)) {
curLength = 0;
continue;
}
//increase the current sequence length. If the length before
//incrementing is zero, then it's the first letter of the sequence:
//set the pointer to the beginning of the sequence of letters
if(curLength++ == 0) curStringStart = s;
//if this is the longest sequence so far, set the
//maxStringStart pointer to the beginning of it
//and start increasing the max length.
if(curLength > maxLength) {
maxStringStart = curStringStart;
maxLength++;
}
} while(*s++);
//return null pointer if there were no letters in the string,
//or if we can't allocate any memory.
if(maxLength == 0) return NULL;
if( ! (newString = malloc(maxLength + 1)) ) return NULL;
//copy the longest string into our newly allocated block of
//memory (see my update for the strlen() only requirement)
//and null-terminate the string by putting 0 at the end of it.
memcpy(newString, maxStringStart, maxLength);
newString[maxLength] = 0;
return newString;
}
int main(int argc, char *argv[]) {
int i;
for(i = 1; i < argc; i++) {
printf("longest all-letter string in argument %d:\n", i);
printf(" argument: \"%s\"\n", argv[i]);
printf(" longest: \"%s\"\n\n", longest(argv[i]));
}
return 0;
}
This is my solution in simple C, without any data structures.
I can run it in my terminal like this:
~/c/t $ ./longest "hello there, My name is Carson Myers." "abc123defg4567hijklmnop890"
longest all-letter string in argument 1:
argument: "hello there, My name is Carson Myers."
longest: "Carson"
longest all-letter string in argument 2:
argument: "abc123defg4567hijklmnop890"
longest: "hijklmnop"
~/c/t $
The criteria for what constitutes a letter can easily be changed in the isLetter() function. For example:
return (
(c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c == '.') ||
(c == ' ') ||
(c == ',') );
This would count periods, commas and spaces as 'letters' also.
As per your update:
Replace memcpy(newString, maxStringStart, maxLength); with:
int i;
for(i = 0; i < maxLength; i++)
newString[i] = maxStringStart[i];
However, this problem would be much more easily solved with the use of the C standard library:
char *longest(char *s) {
int longest = 0;
int curLength = 0;
char *curString = 0;
char *longestString = 0;
char *tokens = " ,.!?'\"()#$%\r\n;:+-*/\\";
curString = strtok(s, tokens);
while( curString != NULL ) { //strtok returns NULL once no token is left
curLength = strlen(curString);
if( curLength > longest ) {
longest = curLength;
longestString = curString;
}
curString = strtok(NULL, tokens);
}
char *newString = 0;
if( longest == 0 ) return NULL;
if( ! (newString = malloc(longest + 1)) ) return NULL;
strcpy(newString, longestString);
return newString;
}
First, define "string" and define "garbage". What do you consider a valid, non-garbage string? Write down a concrete definition you can program - this is how programming specs get written. Is it a sequence of alphanumeric characters? Should it start with a letter and not a digit?
Once you get that figured out, it's very simple to program. Start with a naive method of looping over the "garbage" looking for what you need. Once you have that, look up useful C library functions (like strtok) to make the code leaner.
Another variant.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "(2034HEY!!11 th[]thisiswhatwewant44";
int len = strlen(s);
int i = 0;
int biggest = 0;
char* p = s;
while (p[0])
{
if (!((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')))
{
p[0] = '\0';
}
p++;
}
for (; i < len; i++)
{
if (s[i] && strlen(&s[i]) > biggest)
{
biggest = strlen(&s[i]);
p = &s[i];
}
}
printf("%s\n", p);
return 0;
}