checking the end of a c string array [closed] - c

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I'm trying to find the end of a word, for example school, and append an s to it.
Here is what I have so far:
for (int i = 0; i < 10; i++) {
plural[i] = orig[i];
if (plural[i] == NULL) {
plural[i] = 's';
plural[i + 1] = '\0';
}
}

Your code may work if the string in orig is less than 10 characters long and NULL is defined as 0.
But note that NULL is used to represent the null pointer, not the null byte at the end of a string. NULL can be defined this way:
#define NULL ((void*)0)
In this case, your code would generate a warning upon compilation.
It is considered good style to write '\0' for the null byte at the end of a string. The null character constant has the same value, 0 with type int, but is more explicit for the purpose of representing the null byte.
You should test if orig[i] is non null instead of iterating to 10:
char orig[] = "world";
char plural[10];
int i;
for (i = 0; orig[i] != '\0'; i++) {
plural[i] = orig[i];
}
plural[i] = 's';
plural[i + 1] = '\0';

You can determine the position of null-terminator ('\0') by the following:
int len;
for(len = 0; cstr[len]; ++len);
This is a possible minimal implementation of strlen, which determines the length of a string stored in a char array. For example:
#include <stdio.h>
int main() {
char cstr[10] = "Test";
size_t len;
for(len = 0; cstr[len]; ++len);
if(sizeof(cstr) > len + 1) {
cstr[len++] = 's';
cstr[len] = '\0';
}
printf("%s\n", cstr);
}
Note: As David C. Rankin mentioned in comments, you have to protect the array bounds. Knowing that this is an array, you can read its size with the sizeof operator.

The most important part of this exercise is to ensure you learn to protect your array bounds. If you declare an array of char [10], then the longest string it can hold is 9-chars + the nul-byte. If you plan to add a character to the end (e.g. 's'), then that means the original string can be no longer than 8-chars.
If you declare plural as:
char plural[10] = ""; /* note the initialization to all `0` */
then the maximum number of characters that can be held in plural in order to use plural as a string is sizeof plural - 1 chars (preserving space for the nul-byte). So you can set a max for the length of your string with:
char plural[10] = ""; /* note the initialization to all `0` */
int max = sizeof plural - 1;
Then after you find your original string length, you can validate that there is sufficient room for the added 's' while still leaving space for the nul-byte, e.g.
if (len >= max) { /* validate room for 's' available */
fprintf (stderr, "error: adding 's' will exceed array size.\n");
return 1;
}
Putting all the pieces together in a short example, you could do something similar to the following:
#include <stdio.h>
int main (int argc, char **argv) {
char plural[10] = "", *def = "school";
int len = 0,
max = sizeof plural - 1;
if (argc == 1) { /* if no argument given, copy def to plural */
char *p = def;
for (int i = 0; *p && i < max; i++, len++)
plural[i] = *p++;
}
else /* otherwise copy argv[1] to plural */
len = snprintf (plural, max, "%s", argv[1]);
if (len >= max) { /* validate room for 's' available */
fprintf (stderr, "error: adding 's' will exceed array size.\n");
return 1;
}
plural[len] = 's'; /* add 's' - (nul-terminated via initialization) */
printf ("original : %s\nappended : %s\n", argc > 1 ? argv[1] : def, plural);
return 0;
}
Example Use/Output
$ ./bin/plural
original : school
appended : schools
$ ./bin/plural 12345678
original : 12345678
appended : 12345678s
$ ./bin/plural 123456789
error: adding 's' will exceed array size.
note: if you are more comfortable with array indexes than with pointer arithmetic, you can use the following equivalent statement for the length finding and copy:
if (argc == 1) /* if no argument given, copy def to plural */
for (int i = 0; def[i] && i < max; i++, len++)
plural[i] = def[i];
else /* otherwise copy argv[1] to plural */
len = snprintf (plural, max, "%s", argv[1]);
Look things over. There are many, many different ways to approach this. A normal addition would be to include string.h and use strlen and strcpy or memcpy instead of a loop to find your length or copy characters to plural (note: for long strings memcpy will be more efficient, but for 10 char -- it makes no difference) Let me know if you have any questions.

Related

How do you count the frequency of which a word of n length occurs within a string

I have this code here that correctly formats the hard-coded sentence and finds the frequency of which a certain letter shows up in that string:
#include <stdio.h>
#include <string.h>
int main() {
char words[1000][100];
int x = 0, y;
char myString[10000] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
printf("Original Text:\n");
printf("%s\n", myString);
// Function for uppercase letters to become lowercase and to remove special characters
for (x = 0; x <= strlen(myString); ++x) {
if (myString[x] >= 65 && myString[x] <= 90)
myString[x] = myString[x] + 32;
}
for (x = 0; myString[x] != '\0'; ++x) {
while (!(myString[x] >= 'a' && myString[x] <= 'z') &&
!(myString[x] >= 'A' && myString[x] <= 'Z') &&
!(myString[x] >= '0' && myString[x] <= '9') &&
!(myString[x] == '\0') && !(myString[x] == ' ')) {
for (y = x; myString[y] != '\0'; ++y) {
myString[y] = myString[y + 1];
}
myString[y] = '\0';
}
}
printf("\nModified Text: \n%s\n", myString);
// Part A
int counts[26] = { 0 };
int k;
size_t myString_length = strlen(myString);
for (k = 0; k < myString_length; k++) {
char c = myString[k];
if (!isalpha(c))
continue;
counts[(int)(c - 'a')]++;
}
printf("\nLetter\tCount\n------ -----\n");
for (k = 0; k < 26; ++k) {
printf("%c\t%d\n", k + 'a', counts[k]);
}
// Part B
int i = 0, count = 0, occurrences[10000] = { 0 };
while (myString[i] != '\0') {
char wordArray[100];
int j = 0;
while (myString[i] != ' ' && myString[i] != '\0') {
wordArray[j++] = myString[i++];
}
if (wordArray[j - 1] == ',' || wordArray[j - 1] == '.') {
wordArray[j - 1] = '\0';
}
wordArray[j] = '\0';
int status = -1;
for (j = 0; j < count; ++j) {
if (strcmp(words[j], wordArray) == 0) {
status = j;
break;
}
}
if (status != -1) {
occurrences[status] += 1;
} else {
occurrences[count] += 1;
strcpy(words[count++], wordArray);
}
++i;
}
printf("\nWord Length\tOccurrences\n----------- -----------\n");
for (i = 0; i < count; ++i) {
// print each word and its occurrences
printf("%s\t\t%d\n", words[i], occurrences[i]);
}
}
Part B is where I'm having a problem though, I want the code to be able to tell me the occurrence of which a word of a specific length shows up, such as this instance:
Word length Occurrences
1 0
2 1
Here, there are no instances where there is a word with one character, but there is one instance where there is a word with two characters. However, my code is outputting the number of times a specific word is given and not what I want above, like this:
Word Length Occurrences
----------- -----------
the 3
quick 1
brown 1
3
fox 1
jumps 1
over 1
lazy 2
dog 2
and 1
is 1
still 1
sleeping 1
How would I go about changing it so that it shows the output I want with just the word length and frequency?
Here are some remarks about your code:
the first loop recomputes the length of the string for each iteration: for (x = 0; x <= strlen(myString); ++x). Since you modify the string inside the loop, it is difficult for the compiler to ascertain that the string length does not change, so a classic optimisation may not work. Use the same test as for the next loop:
for (x = 0; myString[x] != '\0'; ++x)
the test for uppercase is not very readable because you hardcode the ASCII values of the letters A and Z, you should either write:
if (myString[x] >= 'A' && myString[x] <= 'Z')
myString[x] += 'a' - 'A';
or use macros from <ctype.h>:
unsigned char c = myString[x];
if (isupper(c))
myString[x] = tolower(c);
or equivalently and possibly more efficiently:
myString[x] = tolower((unsigned char)myString[x]);
in the second loop, you remove characters that are neither letters, digits nor spaces. You have a redundant nested while loop and a third nested loop to shift the rest of the array for each byte removed: despite the three levels of nesting, each removed byte costs one shift of the remainder of the string, so this method has quadratic time complexity, O(N^2), which is very inefficient. You should instead use a two finger method that operates in linear time:
for (x = y = 0; myString[x] != '\0'; ++x) {
unsigned char c = myString[x];
if (isalnum(c) || c == ' ') { /* keep letters, digits and spaces */
myString[y++] = c;
}
}
myString[y] = '\0';
note that this loop removes all punctuation instead of replacing it with spaces: this might glue words together such as "a fine,good man" -> "a finegood man"
In the third loop, you use a char value c as an argument for isalpha(c). You should include <ctype.h> to use any function declared in this header file. Functions and macros from <ctype.h> are only defined for all values of the type unsigned char and the special negative value EOF. If type char is signed on your platform, isalpha(c) would have undefined behavior if the string has negative characters. In your particular case, you filtered characters that are not ASCII letters, digits or space, so this should not be a problem, yet it is a good habit to always use unsigned char for the character argument to isalpha() and equivalent functions.
Note also that this counting phase could have been combined into the previous loops.
to count the occurrences of words, the array occurrences should have the same number of elements as the words array, 1000. You do not check for boundaries so you have undefined behavior if there are more than 1000 different words and/or if any of these words has 100 characters or more.
in the next loop, you extract words from the string, incrementing i inside the nested loop body. You also increment i at the end of the outer loop, hence skipping the final null terminator. The test while (myString[i] != '\0') will test bytes beyond the end of the string, which is incorrect and potential undefined behavior.
to avoid counting empty words in this loop, you should skip sequences of spaces before copying the word if not at the end of the string.
According to the question, counting individual words is not what Part B is expected to do, you should instead count the frequency of word lengths. You can do this in the first loop by keeping track of the length of the current word and incrementing the array of word length frequencies when you find a separator.
Note that modifying the string is not necessary to count letter frequencies or word length occurrences.
Writing a separate function for each task is recommended.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#define MAX_LENGTH 100
// Function to lowercase letters and remove special characters
void clean_string(char *str) {
int x, y;
printf("Original Text:\n");
printf("%s\n", str);
for (x = y = 0; str[x] != '\0'; x++) {
unsigned char c = str[x];
c = tolower(c);
if (isalnum(c) || c == ' ') {
str[y++] = c;
}
}
str[y] = '\0';
printf("\nModified Text:\n%s\n", str);
}
// Part A: count letter frequencies
void count_letters(const char *str) {
int letter_count['z' - 'a' + 1] = { 0 };
for (int i = 0; str[i] != '\0'; i++) {
unsigned char c = str[i];
if (c >= 'a' && c <= 'z') {
letter_count[c - 'a'] += 1;
} else
if (c >= 'A' && c <= 'Z') {
letter_count[c - 'A'] += 1;
}
}
printf("\nLetter\tCount"
"\n------\t-----\n");
for (int c = 'a'; c <= 'z'; c++) {
printf("%c\t%d\n", c, letter_count[c - 'a']);
}
}
// Part B: count word lengths frequencies
void count_word_lengths(const char *str) {
int length_count[MAX_LENGTH + 1] = { 0 };
for (int i = 0, len = 0;; i++) {
unsigned char c = str[i];
// counting words as sequences of letters or digits
if (isalnum(c)) {
len++;
} else {
// a separator (or the final '\0') ends the current word, if any
if (len > 0 && len <= MAX_LENGTH) {
length_count[len] += 1;
}
len = 0;
}
if (c == '\0')
break;
}
printf("\nWord Length\tOccurrences"
"\n-----------\t-----------\n");
for (int len = 0; len <= MAX_LENGTH; len++) {
if (length_count[len]) {
printf("%-11d\t%d\n", len, length_count[len]);
}
}
}
int main() {
char myString[] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
// Uncomment if modifying the string is required
//clean_string(myString);
count_letters(myString);
count_word_lengths(myString);
return 0;
}
Output:
Letter Count
------ -----
a 3
b 1
c 1
d 3
e 6
f 1
g 3
h 3
i 4
j 1
k 1
l 5
m 1
n 3
o 5
p 2
q 1
r 2
s 4
t 4
u 2
v 1
w 1
x 1
y 2
z 2
Word Length Occurrences
----------- -----------
2 1
3 7
4 3
5 4
8 1
Use strtok_r() and simplify counting.
Its sibling strtok() is not thread-safe. This is discussed in detail in Why is strtok() Considered Unsafe?
Also, strtok_r() chops input string by inserting \0 chars inside the string. If you want to keep a copy of original string, you have to make a copy of original string and pass it on to strtok_r().
There is also another catch. strtok_r() is not a part of C-Standard yet, but POSIX-2008 lists it. GNU glibc implements it, but to access this function we need to #define _POSIX_C_SOURCE before any includes in our source files.
There is also strdup() & strndup() which duplicate an input string, they allocate memory for you. You've to free that string-memory when you're done using it. strndup() was added in POSIX-2008 so we declare 200809L in our sources to use it.
It's always better to use new standards to write fresh code. POSIX 200809L is recommended with at least C standard 2011.
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAX_STR_LEN 1024
#define MAX_WORD_LEN 128
#define WORD_DELIMS " \n\t"
int is_word (const char* str, const size_t slen) {
int word = 0;
for (size_t ci = 0; ci < slen;)
if (isalnum ((unsigned char) str[ci++])) { /* cast: ctype functions need unsigned char values */
word = 1;
break;
}
return word;
}
void get_word_stat (const char* str, int word_stat[]) {
char *copy = strndup (str, MAX_STR_LEN); // limiting copy
if (!copy) { // copying failed
printf ("Error duplicating input string\n");
exit (1);
}
char *rmdStr = NULL;
/* the first call takes the string to split, later calls take NULL */
for (char *token = strtok_r (copy, WORD_DELIMS, &rmdStr); token; token = strtok_r (NULL, WORD_DELIMS, &rmdStr)) {
size_t token_len = strlen (token);
if (token_len > (MAX_WORD_LEN - 1)) {
printf ("Error: Increase MAX_WORD_LEN(%d) to handle words of length %zu\n", MAX_WORD_LEN, token_len);
exit (2);
}
if (is_word (token, token_len))
++word_stat[token_len];
else
printf ("[%s] not a word\n", token);
}
free (copy);
}
int main () {
char str [MAX_STR_LEN] = "The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping";
printf ("Original Text: [%s]\n", str);
int word_stat[MAX_WORD_LEN] = {0};
get_word_stat (str, word_stat);
printf ("\nWordLength Occurrences\n");
for (int si = 1; si < MAX_WORD_LEN; ++si) {
if (word_stat[si])
printf ("%d\t\t%d\n", si, word_stat[si]);
}
return 0;
}
Whenever you are interested in the frequency that something occurs, you want to use a Frequency Array containing the number of elements necessary to handle the entire range of possible occurrence. You want to track the frequency of word-lengths, so you need an array that is sized to track the longest word. (longest word in the non-medical unabridged dictionary is 29-characters, longest medical word is 45-characters)
So here a simple array of integers with 30 elements (indexes 0 through 29) will do (unless you want to consider medical words, then use 46). If you want to consider nonsense words, then size appropriately, e.g. "Supercalifragilisticexpialidocious", 34-characters. Choose the type based on a reasonably anticipated maximum number of occurrences. Using a signed int limits the occurrences to INT_MAX (typically 2147483647). Using unsigned roughly doubles the limit, and uint64_t gives a full 64-bit range.
How it works
How do you use a simple array to track the occurrences of word lengths? Simple: declare an array of sufficient size and initialize all elements to zero. Now all you do is read a word, use, e.g. size_t len = strlen(word); to get the length, and then increment yourarray[len] += 1;.
Say the word has 10-characters, you will add one to yourarray[10]. So the array index corresponds to the word length. When you have taken the length of all words and incremented the corresponding array index, to get your results, you just loop over your array and output the value (number of occurrences) at the index (word-length). If you have had two words that were 10-characters each, then yourarray[10] will contain 2 (and so on and so forth for every other index that corresponds to a different word-length number of characters).
Consideration When Choosing How to Separate Words
When selecting a method to split a string of space separated words into individual words, you need to know whether your original string is mutable. For example, if you choose to separate words with strtok(), it will modify the original string. In your case since your words are stored in an array of char, that is fine, but what if you had a string-literal like:
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog ";
In that case, passing mystring to strtok() would typically SEGFAULT when strtok() attempts to modify the region of read-only memory holding mystring (ignoring the non-standard treatment of string-literals by Microsoft).
You can of course make a copy of mystring and put the string-literal in mutable memory and then call strtok() on the copy. Or, you can use a method that does not modify mystring: using sscanf() and an offset to parse the words, using alternating calls to strcspn() and strspn() to locate and skip whitespace, or simply using a start and end pointer to work down the string bracketing words and copying characters between the pointers. Entirely up to you.
For example, using sscanf() with an offset to work down the string, updating the offset from the beginning with the number of characters consumed during each read you could do:
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog "
"and the !##! LAZY DOG is still sleeping",
*p = mystring, /* pointer to mystring to parse */
buf[MAXLEN] = ""; /* temporary buffer to hold each word */
int nchar = 0, /* characters consumed by sscanf */
offset = 0; /* offset from beginning of mystring */
/* loop over each word in mystring using sscanf and offset */
while (sscanf (p + offset, "%31s%n", buf, &nchar) == 1) { /* width 31 assumes MAXLEN is 32 */
size_t len = strlen (buf); /* length of word */
offset += nchar; /* update offset with nchar */
/* do other stuff here */
}
Testing if a Word is Alphanumeric
You can loop over each character calling the isalnum() macro from ctype.h on each character. Or, you can let strspn() do it for you given a list of characters that your words can contain. For example for digits and alpha-characters only, you can use a simple constant, and then call strspn() in your loop to determine if the word is made up only of the characters you will accept in a word, e.g.
#define ACCEPT "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
...
/* use strspn to test that word is valid (alphanum) or get next word */
if (strspn (buf, ACCEPT) != len) {
fprintf (stderr, " error: rejecting \"%s\"\n", buf); /* optional */
continue;
}
...
Neither way is more-right than the other, it's really a matter of convenience and readability. Using a library provided function also provides a bit of confidence that it is written in a manner that will allow the compiler to fully optimize the compiled code.
A Short Example
Putting the thoughts above together in a short example that will parse the words in mystring using sscanf() and then track the occurrences of all alphanum words (up to 31-characters, and outputting any word rejected) using a simple array of integers to hold the frequency of length, you could do:
#include <stdio.h>
#include <string.h>
#define MAXLEN 32 /* if you need a constant, #define one (or more) */
#define ACCEPT "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
int main (void) {
char *mystring = "The quick Brown ? Fox ? jumps over the Lazy Dog "
"and the !##! LAZY DOG is still sleeping",
*p = mystring, /* pointer to mystring to parse */
buf[MAXLEN] = ""; /* temporary buffer to hold each word */
int nchar = 0, /* characters consumed by sscanf */
offset = 0, /* offset from beginning of mystring */
lenfreq[MAXLEN] = {0}; /* frequency array for word length */
/* loop over each word in mystring using sscanf and offset */
while (sscanf (p + offset, "%31s%n", buf, &nchar) == 1) { /* read at most MAXLEN - 1 chars */
size_t len = strlen (buf); /* length of word */
offset += nchar; /* update offset with nchar */
/* use strspn to test that word is valid (alphanum) or get next word */
if (strspn (buf, ACCEPT) != len) {
fprintf (stderr, " error: rejecting \"%s\"\n", buf); /* optional */
continue;
}
lenfreq[len] += 1; /* update frequency array of lengths */
}
/* output original string */
printf ("\nOriginal Text:\n\n%s\n\n", mystring);
/* output length frequency array */
puts ("word length Occurrences\n"
"----------- -----------");
for (size_t i = 0; i < MAXLEN; i++) {
if (lenfreq[i])
printf ("%2zu%14s%d\n", i, " ", lenfreq[i]);
}
}
Example Use/Output
Compiling and running the program would produce:
$ ./bin/wordlen-freq
error: rejecting "?"
error: rejecting "?"
error: rejecting "!##!"
Original Text:
The quick Brown ? Fox ? jumps over the Lazy Dog and the !##! LAZY DOG is still sleeping
word length Occurrences
----------- -----------
2 1
3 7
4 3
5 4
8 1
(note: you can output all lengths from 0 to 31 even if there were no occurrences by removing the print condition if (lenfreq[i]) -- up to you)
Look things over and let me know if you have questions.

add additional letters in a string if there are two same letters beside each other

I'm trying to add an additional letter if there are two equal letters beside each other.
That's what I was thinking, but it doesn't put in an x between the two letters; instead of that, it copies one of the double letters, and now I have, for example, MMM instead of MXM.
for (index_X = 0; new_text[index_X] != '\0'; index_X++)
{
if (new_text[index_X] == new_text[index_X - 1])
{
double_falg = 1;
}
text[index_X] = new_text[index_X];
}
if (double_falg == 1)
{
for (counter_X = 0; text[counter_X] != '\0'; counter_X++)
{
transfer_X = counter_X;
if (text[transfer_X - 1] == text[transfer_X])
{
text_X[transfer_X] = 'X';
cnt_double++;
printf("%c\n", text[transfer_X]);
}
text_X[transfer_X] = text[transfer_X - cnt_double];
}
printf("%s\n", text_X);
}
If you're trying to create the modified array in text_X, copying data from new_text and putting an X between adjacent repeated letters (ignoring the possibility that the input contains XX), then you only need:
char new_text[] = "data with appalling repeats";
char text_X[SOME_SIZE];
int out_pos = 0;
for (int i = 0; new_text[i] != '\0'; i++)
{
text_X[out_pos++] = new_text[i];
if (new_text[i] == new_text[i+1])
text_X[out_pos++] = 'X';
}
text_X[out_pos] = '\0';
printf("Input: [%s]\n", new_text);
printf("Output: [%s]\n", text_X);
When wrapped in a basic main() function (and enum { SOME_SIZE = 64 };), that produces:
Input: [data with appalling repeats]
Output: [data with apXpalXling repeats]
To deal with repeated X's in the input, you could use:
text_X[out_pos++] = (new_text[i] == 'X') ? 'Q' : 'X';
It seems that your approach is more complicated than needed - too many loops and too many arrays involved. A single loop and two arrays should do.
The code below iterates the original string with idx to track position and uses the variable char_added to count how many extra chars have been added to the new array.
#include <stdio.h>
#define MAX_LEN 20
int main(void) {
char org_arr[MAX_LEN] = "aabbcc";
char new_arr[MAX_LEN] = {0};
int char_added = 0;
int idx = 1;
new_arr[0] = org_arr[0];
if (new_arr[0])
{
while(org_arr[idx])
{
if (org_arr[idx] == org_arr[idx-1])
{
new_arr[idx + char_added] = '*';
++char_added;
}
new_arr[idx + char_added] = org_arr[idx];
++idx;
}
}
puts(new_arr);
return 0;
}
Output:
a*ab*bc*c
Note: The code isn't fully tested. Also it lacks out-of-bounds checking.
There is a lot left to be desired in your Minimal, Complete, and Verifiable Example (MCVE). However, that said, what you will need to do is fairly straightforward. Take a simple example:
"ssi"
According to your statement, you need to add a character between the adjacent 's' characters. (You can use whatever you like for the separator, but if your input consists of normal ASCII characters, then you can set the separator to the next ASCII character after the current one, or subtract one instead if the current character is the last printable ASCII char, '~'.) See ASCII Table and Description.
For example, you could use memmove() to shift all characters beginning with the current character up by one and then set the current character to the replacement. You also need to track the current length so you don't write beyond your array bounds.
A simple function could be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024
char *betweenduplicates (char *s)
{
size_t len = strlen(s); /* get length to validate room */
if (!len) /* if empty string, nothing to do */
return s;
for (int i = 1; s[i] && len + 1 < MAXC; i++) /* loop until end, or out of room */
if (s[i-1] == s[i]) { /* adjacent chars equal? */
memmove (s + i + 1, s + i, len - i + 1); /* move current+ up by one */
if (s[i-1] != '~') /* not last ASCII char */
s[i] = s[i-1] + 1; /* set to next ASCII char */
else
s[i] = s[i-1] - 1; /* set to previous ASCII char */
len += 1; /* add one to len */
}
return s; /* convenience return so it can be used immediately if needed */
}
A short example program taking the string to check as the first argument could be:
int main (int argc, char **argv) {
char str[MAXC];
if (argc > 1) /* if argument given */
strcpy (str, argv[1]); /* copy to str */
else
strcpy (str, "mississippi"); /* otherwise use default */
puts (str); /* output original */
puts (betweenduplicates (str)); /* output result */
}
Example Use/Output
$ ./bin/betweenduplicated
mississippi
mistsistsipqpi
or when there is nothing to replace:
$ ./bin/betweenduplicated dog
dog
dog
Or checking the extremes:
$ ./bin/betweenduplicated "two  spaces  and  alligators  ~~"
two  spaces  and  alligators  ~~
two ! spaces ! and ! almligators ! ~}~
There are a number of ways to approach it. Let me know if you have further questions.

How to rearrange array using spaces?

I'm struggling with rearranging my array. I have used from single to multiple loops trying to put spaces (white characters) between two pairs of characters, but I was constantly rewriting the original input. So there is always an input of even length, for example ABCDEFGH. And my task would be to extend the size of the array by putting spaces after every 2 chars (except the last one).
So the output would be:
AB CD EF GH
So the size of output (if I'm correct) will be (2*input_len)-1
Thanks.
EDIT:
This is my code so far
// output = "ABCDEFGHIJKL
char c1;
char c2;
char c3;
int o_len = strlen(output);
for(int i = 2; i < o_len + o_len/2; i = i + 3){
if(i == 2){
c1 = output[i];
c2 = output[i+1];
c3 = output[i+2];
output[i] = ' ';
output[i+1] = c1;
output[i+2] = c2;
}
else{
c1 = output[i];
c2 = output[i+1];
output[i] = ' ';
output[i+1] = c3;
output[i+2] = c1;
c3 = c2;
}
}
So the first 3 pairs are printed correctly, then it is all a mess.
Presuming you need to store the space-separated result, probably the easiest way to insert the spaces is to use a pair of pointers, one to your input string and one to your output string. Then loop continually: write a pair to your output string, increment both pointers by 2, and check whether you are out of characters in your input string. If so, break; and nul-terminate your output string; otherwise write a space to your output string and repeat.
You can do it fairly simply using memcpy (or you can just copy 2-chars to the current pointer and pointer + 1, your choice, but since you already include string.h for strlen() -- make it easy on yourself) You can do something similar to:
#include <stdio.h>
#include <string.h>
#define ARRSZ 128 /* constant for no. of chars in output string */
int main (int argc, char **argv) {
char *instr = argc > 1 ? argv[1] : "ABCDEFGH", /* in string */
outstr[ARRSZ] = "", /* out string */
*ip = instr, *op = outstr; /* pointers to each */
size_t len = strlen (instr); /* len of instr */
if (len < 4) { /* validate at least 2-pairs worth of input provided */
fputs ("error: less than two-pairs to separate.\n", stderr);
return 1;
}
if (len & 1) { /* validate even number of characters */
fputs ("error: odd number of characters in instr.\n", stderr);
return 1;
}
if (ARRSZ < len + len / 2) { /* validate sufficient storage in outstr */
fputs ("error: insufficient storage in outstr.\n", stderr);
return 1;
}
for (;;) { /* loop continually */
memcpy (op, ip, 2); /* copy pair to op */
ip += 2; /* increment ip by 2 for next pair */
op += 2; /* increment op by 2 for next pair */
if (!*ip) /* check if last pair written */
break;
*op++ = ' '; /* write space between pairs in op */
}
*op = 0; /* nul-terminate outstr */
printf ("instr : %s\noutstr : %s\n", instr, outstr);
}
Example Use/Output
$ ./bin/strspaceseppairs
instr : ABCDEFGH
outstr : AB CD EF GH
$ ./bin/strspaceseppairs ABCDEFGHIJLMNOPQ
instr : ABCDEFGHIJLMNOPQ
outstr : AB CD EF GH IJ LM NO PQ
Odd number of chars:
$ ./bin/strspaceseppairs ABCDEFGHIJLMNOP
error: odd number of characters in instr.
Or short string:
$ ./bin/strspaceseppairs AB
error: less than two-pairs to separate.
Look things over and let me know if you have further questions.
Edit To Simply Output Single-Pair or Empty-String
Based upon the comment by @chqrlie, it may make more sense, rather than issuing a diagnostic for a short string, just to output it unchanged. Up to you. You can modify the first conditional and move it after the odd character check in that case, e.g.
if (len & 1) { /* validate even number of characters */
fputs ("error: odd number of characters in instr.\n", stderr);
return 1;
}
if (len < 4) { /* validate at least 2-pairs worth of input provided */
puts(instr); /* (otherwise output unchanged and exit) */
return 0;
}
You can decide how you want to handle any aspect of your program and make the changes accordingly.
I think you are looking for a piece of code like the one below.
This function returns the split string in newly allocated memory, since you asked to store the result.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
char* split_by_space(char* str, size_t length, size_t step) {
size_t i = 0, j = 0, spaces = (length / step);
char* splitted = malloc(length + spaces + 1);
for (i = 0, j = 0; i < length; ++i, ++j) {
if (i % step == 0 && i != 0) {
splitted[j] = ' ';
++j;
}
splitted[j] = str[i];
}
splitted[j] = '\0';
return splitted;
}
int main(void) {
// Use size_t instead of int.
size_t step = 2; // Also works with odd numbers.
char str[] = "ABCDEFGH";
char* new_str;
// Works with odd and even steps.
new_str = split_by_space(str, strlen(str), step);
printf("New splitted string is [%s]", new_str);
// Don't forget to clean the memory that the function allocated.
free(new_str);
return 0;
}
When run with a step value of 2, the above code outputs:
New splitted string is [AB CD EF GH]
Inserting characters inside the array is cumbersome and cannot be done unless you know the array is large enough to accommodate the new string.
You probably want to allocate a new array and create the modified string there.
The length of the new string is not (2 * input_len) - 1: you insert a space every 2 characters, except after the last 2. If the string has 2 or fewer characters, its length is unmodified; otherwise it increases by (input_len - 2) / 2. In case the length is odd, you should round this value up to the next integer, which is done in integer arithmetic this way: (input_len - 2 + 1) / 2.
Here is an example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *reformat_with_spaces(const char *str) {
size_t len = strlen(str);
size_t newlen = len > 2 ? len + (len - 2 + 1) / 2 : len;
char *out = malloc(newlen + 1);
if (out) {
for (size_t i = 0, j = 0; i < len; i++) {
if (i > 0 && i % 2 == 0) {
out[j++] = ' ';
}
out[j++] = str[i];
}
out[j] = '\0';
}
return out;
}
int main(void) {
char buf[256];
char *p;
while (fgets(buf, sizeof buf, stdin)) {
buf[strcspn(buf, "\n")] = '\0'; // strip the newline if any
p = reformat_with_spaces(buf);
if (p == NULL) {
fprintf(stderr, "out of memory\n");
return 1;
}
puts(p);
free(p);
}
return 0;
}
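As a quick sanity check on the length formula used in reformat_with_spaces, the sketch below compares it with a direct count obtained by replaying the loop condition. The helper names spaced_length, spaced_length_by_counting and check_spaced_length are made up for this illustration and are not part of the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Closed form from the answer: a space is inserted before every character
   at an even index greater than 0. */
size_t spaced_length(size_t len) {
    return len > 2 ? len + (len - 2 + 1) / 2 : len;
}

/* Same quantity obtained by replaying the loop test i > 0 && i % 2 == 0. */
size_t spaced_length_by_counting(size_t len) {
    size_t total = 0;
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && i % 2 == 0)
            total++;            /* the space written before str[i] */
        total++;                /* the character itself */
    }
    return total;
}

/* Hypothetical self-check: aborts if the two ever disagree. */
void check_spaced_length(size_t max_len) {
    for (size_t len = 0; len <= max_len; len++)
        assert(spaced_length(len) == spaced_length_by_counting(len));
}
```

For an 8-character input such as "ABCDEFGH" both functions give 11, which matches "AB CD EF GH".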
Try this,
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
void rearrange(char *str)
{
int len=strlen(str),n=0,i;
char *word=malloc(len+len/2+1); /* +1 for the terminating '\0' */
if(word==NULL)
{
printf("Memory Error");
exit(1);
}
for(i=0;i<len;i++)
{
if( i % 2 == 0 && i != 0)
{
word[n]=' ';
n++;
word[n]=str[i];
n++;
}
else
{
word[n]=str[i];
n++;
}
}
word[n]='\0';
strcpy(str,word);
free(word);
return;
}
int main()
{
char word[40];
printf("Enter word:");
scanf("%26s",word); /* at most 26 letters: 26 + 12 spaces + '\0' still fit in word[40] */
rearrange(word);
printf("\n%s",word);
return 0;
}
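See below:
The rearrange function copies the letters of str into word. If the current position is non-zero and divisible by 2 (i % 2 == 0), it writes a space followed by the letter into word; otherwise it writes only the letter. Finally, the result is copied back into str, so str must be large enough to hold the longer string.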
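Because rearrange copies the longer result back into the caller's array, the caller has to guarantee enough room. One possible way to make that explicit is to pass the destination and its capacity, as in the sketch below. The name rearrange_into and its return convention are assumptions made for this illustration, and dst and src must not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of rearrange(): writes the spaced copy of src into
   dst (capacity dst_size bytes). Returns 0 on success, or -1 if dst is too
   small, in which case dst is left untouched. */
int rearrange_into(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    /* one space before each even index greater than 0 */
    size_t spaces = len > 0 ? (len - 1) / 2 : 0;
    if (len + spaces + 1 > dst_size)
        return -1;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (i % 2 == 0 && i != 0)
            dst[n++] = ' ';
        dst[n++] = src[i];
    }
    dst[n] = '\0';
    return 0;
}
```

Called with a 16-byte buffer and "ABCDE", this should leave "AB CD E" in the buffer; with a capacity of 7 it refuses, because the result needs 8 bytes including the terminator.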

Returning the length of a char array in C

I am new to programming in C and am trying to write a simple function that will normalize a char array. At the end I want to return the length of the new char array. I am coming from Java, so I apologize if I'm making mistakes that seem simple. I have the following code:
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c functions to analyze it */
int i;
if(isspace(buf[0])){
buf[0] = "";
}
if(isspace(buf[len-1])){
buf[len-1] = "";
}
for(i = 0;i < len;i++){
if(isupper(buf[i])) {
buf[i]=tolower(buf[i]);
}
if(isspace(buf[i])) {
buf[i]=" ";
}
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
}
return strlen(*buf);
}
How can I return the length of the char array at the end? Also does my procedure properly do what I want it to?
EDIT: I have made some corrections to my program based on the comments. Is it correct now?
/* The normalize procedure normalizes a character array of size len
according to the following rules:
1) turn all upper case letters into lower case ones
2) turn any white-space character into a space character and,
shrink any n>1 consecutive whitespace characters to exactly 1 whitespace
When the procedure returns, the character array buf contains the newly
normalized string and the return value is the new length of the normalized string.
*/
int
normalize(unsigned char *buf, /* The character array contains the string to be normalized*/
int len /* the size of the original character array */)
{
/* use a for loop to cycle through each character and the built in c funstions to analyze it */
int i = 0;
int j = 0;
if(isspace(buf[0])){
//buf[0] = "";
i++;
}
if(isspace(buf[len-1])){
//buf[len-1] = "";
i++;
}
for(i;i < len;i++){
if(isupper(buf[i])) {
buf[j]=tolower(buf[i]);
j++;
}
if(isspace(buf[i])) {
buf[j]=' ';
j++;
}
if(isspace(buf[i]) && isspace(buf[i+1])){
//buf[i]="";
i++;
}
}
return strlen(buf);
}
The canonical way of doing something like this is to use two indices, one for reading, and one for writing. Like this:
int normalizeString(char* buf, int len) {
int readPosition, writePosition;
bool hadWhitespace = false;
for(readPosition = writePosition = 0; readPosition < len; readPosition++) {
if(isspace(buf[readPosition])) {
if(!hadWhitespace) buf[writePosition++] = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
return writePosition;
}
Warning: This handles the string according to the given length only. While using a buffer + length has the advantage of being able to handle any data, this is not the way C strings work. C-strings are terminated by a null byte at their end, and it is your job to ensure that the null byte is at the right position. The code you gave does not handle the null byte, nor does the buffer + length version I gave above. A correct C implementation of such a normalization function would look like this:
int normalizeString(char* string) { //No length is passed, it is implicit in the null byte.
char* in = string, *out = string;
bool hadWhitespace = false;
for(; *in; in++) { //loop until the zero byte is encountered
if(isspace(*in)) {
if(!hadWhitespace) *out++ = ' ';
hadWhitespace = true;
} else if(...) {
...
}
}
*out = 0; //add a new zero byte
return out - string; //use pointer arithmetic to retrieve the new length
}
In this code I replaced the indices by pointers simply because it was convenient to do so. This is simply a matter of style preference, I could have written the same thing with explicit indices. (And my style preference is not for pointer iterations, but for concise code.)
if(isspace(buf[i])) {
buf[i]=" ";
}
This should be buf[i] = ' ', not buf[i] = " ". You can't assign a string to a character.
if(isspace(buf[i]) && isspace(buf[i+1])){
buf[i]="";
}
This has two problems. One is that you're not checking whether i < len - 1, so buf[i + 1] could be off the end of the string. The other is that buf[i] = "" won't do what you want at all. To remove a character from a string, you need to use memmove to move the remaining contents of the string to the left.
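To illustrate the memmove remark, here is a minimal sketch of deleting one character from a null-terminated string. The helper name remove_char_at is hypothetical; it assumes s is a writable, properly terminated string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: deletes the character at index pos from the
   null-terminated string s by shifting the tail, terminator included,
   one position to the left. */
void remove_char_at(char *s, size_t pos) {
    size_t len = strlen(s);
    assert(pos < len);
    /* bytes pos+1 .. len (the '\0') move down by one: len - pos bytes */
    memmove(s + pos, s + pos + 1, len - pos);
}
```

Each call shifts the whole tail, so removing many characters this way costs a pass per removal; a read index plus a write index does the same job in a single pass.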
return strlen(*buf);
This should be return strlen(buf). *buf is a character, not a string.
The notations like:
buf[i]=" ";
buf[i]="";
do not do what you think/expect. You will probably need to create two indexes to step through the array — one for the current read position and one for the current write position, initially both zero. When you want to delete a character, you don't increment the write position.
Warning: untested code.
int i, j;
for (i = 0, j = 0; i < len; i++)
{
if (isupper(buf[i]))
buf[j++] = tolower(buf[i]);
else if (isspace(buf[i]))
{
buf[j++] = ' ';
while (i+1 < len && isspace(buf[i+1]))
i++;
}
else
buf[j++] = buf[i];
}
buf[j] = '\0'; // Null terminate
You replace the arbitrary white space with a plain space using:
buf[i] = ' ';
You return:
return strlen(buf);
or, with the code above:
return j;
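For trying out the read/write-index loop, here is a sketch that wraps it in a function. The name normalize_buf is made up for this illustration, and it assumes the buffer has room for one byte beyond the len characters, because the terminator is written at buf[j] and j can be equal to len.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Sketch: buf holds len characters and must have space for one more byte,
   since a terminator is written at buf[j] with j <= len. */
int normalize_buf(unsigned char *buf, int len) {
    assert(buf != NULL && len >= 0);
    int i, j;
    for (i = 0, j = 0; i < len; i++) {
        if (isupper(buf[i])) {
            buf[j++] = (unsigned char)tolower(buf[i]);
        } else if (isspace(buf[i])) {
            buf[j++] = ' ';
            /* skip the remainder of this whitespace run */
            while (i + 1 < len && isspace(buf[i + 1]))
                i++;
        } else {
            buf[j++] = buf[i];
        }
    }
    buf[j] = '\0';   /* null terminate */
    return j;        /* new length, same value strlen would report */
}
```

Given "Hello \t\n World" and its length, the function should return 11 and leave "hello world" in the buffer.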
Several mistakes in your code:
You cannot assign a string such as "" or " " to buf[i], because buf[i] is a single char, whereas a string literal is an array of char that decays to a char* pointer.
You are reading from buf and writing into buf using index i. This poses a problem, as you want to eliminate consecutive white-spaces. So you should use one index for reading and another index for writing.
In C/C++, a native string is an array of characters that ends with 0. So in essence, you can simply iterate buf until you read 0 (you don't need to use the len variable at all). In addition, since you are "truncating" the input string, you should set the new last character to 0.
Here is one optional solution for the problem at hand:
int normalize(char* buf)
{
char c;
int i = 0;
int j = 0;
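while (buf[i] != 0)
{
c = buf[i++];
if (isspace((unsigned char)c))
{
buf[j++] = ' '; // write a single space for the whole run
while (isspace((unsigned char)buf[i]))
i++; // skip the rest of the run; stops at the terminating 0
}
else
buf[j++] = tolower((unsigned char)c); // copy, lower-casing letters
}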
buf[j] = 0;
return j;
}
you should write:
return strlen(buf)
instead of:
return strlen(*buf)
The reason:
buf is of type char* - it's an address of a char somewhere in the memory (the one in the beginning of the string). The string is null terminated (or at least should be), and therefore the function strlen knows when to stop counting chars.
*buf dereferences the pointer, resulting in a single char, which is not what strlen expects.
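To make the point concrete, here is a rough sketch of what a strlen-like function does with the address it receives. The name my_strlen is made up for this illustration; the real strlen is declared in string.h.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for strlen: starts at the address s and counts
   characters until it reaches the terminating '\0' byte. */
size_t my_strlen(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

The function needs an address to start walking from. Passing *buf would hand it the value of the first character instead, which is not a valid address to read from.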
Not much different than the others, but this assumes the input is an array of unsigned char and not a C string.
tolower() does not itself need the isupper() test.
int normalize(unsigned char *buf, int len) {
int i = 0;
int j = 0;
int previous_is_space = 0;
while (i < len) {
if (isspace(buf[i])) {
if (!previous_is_space) {
buf[j++] = ' ';
}
previous_is_space = 1;
} else {
buf[j++] = tolower(buf[i]);
previous_is_space = 0;
}
i++;
}
return j;
}
@OP:
The posted code implies that leading and trailing spaces should either be shrunk to 1 char or eliminated entirely.
The above answer simply shrinks leading and trailing spaces to 1 ' '.
To eliminate trailing and leading spaces:
int i = 0;
int j = 0;
while (len > 0 && isspace(buf[len-1])) len--;
while (i < len && isspace(buf[i])) i++;
int previous_is_space = 0;
while (i < len) { ...
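Putting the trimming fragment together with the earlier loop gives something like the sketch below. The name normalize_trim is an assumption made for this illustration. Like the version it is based on, it works on len bytes and neither reads nor writes a terminator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Sketch: collapses whitespace runs to one space, lower-cases letters,
   and drops leading and trailing whitespace entirely. */
int normalize_trim(unsigned char *buf, int len) {
    assert(buf != NULL && len >= 0);
    int i = 0;
    int j = 0;
    int previous_is_space = 0;
    while (len > 0 && isspace(buf[len - 1])) len--;   /* drop trailing */
    while (i < len && isspace(buf[i])) i++;           /* skip leading */
    while (i < len) {
        if (isspace(buf[i])) {
            if (!previous_is_space)
                buf[j++] = ' ';
            previous_is_space = 1;
        } else {
            buf[j++] = (unsigned char)tolower(buf[i]);
            previous_is_space = 0;
        }
        i++;
    }
    return j;   /* number of valid bytes now at the start of buf */
}
```

For the 13 bytes of "  Foo \t BAR  " this should return 7, with "foo bar" occupying the first 7 bytes. An input consisting only of whitespace yields 0.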

C - Largest String From a Big One

So pray tell, how would I go about getting the largest contiguous string of letters out of a string of garbage in C? Here's an example:
char *s = "(2034HEY!!11 th[]thisiswhatwewant44";
Would return...
thisiswhatwewant
I had this on a quiz the other day...and it drove me nuts (still is) trying to figure it out!
UPDATE:
My fault guys, I forgot to include the fact that the only function you are allowed to use is the strlen function. Thus making it harder...
Use strtok() to split your string into tokens, using all non-letter characters as delimiters, and find the longest token.
To find the longest token you will need to organise some storage for tokens - I'd use linked list.
As simple as this.
EDIT
Ok, if strlen() is the only function allowed, you can first find the length of your source string, then loop through it and replace all non-letter characters with the null byte ('\0'), which is basically what strtok() does.
Then you need to go through your modified source string second time, advancing one token at a time, and find the longest one, using strlen().
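A minimal sketch of this two-pass idea, using strlen as the only library function, could look like the following. The name longest_token and the ASCII letter test are assumptions made for this illustration. Since only the best token so far is remembered, no separate storage for tokens is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: s is modified in place, every non-letter becomes '\0'.
   Returns a pointer to the longest run of letters (now a null-terminated
   token inside s), or NULL if s contains no letters. On ties the first
   token wins. */
char *longest_token(char *s) {
    assert(s != NULL);
    size_t total = strlen(s);          /* taken before s is cut up */
    size_t i;
    for (i = 0; i < total; i++) {
        char c = s[i];
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
            s[i] = '\0';
    }
    char *best = NULL;
    size_t best_len = 0;
    i = 0;
    while (i < total) {
        size_t n = strlen(s + i);      /* length of the token starting at i */
        if (n > best_len) {
            best_len = n;
            best = s + i;
        }
        i += n + 1;                    /* jump past the token and its '\0' */
    }
    return best;
}
```

The input must be a writable array rather than a string literal. With the example from the question, the returned pointer should refer to "thisiswhatwewant".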
This sounds similar to the standard UNIX 'strings' utility.
Keep track of the longest run of printable characters terminated by a NUL byte.
Walk through the bytes until you hit a printable character. Start counting. If you hit a non-printable character, stop counting and throw away the starting point. If you hit a NUL, check whether the length of the current run is greater than the previous record holder. If so, record it, and start looking for the next string.
What defines the "good" substrings compared to the many others -- being lowercase alphas only? (i.e., no spaces, digits, punctuation, uppercase, &c)?
Whatever the predicate P that checks for a character being "good", a single pass over s applying P to each character lets you easily identify the start and end of each "run of good characters", and remember and pick the longest. In pseudocode:
longest_run_length = 0
longest_run_start = longest_run_end = null
status = bad
for i in (all indices over s):
if P(s[i]): # current char is good
if status == bad: # previous one was bad
current_run_start = current_run_end = i
status = good
else: # previous one was also good
current_run_end = i
else: # current char is bad
if status == good: # previous one was good -> end of run
current_run_length = current_run_end - current_run_start + 1
if current_run_length > longest_run_length:
longest_run_start = current_run_start
longest_run_end = current_run_end
longest_run_length = current_run_length
status = bad
# if a good run ends with end-of-string:
if status == good: # previous one was good -> end of run
current_run_length = current_run_end - current_run_start + 1
if current_run_length > longest_run_length:
longest_run_start = current_run_start
longest_run_end = current_run_end
longest_run_length = current_run_length
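One possible C rendering of the pseudocode above is sketched here. The names longest_run and is_ascii_letter are made up for this illustration, and the predicate is passed as a function pointer so that the definition of a "good" character stays open. Treating the terminator as a bad character lets the end-of-string case share the same branch as an ordinary end of run.

```c
#include <assert.h>
#include <stddef.h>

/* Example predicate P: ASCII letters only (an assumption; any test works). */
int is_ascii_letter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Finds the longest run of characters accepted by pred. Stores the start
   index in *start and returns the run length. Returns 0 if no character
   passes, in which case *start is left unchanged. */
size_t longest_run(const char *s, int (*pred)(char), size_t *start) {
    assert(s != NULL && pred != NULL && start != NULL);
    size_t best_len = 0;
    size_t run_start = 0;
    size_t run_len = 0;                /* 0 plays the role of status == bad */
    for (size_t i = 0; ; i++) {
        if (s[i] != '\0' && pred(s[i])) {
            if (run_len == 0)
                run_start = i;         /* previous one was bad: new run */
            run_len++;
        } else {
            if (run_len > best_len) {  /* a run just ended */
                best_len = run_len;
                *start = run_start;
            }
            run_len = 0;
            if (s[i] == '\0')
                break;
        }
    }
    return best_len;
}
```

For the string in the question, the call should report a run of length 16 starting at index 17.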
Why use strlen() at all?
Here's my version which uses no function whatsoever.
#ifdef UNIT_TEST
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#endif
/*
// largest_letter_sequence()
// Returns a pointer to the beginning of the largest letter
// sequence (including trailing characters which are not letters)
// or NULL if no letters are found in s
// Passing NULL in `s` causes undefined behaviour
// If the string has two or more sequences with the same number of letters
// the return value is a pointer to the first sequence.
// The parameter `len`, if not NULL, will have the size of the letter sequence
//
// This function assumes an ASCII-like character set
// ('z' > 'a'; 'z' - 'a' == 25; ('a' <= each of {abc...xyz} <= 'z'))
// and the same for uppercase letters
// Of course, ASCII works for the assumptions :)
*/
const char *largest_letter_sequence(const char *s, size_t *len) {
const char *p = NULL;
const char *pp = NULL;
size_t curlen = 0;
size_t maxlen = 0;
while (*s) {
if ((('a' <= *s) && (*s <= 'z')) || (('A' <= *s) && (*s <= 'Z'))) {
if (p == NULL) p = s;
curlen++;
if (curlen > maxlen) {
maxlen = curlen;
pp = p;
}
} else {
curlen = 0;
p = NULL;
}
s++;
}
if (len != NULL) *len = maxlen;
return pp;
}
#ifdef UNIT_TEST
void fxtest(const char *s) {
char *test;
const char *p;
size_t len;
p = largest_letter_sequence(s, &len);
if (len && (len < 999)) {
test = malloc(len + 1);
if (!test) {
fprintf(stderr, "No memory.\n");
return;
}
strncpy(test, p, len);
test[len] = 0;
printf("%s ==> %s\n", s, test);
free(test);
} else {
if (len == 0) {
printf("no letters found in \"%s\"\n", s);
} else {
fprintf(stderr, "ERROR: string too large\n");
}
}
}
int main(void) {
fxtest("(2034HEY!!11 th[]thisiswhatwewant44");
fxtest("123456789");
fxtest("");
fxtest("aaa%ggg");
return 0;
}
#endif
While I waited for you to post this as a question I coded something up.
This code iterates through a string passed to a "longest" function, and when it finds the first of a sequence of letters it sets a pointer to it and starts counting the length of it. If it is the longest sequence of letters yet seen, it sets another pointer (the 'maxStringStart' pointer) to the beginning of that sequence until it finds a longer one.
At the end, it allocates enough room for the new string and returns a pointer to it.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int isLetter(char c){
return ( (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') );
}
char *longest(char *s) {
char *newString = 0;
int maxLength = 0;
char *maxStringStart = 0;
int curLength = 0;
char *curStringStart = 0;
do {
//reset the current string length and skip this
//iteration if it's not a letter
if( ! isLetter(*s)) {
curLength = 0;
continue;
}
//increase the current sequence length. If the length before
//incrementing is zero, then it's the first letter of the sequence:
//set the pointer to the beginning of the sequence of letters
if(curLength++ == 0) curStringStart = s;
//if this is the longest sequence so far, set the
//maxStringStart pointer to the beginning of it
//and start increasing the max length.
if(curLength > maxLength) {
maxStringStart = curStringStart;
maxLength++;
}
} while(*s++);
//return null pointer if there were no letters in the string,
//or if we can't allocate any memory.
if(maxLength == 0) return NULL;
if( ! (newString = malloc(maxLength + 1)) ) return NULL;
//copy the longest string into our newly allocated block of
//memory (see my update for the strlen() only requirement)
//and null-terminate the string by putting 0 at the end of it.
memcpy(newString, maxStringStart, maxLength);
newString[maxLength] = 0;
return newString;
}
int main(int argc, char *argv[]) {
int i;
for(i = 1; i < argc; i++) {
printf("longest all-letter string in argument %d:\n", i);
printf(" argument: \"%s\"\n", argv[i]);
printf(" longest: \"%s\"\n\n", longest(argv[i]));
}
return 0;
}
This is my solution in simple C, without any data structures.
I can run it in my terminal like this:
~/c/t $ ./longest "hello there, My name is Carson Myers." "abc123defg4567hijklmnop890"
longest all-letter string in argument 1:
argument: "hello there, My name is Carson Myers."
longest: "Carson"
longest all-letter string in argument 2:
argument: "abc123defg4567hijklmnop890"
longest: "hijklmnop"
~/c/t $
The criteria for what constitutes a letter could be changed easily in the isLetter() function. For example:
return (
(c >= 'a' && c <= 'z') ||
(c >= 'A' && c <= 'Z') ||
(c == '.') ||
(c == ' ') ||
(c == ',') );
This would also count periods, commas and spaces as 'letters'.
As per your update:
Replace memcpy(newString, maxStringStart, maxLength); with:
int i;
for(i = 0; i < maxLength; i++)
newString[i] = maxStringStart[i];
However, this problem would be much more easily solved with the C standard library:
char *longest(char *s) {
int longest = 0;
int curLength = 0;
char *curString = 0;
char *longestString = 0;
char *tokens = " ,.!?'\"()#$%\r\n;:+-*/\\";
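// note: strtok() modifies s and returns NULL when no token is left,
// which can already happen on the first call.
// Digits are not in the delimiter list, so they count as part of a token.
for( curString = strtok(s, tokens); curString != NULL; curString = strtok(NULL, tokens) ) {
curLength = strlen(curString);
if( curLength > longest ) {
longest = curLength;
longestString = curString;
}
}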
char *newString = 0;
if( longest == 0 ) return NULL;
if( ! (newString = malloc(longest + 1)) ) return NULL;
strcpy(newString, longestString);
return newString;
}
First, define "string" and define "garbage". What do you consider a valid, non-garbage string? Write down a concrete definition you can program - this is how programming specs get written. Is it a sequence of alphanumeric characters? Should it start with a letter and not a digit?
Once you get that figured out, it's very simple to program. Start with a naive method of looping over the "garbage" looking for what you need. Once you have that, look up useful C library functions (like strtok) to make the code leaner.
Another variant.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "(2034HEY!!11 th[]thisiswhatwewant44";
int len = strlen(s);
int i = 0;
int biggest = 0;
char* p = s;
while (p[0])
{
if (!((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')))
{
p[0] = '\0';
}
p++;
}
for (; i < len; i++)
{
if (s[i] && strlen(&s[i]) > biggest)
{
biggest = strlen(&s[i]);
p = &s[i];
}
}
printf("%s\n", p);
return 0;
}

Resources