Sort array alphabetically, upper letter always first - c

I want to sort arrays alphabetically, but I want an upper-case letter always to come first. What I have achieved so far is a simple sort, which doesn't take the case of letters into account. Should I add a special condition for it?
EDIT:
This is what I want to achieve:
AaaAdDcCFfgGhHI should be sorted like this: AAaaCcDdFfGgHhI
#include <stdio.h>
#include <stdlib.h>
#define N 5
int compare(const void *w1, const void *w2);
int main(void) {
char s1[N][15] = {
{ "azghtdffopAsAfp" },
{ "poiuyjklhgADHTp" },
{ "hgjkFfGgBnVUuKk" },
{ "lokijuhygtfrdek" },
{ "AaaAdDcCFfgGhHI" } };
char *wsk;
int i, j;
wsk = s1;
for (i = 0; i < N; i++) {
for (j = 0; j < 15; j++) {
printf("%c", s1[i][j]);
}
printf("\n");
}
for (i = 0; i < N; i++)
qsort(s1[i], 15, sizeof(char), compare);
printf("\n");
for (i = 0; i < N; i++) {
for (j = 0; j < 15; j++) {
printf("%c", s1[i][j]);
}
printf("\n");
}
return 0;
}
int compare(const void *w1, const void *w2) {
char *a1 = w1;
char *a2 = w2;
while (*a1 && *a2) {
register r = tolower(*a1) - tolower(*a2);
if (r)
return r;
++a1;
++a2;
}
return tolower(*a1) - tolower(*a2);
}

We should start by fixing a few issues in your code. First, you need to add #include <ctype.h>. You have declared char *wsk; and assigned wsk = s1; for no apparent reason. More importantly, these are incompatible types: in that assignment s1 decays to a pointer to an array of 15 chars, not to a char *. And more important still, each row of s1 should be an array of 16 chars! You have forgotten to include space for the '\0' terminator in your character arrays. So, the declaration of s1 needs to become:
char s1[N][16] = { { "azghtdffopAsAfp" },
{ "poiuyjklhgADHTp" },
{ "hgjkFfGgBnVUuKk" },
{ "lokijuhygtfrdek" },
{ "AaaAdDcCFfgGhHI" } };
The call to qsort() can be improved. Rather than use the magic number 15, it would be better to store the length of the strings in a variable. Also, sizeof(char) is always 1:
for (i = 0; i<N; i++) {
size_t s1_len = strlen(s1[i]);
qsort(s1[i], s1_len, 1, compare);
}
In the compare() function itself, you need to change to:
const unsigned char *a1 = w1;
const unsigned char *a2 = w2;
Making the pointers const avoids warnings about discarding const qualifiers. Using unsigned char avoids undefined behavior, since the ctype.h functions expect an int argument that is representable as an unsigned char or equal to EOF, and a plain char may be negative. Also, register is a storage-class specifier, not a type: it must be applied to a type, because the implicit int that register r relied on has not been allowed since C99. So you need register int r = ....
But your function also relies on a property of the encoding of the execution character set that is not guaranteed by the Standard: that the letters are encoded in alphabetic sequence. You have taken the first step towards portability by using the tolower() function, rather than adding magic numbers to change the case of the characters. By using isupper() and islower() to test the case of characters, and by using strcoll() to test the ordering of characters, we can achieve something approaching maximum portability. strcoll() orders characters according to the current locale. Since this program never calls setlocale(), it runs in the "C" locale, where strcoll() behaves like strcmp() and (in ASCII) all uppercase letters precede all lowercase letters. That is why the characters are converted to lowercase first, and an explicit test is needed to order two characters that compare equal after that conversion. One obstacle to overcome is that strcoll() compares strings, not characters. To use it to compare characters we can deploy compound literals (c1 and c2 are the const unsigned char pointers in compare()):
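The compound-literal trick can be tried on its own. This is a minimal sketch that assumes the program never calls setlocale(), so the "C" locale is in effect and strcoll() orders characters like strcmp(). The helper names char_coll and char_coll_caps_first are hypothetical:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: collate two single characters by wrapping each
   one in a one-character string built with a compound literal. */
int char_coll(char x, char y)
{
    return strcoll((const char[]){ x, '\0' }, (const char[]){ y, '\0' });
}

/* Collate case-insensitively first; if the letters match, upper-case wins. */
int char_coll_caps_first(char x, char y)
{
    int r = char_coll((char)tolower((unsigned char)x),
                      (char)tolower((unsigned char)y));
    if (r == 0) {
        if (isupper((unsigned char)x) && islower((unsigned char)y))
            r = -1;
        else if (islower((unsigned char)x) && isupper((unsigned char)y))
            r = 1;
    }
    return r;
}
```

With ASCII in the "C" locale, char_coll_caps_first('A', 'a') and char_coll_caps_first('a', 'B') are both negative, which gives the order 'A' < 'a' < 'B'.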
register int r = strcoll((const char[]){tolower(*c1), '\0'},
(const char[]){tolower(*c2), '\0'});
There is a loop in your compare() function that makes no sense to me. The compare() function should just compare two chars; there is no need to loop through anything, so I have removed this loop.
I wrote a new compare() function that uses strcoll() and compound literals to portably compare two chars. If the two characters compare equal (up to case), then their cases are checked. If the cases differ, the uppercase character is taken to come before the lowercase character.
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // added for strlen() and strcoll()
#include <ctype.h> // must add this
#define N 5
int compare(const void *w1, const void *w2);
int main(void) {
/* Inner dimension should be 16 to include '\0' */
char s1[N][16] = { { "azghtdffopAsAfp" },
{ "poiuyjklhgADHTp" },
{ "hgjkFfGgBnVUuKk" },
{ "lokijuhygtfrdek" },
{ "AaaAdDcCFfgGhHI" } };
// char *wsk; // don't need this
int i, j;
// wsk = s1; // don't need this, also incompatible
for (i = 0; i<N; i++) {
for (j = 0; j<15; j++) {
printf("%c", s1[i][j]);
}
printf("\n");
}
for (i = 0; i<N; i++) {
size_t s1_len = strlen(s1[i]);
qsort(s1[i], s1_len, 1, compare); // improved call to qsort()
}
printf("\n");
for (i = 0; i<N; i++) {
for (j = 0; j<15; j++) {
printf("%c", s1[i][j]);
}
printf("\n");
}
return 0;
}
int compare(const void *a1, const void *a2) {
const unsigned char *c1 = a1;
const unsigned char *c2 = a2;
register int r = strcoll((const char[]){tolower(*c1), '\0'},
(const char[]){tolower(*c2), '\0'});
if (r == 0) {
if (isupper(*c1) && islower(*c2)) {
r = -1;
} else if (islower(*c1) && isupper(*c2)) {
r = 1;
}
}
return r;
}
Program output:
azghtdffopAsAfp
poiuyjklhgADHTp
hgjkFfGgBnVUuKk
lokijuhygtfrdek
AaaAdDcCFfgGhHI
AAadfffghoppstz
ADgHhijkloppTuy
BFfGgghjKkknUuV
defghijkklortuy
AAaaCcDdFfGgHhI

It is horribly unclear whether you want to sort all the characters in each ROW, or you want to sort the array of strings in the array, (or both). Both can be accomplished, but both have slightly different compare requirements.
Presuming you want to sort the array of arrays (easier if you make them strings), you would expect output like:
$ ./bin/sortcapsfirst
azghtdffopAsAfp
poiuyjklhgADHTp
hgjkFfGgBnVUuKk
lokijuhygtfrdek
AaaAdDcCFfgGhHI
AaaAdDcCFfgGhHI
azghtdffopAsAfp
hgjkFfGgBnVUuKk
lokijuhygtfrdek
poiuyjklhgADHTp
Otherwise, you would need to sort each row first (sorting each upper-case, before the same lower-case), then sort the array. That would result in output as follows:
$ ./bin/sortcapsfirst
azghtdffopAsAfp
poiuyjklhgADHTp
hgjkFfGgBnVUuKk
lokijuhygtfrdek
AaaAdDcCFfgGhHI
AAaaCcDdFfGgHhI
AAadfffghoppstz
ADgHhijkloppTuy
BFfGgghjKkknUuV
defghijkklortuy
You may be making things a bit harder on yourself than necessary. strcmp() compares by character code, and in ASCII every upper-case letter has a lower code than any lower-case letter, so a plain string sort already puts Caps first. To sort the rows of s1 so that capitals sort before lower-case, you need only make your number of columns 16 (to provide space for a nul-terminating character) and then call strcmp in your compare routine, e.g.:
int compare(const void *w1, const void *w2) {
const char *a1 = w1;
const char *a2 = w2;
return strcmp (a1, a2);
}
Putting it all together in an example, and properly terminating each j loop when the nul-terminating char is encountered, you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define N 5
#define R 16
int compare(const void *w1, const void *w2);
int main(void) {
char s1[][R] = {{ "azghtdffopAsAfp" },
{ "poiuyjklhgADHTp" },
{ "hgjkFfGgBnVUuKk" },
{ "lokijuhygtfrdek" },
{ "AaaAdDcCFfgGhHI" }};
int i, j;
for (i = 0; i<N; i++) {
for (j = 0; j<R && s1[i][j]; j++) { /* check the bound before reading */
putchar(s1[i][j]); /* don't use printf to print a single-char */
}
putchar('\n');
}
qsort (s1, N, sizeof *s1, compare); /* sort array (rows) */
putchar('\n');
for (i = 0; i<N; i++) {
for (j = 0; j<R && s1[i][j]; j++) {
putchar(s1[i][j]);
}
putchar('\n');
}
return 0;
}
int compare(const void *w1, const void *w2) {
const char *a1 = w1;
const char *a2 = w2;
return strcmp (a1, a2);
}
For the second case where you sort the upper-case in each row before the equivalent lower-case and then sort the array, you simply add a second qsort compare function and call that as you are, before calling qsort on the entire array. e.g. (to sort each upper-case before the corresponding lower-case):
int compare (const void *w1, const void *w2) {
/* compare only the two chars pointed to; unsigned char keeps tolower() defined */
const unsigned char *a1 = w1;
const unsigned char *a2 = w2;
// return *a1 - *a2; /* to sort ALLcapsfirst */
int r = tolower(*a1) - tolower(*a2);
if (r)
return r; /* different letters */
return *a1 - *a2; /* same letter: upper-case (lower code in ASCII) first */
}
Then call qsort as done in the first example to sort the rows in the array:
int comparestr (const void *w1, const void *w2) {
const char *a1 = w1;
const char *a2 = w2;
return strcmp (a1, a2);
}
Putting that together in the same example (with nul-terminated rows), you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define N 5
#define R 16
int compare (const void *w1, const void *w2);
int comparestr (const void *w1, const void *w2);
int main (void) {
char s1[][R] = {{"azghtdffopAsAfp"},
{"poiuyjklhgADHTp"},
{"hgjkFfGgBnVUuKk"},
{"lokijuhygtfrdek"},
{"AaaAdDcCFfgGhHI"}};
int i, j;
for (i = 0; i < N; i++) {
for (j = 0; j < R && s1[i][j]; j++) /* check the bound before reading */
putchar(s1[i][j]);
putchar('\n');
}
for (i = 0; i < N; i++) /* sort arrays */
qsort (s1[i], R - 1, sizeof *(s1[i]), compare);
qsort (s1, N, sizeof *s1, comparestr); /* sort array */
putchar('\n');
for (i = 0; i < N; i++) {
for (j = 0; j < R && s1[i][j]; j++)
putchar(s1[i][j]);
putchar('\n');
}
return 0;
}
int compare (const void *w1, const void *w2)
{
/* compare only the two chars pointed to; unsigned char keeps tolower() defined */
const unsigned char *a1 = w1;
const unsigned char *a2 = w2;
// return *a1 - *a2; /* to sort ALLcapsfirst */
int r = tolower (*a1) - tolower (*a2);
if (r)
return r; /* different letters */
return *a1 - *a2; /* same letter: upper-case (lower code in ASCII) first */
}
int comparestr (const void *w1, const void *w2)
{
const char *a1 = w1;
const char *a2 = w2;
return strcmp (a1, a2);
}
Finally, as noted above, if you want to sort ALLCapsfirst, then simply return the difference *a1 - *a2 instead of tolower (*a1) - tolower (*a2). E.g., using return *a1 - *a2; the sort would be:
AACDFGHIaacdfgh
AAadfffghoppstz
ADHTghijkloppuy
BFGKUVfgghjkknu
defghijkklortuy
Look things over. I could have misunderstood your goal completely. If so, drop a note and I can help further in a bit.

Instead of comparing the lowercase values, compare the ASCII values. In the ASCII table the capital letters come first, then the lowercase ones (on its own, this puts every capital before every lowercase letter, e.g. AACDFGHIaacdfgh):
http://www.asciitable.com/
UPDATE: If you need a bit more platform and character set independent code, just add an extra if, and check the letter case with isupper() and/or islower():
https://www.tutorialspoint.com/c_standard_library/c_function_islower.htm
https://www.tutorialspoint.com/c_standard_library/c_function_isupper.htm
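Here is a sketch of that "extra if", assuming the characters of each row are sorted with qsort(). It compares case-insensitively and uses isupper() only to break ties between the same letter. This avoids depending on upper-case codes being lower than lower-case ones, but the letter order still comes from the codes of the lower-case letters (alphabetical in ASCII). The names caps_first_cmp and sort_caps_first are hypothetical:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for single chars: alphabetical ignoring case,
   then upper-case before lower-case for the same letter. */
int caps_first_cmp(const void *w1, const void *w2)
{
    unsigned char c1 = *(const unsigned char *)w1;
    unsigned char c2 = *(const unsigned char *)w2;
    int l1 = tolower(c1);
    int l2 = tolower(c2);

    if (l1 != l2)
        return l1 < l2 ? -1 : 1;   /* different letters: order of the lower-case codes */
    if (isupper(c1) && !isupper(c2))
        return -1;                 /* the extra if: upper-case first */
    if (!isupper(c1) && isupper(c2))
        return 1;
    return 0;                      /* identical characters */
}

/* Hypothetical wrapper: sorts the characters of one string in place. */
void sort_caps_first(char *s)
{
    qsort(s, strlen(s), 1, caps_first_cmp);
}
```

Applied to "AaaAdDcCFfgGhHI", this should give "AAaaCcDdFfGgHhI".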

If you want the upper-/lower-case distinction to be made per character, so that you would sort like "A", "Aa", "AB", "aa", "B", "b", compare could look like this:
int compare(const void *w1, const void *w2) {
const unsigned char *a1 = w1;
const unsigned char *a2 = w2;
while (*a1 && *a2)
{
register int r = tolower(*a1) - tolower(*a2);
if (r)
return r;
// this is the new part
else if( isupper( *a1 ) && !isupper( *a2 ) ) {
// w1 < w2
return -1;
} else if( !isupper( *a1 ) && isupper( *a2 ) ) {
// w1 > w2
return 1;
}
++a1;
++a2;
}
return tolower(*a1) - tolower(*a2);
}
If you want "aa" to be sorted before "AB" it could look like:
int compare(const void *w1, const void *w2) {
const unsigned char *a1 = w1;
const unsigned char *a2 = w2;
register int r;
int caseDifference = 0;
while (*a1 && *a2)
{
r = tolower(*a1) - tolower(*a2);
if (r)
return r;
// this is the new part
else if( caseDifference == 0 && ( isupper( *a1 ) && !isupper( *a2 ) ) ) {
// w1 < w2
caseDifference = -1;
} else if( caseDifference == 0 && ( !isupper( *a1 ) && isupper( *a2 ) ) ) {
// w1 > w2
caseDifference = 1;
}
++a1;
++a2;
}
r = tolower(*a1) - tolower(*a2);
if( r != 0 )
return r;
else
return caseDifference;
}
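To check the ordering described above, here is a sketch that applies the same per-character rule to the six example strings. It assumes the strings are stored as rows of a 2D char array (width 8 chosen arbitrarily). The names rowcmp_caps_first and sort_rows are hypothetical, and the pointers are const unsigned char so that tolower()/isupper() stay well-defined:

```c
#include <ctype.h>
#include <stdlib.h>

/* qsort() comparator for rows of a 2D char array (each row a string):
   letters compare case-insensitively, and at the first position where
   only the case differs, the upper-case letter wins. */
int rowcmp_caps_first(const void *w1, const void *w2)
{
    const unsigned char *a1 = w1;
    const unsigned char *a2 = w2;

    while (*a1 && *a2) {
        int r = tolower(*a1) - tolower(*a2);
        if (r)
            return r;
        if (isupper(*a1) && !isupper(*a2))
            return -1;
        if (!isupper(*a1) && isupper(*a2))
            return 1;
        ++a1;
        ++a2;
    }
    return tolower(*a1) - tolower(*a2);  /* shorter prefix sorts first */
}

/* Hypothetical wrapper: sorts n rows of width 8. */
void sort_rows(char rows[][8], size_t n)
{
    qsort(rows, n, sizeof rows[0], rowcmp_caps_first);
}
```

Sorting { "b", "aa", "AB", "B", "Aa", "A" } this way should give A, Aa, AB, aa, B, b.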

Your comparison function is incorrect: it compares multiple characters instead of just the ones pointed to by the arguments.
If you can assume ASCII, here is a much simpler comparison function that solves the problem:
int compare(const void *w1, const void *w2) {
int c1 = *(const unsigned char *)w1;
int c2 = *(const unsigned char *)w2;
int l1 = tolower(c1);
int l2 = tolower(c2);
/* sort first by alphabetical character, then by case */
return l1 != l2 ? l1 - l2 : c1 - c2;
}
Also note that the main() function can be simplified too:
#include <stdio.h>
#include <stdlib.h>
#define N 5
int compare(const void *w1, const void *w2);
int main(void) {
char s1[N][15] = {
{ "azghtdffopAsAfp" },
{ "poiuyjklhgADHTp" },
{ "hgjkFfGgBnVUuKk" },
{ "lokijuhygtfrdek" },
{ "AaaAdDcCFfgGhHI" } };
for (int i = 0; i < N; i++) {
printf("%.15s\n", s1[i]);
}
for (int i = 0; i < N; i++) {
qsort(s1[i], 15, sizeof(char), compare);
}
printf("\n");
for (int i = 0; i < N; i++) {
printf("%.15s\n", s1[i]);
}
return 0;
}

Related

Using strcmp and strcpy to sort province’s name in alphabetical order

I am trying to use strcmp and strcpy to re-arrange names in alphabetical order, and there is an issue with my name array initialization.
The state array cannot be printed out on the console as expected.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char sort(char [], char []);
int main() {
char strStates[8] = {
'Ontario', 'Quebec', 'Manitoba', 'Alberta',
'British Colombia', 'Nova Scotia', '\0'
};
char strSorted[] = { '\0' };
int x = 0;
printf("\nThe list of states before being sorted in alphabetical order: %s", strStates);
for (x = 0; x < 7; x++) {
printf("\n%s", strStates);
}
sort(strStates[x], strSorted[x]);
printf("\nThe list of states sorted alphabetically are: ");
for (x = 0; x < 4; x++) {
printf("\n%s", strStates[x]);
}
return 0;
}
char sort(char string1[], char string2[]) {
int x, y = 0;
for (x = 0; x < 3; x++) {
for (y = 1; y < 4; y++) {
if (strcmp(string1[x], string1[y]) > 0) {
strcpy(string2[y], string1[x]);
strcpy(string1[x], string1[y]);
strcpy(string[y], string2[y]);
}
}
}
}
You declared a character array of 8 characters
char strStates[8] = {'Ontario', 'Quebec', 'Manitoba', 'Alberta','British Colombia','Nova Scotia','\0'};
and are trying to initialize it with multi-character constants such as 'Ontario', which have implementation-defined values.
It seems you want to declare an array of string literals. In this case you should write for example
const char * strStates[] =
{
"Ontario", "Quebec", "Manitoba", "Alberta", "British Colombia","Nova Scotia"
};
Also, it is a bad idea to use magic numbers like 8
char strStates[8] = //...
or 7
for (x = 0; x < 7; x++)
or 4
for (x = 0; x < 4; x++)
This makes the code unclear.
You can determine the number of elements of the array declared above the following way
const char * strStates[] =
{
"Ontario", "Quebec", "Manitoba", "Alberta", "British Colombia","Nova Scotia"
};
const size_t N = sizeof( strStates ) / sizeof( *strStates );
As a result for example the for loop that outputs elements of the array can look like
puts( "The list of states before being sorted in alphabetical order:" );
for ( size_t i = 0; i < N; i++) {
puts( strStates[i] );
}
The array strSorted declared like
char strSorted[] = {'\0'};
is not used in your program. Remove the declaration.
This call of the function sort
sort(strStates[x], strSorted[x]);
does not make sense. The argument expressions have the type char while the function expects arguments of the type char *.
The function sort can be declared the following way
void sort( const char *[], size_t );
and called like
sort( strStates, N );
The function definition that implements the bubble sort method can look like
void sort( const char * s[], size_t n )
{
for ( size_t sorted = 0; !( n < 2 ); n = sorted )
{
for ( size_t i = sorted = 1; i < n; i++ )
{
if ( strcmp( s[i], s[i-1] ) < 0 )
{
const char *tmp = s[i];
s[i] = s[i-1];
s[i-1] = tmp;
sorted = i;
}
}
}
}
Here is a demonstration program.
#include <stdio.h>
#include <string.h>
void sort( const char * s[], size_t n )
{
for (size_t sorted = 0; !( n < 2 ); n = sorted)
{
for (size_t i = sorted = 1; i < n; i++)
{
if (strcmp( s[i], s[i - 1] ) < 0)
{
const char *tmp = s[i];
s[i] = s[i - 1];
s[i - 1] = tmp;
sorted = i;
}
}
}
}
int main( void )
{
const char *strStates[] =
{
"Ontario", "Quebec", "Manitoba", "Alberta", "British Colombia","Nova Scotia"
};
const size_t N = sizeof( strStates ) / sizeof( *strStates );
puts( "The list of states before being sorted in alphabetical order:" );
for (size_t i = 0; i < N; i++) {
puts( strStates[i] );
}
sort( strStates, N );
puts( "\nThe list of states sorted alphabetically are:" );
for ( size_t i = 0; i < N; i++) {
puts( strStates[i] );
}
}
The program output is
The list of states before being sorted in alphabetical order:
Ontario
Quebec
Manitoba
Alberta
British Colombia
Nova Scotia
The list of states sorted alphabetically are:
Alberta
British Colombia
Manitoba
Nova Scotia
Ontario
Quebec
"...there is an issue with my name array initialization."
There are other issues additional to the array initialization....
Note: single quotes in C denote a character constant; use double quotes for string literals. Change:
'Ontario'
to
"Ontario"
everywhere.
Additionally the \0 as the last element in the array is not necessary. Each "..." string contains its own null terminator.
Array initialization: with a 2D array, each row must be large enough to hold the visible characters of the longest string in the array plus one byte for the null terminator.
Change
char strStates[8] = {'Ontario', ..., 'Nova Scotia','\0'};
To (including 2D array notation)
char strStates[][18] = {"Ontario", ..., "Nova Scotia"}; // 18 >= longest string, "British Colombia" (16 chars), + 1 for '\0'
Or to this (avoids needing to know longest string in array.)
char *strStates[] = {"Ontario", ..., "Nova Scotia"};
For the comparison question, the following is a minimal example of how you can do that, using a strcmp()-based compare function together with qsort():
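#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int comp(const void* a, const void* b);
int main(void)
{
char strStates[][18] = {"Ontario", "Quebec", "Manitoba", "Alberta","British Colombia","Nova Scotia"};
size_t size = sizeof strStates / sizeof strStates[0]; /* number of rows */
qsort(strStates, size, sizeof strStates[0], comp); /* each element is one 18-char row */
for (size_t i = 0; i < size; i++)
puts(strStates[i]);
return 0;
}
int comp(const void* a, const void* b) {
const char* aa = (const char*)a;
const char* bb = (const char*)b;
return strcmp(aa, bb);
}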
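If you choose the char *strStates[] form instead, qsort() passes the comparator pointers to the array elements, which are themselves pointers. The comparator therefore has to dereference once more before calling strcmp(). A minimal sketch, with hypothetical names comp_ptr and sort_names:

```c
#include <stdlib.h>
#include <string.h>

/* Each argument points at an element of an array of const char *,
   so it is converted to const char *const * and dereferenced once. */
int comp_ptr(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sorts the pointers only; the string literals themselves never move. */
void sort_names(const char *names[], size_t n)
{
    qsort(names, n, sizeof names[0], comp_ptr);
}
```

Because only pointers are swapped, this form works regardless of how long the longest name is.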

Need to sort a string input by the most frequent characters first in C (qsort)

I managed to sort it alphabetically, but after that I need to sort it by the most frequent characters first. Since I'm new to C programming, I'm not sure if this alphabetical sort is needed. I also thought about using a struct, but I'm not sure how to do the whole process with it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int cmpfunc(const void *a, const void *b) {
return *(char*)a - *(char*)b;
}
void AlphabetOrder(char str[]) {
qsort(str, (size_t) strlen(str), (size_t) sizeof(char), cmpfunc);
printf("%s\n", str);
}
void Max_Occurring(char *str)
{
int i;
int max = 0;
int freq[256] = {0};
for(i = 0; str[i] != '\0'; i++)
{
freq[str[i]] = freq[str[i]] + 1;
}
for(i = 0; i < 256; i++)
{
if(freq[i] > freq[max])
{
max = i;
}
}
printf("Character '%c' appears %d times", max, freq[max], str);
}
int main() {
char str1[20];
printf("Enter a string: ");
scanf("%s", &str1);
AlphabetOrder(str1);
Max_Occurring(str1);
return 0;
}
I wrote you a frequency sorter using the idea that @WeatherVane mentioned:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct cfreq {
unsigned char c;
int freq;
};
int freqcmp(const void *a, const void *b) {
struct cfreq *a2 = (struct cfreq *) a;
struct cfreq *b2 = (struct cfreq *) b;
if(a2->freq < b2->freq) return -1;
if(a2->freq == b2->freq) return 0;
return 1;
}
int freqcmpdesc(const void *a, const void *b) {
return -freqcmp(a, b);
}
void FrequencyOrder(const char str[]) {
struct cfreq cfreqs[256];
for(size_t i = 0; i < sizeof(cfreqs) / sizeof(*cfreqs); i++) {
cfreqs[i].c = i;
cfreqs[i].freq = 0;
}
for(size_t i = 0; str[i]; i++) cfreqs[(unsigned char) str[i]].freq++; // cast: a negative char would index out of bounds
qsort(cfreqs, sizeof(cfreqs) / sizeof(*cfreqs), sizeof(*cfreqs), freqcmpdesc);
for(size_t i = 0; i < sizeof(cfreqs) / sizeof(*cfreqs); i++) {
if(cfreqs[i].freq) printf("%c", cfreqs[i].c);
}
printf("\n");
}
int main() {
char str1[20];
printf("Enter a string: ");
scanf("%19s", str1); // width limit; pass str1, not &str1
FrequencyOrder(str1);
return 0;
}
and here is a sample session (note: qsort() is not stable, so the order of letters with the same frequency is unspecified):
Enter a string: buzz
zbu
If you want duplicate letters in the output then replace the print with a loop along these lines:
while(cfreqs[i].freq--) printf("%c", cfreqs[i].c);
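Because qsort() is not stable, letters with equal counts can come out in any order. One way to make the output repeatable is to break ties on the character code inside the comparator. This is a sketch of that idea, not part of the answer above. The struct mirrors struct cfreq, and the function names and output-buffer interface are hypothetical:

```c
#include <stdlib.h>

struct cfreq {
    unsigned char c;
    int freq;
};

/* Descending by frequency; equal frequencies fall back to ascending
   character code, so the result no longer depends on qsort()'s order. */
static int freqcmp_desc_ties(const void *a, const void *b)
{
    const struct cfreq *x = a;
    const struct cfreq *y = b;
    if (x->freq != y->freq)
        return x->freq > y->freq ? -1 : 1;
    return (x->c > y->c) - (x->c < y->c);
}

/* Writes the distinct characters of str, most frequent first, into out
   (out needs at least 256 bytes: up to 255 chars plus '\0'). */
void frequency_order(const char *str, char *out)
{
    struct cfreq cf[256];
    for (int i = 0; i < 256; i++) {
        cf[i].c = (unsigned char)i;
        cf[i].freq = 0;
    }
    for (const unsigned char *p = (const unsigned char *)str; *p; p++)
        cf[*p].freq++;              /* unsigned index: never negative */
    qsort(cf, 256, sizeof cf[0], freqcmp_desc_ties);
    size_t k = 0;
    for (int i = 0; i < 256 && cf[i].freq > 0; i++)
        out[k++] = (char)cf[i].c;   /* counted chars are sorted to the front */
    out[k] = '\0';
}
```

With this comparator, "buzz" always produces "zbu".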
Im not sure if this alphabetical sort is needed.
It is not needed, yet if done, Max_Occurring() can take advantage of a sorted string.
Since the string is sorted before calling Max_Occurring(), compute the max occurring via a count of adjacent repetitions of each char.
// Untested illustrative code.
// str points to a sorted string.
void Max_Occurring(const char *str) {
char max_ch = '\0';
size_t max_occurence = 0;
char previous = '\0';
size_t occurrence = 0;
while (*str) {
if (*str == previous) {
occurrence++;
} else {
occurrence = 1;
}
if (occurrence > max_occurence) {
max_occurence = occurrence;
max_ch = *str;
}
previous = *str;
str++;
}
printf("Character '%c' appears %zu times", max_ch, max_occurence);
}
In the case of multiple characters with the same max occurrence, this code only reports one max.
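If every character that shares the maximum should be reported, one option is a second pass over the same sorted string once the longest run length is known. This is a sketch, not the answer's code. Results go into a caller-supplied buffer so they can be checked, and the function name max_runs is hypothetical:

```c
#include <stddef.h>

/* str must be sorted so that equal chars are adjacent. Stores every char
   whose run length equals the longest run into out ('\0'-terminated) and
   returns that run length. out needs room for strlen(str) + 1 bytes. */
size_t max_runs(const char *str, char *out)
{
    size_t best = 0;
    /* first pass: length of the longest run */
    for (const char *p = str; *p; ) {
        const char *q = p;
        while (*q == *p)
            q++;
        if ((size_t)(q - p) > best)
            best = (size_t)(q - p);
        p = q;
    }
    /* second pass: collect the chars whose run has that length */
    size_t k = 0;
    for (const char *p = str; *p; ) {
        const char *q = p;
        while (*q == *p)
            q++;
        if ((size_t)(q - p) == best)
            out[k++] = *p;
        p = q;
    }
    out[k] = '\0';
    return best;
}
```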
Avoid buffer overflow
Do not use scanf("%s"... without a width limit.
Tip: enable all warnings to save time and see the problem of using &str1 when str1 should be used.
char str1[20];
...
// scanf("%s", &str1);
scanf("%19s", str1);
Avoid a negative index
If you still want a frequency table, watch out for the case when char is signed and the code uses a negative str[i] to index an array.
Instead:
const unsigned char *ustr = (const unsigned char *) str;
size_t freq[UCHAR_MAX + 1] = {0};
for(size_t i = 0; ustr[i] != '\0'; i++) {
freq[ustr[i]]++;
}
Here's another alternative that may be simpler.
#include <stdio.h>
#include <string.h>
void freqOrder( char *p ) {
#define ASCIIcnt 128 // 7bit ASCII only: other values would index out of bounds
// to count occurrences of each character
int occur[ ASCIIcnt ];
memset( occur, 0, sizeof occur );
int maxCnt = 0; // remember the highest count
// do the counting
for( ; *p; p++ )
if( ++occur[ *p ] > maxCnt )
maxCnt = occur[ *p ];
// output most frequent to least frequent
for( ; maxCnt; maxCnt-- )
for( int i = 0; i < ASCIIcnt; i++ )
if( occur[i] == maxCnt )
while( occur[i]-- )
putchar( i );
putchar( '\n' );
}
int main( void ) {
freqOrder( "The quick brown fox jumps over the lazy dog" );
return 0;
}
Output
'        ooooeeehhrruuTabcdfgijklmnpqstvwxyz'

Print all elements in an array just once in C?

I have created an array in C, and I know how to print every element in an array, but I couldn't figure out how to avoid printing repeated elements, or, to be more precise, as I ask in the title, how can I print all elements just once?
For example my array is: [a b c d a a b d c c]
I want to print it like this: [a b c d]
I think that I should use for or while loop, but I don't know how. I have been thinking about this for hours and did some research but couldn't find anything valuable.
Here you are.
#include <stdio.h>
int main(void)
{
char a[] = { 'a', 'b', 'c', 'd', 'a', 'a', 'b', 'd', 'c', 'c' };
const size_t N = sizeof( a ) / sizeof( *a );
for ( size_t i = 0; i < N; i++ )
{
size_t j = 0;
while ( j != i && a[j] != a[i] ) ++j;
if ( j == i ) printf( "%c ", a[i] );
}
putchar ( '\n' );
return 0;
}
The program output is
a b c d
Or for example if you have a character array that contains a string then the same approach can be implemented the following way.
#include <stdio.h>
int main(void)
{
char s[] = { "abcdaabdcc" };
for (const char *p = s; *p != '\0'; ++p )
{
const char *prev = s;
while ( prev != p && *prev != *p ) ++prev;
if ( prev == p ) printf( "%c ", *p );
}
putchar ( '\n' );
return 0;
}
The program output is the same as shown above that is
a b c d
As the array is an array of char containing lower-case letters, there are relatively few possible values. Consequently, you can make a table (aka another array) to track the already printed values.
Like:
#define MAX ('z' - 'a' + 1) // Calculate the number of unique elements
int already_printed[MAX] = { 0 }; // Mark all chars as "not printed"
for (i = 0; i < SIZE_OFF_ARRAY; ++i)
{
if (already_printed[array[i] - 'a'] == 0) // If NOT printed yet
{
printf("%c\n", array[i]); // Print it and
already_printed[array[i] - 'a'] = 1; // mark it as printed
}
}
This gives you a simple O(N) solution. Having an O(N) solution is important for performance when handling large arrays.
Notice: This solution assumes that all array elements are between 'a' and 'z' (both included), but it can easily be extended to support a wider range.
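The table can cover every char value instead of only 'a' to 'z' by indexing with the value converted to unsigned char. This also avoids negative indexes when char is signed. The following is a self-contained sketch. It collects the result into an output buffer instead of printing, only so the result is easy to check, and the name unique_chars is hypothetical:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Copies each distinct char of in[0..n-1] to out, in order of first
   appearance, and returns how many were copied. One pass: O(n). */
size_t unique_chars(const char *in, size_t n, char *out)
{
    bool seen[UCHAR_MAX + 1] = { false };  /* one flag per possible char value */
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)in[i];
        if (!seen[c]) {
            seen[c] = true;
            out[k++] = in[i];
        }
    }
    return k;
}
```

For the question's array { a b c d a a b d c c }, this copies a, b, c, d and returns 4.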
I'm not sure what the type of the elements in the array is, but let's assume it's some type that C can "natively" compare. Then the conceptually simple solution is to sort the array, and then print it skipping duplicates. Sorting will ensure that the duplicates are adjacent. This approach will perform well in most circumstances.
First, let's set up some helper functions specific to the element type. You could remove the assign function if you only want to deal with char type, but it'll be inlined by the compiler anyway.
#include <stdlib.h>
#include <stdio.h>
// You can adapt the element type per your requirements
typedef char ElementType;
// This function assigns the source value to the destination:
// it does what *dst = *src would do.
static inline void assign(void *dst, const void *src)
{
*(ElementType*)dst = *(const ElementType*)src;
}
// This is the "spaceship" comparison operator (<=> in C++) that
// merges less-than, equal, and more-than comparisons into one.
int compare(const void *l, const void *r)
{
const ElementType *L = l;
const ElementType *R = r;
if (*L < *R) return -1;
if (*L > *R) return 1;
return 0;
}
void print_element(const ElementType *el) { printf("%c", *el); }
Since we plan to sort the array, we need to copy it first - after all, a "printer" for an array shouldn't be modifying it. Such modifications are OK in tiny programs, but are just a bad habit, since if you look at the name of the function like print_unique, nothing hints you that it would modify the data it's supposed to print: that's not how printing normally acts. It'd be unexpected and very error prone.
The copy operation could be skipped if it's OK to modify the source array: its elements would need to be non-const then, and the print_unique function name would need to be changed to something like sort_and_print_unique.
ElementType *copy_array(const ElementType *src, const int count)
{
ElementType *copy = malloc(sizeof(ElementType) * count);
if (!copy) abort(); // abort must be called, not just named
for (int i = 0; i < count; ++i)
assign(copy + i, src + i);
return copy;
}
And now the unique element printer, and a test with the data you provided:
void print_unique(const ElementType *data, int const count)
{
ElementType *copy = copy_array(data, count);
qsort(copy, count, sizeof(ElementType), compare);
printf("[");
for (int i = 0; i < count; ++i) {
if (i == 0 || compare(copy+i, copy+i-1) != 0) {
if (i != 0) printf(" ");
print_element(copy+i);
}
}
printf("]\n");
free(copy); // release the sorted copy
}
int main() {
const char array[] = "abcdaabdcc";
print_unique(array, sizeof(array)/sizeof(*array) - 1);
}
Output: [a b c d]
The alternate, modifying implementation I mentioned above would be:
void sort_and_print_unique(ElementType *data, int const count)
{
qsort(data, count, sizeof(ElementType), compare);
printf("[");
for (int i = 0; i < count; ++i) {
if (i == 0 || compare(data+i, data+i-1) != 0) {
if (i != 0) printf(" ");
print_element(data+i);
}
}
printf("]\n");
}
int main() {
char array[] = "abcdaabdcc"; // note absence of const!
sort_and_print_unique(array, sizeof(array)/sizeof(*array) - 1);
}
A simple way:
#include <stdio.h>
int main() {
int ascii[128] = { 0 };
char input[] = "abcdaabdcc";
for(int i = 0; input[i]; i++) {
++ascii[(int)input[i]];
}
for(int i = 0; i < 128; i++) {
if( ascii[i] ) printf("%c ", i);
}
return 0;
}
The array ascii is used to keep track of the frequency of each of the 128 ASCII characters, whose values are non-negative (for example, 'a' is 97 and '0' is 48). Then, if the frequency of a character is not 0, you print the character.
First, sort the array (use qsort(3), for example); then all the equal elements will be together. Then make one pass over the array, remembering the last element printed: if the current element is the same as the one printed last, just skip it and continue to the next.
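A sketch of that sort-then-scan approach follows. It assumes the elements are chars held in a modifiable array and sorts them in place, so copy the array first if the original order matters. cmp_char and print_unique_sorted are hypothetical names, and the function returns the count it printed:

```c
#include <stdio.h>
#include <stdlib.h>

/* qsort() comparator for chars, by unsigned value. */
static int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Sorts a[0..n-1], prints each distinct value once, and returns how many
   were printed. Sorting makes equal elements adjacent, so comparing with
   the previous element is enough to detect a repeat. */
size_t print_unique_sorted(char *a, size_t n)
{
    size_t printed = 0;
    qsort(a, n, sizeof a[0], cmp_char);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && a[i] == a[i - 1])
            continue;               /* same as the one printed last: skip */
        printf("%c ", a[i]);
        printed++;
    }
    putchar('\n');
    return printed;
}
```

For the array { a b c d a a b d c c }, this prints "a b c d " and returns 4.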

Is there a way if string repeats to return only repeated letters once?

I wrote code which, for the string "aabbcc", returns "abc", but when a letter occurs more than twice, as in "aaa", it returns "aa" instead of just one.
Here is the code I made.
void Ponavljanje(char *s, char *p) {
int i, j = 0, k = 0, br = 0, m = 0;
for (i = 0; i < strlen(s) - 1; i++) {
for (j = i + 1; j < strlen(s); j++) {
if (s[i] == s[j]) {
br++;
if (br == 1) {
p[k++] = s[i];
}
}
}
br = 0;
}
p[k] = '\0';
puts(p);
}
For "112233" output should be "123" or for "11122333" it should be also "123".
Avoid repeated calls to strlen(s). A weak compiler may not see that s is unchanged and may call strlen(s) many times, each call incurring a cost of n operations, which is quite inefficient (@arkku [1]). Instead, simply stop iterating when the null character is detected.
Initialize a boolean list of flags for all char to false. When a character occurs, set the flag to prevent subsequent usage. Be careful when indexing that list as char can be negative.
Using const char *s allows wider use (the function can then accept pointers to const strings) and may help compiler optimization.
Example:
#include <stdbool.h>
#include <limits.h>
#include <stdio.h> // for puts()
void Ponavljanje(const char *s, char *p) {
const char *p_original = p;
bool occurred[CHAR_MAX - CHAR_MIN + 1] = { 0 }; // all values set to 0 (false)
while (*s) {
if (!occurred[*s - CHAR_MIN]) {
occurred[*s - CHAR_MIN] = true;
*p++ = *s;
}
s++;
}
*p = '\0';
puts(p_original);
}
[1] @wrongway4you comments that many compilers may assume the string did not change and optimize out the repeated strlen() call. A compliant compiler cannot do that without restrict, though, unless it is known that in all calls s and p do not overlap. Otherwise a compiler needs to assume that writes through p may affect s, which warrants a repeated strlen() call.
This does the work with O(n) complexity.
I suppose programming can give rmg
void Ponavljanje(char *s,char *p)
{
char n[256] = {0};
int i = 0;
while (*s) {
switch (n[(unsigned char) *s]) {
case 0:
n[(unsigned char) *s] = 1;
break;
case 1:
p[i++] = *s;
n[(unsigned char) *s] = 2;
}
s += 1;
}
p[i] = 0;
puts(p);
}
While the inner loop checks br to only copy the output on the first repetition, the outer loop still passes over each repetition in s on future iterations. Hence each further occurrence of the same character will run a separate inner loop after br has already been reset.
With aaa as the input, both the first and the second a cause the inner loop to find a repetition, giving you aa. In fact, you always get one occurrence fewer of each character in the output than there is in the input, which means it only works for 1 or 2 occurrences in the input (resulting in 0 and 1 occurrences, respectively, in the output).
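Here is a minimal repair of the original nested-loop approach. It assumes the goal is to output each character that occurs more than once, once, in order of first appearance. It skips s[i] when that character already appeared earlier, and it stops the inner loop at the first match. It also tests for '\0' instead of calling strlen() in the loop conditions. The name ponavljanje_fixed is hypothetical:

```c
#include <stddef.h>

/* Writes to p, once each, every character of s that occurs more than once.
   Same O(n^2) nested-loop idea as the original, plus one extra check. */
void ponavljanje_fixed(const char *s, char *p)
{
    size_t k = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* skip s[i] if it already appeared earlier: that occurrence
           was handled on a previous iteration */
        size_t e = 0;
        while (e < i && s[e] != s[i])
            e++;
        if (e < i)
            continue;
        /* first occurrence: copy it only if it repeats later */
        for (size_t j = i + 1; s[j] != '\0'; j++) {
            if (s[j] == s[i]) {
                p[k++] = s[i];
                break;
            }
        }
    }
    p[k] = '\0';
}
```

With this check, "aaa" gives "a", and both "112233" and "11122333" give "123".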
If you only want to remove the successive double letters, then this function would be sufficient, and the examples given in the question would fit:
#include <stdio.h>
void Ponavljanje(char *s,char *p)
{
char dd = '\0';
char *r;
if(s == NULL || p == NULL)
return;
r = p;
while(*s){
if(*s != dd){
*r = *s;
dd = *s;
r++;
}
s++;
}
*r = '\0';
puts(p);
}
int main(void)
{
char s[20] = "1111332222";
char p[20];
Ponavljanje(s,p);
}
Here is something that works regardless of order:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
int slen;
int sidx;
int pidx;
int plen;
int schr;
slen = strlen(s);
plen = 0;
for (sidx = 0; sidx < slen; ++sidx) {
schr = s[sidx];
// look for duplicate char
int dupflg = 0;
for (pidx = 0; pidx < plen; ++pidx) {
if (p[pidx] == schr) {
dupflg = 1;
break;
}
}
// skip duplicate chars
if (dupflg)
continue;
p[plen++] = schr;
}
p[plen] = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
Note: As others have mentioned, strlen should not be placed in the loop condition of the for [because the length of s is invariant]. Save strlen(s) to a separate variable and loop to that limit.
Here is a different/faster version that uses a histogram so that only a single loop is required:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
char dups[256] = { 0 };
int slen;
int sidx;
int plen;
int schr;
slen = strlen(s);
plen = 0;
for (sidx = 0; sidx < slen; ++sidx) {
schr = s[sidx] & 0xFF;
// look for duplicate char
if (dups[schr])
continue;
dups[schr] = 1;
p[plen++] = schr;
}
p[plen] = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
UPDATE #2:
In response to the suggestion "I would suggest iterating until the terminating NUL byte", here's a full pointer version that is as fast as I know how to make it:
#include <stdio.h>
#include <string.h>
void
repeat(char *s, char *p)
{
char dups[256] = { 0 };
char *pp;
int schr;
pp = p;
for (schr = *s++; schr != 0; schr = *s++) {
schr &= 0xFF;
// look for duplicate char
if (dups[schr])
continue;
dups[schr] = 1;
*pp++ = schr;
}
*pp = 0;
puts(p);
}
int
main(void)
{
char p[100];
repeat("112233",p);
repeat("123123",p);
return 0;
}
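Two small refinements are possible on top of this version. Converting through `unsigned char` and sizing the table with `UCHAR_MAX + 1` from `<limits.h>` gives a valid index for every `char` value without assuming 8-bit characters, as `& 0xFF` and `dups[256]` do. And since the write pointer never gets ahead of the read pointer, the same loop can deduplicate a string in place. A sketch combining both, assuming `s` is a modifiable array rather than a string literal (`dedupe_in_place` is a hypothetical name):

```c
#include <limits.h>

/* Hypothetical in-place variant: removes every repeated character from s,
 * keeping the first occurrence of each. w never passes r, so each byte is
 * read before it can be overwritten. */
void dedupe_in_place(char *s)
{
    unsigned char seen[UCHAR_MAX + 1] = {0};
    char *w = s;

    for (const char *r = s; *r; ++r) {
        unsigned char c = (unsigned char) *r;   /* index in [0, UCHAR_MAX] */
        if (!seen[c]) {
            seen[c] = 1;
            *w++ = *r;
        }
    }
    *w = '\0';
}
```

For example, with `char buf[] = "123123";`, calling `dedupe_in_place(buf)` leaves "123" in `buf`.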

how to check for duplicates in a char array c

I'm pretty new to C. How would I check for duplicates in a 1D char array?
For example:
#define MAX_SIZE 60
Char canvas[MAX_SIZE] = {0};
for(int i=0; i<MAX_SIZE;i++){
//How do i check if there is a duplicate in that array?
}
How do I iterate through it to check for duplicates? Do I have to use a double for loop and do sizeof(canvas)/SOMETHING here?
My solution, using a function:
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
bool mem_hasduplicates(const char arr[], size_t len)
{
assert(arr != NULL);
if (len == 0)
return false;
for (size_t i = 0; i < len - 1; ++i) {
for (size_t j = i + 1; j < len; ++j) {
if (arr[i] == arr[j]) {
return true;
}
}
}
return false;
}
int main() {
const char canvas[] = "zcxabca";
printf("%x\n", mem_hasduplicates(canvas, sizeof(canvas)/sizeof(canvas[0])));
const char other_canvas[] = "abcfsd";
printf("%x\n", mem_hasduplicates(other_canvas, sizeof(other_canvas)/sizeof(other_canvas[0])));
}
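mem_hasduplicates() only answers yes or no. If you also want to know where the first duplicate is, the same nested loop can report both indices. A sketch (`mem_find_duplicate` is a hypothetical name); writing the outer condition as `i + 1 < len` also removes the need for the `len == 0` special case. Note that passing `sizeof(canvas)/sizeof(canvas[0])` as the length, as main() does above, also scans the terminating '\0'; that is harmless for a single string, but `strlen(canvas)` limits the check to the visible characters:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension of mem_hasduplicates(): same nested loop, but it
 * also stores the indices of the first duplicate pair it finds. Either
 * output pointer may be NULL. */
bool mem_find_duplicate(const char arr[], size_t len, size_t *first, size_t *second)
{
    for (size_t i = 0; i + 1 < len; ++i) {
        for (size_t j = i + 1; j < len; ++j) {
            if (arr[i] == arr[j]) {
                if (first)
                    *first = i;
                if (second)
                    *second = j;
                return true;
            }
        }
    }
    return false;
}
```

For "zcxabca" with length 7, it returns true with indices 1 and 5 (the two 'c' characters).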
Live version available at onlinegdb.
Edit: Or we can "just" create a histogram of all the values, as @selbie suggested, although this got complicated quickly:
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
struct histogram_value_s {
char value;
unsigned int count;
};
struct histogram_s {
struct histogram_value_s *v;
size_t len;
};
#define HISTOGRAM_INIT() {0}
void histogram_fini(struct histogram_s *t)
{
t->len = 0;
free(t->v);
}
static int histogram_sort_by_value_qsort_cb(const void *a0, const void *b0)
{
const struct histogram_value_s *a = a0;
const struct histogram_value_s *b = b0;
assert(a != NULL);
assert(b != NULL);
return a->value - b->value;
}
void histogram_sort_by_value(struct histogram_s *t)
{
qsort(t->v, t->len, sizeof(*t->v), histogram_sort_by_value_qsort_cb);
}
static int histogram_sort_by_count_qsort_cb(const void *a0, const void *b0)
{
const struct histogram_value_s *a = a0;
const struct histogram_value_s *b = b0;
assert(a != NULL);
assert(b != NULL);
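/* count is unsigned, so a->count - b->count would wrap around
 * instead of going negative; compare explicitly instead */
return (a->count > b->count) - (a->count < b->count);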
}
void histogram_sort_by_count(struct histogram_s *t)
{
qsort(t->v, t->len, sizeof(*t->v), histogram_sort_by_count_qsort_cb);
}
int histogram_getValue_2(const struct histogram_s *t, char value, size_t *idx, unsigned int *ret0)
{
for (size_t i = 0; i < t->len; ++i) {
if (t->v[i].value == value) {
if (ret0) {
*ret0 = t->v[i].count;
}
if (idx) {
*idx = i;
}
return 0;
}
}
return -1;
}
void histogram_printlns_generic(const struct histogram_s *t, const char fmt[])
{
assert(t != NULL);
for (size_t i = 0; i < t->len; ++i) {
printf(fmt, t->v[i].value, t->v[i].count);
}
}
int histogram_add(struct histogram_s *t, char value)
{
size_t idx;
if (histogram_getValue_2(t, value, &idx, NULL) == 0) {
if (t->v[idx].count == UINT_MAX) {
goto ERR;
}
++t->v[idx].count;
} else {
void *tmp;
tmp = realloc(t->v, (t->len + 1) * sizeof(*t->v));
if (tmp == NULL) goto ERR;
t->v = tmp;
t->v[t->len] = (struct histogram_value_s){
.value = value,
.count = 1,
};
++t->len;
}
return 0;
ERR:
return -1;
}
/* Note: despite the name, this returns true if any value occurs two or more times. */
bool histogram_has_any_count_greater_then_2(const struct histogram_s *t)
{
assert(t != NULL);
for (size_t i = 0; i < t->len; ++i) {
if (t->v[i].count >= 2) {
return true;
}
}
return false;
}
/* ----------------------------------------------------------- */
int histogram_create_from_mem(struct histogram_s *ret0, const char arr[], size_t len)
{
assert(ret0 != NULL);
assert(arr != NULL);
struct histogram_s ret = HISTOGRAM_INIT();
for (size_t i = 0; i < len; ++i) {
const char to_add = arr[i];
if (histogram_add(&ret, to_add) < 0) {
goto ERR;
}
}
*ret0 = ret;
return 0;
ERR:
histogram_fini(&ret);
return -1;
}
int main() {
const char canvas[] = "abc";
struct histogram_s h;
int ret;
ret = histogram_create_from_mem(&h, canvas, sizeof(canvas)/sizeof(canvas[0]));
if (ret) {
fprintf(stderr, "histogram_create_from_mem error!\n");
return -1;
}
printf("'%s' %s duplicates\n",
canvas,
histogram_has_any_count_greater_then_2(&h)
? "has"
: "does not have"
);
histogram_fini(&h);
}
Edit: Or we can sort the array and check whether any two adjacent bytes are the same!
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
/* qsort() comparator for single chars */
int cmp_chars(const void *a, const void *b)
{
return *(const char *)a - *(const char *)b;
}
int main() {
char canvas[] = "abca";
qsort(canvas, sizeof(canvas) - 1, sizeof(canvas[0]), cmp_chars);
bool duplicate_found = false;
for (char *p = canvas; p[0] != '\0' && p[1] != '\0'; ++p) {
if (p[0] == p[1]) {
duplicate_found = true;
break;
}
}
printf("'%s' %s duplicates\n",
canvas,
duplicate_found ? "has" : "does not have");
}
Live version available at onlinegdb.
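Sorting reorders `canvas` itself, so the program above prints the sorted string. If the caller's order must be preserved, one option is to sort a heap copy instead. A sketch under the assumption that O(n) extra memory is acceptable; `has_duplicates_sorted` and `cmp_uchar` are hypothetical names, and comparing as `unsigned char` simply gives a well-defined total order (equal bytes end up adjacent either way):

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: orders bytes as unsigned char values. */
static int cmp_uchar(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *) a;
    unsigned char y = *(const unsigned char *) b;
    return (x > y) - (x < y);
}

/* Hypothetical wrapper: sorts a copy of the first len bytes of arr and
 * checks neighbours, so the caller's array keeps its order.
 * Returns 1 if a duplicate exists, 0 if not, -1 if allocation fails. */
int has_duplicates_sorted(const char *arr, size_t len)
{
    char *copy;
    int found = 0;

    if (len < 2)
        return 0;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, arr, len);
    qsort(copy, len, 1, cmp_uchar);
    for (size_t i = 1; i < len; ++i) {
        if (copy[i - 1] == copy[i]) {
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}
```

The check then costs O(n log n) time for the sort plus one linear pass over the copy.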
If Char is just a typo for char, then this becomes relatively simple - set up a second array, indexed by character code, that keeps track of the number of occurrences of each character:
#include <limits.h>
#include <ctype.h>
...
int charCount[SCHAR_MAX+1] = {0}; // We're only going to worry about non-negative
// character codes (i.e., standard ASCII)
// [0..127]
...
/**
* This assumes that canvas is *not* a 0-terminated string, and that
* every element of the array is meaningful. If that's not the case,
* then loop on the length of the string instead of MAX_SIZE.
*/
for ( int i = 0; i < MAX_SIZE; i++ )
{
if ( canvas[i] >= 0 && canvas[i] <= SCHAR_MAX )
{
charCount[canvas[i]]++; // index into charCount by the value of canvas[i]
}
}
Then you can walk through the charCount array and print all the character values that occurred more than once:
for ( int i = 0; i <= SCHAR_MAX; i++ )
{
if ( charCount[i] > 1 )
{
/**
* If the character value is a printable character (punctuation, alpha,
* digit), print the character surrounded by single quotes - otherwise,
* print the character code as a decimal integer.
*/
printf( isprint( i ) ? "'%c': %d\n" : "%d: %d\n", i, charCount[i] );
}
}
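Put together, the two fragments above form a complete function. A sketch, assuming the array and its length are passed in rather than using the question's `MAX_SIZE` directly (`print_repeated_chars` is a hypothetical name); it returns how many distinct codes it printed, which makes the result easy to check:

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper around the two fragments: counts the non-negative
 * character codes in the first len elements of canvas, prints every code
 * that occurs more than once, and returns how many codes were printed. */
int print_repeated_chars(const char canvas[], size_t len)
{
    int charCount[SCHAR_MAX + 1] = {0};
    int printed = 0;

    for (size_t i = 0; i < len; i++) {
        if (canvas[i] >= 0 && canvas[i] <= SCHAR_MAX)
            charCount[(int) canvas[i]]++;
    }
    for (int i = 0; i <= SCHAR_MAX; i++) {
        if (charCount[i] > 1) {
            /* quote printable characters, show others as decimal codes */
            printf(isprint(i) ? "'%c': %d\n" : "%d: %d\n", i, charCount[i]);
            printed++;
        }
    }
    return printed;
}
```

If `canvas` is only partly filled, its unused '\0' elements are counted too and show up as a repeated code 0, which is one reason to loop on the string length rather than on `MAX_SIZE`.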
What's that SCHAR_MAX all about, and why am I yammering about non-negative character codes in the comments?
In C, characters in the basic execution character set (digits, upper and lowercase letters, common punctuation characters) are guaranteed to have non-negative encodings (on ASCII-based systems, the [0..127] range of standard ASCII). Characters outside the basic execution character set may have positive or negative values, depending on the implementation. Thus, the range of char values may be [-128..127] on some platforms and [0..255] on others.
The limits.h header defines constants for various type ranges - for characters, it defines the following constants:
UCHAR_MAX - maximum unsigned character value (255 on most platforms)
SCHAR_MIN - minimum signed character value (-128 on most platforms)
SCHAR_MAX - maximum signed character value (127 on most platforms)
CHAR_MIN - minimum character value, either 0 or SCHAR_MIN depending on platform
CHAR_MAX - maximum character value, either UCHAR_MAX or SCHAR_MAX depending on platform
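Which of these cases applies on a given platform can be checked directly from the constants. A small sketch (`char_is_signed` and `char_value_count` are hypothetical names):

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, else 0.
 * CHAR_MIN is 0 exactly when plain char is unsigned. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Number of distinct values a plain char can hold:
 * 256 on typical 8-bit, two's-complement platforms, whichever
 * signedness plain char has there. */
long char_value_count(void)
{
    return (long) CHAR_MAX - (long) CHAR_MIN + 1;
}
```

With GCC, plain `char` is typically signed on x86 targets and unsigned on ARM Linux targets, so `char_is_signed()` can give different answers for the same source code.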
To keep this code simple, I'm only worrying about character codes in the range [0..127]; otherwise, I'd have to map negative character codes onto non-negative array indices, and I didn't feel like doing that.
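The mapping mentioned above only needs an offset: subtracting `CHAR_MIN` moves the lowest possible `char` value to index 0, and when `char` is unsigned `CHAR_MIN` is 0, so nothing changes. A sketch, assuming a table of `CHAR_MAX - CHAR_MIN + 1` ints (256 on typical platforms) is acceptable; `CHAR_RANGE`, `char_index` and `count_all_chars` are hypothetical names:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical: size of a table with one slot per possible char value. */
#define CHAR_RANGE ((size_t) (CHAR_MAX - CHAR_MIN) + 1)

/* Maps any char value, negative or not, to an index in [0, CHAR_RANGE).
 * Subtracting CHAR_MIN shifts the lowest value to 0; when char is
 * unsigned, CHAR_MIN is 0 and the mapping is the identity. */
static size_t char_index(char c)
{
    return (size_t) ((int) c - CHAR_MIN);
}

/* Counts every element of canvas, including codes outside [0..127]. */
void count_all_chars(const char canvas[], size_t len, int charCount[CHAR_RANGE])
{
    for (size_t i = 0; i < CHAR_RANGE; i++)
        charCount[i] = 0;
    for (size_t i = 0; i < len; i++)
        charCount[char_index(canvas[i])]++;
}
```

A loop over indices `0 .. CHAR_RANGE - 1` can then recover the character for index `k` as `(char) (k + CHAR_MIN)`.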
Both this method and the nested loop solution require some tradeoffs. The nested loop solution trades time for space, while this solution trades space for time. In this case, the additional space is fixed regardless of how large canvas becomes. In the nested loop case, time will increase with the square of the length of canvas. For short inputs, there's effectively no difference, but if canvas gets large enough, you will notice a significant decrease in performance with the nested loop solution.
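To put numbers on the quadratic growth: for `len` distinct elements the nested loop makes len(len - 1)/2 comparisons, so a full 60-element canvas (`MAX_SIZE` in the question) costs 1770 comparisons, while the counting approach touches each element once plus one pass over the 128-entry table. A sketch that counts the comparisons (`nested_loop_comparisons` is a hypothetical name):

```c
#include <stddef.h>

/* Runs the same nested loop as mem_hasduplicates() and returns how many
 * element comparisons it made before finishing. With no duplicates
 * (the worst case) this is len * (len - 1) / 2. */
unsigned long nested_loop_comparisons(const char arr[], size_t len)
{
    unsigned long comparisons = 0;

    for (size_t i = 0; i + 1 < len; ++i) {
        for (size_t j = i + 1; j < len; ++j) {
            ++comparisons;
            if (arr[i] == arr[j])
                return comparisons;   /* stops at the first duplicate */
        }
    }
    return comparisons;
}
```

Doubling the length roughly quadruples the worst-case count: 6 distinct elements need 15 comparisons, 60 need 1770.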
