C array sorting ignoring special characters - c

char temp[size];
int b, z;
for (b = 0; b < size; b++) {
for (z = 0; z < size; z++) {
if (strcmp(processNames[b], processNames[z]) < 0) {
strcpy(temp, processNames[b]);
strcpy(processNames[b], processNames[z]);
strcpy(processNames[z], temp);
}
}
}
I'm sorting a list of char ** processNames;
I want it to sort like this:
abc
bee
george
(sally)
saw
thomas
zebra
However, it is sorting it like this:
(sally)
abc
bee
george
saw
thomas
zebra
Thanks, I'm not sure how to negate the special characters and only sort on alphabet. Thanks!

You can pre-process the string and use strcmp to compare the processed string:
// Inside the two-layer for loop
char newb[size], newz[size];
int ib, iz, tb = 0, tz = 0;
for (ib = 0; processNames[b][ib] != '\0'; ib++){
if (isalpha(processNames[b][ib])) {
newb[tb++] = processNames[b][ib];
}
}
newb[tb] = 0;
for (iz = 0; processNames[z][iz] != '\0'; iz++){
if (isalpha(processNames[z][iz])) {
newz[tz++] = processNames[z][iz];
}
}
newz[tz] = 0;
if (strcmp(newb, newz) < 0) {
// swap the ORIGINAL string here
}
The above code is what I came up with at first. It is very inefficient and is not recommended. Alternatively, you can write your own mystrcmp() implementation:
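int mystrcmp(const char *a, const char *b) { // needs <ctype.h>
    for (;;) {
        // skip non-letters in both strings
        while (*a && !isalpha((unsigned char)*a)) a++;
        while (*b && !isalpha((unsigned char)*b)) b++;
        // stop at the first difference, or when both strings are exhausted
        if (*a != *b || *a == '\0') return (unsigned char)*a - (unsigned char)*b;
        a++, b++;
    }
}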

“Sorting” means “putting things in order.” What order? The order is defined by some thing that tells us which of two items goes first.
In your code, you are using strcmp to decide which item goes first. That is the thing that decides the order. Since strcmp is giving an order you do not want, you need another function. In this case, you have to write your own function.
Your function should take two strings (via pointers to char), examine the strings, and return a value to indicate whether the first string should be before or after the second string (or whether they are equal).
Since this is likely a class assignment, I will leave it to you to ponder the necessary comparison function.
Alternative
There is an alternative method which is likely to be used in professionally deployed code, in suitable situations. I recommend the above because it is suitable for a class assignment—it addresses the key principle this assignment seems to target.
The alternative is to preprocess all the list items before doing the sort. Since you want to sort on the non-special characters of the names, you would augment the list by creating copies of the names with the special characters removed. These new versions would be your “sort keys”—they would be the values you use to decide order instead of the original names. You could compare them with strcmp.
This method requires allocating new memory for the new versions of the names, managing both the keys and the names while you sort them, and releasing the memory after the sort. It requires some overhead before you start the sort. However, if there are a very large number of things to sort with a considerable number of special characters, then doing the extra work up front can result in better performance overall.
(Again, I mention this only for completeness. It is likely not useful in a class assignment of this sort, just something computer science students should learn over time.)
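As a rough illustration of the sort-key idea, here is a minimal sketch of building one key. The name make_sort_key is hypothetical, and the sketch assumes "special characters" means anything isalpha() rejects:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of name with every
 * non-letter removed, or NULL if allocation fails. The caller frees it. */
char *make_sort_key(const char *name)
{
    char *key = malloc(strlen(name) + 1); /* the key is never longer than the name */
    if (key == NULL)
        return NULL;

    char *out = key;
    for (; *name != '\0'; name++) {
        if (isalpha((unsigned char)*name))
            *out++ = *name;
    }
    *out = '\0';
    return key;
}
```

You would build one key per name before sorting, compare the keys with strcmp, move each name together with its key, and free the keys afterwards.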
Bonus Notes
You say you are sorting an array of char **processNames. In this case, it is probably not necessary to move the strings themselves with strcpy. Instead, you can simply move the pointers to the strings. E.g., if you want to swap processNames[4] and processNames[7], just make a copy of the pointer that is processNames[4], set processNames[4] to be the pointer that is processNames[7], and set processNames[7] to be the temporary copy you made. This is generally faster than moving strings.
As others note, starting your z loop with z = 0 is probably not a good idea. You likely want z = b+1.
Your code uses size for the size of the string buffer (char temp[size]) and for the size of the processNames array (for (b = 0; b < size; b++)). It is unlikely the number of strings to be sorted is the same as the maximum length of the strings. You should be sure to use the correct size in each instance.
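The first two notes can be sketched together as follows. This is only an illustration: swap_names and sort_names are hypothetical names, count is assumed to be the number of strings (not the buffer length), and plain strcmp stands in for whatever comparison function you end up writing:

```c
#include <string.h>

/* Hypothetical helper: exchanges two pointers, leaving the strings in place. */
static void swap_names(char **names, size_t i, size_t j)
{
    char *tmp = names[i];
    names[i] = names[j];
    names[j] = tmp;
}

/* Sorts count strings in ascending strcmp order.
 * The inner loop starts at b + 1, so each pair is compared only once. */
void sort_names(char **names, size_t count)
{
    for (size_t b = 0; b + 1 < count; b++) {
        for (size_t z = b + 1; z < count; z++) {
            if (strcmp(names[z], names[b]) < 0)
                swap_names(names, b, z);
        }
    }
}
```

Because only pointers move, no temp buffer is needed and the length of the longest string no longer matters for the swap.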

Related

How to set the values from the token character to this array called customerData[][] in C?

I just started learning C language and I need some help with a program. Here is the code.
Questions:
What is this? customerData[NUM_FIELDS][FIELD_LENGTH];
Is it a char 2D array?
How do you input data into the array? fgetC, putchar, getchar ?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define INPUT_LENGTH 128
#define FIELD_LENGTH 30
#define NUM_FIELDS 9
int main()
{
FILE *data=NULL;
char input[INPUT_LENGTH];
char customerData[NUM_FIELDS][FIELD_LENGTH];
int element=0;
char *next;
char ch;
data= fopen("data.txt","r");
if(data!=NULL)
{
//token=strtok(input,"|");
/*while loop will go through line by line and stored it in an input array*/
while(fgets(input,INPUT_LENGTH,data)!= NULL)
{
next=strtok(input,"|");
while(next!=NULL)
{
//ch=getchar()
//>probably a get char for ch
strcpy(next,customerData[element][strlen(next)]);
/*need to put the values into customer data one by one*/
printf("%s\n",next);
//element+=1;
next=strtok(NULL,"|");
}
//element=0;
}
printf("program is done\n");
}
fclose(data);
return 0;
}
In general, "help me with my code" questions are off-topic on Stack Overflow. In order to keep the question on-topic, I'm going to focus only on the question of how to access 2D char arrays.
Yes, this is a 2D char array. Or, put another way, it's an array with NUM_FIELDS elements, where each element of the array is a char array with FIELD_LENGTH elements.
There are loads of ways to insert data into a 2D char array, but there are probably two I've encountered most often. Which one you choose to use will depend on how you want to think of this array.
Option 1: A 2D array of single chars
The first way to think about this variable is simply as a 2D array of chars - a grid of elements that you can access. Here, you can simply input values using the normal assignment operator. You'll want to make sure that your indexes are in range, or you'll start accessing invalid memory.
//Set a known element that's definitely in range
customerData[1][2] = 'A';
//Loop through all the elements
for(int ii = 0; ii < NUM_FIELDS; ii++)
{
for (int jj = 0; jj < FIELD_LENGTH; jj++)
{
customerData[ii][jj] = 'B';
}
}
//Set an element from variables
char nextInput = getNextCharFromInput();
if(x < NUM_FIELDS && y < FIELD_LENGTH)
{
customerData[x][y] = nextInput;
}
//Bad. This could corrupt memory
customerData[100][60] = 'X';
//Risky without check. How do you know x and y are in range?
customerData[x][y] = 'X';
You could certainly write your code by assigning these elements one character at a time. However, the broader context of your program heavily implies to me that the next option is better.
Option 2: A 1D array of fixed-length strings
In C, a "string" is simply an array of chars. So another way to look at this variable (and the one that makes the most sense for this program) is to treat it as a 1D array of length NUM_FIELDS, where each element is a string of length FIELD_LENGTH.
Looking at this this way, you can start using the C string functions to input data into the array, rather than needing to deal character by character. As before, you still need to be careful of lengths so that you don't go off the end of the strings.
Also be aware that arrays decay into pointers in most expressions, so a char* can also refer to a string (just one of unknown length).
//Set a specific field to a known string, which is short enough to fit
strcpy(customerData[2], "date");
//Loop through all fields and wipe their data
for(int ii = 0; ii < NUM_FIELDS; ii++)
{
memset(customerData[ii], 0, FIELD_LENGTH);
}
//Set field based on variables
if(x < NUM_FIELDS)
{
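//Will truncate next if it is too long (terminate manually, since strncpy may not)
strncpy(customerData[x], next, FIELD_LENGTH - 1);
customerData[x][FIELD_LENGTH - 1] = '\0';
//Alternatively: will not input anything if next is too long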
if(strlen(next) < FIELD_LENGTH)
{
strcpy(customerData[x], next);
}
}
//Bad. Could corrupt memory
strcpy(customerData[100], "date");
strcpy(customerData[1], "this string is definitely much longer than FIELD_LENGTH");
//Risky. Without a check, how do you know either variable is in range?
strcpy(customerData[x], next);
getchar and fgetc both read a single character, from stdin and a file respectively, so they are not a good fit when you already have whole tokens from strtok to copy into the array. putchar does write a character, but only to stdout, so it can't be used here.
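Putting Option 2 together with the strtok loop from the question, a minimal sketch might look like this. The function name split_fields is hypothetical, and the sketch assumes one record per line with fields separated by '|':

```c
#include <string.h>

#define FIELD_LENGTH 30
#define NUM_FIELDS 9

/* Hypothetical helper: splits line on '|' and copies at most NUM_FIELDS
 * tokens into customerData. Returns the number of fields stored.
 * Note that strtok modifies line in place. */
int split_fields(char *line, char customerData[NUM_FIELDS][FIELD_LENGTH])
{
    int element = 0;

    line[strcspn(line, "\n")] = '\0'; /* drop the newline that fgets keeps */

    for (char *next = strtok(line, "|");
         next != NULL && element < NUM_FIELDS;
         next = strtok(NULL, "|")) {
        /* copy at most FIELD_LENGTH - 1 chars and terminate explicitly */
        strncpy(customerData[element], next, FIELD_LENGTH - 1);
        customerData[element][FIELD_LENGTH - 1] = '\0';
        element++;
    }
    return element;
}
```

Inside the fgets loop you would call split_fields(input, customerData) once per line. One caveat: strtok treats consecutive delimiters as one, so empty fields are skipped rather than stored as empty strings.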

Creating multiple random strings in C

I have written a code which creates multiple random strings. But every time I print it, only the last string is printed multiple times even though different strings are created every time. Can anyone tell me what I'm doing wrong.
static const char alphanum[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz";
char s[5],*b[5] ;
int num =0;
for(int j=0;j<5;j++)
{
*b=(char*)malloc(sizeof(char*)*10);
for (int i = 0; i < 4; ++i)
{
num = rand() % (sizeof(alphanum) - 1);
s[i] = alphanum[num];
}
s[4] = 0;
printf("%s\t",s);
b[j] = s;
}
for(int j=0;j<5;j++)
printf("\n%s",b[j]);
}
Assuming that you've seeded the random number generator with, for instance, srand(time(NULL));, so that it will generate different random number sequences on each run of the program, there is one more flaw in your code:
s is an array of characters, and in the assignment b[j] = s; it decays to a pointer to its first element. You therefore assign b[j] only the address of s, not the contents of s. Since the memory location of s does not change, all entries of b refer to the same string s, which has been overwritten several times. To copy the current content of s to b[j], use strcpy() like this (b[j] must point to memory you allocated, so the allocation in the loop needs to be b[j] = malloc(...) rather than *b = malloc(...)):
strcpy(b[j], s);
I think you should read man 3 rand.
In fact, you have to "seed" rand by calling void srand(unsigned int seed); once at the beginning of your application.
First of all, doing e.g. *b is the same as *(b + 0) which is the same as b[0]. That means that when you allocate memory you assign it to the same entry all the time.
Secondly, at the end of the loop you overwrite the pointer and make b[j] point to s, every time. So all pointers in b will point to the same s. That's why all your strings seem to be the same.
Thirdly, you don't need to allocate dynamically in the loop, as all strings are of a fixed size. Instead declare b as an array of arrays of characters:
char b[5][5];
Then instead of assigning the pointer, you copy the string into the correct entry in b.
Lastly, and for future reference, don't cast the return of malloc.
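A minimal sketch of the array-of-arrays version, assuming five strings of four letters as in the question. The names fill_random_strings, STRING_COUNT and STRING_LENGTH are made up for this illustration:

```c
#include <stdlib.h>
#include <string.h>

#define STRING_COUNT 5
#define STRING_LENGTH 4

/* Hypothetical helper: fills b with STRING_COUNT random letter strings.
 * Each row has room for STRING_LENGTH characters plus the terminator. */
void fill_random_strings(char b[STRING_COUNT][STRING_LENGTH + 1])
{
    static const char letters[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz";
    char s[STRING_LENGTH + 1];

    for (int j = 0; j < STRING_COUNT; j++) {
        for (int i = 0; i < STRING_LENGTH; i++)
            s[i] = letters[rand() % (sizeof letters - 1)];
        s[STRING_LENGTH] = '\0';
        strcpy(b[j], s); /* copy the contents, not the pointer */
    }
}
```

Call srand() once before the first call, for example with srand(time(NULL)), and then print each b[j] as before. Since every row owns its own storage, no malloc or free is needed.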

Alphabetically Ordering an array of words

I'm studying C on my own in preparation for my upcoming semester of school and was wondering what I was doing wrong with my code so far.
If Things look weird it is because this is part of a much bigger grab bag of sorting functions I'm creating to get a sense of how to sort numbers,letters,arrays,and the like! I'm basically having some troubles with the manipulation of strings in C currently.
Also, I'm quite limited in my knowledge of C at the moment!
My main Consists of this:
#include <stdio.h>
#include <stdio.h>
#include <stdlib.h>
int numbers[10];
int size;
int main(void){
setvbuf(stdout,NULL,_IONBF,0); //This is magical code that allows me to input.
int wordNumber;
int lengthOfWord = 50;
printf("How many words do you want to enter: ");
scanf("%i", &wordNumber);
printf("%i\n",wordNumber);
char words[wordNumber][lengthOfWord];
printf("Enter %i words:",wordNumber);
int i;
for(i=0;i<wordNumber+1;i++){ //+1 is because my words[0] is blank.
fgets(&words[i], 50, stdin);
}
for(i=1;i<wordNumber+1;i++){ // Same as the above comment!
printf("%s", words[i]); //prints my words out!
}
alphabetize(words,wordNumber); //I want to sort these arrays with this function.
}
My sorting "method" I am trying to construct is below! This function is seriously flawed, but I'd thought I'd keep it all to show you where my mind was headed when writing this.
void alphabetize(char a[][],int size){ // This wont fly.
size = size+1;
int wordNumber;
int lengthOfWord;
char sortedWords[wordNumber][lengthOfWord]; //In effort for the for loop
int i;
int j;
for(i=1;i<size;i++){ //My effort to copy over this array for manipulation
for(j=1;j<size;j++){
sortedWords[i][j] = a[i][j];
}
}
//This should be kinda what I want when ordering words alphabetically, right?
for(i=1;i<size;i++){
for(j=2;j<size;j++){
if(strcmp(sortedWords[i],sortedWords[j]) > 0){
char* temp = sortedWords[i];
sortedWords[i] = sortedWords[j];
sortedWords[j] = temp;
}
}
}
for(i=1;i<size;i++){
printf("%s, ",sortedWords[i]);
}
}
I guess I also have another question as well...
When I use fgets() it's doing this thing where I get a null word for the first spot of the array. I have had other issues recently trying to scanf() char[] in certain ways specifically spacing my input word variables which "magically" gets rid of the first null space before the character. An example of this is using scanf() to write "Hello" and getting " Hello" or " ""Hello"...
Appreciate any thoughts on this, I've got all summer to study up so this doesn't need to be answered with haste! Also, thank you stack overflow as a whole for being so helpful in the past. This may be my first post, but I have been a frequent visitor for the past couple of years and it's been one of the best places for helpful advice/tips.
You're going to like this - it's a bubble sort, adapted to your situation. It may not work quite perfectly for you without a touchup here or there (TEST FIRST!):
#include <stdbool.h>
#include <string.h>

void bubble_sort(char Strings[][50], int NumberOfItems)
{
    char Temp[50];      // Matches the 50-byte rows (lengthOfWord) in your code
    int I1 = 0;         // Primary index
    int I2 = 0;         // Index + 1
    bool Swapped = false;
    do // Outer loop
    {
        Swapped = false;
        // Scan the unsorted part of the list
        for (I1 = 0; I1 < NumberOfItems - 1; I1++)
        {
            // I1 points to the current string, I2 to the next string
            I2 = I1 + 1;
            // If the current string is greater than the next string in the list,
            // swap the two strings
            if (strcmp(Strings[I1], Strings[I2]) > 0)
            {
                strcpy (Temp, Strings[I1]);
                strcpy (Strings[I1], Strings[I2]);
                strcpy (Strings[I2], Temp);
                Swapped = true;
            }
        }
        // The largest remaining string is now in place, so shrink the range
        NumberOfItems--;
    }
    while (Swapped); // Break out when we've got nothing left to swap
}
I see a few things wrong with your code off the bat. First, you declare sortedWords as a multidimensional array (since you have sortedWords[wordNumber][lengthOfWord]), but later you try to assign to sortedWords[i] as if it were a single pointer. This doesn't work, because arrays cannot be assigned! Also, your passing of the 2D array is not valid. Check out this post to see the valid ways to pass a 2D array: Passing a 2D array to a C++ function
Function declaration and definition
The function declaration is invalid, as you've found out. You must specify the size of every dimension of the array except the first, so you might write:
void alphabetize(char a[][SOMESIZE], int size)
However, you have a non-constant second dimension (so you're using a VLA or variable length array), which means that you need to pass both sizes to the function, and pass them before you pass the array:
void alphabetize(int size, int length, char a[size][length])
and then invoke it:
alphabetize(wordNumber, lengthOfWord, words);
Of course, you should also declare the function before you try calling it.
There are probably other issues to be addressed, but this is the immediate one that jumps out. For example, you need to use size and length to control the loops in the function. You probably don't need to copy the data into the local array (in which case you don't need the local array). Etc.
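For illustration, here is a minimal sketch of what the function could look like with the VLA signature above, sorting in place without a local copy. The simple exchange sort is my own choice here, not something taken from the question, and the sketch assumes a compiler with VLA support (C99, or C11 with the optional feature):

```c
#include <string.h>

/* Sorts size strings, each stored in a row of length bytes, in place.
 * size and length come first so the array parameter can refer to them. */
void alphabetize(int size, int length, char a[size][length])
{
    char temp[length]; /* scratch row used for swapping */

    for (int i = 0; i < size - 1; i++) {
        for (int j = i + 1; j < size; j++) {
            if (strcmp(a[i], a[j]) > 0) {
                strcpy(temp, a[i]);
                strcpy(a[i], a[j]);
                strcpy(a[j], temp);
            }
        }
    }
}
```

Both loops are controlled by size, and length is only needed for the row type and the scratch buffer.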
You should consider compiling with options such as:
gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Wold-style-declaration -Werror …
Note that not all versions of GCC support all those options, but use as many as are supported.
Input issue
You have:
int i;
for (i = 0; i < wordNumber + 1; i++) { //+1 is because my words[0] is blank.
fgets(&words[i], 50, stdin);
}
You're stepping out of bounds of your array, which potentially wreaks havoc on your code. The first entry is blank because scanf() leaves the newline in the input buffer. You should read to the end of line before going to line-based input:
int c;
while ((c = getchar()) != EOF && c != '\n')
;
You should also check that fgets() returns a non-null pointer; don't continue if it returns NULL.

Algorithm for processing the string

I really don't know how to implement this function:
The function should take a pointer to an integer, a pointer to an array of strings, and a string for processing. The function should write to array all variations of exchange 'ch' combination to '#' symbol and change the integer to the size of this array. Here is an example of processing:
choker => {"choker","#oker"}
chocho => {"chocho","#ocho","cho#o","#o#o"}
chachacha => {"chachacha","#achacha","cha#acha","chacha#a","#a#acha","cha#a#a","#acha#a","#a#a#a"}
I am writing this in C standard 99. So this is sketch:
int n;
char **arr;
char *string = "chacha";
func(&n,&arr,string);
And function sketch:
int func(int *n,char ***arr, char *string) {
}
So I think I need to create another function, which counts the number of 'ch' combinations and allocates memory for this one. I'll be glad to hear any ideas about this algorithm.
You can count the number of combinations pretty easily:
char * tmp = string;
int i;
for(i = 0; *tmp != '\0'; i++){
if(!(tmp = strstr(tmp, "ch")))
break;
tmp += 2; // Skip past the 2 characters "ch"
}
// i contains the number of times ch appears in the string.
int num_combinations = 1 << i;
// num_combinations is the number of combinations: 2 to the power of the number of occurrences of "ch"
First, I'd create a helper function, e.g. countChs that would just iterate over the string and return the number of 'ch'-s. That should be easy, as no string overlapping is involved.
When you have the number of occurrences, you need to allocate space for 2^count strings. Each string needs at most strlen(original) + 1 bytes, since a replacement only makes the string shorter and you need room for the terminating '\0'. You also alter your n variable to be equal to that 2^count.
After you have your space allocated, just iterate over all indices in your new table and fill them with copies of the original string (strcpy() or strncpy() to copy), then replace 'ch' with '#' in them (there are loads of ready snippets online, just look for "C string replace").
Finally make your arr pointer point to the new table. Be careful though - if it pointed to some other data before, you should think about freeing it or you'll end up having memory leaks.
If you would like to have all variations of the replaced string, the array will have 2^n elements, where n is the number of "ch" substrings. Counting them looks like this:
int i = 0;
int n = 0;
while(string[i] != '\0')
{
if(string[i] == 'c' && string[i + 1] == 'h')
n++;
i++;
}
Then we can use the binary representation of a number. Note that when incrementing an integer i from 0 to 2^n - 1, the binary representation of i tells us which "ch" occurrences to change. So:
for(long long unsigned int i = 0; i < (1ULL << n); i++)
{
    long long unsigned int number = i;
    int k = 0;
    while(number > 0)
    {
        if(number % 2 == 1)
        {
            // Replace k-th occurrence of "ch"
        }
        number /= 2;
        k++;
    }
    // Add replaced string to array
}
This code checks every bit in the binary representation of the number and changes the k-th occurrence if the k-th bit is 1. Changing the k-th "ch" is pretty easy, and I leave it to you.
This code is useful only for fewer than 64 occurrences, because unsigned long long int typically has 64 bits, so the shift 1ULL << n is only defined for n below 64.
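To make the bit-mask idea concrete, here is one possible sketch of the whole function, using the signature from the question (with string made const). The helper count_ch and the limit of 30 occurrences are assumptions of this sketch, chosen so that 1 << k stays within int:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the occurrences of "ch" in s. */
static int count_ch(const char *s)
{
    int count = 0;
    while ((s = strstr(s, "ch")) != NULL) {
        count++;
        s += 2;
    }
    return count;
}

/* Stores all 2^k variants in *arr and their number in *n.
 * Returns 0 on success, -1 on failure. Bit j of mask set means
 * "replace the j-th occurrence of ch with #". */
int func(int *n, char ***arr, const char *string)
{
    int k = count_ch(string);
    if (k >= 30)
        return -1; /* keep 1 << k within int range */

    int total = 1 << k;
    size_t len = strlen(string);
    char **out = malloc((size_t)total * sizeof *out);
    if (out == NULL)
        return -1;

    for (int mask = 0; mask < total; mask++) {
        char *dst = malloc(len + 1); /* a variant is never longer than the input */
        if (dst == NULL) {
            while (mask > 0)
                free(out[--mask]);
            free(out);
            return -1;
        }
        out[mask] = dst;

        int occurrence = 0;
        for (const char *src = string; *src != '\0'; ) {
            if (src[0] == 'c' && src[1] == 'h') {
                if (mask & (1 << occurrence)) {
                    *dst++ = '#';
                } else {
                    *dst++ = 'c';
                    *dst++ = 'h';
                }
                occurrence++;
                src += 2;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }

    *n = total;
    *arr = out;
    return 0;
}
```

The caller owns the result and should free each arr[i] and then arr itself. The variants come out in mask order, so index 0 is always the unchanged string.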
There are two sub-problems that you need to solve for your original problem:
allocating space for the array of variations
calculating the variations
For the first problem, you need to find the mathematical function f that takes the number of "ch" occurrences in the input string and returns the number of total variations.
Based on your examples: f(1) = 2, f(2) = 4 and f(3) = 8. This should give you a good idea of where to start, but it is important to prove that your function is correct. Induction is a good way to make that proof.
Since your replace process ensures that the results have either the same or a lower length than the original, you can allocate space for each individual result equal to the length of the original plus one byte for the terminating '\0'.
As for the second problem, the simplest way is to use recursion, like in the example provided by nightlytrails.
You'll need another function which take the array you allocated for the results, a count of results, the current state of the string and an index in the current string.
When called, if there are no further occurrences of "ch" beyond the index then you save the result in the array at position count and increment count (so the next time you don't overwrite the previous result).
If there are any "ch" beyond index then call this function twice (the recurrence part). One of the calls uses a copy of the current string and only increments the index to just beyond the "ch". The other call uses a copy of the current string with the "ch" replaced by "#" and increments the index to beyond the "#".
Make sure there are no memory leaks. No malloc without a matching free.
After you make this solution work you might notice that it plays loose with memory. It is using more than it should. Improving the algorithm is an exercise for the reader.

Optimizing a search algorithm in C

Can the performance of this sequential search algorithm (taken from The Practice of Programming) be improved using any of C's native utilities, e.g. if I set the i variable to be a register variable?
int lookup(char *word, char*array[])
{
int i;
for (i = 0; array[i] != NULL; i++)
if (strcmp(word, array[i]) == 0)
return i;
return -1;
}
Yes, but only very slightly. A much bigger performance improvement can be achieved by using better algorithms (for example keeping the list sorted and doing a binary search).
In general optimizing a given algorithm only gets you so far. Choosing a better algorithm (even if it's not completely optimized) can give you a considerable (order of magnitude) performance improvement.
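As a sketch of the "better algorithm" route, the standard library's bsearch can do the lookup, assuming the array is already sorted in strcmp order and its length is known. The names lookup_sorted and compare_word are made up for this example:

```c
#include <stdlib.h>
#include <string.h>

/* bsearch passes the key first and a pointer to an array element second. */
static int compare_word(const void *key, const void *element)
{
    const char *word = key;
    const char *const *slot = element;
    return strcmp(word, *slot);
}

/* Returns the index of word in a sorted array of count strings, or -1. */
int lookup_sorted(const char *word, char *array[], size_t count)
{
    char **hit = bsearch(word, array, count, sizeof array[0], compare_word);
    return hit != NULL ? (int)(hit - array) : -1;
}
```

This needs about log2(count) string comparisons per lookup instead of up to count, at the cost of sorting the array once up front (for example with qsort).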
I think, it will not make much of a difference. The compiler will already optimize it in that direction.
Besides, the variable i does not have much impact, word stays constant throughout the function and the rest is too large to fit in any register. It is only a matter how large the cache is and if the whole array might fit in there.
String comparisons are rather expensive computationally.
Can you perhaps use some kind of hashing for the array before searching?
There is a well-known technique called the sentinel method.
To use the sentinel method, you must know the length of "array[]".
You can remove the "array[i] != NULL" comparison by using a sentinel.
int lookup(char *word, char*array[], int array_len)
{
int i = 0;
array[array_len] = word;
for (;; ++i)
if (strcmp(word, array[i]) == 0)
break;
array[array_len] = NULL;
return (i != array_len) ? i : -1;
}
If you're reading TPOP, you will next see how they make this search many times faster with different data structures and algorithms.
But you can make things a bit faster by replacing things like
for (i = 0; i < n; ++i)
foo(a[i]);
with
char **p = a;
for (i = 0; i < n; ++i)
foo(*p);
++p;
If there is a known value at the end of the array (e.g. NULL) you can eliminate the loop counter:
for (p = a; *p != NULL; ++p)
foo(*p);
Good luck, that's a great book!
To optimize that code the best bet would be to rewrite the strcmp routine since you are only checking for equality and don't need to evaluate the entire word.
Other than that you can't do much else. You can't sort as it appears you are looking for text within a larger text. Binary search won't work either since the text is unlikely to be sorted.
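My 2p (C-pseudocode; arr_len, the length of the text being searched, is assumed to be known):
wrd_end = wrd_ptr + wrd_len;
arr_end = arr_ptr + arr_len - wrd_len; /* last start position where the word still fits */
while (arr_ptr <= arr_end)
{
    wrd_beg = wrd_ptr; arr_beg = arr_ptr;
    while (*wrd_ptr == *arr_ptr)
    {
        wrd_ptr++; arr_ptr++;
        if (wrd_ptr == wrd_end)
            return arr_beg; /* match starts here */
    }
    wrd_ptr = wrd_beg;     /* rewind the word */
    arr_ptr = arr_beg + 1; /* try the next start position */
}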
Mark Harrison: Your for loop will never terminate! (++p is indented, but is not actually within the for :-)
Also, switching between pointers and indexing will generally have no effect on performance, nor will adding register keywords (as mat already mentions) -- the compiler is smart enough to apply these transformations where appropriate, and if you tell it enough about your cpu arch, it will do a better job of these than manual pseudo-micro-optimizations.
A faster way to match strings would be to store them Pascal style. If you don't need more than 255 characters per string, store them roughly like this, with the count in the first byte:
char s[] = "\x05Hello";
Then you can do:
for(i=0; i<len; ++i) {
s_len = strings[i][0];
if(
s_len == match_len
&& strings[i][s_len] == match[s_len-1]
&& 0 == memcmp(strings[i]+1, match, s_len-1)
) {
return 1;
}
}
And to get really fast, add memory prefetch hints for string start + 64, + 128 and the start of the next string. But that's just crazy. :-)
Another fast way to do it is to get your compiler to use a SSE2 optimized memcmp. Use fixed-length char arrays and align so the string starts on a 64-byte alignment. Then I believe you can get the good memcmp functions if you pass const char match[64] instead of const char *match into the function, or strncpy match into a 64,128,256,whatever byte array.
Thinking a bit more about this, these SSE2 match functions might be part of packages like Intel's and AMD's accelerator libraries. Check them out.
Realistically, setting i to be a register variable won't do anything that the compiler wouldn't do already.
If you are willing to spend some time upfront preprocessing the reference array, you should google "The World's Fastest Scrabble Program" and implement that. Spoiler: it's a DAG optimized for character lookups.
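/* it doesn't get much quicker than this */
int lookup(char *word, char*array[])
{
    int i;
    /* test the current element before advancing, so NULL is never passed to strcmp */
    for (i = 0; *array != NULL; i++, array++)
        if (strcmp(word, *array) == 0)
            return i;
    return -1;
}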

Resources