words processing (dynamic array of structs) - arrays

//runs through initial values and set them to null and zero;
for(int g =0;g<Arraysize;g++){Array1[g].word="NULL";Array1[g].usage=0;}
//struct
int Arraysize = 100;
struct HeavyWords{
string word;
int usage;
};
//runs through the txt file and checks whether the word has already been stored; if it hasn't,
//it adds it as the next entry in the array of structs; if it has, it increments the usage count at that position
while (myfile >> Bookword)
{totalwords++;cout<<Bookword<<endl;
bool foundWord = false;
for(int q = 0;q<counter;q++)
{
if(Array1[q].word == Bookword)
{
Array1[q].usage++;
foundWord = true;
}
}
if(foundWord == false) {
Array1[counter].word = Bookword;
Array1[counter].usage = 1;
counter++;
//cout<<counter<<endl;
}
//double size of array when the counter reaches array size
if(counter==Arraysize)
{
HeavyWords * Array2;
Array2 = new HeavyWords[2*Arraysize];
for (int k= 0;k<Arraysize;k++)
{
Array2[k].word = Array1[k].word;
}
Arraysize = 2*Arraysize;
Arraydouble++;
HeavyWords* cursor = Array1;
Array1 = Array2;
delete [] cursor;
}
}
//I just started programming in C++, so I apologize if this code is an explosion of nonsense.
//Here is my code.
//I have been racking my brain as to why it is not correctly storing the usage of each word; when I run it, it reports the wrong number of times certain words are used.
//I would really love it if someone could tell me where my logic went wrong.

Immediate problem: while copying Array1 to Array2 you are not copying the usage.
Solution: copy the usage. A statement such as Array2[k] = Array1[k] would do.
Suggestions:
You are also not breaking out of the loop in the first part of the code when you find a match in Array1 for the word you are looking for. Your code needlessly continues to iterate over the entire array even when, for example, a match has already been found at index 10 and you could have left the for loop.
You are reinventing the wheel. You need an expandable array; the C++ STL has one ready-made for you, called vector.
Also, an array or vector does NOT look like the right choice for what you are trying to do. For each word you are doing a linear search over Array1. A map from the C++ STL would do this neatly AND efficiently, and your code would also be much shorter. You can look up how to code with maps. If you write some code, I can help further. Or wait; someone here would write out the entire code for you :).

Related

Segregate Even and Odd numbers In an Integer Array using Recursion

I am doing an algorithm exercise, which asks to rearrange an array of integers, to put all the even-valued elements before odd-valued elements.
I thought for a while and came up with the following pseudo code:
int[] Rearrange(int [] arr)
{
if arr.length=1
return arr;
if arr[0] is even
return arr[0] followed by Rearrange(arr.subarray(1,arr.length))
else
return Rearrange(arr.subarray(1,arr.length)) followed by arr[0]
}
I am bit concerned about my proposed solution above, since I need to do a copy operation in each recursion cycle, which is expensive. Experts please kindly advise, thanks!
Recursion is expensive, and your approach would create tons of extra copies. Sometimes recursion yields an elegant solution, and other times it is absolutely the wrong tool for the job. This is a wrong-tool-for-the-job case.
Instead, write a method that keeps a head index and a tail index. Initialize the head index to the beginning of the array and the tail index to the last element.
At each pass, loop through the items at the head of the list, looking for odd values. When you find one, stop, then look for an even value from the end, looking backwards. When you find one, switch the two values (using a third int as temporary storage.) Repeat forever. When the head and tail indexes meet, you're done.
Something like this:
int head_index = 0;
int tail_index = array.count - 1;
int temp;
while (true)
{
//Find the next odd number at the front of the array.
while (head_index < tail_index && array[head_index] % 2 == 0)
head_index++;
//Find the next even number at the end of the array.
//("!= 0" so that negative odd numbers are treated as odd too.)
while (head_index < tail_index && array[tail_index] % 2 != 0)
tail_index--;
//If the indexes meet, we're done
if (head_index >= tail_index)
break;
//Swap the items at the current indexes
temp = array[head_index];
array[head_index] = array[tail_index];
array[tail_index] = temp;
}
(Completely untested, and I'm tired, but the basic idea should work)
It's more-or-less C syntax pseudo code.
It should run in O(n) time, with the only extra RAM needed being your 2 indexes and the temporary holding variable.
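For anyone who wants something they can compile, here is a minimal C sketch of the same two-index idea. The function name segregate_even_odd and its signature are my own choices for illustration, not something from the question, and it does not preserve the relative order inside each group.

```c
#include <assert.h>
#include <stddef.h>

/* Moves even values to the front of a[0..n-1] and odd values to the back,
   in place. Relative order inside each group is not preserved. */
void segregate_even_odd(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;

    size_t head = 0;
    size_t tail = n - 1;

    while (head < tail) {
        /* Skip even values that are already at the front. */
        while (head < tail && a[head] % 2 == 0)
            head++;
        /* Skip odd values that are already at the back.
           "!= 0" rather than "== 1" so negative odd numbers count as odd. */
        while (head < tail && a[tail] % 2 != 0)
            tail--;
        if (head < tail) {
            /* a[head] is odd and a[tail] is even: swap them and move on. */
            int temp = a[head];
            a[head] = a[tail];
            a[tail] = temp;
            head++;
            tail--;
        }
    }
}
```

Each element is visited at most once by one of the two indexes, so the pass is O(n) with O(1) extra space.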
Even though the question was answered, I'm putting the recursive version of solving this problem here so that people, who are wondering, can see why recursion is a bad approach for this problem.
public static int[] segregate(int[] array, int left) {
int leftIndex = left;
if(left == array.length) {
return array;
}
for(int i = leftIndex + 1; i < array.length; i++) {
if(array[leftIndex] % 2 == 1) {
if(array[i] % 2 == 0) {
int temp = array[leftIndex];
array[leftIndex] = array[i];
array[i] = temp;
}
}
}
return segregate(array, leftIndex + 1);
}
As can be seen from the code, the method will call itself N times. Since the for loop inside the method is O(N), the total complexity of the recursion is O(n^2), which is worse than the non-recursive solution.

C - How can I sort and print an array in a method but have the prior unsorted array not be affected

This is for a Deal or No Deal game.
So in my main function I'm calling my casesort method as such:
casesort(cases);
My method looks like this, I already realize it's not the most efficient sort but I'm going with what I know:
void casesort(float cases[10])
{
int i;
int j;
float tmp;
float zero = 0.00;
for (i = 0; i < 10; i++)
{
for (j = 0; j < 10; j++)
{
if (cases[i] < cases[j])
{
tmp = cases[i];
cases[i] = cases[j];
cases[j] = tmp;
}
}
}
//Print out box money amounts
printf("\n\nHidden Amounts: ");
for (i = 0; i < 10; i++)
{
if (cases[i] != zero)
printf("[$%.2f] ", cases[i]);
}
}
So when I get back to my main it turns out the array is sorted. I thought void would prevent the method from returning a sorted array. I need to print out actual case numbers, and I do this by skipping over any case that is populated with a 0.00. But after the first round of case picks I get "5, 6, 7, 8, 9, 10" printing out back in my MAIN. I need it to print the cases according to what has been picked. I feel like it's a simple fix; it's just that my knowledge of the specifics of C is still growing. Any ideas?
Return type void has nothing to do with prevention of array from being sorted. It just says that function does not return anything.
You see that the passed array itself is affected because an array decays to a pointer when passed to a function. Make a copy of the array and then pass it. That way you have the original list.
In C, arrays are effectively passed by reference, i.e. what the function receives is a pointer to the first element. So when you pass cases into your function, you're actually giving it the original array to modify. Try creating a copy and sorting the copy rather than the actual array. Creating a copy wouldn't be bad as you have only 10 floats.
Instead of rolling your own sort, consider using qsort(), or std::sort() if you are actually using C++.
There are 2 obvious solutions. 1) Make a copy of the array and sort the copy (easy, wastes some memory, likely not a problem these days). 2) Create a parallel array of integers and perform an index sort, i.e., instead of sorting the original, you sort the index and then dereference the array through the index when you want the sorted version; otherwise you use the raw unsorted array.
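A rough sketch of option 2, the index sort, in case it helps. The names NUM_CASES, case_index_sort and order are made up for this example; the size of 10 comes from the question, and I used a plain insertion sort since the array is tiny.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CASES 10 /* matches the array size used in the question */

/* Fills order[] with the indexes 0..NUM_CASES-1 arranged so that
   cases[order[0]] <= cases[order[1]] <= ... The cases array is only read. */
void case_index_sort(const float cases[NUM_CASES], int order[NUM_CASES])
{
    assert(cases != NULL && order != NULL);
    for (int i = 0; i < NUM_CASES; i++)
        order[i] = i;

    /* Insertion sort on the indexes, comparing the values they refer to. */
    for (int i = 1; i < NUM_CASES; i++) {
        int idx = order[i];
        int j = i - 1;
        while (j >= 0 && cases[order[j]] > cases[idx]) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = idx;
    }
}
```

To print the hidden amounts in sorted order you would then loop over i and print cases[order[i]], skipping zeros as before, while cases itself stays in its original case-number order for main.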
Well, make a local copy of your input and sort it. Something like this:
void casesort(float cases[10])
{
float localCases[10];
memcpy(localCases, cases, sizeof(localCases)); /* needs <string.h>; sizeof(cases) would only be the size of a pointer here */
...
Then use localCases to do your sorting.
If you don't want the array contents to be affected, then you'll have to create a copy of the array and pass that to your sorting routine (or create the copy within the routine itself).
Arrays Are Different™ in C; see my answer here for a more detailed explanation.

C - Returning the most repeated/occurring string in an array of char pointers

I have almost completed the code for this problem, which I shall state as under:
Given:
Array of length 'n' (say n = 10000) declared as below,
char **records = malloc(10000*sizeof(*records));
Each record[i] is a char pointer and points to a non-empty string.
records[i] = malloc(11);
The strings are of fixed length (10 chars + '\0').
Requirement:
Return the most frequently occurring string in the above array.
But now, I am interested in obtaining a slightly less brutal algorithm than the primitive one which I have currently, which is to sift through the entire array in two for loops :(, storing strings encountered by the two loops in a temporary array of similar size ('n' - in case all are unique strings) for comparison with the next strings. The inner loop iterates from 'outer loop position + 1' to 'n'. At the same time, I have an integer array, of similar size - 'n', for counting repeat occurrences, with each i th element corresponding to the i th (unique) string in the comparison array. Then find the largest integer and use its index in the comparison array to return the most frequently occurring string.
I hope I am clear enough. I am quite ashamed of the algo myself, but it had to be done. I am sure there is a much smarter way to do this in C.
Have a great Sunday,
Cheers!
Without being good at nice algorithms (Google, Wikipedia and Stackoverflow are good enough for me), one solution that comes out at the top of my head is to sort the array, then use a single loop to go through the entries. As long as the current string is the same as the previous, increase a counter for that string. When done you have a "list" of strings and their occurrence, which can then be sorted if needed.
In most languages, the usual approach would be to construct a hashtable, mapping strings to counts. This has O(N) complexity.
For example, in Python (although usually you would use collections.Counter for this, and even this code can be made more concise using more specialised Python knowledge, but I've made it explicit for demonstration).
def most_common(strings):
counts = {}
for s in strings:
if s not in counts:
counts[s] = 0
counts[s] += 1
return max(counts, key=counts.get)
But in C, you don't have a hashtable in the standard library (although in C++ you can use hash_map from the STL, or std::unordered_map since C++11), so a sort and scan can be done instead. It's O(N log N) complexity, which is worse than optimal, but quite practical.
Here's some C (actually C99) code that implements this.
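#include <stdlib.h>
#include <string.h>
int compare_strings(const void*s0, const void*s1) {
// qsort passes pointers to the array elements, and each element is itself a char pointer.
return strcmp(*(const char *const *)s0, *(const char *const *)s1);
}
// Note: this sorts the records array in place.
const char *most_common(const char **records, size_t n) {
qsort(records, n, sizeof(records[0]), compare_strings);
const char *best = 0; // The most common string found so far.
size_t max = 0; // The longest run found.
size_t run = 0; // The length of the current run.
for (size_t i = 0; i < n; i++) {
// records[i - run] is the first string of the current run.
if (strcmp(records[i], records[i - run]) == 0) {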
run += 1;
} else {
run = 1;
}
if (run > max) {
best = records[i];
max = run;
}
}
return best;
}
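If you do want the O(N) hashtable approach in plain C, you have to bring your own table. Below is a minimal sketch using open addressing with linear probing and an FNV-1a hash. Everything here (struct slot, hash_string, most_common_hashed) is a name I invented for the example, it borrows the string pointers rather than copying them, and it relies on calloc's zeroed memory reading back as NULL pointers, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One slot of the table: a borrowed string pointer and its count. */
struct slot {
    const char *key; /* NULL means the slot is unused */
    size_t count;
};

/* FNV-1a string hash. */
static uint32_t hash_string(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Returns the most frequent string in records[0..n-1], or NULL if n == 0
   or the table cannot be allocated. The input array is not modified. */
const char *most_common_hashed(const char *const *records, size_t n)
{
    assert(records != NULL || n == 0);
    if (n == 0)
        return NULL;

    /* A power-of-two capacity of at least 2n keeps the table at most half full. */
    size_t cap = 16;
    while (cap < 2 * n)
        cap *= 2;

    struct slot *table = calloc(cap, sizeof *table);
    if (table == NULL)
        return NULL;

    const char *best = NULL;
    size_t best_count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t pos = hash_string(records[i]) & (cap - 1);
        /* Linear probing: walk until we hit this string or an empty slot. */
        while (table[pos].key != NULL && strcmp(table[pos].key, records[i]) != 0)
            pos = (pos + 1) & (cap - 1);
        if (table[pos].key == NULL)
            table[pos].key = records[i];
        table[pos].count++;
        if (table[pos].count > best_count) {
            best_count = table[pos].count;
            best = table[pos].key;
        }
    }

    free(table);
    return best;
}
```

Compared with the sort-and-scan version this leaves the records array untouched and is O(N) on average, at the cost of a temporary table of roughly 2n slots.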

Test for first empty value of array

Is there any way to test for the first empty value of a 2-dimensional int array in C?
In my current program, I used 2 for loops before the main program(while loop) to set all the values of my 2 dimensional array to -9999. Then inside my main while loop, I test for the first -9999 value and set it to a value, and then use break to exit from it.
Using this I managed to do my assignment, but I'm not very satisfied, as I think there might be a better solution.
Is there one?
EDIT: Code since you asked for it.
For loop outside while loop:
for(int x=0;x<ctr-1;x++)
{
for(int y=0;y<maxtrips;y++)
{
EmployeeKilos[x][y] = -9999; // Set all the kilos to -9999 to signify emptiness.
}
}
Inside my main while loop:
for(int x=0;x<ctr-1;x++) // and set it to the log kilometers
{
if(employeenames[x].EmployeeNumber == log.Record)
{
for(int y=0;y<maxtrips;y++)
{
if(EmployeeKilos[x][y] == -9999)
{
EmployeeKilos[x][y] = log.Kilometers;
break;
}
}
}
}
All my code: http://pastebin.com/Zb60mym8
As Dave said, checking for empty values cannot be made more efficient than linear time (O(n)), but my answer focuses on a solution that can prevent having to look for it in the first place.
In general you could iterate the matrix in row-major or column-major mode.
Effectively, you can use a single index that translates to a matrix cell like so
for (size_t i=0; i<ROWS*COLS; ++i)
{
int row = i / COLS; // row-major: each row holds COLS cells
int col = i % COLS;
// work with matrix[row][col]
}
This way you could just store and remember the value of i where you last found the first empty cell, so you don't have to restart from the beginning.
If you're not actually interested in row/col addressing, you could forget about those and just use an output iterator to track your current output location.
Here's a demo using 'iterator' style (borrowing from c++ but perfectly C99)
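typedef int data;
typedef data* output;
output add_next(data matrix[ROWS][COLS], output startpoint, data somevalue)
{
// Only write while startpoint is still inside the matrix storage.
if (startpoint < &matrix[0][0] + ROWS*COLS)
*(startpoint++) = somevalue;
return startpoint;
}
Now you can just say:
output next = &matrix[0][0];
next = add_next(matrix, next, 42);
next = add_next(matrix, next, 9);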
NOTE the output iterator thing assumes contiguous storage and therefore cannot be used with so-called jagged arrays
HTH
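You can use memset to initialise arrays, which is a bit more efficient and cleaner looking than iterating over the array, but note that it fills memory byte by byte. For an int array it therefore only gives the value you expect when every byte of that value is the same, such as 0 or -1; a sentinel like -9999 still needs a loop.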
Checking for your 'empty' value can't be done much faster than you are doing it, though. :)
This sounds like you should think about your datatype. Since you are already using structs, why don't you add another int that holds the index of the next unassigned value, so you can jump straight to it? Something like:
struct t_EmployeeKilos{
int kilos[maxtrips];
int nlast;
} EmployeeKilos[N];
and set nlast whenever you assign a new element in kilos. This way it is O(1).
for(int x=0;x<ctr-1;x++) //
{
if(employeenames[x].EmployeeNumber == log.Record)
{
EmployeeKilos[x].kilos[EmployeeKilos[x].nlast] = log.Kilometers;
EmployeeKilos[x].nlast++;
}
}

How do I remove duplicate strings from an array in C?

I have an array of strings in C and an integer indicating how many strings are in the array.
char *strarray[MAX];
int strcount;
In this array, the highest index (where 10 is higher than 0) is the most recent item added and the lowest index is the most distant item added. The order of items within the array matters.
I need a quick way to check the array for duplicates, remove all but the highest index duplicate, and collapse the array.
For example:
strarray[0] = "Line 1";
strarray[1] = "Line 2";
strarray[2] = "Line 3";
strarray[3] = "Line 2";
strarray[4] = "Line 4";
would become:
strarray[0] = "Line 1";
strarray[1] = "Line 3";
strarray[2] = "Line 2";
strarray[3] = "Line 4";
Index 1 of the original array was removed and indexes 2, 3, and 4 slid downwards to fill the gap.
I have one idea of how to do it. It is untested and I am currently attempting to code it but just from my faint understanding, I am sure this is a horrendous algorithm.
The algorithm presented below would be run every time a new string is added to the strarray.
For the interest of showing that I am trying, I will include my proposed algorithm below:
1. Search entire strarray for match to str
2. If no match, do nothing
3. If match found, put str in strarray
4. Now we have a strarray with a max of 1 duplicate entry
5. Add highest index strarray string to lowest index of temporary string array
6. Continue downwards into strarray and check each element
7. If duplicate found, skip it
8. If not, add it to the next highest index of the temporary string array
9. Reverse temporary string array and copy to strarray
Once again, this is untested (I am currently implementing it now). I just hope someone out there will have a much better solution.
The order of items is important and the code must utilize the C language (not C++). The lowest index duplicates should be removed and the single highest index kept.
Thank you!
The typical efficient unique function is to:
Sort the given array.
Collapse each consecutive run of equal items so that only one of them remains.
I believe you can use qsort in combination with strcmp to accomplish the first part; writing an efficient remove would be all on you though.
Unfortunately I don't have specific ideas here; this is kind of a grey area for me because I'm usually using C++, where this would be a simple:
std::vector<std::string> src;
std::sort(src.begin(), src.end());
src.erase(std::unique(src.begin(), src.end()), src.end());
I know you can't use C++, but the implementation should essentially be the same.
Because you need to save the original order, you can have something like:
typedef struct
{
int originalPosition;
char * string;
} tempUniqueEntry;
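Do your first sort with respect to string, remove the duplicates from the sorted set, then re-sort with respect to originalPosition. To keep the highest-index duplicate, break ties in the first sort by descending originalPosition, since the unique step keeps the first entry of each run. This way you still get O(n lg n) performance, yet you don't lose the original order.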
EDIT2:
Simple C implementation example of std::unique:
tempUniqueEntry* unique ( tempUniqueEntry * first, tempUniqueEntry * last )
{
tempUniqueEntry *result=first;
if (first == last) // empty range: nothing to do
return last;
while (++first != last)
{
if (strcmp(result->string,first->string))
*(++result)=*first;
}
return ++result;
}
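Putting the sort, unique and re-sort steps together, a sketch might look like the following. The names dedupe_keep_last, by_string and by_position are hypothetical, the unique step is inlined as a loop, and the dropped pointers are simply overwritten, so if the strings were malloc'ed you would need to free the duplicates yourself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int originalPosition;
    char *string;
} tempUniqueEntry;

/* Order by string; among equal strings put the highest original index first,
   so that it is the one kept when each run is collapsed. */
static int by_string(const void *a, const void *b)
{
    const tempUniqueEntry *x = a, *y = b;
    int c = strcmp(x->string, y->string);
    if (c != 0)
        return c;
    return (y->originalPosition > x->originalPosition)
         - (y->originalPosition < x->originalPosition);
}

/* Order by ascending original index. */
static int by_position(const void *a, const void *b)
{
    const tempUniqueEntry *x = a, *y = b;
    return (x->originalPosition > y->originalPosition)
         - (x->originalPosition < y->originalPosition);
}

/* Removes duplicates from strarray[0..count-1], keeping the highest-index copy
   of each string and the relative order of the survivors.
   Returns the new count, or -1 if memory cannot be allocated. */
int dedupe_keep_last(char *strarray[], int count)
{
    assert(strarray != NULL || count == 0);
    if (count < 2)
        return count;

    tempUniqueEntry *tmp = malloc((size_t)count * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        tmp[i].originalPosition = i;
        tmp[i].string = strarray[i];
    }

    qsort(tmp, (size_t)count, sizeof *tmp, by_string);

    /* Collapse each run of equal strings to its first entry. */
    int kept = 1;
    for (int i = 1; i < count; i++)
        if (strcmp(tmp[kept - 1].string, tmp[i].string) != 0)
            tmp[kept++] = tmp[i];

    /* Restore the original relative order of the survivors. */
    qsort(tmp, (size_t)kept, sizeof *tmp, by_position);
    for (int i = 0; i < kept; i++)
        strarray[i] = tmp[i].string;

    free(tmp);
    return kept;
}
```

On the example from the question ("Line 1", "Line 2", "Line 3", "Line 2", "Line 4") this leaves "Line 1", "Line 3", "Line 2", "Line 4" and returns 4.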
I don't quite understand your proposed algorithm (I don't understand what it means to add a string to an index in step 5), but what I would do is:
unsigned int i;
for (i = n; i > 0; i--)
{
unsigned int j;
if (strarray[i - 1] == NULL)
{
continue;
}
for (j = i - 1; j > 0; j--)
{
if (strarray[j - 1] != NULL && strcmp(strarray[i - 1], strarray[j - 1]) == 0)
{
strarray[j - 1] = NULL;
}
}
}
Then you just need to filter the null pointers out of your array (which I'll leave as an exercise).
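For completeness, the filtering step could be sketched like this. compact_nulls is a name I made up, and it assumes the duplicates have already been set to NULL by the loop above.

```c
#include <assert.h>
#include <stddef.h>

/* Slides the non-NULL entries of strarray[0..count-1] down to the front,
   keeping their order, and returns how many remain. */
int compact_nulls(char *strarray[], int count)
{
    assert(strarray != NULL || count == 0);
    int kept = 0;
    for (int i = 0; i < count; i++) {
        if (strarray[i] != NULL)
            strarray[kept++] = strarray[i];
    }
    /* Clear the now-unused tail so no stale pointers are left behind. */
    for (int i = kept; i < count; i++)
        strarray[i] = NULL;
    return kept;
}
```

You would then assign the return value back to strcount.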
A different approach would be to iterate backwards over the array and to insert each item into a (balanced) binary search tree as you go. If the item is already in the binary search tree, flag the array item (such as setting the array element to NULL) and move on. When you've processed the entire array, filter out the flagged elements as before. This would have slightly more overhead and would consume more space, but its running time would be O(n log n) instead of O(n^2).
Can you control the input as it is going into the array? If so, just do something like this:
int addToArray(const char * toadd, char * strarray[], int strcount)
{
const size_t toaddlen = strlen(toadd);
// Add new string to end.
// Remember to add one for the \0 terminator.
strarray[strcount] = malloc(toaddlen + 1);
strncpy(strarray[strcount], toadd, toaddlen + 1);
// Search for a duplicate.
// Note that we are cutting the new array short by one.
for(int i = 0; i < strcount; ++i)
{
if (strcmp(strarray[i], toadd) == 0)
{
// Found duplicate.
// Remove it and compact.
// Note use of new array size here.
free(strarray[i]);
for(int k = i + 1; k < strcount + 1; ++k)
strarray[k - 1] = strarray[k];
strarray[strcount] = NULL;
return strcount;
}
}
// No duplicate found.
return (strcount + 1);
}
You can always use the above function looping over the elements of an existing array, building a new array without duplicates.
PS: If you are doing this type of operation a lot, you should move away from an array as your storage structure and use a linked list instead. Linked lists are much more efficient for removing elements from a location other than the end.
Sort the array with qsort (run man 3 qsort in the terminal to see how it should be used) and then use strcmp to compare neighbouring strings and find duplicates.
If you want to maintain the original order, you could use an O(N^2) algorithm that nests two for loops: the outer loop picks an element, and the inner loop scans the rest of the array to find out whether the chosen element has a duplicate.

Resources