Is there an approach to traverse array randomly? - c

I am trying to compare linear memory access to random memory access. I am traversing an array in the order of its indices to log the performance of linear memory access. However, to log performance with random memory access, I want to traverse my array randomly, i.e. arr[8], arr[17], arr[34], arr[2]...
Can I use pointer chasing to achieve this while ensuring that no index is accessed twice? Is pointer chasing the best approach in this case?

If your goal is to show that sequential access is faster than non-sequential access, pointer chasing only the latter is not a good way to demonstrate that. You would be comparing access via a single pointer plus a simple offset against dereferencing one or more pointers before offsetting.
To use pointer chasing, you'd have to apply it to both cases. Here's an example:
int arr[n], i;
int *unshuffled[n];
int *shuffled[n];
for(i = 0; i < n; i++) {
unshuffled[i] = arr + i;
}
/* I'll let you figure out how to randomize your indices */
shuffle(unshuffled, shuffled);
/* Do timing on these two loops */
for(i = 0; i < n; i++) {
do_stuff(*unshuffled[i]);
}
for(i = 0; i < n; i++) {
do_stuff(*shuffled[i]);
}
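For the shuffle step left open above, one common choice is a Fisher-Yates shuffle. The sketch below is only an illustration: the three-argument signature (with an explicit length n) is an assumption that differs from the two-argument call in the snippet, and rand() is used for brevity, so the small modulo bias is ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copy src into dst, then shuffle dst in place
   with Fisher-Yates. Each pointer from src ends up in dst exactly once. */
void shuffle(int *const *src, int **dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
    for (i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        int *tmp = dst[i - 1];
        dst[i - 1] = dst[j];
        dst[j] = tmp;
    }
}
```

With this variant the call would read shuffle(unshuffled, shuffled, n); seed the generator with srand() first if you want a different order on each run.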
If you want to time the direct access better, though, you could construct some simple formula for advancing the index instead of randomizing the access completely:
for(i = 0; i < n; i++) {
do_stuff(arr[i]);
}
for(i = 0; i < n; i++) {
do_stuff(arr[i / 2 + (i % 2) * (n / 2)]);
}
This will only work properly for even n as shown, but it illustrates the idea. You could go so far as to compensate for the extra flops in computing the index within do_stuff.
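Another simple index formula that works for any n, odd or even, is a fixed stride that shares no common factor with n: (i * stride) % n then visits every index exactly once. The helper names below are hypothetical, and the sketch assumes i * stride does not overflow size_t.

```c
#include <stddef.h>

/* Greatest common divisor (Euclid). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest stride >= want that is coprime with n (n >= 1). */
size_t coprime_stride(size_t n, size_t want)
{
    size_t s = want ? want : 1;
    while (gcd_size(s, n) != 1)
        s++;
    return s;
}

/* Index visited at step i of the strided walk over n elements. */
size_t strided_index(size_t i, size_t stride, size_t n)
{
    return (i * stride) % n;
}
```

A loop such as do_stuff(arr[strided_index(i, stride, n)]) keeps the per-element arithmetic small. A large stride defeats spatial locality, but the pattern stays regular, so a hardware prefetcher may still recognise it; treat it as "non-sequential" rather than truly random.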
Probably the most apples-to-apples test would be to literally access the indices you want, without loops or additional computations:
do_stuff(arr[0]);
do_stuff(arr[1]);
do_stuff(arr[2]);
...
do_stuff(arr[123]);
do_stuff(arr[17]);
do_stuff(arr[566]);
...
Since I'd imagine you'd want to test with large arrays, you can write a program to generate the actual test code for you, and possibly compile and run the result.

I can tell you that for arrays in C the access time is constant regardless of the index being accessed. There will be no difference between accessing them randomly or sequentially other than the fact that randomizing will in itself introduce additional computations.
But, to really answer your question, your best option is probably to build some kind of lookup array, shuffle it a few times, and use that array to get the next index. Obviously, by doing so you would be accessing two arrays, one sequentially and the other randomly, which makes the exercise pretty much useless.
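On the original question of pointer chasing without visiting an index twice: one way to avoid a second lookup array is to store the next index inside the array being traversed. If the permutation forms a single cycle, following it for n steps touches every element exactly once. The sketch below uses Sattolo's algorithm to build such a cycle; the function names are hypothetical and rand() is used only for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill next[0..n-1] with a random permutation that is one single cycle
   (Sattolo's algorithm), so chasing next[] never revisits an index early. */
void build_chase_cycle(size_t *next, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        next[i] = i;
    for (i = n; i > 1; i--) {
        size_t j = (size_t)rand() % (i - 1);   /* 0 <= j < i - 1 */
        size_t tmp = next[i - 1];
        next[i - 1] = next[j];
        next[j] = tmp;
    }
}

/* Follow the cycle from index 0 for n steps and return the final index.
   Each load depends on the previous one, which is what makes this
   a pointer-chasing access pattern. */
size_t chase(const size_t *next, size_t n)
{
    size_t idx = 0, step;
    for (step = 0; step < n; step++)
        idx = next[idx];
    return idx;
}
```

For a fair comparison, the sequential baseline can chase the trivial cycle next[i] = (i + 1) % n through the same chase() loop, so both cases do the same work per element.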

Related

Avoiding duplicates in a 2D array?

I am doing a program in C which needs to take in a set of values (integers) into a 2D array, and then performs certain mathematical operations on it. I have decided to implement a check in the program as the user is inputting the values to avoid them from entering values that are already present in the array.
I am however unsure of how to go about this check. I figured out I might need some sort of recursive function to check all the elements previous to the one that's being entered, but I don't know how to implement it.
Please find below a snippet of my code for illustrative purposes:
Row and col are values inputted by the user for the dimension of the array
for (int i=0; i<row;i++){
for (int j=0; j<col; j++){
scanf("%d", &arr[i][j]); //take in elements
}
}
for (int i = 0; i < row; i++)
{
for (int j = 0; i < col; j++)
{
if (arr[i][j] == arr[i][j-1]){
printf("Duplicate.\n");}
else {}
}
}
I know this is probably not correct but it's my attempt.
Any help would be much appreciated.
I would suggest that you store every element you read in a temporary 1D array. Every time you scan a new element, traverse the 1D array to check whether the value already exists. Although this is not optimal, it will at least be less expensive than traversing the 2D array every time.
Example:
int temp[SIZE]; /* SIZE must be at least row * col */
int elements = 0;
for (int i = 0; i < row; i++) {
    for (int j = 0; j < col; j++) {
        scanf("%d", &arr[i][j]); //take in elements
        /* compare against the values read so far, before storing the new one */
        for (int k = 0; k < elements; k++) {
            if (temp[k] == arr[i][j])
                printf("Duplicate.\n"); //or do whatever you wish
        }
        temp[elements] = arr[i][j];
        elements++;
    }
}
A balanced tree inserts and searches in O(log N) time.
Since the algorithms are quite simple & standard and were published in the seminal books by Knuth, there are plenty of implementations out there, including a clear and concise one at codereview.SE (which is thus automatically CC-BY-SA 3.0; do apply a bugfix in the answer). Using it (as well as virtually any other one) is simple: start with node* root = NULL;, then insert and search, and finally free_tree.
Asymptotically, the best method is a hash table with O(1) average time for both, but that is probably overkill (the algorithms are much more complex and the memory footprint is larger) unless you have a lot of numbers. For C++, there's a standard implementation, and there are plenty of third-party ones for C, too.
If your number of input values is small, even the tree may be overkill, and simply looking through the previous values would be fast enough. If your 2D array is contiguous in memory, you can access it as 1D with int* arr1d = (int*)&arr2d.
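For the small-input case, a minimal sketch of the "look through the previous values" approach might look like this. The function name is hypothetical, and it assumes the 2D array is one contiguous block (as with int arr[row][col]) so that it can be walked as a flat sequence of ints.

```c
#include <stddef.h>

/* Returns 1 if value occurs among the first count ints at vals, else 0. */
int seen_before(const int *vals, size_t count, int value)
{
    size_t k;
    for (k = 0; k < count; k++) {
        if (vals[k] == value)
            return 1;
    }
    return 0;
}
```

While reading element (i, j), the number of values already stored is i * col + j, so a check along the lines of seen_before(&arr[0][0], (size_t)i * col + j, v) can run before v is written into arr[i][j].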

How to remove certain elements from an array using a conditional test in C?

I am writing a program that goes through an array of ints and calculates stdev to identify outliers in the data. From here, I would like to create a new array with the identified outliers removed in order to recalculate the avg and stdev. Is there a way that I can do this?
There is a pretty simple solution to the problem that involves switching your mindset in the if statement (which isn't actually in a for loop it seems... might want to fix that).
float dataMinusOutliers[n];
int indexTracker = 0;
for (i=0; i<n; i++) {
if (data[i] >= (-2*stdevfinal) && data[i] <= (2*stdevfinal)) {
dataMinusOutliers[indexTracker] = data[i];
indexTracker += 1;
}
}
Note that this isn't particularly scalable and that the dataMinusOutliers array will potentially have quite a few unused slots. indexTracker tells you how many elements were actually stored (the last valid index is indexTracker - 1), so you can create yet another array of exactly that size and copy the kept values from dataMinusOutliers into it. Is there likely a more elegant solution? Yes. Does this work given your requirements though? Yup.
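The same idea can be wrapped in a function that returns the number of values kept. This is only a sketch: the name drop_outliers and its parameters are hypothetical, and it assumes the test should be made relative to the mean (within k standard deviations of it), which the snippet above does not show.

```c
#include <stddef.h>

/* Copy into out every value within k standard deviations of the mean.
   out must have room for n floats. Returns the number of values kept. */
size_t drop_outliers(const float *data, size_t n, float mean, float stdev,
                     float k, float *out)
{
    size_t kept = 0, i;
    for (i = 0; i < n; i++) {
        float d = data[i] - mean;
        if (d < 0)
            d = -d;                    /* absolute deviation from the mean */
        if (d <= k * stdev)
            out[kept++] = data[i];
    }
    return kept;
}
```

The returned count is what you would pass as the length when recomputing the average and standard deviation over out.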

Optimising C for performance vs memory optimisation using multidimensional arrays

I am struggling to decide between two optimisations for building a numerical solver for the poisson equation.
Essentially, I have a two dimensional array, of which I require n doubles in the first row, n/2 in the second n/4 in the third and so on...
Now my difficulty is deciding whether or not to use a contiguous 2d array grid[m][n], which for a large n would have many unused zeroes but would probably reduce the chance of a cache miss. The other, and more memory efficient method, would be to dynamically allocate an array of pointers to arrays of decreasing size. This is considerably more efficient in terms of memory storage but would it potentially hinder performance?
I don't think I clearly understand the trade-offs in this situation. Could anybody help?
For reference, I made a nice plot of the memory requirements in each case:
There is no hard and fast answer to this one. If your algorithm needs more memory than you expect to be given then you need to find one which is possibly slower but fits within your constraints.
Beyond that, the only option is to implement both and then compare their performance. If saving memory results in a 10% slowdown, is that acceptable for your use? If the version using more memory is 50% faster but only runs on the biggest computers, will it be used? These are the questions that we have to grapple with in Computer Science. But you can only look at them once you have numbers. Otherwise you are just guessing, and a fair amount of the time our intuition about optimizations is not correct.
Build a custom array that will follow the rules you have set.
The implementation will use a simple 1d contiguous array. You will need a function that will return the start of array given the row. Something like this:
int* Get( int* array , int n , int row ) //might contain logical errors
{
int pos = 0 ;
while( row-- )
{
pos += n ;
n /= 2 ;
}
return array + pos ;
}
Where n is the same n you described and is rounded down on every iteration.
You will have to call this function only once per entire row.
This function will never take more than O(log n) time, but if you want you can replace it with a single expression: http://en.wikipedia.org/wiki/Geometric_series#Formula
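As a sketch of that single expression: the rows before row r hold n + n/2 + ... + n/2^(r-1) = 2n(1 - 2^(-r)) elements, which in integer arithmetic is 2n - (2n >> r). This is exact only under the assumption that n is a power of two and r is at most log2(n) + 1; for other n the halving rounds down and the loop should be used instead. The function name is hypothetical.

```c
#include <stddef.h>

/* Start offset of a row when row r holds n >> r elements.
   Assumes n is a power of two and row <= log2(n) + 1. */
size_t row_start(size_t n, unsigned row)
{
    return 2 * n - ((2 * n) >> row);
}
```

For n = 16 this gives offsets 0, 16, 24, 28, 30 and 31 for rows 0 to 5, matching what the loop accumulates.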
You could use a single array and just calculate your offset yourself
size_t get_offset(int n, int row, int column) {
size_t offset = column;
while (row--) {
offset += n;
n >>= 1; /* each row is half as long as the previous one */
}
return offset;
}
double * array = calloc(get_offset(n, 64, 0), sizeof(double));
access via
array[get_offset(n, row, column)]

C - How can I sort and print an array in a method but have the prior unsorted array not be affected

This is for a Deal or No Deal game.
So in my main function I'm calling my casesort method as such:
casesort(cases);
My method looks like this, I already realize it's not the most efficient sort but I'm going with what I know:
void casesort(float cases[10])
{
int i;
int j;
float tmp;
float zero = 0.00;
for (i = 0; i < 10; i++)
{
for (j = 0; j < 10; j++)
{
if (cases[i] < cases[j])
{
tmp = cases[i];
cases[i] = cases[j];
cases[j] = tmp;
}
}
}
//Print out box money amounts
printf("\n\nHidden Amounts: ");
for (i = 0; i < 10; i++)
{
if (cases[i] != zero)
printf("[$%.2f] ", cases[i]);
}
}
So when I get back to my main it turns out the array is sorted. I thought void would prevent the method from returning a sorted array. I need to print out actual case numbers, and I do this by just skipping over any case that is populated with a 0.00. But after the first round of case picks I get "5, 6, 7, 8, 9, 10" printing out back in my MAIN. I need it to print the cases according to what has been picked. I feel like it's a simple fix; it's just that my knowledge of the specifics of C is still growing. Any ideas?
Return type void has nothing to do with prevention of array from being sorted. It just says that function does not return anything.
You see that the passed array itself is affected because an array decays to a pointer when passed to a function. Make a copy of the array and then pass it. That way you have the original list.
In C, arrays are effectively passed by reference, i.e. they're passed as a pointer to the first element. So when you pass cases into your function, you're actually giving it the original array to modify. Try creating a copy and sorting the copy rather than the actual array. Creating a copy wouldn't be bad as you have only 10 floats.
Instead of rolling your own sort, consider using qsort(), or std::sort() if you are actually using C++.
There are 2 obvious solutions. 1) Make a copy of the array and sort the copy (easy, wastes some memory, likely not a problem these days). 2) Create a parallel array of integers and perform an index sort, i.e., instead of sorting the original, you sort the index array and then access the original array through the index when you want the sorted version; otherwise use the raw unsorted array.
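A minimal sketch of the index-sort option, using qsort() from the standard library. The helper names are hypothetical, and the file-scope pointer is just one simple way to let the comparator see the values (it makes the function non-reentrant).

```c
#include <stdlib.h>

/* File-scope pointer so the qsort comparator can reach the values. */
static const float *sort_keys;

/* Compare two indices by the values they refer to. */
static int cmp_by_key(const void *a, const void *b)
{
    float x = sort_keys[*(const int *)a];
    float y = sort_keys[*(const int *)b];
    return (x > y) - (x < y);
}

/* Fill idx with 0..n-1, ordered so that
   values[idx[0]] <= values[idx[1]] <= ... ; values is not modified. */
void index_sort(const float *values, int *idx, int n)
{
    int i;
    for (i = 0; i < n; i++)
        idx[i] = i;
    sort_keys = values;
    qsort(idx, (size_t)n, sizeof idx[0], cmp_by_key);
}
```

Printing cases[idx[i]] for i from 0 to 9 then lists the amounts in ascending order, while cases itself keeps the order in which the boxes were assigned.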
Well, make a local copy of your input and sort it. Something like this:
void casesort(float cases[10])
{
float localCases[10];
memcpy(localCases, cases, sizeof(localCases)); /* needs <string.h>; sizeof(cases) would only be the size of a pointer here */
...
Then use localCases to do your sorting.
If you don't want the array contents to be affected, then you'll have to create a copy of the array and pass that to your sorting routine (or create the copy within the routine itself).
Arrays Are Different™ in C; see my answer here for a more detailed explanation.

Maintain a sorted array that a separate, iterative function can keep accessing

I'm writing code for a decision tree in C. Right now it gives me the correct result (0% training error, low test error), but it takes a long time to run.
The problem lies in how often I run qsort. My basic algorithm is this:
for every feature
    sort that feature column using qsort
    remove duplicate feature values in that column
    for every unique feature value
        split
        determine entropy given that split
save the best feature to split + split value
for every training_example
    if training_example's value for best feature < best split value, store in Left[]
    else store in Right[]
recursively call this function, using only the Left[] training examples
recursively call this function, using only the Right[] training examples
Because the last two lines are recursive calls, and because the tree can extend for dozens and dozens of branches, the number of calls to qsort is huge (especially for my dataset that has > 1000 features).
My idea to reduce the runtime is to create a 2d array (in a separate function) where each column is a sorted feature column. Then, as long as I maintain a vector of row numbers of the training examples in Left[] and Right[] for each recursive call, I can just call this separate function, grab the rows I want in the pre-sorted feature vector, and save the cost of having to qsort each time.
I'm fairly new to C and so I'm not sure how to code this. In MATLAB I can just have a global array that any function can change or access; I'm looking for something like that in C.
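One way the pre-sorting idea could carry through the recursion, sketched under assumptions: if each feature's row numbers are sorted once up front, a node can hand its children lists that are still sorted by splitting the parent's list in a single order-preserving pass, with no further qsort calls. The function name and the go_left flag array are hypothetical.

```c
#include <stddef.h>

/* Split a presorted list of row numbers into left and right lists,
   preserving the sorted order in both. go_left[row] is nonzero for
   rows that belong in Left[]. left and right must each have room for
   n entries. Returns the number of rows written to left. */
size_t split_sorted_rows(const int *sorted_rows, size_t n,
                         const unsigned char *go_left,
                         int *left, int *right)
{
    size_t nl = 0, nr = 0, i;
    for (i = 0; i < n; i++) {
        int row = sorted_rows[i];
        if (go_left[row])
            left[nl++] = row;
        else
            right[nr++] = row;
    }
    return nl;
}
```

Running this once per feature at each split costs O(n) per feature instead of O(n log n) for a fresh sort.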
Global arrays in C are totally possible. There are actually two ways of doing that. In the first case the dimensions of the array are fixed for the application:
#define NROWS 100
#define NCOLS 100
int array[NROWS][NCOLS];
int main(void)
{
int i, j;
for (i = 0; i < NROWS; i++)
for (j = 0; j < NCOLS; j++)
{
array[i][j] = i+j;
}
return 0;
}
In the second example the dimensions may depend on values from the input.
#include <stdlib.h>
int **array;
int main(void)
{
int nrows = 100;
int ncols = 100;
int i, j;
array = malloc(nrows*sizeof(*array));
for (i = 0; i < nrows; i++)
{
array[i] = malloc(ncols*sizeof(*(array[i])));
for (j = 0; j < ncols; j++)
{
array[i][j] = i+j;
}
}
}
Although access to the arrays in the two examples looks deceptively similar, the implementations are quite different. In the first example the array occupies one contiguous piece of memory, and the stride from one row to the next is the length of a whole row. In the second example each row is reached through a pointer to its own piece of memory, so the rows can be located in different areas of memory. In the second example rows might also have different lengths; in that case you would need to store the length of each row somewhere too.
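A third option combines run-time dimensions with contiguous storage: allocate one block and compute the position of each element yourself. The sketch below is an illustration with hypothetical helper names, not part of the examples above.

```c
#include <stdlib.h>

/* Allocate nrows * ncols zero-initialised ints as one contiguous block.
   Element (i, j) lives at position i * ncols + j.
   Returns NULL on failure; release the block with free(). */
int *alloc_grid(size_t nrows, size_t ncols)
{
    return calloc(nrows * ncols, sizeof(int));
}

/* Address of element (i, j) in a grid created by alloc_grid. */
int *grid_at(int *grid, size_t ncols, size_t i, size_t j)
{
    return &grid[i * ncols + j];
}
```

This needs a single malloc-family call and a single free, at the cost of writing *grid_at(g, ncols, i, j) instead of g[i][j].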
I don't fully understand what you are trying to achieve, because I'm not familiar with the terminology of decision tree, feature and the standard approaches to training sets. But you may also want to have a look at other data structures to maintain sorted data:
http://en.wikipedia.org/wiki/Red–black_tree maintains a more or less balanced and sorted tree.
AVL tree a bit slower but more balanced and sorted tree.
Trie a sorted tree on lists of elements.
Hash function to easily map a complex element to an integral value that can be used to sort the elements. Good for finding exact elements, but there is no real order in the elements itself.
P.S1: Coming from Matlab you may want to consider a different language from C to move to. C++ has standard libraries to support above data structures. Java, Python come to mind or even Haskell if you are daring. Pointer handling in C can be quite tedious and error prone.
P.S2: I'm unable to include a - in a URL on StackOverflow. So the Red-black tree links is a bit off and can't be clicked. If someone can edit my post to fix it, then I would appreciate that.
