Aligning dataset columns in C

I have a dataset of two groups: Red and Green, and I want to compare the difference between ratios, but they have to be aligned first.
Original File ( First few rows of 200,000 entries)
A B C D
Red Ratio Green Ratio
1 0.35 1 0.21
2 0.45 2 0.235
3 0.45 3 0.154
4 0.235 4 0.156
6 0.156 5 0.146
7 0.668 6 0.154
8 0.44 7 0.148
9 0.446 8 0.148
10 0.354 9 0.199
11 0.154 10 0.143
12 0.49 12 0.148
After running the code, the values should be aligned, the "extras" deleted, and the columns shifted up.
A B C D
Red Ratio Green Ratio
1 0.35 1 0.21
2 0.45 2 0.235
3 0.45 3 0.154
4 0.235 4 0.156
6 0.156 6 0.154
7 0.668 7 0.148
8 0.44 8 0.148
9 0.446 9 0.199
10 0.354 10 0.143
12 0.49 12 0.148
15 0.146 15 0.87
17 0.113 17 0.113
19 0.44 19 0.448
This is the code I have so far: I am taking difference between A and C to check if they are 0, and adjusting them if they are not.
#include <stdio.h>
int deletemove(char column, int row)
{
// This script would delete the positions mentioned in the arguments, and shift the other values up.
}
int main(void)
{
//Opening input file for read/write
FILE *input;
input=fopen("/full/path/file.xlsx", "r");
if (input == NULL) {printf("error opening input file\n");}
//Store the values from file into an array
int colA[1024];
int colC[1024];
// read contents of columns A and C and store in an array
int ai;
for(ai=1; ai<1024; ai++)
{ fseek(input,ai,SEEK_SET);
colA[ai]=fgetc(input);
}
int ci;
for(ci=1; ci<1024; ci++)
{ fseek(input,ci,SEEK_SET);
colC[ci]=fgetc(input);
}
//Take difference between value of Column A and C to check if they are identical.
int j;
char A,B;
for (j = 1; j < 1024; j++)
{
int check = colA[j] - colC[j]; // check difference between two values in a column
if (check > 0)
deletemove(A,j); //delete values from column C and D
else if (check < 0)
deletemove(B,j); // delete values from column A and B
}
fclose(input); // close files
}
I need help implementing a delete row/column function and reading the values into arrays.
Also, is storing 200,000 values in an array a feasible method?
Thanks.

is storing 200,000 values in an array a feasible method?
Yes, as long as you don't put those arrays on the stack. Declaring a variable inside a function puts the variable on the stack(1). Contemporary (year 2016) desktops typically limit the stack size to a few megabytes, whereas the main memory is a few gigabytes.
So it's best to put large arrays into main memory. This can be done in a variety of ways:
use a global array, i.e. declare the array outside of any function
use a static array, i.e. declare the array with the static keyword
use a dynamically allocated array, i.e. use malloc to allocate the array
(You could also use a linked list. A linked list has the advantage that it can grow as needed; you don't need to know the space requirements in advance.)
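As a rough sketch of the three options listed above (the names `MAX_INDEX`, `red_ratio`, `green_ratio_static` and `alloc_ratios` are made up for illustration, and the bound of 200,000 is an assumption taken from the question):

```c
#include <stdlib.h>

#define MAX_INDEX 200000  /* assumed upper bound on the Red/Green index */

/* Option 1: global array, declared outside of any function. */
double red_ratio[MAX_INDEX + 1];

/* Option 2: static array inside a function; it is not placed on the stack. */
double *green_ratio_static(void)
{
    static double green_ratio[MAX_INDEX + 1];
    return green_ratio;
}

/* Option 3: dynamically allocated array; the caller must free() it.
   Returns NULL if the allocation fails. */
double *alloc_ratios(size_t count)
{
    return malloc(count * sizeof(double));
}
```

Global and static arrays start out zero-filled; memory from malloc is uninitialized, so fill it before reading from it.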
In your case, I would store the ratios in the arrays at the index given by the red/green value. Assuming that ratios are always positive numbers, I would initialize all of the entries in the arrays with -1.0. Then, as you read the file, store the ratios at the proper locations in the two arrays. For example, when you read the line
6 0.156 5 0.146
store 0.156 at index 6 in the red array, and store 0.146 at index 5 in the green array.
When all the values have been read from the file, you can simply scan the two arrays, and print the values where both arrays have a non-negative value.
(1) Ignoring oddball systems (e.g. small embedded systems) that don't have a normal stack.
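A minimal sketch of the approach described above, under some assumptions that are not in the question: the data has been exported from the spreadsheet to whitespace-separated text, no index exceeds `MAX_INDEX`, and header lines can simply be skipped. The helper names (`reset_ratios`, `store_line`, `print_aligned`) are hypothetical.

```c
#include <stdio.h>

#define MAX_INDEX 200000  /* assumption: largest Red/Green index in the file */

static double red[MAX_INDEX + 1];
static double green[MAX_INDEX + 1];

/* Marks every slot as "no value seen" using the -1.0 sentinel. */
void reset_ratios(void)
{
    long i;
    for (i = 0; i <= MAX_INDEX; i++)
        red[i] = green[i] = -1.0;
}

/* Parses one line of the form "redIdx redRatio greenIdx greenRatio".
   Returns 1 if the line was stored, 0 for headers or malformed lines. */
int store_line(const char *line)
{
    long r, g;
    double rr, gr;
    if (sscanf(line, "%ld %lf %ld %lf", &r, &rr, &g, &gr) != 4)
        return 0;
    if (r >= 0 && r <= MAX_INDEX) red[r] = rr;
    if (g >= 0 && g <= MAX_INDEX) green[g] = gr;
    return 1;
}

/* Prints the rows where both arrays hold a value; returns how many. */
long print_aligned(FILE *out)
{
    long i, written = 0;
    for (i = 0; i <= MAX_INDEX; i++) {
        if (red[i] >= 0.0 && green[i] >= 0.0) {
            fprintf(out, "%ld %g %ld %g\n", i, red[i], i, green[i]);
            written++;
        }
    }
    return written;
}
```

In a real program you would call reset_ratios() once, feed every line read with fgets to store_line(), and then call print_aligned() on the output file.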

Related

Finding key for minimum value and conditions in excel

This is my table (copied from the similar question Finding minimum value in index(match) array [EXCEL])
A B C D
tasmania 10 3 10
queensland 22 8 10
new south wales 10 12 12
northern territory 8 4 15
south australia 12 2 8
western australia 32 4 15
tasmania 72 6 16
I have criteria for B and C, and I want to retrieve the A with the lowest corresponding value D. Values in B, C and D can be duplicates, values in A can not.
Example:
B >= 8
C >= 4
Should result in "queensland" (lowest matching value is 10), but not "tasmania" (has the same cost)
I am currently trying this array formula:
{ =MIN(IF(B:B>=8;IF(C:C>=4;D;""));1) }
Which returns the correct lowest D, but since I am losing the information about A, I cannot retrieve the value for A.
This as an array formula should work for you:
=INDEX($A$1:$A$7,MATCH(MIN(IF($B$1:$B$7>=8,IF($C$1:$C$7>=4,$D$1:$D$7))),IF($B$1:$B$7>=8,IF($C$1:$C$7>=4,$D$1:$D$7)),0))
It should be noted that if you have Excel 2016 or Office365, you'll have access to the MINIFS function, which is probably better suited for this task (I don't actually have the newest version, so I am unable to test).

saving hashtable using c so that random access is faster

I am writing C code (call it database generation) that processes an input file and generates a number in the range [1, 10^8] along with a sequence of float values, whose length is fixed but unknown, followed by 3 integers. All values are separated by spaces.
Example:
19432 23.45 32.12 45.76 ...(156 such float values) 4 6 106
This will be one line of the database, where the first number is the hash index (1 to 10^8) and the last 3 integers denote the x, y coordinates and the document ID, respectively.
Our database is saved in file xyz which has following content
2341 34.67 43.13 ... (234 such float values) 5 8 123
2352 46.92 41.89 ... (51 such float values) 1 9 145
2352 46.92 41.89 ... (98 such float values) 2 7 12
2359 12.71 72.90 ... (141 such float values) 8 12 13
The starting number (hash index value) will always be in non-decreasing order in database as we proceed from one line to next.
I have another C code (call it retrieval) which takes hash index value as input and should output all lines starting with that value.
I have 2 questions:
How can I make sure that retrieval jumps directly to the line containing the requested hash index value, skipping the earlier lines of the database, so that its response is fast?
When I get another input file for the database and its hash index value is 2352, how do I add another line starting with 2352 at its proper position in the database?
I am considering the following approach, which is not ideal, as the database won't be organised in the required non-decreasing order of hash index values. Also, the database is split into 2 components. One contains byte offset entries for each hash index and the other is the database file presented above.
It involves
(1)byte-offset.txt of the form
2341 byte-pos-1
2352 byte-pos-2
2359 byte-pos-3
2352 byte-pos-4
(2)database.txt of the form
2341 34.67 43.13 ... (234 such float values) 5 8 123
2352 46.92 41.89 ... (51 such float values) 1 9 145
2359 12.71 72.90 ... (141 such float values) 8 12 13
2352 46.92 41.89 ... (98 such float values) 2 7 12
The only good thing about it is that new entries can be appended to the end of each file as the database grows when we get more data.
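For illustration only, the two-file approach described above could be sketched roughly as follows. The function names `append_record` and `print_matches` are hypothetical, both files are assumed to be open for reading and writing, and each database line is assumed to fit in 8192 bytes. Retrieval here still scans the whole offset file linearly, so it shows the mechanics rather than a fast lookup.

```c
#include <stdio.h>

/* Appends one record to the database file and its (hash, byte offset)
   pair to the offset file. Returns 0 on success, -1 on error. */
int append_record(FILE *db, FILE *idx, long hash, const char *rest)
{
    long pos;
    if (fseek(db, 0L, SEEK_END) != 0) return -1;
    if (fseek(idx, 0L, SEEK_END) != 0) return -1;
    pos = ftell(db);              /* offset where the new line starts */
    if (pos < 0) return -1;
    if (fprintf(db, "%ld %s\n", hash, rest) < 0) return -1;
    if (fprintf(idx, "%ld %ld\n", hash, pos) < 0) return -1;
    return 0;
}

/* Writes every database line whose hash index matches to out.
   Returns the number of matching lines. */
int print_matches(FILE *db, FILE *idx, long hash, FILE *out)
{
    char line[8192];              /* assumed maximum line length */
    long h, pos;
    int count = 0;
    rewind(idx);
    while (fscanf(idx, "%ld %ld", &h, &pos) == 2) {
        if (h != hash) continue;
        if (fseek(db, pos, SEEK_SET) != 0) break;   /* jump to the line */
        if (fgets(line, sizeof line, db) == NULL) break;
        fputs(line, out);
        count++;
    }
    return count;
}
```

Because the offsets come from ftell and are only ever passed back to fseek on the same file, this stays within what the C standard guarantees for file positioning.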

Reading and processing from text file to 2D integer array, with statements in between

I'm making a maze solver using Breadth-first search. Consider the following list of numbers in a text file
10 20
1 1
10 20
5 1
4 2
3 3
1 10
2 9
3 8
4 7
5 6
6 5
7 4
8 3
Where the first row denotes the size of my maze (10x20), the second row denotes the starting position coordinates (1x1), and the third row denotes the ending position (10x20). Every row after the third row represents the coordinate where a block in the maze will be (i.e. the solver will have to move around it).
Here's what this particular board will look like:
**********************
*s........*..........*
*........*...........*
*..*....*............*
*.*....*.............*
**....*..............*
*....*...............*
*...*................*
*..*.................*
*....................*
*...................e*
**********************
What I am trying to do:
If my text file has impossible coordinates for either the size or start/end coordinates, ignore those coordinates and continue processing input.
example:
10 0 => Invalid: Maze sizes must be greater than 0
15 7 => Maze becomes size 15 x 7
10 20 => Invalid: column 20 is outside range from 1 to 7
5 1 => Starting position is at position 5, 1
24 2 => Invalid: row 24 is outside of range from 1 to 15
3 3 => Ending position is at position 3, 3
1 10 => Invalid: column 10 is outside range from 1 to 7
2 9 => Invalid: column 9 is outside range from 1 to 7
3 8 => Invalid: column 8 is outside range from 1 to 7
4 7
5 6
5 1 => Invalid: attempting to block starting position
6 5
7 4
8 3
I know I'm supposed to use some fprintf or fscanf loop until the end of file is reached.
Can someone start me off in the right direction?
I want to print all coordinates in the file, with error messages further in the line, if necessary.
Is the problem you are trying to ask how to read all the points? If so you can do something like the following:
int n1, n2;
FILE *fp = fopen("myfile.txt", "r");
//...read first three lines and do what you need with them
//read rest of points; fscanf returns the number of values it converted,
//and EOF (which is non-zero) at end of file, so compare against 2
while (fscanf(fp, "%d %d", &n1, &n2) == 2) {
    if (checkPoints(n1, n2)) // check points are valid
        addPointsToBoard(n1, n2); // add to board
}
If you're asking how to implement something like checkPoints, I'd say you haven't given enough information about how you plan to implement your code for someone to help.
NOTE: This assumes you have a well formed input file. If you are concerned about invalid inputs you will need to do sanity checking
EDIT: Based upon a comment, here is a way you can do a sanity check on the matrix size input (first line)
int valid_size = 0;
while (1) {
    if (fscanf(fp, "%d %d", &n1, &n2) == 2)
        valid_size = checkMatrixSizes(n1, n2);
    else
        exit(1); // never found a valid matrix size in the file
    if (valid_size)
        break;
}
The above loop will continuously loop until checkMatrixSizes finds valid sizes (I would also suspect it would create your board, etc. The above is pseudo code and far from complete). You could do similar loops for the second and third inputs. It should be noted that this code simply ignores any invalid input and moves on, which I think is the behavior you want based upon your question. Other behaviors might include adjusting the input to the closest acceptable value (i.e. if a column is out of range, set column to the highest possible value).
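To make the "ignore invalid input and move on" behavior more concrete, here is one possible sketch of the per-pair check. The `Maze` struct and `handle_pair` function are hypothetical names, the messages are modeled on the example in the question, and rejecting a block on the ending position is my own assumption.

```c
#include <stdio.h>

typedef struct {
    int rows, cols;            /* both 0 until a valid size has been read */
    int have_start, have_end;
    int sr, sc, er, ec;        /* start and end coordinates */
} Maze;

/* Handles one "row col" pair and writes a message into msg (empty for
   an accepted block). Returns 1 if accepted, 0 if rejected. */
int handle_pair(Maze *m, int r, int c, char *msg, size_t n)
{
    msg[0] = '\0';
    if (m->rows == 0) {                       /* still looking for the size */
        if (r <= 0 || c <= 0) {
            snprintf(msg, n, "Invalid: Maze sizes must be greater than 0");
            return 0;
        }
        m->rows = r;
        m->cols = c;
        snprintf(msg, n, "Maze becomes size %d x %d", r, c);
        return 1;
    }
    if (r < 1 || r > m->rows) {
        snprintf(msg, n, "Invalid: row %d is outside range from 1 to %d", r, m->rows);
        return 0;
    }
    if (c < 1 || c > m->cols) {
        snprintf(msg, n, "Invalid: column %d is outside range from 1 to %d", c, m->cols);
        return 0;
    }
    if (!m->have_start) {
        m->have_start = 1; m->sr = r; m->sc = c;
        snprintf(msg, n, "Starting position is at position %d, %d", r, c);
        return 1;
    }
    if (!m->have_end) {
        m->have_end = 1; m->er = r; m->ec = c;
        snprintf(msg, n, "Ending position is at position %d, %d", r, c);
        return 1;
    }
    if (r == m->sr && c == m->sc) {
        snprintf(msg, n, "Invalid: attempting to block starting position");
        return 0;
    }
    if (r == m->er && c == m->ec) {           /* assumption: also rejected */
        snprintf(msg, n, "Invalid: attempting to block ending position");
        return 0;
    }
    return 1;                                 /* a valid block coordinate */
}
```

Calling this from a `while (fscanf(fp, "%d %d", &n1, &n2) == 2)` loop and printing the pair followed by the message would produce output along the lines of the example in the question.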

How to find subarray between min and max

I have a sorted array. Let's assume it is
{4,7,9,12,23,34,56,78}. Given min and max, I want to find the elements in the array between min and max in an efficient way.
Cases:
min = 23, max = 78; output: {23,34,56,78}
min = 10, max = 65; output: {12,23,34,56}
min = 0, max = 100; output: {4,7,9,12,23,34,56,78}
min = 30, max = 300; output: {34,56,78}
min = 100, max = 300; output: {} //empty
What is an efficient way to do this? I am not asking for code, just an algorithm I could use here, such as DP or exponential search.
Since it's sorted, you can easily find the lowest element greater than or equal to the minimum desired, by using a binary search over the entire array.
A binary search basically reduces the search space by half with each iteration. Given your example with a minimum of 10, you start as follows with the midpoint on the 12:
0 1 2 3 4 5 6 7 <- index
4 7 9 12 23 34 56 78
^^
Since the element you're looking at is greater than or equal to 10 and the one before it is lower than 10, you've found it.
Then, you can use a similar binary search but only over that section from the element you just found to the end. This time you're looking for the highest element less than or equal to the maximum desired.
On the same example as previously mentioned, you start with:
3 4 5 6 7 <- index
12 23 34 56 78
^^
Since that's less than 65 and the following one is also, you need to increase the pointer to the halfway point of 34..78:
3 4 5 6 7 <- index
12 23 34 56 78
^^
And there you have it, because that number is less and the following number is more (than 65).
Then you have the start and stop indexes (3 and 6) for extracting the values.
0 1 2 3 4 5 6 7 <- index
4 7 9 ((12 23 34 56)) 78
-----------
The time complexity of the algorithm is O(log N). Though keep in mind that this really only becomes important when dealing with larger data sets. If your data sets do consist of only about eight elements, you may as well use a linear search since (1) it'll be easier to write; and (2) the time differential will be irrelevant.
I tend not to worry about time complexity unless the operations are really expensive, the data set size gets into the thousands, or I'm having to do it thousands of times a second.
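For what it's worth, the two binary searches described above could be sketched in C like this. The function names are my own, and the sketch returns a start index plus a count rather than a start and stop index.

```c
#include <stddef.h>

/* First index i in sorted a[0..n) with a[i] >= key, or n if there is none. */
size_t lower_bound(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* First index i in sorted a[0..n) with a[i] > key, or n if there is none. */
size_t upper_bound(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Stores the index of the first element >= min in *first and returns how
   many elements lie in [min, max]. The second search only looks at the
   part of the array from *first to the end. */
size_t find_range(const int *a, size_t n, int min, int max, size_t *first)
{
    size_t lo = lower_bound(a, n, min);
    size_t hi = lo + upper_bound(a + lo, n - lo, max);
    *first = lo;
    return hi - lo;
}
```

With the array from the question, min = 10 and max = 65 give a start index of 3 and a count of 4, which corresponds to indexes 3 through 6.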
Since it is sorted, this should do:
List<Integer> subarray = new ArrayList<Integer>();
for (int n : numbers) {
if (n >= MIN && n <= MAX) subarray.add(n);
}
It's O(n) as you only look at every number once.

What format does matlab need for n-dimensional data input?

I have a 4-dimensional dictionary I made with a Python script for a data mining project I'm working on, and I want to read the data into Matlab to do some statistical tests on the data.
To read a 2-dimensional matrix is trivial. I figured that since my first dimension is only 4-deep, I could just write each slice of it out to a separate file (4 files total) with each file having many 2-dimensional slices, looking something like this:
2 3 6
4 5 8
6 7 3
1 4 3
6 6 7
8 9 0
This however does not work, and Matlab reads it as a single continuous 6 x 3 matrix. I even took a look at dlmread but could not figure out how to get it to do what I wanted. How do I format this so I can put 3 (or preferably more) dimensions in a single file?
A simple solution is to create a file with two lines only: the first line contains the target array size, the second line contains all your data. Then, all you need to do is reshape the data.
Say your file is
3 2 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
You do the following to read the array into the variable data
fid = fopen('myFile'); %# open the file (don't forget the extension)
arraySize = str2num(fgetl(fid)); %# read the first line, convert to numbers
data = str2num(fgetl(fid)); %# read the second line
data = reshape(data,arraySize); %# reshape the data
fclose(fid); %# close the file
Have a look at data to see how Matlab orders elements in multidimensional arrays.
Matlab stores data column-wise. So for your example (a 2x3x3 array, i.e. three slices of 2 rows by 3 columns), Matlab stores the first, second and third columns of the first slice, followed by the first, second and third columns of the second slice, and so on, like this:
2
4
3
5
6
8
6
1
7
4
3
3
6
8
6
9
7
0
So you can write the data out in this order from Python (I don't know how) and then read it into Matlab. Then you can reshape it back into a 2x3x3 array and you'll retain the correct ordering.
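I can't show the Python side, but the reordering itself is simple. Here is a sketch of it in C, assuming the slices are stored one after another in row-major order (as they appear in the question's file); the function name `to_column_major` is made up.

```c
#include <stddef.h>

/* Reorders `slices` matrices of size rows x cols, stored back to back in
   row-major order in src, into the column-wise order listed above:
   all columns of slice 1, then all columns of slice 2, and so on. */
void to_column_major(const int *src, int *dst,
                     size_t rows, size_t cols, size_t slices)
{
    size_t s, r, c, k = 0;
    for (s = 0; s < slices; s++)          /* slice varies slowest */
        for (c = 0; c < cols; c++)
            for (r = 0; r < rows; r++)    /* row varies fastest */
                dst[k++] = src[(s * rows + r) * cols + c];
}
```

For the data in the question (3 slices of 2 rows by 3 columns) this produces 2 4 3 5 6 8 6 1 7 4 3 3 6 8 6 9 7 0, which Matlab can then reshape with the size [2 3 3].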

Resources