How to remove supersets from a list of sets in a C program

I have a list of sets like the following in a text file. They are the output of my code, which generates the paths of a graph. Node 0 is connected with 1 and 5. Node 1 is connected with 0, 5 and 2. Node 5 is connected with 0, 1 and 6. Thus all the nodes are connected.
The graph looks like below:
0
1 5
2 6
3 7
4 8
9
( source 0 , destination 9)
Created paths:
0 1 2 3 4 9
0 1 2 3 7 8 9
0 1 2 6 7 3 4 9
0 1 2 6 7 8 9
0 5 6 2 3 4 9
0 5 6 2 3 7 8 9
0 5 6 7 3 4 9
0 5 6 7 8 9
I want to remove every line that is a superset of another line (a set that contains all the elements of another set, plus additional elements).
For the example above, removing the supersets should result in the following:
0 1 2 3 4 9
0 1 2 6 7 8 9
0 1 2 3 7 8 9
0 5 6 7 8 9
0 5 6 2 3 4 9
0 5 6 7 3 4 9
removed superset:
0 1 2 6 7 3 4 9
0 5 6 2 3 7 8 9
How can I do this in a C program? I have to do this for a large number of graph paths.

Consider two paths, path1 and path2, such that path2 > path1, i.e. path2 contains all nodes of path1 plus some additional nodes. That means that at some node1, path2 must diverge from path1, and at some later node2 it must reconnect. However, because path1 does not have any nodes that are not in path2, there can be no nodes in path1 between node1 and node2. In other words, node2 must be located just under node1. Consequently, the only way for path2 to diverge from path1 and later reconnect to it is to make a small loop: move one node right, one down and one left, OR the symmetrical path: one left, one down, one right. So path2 must contain the full 'square' of nodes:
node node
node node
The converse is also true: if some path contains a full square of nodes, it can be simplified by going directly down without moving left or right. So you just need to remove all paths that contain any square of adjacent nodes. In your example you need to remove paths that contain any of
1 5 2 6
2 6 3 7
3 7 4 8
Also, you must exclude paths which contain avoidable loops at the start and the end:
0 1 5
4 8 9
This solution, however, requires that you generate all possible paths from top to bottom. Right now you are not generating all of them; for example, the path
0 1 5 6 7 8 9
is missing.
Of course, it would be much easier to generate only the paths that are not supersets in the first place. When generating such paths, just don't allow branch switches directly under each other: two branch switches must be separated by at least one vertical move.
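Separately from the square-based rule above, a generic subset test can also be sketched in C. This is only a minimal sketch under stated assumptions: node ids are assumed to fit in 0..63 so that each path can be stored as a 64-bit mask, and the names path_mask, is_proper_superset and mark_minimal are hypothetical. The pairwise comparison is quadratic in the number of paths, so it may be too slow for very large inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a bitmask with one bit per node (assumption: ids are in 0..63). */
uint64_t path_mask(const int *nodes, size_t len)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < len; i++) {
        assert(nodes[i] >= 0 && nodes[i] < 64);
        mask |= (uint64_t)1 << nodes[i];
    }
    return mask;
}

/* Nonzero when set a contains every element of b plus at least one more. */
int is_proper_superset(uint64_t a, uint64_t b)
{
    return a != b && (a & b) == b;
}

/* keep[i] becomes 0 when masks[i] is a proper superset of another mask,
 * and 1 otherwise. Returns the number of paths that are kept. */
size_t mark_minimal(const uint64_t *masks, size_t n, int *keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (is_proper_superset(masks[i], masks[j])) {
                keep[i] = 0;
                break;
            }
        }
        kept += (size_t)keep[i];
    }
    return kept;
}
```

For the eight paths in the question, this marks 0 1 2 6 7 3 4 9 and 0 5 6 2 3 7 8 9 as removed and keeps the other six.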

Using Key operator to make a game but need more tree depth to create a more complex tree

I am working on a game using the Key operator to create simple parent tree nodes connected with children. Like (1 3 2 7 11 12) with 1 as a parent node and 3 2 7 11 12 children. The array has all the information via Key to create the nested array. Of course its extremely fast. But I actually need 2 or 3 more depth. I can create a different tree construction shown on the 'same' array - second image. This different encoding (1 2 1 1 2 3 1 3 3.....) allows arbitrarily nesting vector depth and works perfectly. - with just a simple array.
There might be enough information in the Key transformation of the array, plus more code to connect the children nodes, to get the needed depth. Are there any APL or Co-dfns functions for (1) transforming the array into the tree and (2) transforming it back? I am new to APL and am focusing on rectangular arrays; tree wrangling is further down the road. I need speed close to that of Key because the arrays and their nested arrays are very long.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
Using Key:
{⊂⍵}⌸1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
(1 3 4 7 11 12) (2 5 13) (6 8 9 10) (14) (15) (16) (17) (18)
Using maybe Key and something else....
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1. 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
2. (1 3 4 (7 14) 11 12) (2 5 13) (6 (8 15) (9 (16 17)) 10) (,18)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 2 2 2 2 2 1 7 8 3 10 11 10 10 10 15 9
(different array for same tree encoding)
(1(7(8 (9 17)))) (2 3 4 5 6) (10(11 12) 13 14 (15 16))
({⊂⍵}⌸⍠ 2) 1 2 1 1 2 3 1 3 3 3 1 1 2 7 8 9 16 4
Perhaps using Variant on Key down the road?
There are some ways to do this, but the best method will depend on what you want to do with the results. If you really do have very large arrays, then producing the "nested children" representation of arrays is going to be expensive no matter how you compute them, because the underlying representation is expensive (though, no more expensive than the same sort of representation in another language).
Section 3.2 of (Hsu 2019) discusses this in detail:
"A data parallel compiler hosted on the GPU". Hsu, Aaron W.
https://scholarworks.iu.edu/dspace/handle/2022/24749
Generally speaking, if you intend to work with the data in some way, it is almost always faster and easier to work directly with the parent vector or depth vector representation instead of first converting to a record-type style representation.
One technique is to query the data in parent vector form first, to identify the relevant nodes over which you intend to work, and only then to extract the children nodes for that limited set using primitives like membership (∊) or where (⍸).
If you can describe the sort of operations you intend to perform over these nested representations, there might be a better algorithm that does not require the conversion.
If you do wish to simply create the full record-type representation, there is some conversion code in (Hsu 2019). You can also look at the P2D and D2P functions in the Co-dfns compiler:
https://github.com/Co-dfns/Co-dfns/blob/master/src/codfns/P2D.aplf
https://github.com/Co-dfns/Co-dfns/blob/master/src/codfns/D2P.aplf
These may give you some additional help in converting between the formats.
If you need to convert directly between the parent and record-type representation, you can use something akin to this:
kids←{0=≢k←⍸p=⍵:⍵ ⋄ ⍵,∇¨k~⍵}¨
And apply it to the root nodes of your tree like this:
kids ⍸p=⍳≢p
where p is your parent vector.
I hope this helps!

VTK Structured Point file

I am trying to parse a VTK file in C by extracting its point data and storing each point in a 3D array. However, the file I am working with has 9 shorts per point and I am having difficulty understanding what each number means.
I believe I understand most of the header information (please correct me if I have misunderstood):
ASCII: Type of file (ASCII or Binary)
DATASET: Type of dataset
DIMENSIONS: dims of voxels (x,y,z)
SPACING: Volume of each voxel (w,h,d)
ORIGIN: Unsure
POINT DATA: Total number of points/voxels (dimx.dimy.dimz)
I have looked at the documentation and I still do not understand how to interpret the data. Could someone please help me understand, or point me to some helpful resources?
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 256 256 130
SPACING 1 1 1.3
ORIGIN 86.6449 -133.929 116.786
POINT_DATA 8519680
SCALARS scalars short
LOOKUP_TABLE default
0 0 0 0 0 0 0 0 0
0 0 7 2 4 5 3 3 4
4 5 5 1 7 7 1 1 2
1 6 4 3 3 1 0 4 2
2 3 2 4 2 2 0 2 6
...
thanks.
You are correct regarding the meaning of fields in the header.
ORIGIN corresponds to the coordinates of the 0-0-0 corner of the grid.
An example of a DATASET STRUCTURED_POINTS can be found in the documentation.
Starting from this, here is a small file with 6 values per point (declared as char in this example). Each line represents a point.
# vtk DataFile Version 2.0
Volume example
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 3 4 2
ASPECT_RATIO 1 1 1
ORIGIN 0 0 0
POINT_DATA 24
SCALARS volume_scalars char 6
LOOKUP_TABLE default
0 1 2 3 4 5
1 1 2 3 4 5
2 1 2 3 4 5
0 2 2 3 4 5
1 2 2 3 4 5
2 2 2 3 4 5
0 3 2 8 9 10
1 3 2 8 9 10
2 3 2 8 9 10
0 4 2 8 9 10
1 4 2 8 9 10
2 4 2 8 9 10
0 1 3 18 19 20
1 1 3 18 19 20
2 1 3 18 19 20
0 2 3 18 19 20
1 2 3 18 19 20
2 2 3 18 19 20
0 3 3 24 25 26
1 3 3 24 25 26
2 3 3 24 25 26
0 4 3 24 25 26
1 4 3 24 25 26
2 4 3 24 25 26
The first 3 fields can be displayed to understand the data layout: in the file, x changes faster than y, which changes faster than z.
If you wish to store the data in an array int a[2][4][3][6], just read it in a nested loop:
for(k=0;k<2;k++){             // z loop
    for(j=0;j<4;j++){         // y loop: y changes faster than z
        for(i=0;i<3;i++){     // x loop: x changes faster than y
            for(l=0;l<6;l++){ // the 6 values of one point
                fscanf(file,"%d",&a[k][j][i][l]); // "%d" expects an int*
            }
        }
    }
}
To read the header, fscanf() may be used as well:
int sizex,sizey,sizez;
char headerpart[100];
fscanf(file,"%99s",headerpart); // the width limit keeps the token inside headerpart
if(strcmp(headerpart,"DIMENSIONS")==0){
    fscanf(file,"%d%d%d",&sizex,&sizey,&sizez);
}
Note that fscanf() needs a pointer to the data (&sizex, not sizex). Since an array of char decays to a pointer to its first element, passing headerpart for "%99s" works fine; it is equivalent to passing &headerpart[0]. The function strcmp() compares two strings and returns 0 if they are identical.
As your grid seems large, smaller files can be obtained using the BINARY kind instead of ASCII, but watch out for endianness, as specified here.
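The header reading and the x-fastest layout described above can be combined into a small sketch. The helper names read_dimensions and point_index are hypothetical, and the code assumes the header tokens are separated by whitespace as in the files shown. If the file has one scalar per point (as I understand it, the default when the SCALARS line gives no component count), the values simply follow each other in this order regardless of where the line breaks fall, so the flat index below says which point a given value belongs to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan whitespace-separated tokens until "DIMENSIONS" is found, then read
 * the three sizes. Returns 1 on success and 0 if the keyword or the three
 * integers are missing. */
int read_dimensions(FILE *file, int *sizex, int *sizey, int *sizez)
{
    char token[100];
    while (fscanf(file, "%99s", token) == 1) {
        if (strcmp(token, "DIMENSIONS") == 0) {
            return fscanf(file, "%d%d%d", sizex, sizey, sizez) == 3;
        }
    }
    return 0;
}

/* Flat position of point (i, j, k) in the data section when x changes
 * fastest, then y, then z. */
size_t point_index(int i, int j, int k, int sizex, int sizey)
{
    assert(i >= 0 && i < sizex && j >= 0 && j < sizey && k >= 0);
    return (size_t)i + (size_t)sizex * ((size_t)j + (size_t)sizey * (size_t)k);
}
```

With DIMENSIONS 256 256 130, the value number point_index(i, j, k, 256, 256) (counting from 0) would be the scalar of voxel (i, j, k).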

Algorithm to normalize the elements of an array

If an array contains N elements (elements can be repeated) and the goal is to make all the elements equal by applying +1 to one element and -1 to another element in each iteration, how can we determine whether it is possible to normalize the array? What is the optimal algorithm to solve the problem?
Ex.
For the array 1 2 3, if I apply +1 on 1 and -1 on 3, the array becomes 2 2 2. That means it's possible in 1 iteration.
For the array 1 2 1, it's not possible to make all the elements equal.
First, since each iteration increases one number and decreases another, the sum never changes, so the only possible target value is the average.
If this average is a whole number, you can reach it with the iterations; if the average is fractional, you cannot.
The number of steps is the sum of the distances between each number and the target, divided by 2.
In every iteration, pick one number above the target and one below it, and apply the operations to them.
P.S. As noted in the comments, if all you want is answers to the following two questions:
1. Can it be done?
2. What will the value be?
Then the answers are:
1. Yes, provided the average is a whole number.
2. The value repeated in the whole array is the average.
Anyway, if you want the actual operations getting from the input to the target values, here's a longer example:
1 2 3 4 5 6 7 = 28, 28/7 = 4 (optimal target)
+           -
2 2 3 4 5 6 6
+           -
3 2 3 4 5 6 5
+           -
4 2 3 4 5 6 4
  +       -
4 3 3 4 5 5 4
  +       -
4 4 3 4 5 4 4
    +   -
4 4 4 4 4 4 4
6 steps. Let's total the distances from the target for the first row:
1 2 3 4 5 6 7
3 2 1 0 1 2 3 = 12, divided by 2 = 6
Here's the example from the comments on the question:
1 9 10 12 3 7 = 42 / 6 = 7 (optimal target)
Distances:
 1  9 10 12  3  7
 6  2  3  5  4  0 = 20, divided by 2 = 10 (steps)
 1  9 10 12  3  7
 +  -              step 1
 2  8 10 12  3  7
 +  -              step 2
 3  7 10 12  3  7
 +     -           step 3
 4  7  9 12  3  7
 +     -           step 4
 5  7  8 12  3  7
 +     -           step 5
 6  7  7 12  3  7
 +        -        step 6
 7  7  7 11  3  7
          -  +     step 7
 7  7  7 10  4  7
          -  +     step 8
 7  7  7  9  5  7
          -  +     step 9
 7  7  7  8  6  7
          -  +     step 10
 7  7  7  7  7  7
Here is a more pseudo-code like algorithm description:
1. Calculate SUM of all the elements
2. COUNT all the elements
3. If AVERAGE (SUM/COUNT) is not a whole number, a solution is not possible
4. STEPS = SUM(ABS(numberN - AVERAGE))/2
5. Each iteration, pick one number below AVERAGE and one above
6. Apply the + operation to the number below and the - operation to the number above
7. Repeat steps 5 and 6 until the target is achieved
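The feasibility check and the step count from the description above can be sketched in C as follows. The function name normalize_steps and the convention of returning -1 when the array cannot be normalized are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns the number of (+1, -1) iterations needed to make all n elements
 * equal, or -1 when the average is not a whole number. */
long long normalize_steps(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;

    long long count = (long long)n;
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];

    if (sum % count != 0)
        return -1;              /* fractional average: not achievable */

    long long average = sum / count;
    long long distance = 0;
    for (size_t i = 0; i < n; i++)
        distance += llabs((long long)a[i] - average);

    /* Each iteration fixes one unit above and one unit below the target. */
    return distance / 2;
}
```

For the arrays used in this answer, the function gives 1 for 1 2 3, -1 for 1 2 1, 6 for 1 2 3 4 5 6 7 and 10 for 1 9 10 12 3 7.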

Tacit function to multiply five consecutive number in a list: J, j701

I'm working on Project Euler problem 8, and I'm trying a simple brute force: multiply every 5 consecutive digits of the number, make a list of the results, and find the highest.
This is the code I'm currently trying to write in J:
n =: 731671765313x
NB. 'n' will be the complete 1000-digits number
itl =: (".#;"0#":)
NB. 'itl' transforms an integer into a list of its digits
N =: itl n
NB. just for short writing
takeFive =: 5 {. ] }.~ 1 -~ [
NB. this is a dyad; I got this code from 13 : '5{.(x-1)}.y'
NB. it takes a starting index and is applied to a list
How can I use takeFive for all the indices of N?
I tried:
(i.#N) takeFive N
|length error: takeFive
| (i.#N) takeFive N
but it doesn't work and I don't know why.
Thank you all.
1. The reason that (i.#N) takeFive N is not working is that you are essentially trying to run 5{. ((i.#N)-1) }. N, but x has to be used as an atom, not as a list. You can do that by setting the appropriate left and right rank (") of the verb:
(i.#N) (takeFive"0 _) N
7 3 1 6 7
7 3 1 6 7
3 1 6 7 1
1 6 7 1 7
6 7 1 7 6
7 1 7 6 5
1 7 6 5 3
7 6 5 3 1
6 5 3 1 3
5 3 1 3 0
3 1 3 0 0
1 3 0 0 0
2. Another way is to bind (&) your list (N) to takeFive and then run the bound verb over every element of i.#N. To do this, it's better to use the reversed version of takeFive, takeFive~:
((N&(takeFive~))"0) i.#N
7 3 1 6 7
7 3 1 6 7
3 1 6 7 1
1 6 7 1 7
6 7 1 7 6
7 1 7 6 5
1 7 6 5 3
7 6 5 3 1
6 5 3 1 3
5 3 1 3 0
3 1 3 0 0
1 3 0 0 0
or (N&(takeFive~)) each i.#N.
3. I think, though, that the infix dyad \ might serve you better:
5 >\N
7 3 1 6 7
3 1 6 7 1
1 6 7 1 7
6 7 1 7 6
7 1 7 6 5
1 7 6 5 3
7 6 5 3 1
6 5 3 1 3

Algorithm for Vertex connections From List of Directed Edges

The square of a directed graph G = (V, E) is the graph G2 = (V, E2) such that u→w is in E2 if and only if u ≠ w and there is a vertex v such that both u→v and v→w are in E. The input file simply lists the edges in arbitrary order as ordered pairs of vertices, with each edge on a separate line. The vertices are numbered in order from 1 to the total number of vertices.
*self-loops and duplicate/parallel edges are not allowed
Here is an example of input data:
1 6
1 4
1 3
2 4
2 8
2 6
2 5
3 5
3 2
3 6
4 7
4 5
4 6
4 8
5 1
5 8
5 7
6 3
6 4
7 5
7 4
7 6
8 1
Then the output would be:
1: 3 4 7 8 5 2 6
2: 5 6 3 4 1 8 7
3: 1 7 8 6 5 4
4: 5 6 8 7 3 1
5: 3 1 4 6
6: 2 7 5 8
7: 1 5 6 8 3 4
8: 6 4 3
I'm writing the code in C.
My plan is to run through the file, see how many vertices there are and then allocate an array of pointers. Then I go through the list again, searching for the lines that start with a 1, and look at where those corresponding numbers lead. If the result is not a duplicate or the same number (1), I add it to a linked list hanging off the array of pointers. I will do this for every vertex number in the file.
However, I feel this is terribly inefficient, and not the best way to go about doing this. If anyone has any other suggestions I would be extremely grateful.
If I get it right, you want to build, for each node, a result set listing all nodes at distance one and two from it.
Therefore, you can hold the edges in an adjacency matrix made of bit arrays, where a bit is one when an edge exists and zero if not.
Now you can multiply this matrix with itself. In this case, multiplying means ANDing a row with a column element by element; the resulting cell is one if any pair gives one.
A small example (sorry, don't know how to insert a matrix properly):
0 1 0     0 1 0     0 0 1
0 0 1  x  0 0 1  =  1 1 0
1 1 0     1 1 0     0 1 1
This matrix contains a one for every node reachable in two steps; it is simply the adjacency matrix for two steps instead of one. If you now OR this matrix with your initial matrix, you get a matrix that holds all paths of length one and two.
This approach has multiple advantages. First, bit operations are very fast: the CPU processes many bits in parallel, and for each result cell you can stop as soon as one pair gives one.
Furthermore, it is well documented how to compute matrix multiplication in parallel.
You can easily compute paths of any other length. For length k, compute:
A^k = A^(k-1) * A
Hope that helps.
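The bit-matrix product described above can be sketched in C with one machine word per row. This is a minimal sketch, assuming at most 64 vertices and 0-based vertex ids (so vertex 1 of the input file becomes bit 0); the name square_graph is hypothetical. It computes the two-step matrix only and clears the diagonal, since the definition requires u ≠ w. ORing the result with the original rows, as suggested above, would add the one-step neighbours as well.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VERTICES 64

/* adj[u] has bit v set when the edge u -> v exists.
 * sq[u] receives bit w when some v gives u -> v and v -> w, with w != u. */
void square_graph(const uint64_t *adj, int n, uint64_t *sq)
{
    assert(n >= 0 && n <= MAX_VERTICES);
    for (int u = 0; u < n; u++) {
        uint64_t reach = 0;
        for (int v = 0; v < n; v++) {
            if ((adj[u] >> v) & 1)
                reach |= adj[v];           /* OR in the whole row of v */
        }
        sq[u] = reach & ~((uint64_t)1 << u);   /* drop u -> u */
    }
}
```

For the sample input, vertex 5 ends up with the neighbours 1, 3, 4 and 6, which agrees with the expected output in the question.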
