When to use Ø for states in DFA / NFA - theory

I am confused about the usage of "Ø" in DFAs / NFAs (let's talk about this in the context of NFA to DFA conversion).
Let's say that I have a NFA as follows:
[Image: NFA state diagram]
There isn't any transition defined for symbol "a" from state 1. So in the transition table, would I write that transition as "Ø", or do I write "1" because the machine stays in state "1" when there is no transition arrow for the symbol?
So would it be this:
| state | a | b | E |
|----------------|----------|-------|---------|
| 1 | Ø | {2} | {3} |
| 2 | {2,3} | {3} | Ø |
| 3 | {1} | Ø | Ø |
or:
| state | a | b | E |
|----------------|----------|-------|---------|
| 1 | {1} | {2} | {3} |
| 2 | {2,3} | {3} | {2} |
| 3 | {1} | {3} | {3} |
The choice to use "Ø" affects the final output. Now, we could construct the power set of the states and then calculate the epsilon closure, but let's cut it short: we will look directly at a controversial state that may or may not be present in the final DFA, and experiment with using "Ø" and with not using "Ø".
(1) Not using null: ( Table 2 )
State = {1,2}
From 1 and 2, on receiving a we go to 1,2,3
epsilon closure (1,2,3) = 1,2,3
(2) Using null: ( Table 1 )
State = {1,2}
From 1 and 2, on receiving a we go to {2,3}
epsilon closure (2,3) = 1,2,3
Now, the transitions might look the same here, but in other cases we would again end up in states where we have nowhere to go but to stay in the same state. If we choose to use "Ø", then we would have an additional state "Ø" in the final output. In this case, it just means that if we don't have a transition for some symbol, then we go to this state.
But if we don't, then our final output won't have the extra state "Ø". Here we don't draw a transition arrow for a symbol if we don't have a transition for it, just as for symbol "a" in state 1 in the diagram.
So, which one is correct: the one without the state "Ø" or the one with the state "Ø"?

The one with Ø. Consider a simple NFA:
/-> S -a-> (T)
| |
\-a-/
S is the start state, T is an accepting state. The alphabet is {a, b}.
When this machine reads the string b, it rejects. If the transition set for (S, b) were {S}, then the DFA this converts into would accept ba, which is obviously wrong.
The right transition states are as follows:
S, a, {S, T}
S, b, Ø
{S, T}, a, {S, T}
{S, T}, b, Ø
Ø, a, Ø
Ø, b, Ø
And you could draw this DFA in two ways: an incomplete DFA (where the null state is omitted) or a complete DFA (where the null state is present as an inescapable state). q1 = {S} is the start state, and q2 = {S, T} is the accepting state.
Incomplete:
q1 -a-> (q2)<-\
| |
\-a-/
Complete:
/a,b\
| /
\ v
/-> q3 <-\
b b
| |
q1 -a-> (q2)<-\
| |
\-a-/
Note that in the complete version, q3 has the same properties as Ø.
(In general an incomplete DFA can always be replaced with a complete DFA in this manner: create a new non-accepting state d, whose outgoing transitions are d, Σ, d, and whose incoming transitions are whatever transitions are missing from the original DFA. d is the "dump" state, and can be thought of as a "rejecting" state as opposed to an accepting state. As soon as an input leads into that state, it has no path to acceptance and so can be rejected.)
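The effect of treating a missing transition as Ø can be checked with a small subset-construction sketch. It is only an illustration for the two-state NFA above (no ε-moves); the names `subset_construction` and `accepts` and the dictionary encoding of the transition relation are assumptions made for this example.

```python
from itertools import chain


def subset_construction(alphabet, delta, start, accepting):
    """Build a complete DFA from an NFA without epsilon moves.

    delta maps (state, symbol) -> set of states; a missing key means Ø.
    """
    start_set = frozenset([start])
    dfa = {}
    todo = [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # Union of the targets of every NFA state in the current subset.
            target = frozenset(
                chain.from_iterable(delta.get((q, symbol), ()) for q in current)
            )
            dfa[current][symbol] = target
            todo.append(target)
    accept_sets = {subset for subset in dfa if subset & accepting}
    return dfa, start_set, accept_sets


def accepts(dfa, start_set, accept_sets, word):
    """Run the DFA on word and report whether it ends in an accepting subset."""
    state = start_set
    for symbol in word:
        state = dfa[state][symbol]
    return state in accept_sets


# The NFA from the answer: S loops on a and also moves to T on a.
nfa_delta = {("S", "a"): {"S", "T"}}
dfa, start_set, accept_sets = subset_construction("ab", nfa_delta, "S", {"T"})
```

With this encoding, `ba` is rejected because reading `b` from {S} leads to the empty subset, which plays the role of the dump state and can never be left.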


PostgreSQL / TypeORM: search array in array column - return only the highest arrays' intersection

Let's say we have 2 edges in a graph. Each of them has many events observed on it, and each event has one or several tags associated with it:
Let's say the first edge had 7 events with these tags: ABC, ABC, AC, BC, A, A, B.
Second edge had 3 events: BC, BC, C.
We want the user to be able to search
how many events occurred on every edge
by set of given tags, which are not mutually exclusive, nor they have a strict hierarchical relationship.
We represent this schema with 2 pre-aggregated tables:
Edges table:
+----+
| id |
+----+
| 1 |
| 2 |
+----+
EdgeStats table (which relates to the Edges table via edge_id):
+------+---------+-----------+---------------+
| id | edge_id | tags | metric_amount |
+------+---------+-----------+---------------+
| 1 | 1 | [A, B, C] | 7 |
| 2 | 1 | [A, B] | 7 |
| 3 | 1 | [B, C] | 5 |
| 4 | 1 | [A, C] | 6 |
| 5 | 1 | [A] | 5 |
| 6 | 1 | [B] | 4 |
| 7 | 1 | [C] | 4 |
| 8 | 1 | null | 7 | //null represents aggregated stats for given edge, not important here.
| 9 | 2 | [B, C] | 3 |
| 10 | 2 | [B] | 2 |
| 11 | 2 | [C] | 3 |
| 12 | 2 | null | 3 |
+------+---------+-----------+---------------+
Note that when the table has tags [A, B], for example, it represents the number of events that had at least one of these tags associated with them. So A OR B, or both.
Because user can filter by any combination of these tags, DataTeam populated EdgeStats table with all permutations of tags observed per given edge (edges are completely independent of each other, however I am looking for way to query all edges by one query).
I need to filter this table by tags that user selected, let's say [A, C, D]. Problem is we don't have tag D in the data. The expected return is:
+------+---------+-----------+---------------+
| id | edge_id | tags | metric_amount |
+------+---------+-----------+---------------+
| 4 | 1 | [A, C] | 6 |
| 11 | 2 | [C] | 3 |
+------+---------+-----------+---------------+
i.e. for each edge, the largest matching subset between what the user searched for and what we have in the tags column. Rows with id 5 and 7 were not returned because the information about them is already contained in row 4.
Why return [A, C] for an [A, C, D] search? Because there is no data on edge 1 with tag D, so the metric amount for [A, C] equals the one for [A, C, D].
How do I write a query to return this?
If you can just answer the question above, you can ignore what's below:
If I needed to filter by [A], [B], or [A, B], problem would be trivial - I could just search for exact array match:
query.where("edge_stats.tags = :filter",
{
filter: ["A", "B"],
}
)
However, in the EdgeStats table I don't have every tag combination the user can search by (because there would be too many), so I need to find a cleverer solution.
Here is list of few possible solutions, all imperfect:
1) try an exact match for all subsets of the user's search term - so if the user searches by tags [A, C, D], first try querying for [A, C, D]; if there is no exact match, try [C, D], [A, D], [A, C] and voila, we got the match!
2) use the <@ ("is contained by") operator:
.where(
"edge_stats.tags <@ :tags",
{
tags: ["A", "C", "D"],
}
)
This will return all rows whose tags are a subset of [A, C, D], so rows 4, 5, 7 and 11. Then it would be possible to filter out everything but the largest subset match in the code. But with this approach we couldn't use SUM and similar functions, and returning too many rows is not good practice.
3) an approach built on 2) and inspired by this answer:
.where(
"edge_stats.tags <@ :tags",
{
tags: ["A", "C", "D"],
}
)
.addOrderBy("edge.id")
.addOrderBy("CARDINALITY(edge_stats.tags)", "DESC")
.distinctOn(["edge.id"]);
For every edge, this finds all rows whose tags are a subset of [A, C, D] and keeps the largest match, i.e. the longest array (thanks to ordering by cardinality and selecting only one row per edge).
So returned rows indeed are 4, 11.
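The selection logic of this third approach can be traced outside the database. The following is a minimal sketch in plain Python, not the TypeORM query itself; the tuple layout `(id, edge_id, tags, metric_amount)` and the name `largest_subset_rows` are assumptions for illustration, with the data copied from the EdgeStats table above.

```python
ROWS = [
    (1, 1, ["A", "B", "C"], 7), (2, 1, ["A", "B"], 7), (3, 1, ["B", "C"], 5),
    (4, 1, ["A", "C"], 6), (5, 1, ["A"], 5), (6, 1, ["B"], 4), (7, 1, ["C"], 4),
    (8, 1, None, 7), (9, 2, ["B", "C"], 3), (10, 2, ["B"], 2), (11, 2, ["C"], 3),
    (12, 2, None, 3),
]


def largest_subset_rows(rows, search_tags):
    """Per edge, keep the row whose tags are the largest subset of search_tags."""
    wanted = set(search_tags)
    best = {}
    for row in rows:
        _, edge_id, tags, _ = row
        # Skip aggregate rows (tags is None) and rows with tags outside the search.
        if tags is None or not set(tags) <= wanted:
            continue
        current = best.get(edge_id)
        if current is None or len(tags) > len(current[2]):
            best[edge_id] = row
    return [best[edge_id] for edge_id in sorted(best)]


result_ids = [row[0] for row in largest_subset_rows(ROWS, ["A", "C", "D"])]
```

For the search [A, C, D] this keeps rows 4 and 11, matching the expected result in the question.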
This approach is great, but when I use it as one filtering part of a much larger query, I need to add a bunch of groupBy statements, and essentially it adds a bit more complexity than I would like.
I wonder if there could be a simpler approach that just gets the largest match between the array in the table's column and the array in the query argument?
Your approach #3 should be fine, especially if you have an index on CARDINALITY(edge_stats.tags). However,
"DataTeam populated EdgeStats table with all permutations of tags observed per given edge"
If you're using a pre-aggregation approach instead of running your queries on the raw data, I would recommend to also record the "tags observed per given edge", in the Edges table.
That way, you can
SELECT s.edge_id, s.tags, s.metric_amount
FROM "EdgeStats" s
JOIN "Edges" e ON s.edge_id = e.id
WHERE s.tags = array_intersect(e.observed_tags, $1)
using the array_intersect function from here.
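To make the idea concrete, here is a small Python sketch of what such a query computes. It assumes that `array_intersect` returns the elements of its first argument that also occur in the second, in the first argument's order, and that each edge stores its `observed_tags`; both are assumptions of this example, since `array_intersect` is a user-defined function rather than a PostgreSQL built-in, and only a few rows from the question are reproduced.

```python
def array_intersect(left, right):
    """Elements of left that also occur in right, keeping left's order."""
    right_set = set(right)
    return [item for item in left if item in right_set]


# Hypothetical observed_tags column on the Edges table.
observed_tags = {1: ["A", "B", "C"], 2: ["B", "C"]}

# A few EdgeStats rows, keyed by (edge_id, tags) -> metric_amount.
edge_stats = {
    (1, ("A", "C")): 6,
    (1, ("A",)): 5,
    (1, ("C",)): 4,
    (2, ("B", "C")): 3,
    (2, ("C",)): 3,
}


def lookup(search_tags):
    """Exact-match each edge's stats against observed_tags ∩ search_tags."""
    result = {}
    for edge_id, observed in observed_tags.items():
        key = (edge_id, tuple(array_intersect(observed, search_tags)))
        if key in edge_stats:
            result[edge_id] = (list(key[1]), edge_stats[key])
    return result


matches = lookup(["A", "C", "D"])
```

Because PostgreSQL compares arrays element by element, an exact match like this only works if the tags are stored in a consistent order in both tables.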

array formula with dates

I have four columns of dates (A, B, C, D). I want Excel to verify whether each period in "period 2" intersects any of the "period 1" rows 1, 2, 3, ...
For example, 20-28/01/2016 intersects 01-24/01/2016 AND 25/01-03/02/2016. The answer in this case in column E must be "wrong".
I think this has to be an array formula, because each cell must be checked against an entire column. If it could be done without an array formula I would be very happy, because array formulas slow down calculation considerably on my computer.
  |     A      |     B      |     C      |     D      |    E    |
  |        period 1         |        period 2         |         |
1 | 01/01/2016 | 24/01/2016 | 20/01/2016 | 28/01/2016 | "wrong" |
2 | 25/01/2016 | 03/02/2016 | 04/02/2016 | 10/02/2016 | "ok"    |
3 |
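The check being asked for can be stated independently of Excel. The sketch below is Python rather than a worksheet formula and treats the periods as inclusive date ranges, which is an assumption; the names `overlaps` and `check` are made up for this example. Two ranges intersect exactly when each one starts no later than the other ends.

```python
from datetime import date


def overlaps(start1, end1, start2, end2):
    """True if the inclusive ranges [start1, end1] and [start2, end2] intersect."""
    return start1 <= end2 and start2 <= end1


# Columns A/B (period 1) and C/D (period 2) from the two rows shown above.
period1 = [(date(2016, 1, 1), date(2016, 1, 24)), (date(2016, 1, 25), date(2016, 2, 3))]
period2 = [(date(2016, 1, 20), date(2016, 1, 28)), (date(2016, 2, 4), date(2016, 2, 10))]


def check(period2_row):
    """Return "wrong" if the period 2 range hits any period 1 range, else "ok"."""
    start2, end2 = period2_row
    hit = any(overlaps(start1, end1, start2, end2) for start1, end1 in period1)
    return "wrong" if hit else "ok"


column_e = [check(row) for row in period2]
```

For the two rows shown this yields "wrong" and "ok", matching column E.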

Creating hierarchical data (tree) structures in Neo4j using "tree keys"

I have imported data from a CSV file and created a lot of Nodes, all of which are related to other Nodes within the same data set based on a "Tree Number" hierarchy system:
For example, the Node with Tree Number A01.111 is a direct child of Node A01, and the Node with Tree Number A01.111.230 is a direct child of Node A01.111.
What I am trying to do is create unique relationships between Nodes that are direct children of other Nodes. For example Node A01.111.230 should only have one "IS_CHILD_OF" relationship, with Node A01.111.
I have tried several things, for example:
MATCH (n:Node), (n2:Node)
WHERE (n2.treeNumber STARTS WITH n.treeNumber)
AND (n <> n2)
AND NOT ((n2)-[:IS_CHILD_OF]->())
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n);
This example results in creating unique "IS_CHILD_OF" relationships but not with the direct parent of a Node. Rather, Node A01.111.230 would be related to Node A01.
I'd like to suggest another general solution, also avoiding a cartesian product as @InverseFalcon points out.
Let's indeed start by creating an index for faster lookup, and inserting some test data:
CREATE CONSTRAINT ON (n:Node) ASSERT n.treeNumber IS UNIQUE;
CREATE (n:Node {treeNumber: 'A01.111.230'});
CREATE (n:Node {treeNumber: 'A01.111'});
CREATE (n:Node {treeNumber: 'A01'});
Then we need to scan all nodes as potential parents, and look for children which start with the treeNumber of the parent (STARTS WITH can use the index) and have no dots in the "remainder" of the treeNumber (i.e. a direct child), instead of splitting, joining, etc.:
MATCH (p:Node), (c:Node)
WHERE c.treeNumber STARTS WITH p.treeNumber
AND p <> c
AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.'
RETURN p, c
I replaced the creation of the relationship by a simple RETURN for profiling purposes, but you can simply replace it by CREATE UNIQUE or MERGE.
Actually, we can get rid of the p <> c predicate and the + 1 on the length by pre-computing the actual prefix which should match:
MATCH (p:Node)
WITH p, p.treeNumber + '.' AS parentNumber
MATCH (c:Node)
WHERE c.treeNumber STARTS WITH parentNumber
AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.'
RETURN p, c
However, profiling that query shows that the index is not used, and there is a cartesian product (so we have an O(n^2) algorithm):
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 2 | 2 | 0 | c, p | p, c |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 2 | 2 | 26 | c, p, parentNumber | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{ AUTOSTRING1})) AND StartsWith(c.treeNumber,parentNumber) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Apply | 2 | 9 | 0 | p, parentNumber -- c | |
| |\ +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| | +NodeByLabelScan | 9 | 9 | 12 | c | :Node |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +Projection | 3 | 3 | 3 | parentNumber -- p | p; Add(p.treeNumber,{ AUTOSTRING0}) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 3 | 3 | 4 | p | :Node |
+--------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------------------------------------------------+
Total database accesses: 45
But if we simply add a hint like so
MATCH (p:Node)
WITH p, p.treeNumber + '.' AS parentNumber
MATCH (c:Node)
USING INDEX c:Node(treeNumber)
WHERE c.treeNumber STARTS WITH parentNumber
AND NOT substring(c.treeNumber, length(parentNumber)) CONTAINS '.'
RETURN p, c
it does use the index and we have something like an O(n*log(n)) algorithm (log(n) for the index lookup):
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +ProduceResults | 2 | 2 | 0 | c, p | p, c |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Filter | 2 | 2 | 6 | c, p, parentNumber | NOT(Contains(SubstringFunction(c.treeNumber,length(parentNumber),None),{ AUTOSTRING1})) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Apply | 2 | 3 | 0 | p, parentNumber -- c | |
| |\ +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| | +NodeUniqueIndexSeekByRange | 9 | 3 | 6 | c | :Node(treeNumber STARTS WITH parentNumber) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +Projection | 3 | 3 | 3 | parentNumber -- p | p; Add(p.treeNumber,{ AUTOSTRING0}) |
| | +----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 3 | 3 | 4 | p | :Node |
+-------------------------------+----------------+------+---------+----------------------+------------------------------------------------------------------------------------------+
Total database accesses: 19
Note that I did cheat a bit when introducing the WITH step creating the prefix earlier, as I noticed it improved the execution plan and DB accesses over
MATCH (p:Node), (c:Node)
USING INDEX c:Node(treeNumber)
WHERE c.treeNumber STARTS WITH p.treeNumber
AND p <> c
AND NOT substring(c.treeNumber, length(p.treeNumber) + 1) CONTAINS '.'
RETURN p, c
which has the following execution plan:
Compiler CYPHER 3.0
Planner RULE
Runtime INTERPRETED
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| Operator | Rows | DB Hits | Variables | Other |
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +Filter | 2 | 9 | c, p | NOT(p == c) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{ AUTOINT0}),None),{ AUTOSTRING1})) |
| | +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +SchemaIndex | 6 | 12 | c -- p | PrefixSeekRangeExpression(p.treeNumber); :Node(treeNumber) |
| | +------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabel | 3 | 4 | p | :Node |
+--------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------+
Total database accesses: 25
Finally, for the record, the execution plan of the original query I wrote (i.e. without the hint) was:
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +ProduceResults | 2 | 2 | 0 | c, p | p, c |
| | +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 2 | 2 | 21 | c, p | NOT(p == c) AND StartsWith(c.treeNumber,p.treeNumber) AND NOT(Contains(SubstringFunction(c.treeNumber,Add(length(p.treeNumber),{ AUTOINT0}),None),{ AUTOSTRING1})) |
| | +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +CartesianProduct | 9 | 9 | 0 | p -- c | |
| |\ +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | +NodeByLabelScan | 3 | 9 | 12 | c | :Node |
| | +----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabelScan | 3 | 3 | 4 | p | :Node |
+--------------------+----------------+------+---------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Total database accesses: 37
It's not the worst one: the one without the hint but with the pre-computed prefix is! This is why you should always measure.
I think we can improve on the query a bit. First, ensure you have either a unique constraint or an index on :Node.treeNumber, as you'll need that to improve your parent node lookups in this query.
Next, let's match on child nodes, excluding root nodes (assuming no .'s in the root's treeNumber) and nodes that have already been processed and have a relationship already.
Then we'll find each node's parent by the treeNumber using our index, and create the relationship. This assumes that a child treeNumber always has 4 more characters, including the dot.
MATCH (child:Node)
WHERE child.treeNumber CONTAINS '.'
AND NOT EXISTS( (child)-[:IS_CHILD_OF]->() )
WITH child, SUBSTRING(child.treeNumber, 0, SIZE(child.treeNumber)-4) as parentNumber
MATCH (parent:Node)
WHERE parent.treeNumber = parentNumber
CREATE UNIQUE (child)-[:IS_CHILD_OF]->(parent)
I think this query avoids a cartesian product as you may get from other answers, and should be around O(n) (someone correct me if I'm wrong).
EDIT
In the event that each subset of numbers in treeNumbers is NOT constrained to 3 (as in your description, actually, with 'A01.111.23'), then you need a different means of deriving the parentNumber. Neo4j is a little weak here, as it lacks both an indexOf() function as well as a join() function to reverse a split(). You may need the APOC Procedures library installed to allow access to a join() function.
The query to handle cases with variable counts of digits in the numeric subsets of treeNumber becomes this:
MATCH (child:Node)
WHERE child.treeNumber CONTAINS '.'
AND NOT EXISTS( (child)-[:IS_CHILD_OF]->() )
WITH child, SPLIT(child.treeNumber, '.') as splitNumber
CALL apoc.text.join(splitNumber[0..-1], '.') YIELD value AS parentNumber
WITH child, parentNumber
MATCH (parent:Node)
WHERE parent.treeNumber = parentNumber
CREATE UNIQUE (child)-[:IS_CHILD_OF]->(parent)
I think I just figured out a solution! (If someone has a more elegant one please do post)
I just realized that the "Tree Number" coding system always uses 3-digit numbers between the dots, e.g. A01.111.230 or C02.100. Therefore, if a Node is the direct child of another Node, its "Tree Number" should not only start with the Tree Number of the parent Node, it should also be 4 characters longer (one character for the dot '.' and 3 characters for the numeric value).
Therefore my solution that seems to do the job is:
MATCH (n:Node), (n2:Node)
WHERE (n2.treeNumber STARTS WITH n.treeNumber)
AND (length(n2.treeNumber) = (length(n.treeNumber) + 4))
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n);
For your requirement STARTS WITH won't work, since A01.111.23 does indeed start with A01 in addition to starting with A01.111.
The treeNumber is made up of several parts with '.' as the separator. Let's not make any assumptions about the maximum/minimum possible character lengths of the individual parts. What we need is to compare all but the last part of each node's treeNumber with that of the potential child node being tested. You can achieve this using Cypher's split() function as follows:
MATCH (n1:Node), (n2:Node)
WHERE split(n2.treeNumber,'.')[0..-1] = split(n1.treeNumber,'.')
CREATE UNIQUE (n2)-[:IS_CHILD_OF]->(n1);
The split() function splits a string, at each occurrence of a given separator, into a list of strings (parts). In this context the separator is '.' to split any treeNumber. We can select a subset of a list in Cypher using the syntax list[{startIndex}..{endIndex}]. Negative indices for reverse lookup are permitted, such as the one used in the above query.
This solution should generalize to all possible treeNumber values, in the format at hand, irrespective of number of parts and individual part lengths.
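The "all but the last part" rule can be sanity-checked outside Neo4j. This is a minimal Python sketch, not Cypher; the helper names `parent_of` and `child_edges` are made up for this example. It derives each node's parent key by dropping the last dotted part and looks it up in a set, so no pairwise comparison of nodes is needed.

```python
def parent_of(tree_number):
    """Return tree_number without its last dotted part, or None for a root."""
    head, separator, _ = tree_number.rpartition(".")
    return head if separator else None


def child_edges(tree_numbers):
    """List (child, parent) pairs for every node whose direct parent exists."""
    known = set(tree_numbers)
    edges = []
    for number in tree_numbers:
        parent = parent_of(number)
        if parent in known:
            edges.append((number, parent))
    return edges


edges = child_edges(["A01.111.230", "A01.111", "A01"])
```

Here A01.111.230 is linked only to A01.111, and A01.111 only to A01, regardless of how many characters each part has.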

Efficient algorithm for looping over all neighbor pairs (2 point cliques) in 2-D array

I need to loop over all (unordered) pairs of pixels in an image that are neighbors of each other without repetition. I am using an 8 point neighborhood. For example:
x,y| 0 1 2 3 4
---+---+---+---+---+---+
0 | | | | | |
+---+---+---+---+---+
1 | a | b | c | d | |
+---+---+---+---+---+
2 | e | f | g | h | |
+---+---+---+---+---+
3 | i | j | k | l | |
+---+---+---+---+---+
4 | | | | | |
+---+---+---+---+---+
The neighbors of pixel f are in the 3x3 square around it. Thus, g, for example, forms a 2 point clique with f. If I were to loop over all the rows and columns of the image, this clique would be counted twice, once when f is the center pixel and once when g is the center pixel. Similar inefficiencies would occur with the rest of the cliques.
So what I would like to do, is loop over all the cliques, rather than each pixel. If I were familiar with graph theory, I think some of the answers already given to similar questions would suffice, but as I am not, I would really appreciate any help that you can give with an efficient algorithm in layman's terms. Thanks in advance!
Loop the first point over all points. Inner loop the second point over the right, lower-left, lower, and lower-right neighbors (if they exist).
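The loop described above can be sketched as a Python generator. The name `neighbor_pairs` and the (row, column) coordinate convention are assumptions for this example. Visiting only the right, lower-left, lower, and lower-right neighbors of each pixel yields every unordered 8-neighborhood pair exactly once.

```python
def neighbor_pairs(rows, cols):
    """Yield each unordered pair of 8-connected pixels exactly once."""
    # Right, lower-left, lower, lower-right: the "forward" half of the 3x3 square.
    offsets = ((0, 1), (1, -1), (1, 0), (1, 1))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                r2, c2 = r + dr, c + dc
                # Keep only neighbors that fall inside the image.
                if r2 < rows and 0 <= c2 < cols:
                    yield (r, c), (r2, c2)


pairs = list(neighbor_pairs(3, 4))
```

The other four neighbors (left, upper-left, upper, upper-right) are covered when those pixels take their turn as the first point.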

Trouble understanding meaning of functional dependency notation (A -> BC)

I'm having a hard time visualizing exactly what A->BC means, mainly what exactly BC does.
For example, on a table "If A -> B and B -> C, then A -> C" would look like this, and the statement would be true:
A | B | C
1 | 2 | 3
1 | 2 | 3
What would A -> BC look like?
How would you show something like "If AB -> C, then A -> BC" is false?
Thanks!
EDIT:
My guess at it is that AB -> C means that C is dependent on both A and B, so the table would look like this:
A | B | C
1 | 2 | 3
1 | 2 | 3
Or this (which would be a counterexample for my question above):
A | B | C
1 | 2 | 4
1 | 3 | 4
And both would be true. But this would be false:
A | B | C
1 | 2 | 4
1 | 3 | 5
Is that the right idea?
In case you haven't already read this, it's an okay introduction to functional dependencies. It says:
Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
I find it helpful to read A -> B as "A determines B", and read A -> BC as "A determines B and C". In other words, given an A, you can uniquely determine the value of B and C, but it's not necessarily true that given a B and a C, you can uniquely determine the value of A.
Here's a simple example: a table with at least 3 columns, where A is the primary key and B and C are any other columns:
id | x | y
------------
1 | 7 | 4
2 | 9 | 4
3 | 7 | 6
To show that If AB -> C, then A -> BC is false, you just have to come up with a single counter-example. Here's one: a table where AB is the primary key (therefore by definition it satisfies AB -> C):
A | B | C
------------
1 | 1 | 4
1 | 2 | 5
2 | 1 | 6
2 | 2 | 4
However, it does not satisfy A -> B (because for A=1, B is both 1 and 2) and therefore, by the contrapositive of Decomposition, it does not satisfy A -> BC. (Bonus points: does it satisfy A -> C? Does it matter?)
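A functional dependency can also be tested mechanically on a concrete table. This is a minimal sketch in Python; the function name `holds` and the list-of-dicts table encoding are assumptions for this example. A dependency X -> Y holds if no two rows agree on X but differ on Y.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in lhs)
        value = tuple(row[column] for column in rhs)
        # The first row with this key fixes the value; later rows must agree.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The counterexample table from the answer above.
table = [
    {"A": 1, "B": 1, "C": 4},
    {"A": 1, "B": 2, "C": 5},
    {"A": 2, "B": 1, "C": 6},
    {"A": 2, "B": 2, "C": 4},
]

ab_determines_c = holds(table, ["A", "B"], ["C"])
a_determines_bc = holds(table, ["A"], ["B", "C"])
```

On this table the first check is true and the second is false, which is exactly what a counterexample to "If AB -> C, then A -> BC" needs.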
