Trouble decomposing to 3rd Normal Form (DB)

I've recently started studying databases, but I'm struggling with this specific part.
I've read the definition of each normal form, but I still can't seem to understand them. Here's an example that I couldn't solve properly:
**R(A,B,C,D,E,F)**
A->B; B->CD; AD->E
Solution: R1(*A*,B,E); R2(*B*,C,D); R3(*A*,*F*)
I can't understand why R3 is like that.

R3 is there to keep the decomposition in 2nd Normal Form and to avoid update anomalies. F is not determined by any of the dependencies, so the key of R is (A, F). If F were placed in R1, every A with multiple F values would produce duplicate rows of A, B, E, leaving the B and E values either ambiguous or completely redundant.
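To check this kind of reasoning mechanically, one option is to compute attribute closures. The following is only a minimal sketch: the function name `closure` and the encoding of the dependencies as pairs of strings are assumptions made for illustration, not part of the original exercise.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency when its whole left side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The dependencies from the question: A->B; B->CD; AD->E
FDS = [("A", "B"), ("B", "CD"), ("AD", "E")]

if __name__ == "__main__":
    print(sorted(closure("A", FDS)))   # attributes reachable from A alone
    print(sorted(closure("AF", FDS)))  # attributes reachable from A and F
```

A alone never reaches F, while A together with F reaches every attribute of R, which is why a relation keyed on (A, F) has to appear in the decomposition.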

Related

Basic interpretation of two table operations (row wise/not row wise) in Postgis

I'm new to PostGIS. I have been reading the docs, and they are usually very well written, at least for tables with 1 row D:
This is probably a silly question, or obvious to everyone who knows PostGIS, but please help me a little so I can get into it coming from other languages.
I have read a lot of:
https://postgis.net/workshops/postgis-intro/
Sadly, I still can't find an answer to a simple question: how a lot of functions behave in table-to-table operations.
I know R/sf and I'm trying to learn PostGIS, but every library has its own way of relating the inputs of its functions. For example, IIRC intersects exists in both sf and geopandas, but the behavior of the function is different even though it has the same name.
Let's pick an example:
https://postgis.net/docs/ST_Intersects.html
The function is defined as:
boolean ST_Intersects( geometry geomA , geometry geomB );
All the parameters are declared as geometry, which means each one can be a column or a single geometry, but we don't know what the behavior will be if the tables have more than 1 row; maybe where the docs say "geometries" the full table is interpreted as one big geometry.
Then I can go to this link:
https://postgis.net/docs/geometry_overlaps.html
There I can finally see a result that looks like a matrix operation, to some extent, and this is where the possibilities start to open up.
Is intersects a row-wise function?
Does intersects test every row from the first table against the second table? In that case, what would the return value look like? (It would need a table of rows(table1)*rows(table2) rows; this is not written in the docs.)
The above are just my questions and what confuses me about intersects; now let's go back to the specific issue.
How the functions relate their inputs is probably common sense in PostGIS, because the docs omit it, not only for intersects but also for others like intersection, disjoint, etc. I think all of them have the same behavior and it is just implicit.
So, does PostGIS work element-element? Table-element? Element-table? Table-table? Or is there some other interpretation? Or does every function have its own way that is not written down, or do I need to search somewhere else?
Thanks!

Can a language have a multiple solution in dfa diagram?

What I mean is: can there be multiple different diagrams for the same language? Can it be drawn with multiple solutions? Or does each language have only one DFA? I took a pop quiz today. I drew a solution and tried multiple strings. All of them were accepted, but I didn't get any points for it, and I didn't get any feedback from my TA on why it was considered wrong.
The question was: Let L = {w | w contains an odd number of 0s or at least two 1s}.
This is what I did (sorry, had to use MS Paint).

If you look a bit more carefully, 0101 is a string in your language, but it is not accepted by your automaton. Also, to answer your other question: yes, there can be multiple DFAs that accept the same language. A trivial example would be the language 0* (think about it if you are still interested, haha!).
P.S. - Just noticed a comment which pointed out the counter-example but I still went ahead. Sorry!
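Both points can be illustrated with a small simulation. This is only a sketch: the state names and the two constructions (`product_dfa` with 6 states, `merged_dfa` with 5) are assumptions chosen for illustration, not the asker's diagram or the TA's expected answer.

```python
def run(dfa, word):
    """Simulate a DFA given as (transition table, start state, accepting states)."""
    delta, start, accepting = dfa
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def product_dfa():
    """6 states: (parity of 0s, number of 1s capped at 2)."""
    delta = {}
    for parity in (0, 1):
        for ones in (0, 1, 2):
            delta[((parity, ones), "0")] = (1 - parity, ones)
            delta[((parity, ones), "1")] = (parity, min(ones + 1, 2))
    accepting = {(p, o) for p in (0, 1) for o in (0, 1, 2) if p == 1 or o == 2}
    return delta, (0, 0), accepting


def merged_dfa():
    """5 states: once two 1s are seen, parity no longer matters."""
    delta = {}
    for parity in (0, 1):
        delta[((parity, 0), "0")] = (1 - parity, 0)
        delta[((parity, 0), "1")] = (parity, 1)
        delta[((parity, 1), "0")] = (1 - parity, 1)
        delta[((parity, 1), "1")] = "two_ones"
    delta[("two_ones", "0")] = "two_ones"
    delta[("two_ones", "1")] = "two_ones"
    accepting = {(1, 0), (1, 1), "two_ones"}
    return delta, (0, 0), accepting


if __name__ == "__main__":
    for word in ("0101", "00", "0", "11"):
        print(word, run(product_dfa(), word), run(merged_dfa(), word))
```

The two automata have different diagrams but agree on every input, and both accept 0101 because it contains two 1s even though its number of 0s is even.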

What is a good approach to check if an item is in a very big hashset?

I have a hashset that cannot be loaded into memory in its entirety. Let's say it has parts A, B, and C, each of which could be loaded into memory, but not all at the same time.
I also have random entries coming in from time to time, and I can hardly tell which part an entry could belong to. So one approach could be to load A first and make a check, and then B, then C. But the next entry could belong to B, so I would have to unload C, then load A, then B... Hopefully this makes sense.
This clearly would be very slow so I wonder is there a better way to do that? (if using db is not an alternative)

I assume that you don't use any criterion to put a data entry into either A or B. In other words, A, B, C are just the result of dividing the whole data set into 3 equal parts. Am I right? If so, I recommend that you add a criterion when you add a new entry to your set. For example, if your entries are numbers, put those that start with 0-3 in A, those that start with 4-6 in B, and those that start with 7-9 in C. When you search for something, you know a priori whether you have to search in A, in B, or in C. If your entries are words, the same solution applies, but now the criterion is the first letter. Here it may be better to use not 3 sets but 26, the size of the English alphabet. Please note that you still have to keep one of the sets in memory. The advantage is that you do at most 1 load/unload operation per lookup and don't need to check all the sets, because you know which of them can actually hold your value. This idea is widely used in databases, where it is called partitioning. If your sets store neither numbers nor words but some complex objects, you can still invent some simple criterion.
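A rough sketch of this partitioning idea follows. Everything concrete in it is an assumption for illustration: the file layout (`part0.txt`, `part1.txt`, ...), the function names, and in particular the use of a CRC32 hash as the criterion in place of the first digit or first letter, since a hash tends to spread entries more evenly.

```python
import os
import zlib

NUM_PARTITIONS = 3  # hypothetical; pick it so one partition fits in memory


def partition_of(item, n=NUM_PARTITIONS):
    # CRC32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(item.encode("utf-8")) % n


def build_partitions(items, directory, n=NUM_PARTITIONS):
    """Append each item to the file of the partition it belongs to."""
    for item in items:
        path = os.path.join(directory, f"part{partition_of(item, n)}.txt")
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(item + "\n")


def load_partition(index, directory):
    """Load one partition into an in-memory set."""
    path = os.path.join(directory, f"part{index}.txt")
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {line.rstrip("\n") for line in fh}


def contains(item, directory, n=NUM_PARTITIONS):
    # Only the single partition that could hold the item is loaded.
    return item in load_partition(partition_of(item, n), directory)
```

In practice you would probably keep the most recently loaded partition cached, so that consecutive lookups hitting the same partition cost no extra load.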

Algorithm to match three sorted lists

So I've been trying to solve this for some hours now, but apparently there's still something missing. Maybe I'm thinking the wrong way, but I think it is a very complex problem:
I have three lists with items in a fixed order. To explain the problem, assume they contain items A to Z, mostly in the same order, with some exceptions where items are in different positions. Also, only one list contains all items; the others contain a subset and are missing certain items. A solution for this case would be sufficient, but it could also happen that no list contains all items and the lists are only partly overlapping sets. Even better would be an algorithm that solves the problem for multiple (> 3) lists.
So here's the example:
List 1: A B C D E F G H I J
List 2: A C D B F G
List 3: B C D E H F G
Now what I want is to match these three lists to visualize where the sort order is different and where are items that are missing. So the result should be:
List 1: A B C D E F G H I J
List 2: A C D B F G
List 3: B C D E H F G
So I immediately see, that List 2 has a B at the wrong position, A is missing from List 3, which also has H in the wrong position.
I was thinking about storing the result in a CSV to import into Excel. So the rows are:
A,A,
B,,B
C,C,C
...
Now my question is: how do I match the lists that way to generate the CSV output? The language I use is Java. So far I have failed on the case where a list other than the reference list contains items early that appear later in the reference list.
This is by the way a real-world problem.
Any suggestions are appreciated.

There are off-the-shelf tools for solving this problem, such as the Unix tool diff3. Trying to solve it for arbitrary numbers of lists is not advisable unless you are willing to invest a lot of time in developing heuristics, as you are then dealing with the NP-hard general case of the longest common subsequence problem.

If I understand your question correctly, you are essentially trying to solve a multiple sequence alignment problem, which is a well-researched topic within bioinformatics. There are several algorithms for it, some of which are based on the concept of Levenshtein distance (which would solve a two-array version of your problem) - I suggest you start there.
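For the simpler case where one list serves as the reference, a pairwise alignment can be sketched with a longest-common-subsequence table, which is closely related to the edit-distance idea mentioned above. This is not diff3 or a full multiple sequence alignment; the function names are hypothetical, items are assumed to be unique within each list, and Python is used here rather than the asker's Java purely for brevity.

```python
def lcs(a, b):
    """Return one longest common subsequence of the lists a and b."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover the subsequence.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def csv_rows(reference, *others):
    """One CSV row per reference item; blank where a list lacks it in order."""
    in_order = [set(lcs(reference, other)) for other in others]
    return [
        ",".join([item] + [item if item in kept else "" for kept in in_order])
        for item in reference
    ]


def out_of_place(reference, other):
    """Items of `other` that break the reference order."""
    kept = set(lcs(reference, other))
    return [item for item in other if item not in kept]


if __name__ == "__main__":
    list1 = "A B C D E F G H I J".split()
    list2 = "A C D B F G".split()
    list3 = "B C D E H F G".split()
    print("\n".join(csv_rows(list1, list2, list3)))
```

Items that appear in a list but fall outside its common subsequence with the reference (B in List 2, H in List 3) are the ones in the "wrong" position, so they show up as blanks in their reference row and can be reported separately with `out_of_place`.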

Prolog - Help fixing a rule

I have a database full of facts such as:
overground(newcrossgate,brockley,2).
overground(brockley,honoroakpark,3).
overground(honoroakpark,foresthill,3).
overground(foresthill,sydenham,2).
overground(sydenham,pengewest,3).
overground(pengewest,anerley,2).
overground(anerley,norwoodjunction,3).
overground(norwoodjunction,westcroydon,8).
overground(sydenham,crystalpalace,5).
overground(highburyandislington,canonbury,2).
overground(canonbury,dalstonjunction,3).
overground(dalstonjunction,haggerston,1).
overground(haggerston,hoxton,2).
overground(hoxton,shoreditchhighstreet,3).
For example, newcrossgate to brockley takes 2 minutes.
I then created a rule so that if I enter the query istime(newcrossgate,honoroakpark,Z). then prolog should give me the time it takes to travel between those two stations. (The rule I made is designed to calculate the distance between any two stations at all, not just adjacent ones).
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
istime(X,Y,T,Z):- overground(X,Y,Z), T1 is T + Z.
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
It seems to work perfectly from newcrossgate to the first few stations, e.g. newcrossgate to foresthill or sydenham. However, after testing newcrossgate to westcroydon, which takes 26 mins, I tried newcrossgate to crystalpalace and Prolog said it should take 15 mins, despite the fact that it's the next station after westcroydon. Clearly something's wrong here; it works for most of the stations but comes up with an occasional error in the time every now and again. Can anyone tell me what's wrong? :S

This is essentially the same problem as your previous question, the only difference is that you need to accumulate the time as you go.
One thing I see is that your "public" predicate, istime/3 tries to do too much. All it should do is seed the accumulator and invoke the worker predicate istime/4. Since you're looking for route/time in both directions, the public predicate should be just
istime( X , Y , Z ) :- istime( X , Y , 0 , Z ) .
istime( X , Y , Z ) :- istime( Y , X , 0 , Z ) .
The above is essentially the first clause of your istime/3 predicate
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
The remaining clauses of istime/3, the recursive ones:
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
should properly be part of istime/4 and have the accumulator present. That's where your problem is.
Give it another shot and edit your question to show the next iteration. If you still can't figure it out, I'll show you some different ways to do it.
Some Hints
Your "worker" predicate will likely look a lot like your earlier "find a route between two stations" exercise, but it will have an extra argument, the accumulator for elapsed time.
There are two special cases. If you use the approach you used in your "find a route between two stations" solution, the special cases are
A and B are directly adjacent.
A and B are connected via at least one intermediate stop.
There's another approach as well, that might be described as using lookahead, in which case the special cases are
A and B are the same, in which case you've arrived.
A and B are not the same and are connected by zero or more intermediate stops.
FWIW, You shouldn't necessarily expect the route with the shortest elapsed time or the minimal number of hops to be the first solution found. Backtracking will produce all the routes, but the order in which they are found has to do with the order in which the facts are stored in the database. A minimal cost search of the graph is another kettle of fish entirely.
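To make the accumulator idea concrete without giving away the Prolog, here is a sketch of the same traversal in Python. It follows the "lookahead" formulation (arrived when the two stations are the same), treats every fact as usable in both directions, and adds a visited set to avoid going back and forth. The names `build_graph` and `travel_times` are made up for illustration.

```python
FACTS = [
    ("newcrossgate", "brockley", 2),
    ("brockley", "honoroakpark", 3),
    ("honoroakpark", "foresthill", 3),
    ("foresthill", "sydenham", 2),
    ("sydenham", "pengewest", 3),
    ("pengewest", "anerley", 2),
    ("anerley", "norwoodjunction", 3),
    ("norwoodjunction", "westcroydon", 8),
    ("sydenham", "crystalpalace", 5),
    ("highburyandislington", "canonbury", 2),
    ("canonbury", "dalstonjunction", 3),
    ("dalstonjunction", "haggerston", 1),
    ("haggerston", "hoxton", 2),
    ("hoxton", "shoreditchhighstreet", 3),
]


def build_graph(facts):
    """Store each fact in both directions, since travel is symmetrical."""
    graph = {}
    for a, b, minutes in facts:
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    return graph


def travel_times(graph, start, goal, elapsed=0, visited=None):
    """Yield the total time of every route that never revisits a station."""
    visited = (visited or frozenset()) | {start}
    if start == goal:
        # Arrived: the accumulator already holds the total.
        yield elapsed
        return
    for nxt, minutes in graph.get(start, []):
        if nxt not in visited:
            # The accumulator grows on the way down, not on the way back.
            yield from travel_times(graph, nxt, goal, elapsed + minutes, visited)


if __name__ == "__main__":
    graph = build_graph(FACTS)
    print(list(travel_times(graph, "newcrossgate", "westcroydon")))
    print(list(travel_times(graph, "newcrossgate", "crystalpalace")))
```

With these facts, crystalpalace hangs off sydenham rather than westcroydon, so the two journeys share only the stretch up to sydenham.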

Have you tried to cycle through answers with ;? 26mins is not the shortest time between newcrossgate and westcroydon...
Edit: my bad! Apparently the shorter results were due to a bug in your code (see my comment about the 4th clause). However, your code is correct in that 15 mins is the shortest route between newcrossgate and crystalpalace. Just because there is a route that goes from newcrossgate to westcroydon and then to crystalpalace, that doesn't mean it's the shortest route, or the route your program will yield first.
Update: if you're having trouble finding answers for some routes, I'd suggest changing the 3rd clause to:
istime(X,Y,_,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
The reason is simple: your first clause swaps X with Y, which is good, since with that you're saying the routes are symmetrical. However, the 3rd clause does not benefit from that, because it's never called by the swapped one. Ignoring the 3rd argument (which you're not using anyway) and thus letting the 1st clause call the 3rd might fix your issue, since some valid routes that were not used previously will be now.
(also: I agree with Nicholas Carey's answer, it would be better to use the third argument as an accumulator; but as I said, ignoring it for now might just work)

To make it work you need to do the reverse of both journeys stated in your last clause.
Keep the predicate as it is, istime(X,Y,Z) and just make another clause containing the reverse journeys.
This way it works with all the stations. (Tried and Tested)
