Is Datalog equivalent to SQL? - datomic

If Datalog is based on first order logic which is equivalent to SQL, how come Datalog can express transitivity (which is inexpressible in SQL/first order logic)?
https://en.wikipedia.org/wiki/Datalog
This clearly means Datalog is more expressive than SQL, yet
http://www.learndatalogtoday.org/
says that it has the expressive power of SQL. Does this mean Datomic implements a subset of Datalog? Or is Datalog first-order logic with fixpoints? What am I missing here?

I think you are right. Datalog is first-order logic with fixed points, while classical SQL is pure first-order logic.
Practically, this comes from Datalog allowing recursion, while classical SQL has no construct for expressing recursion.
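As an illustration (a minimal sketch, assuming a hypothetical edge(src, dst) table): transitive reachability is exactly the kind of query that needs recursion, and modern SQL can only express it through the recursive common table expressions added in SQL:1999 (T-SQL omits the RECURSIVE keyword):

-- Transitive closure over a hypothetical edge(src, dst) table.
-- Classical, non-recursive SQL cannot express this query; a recursive CTE can.
WITH reachable (src, dst) AS (
    SELECT src, dst FROM edge                      -- base case: direct edges
    UNION ALL
    SELECT r.src, e.dst                            -- recursive step: extend a path by one edge
    FROM reachable AS r
    JOIN edge AS e ON e.src = r.dst
)
SELECT DISTINCT src, dst FROM reachable;
-- (With cycles in the data this needs a termination guard, e.g. T-SQL's MAXRECURSION option.)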


Explicit JOINs vs Implicit joins?

My Database Professor told us to use:
SELECT A.a1, B.b1 FROM A, B WHERE A.a2 = B.b2;
Rather than:
SELECT A.a1, B.b1 FROM A INNER JOIN B ON A.a2 = B.b2;
Supposedly Oracle doesn't like the JOIN syntax, because these JOIN syntaxes are harder to optimize than the WHERE restriction of the Cartesian product.
I can't imagine why this should be the case. The only performance issue could be that the parser needs to parse a few more characters, but that is negligible in my eyes.
I found these Stack Overflow questions:
Is there an Oracle official recommendation on the use of explicit ANSI JOINs vs implicit joins?
Explicit vs implicit SQL joins
And this sentence in the Oracle documentation: https://docs.oracle.com/cd/B19306_01/server.102/b14200/queries006.htm
Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator.
Can someone give me an up-to-date recommendation from Oracle, with a link? She doesn't accept Stack Overflow (anyone can answer here), and in her eyes the 10g documentation is outdated.
If I am wrong and Oracle really doesn't like JOINs now, then that's also fine, but I can't find any articles about it. I just want to know who is right.
Thanks a lot to everyone who can help me!
Your professor should speak with Gordon Linoff, who is a computer science professor at Columbia University. Gordon, and most SQL enthusiasts on this site, will almost always tell you to use explicit join syntax. The reasons for this are many, including (but not limited to):
Explicit joins make it easy to see what the actual join logic is. Implicit joins, on the other hand, obfuscate the join logic, by spreading it out across both the FROM and WHERE clauses.
The ANSI-92 standard recommends using modern explicit joins, and in fact deprecated the implicit join which your professor seems to be pushing.
Regarding performance, as far as I know, both versions of the query you wrote would be optimized to the same thing under the hood. You can always check the execution plans of both, but I doubt you would see a significant difference very often.
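If you want to verify that yourself, a minimal sketch in Oracle (reusing the hypothetical tables A and B from the question) would be to compare the plans:

EXPLAIN PLAN FOR
SELECT A.a1, B.b1 FROM A, B WHERE A.a2 = B.b2;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR
SELECT A.a1, B.b1 FROM A INNER JOIN B ON A.a2 = B.b2;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

On a current Oracle version you would expect both statements to produce the same plan.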
An average SQL query you will encounter in real business has 7-8 joins with 12-16 join conditions. Maybe one in every 10 or 20 queries involves nested joins or other more advanced cases.
Explicit join syntax is simply far easier to maintain, debug and develop, and those factors are critical for business software: the faster and safer, the better.
Implicit joins are somewhat easier to code if you build statements dynamically through application code. Perhaps there are other uses I am unaware of.
As with many non-trivial things there is no simple yes / no answer.
The first thing is that for trivial queries (such as the example in your question) it doesn't matter which syntax you use. The classic syntax is even more compact for simple queries.
But for non-trivial queries (say, more than five joins) you will learn the benefits of the ANSI syntax. The main benefit is that the join predicates are kept separate from the WHERE conditions.
Simple example: this is a completely valid query in the pre-ANSI syntax.
SELECT A.a1, B.b1
FROM A, B
WHERE A.a1 = B.b1 and
A.a1 = B.b1(+);
Is it an inner or an outer join? Furthermore, if this construct is scattered across a predicate with 10 other join conditions in the WHERE clause, it is very easy to misread.
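For comparison, the two possible readings, spelled out in ANSI syntax for the same hypothetical tables A and B, would be:

-- Read as an inner join:
SELECT A.a1, B.b1
FROM A
INNER JOIN B ON A.a1 = B.b1;

-- Read as an outer join (the (+) marker puts B on the optional side):
SELECT A.a1, B.b1
FROM A
LEFT OUTER JOIN B ON A.a1 = B.b1;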
Anyway, it would be very naïve to assume that those two syntax options are only syntactic sugar and that the resulting execution plan is identical for all queries, all data and all Oracle versions.
Yes, there were times (around Oracle 10) when you had to be careful. But with versions 12 and 18 I see no reason to be defensive, and I'm convinced it is safe to use the ANSI syntax, for the reasons given above: better overview and readability.
A final remark for your professor: if you get into the position of optimizing the WHERE restriction of a Cartesian product, you typically already have a performance problem. Do the thought experiment with a Cartesian product of four tables of 1,000 rows each: that is 1,000^4 = 10^12 intermediate rows.
There are rare occasions when the optimiser suffers from a bug when using the explicit JOIN syntax as opposed to the implicit one. For example, I was once unable to benefit from a join-elimination optimisation in Oracle 12c when using explicit joins, whereas the join was properly eliminated with the implicit join syntax. When working with views querying views querying views, a lack of join elimination can indeed cause performance issues. I've explained the concept of join elimination in a blog post, here.
That was a bug (and a rare one at that, these days), and not a good reason to avoid the explicit join syntax in general. I think in current versions of Oracle, there's no reason in favour of one or the other syntax other than personal taste, when join trees are simple. With complex join trees, the explicit syntax tends to be superior, as it is more clear, and some relationships (e.g. full outer joins or joins with complex join predicates) are not possible otherwise. But neither of these arguments is about performance.

MS Access Queries Conversion to Sql Server

I am converting lots of Access queries to SQL Server stored procedures, so the SQL needs to meet the T-SQL standard (IIF, for example).
Is there a tool that can convert big Access queries to T-SQL? What is the best way of doing this?
As far as a "tool" that will just convert the queries for you, I'm not aware of one. Neither is anyone on this thread or this site.
There are a couple places I can direct you, though, that can possibly help with the transition.
Here is a cheat sheet you can use as a quick glance when converting your queries.
If your queries use any [Forms]! references, there could also be an issue with that. (I've never tried it, but I am going to assume it doesn't work.)
This resource probably has the most detailed explanations of the things you might need to learn in SQL Server, from stored queries to handling NULLs and some of the other differences. There are also differences between MS Access SQL and T-SQL; Gordon Linoff briefly describes 10 important differences in his blog:
1. Access does not support the CASE statement, so conditional logic is done with the non-standard IIf() or Switch() functions (see the sketch after this list).
2. Access requires parentheses around each pair-wise join, resulting in a proliferation of nesting in FROM clauses that only serves to confuse people learning SQL.
3. Access join syntax requires the INNER for INNER JOIN. While it may be a good idea to use INNER for clarity, it is often omitted in practice (in other databases).
4. Access does not support FULL OUTER JOIN.
5. Access does not allow UNION or UNION ALL in subqueries.
6. Access requires the AS for table aliases. In most databases this is optional, and I prefer to only use AS for column aliases. Ironically, the use of AS for table aliases is forbidden in Oracle.
7. Access uses double quotes to delimit strings (as opposed to single quotes) and is the only database (to my knowledge) that uses & as a string concatenation operator.
8. Access uses * for the wildcard in LIKE rather than %.
9. Access allows BETWEEN <high value> AND <low value>. This is allowed in other databases, but it will always evaluate to false.
10. Access does not support window/analytic functions (using the OVER and PARTITION BY clauses).
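As an illustration of points 1, 7 and 8, a minimal sketch (the table and column names are made up) of how an Access expression might be rewritten for T-SQL:

-- Hypothetical Access SQL:
--   SELECT OrderID, IIf([Qty] > 10, "Bulk", "Single") FROM Orders WHERE CustomerName LIKE "A*"
-- T-SQL rewrite using CASE, single-quoted strings, and the % wildcard:
SELECT OrderID,
       CASE WHEN Qty > 10 THEN 'Bulk' ELSE 'Single' END AS OrderType
FROM dbo.Orders
WHERE CustomerName LIKE 'A%';

(Recent SQL Server versions also accept IIF directly, but CASE is the standard, portable form.)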
In sum, no, there is no tool that I have seen.

Query equivalence evaluation

My question is rooted in T-SQL and the SQL Server environment, but its scope is not confined to this technology. I am working on a database with quite complex business logic, with existing views and stored procedures, and new ones to be designed. By comparing different queries, or parts of them, I have a strong feeling that there are sections performing the same job with a different arrangement; but of course, to refactor the whole mess I need something more than a feeling, so I am trying to determine a way to demonstrate that two statements are equivalent.
An obvious but weak response could be to ascertain that the two queries A and B produce the same recordset: if A is a subset of B and B is a subset of A, they are the same recordset. But I am not sure this is a good idea because, of course, a recordset is not a query: the results could depend on the data and on specific parameter values. My question is: is there a method to prove the equivalence of two different queries? I would say yes, because the optimization performed by the database should rely on this. Could someone point me to documentation or books that dig into this? If there is no general method to prove equivalence, is there some smart approach, based on regression testing performed according to some effective heuristic, that does the job?
Edited later: would reverse engineering the queries (by hand?) into relational algebra be a better method to assess query equivalence than using other queries and/or the computer? And are there automated tools that help with this "reverse engineering"?
Thanks a lot for helping
You probably can't prove it in general; query equivalence is undecidable for arbitrary queries and NP-complete even for restricted classes such as conjunctive queries. Check this SO question on query equivalence (that one is about Oracle, but there are a couple of answers / links that should be relevant for you).
You can check the execution plans of the two queries. If they are the same, you have your answer!
You can only check it via the execution plan. Apart from that, I don't think there is any way to prove this.
You'll need to implement some "canonical query plan" generator for this (an "optimal query plan" as generated by the DBMS can be nondeterministic). In most cases, using alphabetical ordering of terms and tables as a tie-breaker will get you there.
I doubt you are going to be able to formally prove or disprove this, but my take on it would be to:
identify all use cases
identify all boundary values
identify all parameters
and derive a test plan from that. It would require you to
create test data for each case
run both queries against that data
compare the results
If you don't find any differences after testing, you can be reasonably assured that both statements are equivalent; a sketch of the comparison step is below.
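A minimal T-SQL sketch of that comparison step, assuming the two statements under test have been wrapped in hypothetical views dbo.ResultA and dbo.ResultB with identical column lists:

-- Rows returned by A but missing from B:
SELECT * FROM dbo.ResultA
EXCEPT
SELECT * FROM dbo.ResultB;

-- Rows returned by B but missing from A:
SELECT * FROM dbo.ResultB
EXCEPT
SELECT * FROM dbo.ResultA;

-- If both statements return zero rows, the result sets match on this test data.
-- Note that EXCEPT compares distinct rows, so differences in duplicate counts go undetected.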

Ordered input attribute for Microsoft Naive Bayes algorithm

Could someone explain to me why I get the following error message:
Mining structure column MyColumn has content type of Ordered that is not
supported by Microsoft Association or Microsoft Naive Bayes algorithms.
The documentation (Content Types (Data Mining)) states:
This content type is supported by all the data mining data types in Analysis Services. However, most algorithms treat ordered values as discrete values and do not perform special processing.
And specifically for Bayes (Microsoft Naive Bayes Algorithm Technical Reference):
Input attribute: Cyclical, Discrete, Discretized, Key, Table, and Ordered
And another question: which algorithms does the Ordered content type have an impact on? I mean, what changes if we use Ordered instead of just Discrete?
The ORDERED content type has no impact on any of the SSAS algorithms. It was added to the OLE DB for Data Mining specification in 1999 based on feedback from other data mining vendors. It is possible to write a custom algorithm that considers the ORDERED flag, but there is no practical way in SQL Server 2005 and beyond to actually order the discrete values (there was a way in SQL Server 2000, but it is not practical for 2005+ and likely doesn't work).
As an aside, the only time ordered discrete states are considered by any SSAS algorithm is when the Clustering algorithm handles discretized values.

How does an index work on a SQL User-Defined Type (UDT)?

This has been bugging me for a while and I'm hoping that one of the SQL Server experts can shed some light on it.
The question is:
When you index a SQL Server column containing a UDT (CLR type), how does SQL Server determine what index operation to perform for a given query?
Specifically I am thinking of the hierarchyid (AKA SqlHierarchyID) type. The way Microsoft recommends that you use it - and the way I do use it - is:
Create an index on the hierarchyid column itself (let's call it ID). This enables a depth-first search, so that when you write WHERE ID.IsDescendantOf(@ParentID) = 1, it can perform an index seek.
Create a persisted computed Level column and create an index on (Level, ID). This enables a breadth-first search, so that when you write WHERE ID.GetAncestor(1) = @ParentID, it can perform an index seek (on the second index) for this expression. (Both pieces of this setup are sketched below.)
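A minimal sketch of that setup, with hypothetical table, column and index names:

-- Hypothetical table using the recommended pair of indexes.
CREATE TABLE dbo.OrgNode
(
    ID    hierarchyid NOT NULL PRIMARY KEY,  -- depth-first index on the hierarchyid itself
    Level AS ID.GetLevel() PERSISTED,        -- persisted computed level
    Name  nvarchar(100) NOT NULL
);

-- Breadth-first index on (Level, ID):
CREATE INDEX IX_OrgNode_Breadth ON dbo.OrgNode (Level, ID);

DECLARE @ParentID hierarchyid = hierarchyid::Parse('/1/');

-- Depth-first query: can seek on the primary key index.
SELECT ID, Name FROM dbo.OrgNode WHERE ID.IsDescendantOf(@ParentID) = 1;

-- Breadth-first query: can seek on IX_OrgNode_Breadth.
SELECT ID, Name FROM dbo.OrgNode WHERE ID.GetAncestor(1) = @ParentID;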
But what I don't understand is how is this possible? It seems to violate the normal query plan rules - the calls to GetAncestor and IsDescendantOf don't appear to be sargable, so this should result in a full index scan, but it doesn't. Not that I am complaining, obviously, but I am trying to understand if it's possible to replicate this functionality on my own UDTs.
Is hierarchyid simply a "magical" type that SQL Server has a special awareness of, and automatically alters the execution plan if it finds a certain combination of query elements and indexes? Or does the SqlHierarchyID CLR type simply define special attributes/methods (similar to the way IsDeterministic works for persisted computed columns) that are understood by the SQL Server engine?
I can't seem to find any information about this. All I've been able to locate is a paragraph stating that the IsByteOrdered property makes things like indexes and check constraints possible by guaranteeing one unique representation per instance; while this is somewhat interesting, it doesn't explain how SQL Server is able to perform a seek with certain instance methods.
So the question again - how do the index operations work for types like hierarchyid, and is it possible to get the same behaviour in a new UDT?
The query optimizer team is trying to handle scenarios that don't change the order of things. For example, cast(someDateTime as date) is still sargable. I'm hoping that as time continues, they fix up a bunch of old ones, such as dateadd/datediff with a constant.
So... handling Ancestor is effectively like using the LIKE operator with the start of a string. It doesn't change the order, and you can still get away with stuff.
You are correct - HierarchyId and Geometry/Geography are both "magical" types that the Query Optimizer is able to recognize and rewrite the plans for in order to produce optimized queries - it's not as simple as just recognizing sargable operators. There is no way to simulate equivalent behavior with other UDTs.
For HierarchyId, the binary serialization of the type is special in order to represent the hierarchical structure in a binary-ordered fashion. It is similar to the mechanism used by the SQL Xml type and described in the research paper ORDPATHs: Insert-Friendly XML Node Labels. So while the QO rules that translate queries using IsDescendantOf and GetAncestor are special, the actual underlying index is a regular relational index on the binary hierarchyid data, and you could achieve the same behavior if you were willing to write your original queries to do range seeks instead of calling the simple methods.
