SQL join multiple tables - result set not expected - sql-server

I am working in SQL Server 2008. I have 4 tables I want to join. Let us call them tables A, B, C, and D. B, C, and D are all subsets of table A. There could be some records that are common amongst B, C, and D. My goal is to select all records in A that are not in B, C, or D. So, I think the correct query to run is:
SELECT
A.x
FROM A
LEFT JOIN B
ON A.x = B.y
LEFT JOIN C
ON A.x = C.z
LEFT JOIN D
ON A.x = D.i
WHERE
(
(B.y IS NULL)
AND
(C.z IS NULL)
AND
(D.i IS NULL)
)
The problem I am having is that I know that there are some records in table B that are returning in this result set which should not be. (The same could hold for tables C and D as well.) So, something must be wrong with my query. My best guess is that the joins are vague. The first one should give me all records in A that are not in B. Similarly, the second one should give me all records in A that are not in C. Because I have used AND in the WHERE clause, then I should essentially be returning only the records that are common to each of the joins. But, something is going wrong. How do I correct this?

Try this:
SELECT x FROM A
EXCEPT
SELECT x FROM
(
SELECT y FROM B UNION
SELECT z FROM C UNION
SELECT i FROM D
) T(x)
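The two formulations can be compared side by side on a tiny data set. The following is only a sketch: it runs against SQLite through Python's sqlite3 module rather than SQL Server, the sample rows are invented, and the derived-table alias T(x) is replaced by a plain column alias because SQLite does not accept that syntax. On clean data both queries agree, so if rows from B still appear on the real tables, one possible place to look is the data itself (for example mismatched types or padding in the join columns) rather than the join logic. Note also that EXCEPT removes duplicate rows, while the LEFT JOIN form does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER);
    CREATE TABLE B (y INTEGER);
    CREATE TABLE C (z INTEGER);
    CREATE TABLE D (i INTEGER);
    -- Invented sample data: B, C and D are subsets of A and overlap on 2.
    INSERT INTO A VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO B VALUES (1), (2);
    INSERT INTO C VALUES (2), (3);
    INSERT INTO D VALUES (4);
""")

# Anti-join written with LEFT JOIN ... IS NULL, as in the question.
left_join_sql = """
    SELECT A.x FROM A
    LEFT JOIN B ON A.x = B.y
    LEFT JOIN C ON A.x = C.z
    LEFT JOIN D ON A.x = D.i
    WHERE B.y IS NULL AND C.z IS NULL AND D.i IS NULL
    ORDER BY A.x
"""

# Set-based version, as in the answer.
except_sql = """
    SELECT x FROM A
    EXCEPT
    SELECT x FROM (
        SELECT y AS x FROM B UNION
        SELECT z FROM C UNION
        SELECT i FROM D
    )
    ORDER BY x
"""

left_rows = [r[0] for r in conn.execute(left_join_sql)]
except_rows = [r[0] for r in conn.execute(except_sql)]
print(left_rows, except_rows)
```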


PostGIS minimum distance between two large sets of points

I have two tables of points in PostGIS, say A and B, and I want to know, for every point in A, what is the distance to the closest point in B. I am able to solve this for small sets of points with the following query:
SELECT a.id, MIN(ST_Distance_Sphere(a.geom, b.geom))
FROM table_a a, table_b b
GROUP BY a.id;
However, I have a couple million points in each table and this query runs indefinitely. Is there a more efficient way to approach this? I am open to getting an approximate distance rather than an exact one.
Edit: A slight modification to the answer provided by JGH to return distances in meters rather than degrees if points are unprojected.
SELECT
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;
Your query is slow because it computes the distance between every pair of points without using any index. You could rewrite it to use the <-> operator, which uses the spatial index when it appears in the ORDER BY clause.
select a.id,closest_pt.id, closest_pt.dist
from tablea a
CROSS JOIN LATERAL
(SELECT
id ,
a.geom <-> b.geom as dist
FROM tableb b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS closest_pt;
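The Edit in the question points out that distances on unprojected points come back in degrees rather than meters. A rough sketch of why that matters: the ground length of one degree of longitude depends on latitude. The helper below is hypothetical (haversine_m is not a PostGIS function) and assumes a spherical Earth with a mean radius of 6,371,000 m, which may differ from the radius PostGIS uses internally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude at the equator versus at 60 degrees north.
at_equator = haversine_m(0, 0, 1, 0)
at_lat_60 = haversine_m(0, 60, 1, 60)
print(round(at_equator), round(at_lat_60))
```

Both pairs of points are exactly one degree apart, yet the second pair is only about half as far apart on the ground, which is why a degree-based distance is hard to interpret directly.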

MS SQL Server - Table Dependency Hierarchy Group

I have a database with approximately 500 tables, and there are a lot of foreign key relationships among these tables.
I need to form groups of related tables, i.e. all related tables should end up in one group, and no group should be related to any other group.
For example:
There are four tables T1, T2, T3 and T4.
T1 and T2 have a relationship and T3 and T4 have a relationship. So I can put T1 and T2 in one group and T3 and T4 in another group.
Which SQL server are you using?
select O.name as [Object_Name], C.text as [Object_Definition]
from sys.syscomments C
inner join sys.all_objects O ON C.id = O.object_id
--where C.text like '%table_name%'
Here's a hierarchical query on sys.foreign_keys that should get you pretty close to what you're looking for.
WITH cte AS (
-- find tables that are parents, but are not children themselves
SELECT [fk].[referenced_object_id] AS [child_id],
NULL AS [parent_id],
CAST(CONCAT('/', [fk].[referenced_object_id], '/') AS VARCHAR(MAX)) AS h,
1 AS l
FROM sys.[foreign_keys] AS [fk]
WHERE [fk].[referenced_object_id] NOT IN (
SELECT [parent_object_id]
FROM sys.[foreign_keys]
)
UNION ALL
SELECT child.[parent_object_id],
[child].[referenced_object_id] AS [parent_id],
CAST(CONCAT(parent.[h], child.[parent_object_id], '/') AS VARCHAR(MAX)) AS [h],
parent.l + 1 AS l
FROM cte AS [parent]
JOIN sys.[foreign_keys] AS [child]
ON [parent].[child_id] = child.[referenced_object_id]
),
hier AS (
SELECT DISTINCT
OBJECT_NAME([cte].[child_id]) AS [child],
object_name([cte].[parent_id]) AS [parent],
h,
--CAST([cte].[h] AS HIERARCHYID) AS h
l
FROM cte
)
SELECT [hier].[child] ,
[hier].[parent] ,
[hier].[h]--.ToString()
FROM [hier]
ORDER BY
l, h -- breadth-first search
--h, l -- depth-first search
--h.GetLevel(), h -- breadth-first search; hierarchyid
--h, h.GetLevel() -- depth-first search; hierarchyid
You'll note that I included two order by clauses. Each has its uses. Assume that you have the following disconnected graphs of foreign keys: (a → b → c), (d → e → f). Using the first order by clause will return rows in the following order: a, d, b, e, c, f. That is, all of the top-level elements first, followed by the tier two elements, etc. The second order by clause will return them in the order of a, b, c, d, e, f (or maybe d, e, f, a, b, c; depending on the object ids for a and d). The idea here is that you fully exhaust one disconnected graph before moving onto the next one.
One note is that I'm fairly sure that the above doesn't take self-referential foreign keys into account. If that's important to you, I'd deal with those as a separate action (i.e. fully populate those first, then find the non-self-referential relationships using the above).
I also left a comment or two in there for making a hierarchyid solution work. In the hier CTE, use the cast of h to hierarchyid instead of h, and then use the order by clauses that take advantage of that. None of that is necessary, but it could be a good first exposure to hierarchyid.
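The grouping the question asks for amounts to finding connected components in the graph whose edges are foreign keys. As a sketch outside SQL, and swapping in a union-find structure rather than the recursive CTE above, the pairs below stand in for (parent_object_id, referenced_object_id) rows read from sys.foreign_keys. The table names follow the question's example; T5 and its self-reference are invented to show that self-referential keys are harmless here.

```python
from collections import defaultdict


def group_tables(tables, foreign_keys):
    """Return sorted groups of tables that are linked by foreign keys."""
    parent = {t: t for t in tables}

    def find(t):
        # Walk up to the root, shortening the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Each foreign key merges the groups of its two tables.
    for child, referenced in foreign_keys:
        root_a, root_b = find(child), find(referenced)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for t in tables:
        groups[find(t)].append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical input: (referencing table, referenced table) pairs.
tables = ["T1", "T2", "T3", "T4", "T5"]
foreign_keys = [("T2", "T1"), ("T4", "T3"), ("T5", "T5")]
groups = group_tables(tables, foreign_keys)
print(groups)
```

Because the direction of each key is ignored, two tables land in the same group whenever any chain of relationships connects them.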

SQL Server - WHERE <several columns> in (<list of columns values>)

I have done this long ago in other DBMSs (Oracle or MySQL... don't really remember) and I'm looking for the way to do this in SQL Server, if possible at all.
Suppose you have a table with several columns, say A, B, C, ... M. I wish to phrase a select from this table where columns A, B, and C match specific sets of values or, in other words, a list of value combinations.
For instance, I wish to retrieve all the records that match any of the following combinations:
A B C
1 'Apples' '2016-04-12'
56 'Cars' '2014-02-11'
....
Since the list of possible combinations may be quite long (including the option of an inner SELECT), it would not be practical to use something like:
WHERE ( A = 1 AND B = 'Apples' and C = '2016-04-12' ) OR
( A = 56 AND B = 'Cars' and C = '2014-02-11' ) OR
...
As stated, I did use this type of construct in the past and it was something like:
SELECT *
FROM MyTable
WHERE (A,B,C) IN (SELECT A,B,C FROM MYOtherTable) ;
[Most likely this syntax is wrong but it shows what I'm looking for]
Also, I would rather avoid Dynamic SQL usage.
So, the questions would be:
Is this doable in SQL Server?
If the answer is YES, how should the SELECT be phrased?
Thanks in advance.
You can use JOIN
SELECT m1.*
FROM MyTable m1
JOIN MYOtherTable m2
ON m1.A = m2.A
AND m1.B = m2.B
AND m1.C = m2.C
or Exists
SELECT m1.*
FROM MyTable m1
WHERE EXISTS (SELECT 1
FROM MYOtherTable m2
WHERE m1.A = m2.A
AND m1.B = m2.B
AND m1.C = m2.C)
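The two answers are not quite interchangeable when the lookup table contains the same combination more than once. A small sketch, run on SQLite via Python's sqlite3 module rather than SQL Server, with invented rows and a stand-in column M:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (A INTEGER, B TEXT, C TEXT, M TEXT);
    CREATE TABLE MYOtherTable (A INTEGER, B TEXT, C TEXT);
    INSERT INTO MyTable VALUES
        (1,  'Apples', '2016-04-12', 'keep'),
        (56, 'Cars',   '2014-02-11', 'keep'),
        (1,  'Cars',   '2016-04-12', 'drop');
    -- The first combination is listed twice on purpose.
    INSERT INTO MYOtherTable VALUES
        (1,  'Apples', '2016-04-12'),
        (1,  'Apples', '2016-04-12'),
        (56, 'Cars',   '2014-02-11');
""")

# JOIN returns one output row per matching pair of rows.
join_rows = conn.execute("""
    SELECT m1.A, m1.B, m1.C FROM MyTable m1
    JOIN MYOtherTable m2
      ON m1.A = m2.A AND m1.B = m2.B AND m1.C = m2.C
    ORDER BY m1.A
""").fetchall()

# EXISTS returns each MyTable row at most once.
exists_rows = conn.execute("""
    SELECT m1.A, m1.B, m1.C FROM MyTable m1
    WHERE EXISTS (SELECT 1 FROM MYOtherTable m2
                  WHERE m1.A = m2.A AND m1.B = m2.B AND m1.C = m2.C)
    ORDER BY m1.A
""").fetchall()

print(len(join_rows), len(exists_rows))
```

If the list of combinations may contain repeats, the EXISTS form is the closer match to what a multi-column IN would return.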

How to reuse calculated columns avoiding duplicating the sql statement

I have a lot of calculated columns and they keep repeating themselves, one inside another, including nested CASE statements.
Here is a much simplified version of what I am looking for a way to do:
SELECT
(1+2) AS A,
A + 3 AS B,
B * 7 AS C
FROM MYTABLE
You could try something like this.
SELECT
A.Val AS A,
B.Val AS B,
C.Val AS C
FROM MYTABLE
cross apply(select 1 + 2) as A(Val)
cross apply(select A.Val + 3) as B(Val)
cross apply(select B.Val * 7) as C(Val)
You can't reference just-created expressions by their column aliases later in the same select list. Think of the entire select list as being materialized at the same time or in random order: A doesn't exist yet when you're trying to build the expression that creates B. You need to repeat the expressions. I don't think you'll be able to make "simpler" computed columns without repeating them, and the same goes for views. Otherwise you'll have to nest things, like:
SELECT A, B, C = B * 7
FROM
(
SELECT A, B = A + 3
FROM
(
SELECT A = (1 + 2)
) AS x
) AS y;
Or repeat the expression (but I guess that is what you're trying to avoid).
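The nesting idea can be checked quickly. This sketch runs on SQLite through Python's sqlite3 module rather than SQL Server, so the `alias = expression` form is written with AS. It also shows a chained CTE variant in which each step carries the earlier columns forward, so the final SELECT reads from a single CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each level of nesting can use the aliases defined one level below it.
nested_row = conn.execute("""
    SELECT A, B, B * 7 AS C
    FROM (
        SELECT A, A + 3 AS B
        FROM (SELECT (1 + 2) AS A) AS x
    ) AS y
""").fetchone()

# Same calculation as a chain of CTEs, each building on the previous one.
cte_row = conn.execute("""
    WITH aa(a)       AS (SELECT 1 + 2),
         bb(a, b)    AS (SELECT a, a + 3 FROM aa),
         cc(a, b, c) AS (SELECT a, b, b * 7 FROM bb)
    SELECT a, b, c FROM cc
""").fetchone()

print(nested_row, cte_row)
```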
Another option if someone is still interested:
WITH aa(a) AS (SELECT 1 + 2),
bb(b) AS (SELECT a + 3 FROM aa),
cc(c) AS (SELECT b * 7 FROM bb)
SELECT aa.a, bb.b, cc.c
FROM aa, bb, cc
The only way to "save" the results of your calculations is to compute them in a subquery (or an equivalent derived table); the outer query can then use A, B and C. They cannot be referenced by alias within the same select list.
You can create computed columns to represent the values you want. Also, you can use a view if your calculations are dependent on data in a separate table.
Do you want calculated results out of your table? In that case you can put the relevant calculations in scalar valued user defined function and use that inside your select statement.
Or do you want the calculated results to appear as columns in the table, then use a computed column:
CREATE TABLE Test(
ID INT NOT NULL IDENTITY(1,1),
TimesTen AS ID * 10
)

How to insert artificial data into a select statement

Suppose we have an original query as follows:
SELECT A, B, C FROM tblA
Now, I need additional artificial rows, like those produced by
SELECT 'Kyala', B, C FROM tblA
for rows where, for example, C = 100, to be inserted into the result set.
As an example, if tblA holds one row:
A B C
John 1 100
my goal is to return two rows like below with a single SQL query.
A B C
John 1 100
Kyala 1 100
How could I achieve this with a single SQL statement instead of relying on a table variable or temp table?
Just refined the query to resolve error on Union:
SELECT A, B, C from tblA
UNION
SELECT 'Kyala' as A, B, C FROM tblA WHERE C = 100
And if you don't want the rows where C = 100 to also come back with their original A value (from the first SELECT in the union), you can do it like this:
SELECT A, B, C from tblA WHERE C <> 100
UNION
SELECT 'Kyala', B, C FROM tblA WHERE C = 100
or
SELECT CASE(C)
when 100 then 'Kyala'
else A
END as A, B, C from tblA
You can use a CASE:
SELECT B, C,
CASE
WHEN C = 100 THEN 'Kyala'
ELSE A
END
FROM tblA
You could achieve this with the UNION operator.
SELECT A, B, C from tblA
UNION
SELECT 'Kyala', B, C FROM tblA WHERE C = 100
In response to the question in the comments about improving performance so that the table is only queried once - you could add a covering index over columns C and B so that the second part of the query uses that index rather than querying the table:
CREATE NONCLUSTERED INDEX [IX_tblA_CD] ON [dbo].[tblA]
(
[C] ASC
)
INCLUDE ( [B]) ON [PRIMARY]
GO
However, depending on the use case (this sounds like some kind of ad-hoc process for testing?), you might prefer to take the hit of two table scans rather than adding a new index which might not be appropriate for use in production.
You can use a UNION statement:
SELECT A, B, C FROM tblA
UNION
SELECT 'Kyala', B, C FROM tblA WHERE C = 100
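A quick check of the union approach, as a sketch on SQLite via Python's sqlite3 module rather than SQL Server. It swaps in UNION ALL for UNION, because plain UNION also removes duplicate rows, which may not be wanted if tblA legitimately contains repeats. The 'Mary' row is invented to show that rows with other values of C pass through unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO tblA VALUES ('John', 1, 100), ('Mary', 2, 50);
""")

# Original rows plus one artificial 'Kyala' row per row with C = 100.
rows = conn.execute("""
    SELECT A, B, C FROM tblA
    UNION ALL
    SELECT 'Kyala', B, C FROM tblA WHERE C = 100
    ORDER BY B, A
""").fetchall()

for row in rows:
    print(row)
```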
"I need to additional artificial rows like SELECT 'Kyala', B, C FROM tblA when, for example, C = 100 to be inserted into the resultset."
Now, read up on CASE expressions in SQL Server (and IIF, available from SQL Server 2012).
Basically, you can define an additional column as was shown
(SELECT 'test', A, B, C FROM...)
But instead of 'test' you can put in a CASE expression and work with the other fields to determine the exact value to output.
SELECT CASE WHEN xxxx THEN ... ELSE ... END AS FirstColumn, A, B, C FROM ...
