MS SQL Server - Table Dependency Hierarchy Group - sql-server

I have a database with approximately 500 tables and a lot of foreign key relationships among them.
I need to group related tables together, so that all tables connected by foreign keys end up in the same group and no group is related to any other group.
For example:
There are four tables: T1, T2, T3 and T4.
T1 and T2 have a relationship, and T3 and T4 have a relationship. So I can put T1 and T2 in one group and T3 and T4 in another.

Which version of SQL Server are you using?
select O.name as [Object_Name], C.text as [Object_Definition]
from sys.syscomments C
inner join sys.all_objects O ON C.id = O.object_id
--where C.text like '%table_name%'

Here's a hierarchical query on sys.foreign_keys that should get you pretty close to what you're looking for.
WITH cte AS (
-- find tables that are parents, but are not children themselves
SELECT [fk].[referenced_object_id] AS [child_id],
NULL AS [parent_id],
CAST(CONCAT('/', [fk].[referenced_object_id], '/') AS VARCHAR(MAX)) AS h,
1 AS l
FROM sys.[foreign_keys] AS [fk]
WHERE [fk].[referenced_object_id] NOT IN (
SELECT [parent_object_id]
FROM sys.[foreign_keys]
)
UNION ALL
SELECT child.[parent_object_id],
[child].[referenced_object_id] AS [parent_id],
CAST(CONCAT(parent.[h], child.[parent_object_id], '/') AS VARCHAR(MAX)) AS [h],
parent.l + 1 AS l
FROM cte AS [parent]
JOIN sys.[foreign_keys] AS [child]
ON [parent].[child_id] = child.[referenced_object_id]
),
hier AS (
SELECT DISTINCT
OBJECT_NAME([cte].[child_id]) AS [child],
object_name([cte].[parent_id]) AS [parent],
h,
--CAST([cte].[h] AS HIERARCHYID) AS h
l
FROM cte
)
SELECT [hier].[child] ,
[hier].[parent] ,
[hier].[h]--.ToString()
FROM [hier]
ORDER BY
l, h -- breadth-first search
--h, l -- depth-first search
--h.GetLevel(), h -- breadth-first search; hierarchyid
--h, h.GetLevel() -- depth-first search; hierarchyid
You'll note that I included two ORDER BY clauses. Each has its uses. Assume that you have the following disconnected graphs of foreign keys: (a → b → c), (d → e → f). Using the first ORDER BY clause will return rows in the following order: a, d, b, e, c, f. That is, all of the top-level elements first, followed by the tier-two elements, etc. The second ORDER BY clause will return them in the order a, b, c, d, e, f (or maybe d, e, f, a, b, c, depending on the object ids for a and d). The idea here is that you fully exhaust one disconnected graph before moving on to the next one.
One caveat: I'm fairly sure that the above doesn't take self-referential foreign keys into account. If that's important to you, I'd deal with those as a separate action (i.e. fully populate those first, then find the non-self-referential relationships using the above).
I also left a comment or two in there for making a hierarchyid solution work. In the hier CTE, cast h to hierarchyid instead of selecting h directly, and then use the ORDER BY clauses that take advantage of that. None of that is necessary, but it could be a good first exposure to hierarchyid.
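The hierarchical query above orders tables within each foreign-key graph, but it does not label the groups themselves. A minimal sketch of the grouping step, assuming the (referencing table, referenced table) pairs have already been exported from sys.foreign_keys (for example via OBJECT_NAME on parent_object_id and referenced_object_id): treat each foreign key as an undirected edge and collect connected components with a union-find. The function name group_tables and the sample data are hypothetical.

```python
from collections import defaultdict


def group_tables(tables, foreign_keys):
    """Group tables into connected components of the foreign-key graph.

    tables: iterable of table names (include tables with no foreign keys).
    foreign_keys: iterable of (referencing_table, referenced_table) pairs.
    """
    parent = {t: t for t in tables}

    def find(t):
        # Walk up to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for child, referenced in foreign_keys:
        # Tables that only show up in an edge still get registered.
        parent.setdefault(child, child)
        parent.setdefault(referenced, referenced)
        root_a, root_b = find(child), find(referenced)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for t in parent:
        groups[find(t)].append(t)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical sample data mirroring the question's example.
tables = ["T1", "T2", "T3", "T4"]
foreign_keys = [("T2", "T1"), ("T4", "T3")]
print(group_tables(tables, foreign_keys))  # [['T1', 'T2'], ['T3', 'T4']]
```

Because edges are treated as undirected, a self-referential foreign key simply links a table to itself and does not affect the grouping.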

Related

PostGIS minimum distance between two large sets of points

I have two tables of points in PostGIS, say A and B, and I want to know, for every point in A, what is the distance to the closest point in B. I am able to solve this for small sets of points with the following query:
SELECT a.id, MIN(ST_Distance_Sphere(a.geom, b.geom))
FROM table_a a, table_b b
GROUP BY a.id;
However, I have a couple of million points in each table and this query runs indefinitely. Is there a more efficient way to approach this? I am open to getting an approximate distance rather than an exact one.
Edit: A slight modification to the answer provided by JGH to return distances in meters rather than degrees if points are unprojected.
SELECT
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;
Your query is slow because it computes the distance between every pair of points without using any index. You could rewrite it to use the <-> operator, which uses the index when it appears in the ORDER BY clause.
select a.id,closest_pt.id, closest_pt.dist
from tablea a
CROSS JOIN LATERAL
(SELECT
id ,
a.geom <-> b.geom as dist
FROM tableb b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS closest_pt;
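The edit in the question swaps the degree-based distance for ST_Distance_Sphere so the result comes out in meters. As a rough illustration of why the two differ on unprojected lon/lat points, here is a small sketch comparing a plain coordinate distance with a haversine great-circle distance. The Earth radius constant and both function names are assumptions for illustration; PostGIS may use a slightly different radius internally.

```python
import math

# Assumed mean Earth radius in meters; PostGIS may use a slightly different value.
EARTH_RADIUS_M = 6371008.8


def sphere_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def planar_distance_deg(lon1, lat1, lon2, lat2):
    """Plain Euclidean distance on raw coordinates, i.e. in degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# One degree of longitude at the equator and at 60 degrees north.
for lat in (0.0, 60.0):
    print(lat, planar_distance_deg(0, lat, 1, lat), sphere_distance_m(0, lat, 1, lat))
```

Both pairs are exactly one degree apart in coordinate terms, yet the pair at 60°N is roughly half as far apart on the ground. The degree-based ordering is therefore an approximation of true nearness, which may be acceptable given that the question allows approximate results.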

only display one row when key field is the same

I have created a key field (C) by joining two columns (A & B). I want to run a SQL query that returns only the top row for each value of C when that value appears more than once.
Sample data:
A B C D
10022 Blue 10022Blue Buggy
10300 Red 10300Red Noodle
10300 Red 10300Red Sammy
So I only want one line to show for 10300Red.
Cheers
One way to do it is with a cte and ROW_NUMBER():
;WITH CTE AS
(
SELECT A,
B,
C,
D,
ROW_NUMBER() OVER(PARTITION BY C ORDER BY (SELECT NULL)) rn
FROM [Table] -- Table is a reserved word, so it must be bracketed
)
SELECT A, B, C, D
FROM CTE
WHERE rn = 1
Note: You did say you want the "first" record, but you didn't specify the order of the records. Since tables in a relational database are unsorted by nature, "first" is simply an arbitrary row, hence "ORDER BY (SELECT NULL)".
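The ROW_NUMBER() approach can be tried out on the sample data without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, which assumes an SQLite build with window-function support (3.25 or later). The table name items is hypothetical, and the window is ordered by D rather than (SELECT NULL) so that the "first" row is deterministic for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (A INTEGER, B TEXT, C TEXT, D TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (10022, "Blue", "10022Blue", "Buggy"),
        (10300, "Red", "10300Red", "Noodle"),
        (10300, "Red", "10300Red", "Sammy"),
    ],
)

# Ordering by D makes the choice of "first" row deterministic for this demo.
query = """
WITH cte AS (
    SELECT A, B, C, D,
           ROW_NUMBER() OVER (PARTITION BY C ORDER BY D) AS rn
    FROM items
)
SELECT A, B, C, D FROM cte WHERE rn = 1 ORDER BY C
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this ordering the 10300Red key keeps the Noodle row; a different ORDER BY inside the window would keep a different row.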
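Do it this way (this works only if you can leave column D out of the result, since D differs between the duplicate rows):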
select distinct A, B, C from tablename
You can find the result set by grouping it, then join it with the main table.
SELECT
A.*
FROM
YourTable A INNER JOIN
(
SELECT
G.C,
MAX(G.D) D
FROM
YourTable G
GROUP BY
G.C
) B ON A.C = B.C AND A.D = B.D

SQL join multiple tables - result set not expected

I am working in SQL Server 2008. I have 4 tables I want to join. Let us call them tables A, B, C, and D. B, C, and D are all subsets of table A. There could be some records that are common amongst B, C, and D. My goal is to select all records in A that are not in B, C, or D. So, I think the correct query to run is:
SELECT
A.x
FROM A
LEFT JOIN B
ON A.x = B.y
LEFT JOIN C
ON A.x = C.z
LEFT JOIN D
ON A.x = D.i
WHERE
(
(B.y IS NULL)
AND
(C.z IS NULL)
AND
(D.i IS NULL)
)
The problem I am having is that I know that there are some records in table B that are returning in this result set which should not be. (The same could hold for tables C and D as well.) So, something must be wrong with my query. My best guess is that the joins are vague. The first one should give me all records in A that are not in B. Similarly, the second one should give me all records in A that are not in C. Because I have used AND in the WHERE clause, then I should essentially be returning only the records that are common to each of the joins. But, something is going wrong. How do I correct this?
Try this:
SELECT x FROM A
EXCEPT
SELECT x FROM
(
SELECT y FROM B UNION
SELECT z FROM C UNION
SELECT i FROM D
) T(x)
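As a sanity check, the LEFT JOIN ... IS NULL form from the question and the EXCEPT form can be run side by side on a toy dataset. This sketch uses Python's sqlite3 module with made-up values. SQLite does not accept the T(x) column-alias syntax, so the derived table names its column with AS instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (x INTEGER);
CREATE TABLE B (y INTEGER);
CREATE TABLE C (z INTEGER);
CREATE TABLE D (i INTEGER);
INSERT INTO A VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO B VALUES (1), (2);
INSERT INTO C VALUES (2), (4);
INSERT INTO D VALUES (5);
""")

left_join_sql = """
SELECT A.x
FROM A
LEFT JOIN B ON A.x = B.y
LEFT JOIN C ON A.x = C.z
LEFT JOIN D ON A.x = D.i
WHERE B.y IS NULL AND C.z IS NULL AND D.i IS NULL
ORDER BY A.x
"""

except_sql = """
SELECT x FROM A
EXCEPT
SELECT x FROM (
    SELECT y AS x FROM B
    UNION SELECT z FROM C
    UNION SELECT i FROM D
)
ORDER BY x
"""

left_join_rows = [r[0] for r in conn.execute(left_join_sql)]
except_rows = [r[0] for r in conn.execute(except_sql)]
print(left_join_rows, except_rows)
```

On this clean data both forms return the same keys. That suggests, without proving it for the asker's tables, that unexpected rows are more likely to come from how the join columns compare in the real data than from the shape of the query.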

OVER (ORDER BY Col) generates 'Sort' operation

I'm working on a query that needs to do filtering, ordering and paging according to the user's input. Now I'm testing a case that's really slow; upon inspection of the query plan, a 'Sort' is taking 96% of the time.
The data model is really not that complicated; the following query should be clear enough to understand what's happening:
WITH OrderedRecords AS (
SELECT
A.Id
, A.col2
, ...
, B.Id
, B.col1
, ROW_NUMBER() OVER (ORDER BY B.col1 ASC) AS RowNumber
FROM A
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
)
SELECT
*
FROM OrderedRecords WHERE RowNumber Between x AND y
A is a table containing about 100k records, but will grow to tens of millions in the field, while B is a category-type table with 5 items (and it will never grow by more than perhaps a few items). There are clustered indexes on A.Id and B.Id.
Performance is really dreadful and I'm wondering if it's possible to remedy this somehow. If, for example, the ordering is on A.Id instead of B.col1, everything is pretty darn fast. Perhaps I can optimize B.col1 with some sort of index.
I already tried putting an index on the field itself, but this didn't help. Probably because the number of distinct items in table B is very small (in itself and compared to A).
Any ideas?
I think this may be part of the problem:
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
Your LEFT JOIN is going to logically act like an INNER JOIN because of the WHERE clause you have in place, since only certain B.ID rows are going to be returned. If that's your intent, then go ahead and use an inner join, which may help the optimizer realize that you are looking for a restricted number of rows.
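The point about the WHERE clause can be demonstrated on a toy schema. This sketch uses Python's sqlite3 module with made-up rows that mirror the column names in the question. It runs the same join as LEFT and as INNER with a filter on B.Id, then the LEFT JOIN without the filter for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (Id INTEGER PRIMARY KEY, BId INTEGER);
CREATE TABLE B (Id INTEGER PRIMARY KEY, col1 TEXT, SomeThing TEXT);
INSERT INTO A VALUES (1, 10), (2, 20), (3, NULL), (4, 30);
INSERT INTO B VALUES (10, 'x', NULL), (20, 'y', NULL), (30, 'z', 'set');
""")

template = """
SELECT A.Id, B.col1
FROM A
{join} JOIN B ON B.SomeThing IS NULL AND A.BId = B.Id
{where}
ORDER BY A.Id
"""
filter_on_b = "WHERE B.Id IN (10, 20, 30)"

left_filtered = conn.execute(template.format(join="LEFT", where=filter_on_b)).fetchall()
inner_filtered = conn.execute(template.format(join="INNER", where=filter_on_b)).fetchall()
left_unfiltered = conn.execute(template.format(join="LEFT", where="")).fetchall()

print(left_filtered)    # same rows as the inner join
print(inner_filtered)
print(left_unfiltered)  # also keeps A rows with no matching B row
```

Rows of A without a matching B row carry a NULL B.Id, and NULL never satisfies the IN filter, so the filtered LEFT JOIN returns exactly what the INNER JOIN returns.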
I suggest you try the following.
For the B table, create an index:
create index IX_B_1 on B (col1, Id, SomeThing)
For the A table, create an index:
create index IX_A_1 on A (col2, BId) include (Id, ...)
In the INCLUDE list, put all other columns of table A that are listed in the SELECT of the OrderedRecords CTE.
However, as you can see, index IX_A_1 takes a lot of space and can be about as large as the table data itself.
So, as an alternative, you may try omitting the extra columns from the INCLUDE part of the index:
create index IX_A_2 on A (col2, BId) include (Id)
but in this case you will have to slightly modify your query:
;WITH OrderedRecords AS (
SELECT
AId = A.Id
, A.col2
-- remove other A columns from here
, bid = B.Id
, B.col1
, ROW_NUMBER() OVER (ORDER BY B.col1 ASC) AS RowNumber
FROM A
LEFT JOIN B ON (B.SomeThing IS NULL) AND (A.BId = B.Id)
WHERE (A.col2 IN (...)) AND (B.Id IN (...))
)
SELECT
R.*, A.OtherColumns
FROM OrderedRecords R
join A on A.Id = R.AId
WHERE R.RowNumber Between x AND y

set difference in SQL query

I'm trying to select records with a statement
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run (~15 sec).
Is there a better way of writing this SQL?
If you would like to represent set difference (A-B) in SQL, here is a solution for you.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
-- (A-B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.
You don't need to return data from the nested subqueries. I'm not sure this will make a difference without indexing, but it's easier to read.
And EXISTS/JOIN is probably nicer IMHO than using IN
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But you'll need a computed column at least, as marc_s said,
and two indexes: one on (computed, C) and another on (C, computed).
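To see what the JOIN/HAVING rewrite selects, it can be run on a toy table. This sketch uses Python's sqlite3 module. SQLite has no LEFT() function, so substr(B, 1, 5) is used as an assumed equivalent, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (B TEXT, C TEXT);
INSERT INTO A VALUES
    ('abcde1', 'c1'), ('abcde2', 'c1'),
    ('fghij1', 'c2'), ('fghij2', 'c3'),
    ('klmno1', 'c4'), ('pqrst1', 'c4');
""")

query = """
SELECT A.B, A.C
FROM A
JOIN (SELECT substr(B, 1, 5) AS b1
      FROM A
      GROUP BY substr(B, 1, 5)
      HAVING COUNT(DISTINCT C) = 1) t1 ON substr(A.B, 1, 5) = t1.b1
JOIN (SELECT C AS c1
      FROM A
      GROUP BY C
      HAVING COUNT(DISTINCT substr(B, 1, 5)) = 1) t2 ON A.C = t2.c1
ORDER BY A.B
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the two abcde rows survive: that prefix maps to a single C value and that C value maps back to a single prefix. The fghij prefix spans two C values, and c4 spans two prefixes, so those rows are filtered out.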
Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
Also - you have a GROUP BY C in there - is that column C indexed?
If you are looking for just the set difference between table1 and table2,
the query below is a simple way to get the rows that are in table1 but not in table2, where both tables are instances of the same schema with column names
columnone, columntwo, ...
Note that it compares each column independently, so it also drops any table1 row that merely shares a single column value with some row of table2.
with
col1 as (
select columnone from table2
),
col2 as (
select columntwo from table2
)
...
select * from table1
where (
columnone not in (select columnone from col1)
and columntwo not in (select columntwo from col2)
...
);
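The difference between a row-wise set difference and a column-by-column NOT IN check is easy to miss, so here is a small comparison. This sketch uses Python's sqlite3 module with made-up two-column tables and runs EXCEPT next to the per-column NOT IN filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (columnone INTEGER, columntwo TEXT);
CREATE TABLE table2 (columnone INTEGER, columntwo TEXT);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO table2 VALUES (1, 'z'), (9, 'b');
""")

# Row-wise difference: a table1 row is removed only if the whole row is in table2.
row_wise = conn.execute("""
SELECT columnone, columntwo FROM table1
EXCEPT
SELECT columnone, columntwo FROM table2
ORDER BY columnone
""").fetchall()

# Column-wise check: a table1 row is removed if any single column value matches.
column_wise = conn.execute("""
SELECT columnone, columntwo FROM table1
WHERE columnone NOT IN (SELECT columnone FROM table2)
  AND columntwo NOT IN (SELECT columntwo FROM table2)
ORDER BY columnone
""").fetchall()

print(row_wise)
print(column_wise)
```

No complete row of table1 appears in table2, so EXCEPT keeps all three rows, while the column-wise filter keeps only (3, 'c'). Which behaviour is wanted depends on the use case. NOT IN also returns no rows at all if the subquery yields a NULL, which is worth checking on nullable columns.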
