Materialize a CTE, or otherwise increase performance

Given a table (AccountId, ParentId), where ParentId is nullable, we want to be able to quickly find:
1. The master parent ID (the accountId where ParentId is null).
2. All children for a given account ID.
With a CTE this is fairly easy. However, we can't save the CTE in an indexed view, which hurts performance. We've kicked around some other ideas, like saving the path (id1/id2/id3) in another field, but that feels sorta hacky.
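Here is a minimal sketch of those two CTE queries. It runs against SQLite through Python's sqlite3 module so that it is self-contained. T-SQL writes a recursive CTE with plain WITH rather than WITH RECURSIVE, but the query has the same shape. The table name account and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: 1 owns 2 owns 3; 4 is a separate root.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (AccountId INTEGER PRIMARY KEY, ParentId INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, None), (2, 1), (3, 2), (4, None)])

def master_parent(account_id):
    """Walk up the ParentId chain and return the row whose ParentId IS NULL."""
    row = conn.execute("""
        WITH RECURSIVE up(AccountId, ParentId) AS (
            SELECT AccountId, ParentId FROM account WHERE AccountId = ?
            UNION ALL
            SELECT a.AccountId, a.ParentId
            FROM account a JOIN up ON a.AccountId = up.ParentId
        )
        SELECT AccountId FROM up WHERE ParentId IS NULL
    """, (account_id,)).fetchone()
    return row[0] if row else None

def all_children(account_id):
    """Return every descendant of the account (not the account itself)."""
    rows = conn.execute("""
        WITH RECURSIVE down(AccountId) AS (
            SELECT AccountId FROM account WHERE ParentId = ?
            UNION ALL
            SELECT a.AccountId FROM account a JOIN down ON a.ParentId = down.AccountId
        )
        SELECT AccountId FROM down ORDER BY AccountId
    """, (account_id,)).fetchall()
    return [r[0] for r in rows]

print(master_parent(3))  # 1
print(all_children(1))   # [2, 3]
```

Both queries are recomputed on every call. That cost is what the question is trying to avoid by materializing the result.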
We thought of a trigger that'd save the "master" ID on each row, but we're unsure how that'd work in the middle of a chain (1 owns 2 owns 3, but then 2 transfers to 7). It also doesn't solve the "find all children" query.
Any thoughts? We're using SQL 2008 R2, but can move to SQL 2012.

In SQL Server 2008 there is a hierarchyid type that essentially implements storing the path to the root. http://technet.microsoft.com/en-us/library/bb677290%28v=sql.100%29.aspx
If your hierarchy is mostly static, another option is to keep a denormalized version of this table that pairs each ancestor with every one of its descendants, plus each node with itself at depth 0. So if A is the parent of B, and B is the parent of C, the denormalized table looks like this:
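hierarchyid itself is specific to SQL Server. The sketch below illustrates only the underlying idea, storing each row's path from the root, using a plain text column in SQLite driven from Python. The Path column, its '/1/2/3/' format, and the helper functions are assumptions for illustration. The sketch also shows what the "2 transfers to 7" case costs: a single UPDATE that rewrites the path prefix of the whole moved subtree.

```python
import sqlite3

# Hypothetical accounts: 1 owns 2 owns 3; 7 is a separate root.
# Each row stores its full path from the root, e.g. '/1/2/3/'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (AccountId INTEGER PRIMARY KEY, Path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [(1, "/1/"), (2, "/1/2/"), (3, "/1/2/3/"), (7, "/7/")],
)

def path_of(account_id):
    return conn.execute(
        "SELECT Path FROM account WHERE AccountId = ?", (account_id,)
    ).fetchone()[0]

def master_parent(account_id):
    # The root is the first id in the path.
    return int(path_of(account_id).strip("/").split("/")[0])

def all_children(account_id):
    # Descendants are the rows whose path starts with this row's path.
    prefix = path_of(account_id)
    rows = conn.execute(
        "SELECT AccountId FROM account "
        "WHERE substr(Path, 1, ?) = ? AND AccountId <> ? ORDER BY AccountId",
        (len(prefix), prefix, account_id),
    ).fetchall()
    return [r[0] for r in rows]

def move(account_id, new_parent_id):
    # Re-parent a whole subtree by swapping the old prefix for the new one.
    # No cycle check: moving a node under its own descendant is not prevented.
    old_prefix = path_of(account_id)
    new_prefix = path_of(new_parent_id) + f"{account_id}/"
    conn.execute(
        "UPDATE account SET Path = ? || substr(Path, ?) WHERE substr(Path, 1, ?) = ?",
        (new_prefix, len(old_prefix) + 1, len(old_prefix), old_prefix),
    )

print(master_parent(3), all_children(1))  # 1 [2, 3]
move(2, 7)
print(path_of(3))                         # /7/2/3/
```

The trailing slash matters because it keeps the prefix '/1/' from matching '/10/'.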
parent child depth
A A 0
A B 1
A C 2
B B 0
B C 1
C C 0
Now if you index both the parent and the child columns, searching the hierarchy becomes very fast.
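The following sketch builds that denormalized (closure) table from a plain parent list and then answers the two original questions from it. It uses SQLite via Python, and the table and function names are hypothetical. Printing the table reproduces the six rows above.

```python
import sqlite3

# Hypothetical adjacency data: A is the parent of B, B is the parent of C.
parents = {"A": None, "B": "A", "C": "B"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE closure (parent TEXT, child TEXT, depth INTEGER, PRIMARY KEY (parent, child))"
)
conn.execute("CREATE INDEX closure_child ON closure (child)")  # index the child side too

# Every node is its own ancestor at depth 0; walking up the chain
# adds one (ancestor, node, depth) row per step.
for node in parents:
    ancestor, depth = node, 0
    while ancestor is not None:
        conn.execute("INSERT INTO closure VALUES (?, ?, ?)", (ancestor, node, depth))
        ancestor, depth = parents[ancestor], depth + 1

def master_parent(node):
    # The root is the furthest ancestor, i.e. the one with the largest depth.
    return conn.execute(
        "SELECT parent FROM closure WHERE child = ? ORDER BY depth DESC LIMIT 1", (node,)
    ).fetchone()[0]

def all_children(node):
    rows = conn.execute(
        "SELECT child FROM closure WHERE parent = ? AND depth > 0 ORDER BY depth, child",
        (node,),
    ).fetchall()
    return [r[0] for r in rows]

for row in conn.execute("SELECT parent, child, depth FROM closure ORDER BY parent, child"):
    print(*row)
```

Both lookups are now a single indexed query with no recursion. The trade-off is that moving a subtree means deleting and re-inserting its ancestor rows, which is why this layout suits a mostly static hierarchy.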

Related

Unlimited levels of hierarchy in SQL table - PostgreSQL

I am looking for a way to store and handle unlimited levels of hierarchy for various organisations/entities stored in my DB. For example, instead of having just one parent and one child organisation (e.g. 2 levels of hierarchy) and just the one-to-many relationship allowed by a self-join (e.g. another column called parent referring to the IDs of the same table), I want to be able to have as many levels of hierarchy and as many connections as possible.
Suppose I have an organisation table such as the following:
ID | Name | Other Non-related data
1 | Test1 | NULL
2 | Test2 | NULL
3 | Test3 | something
4 | Test4 | something else
5 | Test5 | etc
I am considering the following solution: for each table where I need this, I can add another table named originalTable_hierarchy, which refers to the organisation table in both columns and looks like this:
ID | ParentID | ChildID
1 | 1 | 2
2 | 2 | 4
3 | 3 | 1
4 | 3 | 2
5 | 2 | 3
From this table I can tell that 1 is parent to 2, 2 is parent to 4, 3 is parent to 1, 3 is also parent to 2, 2 is also parent to 3.
The restrictions I can think of are: no record may have the same ParentID and ChildID (e.g. a tuple like (3,3)), and no record may reverse an existing pair (e.g. if I have the (2,3) tuple, I can't also have (3,2)).
Is this the correct solution for multiple organisations and suborganisations I might have later on? Users will have to navigate through them easily back and forth. If users decide to split one organisation into many, does this solution suffice? What else should I consider (extra or missing perks) when doing this instead of a traditional self-join or a fixed number of tables for certain levels of hierarchy (e.g. an organisation table and a suborganisation table)? Also, can you impose restrictions on certain records, so that no more children of a certain parent can be created? Or report on all the children of an original parent?
Please feel free to also instruct on where to read more about this. Any relevant resources are welcome.

You only need a single table, because giving each row just one parent already allows an unlimited (theoretically, anyway) number of levels in the hierarchy. You do this by reversing the relationship so that the child references the parent (your table has the parent referencing the child). This allows a child at any level to also be a parent, and the chain can extend as far as needed.
create table organization ( id integer primary key
, name text
, parent_id integer references organization(id)
, constraint parent_not_self check (parent_id <> id)
) ;
create unique index organization_not__mirrored
on organization( least(id,parent_id), greatest(id,parent_id) );
The check constraint enforces your first restriction and the unique index enforces the second.
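To check that the two constraints behave as described, here is a rough equivalent in SQLite through Python. One assumption to note: SQLite has no least()/greatest(), so the sketch uses the multi-argument scalar min()/max() instead. Unlike PostgreSQL's least()/greatest(), these return NULL when parent_id is NULL, and a unique index tolerates any number of such root rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE organization (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        parent_id INTEGER REFERENCES organization(id),
        CONSTRAINT parent_not_self CHECK (parent_id <> id)
    )
""")
# Assumption: scalar min()/max() stand in for PostgreSQL's least()/greatest().
# For a root row (parent_id NULL) both return NULL, and a unique index
# allows any number of NULL keys.
conn.execute("""
    CREATE UNIQUE INDEX organization_not_mirrored
    ON organization (min(id, parent_id), max(id, parent_id))
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO organization VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

def try_set_parent(org_id, parent_id):
    """Return True if re-parenting is accepted, False if a constraint rejects it."""
    try:
        conn.execute("UPDATE organization SET parent_id = ? WHERE id = ?", (parent_id, org_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "root": try_insert((1, "Test1", None)),
    "child": try_insert((2, "Test2", 1)),        # 1 is the parent of 2
    "self_parent": try_insert((3, "Test3", 3)),  # rejected by parent_not_self
    "mirrored": try_set_parent(1, 2),            # would make 2 the parent of 1 as well
}
print(results)
```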
The following query shows the full hierarchy, along with the full path and the level.
with recursive hier(id, parent_id, path, level) as
( select id, parent_id, id::text, 1
from organization
where parent_id is null
union all
select o.id, o.parent_id, h.path || '->' || o.id::text, h.level + 1
from organization o
join hier h
on (o.parent_id = h.id)
)
select * from hier;
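Below is a version of this recursive query adapted to SQLite so it can run from Python without a PostgreSQL server. The CTE columns are named after what they hold (id, parent_id), and id::text becomes CAST(id AS TEXT). The sample tree is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
# Hypothetical sample tree: 1 -> 2 -> 4, and 1 -> 3.
conn.executemany(
    "INSERT INTO organization VALUES (?, ?, ?)",
    [(1, "Test1", None), (2, "Test2", 1), (3, "Test3", 1), (4, "Test4", 2)],
)

rows = conn.execute("""
    WITH RECURSIVE hier(id, parent_id, path, level) AS (
        SELECT id, parent_id, CAST(id AS TEXT), 1
        FROM organization
        WHERE parent_id IS NULL
        UNION ALL
        SELECT o.id, o.parent_id, h.path || '->' || CAST(o.id AS TEXT), h.level + 1
        FROM organization o
        JOIN hier h ON o.parent_id = h.id
    )
    SELECT id, path, level FROM hier ORDER BY path
""").fetchall()

for row in rows:
    print(row)
```

Ordering by path lists each node directly after its ancestors, which is convenient for displaying the tree.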

Best Way to Store Hierarchical Data (Parent <- Child <- Grandchild)

I have a dataset that I need to work with that represents a part schematic for a large machine. I need to come up with an appropriate database schema for this dataset and am having trouble coming up with something to use that represents this data efficiently.
The top level components are the biggest "structures", and as you traverse down the hierarchy, the data represents inner components, or components that make up the inner components. For example, at the top level, there could be an engine as a level 1 component, and then a level 2 component is a piston, which goes into an engine, and a level 3 component could be a gasket that goes into the piston.
This representation is spread across a few hundred lines of a CSV file. There are 3 columns for IDs:
a master_id, which all components have
a parent_id, which all components have as well, but whose value varies based on the situation.
If the component in question is a level 1 part, the parent_id is its own master_id.
If the component in question is a level 2 part, the parent_id is the master_id of the level 1 component.
If the component in question is a level 3 part, the parent_id is the master_id of the level 2 component.
Basically, the parent id of any component is the master id of the component in the level above it. So the lv1 parent is the lv1 master (since it's the root), the lv2 parent is the lv1 master, and the lv3 parent is the lv2 master. Also, multiple components can share a parent ID, meaning multiple lv2 parts, for example, can have the same parent ID.
a grandparent_id, which only level 3 components have (though, for some reason, not all lv3 components have one; I didn't make this data set). If a component is lv3 and has a grandparent_id, the grandparent ID is a direct link back to the master ID of the lv1 component. Yeah, confusing, right?
So here's an example. A lv3 component has a master_id of 700000137, a parent_id of 600000049, and a grandparent_id of 500000006. If we look at the component with a master of 600000049, we'll see that this is a lv2 component that has a parent id of 500000006, which is the master id of a lv1 component, and again is the grandparent of this lv3 component.
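As a quick sanity check of that description in plain Python: with only master_id and parent_id (a level-1 part being its own parent), each part's level and its lv1 ancestor can be recovered. Where grandparent_id is present, it should equal the parent's parent. The rows below use the three example IDs from the text; the dictionary layout is an assumption.

```python
# The three example parts from the text; column names follow the CSV.
rows = [
    {"master_id": 500000006, "parent_id": 500000006, "grandparent_id": None},
    {"master_id": 600000049, "parent_id": 500000006, "grandparent_id": None},
    {"master_id": 700000137, "parent_id": 600000049, "grandparent_id": 500000006},
]

parent_of = {r["master_id"]: r["parent_id"] for r in rows}

def level(master_id):
    """Level 1 parts are their own parent; count the steps up to that root."""
    lvl = 1
    while parent_of[master_id] != master_id:
        master_id = parent_of[master_id]
        lvl += 1
    return lvl

def root(master_id):
    """Follow parent_id upward until reaching a part that is its own parent."""
    while parent_of[master_id] != master_id:
        master_id = parent_of[master_id]
    return master_id

# grandparent_id is redundant: when present it should equal the parent's parent.
for r in rows:
    if r["grandparent_id"] is not None:
        assert r["grandparent_id"] == parent_of[r["parent_id"]]

for r in rows:
    print(r["master_id"], level(r["master_id"]), root(r["master_id"]))
```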
I prefaced this post by saying I need to come up with a database representation for this data set (it has later use in a project, but the data organization is the first step). I'm comfortable using PostgreSQL, so my initial thought was to make 3 tables, master, parent, and grandparent, where based on the key that I'm parsing out, I would insert the row into the appropriate table and foreign key back to the other tables if there were parent or grandparent keys. But I realized this could get quite hairy, especially since multiple foreign keys could link back to a single master id, and I feel that with this representation some data could get repeated, which I obviously don't want.
My second thought was to use something like a Python dictionary, where I essentially build out a tree-like structure with the lv1 components at the top level, the lv2 components in the second, etc. I could then convert the dictionary into JSON, since Python is nice that way, and store that JSON blob in the database. But this JSON blob could potentially get REALLY big, though I guess that's just something I'd have to live with as the dataset grows. This part schematic I was given is only for one machine, so basically each entry in my database would look like this:
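Here is a sketch of that second approach. It nests the flat CSV rows into a tree of Python dicts and serializes the tree with json.dumps. The (master_id, parent_id, name) tuples are hypothetical, with the engine/piston/gasket names borrowed from the example above. A level-1 part is recognized by having itself as parent.

```python
import json

# Hypothetical flat rows: (master_id, parent_id, name).
rows = [
    (500000006, 500000006, "engine"),
    (600000049, 500000006, "piston"),
    (700000137, 600000049, "gasket"),
]

def build_tree(rows):
    """Nest rows under their parents; parts that are their own parent become roots."""
    nodes = {m: {"master_id": m, "name": name, "children": []} for m, _, name in rows}
    roots = []
    for master_id, parent_id, _ in rows:
        if master_id == parent_id:
            roots.append(nodes[master_id])
        else:
            nodes[parent_id]["children"].append(nodes[master_id])
    return roots

blob = json.dumps(build_tree(rows))
print(blob)
```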
id | name | json
----------------------
1 | machine_a | JSON_BLOB_MACHINE_A
----------------------
2 | machine_b | JSON_BLOB_MACHINE_B
etc...
Does my second approach seem better than trying to create separate tables that represent each part level and foreign keying back to parents? If there's a better way to do this with Postgres, I'd appreciate you explaining it. Otherwise, I'm probably going to go with the latter route. Thanks!

If you don't need to join parts in other machines, then I think a jsonb column for parts may be best. You can still index jsonb using GIN indexes and get really good performance from queries.
As long as the parts are not shared among many machines (which would make updating part properties across all machines tricky), you're probably OK.
This should make queries for a machine pretty effortless, as the majority of the data is self-contained.

How can I get one of my foreign key outputs to repeat in a merge transformation in SSIS?

I tried asking this question before and it seemed to have gotten swept under the rug.
First things first, here are two pictures showing the table structure and the current output I get in SSIS.
Table Diagram
Current Output
So in table three, there is only one entry. This entry (name) applies to the other foreign keys, though. I want the final output to look like my current output, but with ones instead of the NULLs.
I was able to get this far on my own through researching and learning about the merge transformations but I can't seem to find anything on manipulating the data in the way that I want.
I greatly appreciate any tips or advice you can offer.
EDIT: Since the images can't be seen apparently, I will try and describe them.
The table diagram has four tables; the top one in the waterfall has a primary key formed from the three foreign keys to the three other tables.
When I try to fill out this table in SSIS, my output has every foreign key ID from the first two tables, but only one from the third table. The rest of the third foreign key's values are all NULLs. I believe this is because there is only one entry in that table for now, but this entry applies to all of the foreign key IDs, so it should repeat.
It should look like this:
ID1 ID2 ID3
1 1 1
2 2 1
3 3 1
But instead, I am only getting nulls in the ID3 field after the first record. How do I make the single id repeat in ID3?
EDIT 2: Some additional screenshots of my data flow and merge transformation as requested.
SSIS Dataflow

After working on this for a few weeks, and with a tip from a colleague, a solution to this question was found. Surprisingly, it was quite simple, and I'm slightly shocked that no one on here could provide the answer.
The solution was simply this: in a data source, set the data access mode to SQL Command and enter the following SQL:
SELECT a.T1ID,
b.T2ID,
c.T3ID
FROM Table1 AS a
JOIN Table2 AS b
ON a.T1ID = b.T2ID
CROSS JOIN Table3 AS c -- pairs every row with Table3's single row
ORDER BY a.T1ID ASC
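This works because joining Table3 without a join condition is a cross join, so Table3's single row is paired with every Table1/Table2 match. The following minimal reproduction uses SQLite through Python, with hypothetical one-column tables that match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (T1ID INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (T2ID INTEGER PRIMARY KEY);
    CREATE TABLE Table3 (T3ID INTEGER PRIMARY KEY);
    INSERT INTO Table1 VALUES (1), (2), (3);
    INSERT INTO Table2 VALUES (1), (2), (3);
    INSERT INTO Table3 VALUES (1);  -- the single shared entry
""")

rows = conn.execute("""
    SELECT a.T1ID, b.T2ID, c.T3ID
    FROM Table1 AS a
    JOIN Table2 AS b ON a.T1ID = b.T2ID
    CROSS JOIN Table3 AS c
    ORDER BY a.T1ID ASC
""").fetchall()
print(rows)  # [(1, 1, 1), (2, 2, 1), (3, 3, 1)]
```

If Table3 ever gains a second row, the cross join will duplicate every output row, once per Table3 row.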

If Table3 will always have just a single row, the simplest solution would be to use an Execute SQL task to save the T3ID to a variable (Control Flow), then use a Derived Column task (Data Flow) to add the variable as a new column.
If that won't work for you (or your data), you can take a look here to see how to fudge the Merge Join task to do what you want.

Database design: ordered set

task_set is a table with two columns (id, task):
id task
1 shout
2 bark
3 walk
4 run
Assume there is another table with two columns (employee, task_order).
task_order is an ordered set of tasks, for example (2,4,3,1).
Generally task_order is unchanged, but sometimes a task may be inserted or deleted, e.g. (2,4,9,3,1) or (2,4,1).
How should I design such a database? That is, how do I implement the ordered set?

If, and ONLY if, you don't need to search inside the task_order column or update one of its values (e.g. change 4,2,3 to 4,2,1), keeping that column as a delimited string might be an easy solution.
However, if you ever plan on searching or updating specific values inside task_order, you'd better normalize that structure into a table that holds the employee id, task id, and task order.
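Here is a sketch of that normalized layout in SQLite via Python. The employee_task table, its position column, and the helper functions are hypothetical. Task 9 ('sit') is invented so that the (2,4,9,3,1) example from the question has a row to point at. Inserting or deleting a task shifts the positions of the tasks after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_set (id INTEGER PRIMARY KEY, task TEXT);
    INSERT INTO task_set VALUES (1, 'shout'), (2, 'bark'), (3, 'walk'), (4, 'run'), (9, 'sit');

    -- One row per task in an employee's list; position holds the order (0-based).
    CREATE TABLE employee_task (
        employee TEXT,
        task_id  INTEGER REFERENCES task_set(id),
        position INTEGER NOT NULL,
        PRIMARY KEY (employee, task_id)
    );
""")

def set_order(employee, task_ids):
    conn.execute("DELETE FROM employee_task WHERE employee = ?", (employee,))
    conn.executemany(
        "INSERT INTO employee_task VALUES (?, ?, ?)",
        [(employee, task_id, pos) for pos, task_id in enumerate(task_ids)],
    )

def insert_task(employee, task_id, position):
    # Make room by shifting every later task down one slot.
    conn.execute(
        "UPDATE employee_task SET position = position + 1 WHERE employee = ? AND position >= ?",
        (employee, position),
    )
    conn.execute("INSERT INTO employee_task VALUES (?, ?, ?)", (employee, task_id, position))

def delete_task(employee, task_id):
    row = conn.execute(
        "SELECT position FROM employee_task WHERE employee = ? AND task_id = ?",
        (employee, task_id),
    ).fetchone()
    if row is None:
        return
    conn.execute("DELETE FROM employee_task WHERE employee = ? AND task_id = ?", (employee, task_id))
    # Close the gap left by the removed task.
    conn.execute(
        "UPDATE employee_task SET position = position - 1 WHERE employee = ? AND position > ?",
        (employee, row[0]),
    )

def tasks(employee):
    rows = conn.execute(
        "SELECT task_id FROM employee_task WHERE employee = ? ORDER BY position", (employee,)
    )
    return [r[0] for r in rows]

set_order("emp1", [2, 4, 3, 1])
insert_task("emp1", 9, 2)
after_insert = tasks("emp1")   # [2, 4, 9, 3, 1]
delete_task("emp1", 9)
delete_task("emp1", 3)
after_delete = tasks("emp1")   # [2, 4, 1]
```

Keying on (employee, task_id) rather than (employee, position) avoids a possible transient unique-key conflict while positions are being shifted.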

Self-Join in SSAS

I have a table like this:
PersonId Job City ParentId
--------- ---- ----- --------
101 A C1 105
102 B C2 101
103 A C1 102
I need to get the association rules between a person's job and their parent's city.
I've used self-referencing and defined case/nested tables, but in the resulting dependency graph there is no distinction between the person's job or city and the parent's job or city!
What is the best solution for this problem in SSAS project?

SSAS Hierarchies should address your problem. However, it's tough to say exactly how to use them without knowing more about your particular situation.

I've run into a similar need in my own work. So far I have only investigated SQL Server Analysis Services Tabular models. I will update this answer with more information once I have finished looking into Multidimensional models.
Per Relationships (SSAS Tabular), SSAS Tabular models do not support self-joins (see below for the relevant quote). What you end up having to do is break out the group of parent elements and each level of their child elements as separate model tables. Once you have the model tables, you can use the diagram view to draw the relevant relationships.
Self-joins and loops
Self-joins are not permitted in tabular model tables. A self-join is a recursive relationship between a table and itself. Self-joins are often used to define parent-child hierarchies. For example, you could join an Employees table to itself to produce a hierarchy that shows the management chain at a business.
The model designer does not allow loops to be created among relationships in a model. In other words, the following set of relationships is prohibited.
Table 1, column a to Table 2, column f
Table 2, column f to Table 3, column n
Table 3, column n to Table 1, column a
If you try to create a relationship that would result in a loop being created, an error is generated.

Not sure exactly what you are trying to achieve, but the following SQL would be a good starting point:
select c.PersonId, p.City
from ptable c
join ptable p
on c.ParentId = p.PersonId
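Below is a sketch of what that self-join produces on the sample rows. It is extended with the child's own Job so that the job and the parent's city land in separately named columns of a single case table; the extra column and the alias ParentCity are additions, not part of the answer above. It runs in SQLite via Python. Person 101 drops out because its parent, 105, is not in the sample; a LEFT JOIN would keep it with a NULL ParentCity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ptable (PersonId INTEGER PRIMARY KEY, Job TEXT, City TEXT, ParentId INTEGER)"
)
conn.executemany(
    "INSERT INTO ptable VALUES (?, ?, ?, ?)",
    [(101, "A", "C1", 105), (102, "B", "C2", 101), (103, "A", "C1", 102)],
)

# c = the person (child row), p = that person's parent row.
rows = conn.execute("""
    SELECT c.PersonId, c.Job, p.City AS ParentCity
    FROM ptable c
    JOIN ptable p ON c.ParentId = p.PersonId
    ORDER BY c.PersonId
""").fetchall()
print(rows)  # [(102, 'B', 'C1'), (103, 'A', 'C2')]
```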
