Ideas/theories on grouping in SQL Server - sql-server

I am looking for some help/ideas on how to structure (table-wise) infinitely nested groups in SQL.
Example:
group1 will contain a,b,c
group2 will contain d,e,f
group3 will contain h,i,j
group4 will contain k,l,m
groupA will contain group1,group2
groupB will contain group3
groupA1 will contain groupA,groupB,group4
Each lowest-level group will refer to a list of scans in a different table (in this example, let's say group1, group2, group3 and group4 are the lowest level).
This should be able to support an infinite number of groups.
I know this is vague, but I am trying to find out how to structure and manage something like this.
I am trying for both tables and queries. So far I have this:
Scan Table
((uniqueID),barcode,user,date,group)
Groups Table
(groupID,groupName,groupRef)
but I am having trouble "creating" groupA.
In terms of queries, I would need to know what the lower-level groups are and get a list of all items in a group.

Based on your example it looks like a parent-child structure would do it:
CREATE TABLE #ParentChild (Parent VARCHAR(30), Child VARCHAR(30))
INSERT INTO #ParentChild
VALUES
('group1','a'),
('group1','b'),
('group1','c'),
('group2','d'),
('group2','e'),
('group2','f'),
('group3','h'),
('group3','i'),
('group3','j'),
('group4','k'),
('group4','l'),
('group4','m'),
('groupA','group1'),
('groupA','group2'),
('groupB','group3'),
('groupA1','groupA'),
('groupA1','groupB'),
('groupA1','group4')
This will allow you to store an (almost) infinite number of groups. The "limit" depends on the SQL Server version (e.g. SQL Server 2008R2: File size (data): 16 terabytes) which should be good enough ;-)
As for your specific questions:
--"what are the lower level groups?"
--i.e., give me all the groups, except those that contain another group
SELECT Parent
FROM #ParentChild
EXCEPT
SELECT t1.Parent
FROM #ParentChild t1
INNER JOIN #ParentChild t2
ON t1.Child = t2.Parent
--"get a list of all items in a group"
SELECT Child
FROM #ParentChild
WHERE Parent = @Group
Another option would be to store the data using the hierarchyid data type.
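The "list of all items" query above only returns the direct children of a group. To expand nested groups down to the individual items, a recursive CTE is one option. Here is a minimal sketch, assuming the same Parent/Child data; it runs against SQLite through Python's sqlite3 module purely as a runnable stand-in, and the helper name items_in_group is made up for illustration. In SQL Server the CTE is written without the RECURSIVE keyword.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the #ParentChild temp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ParentChild (Parent TEXT, Child TEXT)")
rows = [
    ("group1", "a"), ("group1", "b"), ("group1", "c"),
    ("group2", "d"), ("group2", "e"), ("group2", "f"),
    ("group3", "h"), ("group3", "i"), ("group3", "j"),
    ("group4", "k"), ("group4", "l"), ("group4", "m"),
    ("groupA", "group1"), ("groupA", "group2"),
    ("groupB", "group3"),
    ("groupA1", "groupA"), ("groupA1", "groupB"), ("groupA1", "group4"),
]
conn.executemany("INSERT INTO ParentChild VALUES (?, ?)", rows)


def items_in_group(conn, group):
    """Return every leaf item reachable from `group`, at any nesting depth."""
    sql = """
    WITH RECURSIVE Members(Child) AS (
        -- anchor: direct children of the requested group
        SELECT Child FROM ParentChild WHERE Parent = ?
        UNION ALL
        -- recursive step: children of any member that is itself a group
        SELECT pc.Child
        FROM ParentChild pc
        JOIN Members m ON pc.Parent = m.Child
    )
    SELECT Child
    FROM Members
    WHERE Child NOT IN (SELECT Parent FROM ParentChild)  -- keep leaf items only
    ORDER BY Child
    """
    return [row[0] for row in conn.execute(sql, (group,))]


group_a_items = items_in_group(conn, "groupA")
print(group_a_items)
```

Dropping the final NOT IN filter would also return the intermediate groups (group1, group2, ...) along the way.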

You're talking about hierarchical data. SQL Server has built-in support for this. You should read through this article:
http://msdn.microsoft.com/en-us/library/bb677173.aspx
That article discusses using the new hierarchyid type, as well as a Parent/Child alternative.

Related

how to use case to combine spelling variations of an item in a table in sql

I have two SQL tables, with deviations of the spellings of department names. I'm needing to combine those using case to create one spelling of the location name. Budget_Rc is the only one with same spelling in both tables. Here's an example:
Table-1
   Depart_Name   Room_Loc
1. Finance_P1    P144
2. Budget_Rc     R2c
3. Payroll_P1_2  P1144
Table-2
   Depart_Name   Room_Loc
1. Fin_P1        P1444
2. Budget_Rc     R2c
3. Finan_P1_1    P1444
4. PR_P1_2       P1140
What I'm needing to achieve is for the department to be 1 entity, with one room location. These should show as one with one room location in the main table (Table-1).
Depart_Name Room_Loc
1. Finance_P1 F144
2. Budget_Rc R2c
3. Payroll_P1_2 P1144
Many many thanks in advance!
I'd first try something like this:
DECLARE @AllSpellings TABLE(DepName VARCHAR(100));
INSERT INTO @AllSpellings(DepName)
SELECT Depart_Name FROM tbl1 GROUP BY Depart_Name
UNION
SELECT Depart_Name FROM tbl2 GROUP BY Depart_Name;
SELECT DepName
FROM @AllSpellings
ORDER BY DepName;
This will help you to find all existing values...
Now create a clean table with all departments and an IDENTITY ID column.
You then have two choices:
1) In case you cannot change the table's layout: use the select statement above to find all existing entries and create a mapping table, which you can use as an indirect link.
2) Better, a real FK relation: replace the department names with the ID and let this be a FOREIGN KEY reference.
Can more than one department be in a Room?
If so, then it's harder, and you can't really write a dynamic query without having a list of all the possible one-to-many relationships, for example that Finance has the department key FIN and these three names. You will have to define that table to make any sort of relationship.
For instance:
DEPARTMENT TABLE
ID NAME ROOMID
FIN FINANCE P1444
PAY PAYROLL P1140
DEPARTMENTNAMES
ID DEPARTMENTNAME DEPARTMENTID
1 Finance_P1 FIN
2 Payroll_P1_2 PAY
3 Fin_P1 FIN
etc...
This way you can correctly match up all the departments and their names. I would use this match table to get the data organized and normalized, then clean up all your data and use a single department name from then on. It's going to be manual, but it should be a one-time job if you then clean up the data.
If the room is only ever going to belong to one department you can join on the room which makes it a lot easier.
Since there does not appear to be any solid rule for mapping department names from table one to table two, the way I would approach this is to create a mapping table. This mapping table will relate the two department names.
mapping
Depart_Name_1 | Depart_Name_2
-----------------------------
Finance_P1 | Fin_P1
Budget_Rc | Budget_Rc
Payroll_P1_2 | PR_P1_2
Then, you can do a three-way join to bring everything into a single result set:
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN mapping m
ON t1.Depart_Name = m.Depart_Name_1
INNER JOIN table2 t2
ON m.Depart_Name_2 = t2.Depart_Name
It may seem tedious to create the mapping table, but it may be unavoidable here. If you can think of a way to automate it, then this could cut down on the time spent there.
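To make the mapping-table idea concrete, here is a small runnable sketch. It uses SQLite via Python's sqlite3 module as a stand-in for SQL Server, with the sample rows from the question. The mapping row that ties Finan_P1_1 to Finance_P1 is my assumption about which department that spelling belongs to; the real mapping has to come from someone who knows the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Depart_Name TEXT, Room_Loc TEXT);
CREATE TABLE table2 (Depart_Name TEXT, Room_Loc TEXT);
CREATE TABLE mapping (Depart_Name_1 TEXT, Depart_Name_2 TEXT);

INSERT INTO table1 VALUES
    ('Finance_P1', 'P144'), ('Budget_Rc', 'R2c'), ('Payroll_P1_2', 'P1144');
INSERT INTO table2 VALUES
    ('Fin_P1', 'P1444'), ('Budget_Rc', 'R2c'),
    ('Finan_P1_1', 'P1444'), ('PR_P1_2', 'P1140');

-- One row per spelling found in table2, pointing at the canonical table1 name.
-- The Finan_P1_1 row is an assumption made for this example.
INSERT INTO mapping VALUES
    ('Finance_P1', 'Fin_P1'), ('Finance_P1', 'Finan_P1_1'),
    ('Budget_Rc', 'Budget_Rc'), ('Payroll_P1_2', 'PR_P1_2');
""")

# Resolve every table2 spelling to the canonical name and room from table1.
canonical = conn.execute("""
    SELECT t2.Depart_Name, m.Depart_Name_1, t1.Room_Loc
    FROM table2 t2
    INNER JOIN mapping m ON m.Depart_Name_2 = t2.Depart_Name
    INNER JOIN table1 t1 ON t1.Depart_Name = m.Depart_Name_1
""").fetchall()

for variant, name, room in canonical:
    print(variant, "->", name, room)
```

Any table2 spelling that has no mapping row simply drops out of the inner join, so a LEFT JOIN from table2 is a handy way to list the spellings that still need to be mapped.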

Is it possible in SQL Server to create a self-maintaining table with self-references

I'm using Azure's SQL Database and MS SQL Server Management Studio, and I'm wondering if it's possible to create a self-referencing table that maintains itself.
I have three tables: Race, Runner, Names. The Race table includes the following columns:
Race_ID (PK)
Race_Date
Race_Distance
Number_of_Runners
The second table is Runner. Runner contains the following columns:
Runner_Id (PK)
Race_ID (Foreign Key)
Name_ID
Finish_Position
Prior_Race_ID
The Names Table includes the following columns:
Full Name
Name_ID
The column of interest is Prior_Race_ID in the Runner table. I'd like to automatically populate this field via a trigger or stored procedure, but I'm not sure if it's possible to do so or how to go about it. The goal would be to be able to get all of a runner's races very quickly and easily by traversing the Prior_Race_ID field.
Can anyone point me to a good resource or reference that explains if and how this is achievable? Also, if there is a preferred approach to achieving my objective, please do share it.
Thanks for your input.
Okay, so we want, for each Competitor (better name than Names?), to find their two most recent races. You'd write a query like this:
SELECT
    t.Name_ID, t.[Full Name], t.Race_ID, t.Race_Date, t.Finish_Position
FROM
    (SELECT
        n.Name_ID, n.[Full Name], r.Race_ID, r.Race_Date, rs.Finish_Position,
        ROW_NUMBER() OVER (PARTITION BY n.Name_ID ORDER BY r.Race_Date DESC) rn
    FROM
        Names n
    INNER JOIN
        Runner rs ON n.Name_ID = rs.Name_ID
    INNER JOIN
        Race r ON rs.Race_ID = r.Race_ID
    ) t
WHERE
    t.rn IN (1, 2)
That should produce two rows per competitor. If needed, you can then PIVOT this data if you want a single row per competitor, but I'd usually leave that up to the presentation layer, rather than do it in SQL.
And so, no, I wouldn't even have a Prior_Race_ID column. As a general rule, don't store data that can be calculated - that just introduces opportunities for that data to be incorrect compared to the base data.
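As an illustration of calculating the prior race instead of storing it, here is a rough sketch. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the sample rows are invented. The correlated subquery uses LIMIT 1, where T-SQL would use TOP (1); on SQL Server 2012 or later the LAG window function is another way to get the same column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Race (Race_ID INTEGER PRIMARY KEY, Race_Date TEXT);
CREATE TABLE Runner (Runner_Id INTEGER PRIMARY KEY, Race_ID INTEGER,
                     Name_ID INTEGER, Finish_Position INTEGER);

-- Invented sample data: competitor 100 ran three races, competitor 200 ran two.
INSERT INTO Race VALUES (1, '2023-01-10'), (2, '2023-02-15'), (3, '2023-03-20');
INSERT INTO Runner VALUES (1, 1, 100, 3), (2, 2, 100, 1), (3, 3, 100, 2),
                          (4, 2, 200, 5), (5, 3, 200, 4);
""")

# For each Runner row, look up the same competitor's most recent earlier race.
PRIOR_RACE_SQL = """
SELECT rs.Runner_Id,
       rs.Race_ID,
       (SELECT rs2.Race_ID
        FROM Runner rs2
        INNER JOIN Race r2 ON r2.Race_ID = rs2.Race_ID
        WHERE rs2.Name_ID = rs.Name_ID
          AND r2.Race_Date < r.Race_Date
        ORDER BY r2.Race_Date DESC
        LIMIT 1) AS Prior_Race_ID
FROM Runner rs
INNER JOIN Race r ON r.Race_ID = rs.Race_ID
ORDER BY rs.Runner_Id
"""

prior_races = conn.execute(PRIOR_RACE_SQL).fetchall()
for runner_id, race_id, prior_race_id in prior_races:
    print(runner_id, race_id, prior_race_id)
```

Wrapped in a view, this gives the Prior_Race_ID column on demand, and it cannot drift out of sync when a race date is corrected.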
If you do want to store the value, run the following SQL (TOP (1) ensures a single row even if a runner has more than one race on the same day):
UPDATE r1
SET Prior_Race_ID =
(
    SELECT TOP (1) r2.Race_ID
    FROM Runner r2
    INNER JOIN Race ra2 ON ra2.Race_ID = r2.Race_ID
    WHERE r2.Name_ID = r1.Name_ID
      AND ra2.Race_Date < (SELECT Race_Date FROM Race WHERE Race_ID = r1.Race_ID)
    ORDER BY ra2.Race_Date DESC
)
FROM Runner r1;

Database schema for end user report designer

I'm trying to implement a feature whereby, apart from all the reports that I have in my system, I will allow the end user to create simple reports (not overly complex reports that involve slicing and dicing across multiple tables with lots of logic).
The user will be able to:
1) Select a base table from a list of allowable tables (e.g., Customers)
2) Select multiple sub tables (e.g., Address table, with AddressId as the field to link Customers to Address)
3) Select the fields from the tables
4) Have basic sorting
Here's the database schema I currently have. I'm quite certain it's far from perfect, so I'm wondering what else I can improve.
AllowableTables table
This table will contain the list of tables that the user can create their custom reports against.
Id Table
----------------------------------
1 Customers
2 Address
3 Orders
4 Products
ReportTemplates table
Id Name MainTable
------------------------------------------------------------------
1 Customer Report #2 Customers
2 Customer Report #3 Customers
ReportTemplateSettings table
Id TemplateId TableName FieldName ColumnHeader ColumnWidth Sequence
-------------------------------------------------------------------------------
1 1 Customer Id Customer S/N 100 1
2 1 Customer Name Full Name 100 2
3 1 Address Address1 Address 1 100 3
I know this isn't complete, but this is what I've come up with so far. Does anyone have any links to a reference design, or have any inputs as to how I can improve this?
This needs a lot of work even though it's a relatively simple task. Here are several other columns you might want to include, as well as some other details to take care of.
Store the table name along with the schema name, or store the schema name in an additional column; add columns for sorting and sort order.
Create an additional table to store the child tables that will be used in the report (report id, schema name, table name, column in child table used to join tables, column in parent table used to join tables, join operator (may not be needed if it is always =)).
Create an additional table that will store column names (report id, schema name, table name, column name, display as).
There are probably several more things that will come up after you complete this but hopefully this will get you in the right direction.
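To show how rows from such tables could be turned into a query, here is a minimal sketch. Everything in it is hypothetical: the function name build_report_sql, the tuple layouts standing in for the join and column tables, and the sample Customers/Address data. It runs on SQLite through Python's sqlite3 module only so that it can be tried out; the bracket quoting is also accepted by SQL Server.

```python
import sqlite3


def quote(name):
    """Bracket-quote an identifier or column header, rejecting unsafe input."""
    if not name or "]" in name:
        raise ValueError(f"unsafe identifier: {name!r}")
    return f"[{name}]"


def build_report_sql(main_table, joins, columns, allowed_tables):
    """Build a SELECT statement from report-template rows.

    joins:   (child_table, child_column, parent_table, parent_column)
    columns: (table, column, header, sort) in display order; sort is 'ASC', 'DESC' or None
    """
    used = {main_table} | {j[0] for j in joins} | {c[0] for c in columns}
    if not used <= set(allowed_tables):
        raise ValueError("report references a table that is not allowed")

    select_list = ", ".join(
        f"{quote(t)}.{quote(c)} AS {quote(h)}" for t, c, h, _ in columns
    )
    sql = f"SELECT {select_list}\nFROM {quote(main_table)}"
    for child, child_col, parent, parent_col in joins:
        sql += (f"\nLEFT JOIN {quote(child)} ON "
                f"{quote(child)}.{quote(child_col)} = {quote(parent)}.{quote(parent_col)}")
    order = [f"{quote(t)}.{quote(c)} {s}" for t, c, _, s in columns if s in ("ASC", "DESC")]
    if order:
        sql += "\nORDER BY " + ", ".join(order)
    return sql


# Hypothetical sample data to try the generated statement against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER, Name TEXT, AddressId INTEGER);
CREATE TABLE Address (AddressId INTEGER, Address1 TEXT);
INSERT INTO Customers VALUES (1, 'Zed', 10), (2, 'Amy', 11);
INSERT INTO Address VALUES (10, '1 Main St'), (11, '2 Oak Ave');
""")

report_sql = build_report_sql(
    main_table="Customers",
    joins=[("Address", "AddressId", "Customers", "AddressId")],
    columns=[("Customers", "Id", "Customer S/N", None),
             ("Customers", "Name", "Full Name", "ASC"),
             ("Address", "Address1", "Address 1", None)],
    allowed_tables=["Customers", "Address", "Orders", "Products"],
)
report_rows = conn.execute(report_sql).fetchall()
print(report_sql)
```

Checking every table against the allowable list and quoting every identifier matters here, because the names come from user-editable rows and end up inside dynamic SQL.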

Analysis service create recursive hierarchy

I have following table:
CatId CatName parent CatId
1 Category 1 NULL
2 Category 2 NULL
3 SubCat 1 1
4 SubSubCat 1 3
5 SSSubCat 1 4
In Analysis Services I want to create a hierarchy in a dimension that allows me to drill down to N levels. Currently I am able to do it for only 2 levels (Category and Sub Category), but I would like to go to N levels, or, if N levels is not possible, at least 4-5 levels.
The type of Hierarchy you appear to be attempting is called a Parent-Child Dimension. SSAS will use recursive joins to "explode" your data into a tree shape.
But your table as you describe it is a little confusing. So I am offering a solution that requires you to rethink your table a little. A classic Parent-Child will have for each node (record) in the hierarchy:
A key (ID) for the node
The literal text (Name) for the node
A foreign key called the parent
In your example, the column labelled "parent" appears to be superfluous. The last column in your example (called "CatID") is what the Parent of the dimension usually looks like. If you consider that each record in the table is a "child", the parent of the child acts as a pointer back to some record that owns or contains that record. At the highest level in the hierarchy, records will have no Parent so the Parent column is set to NULL.
Rename the second "CatID" to "parent" and remove or rename the original column called "Parent" (you don't need it). If you tweak your table as I suggest, you should check that the highest level is correct by running the following query:
SELECT CatID, CatName, parent FROM mytable WHERE (parent IS NULL)
Then to get the next level down run the following query:
SELECT HighestLevel.CatID, HighestLevel.CatName, HighestLevel.parent, Level2.CatID AS Level2ID, Level2.CatName AS Level2Name
FROM mytable AS HighestLevel
INNER JOIN mytable AS Level2 ON HighestLevel.CatID = Level2.parent
WHERE (HighestLevel.parent IS NULL)
Note the recursive INNER JOIN. Run at least one more query to view another level down to verify that the keys are "expanding" the way you expect:
SELECT HighestLevel.CatID, HighestLevel.CatName, HighestLevel.parent, Level2.CatID AS Level2ID, Level2.CatName AS Level2Name, Level3.CatID AS Level3ID, Level3.CatName AS Level3Name
FROM mytable AS HighestLevel
INNER JOIN mytable AS Level2 ON HighestLevel.CatID = Level2.parent
INNER JOIN mytable AS Level3 ON Level2.CatID = Level3.parent
WHERE (HighestLevel.parent IS NULL)
You could keep adding levels as necessary to convince yourself that the data is correct. This is essentially what SSAS is doing when it builds a Parent-Child hierarchy.
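Rather than adding one self-join per level by hand, the same check can be done for any depth with a recursive CTE. This is only a sketch for verifying the data, not a description of SSAS internals. It assumes the table has already been reshaped to (CatID, CatName, parent) as suggested above, and it runs on SQLite via Python's sqlite3 module as a stand-in; SQL Server uses the same CTE without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (CatID INTEGER PRIMARY KEY, CatName TEXT, parent INTEGER);
INSERT INTO mytable VALUES
    (1, 'Category 1', NULL),
    (2, 'Category 2', NULL),
    (3, 'SubCat 1', 1),
    (4, 'SubSubCat 1', 3),
    (5, 'SSSubCat 1', 4);
""")

# Walk the tree from the top-level rows (parent IS NULL) downwards,
# recording how deep each category sits.
TREE_SQL = """
WITH RECURSIVE Tree(CatID, CatName, Depth) AS (
    SELECT CatID, CatName, 1
    FROM mytable
    WHERE parent IS NULL
    UNION ALL
    SELECT c.CatID, c.CatName, t.Depth + 1
    FROM mytable c
    INNER JOIN Tree t ON c.parent = t.CatID
)
SELECT CatID, CatName, Depth
FROM Tree
ORDER BY CatID
"""

levels = conn.execute(TREE_SQL).fetchall()
for cat_id, cat_name, depth in levels:
    print("  " * (depth - 1) + cat_name)
```

Any category missing from the output has a parent value that does not lead back to a top-level row, which is worth fixing before building the dimension.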
Finally, you'll add this table to the DSV and create a Parent-Child Dimension. That's a bit more complicated and this looks like a great starter article. SSAS will keep adding levels as necessary until it runs out of data.
In AdventureWorks, the Employee dimension has an example of this. Assuming your category is on your fact table:
Set your ParentCatID to be a FK of CatID in the DSV
Reference your Parent attribute as the Parent Attribute type in the Dimension Hierarchy manager
Add the attribute into your hierarchy
The nested levels should be able to be browsed in your Category Hierarchy.

Avoid SQL Cursor in this scenario

I have inherited a system which seemingly requires me to use a cursor or while loop.
Given the tables below, I would like to get the names of the attendees, e.g.
Bill
Bob
Jane
Jill
Attendees
SourceTable|SourceTableId
Boys       |1
Boys       |2
Girls      |2
Girls      |1
Boys
Id|FirstName
1 |Bill
2 |Bob
Girls
Id|FirstName
1 |Jill
2 |Jane
Note, the system doesn't actually use Attendees,Boys & Girls but rather uses Contracts, Orders and other such entities etc but it was easier\simpler to represent in this form.
There may be loads more lookup tables than just "boy" and "girl", so is there any way I can achieve this without using cursors or other row-based operations?
If I understand correctly, this query should work:
SELECT FirstName
FROM Attendees
join Boys on id = SourceTableId
WHERE SourceTable = 'Boys'
union all
SELECT FirstName
FROM Attendees
join Girls on id = SourceTableId
WHERE SourceTable = 'Girls'
A union is probably the only way you're going to do this, probably encapsulated in a view. If you can get a list of the tables, then you could write a code generator that generates the view. If necessary, put the view in a different database or schema on the same server if the vendor won't allow you to put it in the application DB.
Can you programmatically identify the tables and columns you need, or get a list from somewhere?
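A code generator for such a view can be very small. The sketch below is one possible shape, assuming every lookup table has Id and FirstName columns as in the example; the function name build_attendee_view_sql and the view name AttendeeNames are made up. It is exercised against SQLite through Python's sqlite3 module so it can be run as-is, and the generated CREATE VIEW ... UNION ALL statement is plain enough to work on SQL Server too.

```python
import sqlite3


def build_attendee_view_sql(lookup_tables, view_name="AttendeeNames"):
    """Generate a CREATE VIEW statement with one UNION ALL branch per lookup table."""
    if not lookup_tables:
        raise ValueError("at least one lookup table is required")
    selects = []
    for table in lookup_tables:
        # Table names are spliced into the SQL text, so only accept plain identifiers.
        if not table.isidentifier():
            raise ValueError(f"unexpected table name: {table!r}")
        selects.append(
            f"SELECT a.SourceTable, a.SourceTableId, t.FirstName\n"
            f"FROM Attendees a\n"
            f"JOIN {table} t ON t.Id = a.SourceTableId\n"
            f"WHERE a.SourceTable = '{table}'"
        )
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


# Sample data from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attendees (SourceTable TEXT, SourceTableId INTEGER);
CREATE TABLE Boys (Id INTEGER, FirstName TEXT);
CREATE TABLE Girls (Id INTEGER, FirstName TEXT);
INSERT INTO Attendees VALUES ('Boys', 1), ('Boys', 2), ('Girls', 2), ('Girls', 1);
INSERT INTO Boys VALUES (1, 'Bill'), (2, 'Bob');
INSERT INTO Girls VALUES (1, 'Jill'), (2, 'Jane');
""")

view_sql = build_attendee_view_sql(["Boys", "Girls"])
conn.execute(view_sql)
names = [row[0] for row in conn.execute(
    "SELECT FirstName FROM AttendeeNames ORDER BY FirstName")]
print(view_sql)
print(names)
```

The list of lookup tables could come from SELECT DISTINCT SourceTable FROM Attendees, so the view can be regenerated whenever a new source table shows up.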
