How to fetch an object graph at once? - database

I'm reading a book, where the author talks about fetching an row + all linked parent rows in one step. Like fetching an order + all it's items all at once. Okay, sounds nice, but really: I've never seen an possibility in SQL to ask for - lets say - one order + 100 items? How would this record set look like? Would I get 101 rows with merged fields of both the order and the item table, where 100 rows have a lot of NULL values for the order fields, while one row has a lot of NULL values for the item fields? Is that the way to go? Or is there something much cooler? I mean... I never heard of fetching arrays onto a field?

A simple JOIN would do the trick:
SELECT o.*
, i.*
FROM orders o
INNER JOIN order_items i
ON o.id = i.order_id
The will return one row for each row in order_items. The returned rows consist of all fields from the orders table, and concatenated to that, all fields from the order_items table (quite literally, the records from the tables are joined, that is, they are combined by record concatenation)
So if orders has (id, order_date, customer_id) and order_items has (order_id, product_id, price) the result of the statement above will consist of records with (id, order_date, customer_id, order_id, product_id, price)
One thing you need to be aware of is that this approach breaks down whenever there are two distinct 'detail' tables for one 'master'. Let me explain.
In the orders/order_items example, orders is the master and order_items is the detail: each row in order_items belongs to, or is dependent on exactly one row in orders. The reverse is not true: one row in the orders table can have zero or more related rows in the order_items table. The join condition
ON o.id = i.order_id
ensures that only related rows are combined and returned (leaving out the condition would retturn all possible combinations of rows from the two tables, assuming the database would allow you to omit the join condition)
Now, suppose you have one master with two details, for example, customers as master and customer_orders as detail1 and customer_phone_numbers. Suppose you want to retrieve a particular customer along with all is orders and all its phone numbers. You might be tempted to write:
SELECT c.*, o.*, p.*
FROM customers c
INNER JOIN customer_orders o
ON c.id = o.customer_id
INNER JOIN customer_phone_numbers p
ON c.id = p.customer_id
This is valid SQL, and it will execute (asuming the tables and column names are in place)
But the problem is, is that it will give you a rubbish result. Assuming you have on customer with two orders (1,2) and two phone numbers (A, B) you get these records:
customer-data | order 1 | phone A
customer-data | order 2 | phone A
customer-data | order 1 | phone B
customer-data | order 2 | phone B
This is rubbish, as it suggests there is some relationship between order 1 and phone numbers A and B and order 2 and phone numbers A and B.
What's worse is that these results can completely explode in numbers of records, much to the detriment of database performance.
So, JOIN is excellent to "flatten" a hierarchy of items of known depth (customer -> orders -> order_items) into one big table which only duplicates the master items for each detail item. But it is awful to extract a true graph of related items. This is a direct consequence of the way SQL is designed - it can only output normalized tables without repeating groups. This is way object relational mappers exist, to allow object definitions that can have multiple dependent collections of subordinate objects to be stored and retrieved from a relational database without losing your sanity as a programmer.

This is normally done through a JOIN clause. This will not result in many NULL values, but many repeated values for the parent row.
Another option, if your database and programming language support it, it to return both result sets in one connection - one select for the parent row another for the related rows.

Related

How to display the total number of products in 2 warehouses?

I have this database:
I have a problem, how to show the total number of products in 2 warehouse, if the products in warehouse 1 don't have in warehouse2 and vice versa
I have tried like this
select warehouse1.good_count+warehouse2.good_count
from warehouse1, warehouse2
join goods on (warehouse1.good_id = goods.id)
and (warehouse2.good_id=goods.id);
but it doesn't work
Either you misunderstood what your teacher wants, or your teacher asked the question in a wrong manner.
There should be only one WAREHOUSE table with additional column (e.g. WAREHOUSE_TYPE_ID whose values would then be "remote" or "nearby" or whichever comes next), possibly making a foreign key constraint to WAREHOUSE_TYPE table.
You'd then simply
select t.warehouse_type_name,
sum(w.good_count) summary
from warehouse w join warehouse_type t on t.warehouse_type_id = w.warehouse_type_id
group by t.warehouse_type_name;
I read comments you wrote. The fact that these are physically different warehouses (one in the city and one in its outskirts) doesn't change the fact that data model is wrong. Data they represent should be contained in the same table, a single table (as I explained earlier).
Because, what will you do when company builds yet another warehouse in some other city? Add another table? WRONG!
Anyway, if you insist, something like this might be what you're looking for in this wrong data model of yours:
select g.name,
sum(a.good_count + b.good_count) total_count
from goods g left join warehouse1 a on a.good_id = g.id
left join warehouse2 b on b.good_id = g.id
group by g.name

SqlServer Many to Many AND

I have 3 (hypothetical) tables.
Photos (a list of photos)
Attributes (things describing the photos)
PhotosToAttributes (a table to link the first 2)
I want to retrieve the Names of all the Photos that have a list of attributes.
For example, all photos that have both dark lighting and are portraits (AttributeID 1 and 2). Or, for example, all photos that have dark lighting, are portraits and were taken at a wedding (AttributeID 1 and 2 and 5). Or any arbitrary number of attributes.
The scale of the database will be maybe 10,000 rows in Photos, 100 Rows in Attributes and 100,000 rows in PhotosToAttributes.
This question: SQL: Many-To-Many table AND query is very close. (I think.) I also read the linked answers about performance. That leads to something like the following. But, how do I get Name instead of PhotoID? And presumably my code (C#) will build this query and adjust the attribute list and count as necessary?
SELECT PhotoID
FROM PhotosToAttributes
WHERE AttributeID IN (1, 2, 5)
GROUP by PhotoID
HAVING COUNT(1) = 3
I'm a bit database illiterate (it's been 20 years since I took a database class); I'm not even sure this is a good way to structure the tables. I wanted to be able to add new attributes and photos at will without changing the data access code.
It is probably a reasonable way to structure the database. An alternate would be to keep all the attributes as a delimited list in a varchar field, but that would lead to performance issues as you search the field.
Your code is close, to take it to the final step you should just join the other two tables like this:
Select p.Name, p.PhotoID
From Photos As p
Join PhotosToAttributes As pta On p.PhotoID = pta.PhotoID
Join Attributes As a On pta.AttributeID = a.AttributeID
Where a.Name In ('Dark Light', 'Portrait', 'Wedding')
Group By p.Name, p.PhotoID
Having Count(*) = 3;
By joining the Attributes table like that it means you can search for attributes by their name, instead of their ID.
For first create view from your joins:
create view vw_PhotosWithAttributes
as
select
p.PhotoId,
a.AttributeID,
p.Name PhotoName,
a.Name AttributeName
from Photos p
inner join PhotosToAttributes pa on p.PhotoId = pa.PhotoId
inner join Attributes a on a.AttributeID = pa.AttributeID
You can easy ask for attribute, name, id but don't forget to properly index field.

how to use case to combine spelling variations of an item in a table in sql

I have two SQL tables, with deviations of the spellings of department names. I'm needing to combine those using case to create one spelling of the location name. Budget_Rc is the only one with same spelling in both tables. Here's an example:
Table-1 table-2
Depart_Name Room_Loc Depart_Name Room_Loc
1. Finance_P1 P144 1. Fin_P1 P1444
2. Budget_Rc R2c 2. Budget_Rc R2c
3. Payroll_P1_2 P1144 3. Finan_P1_1 P1444
4. PR_P1_2 P1140
What I'm needing to achieve is for the department to be 1 entity, with one room location. These should show as one with one room location in the main table (Table-1).
Depart_Name Room_Loc
1. Finance_P1 F144
2. Budget_Rc R2c
3. Payroll_P1_2 P1144
Many many thanks in advance!
I'd first try a
DECLARE #AllSpellings TABLE(DepName VARCHAR(100));
INSERT INTO #AllSpellings(DepName)
SELECT Depart_Name FROM tbl1 GROUP BY Depart_Name
UNION
SELECT Depart_Name FROM tbl2 GROUP BY Depart_Name;
SELECT DepName
FROM #AllSpellings
ORDER BY DepName
This will help you to find all existing values...
Now you create a clean table with all Departments with an IDENTITY ID-column.
Now you have two choices:
In case you cannot change the table's layout
Use the upper select-statement to find all existing entries and create a mapping table, which you can use as indirect link
Better: real FK-relation
Replace the department's names with the ID and let this be a FOREIGN KEY REFERENCE
Can more than one department be in a Room?
If so then its harder and you can't really write a dynamic query without having a list of all the possible one to many relationships such as Finance has the department key of FIN and they have these three names. You will have to define that table to make any sort of relationship.
For instance:
DEPARTMENT TABLE
ID NAME ROOMID
FIN FINANCE P1444
PAY PAYROLL P1140
DEPARTMENTNAMES
ID DEPARTMENTNAME DEPARTMENTID
1 Finance_P1 FIN
2 Payroll_P1_2 PAY
3 Fin_P1 FIN
etc...
This way you can correctly match up all the departments and their names. I would use this match table to get the data organized and normalized before then cleaning up all your data and then just using a singular department name. Its going to be manual but should be one time if you then clean up the data.
If the room is only ever going to belong to one department you can join on the room which makes it a lot easier.
Since there does not appear any solid rule for mapping department names from table one to table two, the way I would approach this is to create a mapping table. This mapping table will relate the two department names.
mapping
Depart_Name_1 | Depart_Name_2
-----------------------------
Finance_P1 | Fin_P1
Budget_Rc | Budget_Rc
Payroll_P1_2 | PR_P1_2
Then, you can do a three-way join to bring everything into a single result set:
SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN mapping m
ON t1.Depart_Name = m.Depart_Name_1
INNER JOIN table2 t2
ON m.Depart_Name_2 = t2.Depart_Name
It may seem tedious to create the mapping table, but it may be unavoidable here. If you can think of a way to automate it, then this could cut down on the time spent there.

SQL Server FullText Search with Weighted Columns from Previous One Column

In the database on which I am attempting to create a FullText Search I need to construct a table with its column names coming from one column in a previous table. In my current implementation attempt the FullText indexing is completed on the first table Data and the search for the phrase is done there, then the second table with the search results is made.
The schema for the database is
**Players**
Id
PlayerName
Blacklisted
...
**Details**
Id
Name -> FirstName, LastName, Team, Substitute, ...
...
**Data**
Id
DetailId
PlayerId
Content
DetailId in the table Data relates to Id in Details, and PlayerId relates to Id in Players. If there are 1k rows in Players and 20 rows in Details, then there are 20k rows in Data.
WITH RankedPlayers AS
(
SELECT PlayerID, SUM(KT.[RANK]) AS Rnk
FROM Data c
INNER JOIN FREETEXTTABLE(dbo.Data, Content, '"Some phrase like team name and player name"')
AS KT ON c. DataID = KT.[KEY]
GROUP BY c.PlayerID
)
…
Then a table is made by selecting the rows in one column. Similar to a pivot.
…
SELECT rc.Rnk,
c.PlayerID,
PlayerName,
TeamID,
…
(SELECT Content FROM dbo.Data data WHERE DetailID = 1 AND data.PlayerID = c.PlayerID) AS [TeamName],
…
FROM dbo.Players c
JOIN RankedPlayers rc ON c. PlayerID = rc. PlayerID
ORDER BY rc.Rnk DESC
I can return a ranked table with this implementation, the aim however is to be able to produce results from weighted columns, so say the column Playername contributes to the rank more than say TeamName.
I have tried making a schema bound view with a pivot, but then I cannot index it because of the pivot. I have tried making a view of that view, but it seems the metadata is inherited, plus that feels like a clunky method.
I then tried to do it as a straight query using sub queries in the select statement, but cannot due to indexing not liking sub queries.
I then tried to join multiple times, again the index on the view doesn't like self-referencing joins.
How to do this?
I have come across this article http://developmentnow.com/2006/08/07/weighted-columns-in-sql-server-2005-full-text-search/ , and other articles here on weighted columns, however nothing as far as I can find addresses weighting columns when the columns were initially row data.
A simple solution that works really well. Put weight on the rows containing the required IDs in another table, left join that table to the table to which the full text search had been applied, and multiply the rank by the weight. Continue as previously implemented.
In code that comes out as
DECLARE #Weight TABLE
(
DetailID INT,
[Weight] FLOAT
);
INSERT INTO #Weight VALUES
(1, 0.80),
(2, 0.80),
(3, 0.50);
WITH RankedPlayers AS
(
SELECT PlayerID, SUM(KT.[RANK] * ISNULL(cw.[Weight], 0.10)) AS Rnk
FROM Data c
INNER JOIN FREETEXTTABLE(dbo.Data, Content, 'Karl Kognition C404') AS KT ON c.DataID = KT.[KEY]
LEFT JOIN #Weight cw ON c.DetailID = cw.DetailID
GROUP BY c.PlayerID
)
SELECT rc.Rnk,
...
I'm using a temporary table here for evidence of concept. I am considering adding a column Weights to the table Details to avoid an unnecessary table and left join.

How can I convert a view containing a START WITH...CONNECT BY sub-query to SQL Server?

I am trying to convert a view from an Oracle RDBMS to SQL Server. The view looks like:
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id)
Having read about recursive CTE's and also about their use in sub-queries, my best guess at translating the above into SQL Server syntax is:
create view user_part_v
as
with region_structure(region_id, parent_region_id) as (
select region_id
, parent_region_id
from region_relation
where parent_region_id = users.region_id
union all
select r.region_id
, r.parent_region_id
from region_relation r
join region_structure rs on rs.parent_region_id = r.region_id
)
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id from region_structure)
Obviously this gives me an error about the reference to users.region_id in the CTE definition.
How can I achieve the same result in SQL Server as I get from the Oracle view?
Background
I am working on the conversion of a system from running on an Oracle 11g RDMS to SQL Server 2008. This system is a relatively large Java EE based system, using JPA (Hibernate) to query from the database.
Many of the queries use the above mentioned view to restrict the results returned to those appropriate for the current user. If I cannot convert the view directly then the conversion will be much harder as I will need to change all of the places where we query the database to achieve the same result.
The tables referenced by this view have a structure similar to:
USERS
ID
REGION_ID
REGION
ID
NAME
REGION_RELATIONSHIP
PARENT_REGION_ID
REGION_ID
PART
ID
PARTNO
DESCRIPTION
PART_REGION
PART_ID
REGION_ID
So, we have regions, arranged into a hierarchy. A user may be assigned to a region. A part may be assigned to many regions. A user may only see the parts assigned to their region. The regions reference various geographic regions:
World
Europe
Germany
France
...
North America
Canada
USA
New York
...
If a part, #123, is assigned to the region USA, and the user is assigned to the region New York, then the user should be able to see that part.
UPDATE: I was able to work around the error by creating a separate view that contained the necessary data, and then have my main view join to this view. This has the system working, but I have not yet done thorough correctness or performance testing yet. I am still open to suggestions for better solutions.
I reformatted your original query to make it easier for me to read.
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
Let's examine what's going on in this query.
select part_region.part_id, users.id as users_id
from part_region, users
This is an old-style join where the tables are cartesian joined and then the results are reduced by the subsequent where clause(s).
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
The sub-query that's using the connect by statement is using the region_id from the users table in outer query to define the starting point for the recursion.
Then the in clause checks to see if the region_id for the part_region is found in the results of the recursive query.
This recursion follows the parent-child linkages given in the region_relation table.
So the combination of doing an in clause with a sub-query that references the parent and the old-style join means that you have to consider what the query is meant to accomplish and approach it from that direction (rather than just a tweaked re-arrangement of the old query) to be able to translate it into a single recursive CTE.
This query also will return multiple rows if the part is assigned to multiple regions along the same branch of the region heirarchy. e.g. if the part is assigned to both North America and USA a user assigned to New York will get two rows returned for their users_id with the same part_id number.
Given the Oracle view and the background you gave of what the view is supposed to do, I think what you're looking for is something more like this:
create view user_part_v
as
with user_regions(users_id, region_id, parent_region_id) as (
select u.users_id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
union all
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
Note that I've added the users_id to the output of the recursive CTE, and then just done a simple inner join of the part_region table and the CTE results.
Let me break down the query for you.
select u.users_id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
This is the starting set for our recursion. We're taking the region_relation table and joining it against the users table, to get the starting point for the recursion for every user. That starting point being the region the user is assigned to along with the parent_region_id for that region. A left join is done here and the region_id is pulled from the user table in case the user is assigned to a top-most region (which means there won't be an entry in the region_relation table for that region).
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
This is the recursive part of the CTE. We take the existing results for each user, then add rows for each user for the parent regions of the existing set. This recursion happens until we run out of parents. (i.e. we hit rows that have no entries for their region_id in the region_relationship table.)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
This is the part where we grab our final result set. Assuming (as I do from your description) that each region has only one parent (which would mean that there's only one row in region_relationship for each region_id), a simple join will return all the users that should be able to view the part based on the part's region_id. This is because there is exactly one row returned per user for the user's assigned region, and one row per user for each parent region up to the heirarchy root.
NOTE:
Both the original query and this one do have a limitation that I want to make sure you are aware of. If the part is assigned to a region that is lower in the heirarchy than the user (i.e. a region that is a descendent of the user's region like the part being assigned to New York and the user to USA instead of the other way around), the user won't see that part. The part has to be assigned to either the user's assigned region, or one higher in the region heirarchy.
Another thing is that this query still exhibits the case I mentioned above about the original query, where if a part is assigned to multiple regions along the same branch of the heirarchy that multiple rows will be returned for the same combination of users_id and part_id. I did this because I wasn't sure if you wanted that behavior changed or not.
If this is actually an issue and you want to eliminate the duplicates, then you can replace the query below the CTE with this one:
select p.part_id, u.users_id
from part p
cross join users u
where exists (
select 1
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
where pr.part_id = p.part_id
and ur.users_id = u.users_id
);
This does a cartesian join between the part table and the users table and then only returns rows where the combination of the two has at least one row in the results of the subquery, which are the results that we are trying to de-duplicate.

Resources