Multi join issue - sql-server

**EDIT** Thanks for all the input, and sorry for the late reply. I was away over the weekend without internet access. I realized from the answers that I needed to provide more information so that people could understand the problem more thoroughly, so here it comes:
I am migrating an old database design to a new one. The old design is messy and very confusing (I was not involved in it). I've attached a picture of the relevant part of the old design below:
The table called Item will exist in the new design as well, and it has all the columns I need in the new design except one, and this is where my problem begins. I need the column that I named 'neededProp' to be associated with each migrated row from Item (by "associated" I mean it should become a column in the new Item table).
For a particular eid in table Environment there can be n entries in table Item. The "corresponding" set exists in table Room. The only way to know which rows in Item and Room belong together is through the columns "itemId" and "objectId" in the respective tables. For example, for a particular eid there might be 100 entries in Item and in Room, with "itemId" and "objectId" values from 1 to 100, so that column is only unique within a particular eid (or baseSeq, as it is called in table BaseFile).
Basically, the tables Environment and BaseFile resemble each other, and so do the tables Item and Room. The difference is that some tables lack certain columns and others have a few extra. I have no idea why it was designed like this in the first place.
My question is whether someone can help me write a query that finds the proper "neededProp" for each row in the Item table, so that I can bring that data into the new design.
**OLD PART** This might be a trivial question, but I can't get it to work as I want. I want to join a few tables as in the SQL statements below. If I start like this and run this query
select * from Environment e
join items ei on e.eid = ei.eid
I get about 400000 rows, which is what I want. However, if I add one more line so it looks like this:
select * from Environment e
join items ei on e.eid= ei.eid
left join Room r on e.roomnr = r.roomobjectnr
I get an insane number of rows, so there must be some multiplication going on. I want to get the same number of rows (about 400000 in this case) even after joining the third table. Is that possible somehow? Maybe by creating a temporary view from the first two lines of the query.
I am using MSSQL server.

Without knowing what data your second query returns, it's difficult to say exactly how to write this. Most likely there is an additional column you should be joining on in Room that you have left out, for example something identifying a facility or hallway, so that there are multiple 'Room 1' entries.
However, to answer your question about another way to write this without using a temp table, I've put together the example below. It uses common table expressions and returns only one record per source row.
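;WITH cte_EnvironmentItems AS (
-- List the columns you need: SELECT * fails in a CTE when both tables expose eid
SELECT E.eid, E.roomnr, I.itemId
FROM Environment E
INNER JOIN Items I ON I.eid = E.eid
), cte_RankedRoom AS (
SELECT *
-- Number the rows within each room so that RN = 1 keeps one row per room.
-- UpdateDate is a placeholder: order by whichever column decides which row to keep.
,ROW_NUMBER() OVER (PARTITION BY R.roomobjectnr ORDER BY R.UpdateDate DESC) [RN]
FROM Room R
)
SELECT *
FROM cte_EnvironmentItems E
LEFT JOIN cte_RankedRoom R ON E.roomnr = R.roomobjectnr
AND R.RN = 1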

By the way, do you need any columns from the Room table? If not, then:
select * from Environment e
join items ei on e.eid= ei.eid
where e.roomnr in (select r.roomobjectnr from Room r )
Otherwise:
select * from Environment e
join items ei on e.eid= ei.eid
left join (select distinct roomobjectnr from Room) r on e.roomnr = r.roomobjectnr
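The edit to the question says that itemId and objectId are only unique within one eid (baseSeq), which suggests the row multiplication comes from joining on too few columns. A minimal sketch of that idea follows, run against SQLite from Python so it can be tried without a server; the join itself is plain SQL and should carry over to SQL Server. The table contents are made up, and Room.baseSeq is an assumed column, since the schema picture is not available here.

```python
import sqlite3

# Hypothetical miniature of the old schema. Room.baseSeq is an ASSUMED column
# that carries the same value as Item.eid; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (eid INTEGER, itemId INTEGER, name TEXT);
CREATE TABLE Room (baseSeq INTEGER, objectId INTEGER, neededProp TEXT);
INSERT INTO Item VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c');
INSERT INTO Room VALUES (1, 1, 'p11'), (1, 2, 'p12'), (2, 1, 'p21');
""")

# Joining on the object id alone pairs rows across environments,
# so the row count grows beyond the number of Item rows.
loose = conn.execute(
    "SELECT COUNT(*) FROM Item i LEFT JOIN Room r ON r.objectId = i.itemId"
).fetchone()[0]

# Joining on both parts of the key keeps one Room row per Item row.
rows = conn.execute("""
    SELECT i.eid, i.itemId, i.name, r.neededProp
    FROM Item i
    LEFT JOIN Room r
      ON r.baseSeq = i.eid
     AND r.objectId = i.itemId
    ORDER BY i.eid, i.itemId
""").fetchall()

print(loose)       # more rows than Item has
for row in rows:   # exactly one row per Item row
    print(row)
```

If the row count still grows with the composite join, the pair of columns is not unique in Room either, and a ranking step per key would be needed to pick one row.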

Related

Can I use Dimension table ‘startdate’ instead Fact table?

I'm joining a dim table and a fact table that both have a start date. Can I use the start date from the dim table instead of the one from the fact table? If so, why would we need the fact table's start date? Below is an example:
Select count(*)
from dim_table d
Inner join fact_table f
On d.bizkeys = f.bizkeys
Where currentind = '1'
And d.startdate = (select max(startdate) from dim_table)
With the d.startdate condition I get 1.8 million records. If I instead use
f.startdate = (select max(startdate) from fact_table)
I get 100 million records.
Can anyone please clarify this? Why am I seeing such a huge variation?
If you're just trying to get a list of all possible dates from the dimension (say, for a dropdown in your reporting tool that would let the user pick a date), then there's probably no reason that you would join to the fact table—unless you only want to include dates for which there's a corresponding fact record.
Without some sample data (or at least a little more information about the substance of the fact and dimension tables), I'm not sure I can give a better answer than that.

SQL group searching and matching between two tables

Working on a physical security migration. Have two tables. First table (AreaAccess) lists the badgeholder with the areaid's the badgeholder has access to. Second table (AreaGroups) has areaid's grouped together in sets. The goal is to read the cardholder's AreaAccess records and then search the AreaGroups for the count of the best or exact match of the cardholder's areas to a group.
Curious why you didn't give this a shot first. If you aren't as familiar with SQL, here's a great link to get started: https://www.w3schools.com/sql/
Also, it's extremely helpful if you can provide a sample of what you want the outcome to look like. It doesn't have to be fancy, just a few rows/columns that can demonstrate what you're hoping to see.
Here's a possible look. However, your question is a little vague, so this is a best guess.
create table ##CardToArea --Is this table a log, or is it grouping of permission? I'm treating it as a log, but the wording of your question isn't quite clear.
(
CardholderID int not null
, AreaID int not null
);
insert into ##CardToArea
(CardholderID,AreaID)
values
(1961,11)
,(1961,25)
,(1961,28)
,(1961,71)
,(1961,73)
,(1961,74)
,(1961,44)
,(1961,50)
,(1961,51)
,(1961,52);
create table ##AreaToGroup
(
AreaID int not null
, AreaGroupID int not null
, unique (AreaID,AreaGroupID)
);
insert into ##AreaToGroup
(AreaID,AreaGroupID)
values
(33,0)
,(45,0)
,(45,7)
,(19,16)
,(17,16)
,(11,16)
,(11,48)
,(17,48)
,(17,49)
,(15,49)
,(11,49);
select
isnull(convert(nvarchar,atg.AreaGroupID),'Not defined') as [AreaGroupID]
, cta.CardholderID
, count(*) as [CountOfAccesses]
from ##CardToArea as cta
left join ##AreaToGroup as atg on cta.AreaID = atg.AreaID
group by
atg.AreaGroupID
, cta.CardholderID;
drop table ##AreaToGroup;
drop table ##CardToArea;
@Robert - thanks for the response.
This is the query I worked through yesterday, looking only at two sample users (23006 and 28190). The result is a full report of the area sets these users are part of. What I have been trying to do, hence yesterday's question, is to limit the query to the top five area counts for each cardholder. I attempted to use ROW_NUMBER, but that did not work, primarily because of the alias for "count(g.areaid)" in the select. I also tried numerous subqueries, to no avail.
select g.AreaGroupID, ag.caption, count(g.areaid) as AreaCount, a.CardholderID
from AHBadgeActivity B
join areaaccess a on a.CardholderID=b.CardholderID
left join AreaGroupSet g
on g.areaid=a.AreaID
left join AreaGroup ag on AG.AreaGroupID=g.AreaGroupID
where (a.CardholderID=23006 or a.CardholderID=28190) and DeleteFlag=0 and g.AreaGroupID <> 0
group by g.AreaGroupID, ag.Caption, a.CardholderID
order by a.cardholderid, AreaCount desc
Here is the sample output (attached as an image, "Output from Query"). My goal is to limit it to the top five AreaCounts for each cardholder.
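One common way to get a "top N per group" result is to aggregate first and rank the aggregated rows in a second step, so that ROW_NUMBER can refer to the count by its alias. The following is only a sketch under assumptions: it runs on SQLite from Python for convenience, uses invented rows, and leaves out the AHBadgeActivity and AreaGroup joins and the DeleteFlag filter from the query above. The WITH ... ROW_NUMBER() OVER (PARTITION BY ...) part is written so that it should also be accepted by SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AreaAccess (CardholderID INTEGER, AreaID INTEGER)")
conn.execute("CREATE TABLE AreaGroupSet (AreaGroupID INTEGER, AreaID INTEGER)")

# Invented data: cardholder 23006 has areas 1..6, cardholder 28190 has areas 1..2,
# and area group g (1..7) contains areas 1..g.
conn.executemany("INSERT INTO AreaAccess VALUES (?, ?)",
                 [(23006, a) for a in range(1, 7)] + [(28190, a) for a in (1, 2)])
conn.executemany("INSERT INTO AreaGroupSet VALUES (?, ?)",
                 [(g, a) for g in range(1, 8) for a in range(1, g + 1)])

top5 = conn.execute("""
    WITH counts AS (          -- step 1: one row per cardholder and area group
        SELECT a.CardholderID, g.AreaGroupID, COUNT(g.AreaID) AS AreaCount
        FROM AreaAccess a
        JOIN AreaGroupSet g ON g.AreaID = a.AreaID
        WHERE g.AreaGroupID <> 0
        GROUP BY a.CardholderID, g.AreaGroupID
    ), ranked AS (            -- step 2: rank the groups within each cardholder
        SELECT CardholderID, AreaGroupID, AreaCount,
               ROW_NUMBER() OVER (PARTITION BY CardholderID
                                  ORDER BY AreaCount DESC, AreaGroupID) AS rn
        FROM counts
    )
    SELECT CardholderID, AreaGroupID, AreaCount
    FROM ranked
    WHERE rn <= 5             -- step 3: keep the five best per cardholder
    ORDER BY CardholderID, rn
""").fetchall()

for row in top5:
    print(row)
```

The second ORDER BY column (AreaGroupID) is only a tie-breaker chosen for this sketch; using RANK or DENSE_RANK instead of ROW_NUMBER would keep all tied groups.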

SqlServer Many to Many AND

I have 3 (hypothetical) tables.
Photos (a list of photos)
Attributes (things describing the photos)
PhotosToAttributes (a table to link the first 2)
I want to retrieve the Names of all the Photos that have a list of attributes.
For example, all photos that have both dark lighting and are portraits (AttributeID 1 and 2). Or, for example, all photos that have dark lighting, are portraits and were taken at a wedding (AttributeID 1 and 2 and 5). Or any arbitrary number of attributes.
The scale of the database will be maybe 10,000 rows in Photos, 100 Rows in Attributes and 100,000 rows in PhotosToAttributes.
This question: SQL: Many-To-Many table AND query is very close. (I think.) I also read the linked answers about performance. That leads to something like the following. But, how do I get Name instead of PhotoID? And presumably my code (C#) will build this query and adjust the attribute list and count as necessary?
SELECT PhotoID
FROM PhotosToAttributes
WHERE AttributeID IN (1, 2, 5)
GROUP by PhotoID
HAVING COUNT(1) = 3
I'm a bit database illiterate (it's been 20 years since I took a database class); I'm not even sure this is a good way to structure the tables. I wanted to be able to add new attributes and photos at will without changing the data access code.
It is probably a reasonable way to structure the database. An alternate would be to keep all the attributes as a delimited list in a varchar field, but that would lead to performance issues as you search the field.
Your code is close. To take it to the final step, join the other two tables like this:
Select p.Name, p.PhotoID
From Photos As p
Join PhotosToAttributes As pta On p.PhotoID = pta.PhotoID
Join Attributes As a On pta.AttributeID = a.AttributeID
Where a.Name In ('Dark Light', 'Portrait', 'Wedding')
Group By p.Name, p.PhotoID
Having Count(*) = 3;
Joining the Attributes table like that means you can search for attributes by name instead of by ID.
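On the question of building the query from application code with an arbitrary attribute list: one possible shape is to generate one placeholder per attribute ID and pass the list length as the HAVING count. The sketch below uses Python with SQLite rather than C# and SQL Server, purely so it is self-contained; photos_with_all is a hypothetical helper name and the rows are invented. COUNT(DISTINCT ...) is used so that a duplicated link row would not inflate the count.

```python
import sqlite3

def photos_with_all(conn, attribute_ids):
    """Return (PhotoID, Name) for photos that carry every attribute in attribute_ids."""
    ids = sorted(set(attribute_ids))
    if not ids:
        return []
    # One "?" per attribute id; the values are bound as parameters, never inlined.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT p.PhotoID, p.Name
        FROM Photos p
        JOIN PhotosToAttributes pta ON pta.PhotoID = p.PhotoID
        WHERE pta.AttributeID IN ({placeholders})
        GROUP BY p.PhotoID, p.Name
        HAVING COUNT(DISTINCT pta.AttributeID) = ?
        ORDER BY p.PhotoID
    """
    return conn.execute(sql, ids + [len(ids)]).fetchall()

# Invented sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Photos (PhotoID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PhotosToAttributes (PhotoID INTEGER, AttributeID INTEGER);
INSERT INTO Photos VALUES (1, 'bride.jpg'), (2, 'street.jpg'), (3, 'groom.jpg');
INSERT INTO PhotosToAttributes VALUES (1,1),(1,2),(1,5),(2,1),(2,2),(3,1),(3,5);
""")

both = photos_with_all(conn, [1, 2])          # dark lighting and portrait
all_three = photos_with_all(conn, [1, 2, 5])  # ... and taken at a wedding
print(both)
print(all_three)
```

The same pattern should translate to C# by generating named parameters (for example @p0, @p1, ...) instead of "?" placeholders.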
First, create a view from your joins:
create view vw_PhotosWithAttributes
as
select
p.PhotoId,
a.AttributeID,
p.Name PhotoName,
a.Name AttributeName
from Photos p
inner join PhotosToAttributes pa on p.PhotoId = pa.PhotoId
inner join Attributes a on a.AttributeID = pa.AttributeID
You can then easily query by attribute, name or ID, but don't forget to index the underlying columns properly.

SQL SUM() function with parameters returned by query for each row

First of all, sorry for the weird title. Here is the thing:
I work for an online shop that sells products on Amazon. Since we sell sets of different items, the same item can be sent to Amazon FBA as part of multiple sets. To get the total quantity of one item across all sets, I wrote the following query:
SELECT
SUM(nQuantity)
AS [total]
FROM [amazon_fba]
INNER JOIN (SELECT
[cArtNr]
FROM [tArtikel]
INNER JOIN (SELECT
[kStueckliste]
FROM [tStueckliste]
WHERE [kArtikel] = (SELECT
[kArtikel]
FROM [tArtikel]
WHERE [cHAN] = 12345)) [bar]
ON [tArtikel].[kStueckliste] = [bar].[kStueckliste]) [foo]
ON [amazon_fba].[cSellerSKU] = [foo].[cArtNr]
The cHAN=12345 part is just used to pick one specific item for which we want to know the total number of items. This query itself works fine, so this is not the problem.
However, I also know that all products that are part of sets have [tArtikel].[kStueckliste]=0, which, in theory, makes identifying them pretty easy. That gave me the idea that I could use this query to generate a list of all these products with their respective totals, like:
kArtikel | total
=================
01234 | 23
56789 | 42
So basically I needed something like
foreach (
select [kArtikel]
from [tArtikel]
where [tArtikel].[kStueckliste]=0
) do (
< the query I made >
)
Thus I tried the following statement:
SELECT
SUM(nQuantity)
AS [total]
FROM [amazon_fba]
INNER JOIN (SELECT
[cArtNr]
FROM [tArtikel]
INNER JOIN (SELECT
[kStueckliste]
FROM [tStueckliste]
INNER JOIN (SELECT
[kArtikel]
FROM [tArtikel]
WHERE [tArtikel].[kStueckliste] = 0) [baz]
ON [tStueckliste].[kArtikel] = [baz].[kArtikel]) [bar]
ON [tArtikel].[kStueckliste] = [bar].[kStueckliste]) [foo]
ON [amazon_fba].[cSellerSKU] = [foo].[cArtNr]
This did not, as I had hoped, return a list of sums, but instead gave me the total of all the sums I wanted to create.
Since I am pretty new to SQL (about two weeks in, maybe), I have no idea what to do, where my mistake is, or what phrasing I should use to google my way around, hence the weird title of this post. So if anyone could help me with that and/or point me in the right direction, I'd be really happy :)
I write MySQL rather than SQL Server's dialect, but I believe the two are very similar apart from a few functions and bits of syntax. Here's what I think should work for you:
select comp.kArtikel, sum(am.nQuantity) as total
from tArtikel comp -- the component items
join tStueckliste st on st.kArtikel = comp.kArtikel -- set lists that contain the component
join tArtikel ar on ar.kStueckliste = st.kStueckliste -- the set articles built from those lists
join amazon_fba am on am.cSellerSKU = ar.cArtNr
where comp.kStueckliste = 0
group by comp.kArtikel;
Adding the group by splits the total out per article, and reducing the number of brackets (here, derived tables) keeps the query easier to read and may speed it up, provided you're using indexes. Again, this is how I would do it in MySQL; the only other query language I have experience with is BigQuery, which won't help here.

How to fetch an object graph at once?

I'm reading a book in which the author talks about fetching a row plus all linked parent rows in one step, like fetching an order plus all its items at once. Okay, sounds nice, but really: I've never seen a way in SQL to ask for, let's say, one order plus 100 items. What would this record set look like? Would I get 101 rows with merged fields of both the order and the item table, where 100 rows have a lot of NULL values for the order fields, while one row has a lot of NULL values for the item fields? Is that the way to go? Or is there something much cooler? I mean... I have never heard of fetching arrays into a field.
A simple JOIN would do the trick:
SELECT o.*
, i.*
FROM orders o
INNER JOIN order_items i
ON o.id = i.order_id
This will return one row for each row in order_items. Each returned row consists of all fields from the orders table followed by all fields from the order_items table (quite literally, the records from the two tables are joined, that is, combined by record concatenation).
So if orders has (id, order_date, customer_id) and order_items has (order_id, product_id, price) the result of the statement above will consist of records with (id, order_date, customer_id, order_id, product_id, price)
One thing you need to be aware of is that this approach breaks down whenever there are two distinct 'detail' tables for one 'master'. Let me explain.
In the orders/order_items example, orders is the master and order_items is the detail: each row in order_items belongs to, or is dependent on exactly one row in orders. The reverse is not true: one row in the orders table can have zero or more related rows in the order_items table. The join condition
ON o.id = i.order_id
ensures that only related rows are combined and returned (leaving out the condition would return all possible combinations of rows from the two tables, assuming the database allows you to omit the join condition).
Now, suppose you have one master with two details, for example customers as the master, customer_orders as detail 1 and customer_phone_numbers as detail 2. Suppose you want to retrieve a particular customer along with all its orders and all its phone numbers. You might be tempted to write:
SELECT c.*, o.*, p.*
FROM customers c
INNER JOIN customer_orders o
ON c.id = o.customer_id
INNER JOIN customer_phone_numbers p
ON c.id = p.customer_id
This is valid SQL, and it will execute (assuming the tables and column names are in place).
But the problem is that it will give you a rubbish result. Assuming you have one customer with two orders (1, 2) and two phone numbers (A, B), you get these records:
customer-data | order 1 | phone A
customer-data | order 2 | phone A
customer-data | order 1 | phone B
customer-data | order 2 | phone B
This is rubbish, as it suggests there is some relationship between order 1 and phone numbers A and B and order 2 and phone numbers A and B.
What's worse is that these results can completely explode in numbers of records, much to the detriment of database performance.
So, JOIN is excellent for "flattening" a hierarchy of items of known depth (customer -> orders -> order_items) into one big table that only duplicates the master items for each detail item. But it is awful for extracting a true graph of related items. This is a direct consequence of the way SQL is designed: it can only output normalized tables without repeating groups. This is why object-relational mappers exist: they allow object definitions with multiple dependent collections of subordinate objects to be stored in and retrieved from a relational database without losing your sanity as a programmer.
This is normally done through a JOIN clause. It will not produce many NULL values, but rather many repeated values for the parent row.
Another option, if your database and programming language support it, is to return both result sets over one connection: one SELECT for the parent row and another for the related rows.
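The "one select for the parent, another for the related rows" approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: it uses Python with an in-memory SQLite database, the orders and order_items columns follow the example layout used earlier in this thread, the rows are invented, and load_order is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, customer_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, price REAL);
INSERT INTO orders VALUES (1, '2020-01-01', 7);
INSERT INTO order_items VALUES (1, 10, 2.5), (1, 11, 4.0);
""")
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts by column name

def load_order(conn, order_id):
    """Fetch one order and its items with two queries and nest the items under the order."""
    order = conn.execute(
        "SELECT id, order_date, customer_id FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if order is None:
        return None
    items = conn.execute(
        "SELECT product_id, price FROM order_items "
        "WHERE order_id = ? ORDER BY product_id",
        (order_id,),
    ).fetchall()
    result = dict(order)                       # parent fields appear once
    result["items"] = [dict(i) for i in items] # child rows as a nested list
    return result

graph = load_order(conn, 1)
print(graph)
```

Compared with a single JOIN, the parent columns are transferred once instead of being repeated on every item row, at the cost of a second round trip.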

Resources