Detecting multiple records associated with a table - sql-server

I have a SQL Server 2008 database with two tables, CoreGroup and CoreGroupMember. Please note, I did not set up these tables. Regardless, the table structure is:
CoreGroup
---------
ID
GroupMember1MemberName
GroupMember2MemberName
GroupMember3MemberName
GroupMember4MemberName
CoreGroupMember
---------------
ID
CoreGroupID
MemberName
I need to determine how many CoreGroup records are associated with a CoreGroupMember with a specific MemberName. There is one catch that is really throwing me for a loop though. Some CoreGroup records only have one member associated with them. I need to retrieve the CoreGroup records that have multiple CoreGroupMember records where at least one of the records has the specific MemberName. I can't seem to figure out the multiple record part. Can someone please help me?
Thank you!

I'll take a stab at it hoping I've understood the requirements correctly. First, I use a cte to find all groups with multiple members, then use that result set to find groups with your specific member.
with cteMultipleMembers as (
select cg.ID, COUNT(*) as MemberCount
from CoreGroup cg
inner join CoreGroupMember cgm
on cg.ID = cgm.CoreGroupID
group by cg.ID
having COUNT(*) > 1
)
select mm.ID
from cteMultipleMembers mm
inner join CoreGroupMember cgm
on mm.ID = cgm.CoreGroupID
and cgm.MemberName = @YourMemberName
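To check the logic on a small data set, here is a minimal sketch that runs the same CTE from Python against an in-memory SQLite database. SQLite is only a stand-in for SQL Server here, the sample rows are made up, and the `?` placeholder plays the role of the member-name parameter. I also added DISTINCT, which is not in the query above, in case the same name is stored twice in one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CoreGroup (ID INTEGER PRIMARY KEY);
    CREATE TABLE CoreGroupMember (
        ID INTEGER PRIMARY KEY,
        CoreGroupID INTEGER,
        MemberName TEXT
    );
    INSERT INTO CoreGroup (ID) VALUES (1), (2), (3);
    -- Made-up data: group 1 has alice plus one other member,
    -- group 2 has alice alone, group 3 has two members but no alice.
    INSERT INTO CoreGroupMember (CoreGroupID, MemberName) VALUES
        (1, 'alice'), (1, 'bob'),
        (2, 'alice'),
        (3, 'carol'), (3, 'dave');
""")

query = """
    WITH cteMultipleMembers AS (
        -- Groups that have more than one member row.
        SELECT cg.ID, COUNT(*) AS MemberCount
        FROM CoreGroup cg
        INNER JOIN CoreGroupMember cgm ON cg.ID = cgm.CoreGroupID
        GROUP BY cg.ID
        HAVING COUNT(*) > 1
    )
    -- Of those groups, keep the ones containing the requested member.
    SELECT DISTINCT mm.ID
    FROM cteMultipleMembers mm
    INNER JOIN CoreGroupMember cgm
        ON mm.ID = cgm.CoreGroupID
       AND cgm.MemberName = ?
    ORDER BY mm.ID
"""
matching_group_ids = [row[0] for row in conn.execute(query, ("alice",))]
print(matching_group_ids)
```

Only group 1 qualifies: group 2 contains the name but has a single member, and group 3 has several members but not the name.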

Related

Duplicate SQL Record Entries with in 3 days

Table has following structure
ID, OrderNumber, PFirstName, PLastName, Product, LastDateModified
This information is populated into my SQL Server database by a XML import file and is created when the front end hits 'Enter'. But someone on the front has been seeing an error and then hitting Cancel and re-submitting the order with new information.
Now, the first order is in the Database because they didn't cancel it out on the backend first.
How can I find any duplicate OrderNumber, PFirstName, PLastName, Product within 3 days of any LastDateModified entry?
A self join with a simple where clause.
Assuming the ORDER numbers are not duplicated and that's what you're looking for.
SELECT A.ID as A_ID
, A.orderNumber as OriginalOrder
, B.ID as B_ID
, B.OrderNumber as PossibleDuplicatedOrder
FROM TBL A
INNER JOIN TBL B
on A.PFirstName = B.PFirstName
AND A.PLastName = B.PLastName
AND A.Product = B.Product
AND A.LastDateModified < B.LastDateModified
WHERE datediff(day,A.LastDateModified,B.LastDateModified) <=3
Logically, this is a self join. To eliminate the A-->B and B-->A duplication caused by self joins, we use < so that every record in alias A has an earlier date than its match in B when the other fields are equal; then we simply keep the pairs with a datediff of <= 3.
However if multiple duplicates exist for the same order such as
A-->B
B-->C
You'll see duplication in the results such as (but only if all 3 are w/in 3 days)
A-->B
B-->C
A-->C
But I don't see this as a bad thing given what you're attempting to recover from.
I'm not sure how to determine if it's been cancelled or backed out so you'll have to set other limits for that as they weren't specified in the question.
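To see the A-->B, B-->C, A-->C effect on real rows, here is a small sketch in Python using an in-memory SQLite database as a stand-in for SQL Server. The rows are invented. SQLite has no DATEDIFF, so the sketch subtracts julianday() values instead; that measures elapsed days, whereas T-SQL's DATEDIFF(day, ...) counts date boundaries crossed, so the two can differ when the column carries a time of day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL (
        ID INTEGER PRIMARY KEY,
        OrderNumber TEXT,
        PFirstName TEXT,
        PLastName TEXT,
        Product TEXT,
        LastDateModified TEXT
    );
    -- Made-up data: rows 1-3 are resubmissions within 3 days of each other,
    -- row 4 is the same person and product weeks later, row 5 is unrelated.
    INSERT INTO TBL VALUES
        (1, 'A100', 'Ann', 'Lee', 'Widget', '2020-01-01'),
        (2, 'A101', 'Ann', 'Lee', 'Widget', '2020-01-02'),
        (3, 'A102', 'Ann', 'Lee', 'Widget', '2020-01-03'),
        (4, 'A200', 'Ann', 'Lee', 'Widget', '2020-02-15'),
        (5, 'B300', 'Raj', 'Rao', 'Gadget', '2020-01-01');
""")

query = """
    SELECT A.ID, B.ID
    FROM TBL A
    INNER JOIN TBL B
        ON A.PFirstName = B.PFirstName
       AND A.PLastName = B.PLastName
       AND A.Product = B.Product
       -- The strict < keeps A-->B and drops the mirrored B-->A pair.
       AND A.LastDateModified < B.LastDateModified
    WHERE julianday(B.LastDateModified) - julianday(A.LastDateModified) <= 3
    ORDER BY A.ID, B.ID
"""
duplicate_pairs = conn.execute(query).fetchall()
print(duplicate_pairs)
```

The three close rows produce three pairs, (1, 2), (1, 3) and (2, 3), while row 4 falls outside the window and row 5 has nothing to pair with.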

Last Date Attended from 2 Tables SQL Server

I have searched the forum, and couldn't find an answer. So I apologize if this is out there. This seems simple in my mind, however, I can't seem to get the correct code.
I have 2 tables. STUDENT_TERMS_VIEW table holds STTR_STUDENT, STTR_TERM and TERMS table, holds the TERM_END_DATE. I need to find a way to select the student's last term based on MAX(TERM_END_DATE), but I get STTR_TERM duplicate rows per student. I need to get 1 row per student and their last term attended.
EDIT: Ok so both tables are linked by TERM.
View Code Here
As you can see, I am getting duplicate TERMS for the same student, even though I am pulling MAX(TERM_END_DATE)
SELECT * FROM
(SELECT STUDENT_TERMS_VIEW.STTR_STUDENT,
STUDENT_TERMS_VIEW.STTR_TERM,
TERMS.TERM_END_DATE
FROM STUDENT_TERMS_VIEW
JOIN STUDENT_TERMS_VIEW ON TERMS.TERMS_ID = STUDENT_TERMS_VIEW.STTR_TERM
ORDER BY TERMS.TERM_END_DATE DESC,STUDENT_TERMS_VIEW.STTR_STUDENT)
GROUP BY STUDENT_TERMS_VIEW.STTR_STUDENT
Your query is getting the max of the combination of (STTR_STUDENT and STTR_TERM). If you only want to get the max term of each student, you should only GROUP BY STUDENT_TERMS_VIEW.STTR_STUDENT. Try the query below.
SELECT stv.STTR_STUDENT, MAX(t.TERM_END_DATE)
FROM STUDENT_TERMS_VIEW stv
JOIN TERMS t ON t.TERMS_ID = stv.STTR_TERM
GROUP BY stv.STTR_STUDENT
If you also need to get the term, join it back to STUDENT_TERMS_VIEW and TERMS.
SELECT s.STTR_STUDENT, s.STTR_TERM, t.TERM_END_DATE
FROM (
SELECT stv.STTR_STUDENT, MAX(t.TERM_END_DATE) AS 'MaxDate'
FROM STUDENT_TERMS_VIEW stv
JOIN TERMS t ON t.TERMS_ID = stv.STTR_TERM
GROUP BY stv.STTR_STUDENT
) a
JOIN STUDENT_TERMS_VIEW s ON s.STTR_STUDENT = a.STTR_STUDENT
JOIN TERMS t ON t.TERMS_ID = s.STTR_TERM AND t.TERM_END_DATE = a.MaxDate
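Here is a minimal runnable sketch of the join-back pattern, using Python with an in-memory SQLite database in place of SQL Server. The term codes and dates are invented, and STUDENT_TERMS_VIEW is created as a plain table for the demo. The outer join matches on the derived table's MaxDate column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TERMS (TERMS_ID TEXT PRIMARY KEY, TERM_END_DATE TEXT);
    CREATE TABLE STUDENT_TERMS_VIEW (STTR_STUDENT TEXT, STTR_TERM TEXT);
    -- Made-up terms and enrollments.
    INSERT INTO TERMS VALUES
        ('2019FA', '2019-12-15'),
        ('2020SP', '2020-05-15'),
        ('2020FA', '2020-12-15');
    INSERT INTO STUDENT_TERMS_VIEW VALUES
        ('S1', '2019FA'), ('S1', '2020SP'), ('S1', '2020FA'),
        ('S2', '2019FA'), ('S2', '2020SP');
""")

query = """
    SELECT s.STTR_STUDENT, s.STTR_TERM, t.TERM_END_DATE
    FROM (
        -- One row per student with that student's latest end date.
        SELECT stv.STTR_STUDENT, MAX(t.TERM_END_DATE) AS MaxDate
        FROM STUDENT_TERMS_VIEW stv
        JOIN TERMS t ON t.TERMS_ID = stv.STTR_TERM
        GROUP BY stv.STTR_STUDENT
    ) a
    -- Join back to recover which term carries that end date.
    JOIN STUDENT_TERMS_VIEW s ON s.STTR_STUDENT = a.STTR_STUDENT
    JOIN TERMS t ON t.TERMS_ID = s.STTR_TERM
                AND t.TERM_END_DATE = a.MaxDate
    ORDER BY s.STTR_STUDENT
"""
last_terms = conn.execute(query).fetchall()
print(last_terms)
```

Each student comes back once with their latest term. If two of a student's terms shared the same end date, this pattern would return both rows.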

Multi join issue

EDIT: Thanks for all the input, and sorry for the late reply. I was away over the weekend without internet access. I realized from the answers that I needed to provide more information so people could understand the problem more thoroughly, so here it comes:
I am migrating an old database design to a new design. The old one is a mess and very confusing (I wasn't involved in the old design). I've attached a picture of the relevant part of the old design below:
The table called Item will exist in the new design as well, and it has all the columns I need in the new design except one, and this is where my problem begins. I need the column that I named 'neededProp' to be associated with each newly migrated row from Item (by associated I mean it becomes a column in the new Item table).
So for a particular eid in table Environment there can be n entries in table Item. The "corresponding" set exists in table Room. The only way to know which rows in Item and Room are associated is through the columns "itemId" and "objectId" in the respective tables. For example, for a particular eid there might be 100 entries in Item and in Room, and their "itemId" and "objectId" values can run from 1 to 100, so that column is unique only within a particular eid (or baseSeq, as it is called in table BaseFile).
Basically, the tables Environment and BaseFile resemble each other, and the tables Item and Room resemble each other. The difference is that some tables lack some columns and others have some extra ones. I have no idea why it was designed like this in the first place.
My question is whether someone can help me create a query that finds the proper "neededProp" for each row in the Item table, so I can get that data into the new design.
OLD PART: This might be a trivial question but I can't get it to work as I want. I want to join a few tables as in the SQL statement below. If I start like this and run this query
select * from Environment e
join items ei on e.eid = ei.eid
I get like 400000 rows which is what I want. However if I add one more line so it looks like this:
select * from Environment e
join items ei on e.eid= ei.eid
left join Room r on e.roomnr = r.roomobjectnr
I get an insane amount of rows so there must be some multiplication going on. I want to get the same amount of rows ( like 400000 in this case ) even after joining the third table. Is that possible somehow? Maybe like creating a temporary view with the first 2 rows.
I am using MSSQL server.
Without knowing what data your second query returns, it's difficult to say exactly how to write this. Most likely there is an additional column you need to join on in Room that you have forgotten, for example something indicating a facility or hallway, so that you have multiple 'Room 1' entries.
However, to answer your question about writing this without a temp table, I've put together the example below. It uses common table expressions and returns only one record per source row.
;WITH cte_EnvironmentItems AS (
SELECT *
FROM Environment E
INNER JOIN Items I ON I.eid = E.eid
), cte_RankedRoom AS (
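SELECT *
-- Number the rows within each roomobjectnr so every room keeps one candidate row.
-- UpdateDate is an assumed column; order by whatever identifies the row you want.
,ROW_NUMBER() OVER (PARTITION BY R.roomobjectnr ORDER BY R.UpdateDate DESC) [RN]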
FROM Room R
)
SELECT *
FROM cte_EnvironmentItems E
LEFT JOIN cte_RankedRoom R ON E.roomnr = R.roomobjectnr
AND R.RN = 1
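The row multiplication and the ranking fix can be reproduced on a tiny data set. The sketch below uses Python with an in-memory SQLite database (3.25 or newer for window functions) as a stand-in for SQL Server. The column layout is a guess: UpdateDate is hypothetical, and placing neededProp in Room is an assumption, since the schema picture isn't available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Environment (eid INTEGER PRIMARY KEY, roomnr INTEGER);
    CREATE TABLE Item (eid INTEGER, itemId INTEGER);
    -- UpdateDate and neededProp are assumed columns for this demo.
    CREATE TABLE Room (roomobjectnr INTEGER, UpdateDate TEXT, neededProp TEXT);
    INSERT INTO Environment VALUES (1, 10), (2, 20);
    INSERT INTO Item VALUES (1, 1), (1, 2), (2, 1);
    -- roomobjectnr 10 appears twice, so a plain join doubles its rows.
    INSERT INTO Room VALUES
        (10, '2020-01-01', 'old'),
        (10, '2020-06-01', 'new'),
        (20, '2020-03-01', 'only');
""")

base_count = conn.execute(
    "SELECT COUNT(*) FROM Environment e JOIN Item i ON e.eid = i.eid"
).fetchone()[0]

plain_count = conn.execute("""
    SELECT COUNT(*)
    FROM Environment e
    JOIN Item i ON e.eid = i.eid
    LEFT JOIN Room r ON e.roomnr = r.roomobjectnr
""").fetchone()[0]

ranked_rows = conn.execute("""
    WITH RankedRoom AS (
        -- Rank rows per roomobjectnr; RN = 1 is the most recently updated.
        SELECT roomobjectnr, neededProp,
               ROW_NUMBER() OVER (
                   PARTITION BY roomobjectnr
                   ORDER BY UpdateDate DESC
               ) AS RN
        FROM Room
    )
    SELECT e.eid, i.itemId, r.neededProp
    FROM Environment e
    JOIN Item i ON e.eid = i.eid
    LEFT JOIN RankedRoom r ON e.roomnr = r.roomobjectnr AND r.RN = 1
    ORDER BY e.eid, i.itemId
""").fetchall()

print(base_count, plain_count, ranked_rows)
```

The two-table join gives 3 rows, the plain three-table join gives 5, and the ranked join is back to 3 because each roomobjectnr contributes at most one Room row.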
By the way, do you need any columns from the Room table? If not, then:
select * from Environment e
join items ei on e.eid= ei.eid
where e.roomnr in (select r.roomobjectnr from Room r )
Otherwise:
select * from Environment e
join items ei on e.eid= ei.eid
left join (select distinct roomobjectnr from Room) r on e.roomnr = r.roomobjectnr

How can I convert a view containing a START WITH...CONNECT BY sub-query to SQL Server?

I am trying to convert a view from an Oracle RDBMS to SQL Server. The view looks like:
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id)
Having read about recursive CTE's and also about their use in sub-queries, my best guess at translating the above into SQL Server syntax is:
create view user_part_v
as
with region_structure(region_id, parent_region_id) as (
select region_id
, parent_region_id
from region_relation
where parent_region_id = users.region_id
union all
select r.region_id
, r.parent_region_id
from region_relation r
join region_structure rs on rs.parent_region_id = r.region_id
)
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id from region_structure)
Obviously this gives me an error about the reference to users.region_id in the CTE definition.
How can I achieve the same result in SQL Server as I get from the Oracle view?
Background
I am working on the conversion of a system from running on an Oracle 11g RDBMS to SQL Server 2008. This system is a relatively large Java EE based system, using JPA (Hibernate) to query from the database.
Many of the queries use the above mentioned view to restrict the results returned to those appropriate for the current user. If I cannot convert the view directly then the conversion will be much harder as I will need to change all of the places where we query the database to achieve the same result.
The tables referenced by this view have a structure similar to:
USERS
    ID
    REGION_ID
REGION
    ID
    NAME
REGION_RELATIONSHIP
    PARENT_REGION_ID
    REGION_ID
PART
    ID
    PARTNO
    DESCRIPTION
PART_REGION
    PART_ID
    REGION_ID
So, we have regions, arranged into a hierarchy. A user may be assigned to a region. A part may be assigned to many regions. A user may only see the parts assigned to their region. The regions reference various geographic regions:
World
    Europe
        Germany
        France
        ...
    North America
        Canada
        USA
            New York
            ...
If a part, #123, is assigned to the region USA, and the user is assigned to the region New York, then the user should be able to see that part.
UPDATE: I was able to work around the error by creating a separate view that contains the necessary data and having my main view join to it. This has the system working, but I have not yet done thorough correctness or performance testing. I am still open to suggestions for better solutions.
I reformatted your original query to make it easier for me to read.
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
Let's examine what's going on in this query.
select part_region.part_id, users.id as users_id
from part_region, users
This is an old-style join where the tables are cartesian joined and then the results are reduced by the subsequent where clause(s).
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
The sub-query that's using the connect by statement is using the region_id from the users table in outer query to define the starting point for the recursion.
Then the in clause checks to see if the region_id for the part_region is found in the results of the recursive query.
This recursion follows the parent-child linkages given in the region_relation table.
So the combination of doing an in clause with a sub-query that references the parent and the old-style join means that you have to consider what the query is meant to accomplish and approach it from that direction (rather than just a tweaked re-arrangement of the old query) to be able to translate it into a single recursive CTE.
This query will also return multiple rows if the part is assigned to multiple regions along the same branch of the region hierarchy. For example, if the part is assigned to both North America and USA, a user assigned to New York will get two rows returned for their users_id with the same part_id.
Given the Oracle view and the background you gave of what the view is supposed to do, I think what you're looking for is something more like this:
create view user_part_v
as
with user_regions(users_id, region_id, parent_region_id) as (
select u.id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
union all
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
Note that I've added the users_id to the output of the recursive CTE, and then just done a simple inner join of the part_region table and the CTE results.
Let me break down the query for you.
select u.id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
This is the starting set for our recursion. We're taking the region_relation table and joining it against the users table, to get the starting point for the recursion for every user. That starting point being the region the user is assigned to along with the parent_region_id for that region. A left join is done here and the region_id is pulled from the user table in case the user is assigned to a top-most region (which means there won't be an entry in the region_relation table for that region).
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
This is the recursive part of the CTE. We take the existing results for each user, then add rows for each user for the parent regions of the existing set. This recursion happens until we run out of parents. (i.e. we hit rows that have no entries for their region_id in the region_relationship table.)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
This is the part where we grab our final result set. Assuming (as I do from your description) that each region has only one parent (which would mean that there's only one row in region_relationship for each region_id), a simple join will return all the users that should be able to view the part based on the part's region_id. This is because there is exactly one row returned per user for the user's assigned region, and one row per user for each parent region up to the hierarchy root.
NOTE:
Both the original query and this one do have a limitation that I want to make sure you are aware of. If the part is assigned to a region that is lower in the hierarchy than the user (i.e. a region that is a descendant of the user's region, like the part being assigned to New York and the user to USA instead of the other way around), the user won't see that part. The part has to be assigned to either the user's assigned region, or one higher in the region hierarchy.
Another thing is that this query still exhibits the behavior I mentioned above for the original query: if a part is assigned to multiple regions along the same branch of the hierarchy, multiple rows will be returned for the same combination of users_id and part_id. I left it that way because I wasn't sure whether you wanted that behavior changed.
If this is actually an issue and you want to eliminate the duplicates, then you can replace the query below the CTE with this one:
select p.id as part_id, u.id as users_id
from part p
cross join users u
where exists (
select 1
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id
where pr.part_id = p.id
and ur.users_id = u.id
);
This does a cartesian join between the part table and the users table and then only returns rows where the combination of the two has at least one row in the results of the subquery, which are the results that we are trying to de-duplicate.
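For anyone who wants to try the recursion on sample data, here is a hedged sketch in Python against an in-memory SQLite database, which stands in for SQL Server (SQLite wants WITH RECURSIVE, SQL Server just WITH). The region ids and parts are made up. One assumption to flag: the sample data gives the root region its own region_relation row with a NULL parent. In this sketch the recursive member only emits regions that have a row in region_relation, so without that row the root itself would never appear in user_regions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, region_id INTEGER);
    CREATE TABLE part (id INTEGER PRIMARY KEY);
    CREATE TABLE region_relation (parent_region_id INTEGER, region_id INTEGER);
    CREATE TABLE part_region (part_id INTEGER, region_id INTEGER);
    -- Made-up regions: 1 World, 2 North America, 3 USA, 4 New York, 5 Europe.
    -- Assumption: the root (World) has a row with a NULL parent.
    INSERT INTO region_relation VALUES
        (NULL, 1), (1, 2), (2, 3), (3, 4), (1, 5);
    -- User 1 is in New York, user 2 in Europe.
    INSERT INTO users VALUES (1, 4), (2, 5);
    INSERT INTO part VALUES (123), (456);
    -- Part 123 sits on both North America and USA; part 456 on World.
    INSERT INTO part_region VALUES (123, 2), (123, 3), (456, 1);
""")

cte = """
    WITH RECURSIVE user_regions(users_id, region_id, parent_region_id) AS (
        -- Anchor: each user's own region and that region's parent.
        SELECT u.id, u.region_id, rr.parent_region_id
        FROM users u
        LEFT JOIN region_relation rr ON u.region_id = rr.region_id
        UNION ALL
        -- Recursive step: climb one level to the parent region.
        SELECT ur.users_id, rr.region_id, rr.parent_region_id
        FROM user_regions ur
        INNER JOIN region_relation rr ON ur.parent_region_id = rr.region_id
    )
"""

joined = conn.execute(cte + """
    SELECT pr.part_id, ur.users_id
    FROM part_region pr
    INNER JOIN user_regions ur ON pr.region_id = ur.region_id
    ORDER BY pr.part_id, ur.users_id
""").fetchall()

deduplicated = conn.execute(cte + """
    SELECT p.id, u.id
    FROM part p
    CROSS JOIN users u
    WHERE EXISTS (
        SELECT 1
        FROM part_region pr
        INNER JOIN user_regions ur ON pr.region_id = ur.region_id
        WHERE pr.part_id = p.id
          AND ur.users_id = u.id
    )
    ORDER BY p.id, u.id
""").fetchall()

print(joined)
print(deduplicated)
```

The plain join lists part 123 twice for user 1 because the part sits on two regions of the same branch; the EXISTS form returns each (part, user) pair once.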

How to fetch an object graph at once?

I'm reading a book where the author talks about fetching a row plus all linked parent rows in one step, like fetching an order plus all its items at once. Okay, sounds nice, but really: I've never seen a way in SQL to ask for, let's say, one order + 100 items. What would this record set look like? Would I get 101 rows with merged fields of both the order and the item table, where 100 rows have a lot of NULL values for the order fields, while one row has a lot of NULL values for the item fields? Is that the way to go? Or is there something much cooler? I mean... I've never heard of fetching arrays into a field.
A simple JOIN would do the trick:
SELECT o.*
, i.*
FROM orders o
INNER JOIN order_items i
ON o.id = i.order_id
This will return one row for each row in order_items. The returned rows consist of all fields from the orders table with all fields from the order_items table appended (quite literally, the records from the tables are joined; that is, they are combined by record concatenation).
So if orders has (id, order_date, customer_id) and order_items has (order_id, product_id, price) the result of the statement above will consist of records with (id, order_date, customer_id, order_id, product_id, price)
One thing you need to be aware of is that this approach breaks down whenever there are two distinct 'detail' tables for one 'master'. Let me explain.
In the orders/order_items example, orders is the master and order_items is the detail: each row in order_items belongs to, or is dependent on exactly one row in orders. The reverse is not true: one row in the orders table can have zero or more related rows in the order_items table. The join condition
ON o.id = i.order_id
ensures that only related rows are combined and returned (leaving out the condition would return all possible combinations of rows from the two tables, assuming the database would allow you to omit the join condition).
Now, suppose you have one master with two details, for example, customers as master, customer_orders as detail 1, and customer_phone_numbers as detail 2. Suppose you want to retrieve a particular customer along with all its orders and all its phone numbers. You might be tempted to write:
SELECT c.*, o.*, p.*
FROM customers c
INNER JOIN customer_orders o
ON c.id = o.customer_id
INNER JOIN customer_phone_numbers p
ON c.id = p.customer_id
This is valid SQL, and it will execute (assuming the tables and column names are in place).
But the problem is that it will give you a rubbish result. Assuming you have one customer with two orders (1, 2) and two phone numbers (A, B), you get these records:
customer-data | order 1 | phone A
customer-data | order 2 | phone A
customer-data | order 1 | phone B
customer-data | order 2 | phone B
This is rubbish, as it suggests there is some relationship between order 1 and phone numbers A and B and order 2 and phone numbers A and B.
What's worse is that these results can completely explode in numbers of records, much to the detriment of database performance.
So, JOIN is excellent to "flatten" a hierarchy of items of known depth (customer -> orders -> order_items) into one big table which only duplicates the master items for each detail item. But it is awful for extracting a true graph of related items. This is a direct consequence of the way SQL is designed: it can only output normalized tables without repeating groups. This is why object relational mappers exist, to allow object definitions that can have multiple dependent collections of subordinate objects to be stored and retrieved from a relational database without losing your sanity as a programmer.
This is normally done through a JOIN clause. This will not result in many NULL values, but many repeated values for the parent row.
Another option, if your database and programming language support it, is to return both result sets over one connection: one select for the parent row and another for the related rows.
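As a rough illustration of that last option, here is a sketch in Python with an in-memory SQLite database. The table contents are invented, and load_customer is a hypothetical helper name, not something from a library. It first shows the four-row result of joining both detail tables, then builds the same customer from one query per collection on the same connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE customer_phone_numbers (customer_id INTEGER, phone TEXT);
    -- Made-up data: one customer with two orders and two phone numbers.
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO customer_orders VALUES (1, 1), (2, 1);
    INSERT INTO customer_phone_numbers VALUES (1, 'A'), (1, 'B');
""")

# Joining both detail tables at once pairs every order with every phone.
joined_rows = conn.execute("""
    SELECT c.id, o.id, p.phone
    FROM customers c
    INNER JOIN customer_orders o ON c.id = o.customer_id
    INNER JOIN customer_phone_numbers p ON c.id = p.customer_id
""").fetchall()


def load_customer(conn, customer_id):
    """Build one customer 'graph' using a separate query per collection."""
    name = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()[0]
    orders = [row[0] for row in conn.execute(
        "SELECT id FROM customer_orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )]
    phones = [row[0] for row in conn.execute(
        "SELECT phone FROM customer_phone_numbers "
        "WHERE customer_id = ? ORDER BY phone",
        (customer_id,),
    )]
    return {"id": customer_id, "name": name, "orders": orders, "phones": phones}


customer = load_customer(conn, 1)
print(len(joined_rows), customer)
```

The join returns 2 x 2 = 4 rows for one customer, while the per-collection queries return exactly two orders and two phone numbers with no implied pairing between them. This is roughly the work an object relational mapper does for you.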
