Why doesn't this "not in" clause do what I expect? - sql-server

I'm writing what I think should be a simple query using the "not in" operator, and it's not doing what I expect.
Background:
I have two tables, Contact and Company.
Contact includes columns ContactID (person's identity) and CompanyID (which company they work for)
CompanyID values are expected to be equivalent to the CompanyIDs in the Company table
I want to write a query that counts how many people in the Contact table have an "invalid" CompanyID (i.e., are listed as working for a Company that isn't in the Company table)
I have a working query that does this:
select
count(ContactID)
from
Contact left join Company on Contact.CompanyID = Company.CompanyID
where
Company.CompanyID is null;
This query returns the value 2725538, which I believe to be the correct answer (I've done some simple "show me the top 10 rows" debugging, and it appears to be counting the right rows).
I wrote a second query which I expected to return the same result:
select
count(ContactID)
from
Contact
where
CompanyID not in
(select
CompanyID
from
Company)
However, this query instead returns 0.
To help me debug this, I checked two additional queries.
First, I tried commenting out the WHERE clause, which should give me all of the ContactIDs, regardless of whether they work for an invalid company:
select
count(ContactID)
from
Contact
This query returns 29722995.
Second, I tried removing the NOT from my query, which should give me the inverse of what I'm looking for (i.e., it should count the Contacts who work for valid companies):
select
count(ContactID)
from
Contact
where
CompanyID in
(select
CompanyID
from
Company)
This query returns 26997457.
Notably, these two numbers differ by exactly 2725538, the number returned by the first, working query. This is what I would expect if my second query was working. The total number of Contacts, minus the number of Contacts whose CompanyIDs are in the Company table, should equal the number of Contacts whose CompanyIDs are not in the Company table, shouldn't it?
So, why is the "not in" version of the query returning 0 instead of the correct answer?

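The likely cause is a NULL CompanyID in the Company table. NOT IN does not behave as expected when the subquery returns a NULL: comparing any value with NULL yields UNKNOWN rather than TRUE or FALSE, so the NOT IN condition can never evaluate to TRUE and every row is filtered out.
Try the following, which replaces NULLs in the subquery with a non-NULL value: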
select
count(ContactID)
from
Contact
where
CompanyID not in
(select
ISNULL(CompanyID,'')
from
Company)
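The NULL behaviour described above can be reproduced on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the sample rows are hypothetical). It compares the plain NOT IN with two common alternatives: filtering NULLs out of the subquery, and NOT EXISTS.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CompanyID INTEGER);
    CREATE TABLE Contact (ContactID INTEGER PRIMARY KEY, CompanyID INTEGER);
    -- Hypothetical data: one Company row has a NULL CompanyID.
    INSERT INTO Company VALUES (1), (2), (NULL);
    -- Contacts 12 and 13 point at companies that do not exist.
    INSERT INTO Contact VALUES (10, 1), (11, 2), (12, 99), (13, 98);
""")

def count(sql):
    """Run a query that returns a single count and return that number."""
    return conn.execute(sql).fetchone()[0]

# Plain NOT IN: the NULL in the subquery makes every comparison UNKNOWN.
not_in = count("""
    SELECT COUNT(ContactID) FROM Contact
    WHERE CompanyID NOT IN (SELECT CompanyID FROM Company)
""")

# NOT IN with NULLs filtered out of the subquery.
not_in_filtered = count("""
    SELECT COUNT(ContactID) FROM Contact
    WHERE CompanyID NOT IN
          (SELECT CompanyID FROM Company WHERE CompanyID IS NOT NULL)
""")

# NOT EXISTS with a correlated subquery is not affected by NULLs in Company.
not_exists = count("""
    SELECT COUNT(ContactID) FROM Contact AS c
    WHERE NOT EXISTS
          (SELECT 1 FROM Company AS co WHERE co.CompanyID = c.CompanyID)
""")

print(not_in, not_in_filtered, not_exists)
```

With this data the plain NOT IN counts no rows, while the other two forms count the two contacts with unknown companies.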

Related

Inner Joins in SOQL With Relationship

I need help with SOQL. I am new to this, so please bear with me.
I have to do a downwards traverse in SOQL.
SELECT Id, (SELECT Name from Contacts WHERE CreatedDate > YESTERDAY AND LastModifiedDate > YESTERDAY) from Account where CreatedDate > YESTERDAY AND LastModifiedDate > YESTERDAY
I want to get all records from Account and Contact where created date or last modified date is within a certain range. I want records where there are no changes in Account Object but changes are there in records in Contact Object.
But this query will not fetch any records if there are changes only in Contact and no change in Account. How can I do that?
Please help!
You're doing a Parent-Child SOQL query. The Contact subquery only matches Contacts associated with Accounts that match the primary query's filters.
You'll need to run one query on Account and a separate, non-relationship-based query on Contact.
The following answer might help someone.
Query:
SELECT Id, Name, LastModifiedDate, (SELECT Name, LastModifiedDate FROM Contacts WHERE LastModifiedDate >= YESTERDAY) FROM Account WHERE LastModifiedDate < YESTERDAY AND Id IN (SELECT AccountId FROM Contact WHERE LastModifiedDate >= YESTERDAY)
Explanation:
Assuming the date range starts yesterday, we want to retrieve all accounts whose contacts were modified yesterday or later, where the accounts themselves were not modified in that range.
The above query is a parent-to-child query. The sub-query selects contacts that were modified yesterday or later. The outer query selects accounts that were last modified before yesterday.
If we stopped here and left out the IN clause, the results would behave like an outer join: along with the required results, the query would return accounts last modified before yesterday even if they have no contact modified yesterday or later. (Attached picture: Outer Join results.)
So we have to add another check to the outer query so that only accounts with contacts in the specified date range are returned. The IN clause does this. (Attached picture: Inner Join results.)
Please mark this as useful if it helps.
Thanks

Query on at least and not present in SQL table

I have two queries:
Query the id number of the students who at least enrolled subj1 and subj2 courses (I do not know how to code for at least)
Query the id number, name, and age of the students who did not enroll subj2 course.
I coded something as below - which returns an empty table even though I should get some values.
Select sno, sname, age
from student
where not exists (select cno from course where cno ='C2');
I'm assuming you have a table connecting students with courses in a many-to-many, so that each student can enroll in more than one course and each course can contain multiple students.
So for the sake of this example, let's call it StudentToCourses.
This table should contain the student id and the course id, and its primary key should be the combination of both columns.
So the first query would be something like (to get student numbers enrolled into at least one of the two courses):
SELECT sno
FROM StudentToCourses
WHERE cno IN ('C1', 'C2')
or this (to get students enrolled into both courses):
SELECT sno
FROM StudentToCourses
WHERE cno IN ('C1', 'C2')
GROUP BY sno
HAVING COUNT(DISTINCT cno) = 2
Note that the subquery in the EXISTS operator is correlated to the main query using the student number.
The second query is almost the same as the first one, except instead of IN you use =, and instead of EXISTS you use NOT EXISTS.
Since this seems like a homework question, I'll leave it up to you to write the code, otherwise you will not learn anything from this.
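To see how the two queries in the answer above differ, here is a small sketch using Python's sqlite3 module. The StudentToCourses layout follows the answer's description; the sample rows are hypothetical, and DISTINCT is added to the first query (an assumption) so that a student enrolled in both courses is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentToCourses (
        sno TEXT NOT NULL,
        cno TEXT NOT NULL,
        PRIMARY KEY (sno, cno)   -- one row per student/course pair
    );
    -- Hypothetical enrolments: S1 takes both courses, S2 only C1, S3 neither.
    INSERT INTO StudentToCourses VALUES
        ('S1', 'C1'), ('S1', 'C2'),
        ('S2', 'C1'),
        ('S3', 'C3');
""")

# Students enrolled in at least one of the two courses.
at_least_one = [row[0] for row in conn.execute("""
    SELECT DISTINCT sno
    FROM StudentToCourses
    WHERE cno IN ('C1', 'C2')
    ORDER BY sno
""")]

# Students enrolled in both courses: two distinct matching course ids.
both = [row[0] for row in conn.execute("""
    SELECT sno
    FROM StudentToCourses
    WHERE cno IN ('C1', 'C2')
    GROUP BY sno
    HAVING COUNT(DISTINCT cno) = 2
    ORDER BY sno
""")]

print(at_least_one, both)
```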

How to get rid of duplicates with T-SQL

Hi, I have a login table that has some duplicated usernames.
Yes I know I should have put a constraint on it, but it's a bit too late for that now!
So essentially what I want to do is to first identify the duplicates. I can't just delete them since I can't be too sure which account is the correct one. The accounts have the same username and both of them have roughly the same information with a few small variances.
Is there any way to efficiently script it so that I can add "_duplicate" to only one of the accounts per duplicate?
You can use ROW_NUMBER with a PARTITION BY in the OVER() clause to find the duplicates and an updateable CTE to change the values accordingly:
-- Table variables are declared with @ (a # prefix denotes a temp table and cannot be used with DECLARE)
DECLARE @dummyTable TABLE(ID INT IDENTITY, UserName VARCHAR(100));
INSERT INTO @dummyTable VALUES('Peter'),('Tom'),('Jane'),('Victoria')
,('Peter') ,('Jane')
,('Peter');
WITH UpdateableCTE AS
(
SELECT t.UserName AS OldValue
,t.UserName + CASE WHEN ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY ID)=1 THEN '' ELSE '_duplicate' END AS NewValue
FROM @dummyTable AS t
)
UPDATE UpdateableCTE SET OldValue = NewValue;
SELECT * FROM @dummyTable;
The result
ID UserName
1 Peter
2 Tom
3 Jane
4 Victoria
5 Peter_duplicate
6 Jane_duplicate
7 Peter_duplicate
You might include ROW_NUMBER() as another column to find each duplicate's ordinal. If you've got a sort clause that numbers the earliest (or most current) row with 1, it should be easy to find and correct the duplicates.
Once you've cleaned up this mess, you should make sure you don't get new dups. But you know this already :-D
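The ROW_NUMBER idea can also be tried outside SQL Server. The following is a rough sketch using Python's sqlite3 module, assuming a bundled SQLite of version 3.25 or later (needed for window functions). SQLite has no updatable CTEs, so this variant updates through an ID IN (...) subquery instead; the table name login is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table mirroring the answer's sample data.
    CREATE TABLE login (ID INTEGER PRIMARY KEY AUTOINCREMENT, UserName TEXT);
    INSERT INTO login (UserName) VALUES
        ('Peter'), ('Tom'), ('Jane'), ('Victoria'),
        ('Peter'), ('Jane'), ('Peter');
""")

# Number the rows within each UserName by ID; every row numbered above 1
# is a later duplicate and gets the suffix.
conn.execute("""
    UPDATE login
    SET UserName = UserName || '_duplicate'
    WHERE ID IN (
        SELECT ID FROM (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY ID) AS rn
            FROM login
        )
        WHERE rn > 1
    )
""")

rows = conn.execute("SELECT ID, UserName FROM login ORDER BY ID").fetchall()
for row in rows:
    print(row)
```

As in the result table above, a name that occurs three times ends up with two rows carrying the same suffixed value, so those rows still need attention before a unique constraint can be added.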
There is no easy way to get rid of this nightmare. Some manual work is required.
First identify duplicates.
select * from dbo.users
where username in
(select username from dbo.users
group by username
having count(*) > 1)
Next identify "useless" users (for example those who registered but never placed any order).
Rerun the query above. Out of this list find duplicates which are the same (by email for example) and combine them into a single record. If they did something useful previously (for example placed orders) then first assign these orders to the user that survives. Remove the others.
Continue with other criteria until you get rid of the duplicates.
Then set a unique constraint on the username field. It is also a good idea to set a unique constraint on the email field.
Again, it is not easy and not automatic.
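The "identify duplicates" step can be sketched as follows, using Python's sqlite3 module in place of SQL Server (an assumption). The column layout and sample rows are hypothetical; the query groups on username and returns every row whose username occurs more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical users table with one duplicated username.
    CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT, email TEXT);
    INSERT INTO users VALUES
        (1, 'peter', 'peter@example.com'),
        (2, 'tom',   'tom@example.com'),
        (3, 'peter', 'p.smith@example.com'),
        (4, 'jane',  'jane@example.com');
""")

# Every row whose username appears more than once, grouped for manual review.
duplicates = conn.execute("""
    SELECT userId, username, email
    FROM users
    WHERE username IN (
        SELECT username FROM users
        GROUP BY username
        HAVING COUNT(*) > 1
    )
    ORDER BY username, userId
""").fetchall()

for row in duplicates:
    print(row)
```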
In this case, where the duplicates and the original names have some variance, it is practically impossible to select the non-duplicate rows, since you do not know which is real and which is a duplicate.
I think the best thing to do is to correct your data and then fix the source of these slightly varying duplicates.

Query to show which games which friends don't have

I have three tables set up in Access. I want to make a query that shows which games someone doesn't have in common with me.
I tried using an unmatched query, but that didn't work since each person has at least one game in common with me.
I guess I'm unsure how to handle this. The GameTimePlayed table basically has the opposite of the information I want to query, so is it possible to query that and add a "Not" conditional to "GameName" or something?
This is for a final project for class, and isn't due for about another month. I don't expect anyone to answer this for me, but even just a point in the right direction would be greatly appreciated. Everything I've tried to find so far is basically about unmatched queries, which did not work for me.
--EDIT TO PROVIDE MORE INFO--
I have all of the games in FavoriteGames. However, not all of my friends (PersonID) have all of my FavoriteGames. I'd like a query to show a record of FirstName, LastName, GameName, for each PersonID, for each GameName that he/she does not have.
Expected Behavior Example: PersonID 10 only has one GameName in common with me. The query should return five records for PersonID 10
(every game except Rocket League).
Sample Data:
tbl_FavoriteGames
tbl_FriendsWithGame
tbl_GameTimePlayed
GameName is the Primary Key for tbl_FavoriteGames
PersonID is the Primary Key for tbl_FriendsWithGame
PersonID, GameName Foreign Keys form a Composite Primary Key for tbl_GameTimePlayed
This is the closest I have gotten so far (still way off though) in that it removes the specified GameName:
SELECT *
FROM tbl_GameTimePlayed
WHERE NOT EXISTS
(
SELECT *
FROM tbl_FriendsWithGame
WHERE tbl_GameTimePlayed.PersonID = tbl_FriendsWithGame.PersonID
AND tbl_GameTimePlayed.GameName = tbl_FavoriteGames.GameName
);
It prompts me to enter a GameName (no idea why). When I enter a GameName, it returns all records that don't have that specific GameName.
This returns 6 games for each person, whether or not the person actually has that game. Could be useful since it contains the people/games that aren't in common.
SELECT PersonID, GameName
FROM tbl_FriendsWithGame, tbl_FavoriteGames
WHERE EXISTS (SELECT PersonID FROM tbl_GameTimePlayed WHERE GameName = tbl_GameTimePlayed.GameName);
I tried "WHERE NOT EXISTS" and that returned 0 results.
--SECOND EDIT: SOLVED!!--
I took a fresh look at the problem today, and figured it out! I used the code mentioned above to query (qry_AllPeopleAllGames) a list of all of the games, for all of the people (so 6 entries per person):
SELECT PersonID, GameName
FROM tbl_FriendsWithGame, tbl_FavoriteGames
WHERE EXISTS (SELECT PersonID FROM tbl_GameTimePlayed WHERE GameName = tbl_GameTimePlayed.GameName);
Then, I made another query that compared the qry_AllPeopleAllGames list to my tbl_GameTimePlayed (which is the list of people, games they actually own, and hours played) and spit out a list of FirstName & LastInitial and GameName that don't exist in the real list:
SELECT [tbl_FriendsWithGame]![FirstName] & " " & [tbl_FriendsWithGame]![LastInitial] AS FullName, GameName
FROM qry_AllPeopleAllGames INNER JOIN tbl_FriendsWithGame ON qry_AllPeopleAllGames.PersonID = tbl_FriendsWithGame.PersonID
WHERE ((NOT Exists (SELECT PersonID, GameName
FROM tbl_GameTimePlayed
WHERE qry_AllPeopleAllGames.PersonID = tbl_GameTimePlayed.PersonID AND qry_AllPeopleAllGames.GameName = tbl_GameTimePlayed.GameName
)));
NOTE: The first part of the SELECT is not needed, I just used it for easier viewing in my actual query results (showing first name/last initial in one field).
I'm really excited that I figured this out! I'm sure there are better/more efficient ways to do this, and if you want to share, please let me know!
I included this in my initial post, but I'll post this as the answer as well.
I took a fresh look at the problem today, and figured it out! Last night while trying to test random possible solutions, I accidentally made a query that lists all of the games, for all of the people (so 6 entries per person). Today, I used it as part of the solution, qry_AllPeopleAllGames:
SELECT PersonID, GameName
FROM tbl_FriendsWithGame, tbl_FavoriteGames
WHERE EXISTS (SELECT PersonID FROM tbl_GameTimePlayed WHERE GameName = tbl_GameTimePlayed.GameName);
Then, I made another query that compared the qry_AllPeopleAllGames list to my tbl_GameTimePlayed, which is the real list of people/games/hours played.
It returns the FirstName&LastInitial and the GameName for each PersonID/GameName combo that doesn't appear in the tbl_GameTimePlayed table. Here is the code:
SELECT [tbl_FriendsWithGame]![FirstName] & " " & [tbl_FriendsWithGame]![LastInitial] AS FullName, GameName
FROM qry_AllPeopleAllGames INNER JOIN tbl_FriendsWithGame ON qry_AllPeopleAllGames.PersonID = tbl_FriendsWithGame.PersonID
WHERE ((NOT Exists (SELECT PersonID, GameName
FROM tbl_GameTimePlayed
WHERE qry_AllPeopleAllGames.PersonID = tbl_GameTimePlayed.PersonID AND qry_AllPeopleAllGames.GameName = tbl_GameTimePlayed.GameName
)));
NOTE: The first part of the SELECT is not needed, I just used it for easier viewing in my actual query results (showing first name/last initial in one field).
I'm really excited that I figured this out! I'm sure there are better/more efficient ways to do this, and if you want to share, please let me know!
You need a dataset of all possible pairs of friends/games in order to determine which games each friend does not have. Do you have a tbl_Friends? Consider:
Query1:
SELECT tblFriends.ID, tbl_FavoriteGames.ID FROM tblFriends, tbl_FavoriteGames;
That is a Cartesian query: without a JOIN clause, every record of each table is paired with every record of the other table.
Query2:
SELECT Query1.tblFriends.ID, Query1.tbl_FavoriteGames.ID
FROM tbl_FriendsWithGame RIGHT JOIN Query1 ON (tbl_FriendsWithGame.GameID = Query1.tbl_FavoriteGames.ID) AND (tbl_FriendsWithGame.FriendID = Query1.tblFriends.ID) WHERE tbl_FriendsWithGame.GameID IS NULL;
Or if you don't have tbl_Friends
SELECT DISTINCT tbl_FriendsWithGame.FriendID, tbl_FavoriteGames.ID
FROM tbl_FavoriteGames, tbl_FriendsWithGame;
Then adjust Query2.
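The "all possible pairs, minus the pairs that exist" approach can be sketched in one statement. This example uses Python's sqlite3 module rather than Access (an assumption, and the syntax differs slightly in Access). Table and key names follow the question; the sample rows and the FirstName values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_FavoriteGames (GameName TEXT PRIMARY KEY);
    CREATE TABLE tbl_FriendsWithGame (PersonID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE tbl_GameTimePlayed (
        PersonID INTEGER REFERENCES tbl_FriendsWithGame (PersonID),
        GameName TEXT REFERENCES tbl_FavoriteGames (GameName),
        PRIMARY KEY (PersonID, GameName)
    );
    -- Hypothetical sample data.
    INSERT INTO tbl_FavoriteGames VALUES ('Rocket League'), ('Portal'), ('Tetris');
    INSERT INTO tbl_FriendsWithGame VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO tbl_GameTimePlayed VALUES
        (10, 'Rocket League'), (11, 'Portal'), (11, 'Tetris');
""")

# Build every person/game pair, then keep the pairs with no matching
# row in tbl_GameTimePlayed (an anti-join via LEFT JOIN ... IS NULL).
missing = conn.execute("""
    SELECT f.PersonID, g.GameName
    FROM tbl_FriendsWithGame AS f
    CROSS JOIN tbl_FavoriteGames AS g
    LEFT JOIN tbl_GameTimePlayed AS p
           ON p.PersonID = f.PersonID AND p.GameName = g.GameName
    WHERE p.PersonID IS NULL
    ORDER BY f.PersonID, g.GameName
""").fetchall()

for row in missing:
    print(row)
```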

Database tables: One-to-many of different types

Due to non-disclosure at my work, I have created an analogy of the situation. Please try to focus on the problem and not "Why don't you rename this table, merge those tables etc". Because the actual problem is much more complex.
Here's the deal:
Let's say I have an "Employee Pay Rise" record that has to be approved.
There is a table with single "Users".
There are tables that group Users together, for example, "Managers", "Executives", "Payroll", "Finance". These groupings are different types with different properties.
When creating a "PayRise" record, the user who is creating the record also selects a number of these groups (managers, executives etc) and/or single users who can 'approve' the pay rise.
What is the best way to relate a single "EmployeePayRise" record to 0 or more user records, and 0 or more of each of the groupings?
I would assume that the users are linked to the groups? If so in this case I would just link the employeePayRise record to one user that it applies to and the user that can approve. So basically you'd have two columns representing this. The EmployeePayRise.employeeId and EmployeePayRise.approvalById columns. If you need to get to groups, you'd join the EmployeePayRise.employeeId = Employee.id records. Keep it simple without over-complicating your design.
My first thought was to create a table that relates individual approvers to pay rise rows.
create table pay_rise_approvers (
pay_rise_id integer not null references some_other_pay_rise_table (pay_rise_id),
pay_rise_approver_id integer not null references users (user_id),
primary key (pay_rise_id, pay_rise_approver_id)
);
You can't have good foreign keys that reference managers sometimes, and reference payroll some other times. Users seems the logical target for the foreign key.
If the person creating the pay rise rows (not shown) chooses managers, then the user interface is responsible for inserting one row per manager into this table. That part's easy.
A person that appears in more than one group might be a problem. I can imagine a vice-president appearing in both "Executive" and "Finance" groups. I don't think that's particularly hard to handle, but it does require some forethought. Suppose the person who entered the data changed her mind, and decided to remove all the executives from the table. Should an executive who's also in finance be removed?
Another problem is that there's a pretty good chance that not every user should be allowed to approve a pay rise. I'd give some thought to that before implementing any solution.
I know it looks ugly, but I think sometimes the solution can be to store the table name in the table and use a union query:
create table approve_pay_rise (
rise_proposal varchar2(10) -- foreign key to payrise table
, approver varchar2(10) -- key of record in table named in other_table
, other_table varchar2(15) );
insert into approve_pay_rise values ('prop000001', 'e0009999', 'USERS');
insert into approve_pay_rise values ('prop000001', 'm0002200', 'MANAGERS');
Then, in code, use either a case statement, repeated statements for each other_table value (select ... where other_table = '' .. select ... where other_table = ''), or a union select.
I have to admit I shudder when I encounter it and I'll now go wash my hands after typing a recommendation to do it, but it works.
Sounds like you might need two tables ("ApprovalUsers" and "ApprovalGroups"). The SELECT statement(s) would be a UNION of the UserIDs from "ApprovalUsers" and the UserIDs from any groups of users listed in "ApprovalGroups" for the PayRiseId.
SELECT UserID
INTO #TempApprovers
FROM ApprovalUsers
WHERE PayRiseId = 12345
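IF EXISTS (SELECT GroupName FROM ApprovalGroups WHERE GroupName = 'Executives' AND PayRiseId = 12345)
BEGIN
-- #TempApprovers already exists at this point, so append with INSERT
-- (a second SELECT ... INTO would fail because the table is already there)
INSERT INTO #TempApprovers (UserID)
SELECT UserID
FROM Executives
END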
....
EDIT: this would/could duplicate UserIds, so you would probably want to GROUP BY UserID (i.e. SELECT UserID FROM #TempApprovers GROUP BY UserID)
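One way to avoid both the temp table and the duplicate UserIDs is a single UNION, which removes duplicates on its own. The following is a sketch only, using Python's sqlite3 module in place of SQL Server (an assumption). The table names follow the answer above; the column layout of the group tables and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ApprovalUsers (PayRiseId INTEGER, UserID INTEGER);
    CREATE TABLE ApprovalGroups (PayRiseId INTEGER, GroupName TEXT);
    CREATE TABLE Executives (UserID INTEGER);
    CREATE TABLE Finance (UserID INTEGER);
    -- Hypothetical data: user 2 is both an individual approver and an
    -- executive, and user 3 is in both groups.
    INSERT INTO ApprovalUsers VALUES (12345, 1), (12345, 2);
    INSERT INTO ApprovalGroups VALUES (12345, 'Executives'), (12345, 'Finance');
    INSERT INTO Executives VALUES (2), (3);
    INSERT INTO Finance VALUES (3), (4);
""")

# Each branch contributes users only if it applies to this pay rise;
# UNION (without ALL) drops the duplicate UserIDs.
approvers = [row[0] for row in conn.execute("""
    SELECT UserID FROM ApprovalUsers WHERE PayRiseId = :id
    UNION
    SELECT UserID FROM Executives
    WHERE EXISTS (SELECT 1 FROM ApprovalGroups AS g
                  WHERE g.PayRiseId = :id AND g.GroupName = 'Executives')
    UNION
    SELECT UserID FROM Finance
    WHERE EXISTS (SELECT 1 FROM ApprovalGroups AS g
                  WHERE g.PayRiseId = :id AND g.GroupName = 'Finance')
    ORDER BY UserID
""", {"id": 12345})]

print(approvers)
```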
