Duplicated results when performing INNER JOIN


I have 2 simple tables that I would like to perform an INNER JOIN with, but the problem is that I'm getting duplicated (for the columns str1 and str2) results:
CREATE TABLE #A (Id INT, str1 nvarchar(50), str2 nvarchar(50))
insert into #A values (1, 'a', 'b')
insert into #A values (2, 'a', 'b')
CREATE TABLE #B (Id INT, str1 nvarchar(50), str2 nvarchar(50))
insert into #B values (7, 'a', 'b')
insert into #B values (8, 'a', 'b')
select * from #A a
INNER JOIN #B b ON a.str1 = b.str1 AND a.str2 = b.str2
It gave me 4 records when I really wanted 2.
What I got:
id | str1 | str2| id | str1 | str2
1 | a | b | 7 | a | b
2 | a | b | 7 | a | b
1 | a | b | 8 | a | b
2 | a | b | 8 | a | b
What I really wanted:
1 | a | b | 7 | a | b
2 | a | b | 8 | a | b
Can anyone help? I know this is achievable using a cursor and loop, but I'd like to avoid it and only use some type of JOIN if possible.

SELECT
a.id AS a_id, a.str1 AS a_str1, a.str2 AS a_str2,
b.id AS b_id, b.str1 AS b_str1, b.str2 AS b_str2
FROM
( SELECT *
, ROW_NUMBER() OVER (PARTITION BY str1, str2 ORDER BY id) AS rn
FROM #A
) a
INNER JOIN
( SELECT *
, ROW_NUMBER() OVER (PARTITION BY str1, str2 ORDER BY id) AS rn
FROM #B
) b
ON a.str1 = b.str1
AND a.str2 = b.str2
AND a.rn = b.rn ;
If you have more rows in one or the other tables for the same (str1, str2) combination, you can choose which ones will be returned by changing INNER join to either LEFT, RIGHT or FULL join.

You can accomplish a sort of positional matching with a query like the following (SQL Server 2005 and up):
WITH A AS (
SELECT
Seq = Row_Number() OVER (PARTITION BY Str1, Str2 ORDER BY Id),
*
FROM #A
), B AS (
SELECT
Seq = Row_Number() OVER (PARTITION BY Str1, Str2 ORDER BY Id),
*
FROM #B
)
SELECT
A.Id, A.Str1, A.Str2, B.Id, B.Str1, B.Str2
FROM
A
FULL JOIN B
ON A.Seq = B.Seq AND A.Str1 = B.Str1 AND A.Str2 = B.Str2;
This joins the items between A and B on their Id-ordered position. But take note: if you have an unequal number of items for each set of Str1 and Str2, you may get unexpected results, since NULLs will appear for #A or #B.
I'm assuming here that you want the first row of table #A's "Str1 Str2", as ordered by #A.Id (1 being first), to correlate with the first row of table #B's "Str1 Str2", as ordered by #B.Id (7 being first), and so on and so forth for each successively numbered row. Is that right?
But what will you do if the number of rows does not match, and there are, for example, 3 rows in #A that have the same values as 2 rows in #B? Or the reverse? What do you want to see?
A mere DISTINCT will not do the job because the data is not duplicated. You are getting what is in effect a partial cross-join (resulting in a partial Cartesian product). That is, your join criteria do not ensure that there is a one-to-one correspondence of #A row to #B row. When that happens, for each row in #A, you will get an output row for each matching row in B. 2 x 2 = 4, not 2.
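To see where the multiplication comes from, it can help to count the matching rows per key on each side before joining. This is only a sketch and assumes the temp tables #A and #B from the question still exist in the session; the column aliases rows_in_a, rows_in_b and joined_rows are made up for illustration:

```sql
-- For every (str1, str2) key, a plain INNER JOIN on that key produces
-- rows_in_a * rows_in_b output rows.
SELECT a.str1,
       a.str2,
       a.cnt         AS rows_in_a,
       b.cnt         AS rows_in_b,
       a.cnt * b.cnt AS joined_rows
FROM  (SELECT str1, str2, COUNT(*) AS cnt FROM #A GROUP BY str1, str2) a
INNER JOIN
      (SELECT str1, str2, COUNT(*) AS cnt FROM #B GROUP BY str1, str2) b
   ON a.str1 = b.str1
  AND a.str2 = b.str2;
```

Any key where both counts are greater than 1 will fan out in the join, and DISTINCT cannot collapse those rows because their Id values differ.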
I think it would help if you were to be a little more concrete in your example. What things are you actually querying? Surely you've simplified for us, but that has also removed all context for us to know what you're trying to accomplish in the real world. If you are trying to line up sports teams, we might give a different answer than if you are trying to line up invoice line items or tardy occurrences or who knows what!

With that data, and just that data, you can't get the result you want, unless you can provide some way for each of #A's ID values to map to each of #B's ID values.
So, if you really have just 2 records in each table, it would go something like this:
SELECT *
FROM #A a
JOIN #B b
ON a.str1 = b.str1 -- actually, if you join by IDs this isn't necessary
AND a.str2 = b.str2 -- nor is this
AND
(
( a.ID = 1 and b.ID = 7 )
OR ( a.ID = 2 and b.ID = 8 )
)
What you're getting is called a Cartesian product, where each record in #A is paired with each matching record in #B. Since there is more than one matching record in each table, you get every possible combination of matching records from A and B.
Since the only other fields you have to work with are the ID fields, you need to use those to combine exactly one A record with one B record.

Related

what to use instead of union to join same results based on two where clauses

I have two queries that work as expected for example
Query 1
select Name,ID,Product,Question
from table 1
where Id= 9 and ProductID=30628
table output
Name | ID | Product | Question
0659e103-b33d-4603 |12356|Apple | is it picked up?
0659e103-b33d-4603 |12456|Apple |Available in store?
0659e103-b33d-4603 |12458|Apple |confirm order?
query 2
select Name,ID,Product,Question
from table 1
where Id= 9 and TypeID=2
table output
Name | ID | Product | Question
0659e103-b33d-4603 |12347|Apple | Problem at store?
As you can see, in query 1 I use a ProductID and in query 2 I use a TypeID; these two values give me different outputs,
so I used a UNION to combine both as follows:
select Name,ID,Product,Question
from table 1
where Id= 9 and ProductID=30628
union
select Name,ID,Product,Question
from table 1
where Id= 9 and TypeID=2
which gives me the desired output:
Name | ID | Product | Question
0659e103-b33d-4603 |12356|Apple | is it picked up?
0659e103-b33d-4603 |12456|Apple |Available in store?
0659e103-b33d-4603 |12458|Apple |confirm order?
0659e103-b33d-4603 |12347|Apple | Problem at store?
Is there a better way to do this? My query will grow and I would not like to repeat the same thing over and over. Is there a better way to optimize the query?
Note: I cannot use ProductID and TypeID in the same condition because that does not give accurate results.

You could use OR since you are querying the same table.
SELECT Name
,ID
,Product
,Question
FROM TABLE1
WHERE (
Id = 9
AND ProductID = 30628
)
OR (
Id = 9
AND TypeID = 2
)
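Because both branches repeat Id = 9, the predicate can also be factored so the shared condition appears only once. A minimal sketch, assuming the table is called TABLE1 as in the query above. One difference to keep in mind: UNION also removes duplicate result rows, which a plain WHERE does not, so the two forms only return the same output when the selected columns are unique per row.

```sql
-- Shared condition once, alternatives grouped in parentheses
SELECT Name,
       ID,
       Product,
       Question
FROM TABLE1
WHERE Id = 9
  AND (ProductID = 30628 OR TypeID = 2);
```

If several ProductID or TypeID values are needed later, each side of the OR can be turned into an IN list instead of adding more branches.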
If you have a growing number of OR conditions you could use a temp table/variable and inner join to profit from a set based operation.
The inner join will only return matching rows.
CREATE TABLE #SomeTable(Id INT NOT NULL, ProductID INT NULL, TypeID INT NULL)
-- Insert all conditions you want to match.
INSERT INTO #SomeTable(Id, ProductID, TypeId)
VALUES (9, 30628, NULL)
, (9, NULL, 2)
SELECT Name
,ID
,Product
,Question
FROM TABLE1 x
INNER JOIN #SomeTable y ON
x.ID = y.ID -- Since ID is Not null in the temp table
AND (y.ProductID IS NULL OR y.ProductID = x.ProductID)
AND (y.TypeID IS NULL OR y.TypeID = x.TypeID)

You can use a CASE WHEN expression with a self join.
Something like this:
SELECT t1_2.Name,
       t1_2.ID,
       t1_2.Product,
       t1_2.Question,
       (CASE WHEN (t1.Id = 9 AND t1.ProductID = 30628) THEN t1.ID
             WHEN (t1.Id = 9 AND t1.TypeID = 2) THEN t1.ID
             ELSE NULL END) AS IDcalc
FROM table_1 t1 LEFT JOIN table_1 t1_2
  ON t1.ID = t1_2.ID
WHERE (CASE WHEN (t1.Id = 9 AND t1.ProductID = 30628) THEN t1.ID
            WHEN (t1.Id = 9 AND t1.TypeID = 2) THEN t1.ID
            ELSE NULL END) IS NOT NULL
You can use any table in the join.
In terms of query performance, OR is much better as long as you have only one table; if you have more tables, you should use a temp table or CASE WHEN in your query.

T-SQL grouping by either of two columns

I have a table where I hold chat history as follows:
Id From To Text Hour
=================================================
1 A B Msg_A_B1 00:01
2 A B Msg_A_B2 00:02
3 B A Msg_B_A1 00:03
4 A B Msg_A_B3 00:05
5 C A Msg_C_A1 00:11
6 A C Msg_A_C1 00:12
7 C A Msg_C_A2 00:14
8 D B Msg_D_B1 00:17
I want to create a chat header list from the data for a specific user. The rules are:
I want to get the start (first) "Hour" of each chat
and the last message of each chat for a specific user,
ordered by "Hour" ascending
For example if the user is "A" I want to get
Correspondant Text Hour
=======================================
B Msg_A_B3 00:01
C Msg_C_A2 00:11
Or for user "B" :
Correspondant Text Hour
=======================================
A Msg_A_B3 00:01
D Msg_D_B1 00:17
I can possibly do it by using Temporary tables, but I am seeking a simpler and faster solution.
This information might lead to the use of Stored Procedures, but a proper use of Views is also accepted.

What you are missing is a grouping column to mark a chat between A and B as "belonging together" without looking if A or B is the From or the To.
It is your table design, which makes things difficult. Below my suggestion I will add some hints how this might be done better:
Your mockup to simulate your issue:
DECLARE @mockupTable TABLE(Id INT,[From] VARCHAR(100),[To] VARCHAR(100),[Text] VARCHAR(100),[Hour] TIME(0))
INSERT INTO @mockupTable VALUES
 (1,'A','B','Msg_A_B1','00:01')
,(2,'A','B','Msg_A_B2','00:02')
,(3,'B','A','Msg_B_A1','00:03')
,(4,'A','B','Msg_A_B3','00:05')
,(5,'C','A','Msg_C_A1','00:11')
,(6,'A','C','Msg_A_C1','00:12')
,(7,'C','A','Msg_C_A2','00:14')
,(8,'D','B','Msg_D_B1','00:17');
--The query
WITH cte AS
(
    SELECT t.*
          ,CONCAT(CASE WHEN t.[From]>t.[To] THEN t.[To] ELSE t.[From] END,'-',CASE WHEN t.[From]>t.[To] THEN t.[From] ELSE t.[To] END) AS ChatID
    FROM @mockupTable t
)
,FindFirstAndLast AS
(
    SELECT cte1.ChatID
          ,(SELECT TOP 1 Id FROM cte cte2 WHERE cte2.ChatID=cte1.ChatID ORDER BY cte2.[Hour] ASC) AS FirstId
          ,(SELECT TOP 1 Id FROM cte cte2 WHERE cte2.ChatID=cte1.ChatID ORDER BY cte2.[Hour] DESC) AS LastId
    FROM cte cte1
    GROUP BY cte1.ChatID
)
SELECT fal.ChatID
      ,tFirst.[From] AS FirstFrom
      ,tFirst.[To] AS FirstTo
      ,tFirst.[Hour] AS FirstHour
      ,tLast.[From] AS LastFrom
      ,tLast.[To] AS LastTo
      ,tLast.[Text] AS LastText
FROM FindFirstAndLast fal
INNER JOIN @mockupTable tFirst ON fal.FirstId=tFirst.Id
INNER JOIN @mockupTable tLast ON fal.LastId=tLast.Id;
The idea in short:
The first CTE will create a ChatID by concatenating the From and the To in a sorted way. Doing so a message from A to B will get the same ChatID as a message from B to A.
The second CTE will use a correlated sub-query to find the first and the last message id, grouped for the previously computed ChatID.
The final SELECT will use these message ids to join the appropriate rows.
The result contains everything you need; it is up to you to put it into the format you need:
+--------+-----------+---------+-----------+----------+--------+----------+
| ChatID | FirstFrom | FirstTo | FirstHour | LastFrom | LastTo | LastText |
+--------+-----------+---------+-----------+----------+--------+----------+
| A-B | A | B | 00:01:00 | A | B | Msg_A_B3 |
+--------+-----------+---------+-----------+----------+--------+----------+
| A-C | C | A | 00:11:00 | C | A | Msg_C_A2 |
+--------+-----------+---------+-----------+----------+--------+----------+
| B-D | D | B | 00:17:00 | D | B | Msg_D_B1 |
+--------+-----------+---------+-----------+----------+--------+----------+
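To get from here to the header list asked for in the question, one possible sketch is to filter on a single user first and treat "the other side" of each message as the grouping column. The variable name @User, the table variable @chat and the CTE names mine and ranked are assumptions made for this illustration; the window functions used require SQL Server 2005 or later.

```sql
DECLARE @chat TABLE(Id INT,[From] VARCHAR(100),[To] VARCHAR(100),[Text] VARCHAR(100),[Hour] TIME(0));
INSERT INTO @chat VALUES
 (1,'A','B','Msg_A_B1','00:01')
,(2,'A','B','Msg_A_B2','00:02')
,(3,'B','A','Msg_B_A1','00:03')
,(4,'A','B','Msg_A_B3','00:05')
,(5,'C','A','Msg_C_A1','00:11')
,(6,'A','C','Msg_A_C1','00:12')
,(7,'C','A','Msg_C_A2','00:14')
,(8,'D','B','Msg_D_B1','00:17');

DECLARE @User VARCHAR(100) = 'A';   -- hypothetical parameter: whose header list to build

WITH mine AS
(
    -- Keep only messages the user takes part in; the other party is the correspondent
    SELECT CASE WHEN [From] = @User THEN [To] ELSE [From] END AS Correspondant
          ,Id, [Text], [Hour]
    FROM @chat
    WHERE @User IN ([From], [To])
)
,ranked AS
(
    SELECT Correspondant
          ,[Text]
          ,MIN([Hour]) OVER (PARTITION BY Correspondant) AS FirstHour   -- start of the chat
          ,ROW_NUMBER() OVER (PARTITION BY Correspondant
                              ORDER BY [Hour] DESC, Id DESC) AS rn     -- 1 = latest message
    FROM mine
)
SELECT Correspondant, [Text], FirstHour AS [Hour]
FROM ranked
WHERE rn = 1
ORDER BY FirstHour;
```

With the sample data and @User = 'A' this should list B with Msg_A_B3 starting at 00:01 and C with Msg_C_A2 starting at 00:11, which is the shape the question asks for. Like the ChatID approach, it treats all messages between two people as one chat.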
Some ideas about the design
I would use
one table Person for your chatting persons.
a second table Chat for a chat with a ChatID.
one m:n mapping table ChattingPerson with JoinTime, a ChatID and a PersonID, both as FKs. Here you can set timestamps like LastAction or mark the status (active, has left, ...)
one more table Message for the messages with time, text, and ChattingPersonID as FK.
Your advantages
The opener can explicitly invite more persons (or limit it to one for a person2person chat), or just wait for participants.
Starting a chat creates the row in the Chat table, the first row in the ChattingPerson table to mark the opener, and possibly a first message row.
Following messages add - if not existing yet - a row to ChattingPerson (for a new participant) and the message row.
The ID of the ChattingPerson table will give you the ChatID and the PersonID.
You can filter per chat and/or by person.
There can be separate chats between A and B over time.
You can control the type of chat with a PersonCount constraint.
You can enforce that a new ChattingPerson can only be added by the opener.
You can create certain chat types (like "person2person") with a template.
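As a rough illustration of the four tables described above, the schema could look something like the following. All column names, data types and constraint names beyond those mentioned in the text (JoinTime, LastAction, ChatID, PersonID) are assumptions chosen for this sketch, not part of the original suggestion.

```sql
CREATE TABLE Person (
    PersonID INT IDENTITY(1,1) PRIMARY KEY,
    Name     NVARCHAR(100) NOT NULL            -- assumed column
);

CREATE TABLE Chat (
    ChatID    INT IDENTITY(1,1) PRIMARY KEY,
    StartedAt DATETIME2(0) NOT NULL            -- assumed column
);

-- m:n mapping between Chat and Person
CREATE TABLE ChattingPerson (
    ChattingPersonID INT IDENTITY(1,1) PRIMARY KEY,
    ChatID     INT NOT NULL REFERENCES Chat(ChatID),
    PersonID   INT NOT NULL REFERENCES Person(PersonID),
    JoinTime   DATETIME2(0) NOT NULL,
    LastAction DATETIME2(0) NULL,
    Status     VARCHAR(20) NOT NULL DEFAULT 'active',   -- e.g. 'active', 'has left'
    CONSTRAINT UQ_ChattingPerson UNIQUE (ChatID, PersonID)
);

-- Each message points at the participant row, which yields both chat and sender
CREATE TABLE Message (
    MessageID        INT IDENTITY(1,1) PRIMARY KEY,
    ChattingPersonID INT NOT NULL REFERENCES ChattingPerson(ChattingPersonID),
    SentAt           DATETIME2(0) NOT NULL,
    [Text]           NVARCHAR(MAX) NOT NULL
);
```

With a layout like this, the first and last message of a chat can be found by grouping on ChatID directly, without having to derive a chat key from the From and To columns.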
Happy Coding :-)

Let's do it by creating a view:
First let's load the data in table t1:
create table t1 (Id int,[From] varchar(10),[To] varchar(10),Text varchar(100),Hour time(0))
insert into t1 values (1,'A','B','Msg_A_B1','00:01')
insert into t1 values (2,'A','B','Msg_A_B2','00:02')
insert into t1 values (3,'B','A','Msg_B_A1','00:03')
insert into t1 values (4,'A','B','Msg_A_B3','00:05')
insert into t1 values (5,'C','A','Msg_C_A1','00:11')
insert into t1 values (6,'A','C','Msg_A_C1','00:12')
insert into t1 values (7,'C','A','Msg_C_A2','00:14')
insert into t1 values (8,'D','B','Msg_D_B1','00:17')
Then let's create the view
create view vChats
as
with cte as (
select left([text],len([text])-1) as chat,
t.*
from t1 as t
),
cte2 as (
select chat,
min(hour) as minHour,
max(text) as maxText
from cte
group by chat
),
cte3 as (select distinct [From] as [User]
from t1
UNION
select distinct [To] as [User]
from t1
)
select c3.[User],
t.[To] as Correspondant,
c.maxText as [Text],
c.minHour as [Hour]
from cte2 as c
inner join t1 as t ON c.maxText = t.[Text]
inner join cte3 as c3 ON c3.[User] = t.[From]
UNION
select c3.[User],
t.[From] as Correspondant,
c.maxText as [Text],
c.minHour as [Hour]
from cte2 as c
inner join t1 as t ON c.maxText = t.[Text]
inner join cte3 as c3 ON c3.[User] = t.[To]
After that you can use it to get all the communications for each user like this:
select *
from vChats
where [User] = 'A'

How to select random rows whose column sum meets the condition in Sql server

Is it possible to select random rows from a table whose particular column total (sum) should be less than my condition value ?
My table structure is like -
id | question | answerInSec
1 | Quest1 | 15
2 | Quest2 | 20
3 | Quest3 | 10
4 | Quest4 | 15
5 | Quest5 | 10
6 | Quest6 | 15
7 | Quest7 | 20
I want to get those random questions whose total sum of 'answerInSec' column is less than (nearest total) or equal to 60.
So random combination can be [1,2,3,4] OR [2,3,5,7] OR [4,5,6,7] etc.
I tried as follows but no luck
select id,question,answerinsec
from (select Question.*, sum(answerinsec) over (order by id) as CumTicketCount
from Question
) t
where cumTicketCount <= 60
ORDER BY NEWID();

I hope this one helps:
DECLARE @MaxAnswerInSec INT = 60
DECLARE @SumAnswerInSec INT = 0
DECLARE @RandomQuestionTable TABLE(Id INT, Question NVARCHAR(100), AnswerInSec INT)
DECLARE @tempId INT,
        @tempQuestion NVARCHAR(100),
        @tempAnswerInSec INT
WHILE @SumAnswerInSec <= @MaxAnswerInSec
BEGIN
    -- Pick one random question that is not chosen yet and still fits into the remaining time
    SELECT TOP(1) @tempId = Id, @tempQuestion = Question, @tempAnswerInSec = AnswerInSec
    FROM Question
    WHERE Id NOT IN (SELECT Id FROM @RandomQuestionTable)
      AND AnswerInSec + @SumAnswerInSec <= @MaxAnswerInSec
    ORDER BY NEWID()
    IF @tempId IS NOT NULL
    BEGIN
        INSERT INTO @RandomQuestionTable VALUES(@tempId, @tempQuestion, @tempAnswerInSec)
    END
    ELSE
    BEGIN
        -- Nothing fits any more, so stop
        BREAK
    END
    SELECT @tempId = NULL
    SELECT @SumAnswerInSec = SUM(AnswerInSec) FROM @RandomQuestionTable
END
SELECT * FROM @RandomQuestionTable
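A set-based variation, closer to the running-total attempt in the question, is to order the running sum by a random key instead of by id. This is only a sketch under a few assumptions: it needs SQL Server 2012 or later for ORDER BY inside SUM() OVER, the temp table names #Question and #Shuffled are made up for the demo, and the random key is stored in a table first so that each row keeps one fixed value. Unlike the loop, it stops at the first question that would exceed the limit rather than skipping it and trying smaller ones, so the total may stay further below 60.

```sql
-- Sample data from the question
CREATE TABLE #Question (id INT, question NVARCHAR(100), answerInSec INT);
INSERT INTO #Question VALUES
 (1,N'Quest1',15),(2,N'Quest2',20),(3,N'Quest3',10),(4,N'Quest4',15)
,(5,N'Quest5',10),(6,N'Quest6',15),(7,N'Quest7',20);

-- Give every row a random sort key and persist it
CREATE TABLE #Shuffled (id INT, question NVARCHAR(100), answerInSec INT, rnd UNIQUEIDENTIFIER);
INSERT INTO #Shuffled (id, question, answerInSec, rnd)
SELECT id, question, answerInSec, NEWID()
FROM #Question;

-- Running total in random order; keep rows while the total stays within the limit
SELECT id, question, answerInSec
FROM (
    SELECT s.id, s.question, s.answerInSec,
           SUM(s.answerInSec) OVER (ORDER BY s.rnd ROWS UNBOUNDED PRECEDING) AS runningTotal
    FROM #Shuffled s
) t
WHERE runningTotal <= 60;

DROP TABLE #Shuffled;
DROP TABLE #Question;
```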

OK. Try this. This might not be the fastest, but is easier to understand and implement. Moreover this is a SQL-only solution:
SELECT t1.id, t2.id, t3.id, t4.id FROM Question t1 CROSS JOIN Question t2
CROSS JOIN Question t3 CROSS JOIN Question t4
WHERE t2.id > t1.id AND t3.id > t2.id AND t4.id > t3.id
AND t1.answerInSec + t2.answerInSec + t3.answerInSec + t4.answerInSec = 60
What this basically does is create a cross product of your Question table with itself and then repeat this process two more times, thus creating N ^ 4 rows, where N is the number of rows in your table. It then filters out duplicate combinations by selecting only those where t1.id < t2.id < t3.id < t4.id. Finally, it keeps only the rows where the sum of all answerInSec values is equal to your target value (60).
Note that this result set can become HUGE for even moderately sized tables. For example, a table with just 200 rows will generate a cross product of 200 ^ 4 = 1,600,000,000 rows (though a lot of them will be discarded by the WHERE clause). You should have your indexes in place if your table is large.
Also note that this query does not account for combinations where fewer than 4 rows add up to 60. You can modify it to do that by including a dummy row in your table (a row whose answerInSec is zero).

SELECT *
FROM question
WHERE answerInSec<50
ORDER BY CHECKSUM(NEWID())

SQL GROUP BY with columns which contain mirrored values

Sorry for the bad title. I couldn't think of a better way to describe my issue.
I have the following table:
Category | A | B
A | 1 | 2
A | 2 | 1
B | 3 | 4
B | 4 | 3
I would like to group the data by Category, return only 1 line per category, but provide both values of columns A and B.
So the result should look like this:
category | resultA | resultB
A | 1 | 2
B | 4 | 3
How can this be achieved?
I tried this statement:
SELECT category, a, b
FROM table
GROUP BY category
but obviously, I get the following errors:
Column 'a' is invalid in the select list because it is not contained
in either an aggregate function or the GROUP BY clause.
Column 'b' is invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause.
How can I achieve the desired result?

Try this:
SELECT category, MIN(a) AS resultA, MAX(a) AS resultB
FROM table
GROUP BY category
If the values are mirrored then you can get both values using MIN, MAX applied on a single column like a.

Seems you don't really want to aggregate per category, but rather remove duplicate rows from your result (or rather rows that you consider duplicates).
You consider a pair (x,y) equal to the pair (y,x). To find duplicates, you can put the lower value in the first place and the greater in the second and then apply DISTINCT on the rows:
select distinct
category,
case when a < b then a else b end as attr1,
case when a < b then b else a end as attr2
from mytable;

Considering you want a random record from duplicates for each category.
Here is one trick using table valued constructor and Row_Number window function
;with cte as
(
SELECT *,
(SELECT Min(min_val) FROM (VALUES (a),(b))tc(min_val)) min_val,
(SELECT Max(max_val) FROM (VALUES (a),(b))tc(max_val)) max_val
FROM (VALUES ('A',1,2),
('A',2,1),
('B',3,4),
('B',4,3)) tc(Category, A, B)
)
select Category,A,B from
(
Select Row_Number()Over(Partition by category,min_val,max_val order by (select NULL)) as Rn,*
From cte
) A
Where Rn = 1

select resultset of counts by array param in postgres

I've been searching for this and it seems like it should be something simple, but apparently not so much. I want to return a resultSet within PostgreSQL 9.4.x using an array parameter so:
| id | count |
--------------
| 1 | 22 |
--------------
| 2 | 14 |
--------------
| 14 | 3 |
where I'm submitting a parameter of {'1','2','14'}.
Using something (clearly not) like:
SELECT id, count(a.*)
FROM tablename a
WHERE a.id::int IN array('{1,2,14}'::int);
I want to test it first of course, and then write it as a storedProc (function) to make this simple.

Forget it, here is the answer:
SELECT a.id,
COUNT(a.id)
FROM tableName a
WHERE a.id IN
(SELECT b.id
FROM tableName b
WHERE b.id = ANY('{1,2,14}'::int[])
)
GROUP BY a.id;

You can simplify to:
SELECT id, count(*) AS ct
FROM tbl
WHERE id = ANY('{1,2,14}'::int[])
GROUP BY 1;
More:
Check if value exists in Postgres array
To include IDs from the input array that are not found I suggest unnest() followed by a LEFT JOIN:
SELECT id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) id
LEFT JOIN tbl t USING (id)
GROUP BY 1;
Related:
Preserve all elements of an array while (left) joining to a table
If there can be NULL values in the array parameter as well as in the id column (which would be an odd design), you'd need (slower!) NULL-safe comparison:
SELECT a.id, count(t.id) AS ct
FROM unnest('{1,2,14}'::int[]) AS a(id)
LEFT JOIN tbl t ON t.id IS NOT DISTINCT FROM a.id
GROUP BY 1;
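Since the question also mentions wrapping this in a function, a minimal sketch could look like the following. The function name count_by_ids and the parameter name _ids are made up for illustration, and it assumes tbl.id is an integer column; if it is stored as text, as the cast in the question suggests it might be, the comparison would need a cast.

```sql
CREATE OR REPLACE FUNCTION count_by_ids(_ids int[])
  RETURNS TABLE (id int, ct bigint)
  LANGUAGE sql STABLE AS
$func$
   -- Columns are table-qualified to avoid clashing with the output column names
   SELECT t.id, count(*)
   FROM   tbl t
   WHERE  t.id = ANY(_ids)
   GROUP  BY t.id;
$func$;

-- Usage: pass the array as a literal or as a bound parameter
SELECT * FROM count_by_ids('{1,2,14}');
```

This returns only IDs that actually occur in the table; the unnest() variant with LEFT JOIN could be used in the function body instead to also report zero counts.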
