Distinct count over two tables. SQL - sql-server

I'm very new to SQL, I apologize if something doesn't make sense!
I have two tables, each of which has a column 'client_nbr'. Some client_nbr values appear in both tables. I need to count the number of people with a certain value in column 'age' across both tables. For example, the results should look something like
age - 5 count - 3,000
And that will only count a client number once, even if it is in both tables.
When I do this for one table I run:
Select age, count(distinct(client_nbr))
From table1
Group by age
I tried to follow the example here: http://www.sqlservercurry.com/2011/07/sql-server-distinct-count-multiple.html?m=1
Using:
Select table1.age,table2.age,
Count(distinct(table1.client_nbr)) as total
From table1,table2
Where table1.client_nbr=table2.client_nbr
Group by table1.age,table2.age
It didn't work out though. The total count was less than when I run a distinct count on just table1.
Thank you in advance!

Try this instead:
SELECT age, COUNT(DISTINCT client_nbr) AS Total
FROM
(
SELECT age, client_nbr FROM table1
UNION ALL
SELECT age, client_nbr FROM table2
) AS t
GROUP BY age
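To see the query above in action, here is a minimal, self-contained sketch. The table variables @table1 and @table2 and their rows are made-up sample data standing in for the real tables; only the column names come from the question.

```sql
-- Hypothetical sample data: table variables stand in for table1 and table2.
DECLARE @table1 TABLE (client_nbr INT, age INT);
DECLARE @table2 TABLE (client_nbr INT, age INT);

INSERT INTO @table1 (client_nbr, age) VALUES (1, 5), (2, 5), (3, 6);
INSERT INTO @table2 (client_nbr, age) VALUES (2, 5), (4, 5), (3, 6);

-- Stack the rows of both tables, then count each client once per age.
SELECT age, COUNT(DISTINCT client_nbr) AS Total
FROM
(
    SELECT age, client_nbr FROM @table1
    UNION ALL
    SELECT age, client_nbr FROM @table2
) AS t
GROUP BY age
ORDER BY age;
-- With this sample data: age 5 -> 3 (clients 1, 2, 4), age 6 -> 1 (client 3)
```

Note that a client recorded with different ages in the two tables would be counted once under each age.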

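You are using an implicit inner join in your query, meaning only the clients present in both tables are returned. Use a full outer join on client_nbr to get the clients from both tables, and COALESCE to take the value from whichever side is not NULL (table1's age wins if the two tables disagree):
SELECT COALESCE(table1.age, table2.age) AS age,
COUNT(DISTINCT COALESCE(table1.client_nbr, table2.client_nbr)) AS total
FROM table1 FULL OUTER JOIN table2 ON table1.client_nbr = table2.client_nbr
GROUP BY COALESCE(table1.age, table2.age)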

Related

Alternative of UNION in sql server

I have 2 tables which contains 5 unique cities each. I want all 10 cities but i don't want to use UNION. Is there any alternative for UNION.
SELECT DISTINCT CITY FROM TABLE1
UNION
SELECT DISTINCT CITY FROM TABLE2
Here is an alternate way
SELECT DISTINCT CASE WHEN a.city IS NULL THEN b.city ELSE a.city END
FROM Table1 a FULL JOIN Table2 b ON 1 = 0
It offers no advantage over UNION, but you might be interested in seeing FULL JOIN, which has its similarities to UNION.
You can apply Full Outer join instead of Union
SELECT DISTINCT ISNULL(t.City,t1.City)
FROM dbo.TABLE1 t
FULL OUTER JOIN dbo.TABLE2 t1 ON t.City = t1.City;
This query provides you the same result as union
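A minimal sketch of the FULL OUTER JOIN approach, joining each table's City to the other table's City. The table variables and city names are invented for illustration.

```sql
-- Hypothetical sample data: one city (Chicago) appears in both tables.
DECLARE @Table1 TABLE (City VARCHAR(50));
DECLARE @Table2 TABLE (City VARCHAR(50));

INSERT INTO @Table1 (City) VALUES ('Austin'), ('Boston'), ('Chicago');
INSERT INTO @Table2 (City) VALUES ('Chicago'), ('Denver');

-- Unmatched rows have NULL on one side, so ISNULL picks the side that has a value.
SELECT DISTINCT ISNULL(t.City, t1.City) AS City
FROM @Table1 AS t
FULL OUTER JOIN @Table2 AS t1 ON t.City = t1.City;
-- Returns Austin, Boston, Chicago, Denver (row order is not guaranteed)
```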
You can insert the data that you want into a temporary table and retrieve it from there. That will avoid the need for a UNION.
SELECT DISTINCT CITY
INTO #City
FROM TABLE1
INSERT INTO #City
SELECT DISTINCT CITY
FROM TABLE2
SELECT DISTINCT City
FROM #City
If the first table is sure to contain all the records of the second table, then one can check whether the id can be found in a subquery combined with an OR clause.
I'm using an ORM framework which doesn't support the UNION operator (Apache OJB) and, with the above assumption, this strategy has proven to be faster than with the use of FULL OUTER JOIN.
For instance if the table STUDENT contains all the students of a province/state with a field for their current main school and another table, STUDENT_SECONDARY_SCHOOL, contains information for those students attending a second school part time, I can get the union of all students attending a particular school either full time or part time this way :
SELECT STD_ID FROM STUDENT
WHERE
STD_SCHOOL='the_school'
OR
STD_ID IN (SELECT STD_ID FROM STUDENT_SECONDARY_SCHOOL WHERE STD_SCHOOL='the_school')
Again, I want to emphasize that this is NOT the equivalent of a UNION but can be useful in some situations.

SQL query where person has as request for every product

I have a shortcut query where I just counted the total number of products and used count to display that the person has requested that many products. (Which is 8 products)
I want to know if there's an easier way where I wouldn't need to count the products myself and could have the query do it. Basically, replace the 8 with the total number of products in the database.
SELECT DISTINCT
Tb_Consumer.Name
FROM
Tb_Consumer, Tb_Product, Tb_Requests
WHERE
Tb_Consumer.Con_ID = Tb_Requests.Con_ID
AND Tb_Requests.Prod_ID = Tb_Product.Prod_ID
GROUP BY
Tb_Consumer.Name
HAVING
COUNT(Tb_Product.Name) = 8
Use a subquery to find the number of products in the Tb_Product table:
SELECT
tbc.Name
FROM Tb_Consumer tbc
INNER JOIN Tb_Requests tbr
ON tbc.Con_ID = tbr.Con_ID
INNER JOIN Tb_Product tbp
ON tbr.Prod_ID = tbp.Prod_ID
GROUP BY
tbc.Name
HAVING
COUNT(tbp.Name) = (SELECT COUNT(*) FROM Tb_Product); -- count products here
This assumes that every record in Tb_Product corresponds to a single unique product. If there could be duplication for some reason, then you can count distinct products, e.g.
(SELECT COUNT(DISTINCT Name) FROM Tb_Product)
Other changes I made include removing DISTINCT from the select clause, since the GROUP BY should already make each name distinct. I also refactored your query to remove the commas in the FROM clause. Instead, I use explicit joins between the three tables.
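One thing to watch: if a consumer can request the same product more than once, counting joined rows can match the product total by accident. A hedged sketch that counts distinct requested products instead; the table variables and rows are invented, and it assumes every Prod_ID in Tb_Requests exists in Tb_Product, so the join to Tb_Product is dropped.

```sql
-- Hypothetical sample data standing in for the three tables.
DECLARE @Tb_Consumer TABLE (Con_ID INT, Name VARCHAR(50));
DECLARE @Tb_Product  TABLE (Prod_ID INT, Name VARCHAR(50));
DECLARE @Tb_Requests TABLE (Con_ID INT, Prod_ID INT);

INSERT INTO @Tb_Consumer (Con_ID, Name) VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO @Tb_Product (Prod_ID, Name) VALUES (10, 'Apples'), (20, 'Pears');
-- Ann requested both products (apples twice); Bob requested apples twice.
INSERT INTO @Tb_Requests (Con_ID, Prod_ID)
VALUES (1, 10), (1, 20), (1, 10), (2, 10), (2, 10);

-- Compare the number of distinct products requested with the product total.
SELECT tbc.Name
FROM @Tb_Consumer AS tbc
INNER JOIN @Tb_Requests AS tbr
    ON tbc.Con_ID = tbr.Con_ID
GROUP BY tbc.Con_ID, tbc.Name
HAVING COUNT(DISTINCT tbr.Prod_ID) = (SELECT COUNT(*) FROM @Tb_Product);
-- Returns Ann only; a plain row count would also have returned Bob (2 rows = 2 products)
```

Grouping by Con_ID as well as Name keeps two consumers who share a name from being merged.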

Join columns to rows

Suppose you have a table Table1 with columns
UserId, Item1, Item2, Item3, Item4, Item5, Item6, Item7, Item8, Item9, Item10
and you have another table Table2 with
UserId, ItemId, Name
The values in Table1 are ItemId values from Table2. I need to display
UserId, ItemId, Name
where Item1 is 1st and Item10 is last and you have 10 rows. In other words, Item1 is 1st row and Item10 is last row. If there's any way to avoid CASE WHEN that would be great. I may have more columns in the future and would hate to hardcode the 10 columns.
I think you want a reverse pivot in this case. You don't use CASE, like you would in a normal pivot, but instead UNION ALL, like this:
select Table1.UserId, Table2.ItemId, Table2.Name
from Table1 inner join Table2 on Table1.Item1 = Table2.ItemId
UNION ALL
select Table1.UserId, Table2.ItemId, Table2.Name
from Table1 inner join Table2 on Table1.Item2 = Table2.ItemId
UNION ALL
...
select Table1.UserId, Table2.ItemId, Table2.Name
from Table1 inner join Table2 on Table1.Item10 = Table2.ItemId
If you have more items, you should also be able to write a snippet that generates the repeating UNION ALL syntax so you don't have to type it all by hand.
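As an alternative to repeating UNION ALL, the same unpivot can be sketched with CROSS APPLY over a VALUES list. This is a different technique from the one above, shown with only three item columns, a simplified Table2, and invented sample rows; the Position column is a hypothetical helper used only for ordering.

```sql
-- Hypothetical sample data: three item columns instead of ten.
DECLARE @Table1 TABLE (UserId INT, Item1 INT, Item2 INT, Item3 INT);
DECLARE @Table2 TABLE (ItemId INT, Name VARCHAR(50));

INSERT INTO @Table1 (UserId, Item1, Item2, Item3) VALUES (1, 30, 10, 20);
INSERT INTO @Table2 (ItemId, Name) VALUES (10, 'Ten'), (20, 'Twenty'), (30, 'Thirty');

-- Turn each ItemN column into its own row, remembering its position.
SELECT t1.UserId, t2.ItemId, t2.Name
FROM @Table1 AS t1
CROSS APPLY (VALUES (1, t1.Item1), (2, t1.Item2), (3, t1.Item3)) AS v (Position, ItemId)
INNER JOIN @Table2 AS t2
    ON t2.ItemId = v.ItemId
ORDER BY t1.UserId, v.Position;
-- Rows in order: (1, 30, Thirty), (1, 10, Ten), (1, 20, Twenty)
```

The column list still has to be written out once, but each new column only adds one entry to the VALUES list rather than a whole SELECT.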
If you are not required to do this entirely in SQL, I would highly recommend using, e.g., R or Python to reshape the transactions into an ML-usable form. The gather function in the tidyr package does exactly what you want.
Another way is to cross-tabulate. Deriving a solution in standard SQL is absolutely fine, but a lot of problems are much easier to solve in R or Python.
A table1 with just 3 columns
userid, itemid, sequence
would be more conducive for your purposes. You would be required to convert your AzureML output from the single line
Uid1, itm1,itm2,itm3,...,itm10
into 10 lines like
Uid1, itm1, 1
Uid1, itm2, 2
Uid1, itm3, 3
...
Uid1, itm10,10
Assuming you get the above output line as a (temporary) table from AzureML named tbla, you could use the following UNION ALL construct (as suggested by Spencer Simpson):
INSERT INTO table1 (userid, itemid, sequence)
SELECT uid, itm1, 1 FROM tbla UNION ALL
SELECT uid, itm2, 2 FROM tbla UNION ALL
SELECT uid, itm3, 3 FROM tbla UNION ALL
SELECT uid, itm4, 4 FROM tbla UNION ALL
...
SELECT uid, itm10, 10 FROM tbla
This stores the information in table1, which will then be the only table you have to deal with. No JOINs will be required anymore.
Note: I am not quite sure what your column name relates to. Is it the name of an item or the name of a user?
In both cases there should be a second table table2 that takes care of the correspondence between name and userid/itemid like
itm/usr name
This table will then be joined into any query that needs to display the name column too.
What I did to work around this was to use Python (or R) and use the melt function.
There is also a pivot_table function in the dataframe.
So, you can have your columns be converted to rows. Then join those rows on the other table.
See "Reshaping and Pivot Tables" in the pandas documentation.

SQL queries combined into one row

I'm having some difficulty combining the following queries, so that the results display in one row rather than in multiple rows:
SELECT value FROM dbo.parameter WHERE name='xxxxx.name'
SELECT dbo.contest.name AS Event_Name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name
SELECT COUNT(*) FROM open_option AS total_people
SELECT SUM(scanned) AS TotalScanned,SUM(number) AS Totalnumber
FROM dbo.open_box
GROUP BY contest_id
SELECT COUNT(*) FROM open AS reff
WHERE refer = 'True'
I would like to display data from the fields in each column similar to what is shown in the image below. Any help is appreciated!
Tab's solution is fine, I just wanted to show an alternative way of doing this. The following statement uses subqueries to get the information in one row:
SELECT
[xxxx.name]=(SELECT value FROM dbo.parameter WHERE name='xxxxx.name'),
[Event Name]=(SELECT dbo.contest.name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name),
[Total People]=(SELECT COUNT(*) FROM open_option),
[Total Scanned]=(SELECT SUM(scanned)
FROM dbo.open_box
GROUP BY contest_id),
[Total Number]=(SELECT SUM(number)
FROM dbo.open_box
GROUP BY contest_id),
Ref=(SELECT COUNT(*) FROM open WHERE refer = 'True');
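This requires Total Scanned and Total Number to be queried separately. Each subquery must also return exactly one value, so the subqueries with GROUP BY only work when they produce a single group.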
Update: if you then want to INSERT that into another table there are essentially two ways to do that.
Create the table directly from the SELECT statement:
SELECT
-- the fields from the first query
INTO
[database_name].[schema_name].[new_table_name]; -- creates table new_table_name
Insert into a table that already exists with INSERT ... SELECT:
INSERT INTO [database_name].[schema_name].[existing_table_name](
-- the fields in the existing_table_name
)
SELECT
-- the fields from the first query
Just CROSS JOIN the five queries as derived tables:
SELECT * FROM (
Query1
) AS q1
CROSS JOIN (
Query2
) AS q2
CROSS JOIN (...
Assuming that each of your individual queries only returns one row, then this CROSS JOIN should result in only one row.
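A minimal sketch of the CROSS JOIN idea with two of the queries. The table variables and their rows are invented stand-ins for open_option and open_box; the column aliases follow the question.

```sql
-- Hypothetical sample data for two of the source tables.
DECLARE @open_option TABLE (id INT);
DECLARE @open_box TABLE (contest_id INT, scanned INT, [number] INT);

INSERT INTO @open_option (id) VALUES (1), (2), (3);
INSERT INTO @open_box (contest_id, scanned, [number]) VALUES (7, 4, 10), (7, 6, 15);

-- Each derived table returns exactly one row, so the CROSS JOIN returns one row.
SELECT q1.total_people, q2.TotalScanned, q2.Totalnumber
FROM (
    SELECT COUNT(*) AS total_people FROM @open_option
) AS q1
CROSS JOIN (
    SELECT SUM(scanned) AS TotalScanned, SUM([number]) AS Totalnumber
    FROM @open_box
) AS q2;
-- One row: total_people = 3, TotalScanned = 10, Totalnumber = 25
```

If any derived table returned several rows (for example a GROUP BY over more than one contest), the result would multiply out to several rows as well.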

Multiple Select against one CTE

I have a CTE query filtering a table Student
Student
(
StudentId PK,
FirstName ,
LastName,
GenderId,
ExperienceId,
NationalityId,
CityId
)
Based on a lot of filters (multiple cities, gender, multiple experiences (1, 2, 3), multiple nationalities), I create a CTE by using dynamic SQL and joining the Student table with user-defined tables (CityTable, NationalityTable, ...).
After that I have to retrieve the count of student by each filter like
CityId City Count
NationalityId Nationality Count
Same thing for the other filters.
Can I do something like
;WITH CTE AS (
Select
FROM Student
Inner JOIN ...
INNER JOIN ....)
SELECT CityId,City,Count(studentId)
FROM CTE
GROUP BY CityId,City
SELECT GenderId,Gender,Count(studentId)
FROM CTE
GROUP BY GenderId,Gender
I want to do something like what LinkedIn is doing with search (people search, job search)
http://www.linkedin.com/search/fpsearch?type=people&keywords=sales+manager&pplSearchOrigin=GLHD&pageKey=member-home
It's so fast and does the same thing.
You cannot run multiple SELECT statements against one CTE, but you can define more than one CTE, like this:
WITH CTEA
AS
(
SELECT 'Column1' A,'Column2' B
),
CTEB
AS
(
SELECT 'ColumnX' X,'ColumnY' Y
)
SELECT * FROM CTEA, CTEB
To get the count, use ROW_NUMBER and a windowed COUNT in the CTE, something like this:
ROW_NUMBER() OVER ( ORDER BY COLUMN NAME ) AS RowNumber,
Count(1) OVER() AS TotalRecordsFound
Please let me know if you need more information on this.
Sample for your reference.
With CTE AS (
Select StudentId, S.CityId, S.GenderId
FROM Student S
Inner JOIN CITY C
ON S.CityId = C.CityId
INNER JOIN GENDER G
ON S.GenderId = G.GenderId)
,
GENDER
AS
(
SELECT GenderId
FROM CTE
GROUP BY GenderId
)
SELECT * FROM GENDER, CTE
It is not possible to get multiple result sets from a single CTE.
You can however use a table variable to cache some of the information and use it later instead of issuing the same complex query multiple times:
declare @relevantStudent table (StudentID int);
insert into @relevantStudent
select s.StudentID from Student s
join ...
where ...
-- now issue the multiple queries
select s.GenderID, count(*)
from Student s
join @relevantStudent r on r.StudentID = s.StudentID
group by s.GenderID
select s.CityID, count(*)
from Student s
join @relevantStudent r on r.StudentID = s.StudentID
group by s.CityID
The trick is to store only the minimum required information in the table variable.
As with any query whether this will actually improve performance vs. issuing the queries independently depends on many things (how big the table variable data set is, how complex is the query used to populate it and how complex are the subsequent joins/subselects against the table variable, etc.).
Do a UNION ALL to do multiple SELECT and concatenate the results together into one table.
;WITH CTE AS(
SELECT
FROM Student
INNER JOIN ...
INNER JOIN ....)
SELECT CityId,City,Count(studentId),NULL,NULL,NULL
FROM CTE
GROUP BY CityId,City
UNION ALL
SELECT NULL,NULL,NULL,GenderId,Gender,Count(studentId)
FROM CTE
GROUP BY GenderId,Gender
Note: The NULL values above just allow the two results to have matching columns, so the results can be concatenated.
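A related option is GROUP BY GROUPING SETS, which computes several groupings in one pass over the CTE and fills the columns that do not apply with NULL. This is a different technique from the UNION ALL above, sketched here with an invented @Student table variable and only the id columns.

```sql
-- Hypothetical sample data standing in for the filtered Student rows.
DECLARE @Student TABLE (StudentId INT, CityId INT, GenderId INT);

INSERT INTO @Student (StudentId, CityId, GenderId)
VALUES (1, 100, 1), (2, 100, 2), (3, 200, 2);

-- One statement against the CTE, one grouping per filter column.
WITH CTE AS (
    SELECT StudentId, CityId, GenderId
    FROM @Student
)
SELECT CityId, GenderId, COUNT(StudentId) AS StudentCount
FROM CTE
GROUP BY GROUPING SETS ((CityId), (GenderId));
-- City rows:   (100, NULL, 2), (200, NULL, 1)
-- Gender rows: (NULL, 1, 1), (NULL, 2, 2)
```

If the grouped columns can themselves contain NULL, the GROUPING() function can be used to tell a "not applicable" NULL from a real one.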
I know this is a very old question, but here's a solution I just used. I have a stored procedure that returns a PAGE of search results, and I also need it to return the total count matching the query parameters.
WITH results AS (...complicated foo here...)
SELECT results.*,
CASE
WHEN @page=0 THEN (SELECT COUNT(*) FROM results)
ELSE -1
END AS totalCount
FROM results
ORDER BY bar
OFFSET @page * @pageSize ROWS FETCH NEXT @pageSize ROWS ONLY;
With this approach, there's a small "hit" on the first results page to get the count, and for the remaining pages, I pass back "-1" to avoid the hit (I assume the number of results won't change during the user session). Even though totalCount is returned for every row of the first page of results, it's only computed once.
My CTE is doing a bunch of filtering based on stored procedure arguments, so I couldn't just move it to a view and query it twice. This approach avoids having to duplicate the CTE's logic just to get a count.
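The windowed count mentioned earlier in this thread (COUNT(...) OVER()) can be combined with the same paging clause. A minimal sketch, assuming a made-up @results table variable in place of the complicated CTE; unlike the approach above, it computes the total on every page.

```sql
-- Hypothetical paging parameters and sample rows.
DECLARE @page INT = 0, @pageSize INT = 2;
DECLARE @results TABLE (bar INT);

INSERT INTO @results (bar) VALUES (1), (2), (3), (4), (5);

-- The window function is evaluated before OFFSET/FETCH trims the page,
-- so totalCount reflects all matching rows, not just the returned page.
SELECT r.bar, COUNT(*) OVER () AS totalCount
FROM @results AS r
ORDER BY r.bar
OFFSET @page * @pageSize ROWS FETCH NEXT @pageSize ROWS ONLY;
-- Page 0 returns bar = 1 and bar = 2, each with totalCount = 5
```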
