Right way to use distinct in SQL Server - sql-server

I am trying to retrieve some records based on the query
Select distinct
tblAssessmentEcosystemCredit.AssessmentEcosystemCreditID,
tblSpecies.CommonName
from
tblAssessmentEcosystemCredit
left join
tblSpeciesVegTypeLink on tblAssessmentEcosystemCredit.VegTypeID = tblSpeciesVegTypeLink.VegTypeID
left join
tblSpecies on tblSpecies.SpeciesID = tblSpeciesVegTypeLink.SpeciesID
where
tblAssessmentEcosystemCredit.SpeciesTGValue < 1
The above query returns 17,000 records but when I remove tblSpecies.CommonName, it retrieves only 4200 (that's actually correct).
I have no idea how to apply DISTINCT to only the tblAssessmentEcosystemCredit.AssessmentEcosystemCreditID column and still retrieve the other columns in the query.

This query selects the distinct COMBINATIONS of AssessmentEcosystemCreditID and CommonName. If you want only one row per AssessmentEcosystemCreditID, you need a GROUP BY, as suggested by @JonasB. In that case, however, there can be several values of CommonName for one AssessmentEcosystemCreditID, so SQL requires you to specify WHICH one you want:
Select tblAssessmentEcosystemCredit.AssessmentEcosystemCreditID ,
max(tblSpecies.CommonName) as CommonName,
min(tblSpecies.CommonName) as CommonName2 -- if this equals CommonName, the ID has only one value
from tblAssessmentEcosystemCredit
left join tblSpeciesVegTypeLink
on tblAssessmentEcosystemCredit.VegTypeID = tblSpeciesVegTypeLink.VegTypeID
left join tblSpecies on tblSpecies.SpeciesID= tblSpeciesVegTypeLink.SpeciesID
where tblAssessmentEcosystemCredit.SpeciesTGValue <1
GROUP BY tblAssessmentEcosystemCredit.AssessmentEcosystemCreditID
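A different technique, not part of the answer above, is to number the joined rows with ROW_NUMBER() and keep the first row per AssessmentEcosystemCreditID. Unlike MIN()/MAX(), which pick each column independently, this returns all columns from one actual row. The sketch assumes SQL Server 2008 or later and uses hypothetical, cut-down temp tables (#Credit, #Link, #Species) in place of the real ones; ordering by CommonName is an arbitrary choice of which species to keep.

```sql
-- Hypothetical, cut-down stand-ins for the three real tables
IF OBJECT_ID('tempdb..#Credit') IS NOT NULL DROP TABLE #Credit;
IF OBJECT_ID('tempdb..#Link') IS NOT NULL DROP TABLE #Link;
IF OBJECT_ID('tempdb..#Species') IS NOT NULL DROP TABLE #Species;

CREATE TABLE #Credit (AssessmentEcosystemCreditID int, VegTypeID int, SpeciesTGValue decimal(5, 2));
CREATE TABLE #Link (VegTypeID int, SpeciesID int);
CREATE TABLE #Species (SpeciesID int, CommonName varchar(50));

INSERT INTO #Credit VALUES (1, 10, 0.5), (2, 20, 0.8);
INSERT INTO #Link VALUES (10, 100), (10, 101), (20, 102);
INSERT INTO #Species VALUES (100, 'Koala'), (101, 'Quoll'), (102, 'Glider');

-- Number the joined rows within each credit ID, then keep only row 1
WITH Ranked AS (
    SELECT c.AssessmentEcosystemCreditID,
           s.CommonName,
           ROW_NUMBER() OVER (PARTITION BY c.AssessmentEcosystemCreditID
                              ORDER BY s.CommonName) AS rn
    FROM #Credit AS c
    LEFT JOIN #Link AS l ON c.VegTypeID = l.VegTypeID
    LEFT JOIN #Species AS s ON s.SpeciesID = l.SpeciesID
    WHERE c.SpeciesTGValue < 1
)
SELECT AssessmentEcosystemCreditID, CommonName
FROM Ranked
WHERE rn = 1;
```

With this data, SELECT DISTINCT over both columns returns three rows (1/Koala, 1/Quoll, 2/Glider), while the ROW_NUMBER() query returns two (1/Koala, 2/Glider).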

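See this topic: MySQL select one column DISTINCT, with corresponding other columns
If you are on MySQL, you probably have to deactivate ONLY_FULL_GROUP_BY, see http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by (SQL Server has no such setting: every non-aggregated column in the SELECT list must appear in the GROUP BY).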

Related

SQL Server - select a commonly named column over multiple joined tables where the last edit date is most recent

Our County recently purchased a new real-estate system. I have created a view that joins over a dozen tables. (assessment, taxpayer, collections, etc...)
Every table has a last change date and a last change user field.
We would like to know when a parcel number was last changed in the system and which login made those changes.
I was able to pull the most recent change date by using the following nested query...
SELECT
... ,
(SELECT MAX(v)
FROM (VALUES (table1.last_chg_datetime), (table2.last_chg_datetime),
(table3.last_chg_datetime), (table4.last_chg_datetime), ... ,
(tableN.last_chg_datetime)) AS value(v)) AS last_chg_datetime
FROM table1
LEFT OUTER JOIN table2 ...
What is the best way to also pull the last changed user from the table that corresponds with the last changed date?
I am sure this could be done with a stored procedure. I was just wondering if this could also be extracted by altering the nested query?
We are running SQL Server 2016.
The query could be rewritten as:
SELECT ... ,sub.*
FROM table1
LEFT OUTER JOIN table2 ...
OUTER APPLY (SELECT TOP 1 *
FROM (VALUES (table1.last_chg_datetime, table1.[user]), (table2.last_chg_datetime, table2.[user]),
(table3.last_chg_datetime, table3.[user]), (table4.last_chg_datetime, table4.[user]), ... ,
(tableN.last_chg_datetime, tableN.[user])) AS value(v, user_name)
ORDER BY v DESC
) sub;
It could also be extended with the table name:
(VALUES ('table1', table1.last_chg_datetime, table1.[user]),
('table2', table2.last_chg_datetime, table2.[user]), ...
) AS value(table_name, v, user_name)
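A self-contained sketch of the OUTER APPLY approach, including the table-name column, is below. The temp tables #Assessment and #Taxpayer and the columns parcel_no and last_chg_user are hypothetical placeholders for the real county tables, and only two source tables are shown.

```sql
-- Hypothetical stand-ins for two of the joined tables
IF OBJECT_ID('tempdb..#Assessment') IS NOT NULL DROP TABLE #Assessment;
IF OBJECT_ID('tempdb..#Taxpayer') IS NOT NULL DROP TABLE #Taxpayer;

CREATE TABLE #Assessment (parcel_no varchar(10), last_chg_datetime datetime2, last_chg_user nvarchar(128));
CREATE TABLE #Taxpayer (parcel_no varchar(10), last_chg_datetime datetime2, last_chg_user nvarchar(128));

INSERT INTO #Assessment VALUES ('P1', '2023-01-05', 'alice'), ('P2', '2023-03-01', 'bob');
INSERT INTO #Taxpayer VALUES ('P1', '2023-02-10', 'carol');

SELECT a.parcel_no,
       latest.table_name,
       latest.v AS last_chg_datetime,
       latest.user_name
FROM #Assessment AS a
LEFT OUTER JOIN #Taxpayer AS t ON t.parcel_no = a.parcel_no
OUTER APPLY (
    -- One row per source table; TOP 1 with ORDER BY v DESC keeps the most recent change
    SELECT TOP 1 table_name, v, user_name
    FROM (VALUES ('assessment', a.last_chg_datetime, a.last_chg_user),
                 ('taxpayer',   t.last_chg_datetime, t.last_chg_user)
         ) AS changes(table_name, v, user_name)
    ORDER BY v DESC
) AS latest;
```

Because SQL Server sorts NULLs last in a descending ORDER BY, a table with no matching row (here #Taxpayer for P2) never wins. If two tables share the same timestamp, TOP 1 picks one of them arbitrarily, so add a tiebreaker to the ORDER BY if that matters.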

Using SQL to combine detailed and aggregated results

I am developing a report against a SQL Server database. Using the query presented here...
SELECT
f.FacilityID as 'FID',
COUNT (DISTINCT f.PhoneTypeID) as 'Ptypes',
COUNT (DISTINCT f.PhoneID) as 'Pnumbers'
from dbo.FacilityPhones as f
inner join
dbo.Phones as ph
on f.PhoneID = ph.PhoneID
group by f.FacilityID
having COUNT(DISTINCT f.PhoneTypeID)<>COUNT(DISTINCT f.PhoneId);
...I have identified 107 records where the number of phone numbers present for a Facility differs from the number of phone number types (e.g., there are two distinct phone numbers, both listed as primary).
I would like to be able to produce a detailed report that would list phone numbers and phone types for each facility, but ONLY when the distinct counts differ.
Is there a way to do this with a single query? Or would I need to save the summaries to a temp table, then join back to that temp table to get the details?
Not sure what fields exist in dbo.Phones, but I assume the number comes from there... You will likely need to join to the type table to get its description as well...
This uses a common table expression (CTE) to get your base list of facilities and then a correlated subquery to ensure only the facilities in your CTE are displayed.
WITH CTE AS (
SELECT f.FacilityID as 'FID'
, COUNT (DISTINCT f.PhoneTypeID) as 'Ptypes'
, COUNT (DISTINCT f.PhoneID) as 'Pnumbers'
FROM dbo.FacilityPhones as f
GROUP BY f.FacilityID
HAVING COUNT(DISTINCT f.PhoneTypeID)<>COUNT(DISTINCT f.PhoneId))
SELECT *
FROM dbo.FacilityPhones FP
INNER JOIN dbo.Phones as ph
ON FP.PhoneID = ph.PhoneID
WHERE EXISTS (SELECT 1
FROM CTE
WHERE FID = FP.FacilityID)
The WHERE clause here only shows a FacilityID and its associated records if that FacilityID exists in your original query (the CTE, 107 facilities). If we needed data from the CTE we'd join to it; but since it only restricts the rows, placing it in the WHERE clause with EXISTS will likely be more efficient.
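A runnable sketch of the CTE plus EXISTS pattern with made-up data follows. The temp tables and the PhoneNumber column are assumptions (the answer itself notes it doesn't know which fields dbo.Phones has), and the CTE keeps only the FacilityID because the outer query never reads the counts.

```sql
-- Hypothetical stand-ins for dbo.FacilityPhones and dbo.Phones
IF OBJECT_ID('tempdb..#FacilityPhones') IS NOT NULL DROP TABLE #FacilityPhones;
IF OBJECT_ID('tempdb..#Phones') IS NOT NULL DROP TABLE #Phones;

CREATE TABLE #FacilityPhones (FacilityID int, PhoneID int, PhoneTypeID int);
CREATE TABLE #Phones (PhoneID int, PhoneNumber varchar(20));

-- Facility 1: two numbers, both type 1 (counts differ); facility 2: one number per type
INSERT INTO #FacilityPhones VALUES (1, 1, 1), (1, 2, 1), (2, 3, 1), (2, 4, 2);
INSERT INTO #Phones VALUES (1, '555-0101'), (2, '555-0102'), (3, '555-0103'), (4, '555-0104');

WITH CTE AS (
    -- Facilities whose distinct phone count differs from their distinct type count
    SELECT f.FacilityID AS FID
    FROM #FacilityPhones AS f
    GROUP BY f.FacilityID
    HAVING COUNT(DISTINCT f.PhoneTypeID) <> COUNT(DISTINCT f.PhoneID)
)
SELECT fp.FacilityID, fp.PhoneTypeID, ph.PhoneNumber
FROM #FacilityPhones AS fp
INNER JOIN #Phones AS ph ON fp.PhoneID = ph.PhoneID
WHERE EXISTS (SELECT 1 FROM CTE WHERE CTE.FID = fp.FacilityID)
ORDER BY fp.FacilityID, ph.PhoneNumber;
```

Only facility 1's two rows (555-0101 and 555-0102) come back; facility 2 is filtered out because its counts match.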

SQL Left Outer Join?

I have a table that should join to another table based on a unique id. If I do a LEFT OUTER JOIN ... ON, I get duplicates. If I put DISTINCT in my SELECT, I get the correct number of records, but as soon as I include any field from the table I LEFT OUTER JOINed, I get duplicates again. Here is my query:
SELECT DISTINCT
Table1.fname,
Table1.lname,
Table2.address
FROM Table1
LEFT OUTER JOIN Table2
ON Table2.user_id = Table1.userid
In the example above I'm getting duplicates. I have also tried:
LEFT OUTER JOIN (
SELECT user_id
FROM Table2
GROUP BY user_id
) AS t2 ON Table1.user_id = t2.user_id
This gave me the correct number of records, but I need some additional columns from the second table, and once I include them I get duplicates again, for example:
LEFT OUTER JOIN (
SELECT user_id, address
FROM Table2
GROUP BY user_id, address
) AS t2 ON Table1.user_id = t2.user_id
I'm wondering if I missed something or if there is a better way to handle this type of problem. If anyone sees something or knows a better solution, please let me know.
It is impossible for you to pick the correct answer here without understanding your data.
It seems that Table2 supports multiple addresses per user_id. This is a common design. If you want to return only one address per user_id you have several options:
Fix the data - Remove the duplicate addresses from table 2 and add a constraint that prevents this situation again. You will need to determine which addresses are incorrect.
Reduce the left join to only include one address per user - how you do this depends on your other data. You could use min() or max() with a GROUP BY if you don't care which one is returned when there are multiples; or you may need to order by an effective date and take the latest one; or maybe there are billing and shipping addresses and you should pick the correct one.
Accept that there are multiple addresses per user - this may be correct - and adjust the rest of your code.
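For the second option, one sketch (assuming SQL Server 2008 or later for the date type) ranks each user's addresses with ROW_NUMBER() and joins only the top-ranked one, so any number of Table2 columns can be returned without duplicates. The temp tables and the effective_date column are hypothetical; substitute whatever ordering rule fits your data.

```sql
-- Hypothetical stand-ins for Table1 and Table2
IF OBJECT_ID('tempdb..#Table1') IS NOT NULL DROP TABLE #Table1;
IF OBJECT_ID('tempdb..#Table2') IS NOT NULL DROP TABLE #Table2;

CREATE TABLE #Table1 (user_id int, fname varchar(20), lname varchar(20));
CREATE TABLE #Table2 (user_id int, address varchar(50), effective_date date);

INSERT INTO #Table1 VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Fox');
INSERT INTO #Table2 VALUES (1, '1 Old St', '2019-01-01'),
                           (1, '2 New St', '2022-06-01'),
                           (2, '9 Main St', '2020-03-15');

SELECT t1.fname, t1.lname, a.address, a.effective_date
FROM #Table1 AS t1
LEFT OUTER JOIN (
    -- Rank each user's addresses, newest first
    SELECT user_id, address, effective_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY effective_date DESC) AS rn
    FROM #Table2
) AS a
    ON a.user_id = t1.user_id
   AND a.rn = 1;  -- filtering in ON (not WHERE) keeps users with no address
```

This returns one row per user: Ann with her newer address, Bo with his only address, and Cy with NULLs. Putting `a.rn = 1` in the WHERE clause instead would silently drop Cy.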

TSQL query to merge data from multiple tables that may or may not have matching rows?

For example, suppose we're conducting research where students can take up to 10 different tests, and each table in the database stores all the students' responses for one test. The tables are named after each test as: T1, T2, ... , T10. Suppose each table has a primary key column 'Username' that identifies each student. Students may or may not have completed each test, so there may or may not be a record in each table for each student.
What is the correct SQL Query to return all the test data from all tables, with one row per student (one row per username)? I want the simplest query possible that returns the correct results. I would also like to coalesce the Username fields into a single Username field in the final query.
To clarify, I understand that SQL has a major limitation in that it does not support a syntax to select all columns except one or more fields like "select *[^ExcludeColumn1][^ExcludeColumn2]". To avoid specifically naming all columns in the final query, it would be acceptable to leave all the Username columns there, as long as it includes a coalesced Username field at the beginning named something like RowID.
As for the overall query, one option would be to perform a union all on the username column of all ten tables, then select the distinct usernames across all tables, then perform a series of left joins against the list of distinct usernames on all 10 tables. That would result in a very straightforward query where each left join is performed on the same distinct set of usernames, but I want to avoid a separate up-front query for distinct usernames. (Although if that's the best option, let me know). It would look something like this:
select * from
(select distinct coalesce(t1.Username,t2.Username,...,t10.Username) as RowID from t1,t2,t3,t4,t5,t6,t7,t8,t9,t10) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
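Although that is short and easy to write, it is incredibly inefficient (the comma-separated FROM list is a cross join, so every combination of rows from all ten tables is generated before DISTINCT runs) and would take hours to run on test tables with 5000+ rows each. With an adjustment, an equivalent version that runs in a few seconds is: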
select * from (
select distinct Username as RowID from (
select Username from t1
union all
select Username from t2
union all
...
select Username from t10
) all_usernames) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
I think that what I have above might be the most efficient and correct query (takes only a couple seconds to run and returns correct result set), but I also thought perhaps it could be simplified with some kind of full join. The problem is that full joins get confusing with more than two tables, because without pre-determining the usernames, each subsequent table would have to match records against any of the preceding tables, resulting in a query where each additional table has "[previous table count] + 1" conditions on matching the username.
Assuming that Username is unique in each table, your second query would be the way I would try first, with the slight modifications of removing distinct and simply using union (which implies distinct) rather than union all:
select *
from (
select Username from t1
union
select Username from t2
union
-- ...
select Username from t10
) distinct_usernames
left join t1 on t1.Username = distinct_usernames.Username
left join t2 on t2.Username = distinct_usernames.Username
-- ...
left join t10 on t10.Username = distinct_usernames.Username
From there I would make sure that Username is indexed, possibly even using it as the clustered index. I've also had optimization luck in the past by implementing your distinct_usernames as a temp table (possibly indexed, or an indexed view) at the beginning of the proc, but only testing would determine if that were worthwhile.
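The temp-table variant mentioned above might look like this. Three hypothetical tables (#t1 to #t3, with a made-up Score column) stand in for the ten test tables, and whether it beats the single-statement version can only be settled by testing on your data.

```sql
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF OBJECT_ID('tempdb..#t2') IS NOT NULL DROP TABLE #t2;
IF OBJECT_ID('tempdb..#t3') IS NOT NULL DROP TABLE #t3;
IF OBJECT_ID('tempdb..#distinct_usernames') IS NOT NULL DROP TABLE #distinct_usernames;

-- Hypothetical test tables, each keyed by Username
CREATE TABLE #t1 (Username varchar(50) PRIMARY KEY, Score int);
CREATE TABLE #t2 (Username varchar(50) PRIMARY KEY, Score int);
CREATE TABLE #t3 (Username varchar(50) PRIMARY KEY, Score int);
INSERT INTO #t1 VALUES ('alice', 80), ('bob', 70);
INSERT INTO #t2 VALUES ('bob', 90), ('carol', 60);
INSERT INTO #t3 VALUES ('alice', 75);

-- Materialize the distinct usernames once; the PRIMARY KEY is clustered by default
CREATE TABLE #distinct_usernames (Username varchar(50) NOT NULL PRIMARY KEY);
INSERT INTO #distinct_usernames (Username)
SELECT Username FROM #t1
UNION  -- UNION (not UNION ALL) removes duplicates, so the key is never violated
SELECT Username FROM #t2
UNION
SELECT Username FROM #t3;

SELECT d.Username AS RowID,
       t1.Score AS T1Score, t2.Score AS T2Score, t3.Score AS T3Score
FROM #distinct_usernames AS d
LEFT JOIN #t1 AS t1 ON t1.Username = d.Username
LEFT JOIN #t2 AS t2 ON t2.Username = d.Username
LEFT JOIN #t3 AS t3 ON t3.Username = d.Username;
```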
A full outer join would require a bunch of or conditions or coalesce arguments, though it could be worth a try on just a few tables to see if the performance is there. I can't try to out-guess what your query engine will like best.
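For comparison, here is a sketch of the full-outer-join form for three hypothetical tables (the same made-up #t1 to #t3). It shows how the join condition grows: each new table joins on the COALESCE of all earlier Username columns. Wrapping the column in COALESCE may also make it harder for the optimizer to use an index on Username, which is one more reason to test before adopting it.

```sql
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF OBJECT_ID('tempdb..#t2') IS NOT NULL DROP TABLE #t2;
IF OBJECT_ID('tempdb..#t3') IS NOT NULL DROP TABLE #t3;

-- Hypothetical test tables, each keyed by Username
CREATE TABLE #t1 (Username varchar(50) PRIMARY KEY, Score int);
CREATE TABLE #t2 (Username varchar(50) PRIMARY KEY, Score int);
CREATE TABLE #t3 (Username varchar(50) PRIMARY KEY, Score int);
INSERT INTO #t1 VALUES ('alice', 80), ('bob', 70);
INSERT INTO #t2 VALUES ('bob', 90), ('carol', 60);
INSERT INTO #t3 VALUES ('alice', 75);

-- Each additional table must match a username from ANY earlier table,
-- so its join key is the COALESCE of all previous Username columns
SELECT COALESCE(t1.Username, t2.Username, t3.Username) AS RowID,
       t1.Score AS T1Score, t2.Score AS T2Score, t3.Score AS T3Score
FROM #t1 AS t1
FULL OUTER JOIN #t2 AS t2
    ON t2.Username = t1.Username
FULL OUTER JOIN #t3 AS t3
    ON t3.Username = COALESCE(t1.Username, t2.Username);
```

With ten tables, the last join would need a nine-argument COALESCE, which is the growth the question describes.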
Also, getting just the column names that you want could be done with a query to sys.columns or information_schema.columns and using dynamic SQL to build your query as a string and then executing that.
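A sketch of that dynamic-SQL idea for two hypothetical tables follows. It reads column names from the catalog, skips the Username columns, prefixes the rest with their table alias, and executes the generated statement with sp_executesql. It uses FOR XML PATH for string concatenation, so it also works on versions before STRING_AGG (SQL Server 2017), and QUOTENAME to delimit the generated identifiers. Because temp tables live in tempdb, it queries tempdb.sys.columns; for permanent tables you would use sys.columns with OBJECT_ID(N'dbo.T1') and so on.

```sql
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF OBJECT_ID('tempdb..#t2') IS NOT NULL DROP TABLE #t2;

-- Hypothetical test tables; the real ones would have many more columns
CREATE TABLE #t1 (Username varchar(50) PRIMARY KEY, Q1 int, Q2 int);
CREATE TABLE #t2 (Username varchar(50) PRIMARY KEY, Q1 int);
INSERT INTO #t1 VALUES ('alice', 1, 2);
INSERT INTO #t2 VALUES ('bob', 3);

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build "t1.[Q1] AS [t1_Q1], ..." for every non-Username column, in column order
SELECT @cols = STUFF((
    SELECT N', ' + src.alias + N'.' + QUOTENAME(c.name)
         + N' AS ' + QUOTENAME(src.alias + N'_' + c.name)
    FROM (SELECT N't1' AS alias, OBJECT_ID(N'tempdb..#t1') AS obj_id
          UNION ALL
          SELECT N't2', OBJECT_ID(N'tempdb..#t2')) AS src
    INNER JOIN tempdb.sys.columns AS c ON c.object_id = src.obj_id
    WHERE c.name <> N'Username'
    ORDER BY src.alias, c.column_id
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

-- Coalesced RowID first, then the generated column list
SET @sql = N'SELECT d.Username AS RowID, ' + @cols + N'
FROM (SELECT Username FROM #t1 UNION SELECT Username FROM #t2) AS d
LEFT JOIN #t1 AS t1 ON t1.Username = d.Username
LEFT JOIN #t2 AS t2 ON t2.Username = d.Username;';

EXEC sys.sp_executesql @sql;
```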

Getting repetitive column names by adding a prefix to the repeated column name in SQL Server 2005

How can I write a stored procedure in SQL Server 2005 that displays repeated column names with a prefix added to them?
Example: I have 'Others' as a column name belonging to multiple categories, mapped to another table that has the columns 'MyColumn' and 'YourColumn'. I need to join these two tables so that my output columns are 'M_Others' and 'Y_Others'. I could use a CASE, but I don't know which other columns in the table are repeated. How can I write this dynamically so that it detects the repetitions?
Thanks In Advance
You should use aliases in the projection of the query: (bogus example, showing the usage)
SELECT c.CustomerID AS Customers_CustomerID, o.CustomerID AS Orders_CustomerID
FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID
You can't dynamically change the column names without using dynamic SQL.
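If you do need the prefixes to be generated, a dynamic-SQL sketch might look like the following. It assumes two hypothetical tables (#TableA, #TableB) joined on KeyCol, prefixes only the column names that occur in both tables, and sticks to syntax available in SQL Server 2005 (FOR XML PATH rather than STRING_AGG, no multi-row VALUES). For permanent tables, query sys.columns with OBJECT_ID(N'dbo.TableA') instead of tempdb.sys.columns. The performance and security caveats raised in this thread still apply.

```sql
-- Hypothetical tables; KeyCol and Others exist in both
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL DROP TABLE #TableA;
IF OBJECT_ID('tempdb..#TableB') IS NOT NULL DROP TABLE #TableB;

CREATE TABLE #TableA (KeyCol int, Others varchar(20), MyColumn varchar(20));
CREATE TABLE #TableB (KeyCol int, Others varchar(20), YourColumn varchar(20));
INSERT INTO #TableA VALUES (1, 'a-other', 'mine');
INSERT INTO #TableB VALUES (1, 'b-other', 'yours');

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- List every column of both tables; a name that also exists in the other
-- table gets an alias of the form <tablealias>_<column>
SELECT @cols = STUFF((
    SELECT N', ' + src.alias + N'.' + QUOTENAME(c.name)
         + CASE WHEN c.name IN (SELECT o.name
                                FROM tempdb.sys.columns AS o
                                WHERE o.object_id = src.other_id)
                THEN N' AS ' + QUOTENAME(src.alias + N'_' + c.name)
                ELSE N''
           END
    FROM (SELECT N'A' AS alias,
                 OBJECT_ID(N'tempdb..#TableA') AS obj_id,
                 OBJECT_ID(N'tempdb..#TableB') AS other_id
          UNION ALL
          SELECT N'B', OBJECT_ID(N'tempdb..#TableB'), OBJECT_ID(N'tempdb..#TableA')) AS src
    INNER JOIN tempdb.sys.columns AS c ON c.object_id = src.obj_id
    ORDER BY src.alias, c.column_id
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

SET @sql = N'SELECT ' + @cols
         + N' FROM #TableA AS A INNER JOIN #TableB AS B ON A.KeyCol = B.KeyCol;';
EXEC sp_executesql @sql;
```

The generated select list aliases KeyCol and Others as A_KeyCol, A_Others, B_KeyCol and B_Others, while MyColumn and YourColumn keep their names.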
You have to alias them explicitly. In this query, the names "A_Others" and "B_Others" are fixed in the query text and cannot be changed at run time:
SELECT
A.Others AS A_Others,
B.Others AS B_Others
FROM
TableA A
JOIN
TableB B ON A.KeyCol = B.KeyCol
If the repeated columns contain the same data (i.e. they are the join fields), you should not be sending both in the query anyway, as this is poor practice and wastes both server and network resources. You should not use select * in production queries, especially if there are joins. If you are writing SQL code properly, you alias as you go along whenever two columns with the same name mean different things (for instance, if you joined twice to the person table, once to get the doctor name and once to get the patient name). Doing this dynamically from system tables would not only be inefficient but could also open a big security hole, depending on how badly you wrote the code. You would be saving five minutes or less in development by permanently affecting performance for every user and possibly negatively impacting data security. This is what database people refer to as a bad thing.
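The doctor/patient case mentioned above, written with explicit aliases, might look like this; the #Person and #Visit tables and their columns are hypothetical.

```sql
IF OBJECT_ID('tempdb..#Person') IS NOT NULL DROP TABLE #Person;
IF OBJECT_ID('tempdb..#Visit') IS NOT NULL DROP TABLE #Visit;

-- Hypothetical tables: one person table referenced twice by each visit
CREATE TABLE #Person (PersonID int PRIMARY KEY, Name varchar(50));
CREATE TABLE #Visit (VisitID int PRIMARY KEY, DoctorID int, PatientID int);
INSERT INTO #Person VALUES (1, 'Dr. Grey');
INSERT INTO #Person VALUES (2, 'Pat Smith');
INSERT INTO #Visit VALUES (100, 1, 2);

-- Join the same table twice under different aliases, and name each
-- Name column by what it means rather than by where it came from
SELECT v.VisitID,
       doc.Name AS DoctorName,
       pat.Name AS PatientName
FROM #Visit AS v
INNER JOIN #Person AS doc ON doc.PersonID = v.DoctorID
INNER JOIN #Person AS pat ON pat.PersonID = v.PatientID;
```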
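-- Prefix a name with the first letter of the matching test_table1 name,
-- but only when that name occurs more than once in test_table2
select n.id_pk,
(case when groupcount.n_count > 1 then substring(m.name, 1, 1) + '_' + n.name
else n.name end) as display_name
from test_table1 m
left join test_table2 n on m.id_pk = n.id_fk
-- how many times each name appears in test_table2
left join (select name, count(name) as n_count
from test_table2 group by name)
groupcount on n.name = groupcount.name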
