logic behind using temp table instead of group by in sub query - sql-server

In my table country:
name|gdp|city
-------------
S.A |60 |amr
S.A |60 |amb
US |200|ken
US |70 |mas
aus |80 |po
aus |90 |tr
I want to get the countries whose gdp is lower than 100.
When I use (2) it doesn't work and gives an error, because the subquery returns multiple values that then have to be compared in the WHERE condition. When I use (1) it works, even though that subquery also seems to give back multiple values to compare with 100.
What is the logic behind this? Please explain, because I am new to SQL. How is the subquery in (1) different from the one in (2)?
(1)
SELECT DISTINCT
name
FROM country a
WHERE 100 > (SELECT SUM(gdp) FROM country b WHERE a.name = b.name);
(2)
SELECT DISTINCT
name
FROM country a
WHERE 100 > (SELECT SUM(gdp) FROM country b GROUP BY name);
Subquery returned more than 1 value. This is not permitted when the
subquery follows =, !=, <, <= , >, >= or when the subquery is used as
an expression.

When you use (2): the subquery returns multiple values, because it has no WHERE condition tying it to the outer query. You get multiple countries and their corresponding sums, so you get the error.
When you use (1): the subquery returns a single value, because the filter is applied at the country level. For each country there is only one sum, so there is no error.

The sub-query in query number 1 does not return multiple values. What you have there is called a "correlated sub-query". The sub-query has a WHERE clause that relates the results of the sub-query (the "inner" query) to the main query (the "outer" query). It's this bit WHERE a.name = b.name. Functionally, that query is run on a row-by-row basis where the name values match, and the sub-query only returns the single result for that name value. You'll notice that you can't run the sub-query by itself, because it needs to get the name value from the outer query in order to work.
In query number 2, if you run the sub-query by itself, it will return a list of summed gdp values. One column, with several rows. The GROUP BY clause is telling the query to SUM the results by name, but the result set doesn't contain the name value, so it's just a list of numbers. The outer query has no way of knowing which row of that result set you want to compare to 100, and so it throws the error that you received.
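To see this, you can run the grouped subquery on its own (a quick check, assuming the same sample data as above); it returns one row per country, with no name column to correlate on:
-- Run standalone: returns three rows (120, 270, 170), one per country
SELECT SUM(gdp) FROM country b GROUP BY name;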

Seems like you're after a HAVING here:
CREATE TABLE dbo.Country ([name] varchar(3),
                          gdp smallint,
                          city varchar(3));
INSERT INTO dbo.Country (name, gdp, city)
VALUES ('S.A', 60, 'amr'),
       ('S.A', 60, 'amb'),
       ('US ', 200, 'ken'),
       ('US ', 70, 'mas'),
       ('aus', 80, 'po '),
       ('aus', 90, 'tr ');
GO
SELECT C.name
FROM dbo.Country C
GROUP BY C.name
HAVING SUM(C.gdp) < 100
GO
DROP TABLE dbo.Country;
As you want rows where "the country gdp is lower than 100", this returns no rows, as there are no countries where the SUM of the gdp is lower than 100 (S.A has 120, US has 270, and aus has 170).
If the gdp of a country (not the city) is worked out differently, you may need a different aggregate function (AVG, MAX?) or a completely different method. If so, you should explain in your question which rows you are expecting and why.
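For example, if the gdp value is really a country-level figure repeated on every city row (an assumption, not something stated in the question), a sketch with AVG would be:
SELECT C.name
FROM dbo.Country C
GROUP BY C.name
HAVING AVG(C.gdp) < 100; -- with the sample data this returns S.A (avg 60) and aus (avg 85)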

Related

Adapting Amazon-Redshift query computing average to query creating list of elements

I am not very familiar with Amazon Redshift syntax, so a co-worker helped me write a query that computes the average of a column of numbers. I want to adapt it to create a list containing all the numbers instead.
The list I want to create is embedded in joins, which makes the operation more complicated. The query that my co-worker helped me write is the following:
''' --- Query written with co-worker's help
SELECT LOWER(some_query_query) as query,
       AVG(n_results::FLOAT)::FLOAT as n_results_avg,
       count(*) as data_count
from some_field
JOIN
    (SELECT request_id,
            some_id,
            count(*) as n_results
     from s_results
     WHERE type_name = 'tinder_match'
       AND time <= '2019-06-20'
       AND time >= '2019-06-19'
     GROUP BY request_id, some_id) as n_count
ON n_count.request_id = some_field.request_id
WHERE time <= '2019-06-20'
  AND time >= '2019-06-19'
  AND language = 'en'
  AND country = 'US'
GROUP BY LOWER(some_query_query)
ORDER BY n_results_avg DESC
--- Current Behaviour: Returns a table with query, n_results_avg, data_count as columns
--- Desired Behaviour: Returns a table with query, list_of_name_match_results, data_count as columns
--- list_of_name_match_results is a list containing all name match results (numbers)
'''
Actual results: Output table with query, name_match_results_avg, data_count as columns
Desired results: Output table with query, list_of_name_match_results, data_count as columns
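One possible direction (a sketch only, not from the original post, and assuming Redshift's LISTAGG aggregate is acceptable for the data volume): replace the AVG expression with LISTAGG to collect the per-request counts into a delimited string, keeping the joins and filters unchanged.
--- Sketch: only the aggregate in the outer SELECT changes
SELECT LOWER(some_query_query) AS query,
       LISTAGG(n_results::VARCHAR, ',')
           WITHIN GROUP (ORDER BY n_results) AS list_of_name_match_results,
       count(*) AS data_count
FROM some_field
JOIN (SELECT request_id,
             some_id,
             count(*) AS n_results
      FROM s_results
      WHERE type_name = 'tinder_match'
        AND time <= '2019-06-20'
        AND time >= '2019-06-19'
      GROUP BY request_id, some_id) AS n_count
  ON n_count.request_id = some_field.request_id
WHERE time <= '2019-06-20'
  AND time >= '2019-06-19'
  AND language = 'en'
  AND country = 'US'
GROUP BY LOWER(some_query_query)
ORDER BY data_count DESC; -- n_results_avg no longer exists, so order by the count instead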

TSQL to choose a record that meets the criteria or first one

I have a table for company phone numbers and one of the columns is IsPrimary which is a boolean type. The table looks like this:
CompanyId | AreaCode | PhoneNumber | IsPrimary
123       | 212      | 555-1212    | 0
234       | 307      | 555-1234    | 1
234       | 307      | 555-4321    | 0
As you can see in the first record, even though the phone number is the only one for CompanyId: 123, it's not marked as the primary.
In such cases, I want my SELECT statement to return the first available number for that company.
My current SELECT statement looks like this which does NOT return a number unless it's set as the primary number.
SELECT *
FROM CompanyPhoneNumbers AS t
WHERE t.IsPrimary = 1
How can I modify this SELECT statement so that it includes the phone number for CompanyId: 123?
The query might be different depending on what you are actually up to.
If you already have the CompanyId and only need the phone number for it, that's easy:
select top (1) pn.*
from dbo.CompanyPhoneNumbers pn
where pn.CompanyId = @CompanyId -- A parameter provided externally, by calling code for instance
order by pn.IsPrimary desc;
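For example, when testing this in SSMS the parameter could simply be a local variable (the value 123 below is only for illustration):
DECLARE @CompanyId int = 123; -- hypothetical test value

select top (1) pn.*
from dbo.CompanyPhoneNumbers pn
where pn.CompanyId = @CompanyId
order by pn.IsPrimary desc;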
However, if you need all companies' data, including one of their phones (for example, you might be going to create a view for this), then you need a correlated subquery:
select c.*, oa.*
from dbo.Companies c
outer apply (
select top (1) pn.*
from dbo.CompanyPhoneNumbers pn
where pn.CompanyId = c.Id
order by pn.IsPrimary desc
) oa;
I have deliberately used OUTER APPLY instead of CROSS APPLY; otherwise companies with no phone numbers listed would be filtered out.
You can achieve this using an APPLY. It looks at the same table and, per company, returns the record with the highest IsPrimary value, so a row with a 1 in that column wins. If more than one row is marked as primary, or none is, ties are broken by area code and then phone number in ascending order.
select b.*
from CompanyPhoneNumbers a
cross apply (
select top 1
*
from CompanyPhoneNumbers b
where b.CompanyId = a.CompanyId
order by b.IsPrimary desc
,b.AreaCode
,b.PhoneNumber
) b

How to select Second Last Row in mySql?

I want to retrieve the 2nd-last row's result and I have seen this question:
How can I retrieve second last row?
but it uses ORDER BY, which in my case does not work because the Emp_Number column also contains the number of rows and a date/time stamp, which mixes the data if I use ORDER BY.
Rows 22 and 23 contain, respectively, the total number of rows (excluding those two rows) and the time and day it got entered.
I used this query, which returns the required result 21, but if this number increases it will cause an error.
SELECT TOP 1 *
FROM(
SELECT TOP 2 *
FROM DAT_History
ORDER BY Emp_Number ASC
) t
ORDER BY Emp_Number desc
Is there any way to get the 2nd-last row's value without using an ORDER BY clause?
There is no guarantee that the count will be returned in the one-but-last row, as there is no definite order defined. Even if those records were written in the correct order, the engine is free to return the records in any order, unless you specify an order by clause. But apparently you don't have a column to put in that clause to reproduce the intended order.
I propose these solutions:
1. Return the minimum of those values that represent positive integers
select min(Emp_Number * 1)
from DAT_history
where Emp_Number not regexp '[^0-9]'
See SQL Fiddle
This will obviously fail when the count is larger than the smallest employee number. But judging from the sample data, that would represent a number of records that is probably not expected...
2. Count the records, ignoring the 2 aggregated records
select count(*)-2
from DAT_history
See SQL Fiddle
3. Relying on correct order without order by
As explained at the start, you cannot rely on the order, but if for some reason you still want to rely on this, you can use a variable to number the rows in a sub query, and then pick out the one that has been attributed the one-but-last number:
select Emp_Number * 1
from (select Emp_Number,
             @rn := @rn + 1 rn
      from DAT_history,
           (select @rn := 0) init
     ) numbered
where rn = @rn - 1
See SQL Fiddle
The * 1 is added to convert the text to a number data type.
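For instance (a quick illustration of the implicit cast, not part of the original answer):
SELECT '0021' * 1; -- yields the number 21, not the string '0021'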
This is not a perfect solution. I am making some assumptions for this. Check if this could work for you.
;WITH cte
     AS (SELECT emp_number,
                Row_number() OVER (ORDER BY emp_number ASC) AS rn
         FROM dat_history
         WHERE Isdate(emp_number) = 0) -- omit date entries
SELECT emp_number
FROM cte
WHERE rn = 1 -- select the minimum entry, assuming it would be the count and assuming count might not exceed the emp number range of 9888000

Getting Random Number for each row

I have a table with some names in its rows. For each row I want to generate a random name. I wrote the following query:
BEGIN transaction t1
Create table TestingName
(NameID int,
FirstName varchar(100),
LastName varchar(100)
)
INSERT INTO TestingName
SELECT 0,'SpongeBob','SquarePants'
UNION
SELECT 1, 'Bugs', 'Bunny'
UNION
SELECT 2, 'Homer', 'Simpson'
UNION
SELECT 3, 'Mickey', 'Mouse'
UNION
SELECT 4, 'Fred', 'Flintstone'
SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5
ROLLBACK Transaction t1
The problem is that the "ABS(CHECKSUM(NEWID())) % 5" portion of this query sometimes returns more than 1 row and sometimes returns 0 rows. I must be missing something but I can't see it.
If I change the query to
DECLARE @n int
set @n = ABS(CHECKSUM(NEWID())) % 5
SELECT FirstName from TestingName
WHERE NameID = @n
Then everything works and I get a random number per row.
If you take the query above and paste it into SQL management studio and run the first query a bunch of times you will see what I am attempting to describe.
The final update query will look like
Update TableWithABunchOfNames
set [FName] = (SELECT FirstName from TestingName
WHERE NameID = ABS(CHECKSUM(NEWID())) % 5)
This does not work because sometimes I get more than 1 row and sometimes I get no rows.
What am I missing?
The problem is that you are getting a different random value for each row. This query is probably doing a full table scan, and the WHERE clause is evaluated for each row -- so a different random number is generated each time.
So you might get a sequence of random numbers where none of the ids match, or one where more than one matches. On average you'll have one match, but you don't want "on average", you want a guarantee.
This is when you want rand(), which produces only one random number per query:
SELECT FirstName
from TestingName
WHERE NameID = floor(rand() * 5);
This should get you one value.
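A quick way to see that RAND() without a seed is evaluated once per query rather than once per row (a small check, not part of the original answer):
SELECT TOP (5) RAND() AS r
FROM TestingName; -- every row shows the same value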
Why not use top 1?
Select top 1 firstName
From testingName
Order by newId()
This worked for me:
WITH
CTE
AS
(
SELECT
ID
,FName
,CAST(5 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) AS int) AS rr
FROM
dbo.TableWithABunchOfNames
)
,CTE_ForUpdate
AS
(
SELECT
CTE.ID
, CTE.FName
, dbo.TestingName.FirstName AS RandomName
FROM
CTE
LEFT JOIN dbo.TestingName ON dbo.TestingName.NameID = CTE.rr
)
UPDATE CTE_ForUpdate
SET FName = RandomName
;
This solution depends on how smart the optimizer is.
For example, if I use INNER JOIN instead of LEFT JOIN (which is logically the correct choice for this query), the optimizer would move the calculation of the random numbers outside the join loop and the end result would not be what we expect.
I created a table TestingName with 5 rows as in the question and a table TableWithABunchOfNames with 100 rows.
Here is the execution plan with LEFT JOIN. You can see the Compute scalar that calculates random numbers is done before the join loop. You can see that 100 rows were updated:
Here is the execution plan with INNER JOIN. You can see the Compute scalar that calculates random numbers is done after the join loop and with extra filter. This query may update not all rows in TableWithABunchOfNames and some rows in TableWithABunchOfNames may be updated several times. You can see that Filter left 102 rows and Stream aggregate left only 69 rows. It means that only 69 rows were eventually updated and also there were multiple matches for some rows (102 - 69 = 33).
To guarantee that the result is what you expect you should generate random number for each row in TableWithABunchOfNames and explicitly remember the result, i.e. materialize the CTE shown above. Then use this temporary result to join with the table TestingName.
You can add a column to TableWithABunchOfNames to store generated random numbers or save CTE to a temp table or table variable.
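A minimal sketch of that materialization, using a temp table and the ID/FName column names from the CTE above (an assumption about the real schema):
-- Fix one random pick per row first, then join against the names table
SELECT ID,
       ABS(CHECKSUM(NEWID())) % 5 AS rr
INTO #RandomPick
FROM dbo.TableWithABunchOfNames;

UPDATE t
SET t.FName = n.FirstName
FROM dbo.TableWithABunchOfNames AS t
JOIN #RandomPick AS r ON r.ID = t.ID
JOIN dbo.TestingName AS n ON n.NameID = r.rr;

DROP TABLE #RandomPick;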

Gaps in recurring series of a group with datetime [duplicate]

We have a table with following data
Id,ItemId,SeqNumber,DateTimeTrx
1,100,254,2011-12-01 09:00:00
2,100,1,2011-12-01 09:10:00
3,200,7,2011-12-02 11:00:00
4,200,5,2011-12-02 10:00:00
5,100,255,2011-12-01 09:05:00
6,200,3,2011-12-02 09:00:00
7,300,0,2011-12-03 10:00:00
8,300,255,2011-12-03 11:00:00
9,300,1,2011-12-03 10:30:00
Id is an identity column.
The sequence for an ItemId starts from 0 and goes till 255 and then resets to 0. All this information is stored in a table called Item. The order of sequence number is determined by the DateTimeTrx but such data can enter any time into the system. The expected output is as shown below-
ItemId,PrevorNext,SeqNumber,DateTimeTrx,MissingNumber
100,Previous,255,2011-12-01 09:05:00,0
100,Next,1,2011-12-01 09:10:00,0
200,Previous,3,2011-12-02 09:00:00,4
200,Next,5,2011-12-02 10:00:00,4
200,Previous,5,2011-12-02 10:00:00,6
200,Next,7,2011-12-02 11:00:00,6
300,Previous,1,2011-12-03 10:30:00,2
300,Next,255,2011-12-03 16:30:00,2
We need to get the rows one before and one after the missing sequence number. In the above example, for ItemId 300 the record with sequence 1 entered first (2011-12-03 10:30:00) and then 255 (2011-12-03 16:30:00), hence the missing number here is 2: 1 is previous, 255 is next, and 2 is the first missing number. Coming to ItemId 100, the record with sequence 255 entered first (2011-12-01 09:05:00) and then 1 (2011-12-01 09:10:00), hence 255 is previous, 1 is next, and 0 is the first missing number.
In the above expected result, the MissingNumber column shows the first occurring missing number, just to illustrate the example.
We will not have a case where the whole series resets at once; it can either wrap from 255 back to 0 as for ItemId 100, or run up from 0 to 255 as for ItemId 300. Hence we need to identify missing sequence numbers whether the series is plainly ascending (0, 1, ... 255) or wraps around past 255 (e.g. 254, 255, 0, 1, 2).
How can we accomplish this in T-SQL?
Could work like this:
;WITH b AS (
SELECT *
,row_number() OVER (ORDER BY ItemId, DateTimeTrx, SeqNumber) AS rn
FROM tbl
), x AS (
SELECT
b.Id
,b.ItemId AS prev_Itm
,b.SeqNumber AS prev_Seq
,c.ItemId AS next_Itm
,c.SeqNumber AS next_Seq
FROM b
JOIN b c ON c.rn = b.rn + 1 -- next row
WHERE c.ItemId = b.ItemId -- only with same ItemId
AND c.SeqNumber <> (b.SeqNumber + 1)%256 -- Seq cycles modulo 256
)
SELECT Id, prev_Itm, 'Previous' AS PrevNext, prev_Seq
FROM x
UNION ALL
SELECT Id, next_Itm ,'Next', next_Seq
FROM x
ORDER BY Id, PrevNext DESC
Produces exactly the requested result.
See a complete working demo on data.SE.
This solution takes gaps in the Id column into consideration, as there is no mention of a gapless sequence of Ids in the question.
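To make the wrap-around check in the CTE concrete (a small illustration, not part of the original answer): for ItemId 100 the pair 255 -> 1 is flagged because the expected successor of 255 is 0, not 1.
SELECT (255 + 1) % 256; -- returns 0, so a following SeqNumber of 1 means 0 is missing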
Edit 2: Answer to the updated question:
I updated the CTE in the query above to match your latest version - or so I think.
Use the columns that define the sequence of rows, and add as many columns to your ORDER BY clause as necessary to break ties.
The explanation in your latest update is not entirely clear to me, but I think you only need to squeeze DateTimeTrx into the ORDER BY to achieve what you want. I have additionally included SeqNumber in the ORDER BY to break ties left by identical DateTimeTrx values. I edited the query above.
