Finding max value from TOP selection grouped by key in SQL Server - sql-server

Apologies for the goofy title. I am not sure how to describe the problem.
I have a table in SQL Server with this structure:
ID varchar(15)
ProdDate datetime
Value double
For each ID there can be hundreds of rows, each with its own ProdDate. ID and ProdDate form the unique key for the table.
What I need to do is find the maximum Value for each ID based upon the first 12 samples, ordered by ProdDate ascending.
Said another way: for each ID I need to find the 12 earliest dates for that ID (the sampling for each ID starts at a different date) and then find the maximum Value among those 12 samples.
Any idea of how to do this without multiple queries and temporary tables?

You can use a common table expression and ROW_NUMBER to logically define the TOP 12 rows per Id, then apply MAX ... GROUP BY to that.
;WITH T
AS (SELECT *,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY ProdDate) AS RN
FROM YourTable)
SELECT Id,
MAX(Value) AS Value
FROM T
WHERE RN <= 12
GROUP BY Id
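A minimal sketch of the query above, run against SQLite through Python's built-in sqlite3 module rather than SQL Server (this assumes an SQLite build with window functions, 3.25 or later). The table name YourTable and the sample rows are hypothetical. For ID B, the larger values recorded after the 12th sample are ignored, which is the point of filtering on RN before aggregating.

```python
import sqlite3

# Hypothetical sample data: (ID, ProdDate, Value). ISO date strings sort
# chronologically, so ORDER BY ProdDate works on TEXT in SQLite.
ROWS = [("A", f"2020-01-{d:02d}", float(d)) for d in range(1, 16)]
B_VALUES = [5, 7, 3, 9, 2, 8, 6, 4, 1, 0, 11, 10, 50, 60]
ROWS += [("B", f"2021-03-{i + 1:02d}", float(v)) for i, v in enumerate(B_VALUES)]

QUERY = """
WITH T AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ProdDate) AS RN
    FROM YourTable
)
SELECT ID, MAX(Value) AS Value
FROM T
WHERE RN <= 12          -- keep only the 12 earliest samples per ID
GROUP BY ID
ORDER BY ID
"""


def max_of_first_12(rows):
    """Return [(ID, max Value over that ID's 12 earliest rows), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable (ID TEXT, ProdDate TEXT, Value REAL, "
        "PRIMARY KEY (ID, ProdDate))"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(max_of_first_12(ROWS))  # [('A', 12.0), ('B', 11.0)]
```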

Related

SQL custom sort order with percentile question

I have a table with roughly 600,000 records on which I need to compute percentiles, with a customized order for the ID keys. The key isn't unique: the keys are labeled TMC_CODE, and each one identifies a set length of highway. Across these 600,000 records there are about 36 TMCs, and I want SQL Server to output the TMCs in a custom order that is neither alphabetical nor numerical. The code I have thus far is:
WITH PERCENTILES AS (SELECT TMC_code, EPOCH, PERCENTILE_CONT(.98)
WITHIN GROUP (ORDER BY CAST(speed AS float)) OVER (PARTITION BY TMC_code) AS TTAV_P_98 FROM [dbo].[I40])
SELECT TMC_code, TTAV_P_98 FROM PERCENTILES
GROUP BY TMC_code, TTAV_P_98 ORDER BY *no idea what to put here*
The problem is that the TMC naming scheme isn't alphabetical or numerical. Example TMCs are 113+04489, 113P04489, and 113N04489.
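One common way to get an arbitrary order, sketched here as an assumption rather than as an answer from this thread: keep a small lookup table that assigns each TMC its display position, join it, and ORDER BY that position. The sketch runs on SQLite via Python's sqlite3, which has no PERCENTILE_CONT, so the Percentiles table stands in for the CTE's output with made-up values; TmcSortOrder and SortKey are hypothetical names. In SQL Server you would join the same lookup table in the final SELECT and add SortKey to the GROUP BY, since ORDER BY columns must be grouped or aggregated there.

```python
import sqlite3

# Stand-in for the PERCENTILES CTE output; the numbers are made up.
PERCENTILES = [("113+04489", 61.5), ("113N04489", 58.0), ("113P04489", 63.2)]

# Hypothetical lookup table: the display position wanted for each TMC.
SORT_ORDER = [("113P04489", 1), ("113+04489", 2), ("113N04489", 3)]

QUERY = """
SELECT p.TMC_code, p.TTAV_P_98
FROM Percentiles AS p
JOIN TmcSortOrder AS s ON s.TMC_code = p.TMC_code
ORDER BY s.SortKey      -- custom order instead of alphabetical/numerical
"""


def ordered_percentiles():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Percentiles (TMC_code TEXT, TTAV_P_98 REAL)")
    conn.execute(
        "CREATE TABLE TmcSortOrder (TMC_code TEXT PRIMARY KEY, SortKey INTEGER)"
    )
    conn.executemany("INSERT INTO Percentiles VALUES (?, ?)", PERCENTILES)
    conn.executemany("INSERT INTO TmcSortOrder VALUES (?, ?)", SORT_ORDER)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for tmc, p98 in ordered_percentiles():
    print(tmc, p98)
```

Adding a new TMC then means inserting one row into the lookup table rather than editing the query.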

SQL Server 2014 Random Value in Group By

I'm trying to figure out how to get a single random row returned per account from a table. The table has multiple rows per account or in some cases just a single row. I want to be able to get a random result back in my select so each day that I run the same statement I might get a different result.
This is the basis of the query:
select number, phonenumber
from phones_master with(nolock)
where phonetypeid = '3'
This is a sample result set:
number phonenumber
--------------------------
4130772, 6789100949
4130772, 6789257988
4130774, 6784519098
4130775, 6786006874
The column called number is the account. I'd like to return a single random row per account, so based on the sample result set above the query should return 3 rows.
Any suggestions would be greatly appreciated. I'm beating my head against the wall with this one.
Thanks
You can use WITH TIES in concert with Row_Number()
Select Top 1 with ties *
From YourTable
Order by Row_Number() over (Partition By Number Order By NewID())
Returns (for example)
number phonenumber
4130772 6789257988
4130774 6784519098
4130775 6786006874
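SQLite has no TOP ... WITH TIES, so this sketch swaps in the equivalent filter on ROW_NUMBER() = 1, with RANDOM() in place of NEWID() to shuffle each account's rows. It runs through Python's sqlite3 module (window functions need SQLite 3.25 or later) on the sample rows from the question; which phone number is picked for account 4130772 varies between runs.

```python
import sqlite3

# Sample rows from the question: (number, phonenumber, phonetypeid).
PHONES = [
    (4130772, "6789100949", "3"),
    (4130772, "6789257988", "3"),
    (4130774, "6784519098", "3"),
    (4130775, "6786006874", "3"),
]

QUERY = """
WITH Ranked AS (
    SELECT number, phonenumber,
           -- shuffle each account's rows, then number them 1, 2, ...
           ROW_NUMBER() OVER (PARTITION BY number ORDER BY RANDOM()) AS rn
    FROM phones_master
    WHERE phonetypeid = '3'
)
SELECT number, phonenumber
FROM Ranked
WHERE rn = 1            -- one random row per account
ORDER BY number
"""


def random_phone_per_account():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phones_master (number INTEGER, phonenumber TEXT, phonetypeid TEXT)"
    )
    conn.executemany("INSERT INTO phones_master VALUES (?, ?, ?)", PHONES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(random_phone_per_account())
```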
If you have another table called account_table where those numbers are generated/created, then here is one way using CROSS APPLY:
SELECT at.number,
cs.phonenumber
FROM account_table at
CROSS apply(SELECT TOP 1 phonenumber
FROM phones_master pm
WHERE at.number = pm.number
AND phonetypeid = '3'
ORDER BY Newid()) cs (phonenumber)
This also assumes that number is unique in account_table.
Creating an index on number and phonetypeid in the phones_master table should improve performance.
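SQLite has no CROSS APPLY, so this sketch swaps in a correlated subquery with ORDER BY RANDOM() LIMIT 1, which likewise picks one phone for each row of account_table. The outer filter drops accounts without a matching phone, as CROSS APPLY would. The extra account 4130776 and the index name ix_phones_number_type are hypothetical; the index follows the suggestion above.

```python
import sqlite3

PHONES = [
    (4130772, "6789100949", "3"),
    (4130772, "6789257988", "3"),
    (4130774, "6784519098", "3"),
    (4130775, "6786006874", "3"),
]
# Hypothetical account table; 4130776 has no phone of type '3'.
ACCOUNTS = [(4130772,), (4130774,), (4130775,), (4130776,)]

QUERY = """
SELECT number, phonenumber
FROM (
    SELECT a.number,
           (SELECT pm.phonenumber          -- re-run for each account row
            FROM phones_master AS pm
            WHERE pm.number = a.number AND pm.phonetypeid = '3'
            ORDER BY RANDOM()
            LIMIT 1) AS phonenumber
    FROM account_table AS a
) AS picked
WHERE phonenumber IS NOT NULL              -- drop accounts with no match, like CROSS APPLY
ORDER BY number
"""


def random_phone_via_subquery():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account_table (number INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE phones_master (number INTEGER, phonenumber TEXT, phonetypeid TEXT)"
    )
    # Index on the columns the per-account lookup filters on.
    conn.execute(
        "CREATE INDEX ix_phones_number_type ON phones_master (number, phonetypeid)"
    )
    conn.executemany("INSERT INTO account_table VALUES (?)", ACCOUNTS)
    conn.executemany("INSERT INTO phones_master VALUES (?, ?, ?)", PHONES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(random_phone_via_subquery())
```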

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to contain the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, although I am not very experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = '20151004'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use the RANK function: rank the rows OVER (PARTITION BY UnitID ORDER BY <your ordering column>) and pick the rows with rank 2.
For reference:
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = ROW_NUMBER() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
ORDER BY UnitID DESC
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you would need Min(JobUnitKeyID) per UnitID, joined back on UnitID where JobUnitKeyID <> MinJobUnitKeyID.
However, using Min or Max requires you to join back to the same data, which will generally be slower.
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
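A sketch of the first reason above, using SQLite through Python's sqlite3 and hypothetical JobUnits rows: with three rows for UnitID 107, the MAX ... HAVING query returns only the last one, while the ROW_NUMBER() query returns every row after the first. Column names follow the answers; the data is made up.

```python
import sqlite3

# Hypothetical JobUnits rows: (JobUnitKeyID, UnitID, DispatchDate).
JOB_UNITS = [
    (1, 101, "20151004"),
    (2, 105, "20151004"),
    (3, 105, "20151004"),
    (4, 107, "20151004"),
    (5, 107, "20151004"),
    (6, 107, "20151004"),
]

ROW_NUMBER_QUERY = """
WITH DupeCalc AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID) AS DupID,
           JobUnitKeyID, UnitID
    FROM JobUnits
    WHERE DispatchDate = '20151004'
)
SELECT JobUnitKeyID, UnitID
FROM DupeCalc
WHERE DupID >= 2        -- every row after the first in each UnitID group
ORDER BY JobUnitKeyID
"""

MAX_QUERY = """
SELECT MAX(JobUnitKeyID), UnitID
FROM JobUnits
WHERE DispatchDate = '20151004'
GROUP BY UnitID
HAVING COUNT(*) > 1     -- only one row per duplicated UnitID
ORDER BY UnitID
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE JobUnits "
        "(JobUnitKeyID INTEGER PRIMARY KEY, UnitID INTEGER, DispatchDate TEXT)"
    )
    conn.executemany("INSERT INTO JobUnits VALUES (?, ?, ?)", JOB_UNITS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(ROW_NUMBER_QUERY))  # [(3, 105), (5, 107), (6, 107)]
print(run(MAX_QUERY))         # [(3, 105), (6, 107)]
```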

MS SQL Server Algebraic Syntax

I have a table logging a floating-point value from a scale (a weight). I'd like to evaluate the absolute value of the integral of this curve dynamically. I'm attempting some simple algebra based on the trapezoidal approximation with a sampling interval of one (b - a = 1):
(b-a)((f(a)+f(b))/2 - f(a))
The values f(a) and f(b) represent the 2 most recent values logged in my SQL Server table. I've attempted the following, which fails with an evaluation error:
SELECT TOP 2
SUM(Scale_Weight) OVER(ORDER BY t_stamp DESC)/2.0
FROM table
This query evaluates, but simply divides the most recent value by 2:
SELECT
SUM(Scale_Weight) OVER(ORDER BY t_stamp DESC)/2.0
FROM table
As you can see, I haven't even attempted the absolute value or the subtraction of the "2nd most recent" value because I didn't know how to reference a specific row (cell?). As a noob, I feel the math is doable in a single query, I just can't find the proper syntax. Thanks in advance.
Update, to state it more clearly:
Thanks for the input, ps2goat, though for some reason I'm unable to use the TOP clause, so I currently have this:
SELECT ABS(SUM(Scale_Weight) OVER(PARTITION BY quality_code
ORDER BY t_stamp
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)/2.0)
FROM table
I still need to subtract the preceding value, something like:
SELECT ABS(SUM(Scale_Weight) OVER(PARTITION BY quality_code
ORDER BY t_stamp
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)/2.0
- 1 PRECEDING)
FROM table
Any ideas to reference the preceding value for subtraction?
You can use the LAG function to refer to the previous value in a given order. For example:
SELECT Scale_Weight AS CurrentWeight, LAG(Scale_Weight) OVER (ORDER BY t_stamp) AS PreviousWeight
FROM table
You can add your formula to this query.
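With b - a = 1, the asker's expression (b-a)((f(a)+f(b))/2 - f(a)) simplifies to (f(b) - f(a))/2, so each row needs only its own weight and the previous one. This sketch computes the expression with LAG in SQLite via Python's sqlite3. The table name ScaleLog and the integer t_stamp values are hypothetical, the weights reuse the sample values from this thread, and a PARTITION BY quality_code could be added inside OVER as in the asker's update.

```python
import sqlite3

# Hypothetical log table; t_stamp is an integer sequence here.
WEIGHTS = [(1, 24.1234), (2, 32.4455), (3, 88.1234), (4, 223.443)]

QUERY = """
WITH Pairs AS (
    SELECT t_stamp,
           Scale_Weight AS CurrentWeight,                               -- f(b)
           LAG(Scale_Weight) OVER (ORDER BY t_stamp) AS PreviousWeight  -- f(a)
    FROM ScaleLog
)
SELECT t_stamp,
       ABS((PreviousWeight + CurrentWeight) / 2.0 - PreviousWeight) AS Increment
FROM Pairs
ORDER BY t_stamp
"""


def trapezoid_increments():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ScaleLog (t_stamp INTEGER PRIMARY KEY, Scale_Weight REAL)"
    )
    conn.executemany("INSERT INTO ScaleLog VALUES (?, ?)", WEIGHTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for t, inc in trapezoid_increments():
    print(t, inc)  # the first row has no previous value, so inc is None
```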
This is what I did. Instead of timestamps, I used an identity field, as those auto-increment and are easier to enter manually (I'm not sure whether you had datetime values or actual timestamp values).
fiddle: http://sqlfiddle.com/#!6/77bcb/4/0
schema:
create table x(
xId int identity(1,1) not null primary key,
scale_weight decimal(12,4)
);
insert into x(scale_weight)
select 24.1234 union all
select 32.4455 union all
select 88.1234 union all
select 223.443;
The inner query (below) grabs the top two rows, ordered by id descending (use your t_stamp column). The outer query sums all the Scale_Weight values returned by the inner query and divides that value by two.
sql:
select SUM(Scale_Weight)/2.0 from
(
SELECT TOP 2 Scale_Weight
FROM x
ORDER BY xid DESC
) y
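The same idea, sketched in SQLite through Python's sqlite3, which spells TOP 2 as LIMIT 2 and uses REAL instead of decimal(12,4). The table and rows follow the fiddle schema above. With these four rows the two most recent weights are 223.443 and 88.1234, so the query returns half their sum.

```python
import sqlite3

# Rows from the fiddle; xId plays the role of t_stamp.
WEIGHTS = [24.1234, 32.4455, 88.1234, 223.443]

QUERY = """
SELECT SUM(Scale_Weight) / 2.0
FROM (
    SELECT Scale_Weight
    FROM x
    ORDER BY xId DESC
    LIMIT 2             -- SQLite's spelling of TOP 2
) AS y
"""


def half_sum_of_latest_two():
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY auto-assigns 1, 2, 3, ... much like IDENTITY(1,1).
    conn.execute("CREATE TABLE x (xId INTEGER PRIMARY KEY, scale_weight REAL)")
    conn.executemany(
        "INSERT INTO x (scale_weight) VALUES (?)", [(w,) for w in WEIGHTS]
    )
    (value,) = conn.execute(QUERY).fetchone()
    conn.close()
    return value


print(half_sum_of_latest_two())
```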

How can we add a column on the fly in a dynamic table in SQL SERVER?

My question needs a little explanation, so I'll explain it this way:
I've got a table (let's call it RootTable) with one million records, not in any particular order. What I'm trying to do is get a given number of rows (@ParamCount) from RootTable; at the same time, these records must be sorted and must have an additional column (with unique data) added on the fly, to serve as a key for row identification later in the program. The query can take any number of parameters, but my basic parameters are the two mentioned below.
It's needed for a SQL Server environment.
e.g.
RootTable
ColumnA ColumnB ColumnC
ABC city cellnumber
ZZC city1 cellnumber
BCD city2 cellnumber
BCC city3 cellnumber
Parameters: the number of rows to return (@ParamCount) and the prefix that ColumnA starts with (@ParamNameStartsWith), for example:
@ParamCount: 2
@ParamNameStartsWith: BC
desired result:
Id(added on the fly) ColumnA ColumnB ColumnC
101 BCC city3 cellnumber
102 BCD city2 cellnumber
One more point about the Id column: Id must preserve the sort order of the whole table. In the result above it starts at 101 because 100 is already assigned to the first row after sorting and adding the column on the fly; that row starts with "ABC", so it is not in the result set.
Any kind of help would be appreciated.
NOTE: My question title might not reflect my requirement, but I couldn't come up with a better title.
So first you need your on-the-fly ID. This one is created by the ROW_NUMBER() function, which is available from SQL Server 2005 onwards. What ROW_NUMBER() does is pretty self-explanatory, I think. However, it numbers rows within a partition, and the partition is specified by the OVER clause. If you include PARTITION BY within the OVER clause, you will have multiple partitions. In your case there is only one partition, the whole table, so PARTITION BY is not necessary. However, an ORDER BY is required so that the system knows which record should get which row number in the partition. The query you get is:
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
Now you have a row number for your whole table. You cannot add a condition such as your @ParamNameStartsWith filter to this query, because the row numbers must be computed over the whole table. The query above therefore has to be a subquery that provides the set on which the condition can be applied. I use a CTE here; I think that is better for readability:
;WITH OrderedList AS (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
)
SELECT *
FROM OrderedList
WHERE ColumnA LIKE @ParamNameStartsWith + '%'
Please note that I added the wildcard % after the parameter, so that the condition is effectively "starts with" @ParamNameStartsWith.
Finally, if I understood you correctly, you want only @ParamCount rows. You can use your parameter directly with the TOP keyword, which also requires SQL Server 2005 or later.
;WITH OrderedList AS (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
)
SELECT TOP (@ParamCount) *
FROM OrderedList
WHERE ColumnA LIKE @ParamNameStartsWith + '%'
ORDER BY ID
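A sketch of the final query in SQLite through Python's sqlite3, with two substitutions: LIMIT stands in for TOP, and || for T-SQL's + string concatenation. ROW_NUMBER() starts at 1, so a hypothetical offset parameter (99 by default) is added to reproduce the 100-based IDs from the question; with prefix BC and a count of 2 it returns the desired BCC and BCD rows as IDs 101 and 102.

```python
import sqlite3

# Sample rows from the question.
ROOT_ROWS = [
    ("ABC", "city", "cellnumber"),
    ("ZZC", "city1", "cellnumber"),
    ("BCD", "city2", "cellnumber"),
    ("BCC", "city3", "cellnumber"),
]

QUERY = """
WITH OrderedList AS (
    -- number every row of the whole table before filtering
    SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) AS RN, ColumnA, ColumnB, ColumnC
    FROM RootTable
)
SELECT RN + ? AS ID, ColumnA, ColumnB, ColumnC   -- hypothetical offset shifts the IDs
FROM OrderedList
WHERE ColumnA LIKE ? || '%'                      -- starts with the prefix
ORDER BY RN
LIMIT ?                                          -- stands in for TOP (@ParamCount)
"""


def first_rows(prefix, count, offset=99):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE RootTable (ColumnA TEXT, ColumnB TEXT, ColumnC TEXT)")
    conn.executemany("INSERT INTO RootTable VALUES (?, ?, ?)", ROOT_ROWS)
    rows = conn.execute(QUERY, (offset, prefix, count)).fetchall()
    conn.close()
    return rows


print(first_rows("BC", 2))
# [(101, 'BCC', 'city3', 'cellnumber'), (102, 'BCD', 'city2', 'cellnumber')]
```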
