SQL custom sort order with percentile question - sql-server

I have a table with roughly 600,000 records that I need to compute percentiles on, with a custom order for the ID keys. The key, TMC_CODE, isn't unique: each value identifies a set length of highway. Across these 600,000 records there are about 36 TMCs, and I want SQL Server to output the TMCs in a custom order that isn't alphabetical or numerical. The code I have thus far is:
WITH PERCENTILES AS (SELECT TMC_code, EPOCH, percentile_CONT(.98)
WITHIN GROUP (ORDER BY cast(speed as float)) OVER (PARTITION BY TMC_code) AS TTAV_P_98 FROM [dbo].[I40])
SELECT TMC_code, TTAV_P_98 FROM Percentiles
GROUP BY TMC_code, TTAV_P_98 ORDER BY *no idea what to put here*
The problem is the nomenclature for TMCs isn't alphabetical or numerical. An example TMC is 113+04489 or 113P04489 or 113N04489.
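One possible way to get a custom order, sketched here under assumptions: keep the desired position of each TMC in a small lookup table and sort by that position. The table variable @TmcOrder, its SortPos column, and the positions assigned below are hypothetical; only the three TMC codes come from the question.

```sql
-- Hypothetical lookup: one row per TMC with its desired output position.
DECLARE @TmcOrder TABLE (TMC_code varchar(20) PRIMARY KEY, SortPos int NOT NULL);
INSERT INTO @TmcOrder (TMC_code, SortPos)
VALUES ('113+04489', 1), ('113P04489', 2), ('113N04489', 3);

WITH Percentiles AS (
    -- DISTINCT collapses the per-row window result to one row per TMC
    SELECT DISTINCT
           TMC_code,
           PERCENTILE_CONT(0.98) WITHIN GROUP (ORDER BY CAST(speed AS float))
               OVER (PARTITION BY TMC_code) AS TTAV_P_98
    FROM [dbo].[I40]
)
SELECT p.TMC_code, p.TTAV_P_98
FROM Percentiles AS p
LEFT JOIN @TmcOrder AS o ON o.TMC_code = p.TMC_code
-- TMCs missing from the lookup sort last, then by code
ORDER BY CASE WHEN o.SortPos IS NULL THEN 1 ELSE 0 END, o.SortPos, p.TMC_code;
```

With only 36 TMCs, a CASE expression in the ORDER BY would also work, but a lookup table keeps the ordering as data that can be edited without touching the query.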

Related

how to select first rows distinct by a column name in a sub-query in sql-server?

I am building a Skype-like tool in which I have to show the last 10 distinct users who have logged in to my web application.
I maintain a table in SQL Server with a field called last_active_time. My requirement is to sort the table by last_active_time and show all the columns for the last 10 distinct users.
Another field, WWID, uniquely identifies a user.
I am able to find the distinct WWIDs, but I am not able to select all the columns of those rows.
I am using the query below to find the distinct WWIDs:
select distinct(wwid) from(select top 100 * from dbo.rvpvisitors where last_active_time!='' order by last_active_time DESC) as newView;
But how do I find those distinct rows? I want to show how long each user has been away from the web app, using the difference between the current time and the last active time.
I am new to SQL, so the question may be naive, but I am struggling to get it right.
If you are using proper data types for your columns, you won't need a subquery to get that result. The following query should do the trick:
SELECT TOP 10
[wwid]
,MAX([last_active_time]) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE
[last_active_time] != ''
GROUP BY
[wwid]
ORDER BY
[last_active_time] DESC
If the column [last_active_time] is of type varchar/nvarchar (which is probably the case, since you check for empty strings in the WHERE clause), you might need to use CAST or CONVERT to treat it as an actual date so that functions like MIN/MAX order it chronologically.
In general I would suggest using proper data types for your columns: if you have date or timestamp data, use the "date" or "datetime2" data types.
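As a rough sketch of the conversion mentioned above, assuming [last_active_time] really is stored as text in a format SQL Server can parse: TRY_CONVERT (available from SQL Server 2012) returns NULL instead of raising an error for values that cannot be converted, so stray strings do not break the query.

```sql
SELECT TOP 10
    [wwid],
    -- Convert the text to a real date/time before taking the maximum
    MAX(TRY_CONVERT(datetime2, [last_active_time])) AS [last_active_time]
FROM [dbo].[rvpvisitors]
WHERE [last_active_time] != ''
GROUP BY [wwid]
ORDER BY MAX(TRY_CONVERT(datetime2, [last_active_time])) DESC;
```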
Edit:
The query aggregates the data based on the column [wwid], and for each returns the maximum [last_active_time].
The result is then sorted and filtered.
In order to add more columns "as-is" (without aggregating them) just add them in the SELECT and GROUP BY sections.
If you need more aggregated columns, add them in the SELECT with the appropriate aggregation function (MIN/MAX/SUM/etc).
I suggest you have a look at GROUP BY on W3.
To learn more about the "execution order" of the clauses, you can have a look here.
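For the "how long have they been away" part of the question, an extra aggregated column could look like the sketch below. It assumes [last_active_time] is a real date/time column; the alias minutes_away is made up for illustration.

```sql
SELECT TOP 10
    [wwid],
    MAX([last_active_time]) AS [last_active_time],
    -- Minutes elapsed between the user's latest activity and now
    DATEDIFF(MINUTE, MAX([last_active_time]), GETDATE()) AS [minutes_away]
FROM [dbo].[rvpvisitors]
WHERE [last_active_time] IS NOT NULL
GROUP BY [wwid]
ORDER BY MAX([last_active_time]) DESC;
```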
You can solve a problem like this by ranking the rows within each key and keeping only the top-ranked row per key. This removes duplicates while preserving the order within each key.
;
WITH RankOrdered AS
(
SELECT
*,
wwidRank = ROW_NUMBER() OVER (PARTITION BY wwid ORDER BY last_active_time DESC )
FROM
dbo.rvpvisitors
where
last_active_time!=''
)
SELECT TOP(10) * FROM RankOrdered WHERE wwidRank = 1
ORDER BY last_active_time DESC
If my understanding is right, the query below will give the desired output.
You can add conditions as needed.
select top 10 wwid from dbo.rvpvisitors group by wwid order by max(last_active_time) desc

How can I retrieve "exception" data from a table without knowing the data in advance?

I have a table that updates all the time.
The table maintains a list that links stores to clubs, and manages, among other things, "discount percentages" per store + club.
Table name: Policy_supplier
Column: POLXSUP_DISCOUNT
Suppose all the "vendors" in the table are marked with a 10% discount.
Then someone accidentally signs one vendor with 8% or 15% (or even NULL).
How do I generate a query to retrieve the "abnormal" vendor?
You can find the mode of your discounts and then just pick out the records that aren't equal to that mode:
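WITH mode_discount AS (SELECT TOP 1 POLXSUP_DISCOUNT FROM Policy_supplier GROUP BY POLXSUP_DISCOUNT ORDER BY count(*) DESC)
-- <> never matches NULL, so NULL discounts are tested explicitly
SELECT * FROM Policy_supplier WHERE POLXSUP_DISCOUNT <> (SELECT POLXSUP_DISCOUNT FROM mode_discount) OR POLXSUP_DISCOUNT IS NULL;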
You can use the OVER clause with aggregates to calculate an aggregate over a data range and include it in the results. For example,
SELECT avg(POLXSUP_DISCOUNT)
from Policy_supplier
Would return a single average value while
SELECT POLXSUP_DISCOUNT, avg(POLXSUP_DISCOUNT) OVER()
from Policy_supplier
Would return the overall average in each row. Typically OVER is used with a PARTITION BY clause. If you wanted the average per supplier, you could write AVG(POLXSUP_DISCOUNT) OVER(PARTITION BY supplierID), where supplierID stands for whatever column identifies the supplier.
To find anomalies, you should use one of the PERCENTILE functions, eg PERCENTILE_CONT. For example
select PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over()
from Policy_Supplier
Will return a discount value below which you'll find 95% of the records. The other 5% of discounts that are above this are probably anomalies.
Similarly, PERCENTILE_CONT(0.05) will return a discount below which you'll find 5% of the records.
You can combine both to find potentially exceptional records, e.g.:
with percentiles as (
select ID,
POLXSUP_DISCOUNT,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over() as pct95,
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over() as pct05
from Policy_Supplier)
select ID,POLXSUP_DISCOUNT
from percentiles
where POLXSUP_DISCOUNT>pct95 or POLXSUP_DISCOUNT<pct05

MS SQL Server Algebraic Syntax

I have a table logging a floating point value from a scale (a weight). I'd like to evaluate the absolute value of the integral of this curve dynamically. I'm attempting to perform some simple algebra based on the trapezoidal approximation with a sampling interval of one (b-a=1):
(b-a)((f(a)+f(b))/2 - f(a))
The values f(a) and f(b) represent the 2 most recent values logged in my SQL Server table. I've attempted the following, which fails with an evaluation error:
SELECT TOP 2
SUM(Scale_Weight) OVER(ORDER BY t_stamp DESC)/2.0
FROM table
This query evaluates, but simply divides the most recent value by 2:
SELECT
SUM(Scale_Weight) OVER(ORDER BY t_stamp DESC)/2.0
FROM table
As you can see, I haven't even attempted the absolute value or the subtraction of the "2nd most recent" value because I didn't know how to reference a specific row (cell?). As a noob, I feel the math is doable in a single query, I just can't find the proper syntax. Thanks in advance.
So to update more clearly:
Thanks for the input ps2goat, though for some reason I'm unable to implement "TOP" function, so I currently have this:
SELECT ABS(SUM(Scale_Weight) OVER(PARTITION BY quality_code
ORDER BY t_stamp
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)/2.0)
FROM table
Still need to subtract the preceding value, something like:
SELECT ABS(SUM(Scale_Weight) OVER(PARTITION BY quality_code
ORDER BY t_stamp
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)/2.0
- 1 PRECEDING)
FROM table
Any ideas to reference the preceding value for subtraction?
You can use the LAG function to refer to the previous value in a given order. For example:
SELECT Scale_Weight AS CurrentWeight, LAG(Scale_Weight) OVER (ORDER BY t_stamp) AS LastWeight
FROM table
You can add your formula to this query.
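A minimal sketch of plugging the formula from the question into a LAG query, assuming SQL Server 2012 or later. The temp table #scale_log and its three rows are made up so the example is self-contained; f_a and f_b are illustrative aliases for the previous and current sample.

```sql
-- Made-up sample data standing in for the real log table.
CREATE TABLE #scale_log (t_stamp datetime2 NOT NULL PRIMARY KEY, Scale_Weight float NOT NULL);
INSERT INTO #scale_log (t_stamp, Scale_Weight) VALUES
    ('2020-01-01 00:00:00', 10.0),
    ('2020-01-01 00:00:01', 14.0),
    ('2020-01-01 00:00:02', 11.0);

WITH Pairs AS (
    SELECT t_stamp,
           Scale_Weight AS f_b,                              -- current sample
           LAG(Scale_Weight) OVER (ORDER BY t_stamp) AS f_a  -- previous sample
    FROM #scale_log
)
SELECT TOP (1)
       ABS((f_a + f_b) / 2.0 - f_a) AS TrapezoidDelta  -- (b - a) = 1, so that factor is omitted
FROM Pairs
WHERE f_a IS NOT NULL      -- the first row has no predecessor
ORDER BY t_stamp DESC;     -- keep only the most recent pair
```

For the sample rows the most recent pair is 14.0 and 11.0, so the query returns 1.5. Dropping TOP (1) and the final ORDER BY would give the same quantity for every consecutive pair.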
This is what I did. Instead of timestamps, I used an Identity field, as those are incremented and easier to enter manually (not sure if you had datetime values or actual timestamp values)
fiddle: http://sqlfiddle.com/#!6/77bcb/4/0
schema:
create table x(
xId int identity(1,1) not null primary key,
scale_weight decimal(12,4)
);
insert into x(scale_weight)
select 24.1234 union all
select 32.4455 union all
select 88.1234 union all
select 223.443;
The inner query (below) grabs the top two rows, ordered by id descending (use your t_stamp column). The outer query sums all the Scale_Weight values returned by the inner query and divides that value by two.
sql:
select SUM(Scale_Weight)/2.0 from
(
SELECT TOP 2 Scale_Weight
FROM x
ORDER BY xid DESC
) y

How can we add a column on the fly in a dynamic table in SQL SERVER?

My question needs a little explanation, so I'd like to explain it this way:
I've got a table (let's call it RootTable) with one million records in no particular order. What I'm trying to do is get a number of rows (@ParamCount) from RootTable; at the same time these records must be sorted and have an additional column (with unique data) added on the fly, to serve as a key for row identification that will be used later in the program. It can take any number of parameters, but my basic parameters are the two mentioned below.
It's needed for a SQL Server environment.
e.g.
RootTable
ColumnA  ColumnB  ColumnC
ABC      city     cellnumber
ZZC      city1    cellnumber
BCD      city2    cellnumber
BCC      city3    cellnumber
Passing the number of rows to return (@ParamCount) and the prefix that ColumnA must start with (@ParamNameStartsWith):
@ParamCount: 2
@ParamNameStartsWith: BC
desired result:
Id(added on the fly)  ColumnA  ColumnB  ColumnC
101                   BCC      city3    cellnumber
102                   BCD      city2    cellnumber
Here's another point about the Id column: Id must maintain its order. In the result above it starts from 101 because 100 was already assigned to the first row when the table was sorted and the column added on the fly; that row starts with "ABC", so it isn't in the result set.
Any kind of help would be appreciated.
NOTE: My question title might not reflect my requirement, but I couldn't get any other title.
So first you need your on-the-fly ID. It is created by the ROW_NUMBER() function, which is available from SQL Server 2005 onwards. What ROW_NUMBER() does is pretty self-explanatory, I think. However, it numbers rows within a partition, and the partition is specified by the OVER clause. If you include PARTITION BY within the OVER clause, you will have multiple partitions, each numbered separately. In your case there is only one partition, the whole table, so PARTITION BY is not necessary. An ORDER BY is required, however, so that the system knows which record should get which row number in the partition. The query you get is:
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
Now you have a row number for your whole table. You cannot include any condition like your @ParamNameStartsWith parameter here, because you wanted the row numbers assigned across the whole table. The query above has to be a subquery that provides the set on which the condition can be applied. I use a CTE here; I think that is better for readability:
;WITH OrderedList AS (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
)
SELECT *
FROM OrderedList
WHERE ColumnA LIKE @ParamNameStartsWith+'%'
Please note that I added the wildcard % after the parameter, so that the condition is basically "starts with" @ParamNameStartsWith.
Finally, if I got you right, you wanted only @ParamCount rows. You can use your parameter directly with the TOP keyword, which is also only possible with SQL Server 2005 or later.
;WITH OrderedList AS (
SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) ID, ColumnA,ColumnB,ColumnC
FROM RootTable
)
SELECT TOP (@ParamCount) *
FROM OrderedList
WHERE ColumnA LIKE @ParamNameStartsWith+'%'
ORDER BY ID
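To try the approach end to end, here is a self-contained sketch using the sample rows from the question. The temp table #RootTable, the variable declarations, and the + 99 offset (added only so the numbering starts at 100, as in the desired result) are assumptions for the demo.

```sql
-- Sample rows copied from the question into a temp table.
CREATE TABLE #RootTable (ColumnA varchar(10), ColumnB varchar(10), ColumnC varchar(20));
INSERT INTO #RootTable (ColumnA, ColumnB, ColumnC) VALUES
    ('ABC', 'city',  'cellnumber'),
    ('ZZC', 'city1', 'cellnumber'),
    ('BCD', 'city2', 'cellnumber'),
    ('BCC', 'city3', 'cellnumber');

DECLARE @ParamCount int = 2;
DECLARE @ParamNameStartsWith varchar(10) = 'BC';

WITH OrderedList AS (
    -- Number every row of the sorted table before any filtering
    SELECT ROW_NUMBER() OVER (ORDER BY ColumnA) + 99 AS ID,
           ColumnA, ColumnB, ColumnC
    FROM #RootTable
)
SELECT TOP (@ParamCount) *
FROM OrderedList
WHERE ColumnA LIKE @ParamNameStartsWith + '%'
ORDER BY ID;   -- makes the TOP selection deterministic
```

With these rows, ABC gets 100, BCC 101, BCD 102 and ZZC 103, so the query returns BCC (101) and BCD (102), matching the desired result.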

Finding max value from TOP selection grouped by key in SQL Server

Apologies for goofy title. I am not sure how to describe the problem.
I have a table in SQL Server with this structure;
ID varchar(15)
ProdDate datetime
Value double
For each ID there can be hundreds of rows, each with its own ProdDate. ID and ProdDate form the unique key for the table.
What I need to do is find the maximum Value for each ID based upon the first 12 samples, ordered by ProdDate ascending.
Said another way. For each ID I need to find the 12 earliest dates for that ID (the sampling for each ID will start at different dates) and then find the maximum Value for those 12 samples.
Any idea of how to do this without multiple queries and temporary tables?
You can use a common table expression and ROW_NUMBER to logically define the TOP 12 per Id then MAX ... GROUP BY on that.
;WITH T
AS (SELECT *,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY ProdDate) AS RN
FROM YourTable)
SELECT Id,
MAX(Value) AS Value
FROM T
WHERE RN <= 12
GROUP BY Id

Resources