Is it possible to create a generic SP to determine Median? - sql-server

I am using SQL Server 2012. I guess what I am asking is should I continue on the path of researching the ability to create a SP (or UDF, but with #Temp tables probably involved, I was thinking SP) in order to have a reusable object to determine the median?
I hope this question isn't too generic to be answerable, but I have spent some time researching how to determine a median value. One possible hurdle is the need to pass in a string representation of the query that returns the data on which I want to compute the median.
Has anyone attempted this in the past?

Here is a stored proc I use to generate some quick stats.
Simply pass a Source, Measure and/or Filter.
CREATE PROCEDURE [dbo].[prc-Dynamic-Stats](@Table varchar(150),@Fld varchar(50), @Filter varchar(500))
-- Syntax: Exec [dbo].[prc-Dynamic-Stats] '[Chinrus-Series].[dbo].[DS_Treasury_Rates]','TR_Y10','Year(TR_Date)>2001'
-- Note: @Table, @Fld and @Filter are concatenated into dynamic SQL, so pass trusted values only
As
Begin
Set NoCount On;
Declare @SQL varchar(max) =
'
;with cteBase as (
Select RowNr=Row_Number() over (Order By ['+@Fld+'])
,Measure = ['+@Fld+']
From '+@Table+'
Where '+case when @Filter='' then '1=1' else @Filter end+'
)
Select RecordCount = Count(*)
,DistinctCount = Count(Distinct A.Measure)
,SumTotal = Sum(A.Measure)
,Minimum = Min(A.Measure)
,Maximum = Max(A.Measure)
,Mean = Avg(A.Measure)
,Median = Max(B.Measure)
,Mode = Max(C.Measure)
,StdDev = STDEV(A.Measure)
From cteBase A
Join (Select Measure From cteBase where RowNr=(Select Cnt=count(*) from cteBase)/2) B on 1=1
Join (Select Top 1 Measure,Hits=count(*) From cteBase Group By Measure Order by 2 desc ) C on 1=1
'
Exec(@SQL)
End
Returns
RecordCount DistinctCount SumTotal Minimum Maximum Mean Median Mode StdDev
3615 391 12311.81 0.00 5.44 3.4057 3.57 4.38 1.06400795277565

You may want to take a look at a response that I had to this post. In short, if you're comfortable with C# or VB .NET, you could create a user defined CLR aggregate. We use CLR implementations for quite a few things, especially statistical methods that you may see in other platforms like SAS, R, etc.

This is easily accomplished by creating a User-Defined Aggregate (UDA) via SQLCLR. If you want to see how to do it, or even just download the UDA, check out the article I wrote about it on SQL Server Central: Getting The Most Out of SQL Server 2005 UDTs and UDAs (please note that the site requires free registration in order to read their content).
Or, it is also available in the Free version of the SQL# SQLCLR library (which I created, but again, it is free) available at http://SQLsharp.com/. It is called Agg_Median.

If using SQL Server 2008 or newer (which you are), you can write a function that accepts a table-valued parameter as input.
Create Type MedianData As Table ( DataPoint Int )
Go
Create Function CalculateMedian ( @MedianData MedianData ReadOnly )
Returns Int
As
Begin
-- do something with @MedianData, which is a table, then Return the result
End
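On SQL Server 2012, which the question targets, one way to fill in that body is the built-in PERCENTILE_CONT function. A minimal sketch, assuming a hypothetical table variable @Data standing in for the table-valued parameter:

```sql
-- @Data and its sample values are made up for this illustration.
DECLARE @Data TABLE (DataPoint int NOT NULL);
INSERT INTO @Data (DataPoint) VALUES (1), (3), (4), (10);

-- PERCENTILE_CONT is a window function here, so it repeats the same value
-- on every row; DISTINCT collapses the result to a single row.
SELECT DISTINCT
    Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY DataPoint) OVER ()
FROM @Data;
-- Median = 3.5 (interpolated between the two middle values, 3 and 4)
```

PERCENTILE_CONT returns float, so a wrapper declared with Returns Int would truncate an interpolated median such as 3.5.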

Related

Automated selection of the correct database within a function

I have a function that's working as I'd like it to, but I need to add one thing and I'm not able to make it happen yet.
Here's a massively simplified skeleton of the function that already works:
create function IC
(
@A date
)
returns table
as
return
(
SELECT *
FROM db1.dbo.table a LEFT JOIN
db1.dbo.table2 b
on a.randomfield=b.randomfield
where datefield = @a
)
So now I can use this function to call a date-specific version of the query I wanted. Great! Now the issue is that I want to make this useful over time. datefield is an archive date, and this year's data is in a different database than 2015's data, which is in a different database than 2014's, etc. I want the function to look at @a's year, and then use that to determine which database to query against. I've tried a variety of things that didn't work, but it seems like Dynamic SQL is the answer that everyone arrives at on the interwebs.
I want to use an if...else statement somehow, but haven't been able to put that together. Any suggestions for how to proceed?
Use three-part naming (database.schema.object) to select from the correct database.
Example
DECLARE @year INT = DATEPART(yyyy, GETDATE())
SELECT CASE @year
WHEN 2016 THEN
(SELECT 1 FROM ARCHIVE2016.dbo.SomeTable)
WHEN 2017 THEN
(SELECT 1 FROM ARCHIVE2017.dbo.SomeTable)
WHEN 2018 THEN
(SELECT 1 FROM ARCHIVE2018.dbo.SomeTable)
END
Just make sure every referenced database and table exists, even if the tables are empty, when placing this inside a function, to avoid validation errors.
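Since dynamic SQL is not allowed inside a function, another way to apply the same idea to a table-valued function is to UNION ALL one branch per archive database and let a predicate on the year switch the branches on and off. This is only a sketch: the function name dbo.IC_ByYear is hypothetical, the ARCHIVE databases and SomeTable come from the example above, and it assumes every archive table has the same columns, including datefield.

```sql
-- Hypothetical inline function: only the branch whose year matches @a returns rows.
CREATE FUNCTION dbo.IC_ByYear (@a date)
RETURNS TABLE
AS
RETURN
(
    SELECT t.*
    FROM ARCHIVE2016.dbo.SomeTable AS t
    WHERE YEAR(@a) = 2016 AND t.datefield = @a

    UNION ALL

    SELECT t.*
    FROM ARCHIVE2017.dbo.SomeTable AS t
    WHERE YEAR(@a) = 2017 AND t.datefield = @a

    UNION ALL

    SELECT t.*
    FROM ARCHIVE2018.dbo.SomeTable AS t
    WHERE YEAR(@a) = 2018 AND t.datefield = @a
);
```

A new branch has to be added each time a new archive database is created.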

Postgres OpenXML

To start this off: Yes, I've read this post regarding OpenXML and postgres.
Is this still the case 6 years later?
If so, my company is converting from MS SQL Server to Postgres, with A LOT of stored procedures, many of which rely heavily on OpenXML. It seems pretty cumbersome to have to write out an xpath() for every possible thing we would like to retrieve from our XML. Are there any recommendations you have for going about this conversion? A better alternative to xpath() that I haven't seen yet?
Thanks!
After much research:
1) Yes, it appears so.
2) Yeah, it is kind of a pain, made an internal program that grabs potential XML paths and creates a script for us requiring only a few tweaks by hand.
Thanks
An analog of OPENXML in PostgreSQL (this relies on xmltable returning the nodes in the order they appear in the input XML):
CREATE OR REPLACE FUNCTION openxml (
p_xml xml
)
RETURNS TABLE (
id integer,
parent_id integer,
element_name text,
element_data text
) AS
$body$
DECLARE
BEGIN
return query
with t as (SELECT *
from xmltable('//*'
passing p_xml
columns
id for ORDINALITY,
element_name text path 'local-name()',
parent_name text path 'local-name(..)',
element_data text path 'text()[1]'
))
select t.id,
(select max(t1.id) from t t1 where t1.id<t.id and t.parent_name=t1.element_name)as parent_id,
t.element_name,
t.element_data
from t;
END;
$body$
LANGUAGE 'plpgsql';
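A possible way to call the function, assuming PostgreSQL 10 or later (where xmltable is available). The sample document is made up for illustration:

```sql
-- Each element becomes one row; parent_id points at the closest earlier
-- element whose name equals this element's parent name.
SELECT id, parent_id, element_name, element_data
FROM openxml('<order><customer>Acme</customer><item><sku>A1</sku></item></order>'::xml);
```

Because parent_id is resolved by element name rather than by position in the tree, documents that reuse the same element name at several levels may need a closer look.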

Use result of selection statement into another selection in stored procedure

How can I use the result of a select statement in another statement in a stored procedure in SQL Server? I am trying to write this code:
CREATE procedure [dbo].[ShowRequest] (@Id int)
AS
SELECT
Request.RequestId,
Request.UserId
INTO
reqTable
FROM
[User],[Request]
WHERE
Request.UserId = [User].UserId
AND
Request.RequestId = Id
/*another selection*/
SELECT
RequestProduct.ProductName
FROM
RequestProduct
/*ERROR:The multi-part identifier "ReqTable.id" could not be bound*/
WHERE
RequestProduct.RequestId = ReqTable.[id]
Try this:
CREATE procedure [dbo].[ShowRequest] (@Id int)
AS
SELECT
Request.RequestId,
Request.UserId
INTO
reqTable
FROM
[User],[Request]
WHERE
Request.UserId = [User].UserId
AND
Request.RequestId = @Id
/*another selection*/
SELECT
ProductName
FROM
RequestProduct
WHERE
RequestId in
(
select RequestId
from reqTable
)
It sounds like you actually just need to do a single selection:
SELECT
r.RequestId,
r.UserId,
rp.ProductName
FROM
[User] u
inner join
[Request] r
on
r.UserId = u.UserId
inner join
RequestProduct rp
on
rp.RequestId = r.RequestId
WHERE
r.RequestId = @Id
Generally, with SQL, you should decide what overall result you're trying to achieve, and then try to express it as a single statement - SQL Server's job (or any SQL product's) is to work out how best to construct the results. You shouldn't have to break down the problem into separate parts.
I don't know the etiquette here - this is a comment not a proposed answer but I want to make it easier for people to comment on my comment... downvote as appropriate... I don't want to just start a new question since I want the OP to see this.
You've got some alternatives but no one talked about why the first query didn't work or answered how to use the results of one query into another.
Why it didn't work: the second SELECT refers to ReqTable.[id], but ReqTable is not listed in its FROM clause, and the column created by SELECT ... INTO is named RequestId, not id, so SQL Server cannot bind the name. If a query has to refer to objects that only exist at run time, dynamic SQL (EXEC or sp_executesql, the T-SQL counterpart of "eval") lets you supply SQL that will be valid when it runs. For this problem that would be unjustifiably complex, but cases like this must come up; I don't know if it's a regular tool in the toolbox of people who write a lot of stored procedures.
SQL Server 2005 and 2008 both have a feature called "CTE" you should read about that addresses the general question of "How do I use the results of one query in another query". They don't do anything subqueries absolutely can't do, but most of the time they are easier to write.
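As a rough illustration of the CTE approach applied to this question (the procedure name ShowRequestProducts is made up; the tables and columns are the ones from the question):

```sql
-- Sketch only: the CTE "req" plays the role of the intermediate result.
CREATE PROCEDURE dbo.ShowRequestProducts (@Id int)
AS
BEGIN
    SET NOCOUNT ON;

    WITH req AS
    (
        SELECT r.RequestId, r.UserId
        FROM [Request] AS r
        INNER JOIN [User] AS u ON u.UserId = r.UserId
        WHERE r.RequestId = @Id
    )
    SELECT rp.ProductName
    FROM RequestProduct AS rp
    INNER JOIN req ON req.RequestId = rp.RequestId;
END
```

The statement before WITH must end with a semicolon, which is why SET NOCOUNT ON is terminated explicitly.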
I have solved it: declare a table variable that holds the result of the first selection and use it in the second selection.
CREATE procedure [dbo].[ShowRequest] (@Id int)
AS
DECLARE @reqTable TABLE
(
ReqID int,
UserID int
)
INSERT INTO
@reqTable
SELECT
Request.RequestId AS 'ReqID',
Request.UserId
FROM
[User],[Request]
WHERE
Request.UserId = [User].UserId
AND
Request.RequestId = @Id
/*another selection*/
SELECT
RequestProduct.ProductName
FROM
RequestProduct,@reqTable
WHERE
RequestProduct.RequestId = ReqID
Thanks to everyone who cared about my question and spent time solving my problem.

What is a good way to paginate out of SQL 2000 using Start and Length parameters?

I have been given the task of refactoring an existing stored procedure so that the results are paginated. The SQL server is SQL 2000 so I can't use the ROW_NUMBER method of pagination. The Stored proc is already fairly complex, building chunks of a large sql statement together before doing an sp_executesql and has various sorting options available.
The first result out of google seems like a good method but I think the example is wrong in that the 2nd sort needs to be reversed and the case when the start is less than the pagelength breaks down. The 2nd example on that page also seems like a good method but the SP is taking a pageNumber rather than the start record. And the whole temp table thing seems like it would be a performance drain.
I am making progress going down this path but it seems slow and confusing and I am having to do quite a bit of REPLACE methods on the Sort order to get it to come out right.
Are there any other easier techniques I am missing?
There are two SQL Server 2000 compliant answers in this StackOverflow question - skip the accepted one, which is 2005-only:
No, I'm afraid not - SQL Server 2000 doesn't have any of the 2005 niceties like Common Table Expression (CTE) and such..... the method described in the Google link seems to be one way to go.
Marc
Also take a look here
http://databases.aspfaq.com/database/how-do-i-page-through-a-recordset.html
scroll down to Stored Procedure Methods
Depending on your application architecture (and your amount of data, its structure, DB server load etc.) you could use the DB access layer for paging.
For example, with ADO you can define a page size on the record set (DataSet in ADO.NET) object and do the paging on the client. Classic ADO even lets you use a server side cursor, though I don't know if that scales well (I think this was removed altogether in ADO.NET).
MSDN documentation: Paging Through a Query Result (ADO.NET)
After playing with this for a while there seems to be only one way of really doing this (using Start and Length parameters) and that's with the temp table.
My final solution was to drop the @start parameter in favor of a @page parameter and then use the following:
SET @sql = @sql + N'
SELECT * FROM
(
SELECT TOP ' + Cast( @length as varchar) + N' * FROM
(
SELECT TOP ' + Cast( @page*@length as varchar) + N'
field1,
field2
From Table1
order by field1 ASC
) as Result
Order by Field1 DESC
) as Result
Order by Field1 ASC'
The original query was much more complex than what is shown here and the order by was ordered on at least 3 fields and determined by a long CASE clause, requiring me to use a series of REPLACE functions to get the fields in the right order.
We've been using variations on this query for a number of years. This example gives items 50,001 to 50,300.
select top 300
Items.*
from Items
where
Items.CustomerId = 1234 AND
Items.Active = 1 AND
Items.Id not in
(
select top 50000 Items.Id
from Items
where
Items.CustomerId = 1234 AND
Items.Active = 1
order by Items.id
)
order by Items.Id
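SQL Server 2000 does not accept a variable after TOP, so turning the query above into a Start/Length version would need dynamic SQL. A sketch under that assumption, with hypothetical @start and @length variables and the customer and active filters left out for brevity:

```sql
-- SQL Server 2000 style: no DECLARE initializers, no nvarchar(max).
DECLARE @start int, @length int, @sql nvarchar(4000);
SET @start = 50000;   -- number of rows to skip
SET @length = 300;    -- number of rows to return

SET @sql = N'SELECT TOP ' + CAST(@length AS nvarchar(10)) + N' Items.*
FROM Items
WHERE Items.Id NOT IN
(
    SELECT TOP ' + CAST(@start AS nvarchar(10)) + N' Items.Id
    FROM Items
    ORDER BY Items.Id
)
ORDER BY Items.Id';

EXEC sp_executesql @sql;
```

Both ORDER BY clauses have to use the same sort, otherwise the skipped rows and the returned rows no longer line up.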

Hidden Features of SQL Server

What are some hidden features of SQL Server?
For example, undocumented system stored procedures, tricks to do things which are very useful but not documented enough?
Answers
Thanks to everybody for all the great answers!
Stored Procedures
sp_msforeachtable: Runs a command with '?' replaced with each table name (v6.5 and up)
sp_msforeachdb: Runs a command with '?' replaced with each database name (v7 and up)
sp_who2: just like sp_who, but with a lot more info for troubleshooting blocks (v7 and up)
sp_helptext: If you want the code of a stored procedure, view & UDF
sp_tables: return a list of all tables and views of database in scope.
sp_stored_procedures: return a list of all stored procedures
xp_sscanf: Reads data from the string into the argument locations specified by each format argument.
xp_fixeddrives: Find the fixed drive with largest free space
sp_help: If you want to know the table structure, indexes and constraints of a table. Also views and UDFs. Shortcut is Alt+F1
Snippets
Returning rows in random order
All database User Objects by Last Modified Date
Return Date Only
Find records whose date falls somewhere inside the current week.
Find records whose date occurred last week.
Returns the date for the beginning of the current week.
Returns the date for the beginning of last week.
See the text of a procedure that has been deployed to a server
Drop all connections to the database
Table Checksum
Row Checksum
Drop all the procedures in a database
Re-map the login Ids correctly after restore
Call Stored Procedures from an INSERT statement
Find Procedures By Keyword
Query the transaction log for a database programmatically.
Functions
HashBytes()
EncryptByKey
PIVOT command
Misc
Connection String extras
TableDiff.exe
Triggers for Logon Events (New in Service Pack 2)
Boosting performance with persisted-computed-columns (pcc).
DEFAULT_SCHEMA setting in sys.database_principals
Forced Parameterization
Vardecimal Storage Format
Figuring out the most popular queries in seconds
Scalable Shared Databases
Table/Stored Procedure Filter feature in SQL Management Studio
Trace flags
Number after a GO repeats the batch
Security using schemas
Encryption using built in encryption functions, views and base tables with triggers
In Management Studio, you can put a number after a GO end-of-batch marker to cause the batch to be repeated that number of times:
PRINT 'X'
GO 10
Will print 'X' 10 times. This can save you from tedious copy/pasting when doing repetitive stuff.
A lot of SQL Server developers still don't seem to know about the OUTPUT clause (SQL Server 2005 and newer) on the DELETE, INSERT and UPDATE statement.
It can be extremely useful to know which rows have been INSERTed, UPDATEd, or DELETEd, and the OUTPUT clause makes this very easy: it gives access to the "virtual" tables called inserted and deleted (like in triggers):
DELETE FROM (table)
OUTPUT deleted.ID, deleted.Description
WHERE (condition)
If you're inserting values into a table which has an INT IDENTITY primary key field, with the OUTPUT clause, you can get the inserted new ID right away:
INSERT INTO MyTable(Field1, Field2)
OUTPUT inserted.ID
VALUES (Value1, Value2)
And if you're updating, it can be extremely useful to know what changed - in this case, inserted represents the new values (after the UPDATE), while deleted refers to the old values before the UPDATE:
UPDATE (table)
SET field1 = value1, field2 = value2
OUTPUT inserted.ID, deleted.field1, inserted.field1
WHERE (condition)
If a lot of info will be returned, the output of OUTPUT can also be redirected to a temporary table or a table variable (OUTPUT INTO #myInfoTable).
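A small sketch of the OUTPUT ... INTO form, using a made-up temp table and table variable (the multi-row VALUES list assumes SQL Server 2008 or newer):

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE #MyTable (ID int IDENTITY(1,1) PRIMARY KEY, Field1 varchar(20));

-- Table variable that captures the rows produced by the INSERT.
DECLARE @NewRows TABLE (ID int, Field1 varchar(20));

INSERT INTO #MyTable (Field1)
OUTPUT inserted.ID, inserted.Field1 INTO @NewRows (ID, Field1)
VALUES ('first'), ('second');

-- The generated identity values are now available for further processing.
SELECT ID, Field1 FROM @NewRows;

DROP TABLE #MyTable;
```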
Extremely useful - and very little known!
Marc
sp_msforeachtable: Runs a command with '?' replaced with each table name.
e.g.
exec sp_msforeachtable "dbcc dbreindex('?')"
You can issue up to 3 commands for each table
exec sp_msforeachtable
@Command1 = 'print ''reindexing table ?''',
@Command2 = 'dbcc dbreindex(''?'')',
@Command3 = 'select count (*) [?] from ?'
Also, sp_MSforeachdb
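For example, something along these lines should count the user tables in each database. sp_MSforeachdb is undocumented, so treat this as a sketch; sys.tables assumes SQL Server 2005 or newer:

```sql
-- '?' is replaced with each database name in turn.
EXEC sp_MSforeachdb
    'USE [?]; SELECT DB_NAME() AS DatabaseName, COUNT(*) AS TableCount FROM sys.tables';
```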
Connection String extras:
MultipleActiveResultSets=true;
This makes ADO.Net 2.0 and above read multiple, forward-only, read-only results sets on a single database connection, which can improve performance if you're doing a lot of reading. You can turn it on even if you're doing a mix of query types.
Application Name=MyProgramName
Now when you want to see a list of active connections by querying the sysprocesses table, your program's name will appear in the program_name column instead of ".Net SqlClient Data Provider"
TableDiff.exe
Table Difference tool allows you to discover and reconcile differences between a source and destination table or a view. Tablediff Utility can report differences on schema and data. The most popular feature of tablediff is the fact that it can generate a script that you can run on the destination that will reconcile differences between the tables.
A less known TSQL technique for returning rows in random order:
-- Return rows in a random order
SELECT
SomeColumn
FROM
SomeTable
ORDER BY
CHECKSUM(NEWID())
In Management Studio, you can quickly get a comma-delimited list of columns for a table:
In the Object Explorer, expand the nodes under a given table (so you will see folders for Columns, Keys, Constraints, Triggers etc.)
Point to the Columns folder and drag into a query.
This is handy when you don't want to use the heinous format returned by right-clicking on the table and choosing Script Table As..., then Insert To... This trick also works with the other folders: it gives you a comma-delimited list of the names contained within the folder.
Row Constructors
You can insert multiple rows of data with a single insert statement.
INSERT INTO Colors (id, Color)
VALUES (1, 'Red'),
(2, 'Blue'),
(3, 'Green'),
(4, 'Yellow')
If you want to know the table structure, indexes and constraints:
sp_help 'TableName'
HashBytes() to return the MD2, MD4, MD5, SHA, or SHA1 hash of its input.
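A quick sketch of a call, with a made-up input string. The hex conversion with style 2 assumes SQL Server 2008 or newer:

```sql
SELECT
    HASHBYTES('SHA1', 'some text to hash') AS Sha1Bytes,                       -- varbinary result
    CONVERT(varchar(40), HASHBYTES('SHA1', 'some text to hash'), 2) AS Sha1Hex; -- same hash as a hex string
```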
Figuring out the most popular queries
With sys.dm_exec_query_stats, you can figure out many combinations of query analyses by a single query.
For example, with the command:
select * from sys.dm_exec_query_stats
order by execution_count desc
The spatial results tab can be used to create art.
http://michaeljswart.com/wp-content/uploads/2010/02/venus.png
EXCEPT and INTERSECT
Instead of writing elaborate joins and subqueries, these two keywords are a much more elegant shorthand and readable way of expressing your query's intent when comparing two query results. New as of SQL Server 2005, they strongly complement UNION which has already existed in the TSQL language for years.
The concepts of EXCEPT, INTERSECT, and UNION are fundamental in set theory which serves as the basis and foundation of relational modeling used by all modern RDBMS. Now, Venn diagram type results can be more intuitively and quite easily generated using TSQL.
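A tiny illustration with two made-up table variables (the multi-row VALUES lists assume SQL Server 2008 or newer):

```sql
DECLARE @A TABLE (n int);
DECLARE @B TABLE (n int);
INSERT INTO @A (n) VALUES (1), (2), (3);
INSERT INTO @B (n) VALUES (2), (3), (4);

-- Rows that are in @A but not in @B: 1
SELECT n FROM @A
EXCEPT
SELECT n FROM @B;

-- Rows that are in both @A and @B: 2 and 3
SELECT n FROM @A
INTERSECT
SELECT n FROM @B;
```

Both operators compare whole rows and return distinct results, so the two queries must select the same number of compatible columns.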
I know it's not exactly hidden, but not too many people know about the PIVOT command. I was able to change a stored procedure that used cursors and took 2 minutes to run into a speedy 6 second piece of code that was one tenth the number of lines!
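For anyone who hasn't seen it, a minimal PIVOT sketch. The @Sales table and its values are invented for illustration, and the multi-row VALUES list assumes SQL Server 2008 or newer:

```sql
DECLARE @Sales TABLE (Region varchar(10), Yr int, Amount money);
INSERT INTO @Sales (Region, Yr, Amount)
VALUES ('East', 2008, 100), ('East', 2009, 150),
       ('West', 2008, 200), ('West', 2009, 250);

-- One row per Region, one column per year, SUM(Amount) in the cells.
SELECT Region, [2008], [2009]
FROM (SELECT Region, Yr, Amount FROM @Sales) AS src
PIVOT (SUM(Amount) FOR Yr IN ([2008], [2009])) AS p;
```

The list of pivoted values has to be spelled out in the IN clause; an open-ended list needs dynamic SQL.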
Useful when restoring a database for testing purposes or whatever. Re-maps the login IDs correctly:
EXEC sp_change_users_login 'Auto_Fix', 'Mary', NULL, 'B3r12-36'
Drop all connections to the database:
Use Master
Go
Declare @dbname sysname
Set @dbname = 'name of database you want to drop connections from'
Declare @spid int
Declare @cmd varchar(20)
Select @spid = min(spid) from master.dbo.sysprocesses
where dbid = db_id(@dbname)
While @spid Is Not Null
Begin
-- Kill needs a literal session id, so build the command as a string first
Set @cmd = 'Kill ' + Cast(@spid As varchar(10))
Execute (@cmd)
Select @spid = min(spid) from master.dbo.sysprocesses
where dbid = db_id(@dbname) and spid > @spid
End
Table Checksum
Select CheckSum_Agg(Binary_CheckSum(*)) From Table With (NOLOCK)
Row Checksum
Select CheckSum_Agg(Binary_CheckSum(*)) From Table With (NOLOCK) Where Column = Value
I'm not sure if this is a hidden feature or not, but I stumbled upon this and have found it useful on many occasions. You can concatenate the values of a column in a single SELECT statement, rather than using a cursor and looping through the rows.
Example:
DECLARE @nvcConcatonated nvarchar(max)
SET @nvcConcatonated = ''
SELECT @nvcConcatonated = @nvcConcatonated + C.CompanyName + ', '
FROM tblCompany C
WHERE C.CompanyID IN (1,2,3)
SELECT @nvcConcatonated
Results:
Acme, Microsoft, Apple,
If you want the code of a stored procedure you can:
sp_helptext 'ProcedureName'
(not sure if it is hidden feature, but I use it all the time)
A stored procedure trick is that you can call them from an INSERT statement. I found this very useful when I was working on an SQL Server database.
CREATE TABLE #toto (v1 int, v2 int, v3 char(4), status char(6))
INSERT #toto (v1, v2, v3, status) EXEC dbo.sp_fulubulu sp_param1 -- arguments follow the procedure name without parentheses
SELECT * FROM #toto
DROP TABLE #toto
In SQL Server 2005/2008 to show row numbers in a SELECT query result:
SELECT ( ROW_NUMBER() OVER (ORDER BY OrderId) ) AS RowNumber,
GrandTotal, CustomerId, PurchaseDate
FROM Orders
ORDER BY is a compulsory clause. The OVER() clause tells the SQL Engine to sort data on the specified column (in this case OrderId) and assign numbers as per the sort results.
Useful for parsing stored procedure arguments: xp_sscanf
Reads data from the string into the argument locations specified by each format argument.
The following example uses xp_sscanf to extract two values from a source string based on their positions in the format of the source string.
DECLARE @filename varchar (20), @message varchar (20)
EXEC xp_sscanf 'sync -b -fproducts10.tmp -rrandom', 'sync -b -f%s -r%s',
@filename OUTPUT, @message OUTPUT
SELECT @filename, @message
Here is the result set.
-------------------- --------------------
products10.tmp random
Return Date Only
Select Cast(Floor(Cast(Getdate() As Float))As Datetime)
or
Select DateAdd(Day, 0, DateDiff(Day, 0, Getdate()))
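On SQL Server 2008 and newer, where the date type exists, a plain cast is another option. Note that it returns a date rather than a datetime:

```sql
SELECT CAST(GETDATE() AS date) AS DateOnly;
```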
dm_db_index_usage_stats
This allows you to know if data in a table has been updated recently even if you don't have a DateUpdated column on the table.
SELECT OBJECT_NAME(OBJECT_ID) AS TableName, last_user_update,*
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID( 'MyDatabase')
AND OBJECT_ID=OBJECT_ID('MyTable')
Code from: http://blog.sqlauthority.com/2009/05/09/sql-server-find-last-date-time-updated-for-any-table/
Information referenced from:
SQL Server - What is the date/time of the last inserted row of a table?
Available in SQL 2005 and later
Here are some features I find useful but a lot of people don't seem to know about:
sp_tables
Returns a list of objects that can be queried in the current environment. This means any object that can appear in a FROM clause, except synonym objects.
sp_stored_procedures
Returns a list of stored procedures in the current environment.
Find records whose date falls somewhere inside the current week.
where dateadd( week, datediff( week, 0, TransDate ), 0 ) =
dateadd( week, datediff( week, 0, getdate() ), 0 )
Find records whose date occurred last week.
where dateadd( week, datediff( week, 0, TransDate ), 0 ) =
dateadd( week, datediff( week, 0, getdate() ) - 1, 0 )
Returns the date for the beginning of the current week.
select dateadd( week, datediff( week, 0, getdate() ), 0 )
Returns the date for the beginning of last week.
select dateadd( week, datediff( week, 0, getdate() ) - 1, 0 )
Not so much a hidden feature but setting up key mappings in Management Studio under Tools\Options\Keyboard:
Alt+F1 defaults to sp_help "selected text", but I cannot live without adding Ctrl+F1 for sp_helptext "selected text".
Persisted-computed-columns
Computed columns can help you shift the runtime computation cost to data modification phase. The computed column is stored with the rest of the row and is transparently utilized when the expression on the computed columns and the query matches. You can also build indexes on the PCC’s to speed up filtrations and range scans on the expression.
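A sketch of what that can look like. The table, column, and index names are made up for illustration:

```sql
-- LineTotal is computed when a row is written and stored with the row.
CREATE TABLE dbo.OrderLines_Demo
(
    OrderLineId int IDENTITY(1,1) PRIMARY KEY,
    Quantity    int   NOT NULL,
    UnitPrice   money NOT NULL,
    LineTotal   AS (Quantity * UnitPrice) PERSISTED
);

-- An index on the persisted column can support filters and range scans.
CREATE INDEX IX_OrderLines_Demo_LineTotal
    ON dbo.OrderLines_Demo (LineTotal);

-- A query written against the expression can be matched to the column.
SELECT OrderLineId
FROM dbo.OrderLines_Demo
WHERE Quantity * UnitPrice > 1000;
```

Whether the optimizer actually uses the index for the last query depends on the plan it chooses, so it is worth checking the execution plan.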
There are times when there's no suitable column to sort by, or you just want the default sort order on a table and you want to enumerate each row. In order to do that you can put "(select 1)" in the "order by" clause and you'd get what you want. Neat, eh?
select row_number() over (order by (select 1)), * from dbo.Table as t
Simple encryption with EncryptByKey
