Performance and usage of CLR functions in SQL - sql-server

SQL Server allows you to create CLR functions, stored procedures, user-defined types and other objects, for purposes that are too complex to implement comfortably in plain SQL.
Can someone compare the two approaches, T-SQL objects and CLR objects, in terms of performance, benefits, and so on?
What are real-world situations for using CLR objects?
Are there any best-practice recommendations for their use?

What are real-world situations for using CLR objects?
SQL Server lacks a built-in aggregate string-concatenation function (STRING_AGG only arrived much later, in SQL Server 2017). This bizarre oversight leads to all manner of complicated workarounds.
Creating and using a custom CLR aggregate function is a clean solution to this problem, and is in fact the reference example given in the MSDN article on the subject of custom aggregate functions:
http://msdn.microsoft.com/en-us/library/ms131056.aspx
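For illustration, here is a minimal sketch in the spirit of that MSDN sample; the struct name, delimiter, and MaxByteSize are my choices, not part of the linked article:

```csharp
using System;
using System.Data.SqlTypes;
using System.IO;
using System.Text;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(
    Format.UserDefined,        // custom serialization via IBinarySerialize
    MaxByteSize = 8000,
    IsInvariantToNulls = true,
    IsInvariantToDuplicates = false,
    IsInvariantToOrder = false)]
public struct Concatenate : IBinarySerialize
{
    private StringBuilder intermediate;

    public void Init()
    {
        intermediate = new StringBuilder();
    }

    public void Accumulate(SqlString value)
    {
        if (!value.IsNull)
        {
            intermediate.Append(value.Value).Append(',');
        }
    }

    // Called when SQL Server runs the aggregate in parallel.
    public void Merge(Concatenate other)
    {
        intermediate.Append(other.intermediate.ToString());
    }

    public SqlString Terminate()
    {
        string result = intermediate.ToString();
        // Trim the trailing delimiter, if any.
        if (result.Length > 0)
        {
            result = result.Substring(0, result.Length - 1);
        }
        return new SqlString(result);
    }

    public void Read(BinaryReader reader)
    {
        intermediate = new StringBuilder(reader.ReadString());
    }

    public void Write(BinaryWriter writer)
    {
        writer.Write(intermediate.ToString());
    }
}
```

Once deployed with CREATE ASSEMBLY and CREATE AGGREGATE, it can be called like any built-in aggregate, e.g. SELECT dbo.Concatenate(name) FROM sys.objects.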
Performance
There's an MSDN article that gives at least a theoretical (no metrics) overview:
http://msdn.microsoft.com/en-us/library/ms131075.aspx
And here's a more practical post (with metrics) from AboutSqlServer.com:
http://aboutsqlserver.com/2013/07/22/clr-vs-t-sql-performance-considerations/

There are two questions here that need to be addressed separately.
In terms of functionality & benefits, I wrote an article (part of a series on the topic of SQLCLR) that looks at what uses of SQLCLR are "appropriate", mainly by looking at what it can do that cannot be done otherwise, or not done nearly as easily. That article is "Stairway to SQLCLR Level 1: What is SQLCLR?" (free registration required) and it is summarized in an answer to the question that was linked in a comment on the question, Advantage of SQL SERVER CLR.
In terms of performance, I published a study a few years ago (July 2011) that detailed various scenarios and tested the raw SQL, that SQL in a T-SQL function, and the same algorithm in a CLR-based function. I tested both scalar functions and table-valued functions. That article is "CLR Performance Testing" (no registration required). Please keep in mind that the testing was done on SQL Server 2008, and performance improvements were made in SQL Server 2012 regarding deterministic scalar CLR-based functions. Meaning, the performance results of the CLR functions would be better if those tests were re-run on SQL Server 2012 or newer. But the conclusion, even on SQL Server 2008 without those improvements, is that it depends on many factors: sometimes CLR is faster and sometimes T-SQL is faster. Most often, though, a formula that can be expressed in simple-to-moderate T-SQL as part of the query itself, and not abstracted into a function of either type, is by far the fastest.
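To make the "deterministic scalar CLR-based functions" point concrete, here is a minimal sketch (my own example, not taken from the article) of how such a function is declared; the IsDeterministic/IsPrecise flags and DataAccessKind.None are what allow SQL Server to treat the call more cheaply:

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class ScalarFunctions
{
    // Deterministic: the same inputs always yield the same output, and the
    // function performs no data access, so SQL Server can optimize calls
    // (the invocation path for this case improved in SQL Server 2012+).
    [SqlFunction(IsDeterministic = true,
                 IsPrecise = false,               // uses floating point
                 DataAccess = DataAccessKind.None)]
    public static SqlDouble CompoundValue(SqlDouble principal,
                                          SqlDouble ratePerPeriod,
                                          SqlInt32 periods)
    {
        if (principal.IsNull || ratePerPeriod.IsNull || periods.IsNull)
        {
            return SqlDouble.Null;
        }
        return principal.Value * Math.Pow(1 + ratePerPeriod.Value, periods.Value);
    }
}
```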

Related

Question about data access with LINQ and Stored Procs (NOT asking which is better!)

First, I am NOT trying to spur yet another debate about LINQ vs. stored procedures.
Assume for this question (for right or wrong) that I'm going to use SQL server stored procedures and will access those stored procedures via LINQ. I am using stored procedures (again, for right or wrong) because I want to enforce security at the stored procedure level vs. on the underlying tables and views. I am using LINQ (once again, for right or wrong) because I want to learn it.
Given the above, my LINQ queries should be relatively simple SELECT statements (obviously just referring to reading data in this question) rather than LINQ queries that contain groupings or calculations or generally other more complex things. This assumption is based on my plan to put this logic in T-SQL. In other words, my LINQ queries will be relatively "dumb". In addition, given my desire to enforce security at the stored procedure level and not allow access to the base tables, I see this approach as being consistent with that goal.
Are there any flaws in my logic in #1?
If I were to use LINQ directly against the base tables, I'd obviously have to enforce security directly on those base tables. This seems obvious but I wanted to confirm.
Are there any flaws in my logic in #2?
LINQ on its own is quite generic: there is LINQ-to-Objects, LINQ-to-XML, LINQ-to-SQL, LINQ-to-Entities, etc. I guess you're likely going after the LINQ-to-SQL functionality (as opposed to, say, LINQ-to-Entities). If all your sets are provided by stored procedures and you interrogate those sets in the application with LINQ, I would say you will teach yourself a very skewed way of leveraging the power of LINQ in general, and LINQ-to-SQL in particular. You will miss much of the know-how that goes into understanding how LINQ-to-SQL generates the SQL queries sent to the server, because you'll only be able to do the very basic 'dumb' queries, as you say. E.g., you won't even be able to do a join in LINQ-to-SQL. And you'll miss the opportunity to properly understand the ORM capabilities of LINQ-to-SQL, the caching that goes into DataContexts, and the ActiveRecord behavior that allows you to insert/update/delete items from sets.
Although I do not advocate against doing what you're doing (combining stored procedures and LINQ is a very valid approach), I would say it is a bad approach for learning LINQ. Try getting your feet wet with the straight approach: the kind of data modeling in Visual Studio with a .dbml file advocated by LINQ evangelists in 2008. While I reckon that approach is flawed for deploying large, viable projects, it is nonetheless very good for teaching yourself LINQ, and LINQ-to-SQL in particular. Once you understand how things work, you'll be able to see how to properly combine this with stored procedures for separation of access control (an approach always praised by SQL Server evangelists) and how to deal with the issues the .dbml modeling approach does not solve (specifically, the problem of database schema upgrades). A sketch contrasting the two usage styles follows below.
Some may say you should also keep an eye on what Entity Framework has to offer, but if you are in the learning stage I wholeheartedly recommend LINQ-to-SQL instead. It is less complex, it works, it is sane, it is well supported in the VS tool set, and you don't have to learn Entity SQL...
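To illustrate the distinction, here is a hedged sketch; NorthwindDataContext, GetCustomersByRegion, and the Customer properties are hypothetical names standing in for whatever a .dbml designer would generate:

```csharp
using System;
using System.Linq;

class LinqStyles
{
    static void Main()
    {
        // NorthwindDataContext is assumed to be generated from a .dbml file.
        using (var db = new NorthwindDataContext())
        {
            // Style 1: a real LINQ-to-SQL query. The Where/Select is
            // translated into T-SQL and executed on the server.
            var fromTables = db.Customers
                               .Where(c => c.Region == "WA")
                               .Select(c => c.CompanyName)
                               .ToList();

            // Style 2: the stored-procedure-only approach discussed above.
            // The procedure returns a fixed result set; any further LINQ
            // runs as LINQ-to-Objects in application memory, so you never
            // see how LINQ-to-SQL generates SQL.
            var fromProc = db.GetCustomersByRegion("WA")
                             .Where(c => c.CompanyName.StartsWith("A"))
                             .ToList();
        }
    }
}
```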

Thoughts On Extended Stored Procedures

I am looking to insert and update records in a database using functions and logic that are not available in SQL Server, or any other RDBMS for that matter. After Googling around a bit this morning, I have come across the concept of Extended Stored Procedures. As far as I can tell, I should be able to compile my desired functionality into a DLL and make a stored proc utilizing that DLL to do the inserting/updating.
However, most of the articles and examples I have come across are somewhat dated (~2000). Are extended stored procedures still an acceptable practice? I am far from an expert in this area, so any other suggestions or comments would be greatly appreciated.
If you're using SQL Server 2005 or later, SQL CLR is the area to look at. You can call .NET code from within SQL Server.
This article on MSDN is a good place to start.
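As a hedged sketch of what "calling .NET code from within SQL Server" looks like (the class and method names are mine):

```csharp
using Microsoft.SqlServer.Server;

public static class ClrProcedures
{
    // A minimal CLR stored procedure. SqlContext.Pipe sends text or result
    // sets back to the calling session, much as PRINT / SELECT would.
    [SqlProcedure]
    public static void HelloFromClr()
    {
        SqlContext.Pipe.Send("Hello from .NET code running inside SQL Server");
    }
}
```

The compiled assembly is then registered with CREATE ASSEMBLY, and the method is exposed with CREATE PROCEDURE ... EXTERNAL NAME.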
Are extended stored procedures still an acceptable practice?
No, they are officially deprecated and will be discontinued in a future release. See Deprecated Database Engine Features in SQL Server 2008, in the Features Not Supported in a Future Version of SQL Server table:
Extended stored procedure programming: Use CLR Integration instead.
I usually recommend against using CLR procedures; in most cases you can refactor the problem you are facing into something that Transact-SQL can handle.
Of most concern is the procedural approach that often accompanies the use of CLR procedures, when a relational database performs best with set-based operations.
So the first question I always ask is: is there any way to refactor the problem into a set-based operation?
If not, then I ask why you would want to execute the code inside the database server instead of in an application layer. Think about the performance impact of placing the logic inside the database. (This might not be an issue if your db server has plenty of spare processing time.)
If you do go ahead with CLR procedures, I think they are best applied to intensive calculations and complex logic.

Why Did Microsoft Create its Own SQL Extension (T-SQL)?

What are the reasons behind Microsoft implementing its own SQL extension as Transact SQL (T-SQL)? What are its advantages over the normal SQL?
Everybody extends SQL.
SQL isn't procedural, it's declarative. You describe what you want, and it figures out how to retrieve it using whatever indexes or hashes or whatnot is available.
But sometimes that's not sufficient. T-SQL adds syntax for running procedural code inside your queries. This lets you use control structures (begin-end, if-then-else), iterate, and move values between local variables, temporary tables and other sources.
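For example, here is a sketch of such a procedural batch (local variables, IF, WHILE) run from ADO.NET; the connection string is a placeholder:

```csharp
using System;
using System.Data.SqlClient;

class TsqlControlFlowDemo
{
    static void Main()
    {
        // Placeholder connection string; adjust for your environment.
        const string connStr = "Server=.;Database=tempdb;Integrated Security=true";

        // A single T-SQL batch using local variables, IF and WHILE --
        // the procedural constructs T-SQL layers on top of declarative SQL.
        const string batch = @"
            DECLARE @i INT = 1, @total INT = 0;
            WHILE @i <= 10
            BEGIN
                IF @i % 2 = 0
                    SET @total = @total + @i;
                SET @i = @i + 1;
            END;
            SELECT @total AS SumOfEvens;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(batch, conn))
        {
            conn.Open();
            Console.WriteLine(cmd.ExecuteScalar()); // prints 30
        }
    }
}
```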
Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to SQL. Microsoft's implementation ships in the Microsoft SQL Server product. Sybase uses the language in its Adaptive Server Enterprise, the successor to Sybase SQL Server.
Transact-SQL enhances SQL with these additional features:
Control-of-flow language
Local variables
Various support functions for string processing, date processing, mathematics, etc.
Improvements to DELETE and UPDATE statements
http://en.wikipedia.org/wiki/Transact-SQL
The Wikipedia article links to much more detailed information.
What are the reasons behind Microsoft implementing its own SQL extension as Transact SQL (T-SQL)?
To make your life easier.
What are its advantages over the normal SQL?
There is no such thing as "normal SQL".
Transact-SQL both enhances the set-based abilities of SQL and adds procedural abilities.
Other systems (like Oracle and PostgreSQL) clearly distinguish between SQL and procedural languages (PL/SQL and pl/PgSQL).
Microsoft doesn't make such a strict distinction.
Transact-SQL was developed by Sybase in the mid-1980s, when there was no standard at all (the first one was proposed in 1986).
By that time each vendor already had a burden of legacy applications to support, and rewriting their databases to conform to the standard would break the compatibility.
There is a more or less commonly supported standard, SQL-92, but it still lacks far too much to be really useful on its own.
That's why almost every task beyond a simple SELECT with a JOIN needs some proprietary support to be implemented efficiently.
A thing to note: while most RDBMS vendors make a clear distinction between their extensions to SQL and the programming languages used to write stored procedures, triggers and so on, Microsoft and Sybase do exactly the opposite, mixing the two concepts into one, namely T-SQL. You use T-SQL when you write normal queries, but you also can (and usually do) use T-SQL when writing stored procedures and triggers.
This has the controversial benefit of encouraging (or at least making very easy) the creation of a mix of procedural and SQL code[*].
Nowadays Microsoft makes a distinction between T-SQL stored procedures and those written for the CLR (i.e., .NET), but this is a relatively recent development (from SQL Server 2005 onwards).
[*]: Controversial because people who don't speak SQL will be tempted to write procedural code (usually very inefficient in databases) instead of learning SQL (the proper thing to do).
The applicable SQL Standard for control-of-flow, local variables, etc (i.e. procedural code) is known as SQL/PSM (Persistent Stored Modules).
According to Wikipedia it was adopted in 1996, but I suspect it's the usual problem: vendors were already committed to their own extensions, and therefore uptake of the standards is postponed for long periods of time...
...but not necessarily indefinitely; there is hope. For example, common table expressions (CTEs) and OLAP functions in SQL Server 2005 and the new date/time data types in SQL Server 2008 indicate that extensions to T-SQL will keep close to the published standards.
One other really important reason why vendors create their own flavors of SQL is performance tuning. There are many ways of writing more performant queries using vendor specific code that is written to take advantage of how that particular database engine works.

Queries for Sql Server and Oracle

I'm developing an ASP.NET application with a database factory pattern which allows the application to support both SQL Server and Oracle. I've created an abstract class that has the methods common to SQL Server and Oracle, like the CreateConnection and CreateCommand methods. This class is implemented by the SqlServer and Oracle classes. Now, is there an easy way to write in-line SQL queries with parameters common to both SQL Server and Oracle? I mean, I understand that we use the "@" symbol in SQL Server and ":" in Oracle for parameters. Just for this reason, I'm writing the queries twice, once in each class. Is there a way to write such queries common to both databases? (Or to translate the parameters from one common query?)
Thanks.
The only way to write one query that will work for both Oracle and Sql Server is to use only the syntax that is common to both platforms. Once you use features that are different between the two languages (like parameters or joins), you either have to write two different queries or hack together a "translator" class that converts a query from one platform to the other.
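A minimal sketch of what such a translator might look like; it assumes "@" only ever appears as a parameter prefix, which real SQL does not guarantee (string literals, @@ROWCOUNT, etc.), so treat it as illustrative only:

```csharp
using System.Text.RegularExpressions;

static class QueryTranslator
{
    // Naive rewrite of SQL Server style '@name' parameters into Oracle
    // style ':name'. A production version would need to skip string
    // literals, comments, and built-ins like @@ROWCOUNT.
    public static string ToOracle(string sqlServerQuery)
    {
        return Regex.Replace(sqlServerQuery, @"@(\w+)", ":$1");
    }
}

// Usage:
//   QueryTranslator.ToOracle("SELECT * FROM Orders WHERE Id = @id")
//   -> "SELECT * FROM Orders WHERE Id = :id"
```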
I've done a lot of this type of programming (database-agnostic software), and with .Net a relatively pain-free way of doing this is to write your main application to work entirely with ADO.Net DataTables/DataSets, with a wrapper class that handles generating the DataTables from either Oracle or Sql Server tables under-the-hood, and also handles persisting changes made to the DataTables back into Oracle or Sql Server. This approach isolates your DB-specific code in one place, although it's not necessarily a viable approach if the data your application needs access to is large.
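A hedged sketch of that wrapper idea, using ADO.NET's provider factories so the same code serves both databases; the provider invariant names and query are placeholders:

```csharp
using System.Data;
using System.Data.Common;

// Provider-agnostic gateway: the DbProviderFactory hides whether we are
// talking to SQL Server ("System.Data.SqlClient") or Oracle (e.g.
// "Oracle.ManagedDataAccess.Client" -- names depend on installed providers).
class DataTableGateway
{
    private readonly DbProviderFactory factory;
    private readonly string connectionString;

    public DataTableGateway(string providerInvariantName, string connectionString)
    {
        factory = DbProviderFactories.GetFactory(providerInvariantName);
        this.connectionString = connectionString;
    }

    public DataTable Load(string query)
    {
        using (var conn = factory.CreateConnection())
        using (var adapter = factory.CreateDataAdapter())
        {
            conn.ConnectionString = connectionString;
            adapter.SelectCommand = conn.CreateCommand();
            adapter.SelectCommand.CommandText = query;

            var table = new DataTable();
            adapter.Fill(table); // the adapter opens/closes the connection itself
            return table;
        }
    }
}
```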
You could write some kind of translator, but I would suggest that in some cases you'll need to write db-specific code for performance reasons anyway, so you'll have to put up with the maintenance burden of two versions of some queries.
What is the point of using Oracle and not using all its non-standard features (analytics, pivots, etc.)? Oracle is a powerful tool.
Other databases have their own strengths too, so why use the lowest common denominator just to be able to run on all of them? You will just lose performance.
Just pick one DB, and use it fully with all its functionalities !
Pardon my ignorance here, but can't something like an ORM (object-relational mapper) work for both SQL Server and Oracle?
I had similar requirements, to support both Sql Server and Oracle, and summarized my two years of experience with such problems in these articles:
Writing ANSI Standard SQL is not practical.
Think ANSI Standard SQL Is Fully Portable Between Databases? Think Again.

Regular Expressions in SQL Server servers?

Is it possible to write efficient queries that use the complete regular expression feature set?
If not, Microsoft really should consider adding that feature.
For SQL Server 2000 (and any other 32 bit edition of SQL Server), there is xp_pcre, which introduces Perl compatible regular expressions as a set of extended stored procedures. I've used it, it works.
The more recent versions give you direct access to the .NET Framework's integrated regular expressions (that link seems to be dead; here is another one: MSDN: How to: Work with CLR Database Objects).
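For instance, a minimal sketch of exposing Regex.IsMatch as a CLR scalar function (the function name is mine):

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public static class RegexFunctions
{
    // Scalar CLR UDF wrapping Regex.IsMatch. After CREATE ASSEMBLY /
    // CREATE FUNCTION it can be used from T-SQL, e.g.:
    //   SELECT * FROM Customers
    //   WHERE dbo.RegexIsMatch(Phone, '^\(\d{3}\) \d{3}-\d{4}$') = 1
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
    {
        if (input.IsNull || pattern.IsNull)
        {
            return SqlBoolean.Null;
        }
        return Regex.IsMatch(input.Value, pattern.Value);
    }
}
```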
The answer is no, not in the general case, although it might depend on what you mean by efficient. For these purposes, I'll use the following definition: 'Makes effective use of indexes and joins in a sensible order' which is probably as good as any.
In this case, 'efficient' queries are sargable ('search-argument-able'), which means they can use index lookups to narrow down search predicates. Equalities (t-joins) and simple inequalities can do this. 'AND' predicates can also do this. After that, we get into table, index and range scanning, i.e. operations that have to do record-by-record (or index-key by index-key) comparisons.
Sontek's answer describes a method of in-lining regexp functionality into a query, but the operations still have to do comparisons on a record by record basis. Wrapping it up in a function would allow a function-based index where the result of a calculation is materialised in the index (Oracle supports this and you can get equivalent functionality in SQL Server by using the sort of tricks discussed in this article). However, you could not do this for an arbitrary regexp.
In the general case, the semantics of a regular expression do not lend themselves to pruning match sets the way an index does, so integrating regexp support into the query optimiser is probably not possible.
Check out this and this. They are great posts on how to do it.
I would love to have the ability to natively call regular expressions in SQL Server for ad hoc queries and for use in stored procedures. Our DBAs won't allow us to create CLR functions, so I have been using LINQPad as a kind of poor man's query editor for the ad hoc stuff. It is especially useful when working with structured data such as JSON or XML that has been saved to the database.
And I agree that it seems like an oversight that there is no regular expression support; it seems like an obvious feature for a query language. Hopefully we will see it in a future version, but people have been asking for it for a long time and it hasn't made its way into the product yet.
The most frequent reason I have seen against it is that a poorly formed expression can cause catastrophic backtracking, which in .NET will not abort and almost always requires the machine to be restarted. Maybe once they address that in the framework we will see it included in a future version of SQL Server.
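For what it's worth, .NET 4.5 later added an optional match timeout to System.Text.RegularExpressions that bounds exactly this kind of runaway backtracking; a small sketch of the mechanism:

```csharp
using System;
using System.Text.RegularExpressions;

class BacktrackingDemo
{
    static void Main()
    {
        // "^(a+)+$" against a long non-matching input is a classic
        // catastrophic-backtracking case.
        var evil = new Regex("^(a+)+$", RegexOptions.None,
                             TimeSpan.FromMilliseconds(200)); // .NET 4.5+ match timeout

        try
        {
            evil.IsMatch(new string('a', 40) + "b");
        }
        catch (RegexMatchTimeoutException)
        {
            // Without the timeout, this match could effectively run forever.
            Console.WriteLine("Match aborted by timeout.");
        }
    }
}
```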
I think we can see from the new types in SQL Server 2008 (hierarchyid, geospatial) that if Microsoft does add this, it will come in the form of a SQL CLR assembly.
If you are able to install Assemblies into your database you could roll your own by creating a new Database\SQL Server project in Visual Studio - this will allow you to make a new Trigger / UDF / Stored Proc / Aggregate or UDT. You could import System.Text.RegularExpressions into the class and go from there.
Hope this helps
