What are the purpose and risks of enabling SQL CLR?

We are looking to implement unit tests using the tSQLt test framework. It has a prerequisite that SQL CLR be enabled using this command:
EXEC sp_configure 'clr enabled', 1; RECONFIGURE;
I am curious to know what the purpose of SQL CLR is, and what the risks are of enabling it in a production environment.

PURPOSE
SQLCLR allows one to do things that either:
can't be done in T-SQL, or
can't be done as efficiently in T-SQL
There are plenty of things that can be done in both, and at which T-SQL is actually much better. In those cases it is an inappropriate use of SQLCLR, so it is best to research first to make sure that the operation either cannot be done in T-SQL or would definitely be slower there.
As an example of performance: T-SQL scalar UDFs prevent parallel execution plans, but SQLCLR scalar UDFs, as long as they do no data access and are marked as IsDeterministic=true, do not prevent parallel execution plans.
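To make that concrete, here is a minimal sketch of such a function (the class name, function name, and body are illustrative, not taken from any particular library):

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;

public class ScalarFunctions
{
    // IsDeterministic = true plus DataAccess = None is what lets the
    // optimizer consider parallel plans for queries that call this UDF.
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlDouble HalfOf(SqlDouble input)
    {
        return input.IsNull ? SqlDouble.Null : input / 2.0;
    }
}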
For more details on what SQLCLR is and is not, please see the first article in the Stairway to SQLCLR series that I am writing for SQL Server Central:
Stairway to SQLCLR Level 1: What is SQLCLR?
Or, to get a sense of what can be done in SQLCLR, please see my SQL# project, which is a library of over 320 stored procedures and functions, many of which are in the Free version, and many of which work in SAFE mode: SQLsharp.com.
RISKS
The risks vary based on the PERMISSION_SET (i.e. SAFE, EXTERNAL_ACCESS, or UNSAFE) that the Assembly is marked as, and on what is being done. It is possible to do things in an UNSAFE Assembly that cannot be done in regular T-SQL (except that many of those dangerous things can already be done via some extended stored procedures, xp_cmdshell, and the OLE Automation procedures -- sp_OA*).
An Assembly marked as SAFE cannot reach outside of the database, so it is generally quite safe. BUT you can still lock up the system via a Regular Expression that exhibits "catastrophic backtracking" (this can be mitigated starting in .NET Framework 4.5, hence SQL Server 2012 and newer, by setting a maximum time limit on the RegEx operation; a sketch follows).
An Assembly marked as UNSAFE can write to static variables, which, in the context of the shared App Domain model used by SQLCLR, allows for shared memory between Sessions. This can allow for caching, but when not used properly it easily leads to race conditions.
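A minimal sketch of that RegEx mitigation, assuming a .NET Framework 4.5 (or newer) assembly (the pattern and names are illustrative):

using System;
using System.Text.RegularExpressions;

public class RegexExample
{
    public static bool SafeIsMatch(string input)
    {
        // The third constructor argument (matchTimeout, added in .NET 4.5)
        // makes a backtracking-prone pattern throw instead of spinning.
        Regex rex = new Regex(@"^(\w+\s?)*$",
                              RegexOptions.None,
                              TimeSpan.FromSeconds(2));
        try
        {
            return rex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return false;  // treat a timed-out match as a non-match
        }
    }
}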
TESTING
As for tSQLt, I do not believe that you are required to use the SQLCLR component; I thought I saw that it just enables some extended functionality. Either way, the source code is available on GitHub so you can check it to see what it is doing. It has been a while since I looked at it, but from what I remember, it should not present much of a risk for the little that it is doing (especially in a Dev / QA environment).
Another option that doesn't use SQLCLR is DbFit. I have always preferred DbFit as it is completely external to the DB. It is based on the FitNesse framework, written in Java, and you manage the tests via wiki-style pages. By default it wraps each test in a Transaction and rolls everything back when the test is finished (i.e. clean-up). It is worth taking a look at.
Download: DbFit project on GitHub
Tutorial: Using the DbFit Framework for Data Warehouse Regression Testing

SQLCLR allows you to create .NET assemblies and run code inside them from within SQL Server.
Depending on the permissions on the assembly the risks vary, roughly like so:
SAFE: You cannot do anything more than what you can in T-SQL. So fairly safe.
EXTERNAL_ACCESS: You can call code in .NET assemblies approved by Microsoft, such as ADO.NET. Fairly safe, but still a risk.
UNSAFE: You can do almost anything that the .NET Framework allows you to do. In reality, you can shoot yourself in the head unless you know what you are doing.

Related

SQL Server Stress Test Tools?

I am looking for a stress tool for SQL Server. I've seen a lot of suggestions on Google, but nothing that is quite what I need.
I am really looking for a tool that can run a list of stored procedures in parallel to see how much contention there is on resources. Collection and reporting features are not that important, but I do want something server-side based for our enterprise build server.
I am not looking for a replay feature (yes, it could do the trick, but it would be difficult to program a lot of different scenarios).
I've looked at the following tools:
RML Utilities from Microsoft
DTM DB Stress (this is the closest to what I'm looking for)
SQL Stress
I created a simple test tool for this scenario; check it out to see if it will be of any use to you. It's free, with no licensing of any sort required. No guarantees on performance or quality either ;-)
Usage: StressDb.exe <No. of instances> <Tot. Runtime (mins)> <Interval (secs)>
Connection string should reside in the configuration file.
All command line arguments are required. Use integers.
The stored proc to use is also in the config file.
You need to have .NET Framework 3.5 installed. You can also run it from multiple workstations for additional load, or from multiple folders on the same machine if trying to run additional stored procedures. You also need a SQL user Id, as currently it doesn't use a trusted connection.
The code was actually super simple; the only clever bit was making sure that the connections are not pooled.
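For what it's worth, here is a minimal sketch of how pooling can be switched off in ADO.NET (the connection string and procedure name are illustrative, not from the actual tool):

using System.Data;
using System.Data.SqlClient;

public class StressWorker
{
    public static void RunOnce()
    {
        // Pooling=false forces a genuinely new connection each time,
        // which matters when the point is to measure connection load.
        string connStr = "Server=.;Database=MyDb;User Id=stress;" +
                         "Password=secret;Pooling=false";
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand("dbo.MyStoredProc", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}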
http://......com/..../stressdb.zip
Let me know if you find it useful.

Undocumented stored procedures in MS SQL, why?

I came to know about some uses of undocumented stored procedures (such as sp_MSforeachdb, etc.) in MS SQL.
What are they, and why are they 'undocumented'?
+1 on precipitous. These procs are generally used by replication or the management tools; they are undocumented because the dev team reserves the right to change them at any time. Many have changed over the years, especially in SQL 2000 SP3 and SQL 2005.
My speculation would be because they are used internally and not supported, and might change. sp_who2 is another that I find very handy. Maybe management studio activity monitor uses that one - same output. Did I mention undocumented probably means unsupported? Don't depend on these to stick around or produce the same results next year.
sp_MSforeachdb and sp_MSforeachtable are unlikely to change.
Both can be used like this:
EXEC sp_MSforeachtable "print '?'; DBCC DBREINDEX ('?')"
where the question mark '?' is replaced by the table name (or the DB name in the other proc).
Undocumented means unsupported and that MS reserves the right to change or remove said commands at any time without any notice whatsoever.
Any documented features go through two deprecation stages before they are removed. An undocumented command can be removed, even in a service pack or hotfix, without any warning or any announcements.
It is most likely that one of the internal SQL Server developers needed these stored procedures to implement the functionality that they were working on, so they developed and used them in their code. When working with the technical documentation people they covered the scope of their project, and included in the official documentation only the portion of the project that applied to customers. Over time, people found the extra stored procedures (because you can't hide them) and started using them. While the internal SQL Server developers wouldn't want to change these undocumented procedures, I'm sure they would in two seconds if they had to for their next project.
As others have said, they are unsupported features that are not intended for general consumption, although they can't stop you from having a go, and indeed, sometimes they can be very useful.
But as internal code, they might have unexpected side-effects or limitations, and may well be here one day and gone the next.
Use them carefully if you wish, but don't rely on them entirely.

SQL Server 2008: How crash-safe is a CLR Stored Procedure that loads unmanaged libraries

We've got a regular (i.e. not extended) stored procedure in SQL Server 2000 that calls an external exe. That exe, in turn, loads a .dll that came from an SDK and calls some procedures from it (i.e. Init, DoStuff, Shutdown).
The only reason we have this external exe thing is because we didn't want to create an extended stored procedure that would call the .dll. We believed that if the dll crashed (an unlikely event, but still) then the SQL Server process would crash as well, which is not what we wanted. With an external exe, only that exe would crash.
Now, we're upgrading to SQL Server 2008 and considering creating a CLR stored procedure that calls the thing, thereby getting rid of the exe. This SP would be marked as UNSAFE, of course. The question therefore is: is it safe (safer, safe enough, etc.) to do it that way as compared to the extended SP approach?
The only relevant thing I've hunted down in BOL is:
"Specifying UNSAFE allows the code in the assembly to perform illegal operations against the SQL Server process space, and hence can potentially compromise the robustness and scalability of SQL Server"
but I'm not sure whether it answers my question, as I'm not after 'robustness and scalability' so much as stability and keeping the thing up and running.
PS: We want to get rid of the exe because it causes inconveniences when managing SP permissions (you know, the stuff that suddenly applies to you when you call an SP that contains xp_cmdshell).
Since this code was originally used with extended stored procedures, it sounds like it is unmanaged code. Bugs in unmanaged code can easily crash your process.
CLR integration is much more robust than extended stored procedures, but the code still runs in-process, so errors can take down or corrupt SQL Server. (For comparison, in theory, a SAFE CLR routine won't be able to corrupt SQL Server although even it could cause problems that reduce your server's availability without totally taking down the SQL Server.)
Basically, the only ways to not crash SQL Server in this scenario are:
Avoid using the functionality that crashes.
Fix the buggy code.
Run the code in a separate process (launch an executable, call a Windows service, call a web service, etc.). You can write a managed .NET DLL to perform this interaction. Most likely you will still need to load it UNSAFE, but, if it is written properly, it can in reality be quite safe.
The question therefore is: is it safe (safer, safe enough, etc.) to do it that way as compared to the extended SP approach?
Generally yes. I mean, if you are shelling out to an OS process, then you are shelling out to an OS process. I don't see how using the Extended Stored Procedure API to do that would necessarily be safer than the SQLCLR API, especially when the thing that might crash is an OS process, sitting outside of the database.
Of course, I am not certain about the XP API since I have not used it, but I do know the following:
The Extended Stored Procedure (XP) API is deprecated, and the recommendation is that new projects that could be done in either technology should be done in SQLCLR.
SQLCLR allows for more granular permissions than the XP API, including the ability to do Impersonation (if the Login executing the SQLCLR objects is a Windows Login).
The SQLCLR API is separated process/memory-wise by both Database and Assembly Owner (i.e. the User specified by the AUTHORIZATION clause). Hence you can have a problem with an Assembly in one DB without it affecting SQLCLR objects in other DBs (or even in the same DB if there are Assemblies owned by another User, though in practice this probably rarely is ever the case as most people just use the default which is dbo).
I'm not sure whether it answers my question, as I'm not after 'robustness and scalability' so much as stability and keeping the thing up and running.
Well, there are certainly things you can do within SQLCLR when the Assembly is set to UNSAFE:
potentially write to the Registry (depending on the access granted to the Log On As account running the SQL Server process, or the Login executing the SQLCLR function IF Impersonation is enabled and it is a Windows Login).
potentially write to the file system
potentially interact with processes running on the system
share memory with other SQL Server SPIDs (i.e. Sessions) executing functions from the same Assembly (meaning that specific Assembly, in that DB, owned by that User). This probably eludes people the most, as it is unexpected when you are used to Console and Windows apps having their own individual memory spaces; here, because there is a single AppDomain per Assembly per DB per Owner, all sessions executing that code share all static variables. A lot of code is written with the assumption that the AppDomain is private, so storing values in static variables is treated as an efficient cache; but in SQLCLR you can get unexpected behavior when two sessions overwrite and read each other's values (see the sketch after this list).
potential memory leaks. The Host Protection Attributes attempt to prevent you from using built-in .NET functionality that could do this, such as using TimeZoneInfo to convert times between TimeZoneIDs, but Host Protection Attributes are not enforced on UNSAFE Assemblies.
It is possible that the thread running the SQLCLR method is handled differently when executing UNSAFE / FullTrust code (Cooperative Multitasking vs Preemptive). I thought I had read that UNSAFE threads are managed differently, but am not sure where I read it and am looking for the source.
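Here is a minimal sketch of that shared-static behavior (the class and member names are illustrative), showing the locking such code needs:

using System.Collections.Generic;

public class SharedCache
{
    // In SQLCLR's one-AppDomain-per-Assembly-per-DB-per-Owner model,
    // this static dictionary is shared by ALL sessions running code
    // from this Assembly; it is not private to one caller.
    private static readonly Dictionary<string, string> _cache =
        new Dictionary<string, string>();
    private static readonly object _sync = new object();

    public static string GetOrAdd(string key, string value)
    {
        lock (_sync)   // without this, concurrent sessions race
        {
            string existing;
            if (!_cache.TryGetValue(key, out existing))
            {
                _cache[key] = value;
                existing = value;
            }
            return existing;
        }
    }
}

(Writable static fields like these are only allowed in UNSAFE assemblies, which is exactly where this behavior shows up.)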
But all of the above being said, if you are calling an external EXE, it runs in its own process, with its own memory space.
So, what you can do is either:
continue to call the EXE using a SQLCLR wrapper around Process.Start() (see the sketch after this list). This gives you both the process/memory separation and the ability to more easily control permissions: a single Stored Procedure that will only ever call this EXE, which nobody can change (at least not without changing the SQLCLR code and reinstalling the Assembly).
install an instance of SQL Server Express on the same machine, load the SQLCLR objects there, and create Linked Servers in both directions (from current SQL Server instance to and from the new SQL Server Express instance) so you can communicate easily between them. This will allow you to quarantine the SQLCLR execution and keep it away from the main SQL Server process.
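A minimal sketch of that wrapper (the EXE path and names are illustrative; the Assembly would need PERMISSION_SET = UNSAFE):

using System.Data.SqlTypes;
using System.Diagnostics;
using Microsoft.SqlServer.Server;

public class ExternalTools
{
    [SqlProcedure]
    public static void RunLegacyTool(SqlString arguments)
    {
        ProcessStartInfo psi = new ProcessStartInfo(
            @"C:\Tools\LegacyTool.exe",
            arguments.IsNull ? "" : arguments.Value);
        psi.UseShellExecute = false;
        psi.CreateNoWindow = true;

        using (Process p = Process.Start(psi))
        {
            // If LegacyTool.exe crashes, only that process dies;
            // the SQL Server process (and this session) survive.
            p.WaitForExit();
        }
    }
}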
Of course, that all being said, how much of a concern is this really? Meaning, how likely is it that a process fully crashes and takes down everything with it? Sure, it's not impossible, but usually a crash would take down just the AppDomain and not the CLR host itself. I would think it far more likely that code that doesn't crash but is written poorly and consumes too much memory and/or CPU would be the problem people run into.

Better languages than SQL for stored procedures [closed]

I'm getting increasingly frustrated with the limitations and verbosity required to actually commit some business logic to stored procedures, using languages such as Transact-SQL or PL/SQL. I would love to convert some current databases to Oracle and take advantage of its support for Java stored procedures, but that option is not available at the moment.
What alternatives would you recommend in the way of databases that support stored procedures in other languages?
There are some architectural obstacles to having more clever query languages in a database manager. The principal one is the query optimiser. One of the design constraints on SQL is that it can only use constructs that are accessible to the query optimiser. This means that the language and its capabilities are quite tightly coupled to the capabilities of the query execution engine and query plan optimiser.
The other major design constraint is the mechanical nature of the database system - database programming is almost unique in that it has a mechanical component. Query performance is limited by the mechanical constraints of disk head seeks and rotational latency (the wait time before the data you want arrives under the heads).
This effectively precludes many clever abstractions that might make SQL more powerful or easier to work with. Many database management systems supplement SQL with procedural alternatives that can be used for scripting. However, they interact with the DBMS by executing a series of SQL queries that are processed by the optimiser individually. Some languages of this type that are shipped with various DBMS platforms are:
Oracle's PL/SQL and embedded Java. PL/SQL is actually based on Ada - it is quite 'old school' by modern standards and has a legacy code base with which it must remain backwardly compatible. It's not necessarily the most pleasant programming environment, but it does have constructs for facilities such as parallelism and a reasonably flexible type system. One of the major criticisms of Java stored procedures on Oracle is that you are paying for Oracle's capacity-based licensing on the CPUs you are running the JVMs on.
SQL Server CLR Integration. Somewhat similar to Oracle's Java stored procedures, this allows CLR modules compiled from C# (or any .NET language) to be loaded into a SQL Server instance and executed in much the same way as stored procedures. SQL Server also has PostgreSQL-style APIs for making custom aggregate functions through CLR integration, and other hooks for mixed SQL/CLR code bases.
PostgreSQL is actually the system where back-end language integration was originally developed. The system exports a native C API with facilities for custom aggregate functions, storage engines, procedural extensions and other functionality. The language interfaces are based on this API and include PL/pgSQL (a bespoke language similar to PL/SQL), Python, Perl and Tcl. This made it into the mainstream through Illustra, a commercialised version of Postgres, which was then bought out by Informix (which was subsequently bought out by IBM). The key features were incorporated into Informix On-Line, which is still sold by IBM.
One key limitation of these languages is their limited interaction with the query optimiser (although the C API for PostgreSQL does have support for this). Participation in a query plan as a first-class citizen requires that the query optimiser can work out a sensible view of the resources your action will take. In practice, this type of interaction with the query optimiser is mainly useful for implementing storage engines.
This level of digging into the storage engine is (a) somewhat esoteric, if the functionality is available at all (so most people won't have the skill to do this), and (b) probably considerably more trouble than just writing the query in SQL. The limitations of the query optimiser mean that you will probably never get the level of abstraction out of SQL that you might get from (say) Python or even C# or Java.
The path of least resistance for efficient queries is likely to be writing the query in SQL with some procedural glue in one of the other languages. In some cases a computation really does lend itself to a procedural approach.
This can become a hassle and lead to large bodies of boilerplate SQL code. The only real options for this are hand coded SQL or code generation systems. A trivial example of code generation is the CRUD functionality provided by frameworks where this SQL is generated from metadata. A more complex example can be seen in ETL tools such as Oracle Warehouse Builder or Wherescape Red which work by generating great screeds of stored procedure code from the model.
I find myself building code generation systems of one sort or another on a semi-regular basis for precisely this reason. Any templating system will do for this - I've had fairly good mileage from CherryTemplate, but there are many such items around. Code Generation in Action is quite a good book on this subject - the author uses a Ruby-based system whose name escapes me.
Edit: If you look at a 'Show Estimated Execution Plan' for a block of procedural code you will notice that each statement has its own query plan. The query optimisation algorithm can only work on a single SQL statement, so a procedure will have a forest of query plans. Because procedural code can have 'side-effects' you cannot use the type of algorithms used in query optimisation to reason about the code. This means that a query optimiser cannot globally optimise a block of procedural code. It can only optimise individual SQL statements.
PostgreSQL has support for stored procedures in many scripting languages: officially Perl, Python, and Tcl. As add-ons: PHP, Ruby, Java and probably many others (just Google for pl<languagename>), which may or may not be in working condition as of now.
Oh, and also SQL Server 2005 onwards has support for CLR stored procedures, where you can use .NET languages.
Oracle, HSQLDB and Derby allow you to write stored procedures in Java.
Oracle does support CLR stored procedures, so you can write stored procs in any .NET language, such as C#, VB.NET or IronPython. This only works when the database server runs on a Windows machine; you can't do it when the database runs on Linux or Unix.
As far as I know, DB2 for z/OS is the database that supports the most languages. It supports COBOL, C/C++ and Java for stored procedures, and of course it also supports SQL procedures.
There is also some support for writing Oracle stored procedures in Perl.
Because Oracle has a built-in JVM, you can develop stored procs in Java, but also in non-Java languages that run on the JVM - languages like Jacl, Jython, Scheme and Groovy. See here: http://db360.blogspot.com/2006/08/oracle-database-programming-using-java_01.html and http://en.wikipedia.org/wiki/List_of_JVM_languages .

Statistical calculations in SQL Server

Does anyone know of any packages or source code that does simple statistical analysis, e.g., confidence intervals or ANOVA, inside a SQL Server stored procedure?
The reason you probably don't want to do that is because these calculations are CPU-intensive. SQL Server is usually licensed by the CPU socket (roughly $5k/cpu for Standard, $20k/cpu for Enterprise) so DBAs are very sensitive to any applications that want to burn a lot of CPU power on the SQL Server itself. If you started doing statistics calculations and suddenly the server needs another CPU, that's an expensive licensing proposition.
Instead, it makes sense to do these statistical calculations on a separate application server. Query the data over the wire to your app server, do the number-crunching there, and then send the results back via an update statement or stored proc. Yes, it's more work, but as your application grows, you won't be facing an expensive licensing bill.
In more recent versions of SQL Server you can use .NET objects natively, so any .NET package will do. Other than that, there are always external proc calls...
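As a minimal sketch of that idea (the class, function name, and the normal-approximation formula are illustrative choices, not from any particular package), a SQLCLR scalar function could compute the half-width of a 95% confidence interval from an already-aggregated standard deviation and row count:

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class Stats
{
    // Normal-approximation 95% CI half-width for a mean: 1.96 * s / sqrt(n).
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlDouble CI95HalfWidth(SqlDouble stdDev, SqlInt32 n)
    {
        if (stdDev.IsNull || n.IsNull || n.Value < 2)
            return SqlDouble.Null;
        return 1.96 * stdDev.Value / Math.Sqrt(n.Value);
    }
}

You would then feed it results from T-SQL's STDEV() and COUNT() aggregates.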
Unless you have to do it within the stored proc I'd retrieve the data and do it outside SQL Server. That way you can choose from any of the open source or commercial stats routines and it would probably be faster too.
I don't know if a commercial package like this exists. There could be multiple reasons for this, some of which have been outlined above.
If what you are trying to accomplish is to avoid building statistical functions that process your data stored in SQL Server, you might want to try integrating a statistical package with your database server by importing data from it. For example, R supports this, and CRAN hosts packages for connecting to databases.
Once you have accomplished that, if you still feel that you'd like to run statistical analysis inside SQL Server, the next step would be to call your stats package from a stored procedure using a command-line interface. Your best option here is probably xp_cmdshell, though it requires careful configuration in order not to compromise your SQL Server security.
