How can we write a high performance and SAFE SQLCLR assembly to create and put a json object in MSMQ? - sql-server

One of the projects I am working on has a requirement to create and pass event messages in JSON format from SQL Server to MSMQ.
I found that SQL CLR could be the best way to implement this, but some of my colleagues say that it is expensive in terms of performance and memory utilization.
I am looking at a maximum throughput of 20 messages/sec.
The number of messages depends on certain events occurring in the database.
The implementation needs to run for roughly 10 hours per day.
Kindly suggest how to achieve this; a code snippet or steps would be a great help.
Any other option to implement the functionality is always welcome.
Thanks.

I found that SQL CLR could be the best way to implement this, but some of my colleagues say that it is expensive in terms of performance and memory utilization.
There is a lot of "it depends" in this conversation, and most of the performance / memory / security concerns you hear are based on misinformation or a simple lack of information.
SQLCLR code can be inefficient, but it can also be more efficient / faster than T-SQL in some cases. It all depends on what you are trying to accomplish and how you approach the problem, both in terms of overall structure and how it is coded. For things that can be done in straight T-SQL, straight T-SQL is nearly always faster. But if you place that T-SQL code in a scalar function / UDF, then it is no longer fast ;-). Inline T-SQL is the fastest, IF you can actually do the thing you are trying to do.
So, if you can communicate with MSMQ via T-SQL, then do that. But if you can't, then yes, SQLCLR could be efficient enough to handle this.
HOWEVER #1, regarding the need for JSON:
I do not think that any of the supported .NET Framework libraries include JSON support (but I need to check again). While it is tempting to try to load Json.NET, that code uses certain coding practices that are not wise in SQLCLR, namely static class variables. SQLCLR is a shared App Domain, so all sessions running the same piece of code share the same memory space. The Json.NET code therefore shouldn't be used as-is; it would need to be modified (if possible). Some people go the easy route and just set the Assembly to UNSAFE to get past the errors about static class variables that are not marked as readonly, but that can lead to odd / unexpected behavior, so I wouldn't recommend it. There might also be references to unsupported .NET Framework libraries that would need to be loaded into SQL Server as UNSAFE. So, if you want to do JSON, you might have to construct it manually.
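For small event messages, manual construction need not be complicated. Here is a minimal sketch using only StringBuilder from the supported libraries; the event fields (eventId, eventType, occurredAt) are hypothetical, purely for illustration:

    using System;
    using System.Text;

    public static class ManualJson
    {
        // Minimal JSON string escaping; covers the common control characters only.
        private static string Escape(string s)
        {
            return s.Replace("\\", "\\\\").Replace("\"", "\\\"")
                    .Replace("\r", "\\r").Replace("\n", "\\n").Replace("\t", "\\t");
        }

        public static string EventToJson(int eventId, string eventType, DateTime occurredAt)
        {
            StringBuilder sb = new StringBuilder(128);
            sb.Append("{\"eventId\":").Append(eventId)
              .Append(",\"eventType\":\"").Append(Escape(eventType))
              .Append("\",\"occurredAt\":\"")
              .Append(occurredAt.ToString("o"))   // ISO 8601 round-trip format
              .Append("\"}");
            return sb.ToString();
        }
    }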
HOWEVER #2, the question title is (emphasis added):
How can we write a high performance and SAFE SQLCLR assembly to create and put a json object in MSMQ?
You cannot interact with anything outside of SQL Server from an Assembly marked as SAFE. You will need to mark the Assembly as EXTERNAL_ACCESS in order to communicate with MSMQ, but that shouldn't pose any inherent problem.
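For the MSMQ call itself, the shape of the code would be roughly as follows. This is only a sketch: System.Messaging is not in the supported SQLCLR library list, so it would have to be registered in the database manually (see the caveats above; it may even force a higher permission set), and the queue path and procedure name are hypothetical.

    using System.Data.SqlTypes;
    using System.IO;
    using System.Messaging;   // not a supported SQLCLR library; must be loaded manually
    using System.Text;
    using Microsoft.SqlServer.Server;

    public static class MsmqSender
    {
        [SqlProcedure]
        public static void SendEventMessage(SqlString queuePath, SqlString jsonBody)
        {
            if (queuePath.IsNull || jsonBody.IsNull) return;

            // e.g. queuePath = @".\private$\SqlEvents" (hypothetical queue)
            using (MessageQueue queue = new MessageQueue(queuePath.Value))
            using (Message msg = new Message())
            {
                // Send the JSON as a raw UTF-8 body rather than relying on
                // a message formatter.
                msg.BodyStream = new MemoryStream(Encoding.UTF8.GetBytes(jsonBody.Value));
                msg.Label = "db-event";
                queue.Send(msg);
            }
        }
    }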

Related

SQL Server CLR stored procedures in data processing tasks - good or evil?

In short: is it a good design solution to implement most of the business logic in CLR stored procedures?
I have read a lot about them recently, but I can't figure out when they should be used, what the best practices are, and whether they are good enough.
For example, my business application needs to
parse a large fixed-length text file,
extract some numbers from each line in the file,
according to these numbers apply some complex business rules (involving regex matching, pattern matching against data from many tables in the database and such),
and as a result of this calculation update records in the database.
There is also a GUI for the user to select the file, view the results, etc.
This application seems to be a good candidate to implement the classic 3-tier architecture: the Data Layer, the Logic Layer, and the GUI layer.
The Data Layer would access the database
The Logic Layer would run as a WCF service and implement the business rules, interacting with the Data Layer
The GUI Layer would be a means of communication between the Logic Layer and the User.
Now, thinking of this design, I can see that most of the business rules may be implemented in SQL CLR and stored in SQL Server. I might store all my raw data in the database, run the processing there, and get the results. I see some advantages and disadvantages to this solution:
Pros:
The business logic runs close to the data, meaning less network traffic.
Process all data at once, possibly utilizing parallelism and an optimal execution plan.
Cons:
Scattering of the business logic: some part is here, some part is there.
Questionable design solution, may encounter unknown problems.
Difficult to implement a progress indicator for the processing task.
I would like to hear all your opinions about SQL CLR. Does anybody use it in production? Are there any problems with such design? Is it a good thing?
I do not do it - CLR in SQL Server is great for many things (calculating hashes, doing string manipulation that SQL just sucks at, regex to validate field values, etc.), but complex logic, IMHO, has no business in the database.
It is a single point of performance problems and also VERY expensive to scale up. Plus, either I put it all in there or - well - I have a serious problem maintenance-wise.
Personally I prefer to have business functionality not dependent on the database. I only use CLR stored procedures when I need advanced data querying (to produce a format that is not easy to do in SQL). Depending on what you are doing, I tend to get better performance results with standard stored procs anyway, so I personally only use them for my advanced tasks.
My two cents.
HTH.
Generally, you probably don't want to do this unless you can get a significant performance advantage or there is a compelling technical reason to do it. An example of such a reason might be a custom aggregate function.
Some good reasons to use CLR stored procedures:
You can benefit from a unique capability of the technology, such as a custom aggregate function (see the sketch after this list).
You can get a performance benefit from a CLR sproc - perhaps a fast record-by-record processing task where you read from a fast forward cursor, buffer the output in memory, and bulk load it to the destination table in batches.
You want to wrap a bit of .Net code or a .Net library and make it available to SQL code running on the database server. An example of this might be the Regex matcher from the OP's question.
You want to cheat and wrap something unmanaged and horribly insecure so as to make it accessible from SQL code without using XPs. Technically, Microsoft have stated that XPs are deprecated, and many installations disable them for security reasons.
From time to time you don't have the option of changing the client-side code (perhaps you have an off-the-shelf application), so you may need to initiate external actions from within the database. In this case you may need to have a trigger or stored procedure interact with the outside world, perhaps querying the status of a workflow, writing something out to the file system or (more extremely) posting a transaction to a remote mainframe system through a screen scraper library.
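To illustrate the first point, here is a minimal sketch of a CLR user-defined aggregate: a geometric mean, something T-SQL has no built-in equivalent for. The name and fields are illustrative, not taken from any particular project:

    using System;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    [Serializable]
    [SqlUserDefinedAggregate(Format.Native)]
    public struct GeometricMean
    {
        private double _logSum;   // running sum of logs
        private int _count;       // number of accumulated values

        public void Init() { _logSum = 0; _count = 0; }

        public void Accumulate(SqlDouble value)
        {
            // Ignore NULLs and non-positive values, for which log is undefined.
            if (value.IsNull || value.Value <= 0) return;
            _logSum += Math.Log(value.Value);
            _count++;
        }

        // Combines partial results when SQL Server parallelizes the aggregate.
        public void Merge(GeometricMean other)
        {
            _logSum += other._logSum;
            _count += other._count;
        }

        public SqlDouble Terminate()
        {
            return _count == 0 ? SqlDouble.Null : new SqlDouble(Math.Exp(_logSum / _count));
        }
    }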
Bad reasons to use CLR stored procs:
Minor performance improvements on something that would normally be done in the middle tier. Note that disk traffic is likely to be much slower than network traffic unless you are attempting to stream huge amounts of data across a network connection.
CLR sprocs are cool and you want to put them on your C.V.
Can't write set-oriented SQL.

What is the best practice for persistence right now?

I come from a java background.
But I would like a cross-platform perspective on what is considered best practice for persisting objects.
The way I see it, there are 3 camps:
ORM camp
direct query camp e.g. JDBC/DAO, iBatis
LINQ camp
Do people still hand-code queries (bypassing the ORM)? Why, considering the options available via JPA, Django, Rails?
There is no one best practice for persistence (although the number of people screaming that ORM is best practice might lead you to believe otherwise). The only best practice is to use the method that is most appropriate for your team and your project.
We use ADO.NET and stored procedures for data access (though we do have some helpers that make it very fast to write such as SP class wrapper generators, an IDataRecord to object translator, and some higher order procedures encapsulating common patterns and error handling).
There are a bunch of reasons for this which I won't go into here, but suffice to say that they are decisions that work for our team and that our team agrees with. Which, at the end of the day, is what matters.
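To give a flavor of one of those helpers, here is a hypothetical sketch of an IDataRecord-to-object translator (not our actual code; the Order type and column names are made up):

    using System;
    using System.Collections.Generic;
    using System.Data;

    public static class DataReaderExtensions
    {
        // Walks the reader and applies a mapping delegate to each record,
        // producing a strongly typed list.
        public static List<T> Translate<T>(this IDataReader reader,
                                           Func<IDataRecord, T> map)
        {
            List<T> results = new List<T>();
            while (reader.Read())
                results.Add(map(reader));
            return results;
        }
    }

    // Usage (illustrative):
    // List<Order> orders = reader.Translate(r => new Order {
    //     Id     = r.GetInt32(r.GetOrdinal("OrderId")),
    //     Placed = r.GetDateTime(r.GetOrdinal("PlacedAt"))
    // });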
I am currently reading up on persisting objects in .NET. As such I cannot offer a best practice, but maybe my insights can bring you some benefit. Up until a few months ago I had always used hand-coded queries, a bad habit from my classic ASP days.
Linq2SQL - Very lightweight and easy to get up to speed with. I love the strongly typed querying possibilities and the fact that the SQL is not executed at once; instead it is executed when your query is ready (all the filters applied), so you can split the data access from the filtering of the data. Also, Linq2SQL lets me use domain objects that are separate from the dynamically generated data objects. I have not tried Linq2SQL on a larger project, but so far it seems promising. Oh, and it only supports MS SQL, which is a shame.
Entity Framework - I played around with it a little bit and did not like it. It seems to want to do everything for me, and it does not work well with stored procedures. EF supports Linq2Entities, which again allows strongly typed queries. I think it is limited to MS SQL, but I could be wrong.
SubSonic 3.0 (Alpha) - This is a newer version of SubSonic which supports Linq. The cool thing about SubSonic is that it is based on template files (T4 templates, written in C#) which you can easily modify. Thus if you want the auto-generated code to look different you just change it :). I have only tried a preview so far but will look at the Alpha today. Take a look here SubSonic 3 Alpha. Supports MS SQL but will support Oracle, MySql etc. soon.
So far my conclusion is to use Linq2SQL until SubSonic is ready and then switch to that, since SubSonic's templates allow much more customization.
There is at least another one: System Prevalence.
As far as I can tell, what is optimal for you depends a lot on your circumstances. I could see how, for very simple systems, using direct queries could still be a good idea. Also, I have seen Hibernate fail to work well with complex, legacy database schemata, so using an ORM might not always be a valid option. System Prevalence is supposed to be unbeatably fast, if you have enough memory to fit all your objects into RAM. I don't know much about LINQ, but I suppose it has its uses, too.
So, as so often, the answer is: know a variety of tools for the job, so that you are able to use the one that's most appropriate for your specific situation.
The best practice depends on your situation.
If you need database objects in table structures with some sort of meaningful structure (one column per field, one row per entity and so on), you need some sort of translation layer between objects and the database. These fall into two camps:
If there's no logic in the database (just storage) and tables map to objects well, then an ORM solution can provide a quick and reliable persistence system. Java systems like Toplink and Hibernate are mature technologies for this.
If there is database logic involved in persistence, or your database schema has drifted from your object model significantly, stored procedures wrapped by Data Access Objects (with further patterns as you like) is a little more involved than ORM but more flexible.
If you don't need structured storage (and you need to be really sure that you don't, as introducing it to existing data is not fun), you can store serialized object graphs directly in the database, bypassing a lot of complexity.
I prefer to write my own SQL, but I apply all my refactoring techniques and other "good stuff" when I do so.
I have written data access layers, ORM code generators, persistence layers, UnitOfWork transaction management, and LOTS of SQL. I've done that in systems of all shapes and sizes, including extremely high-performance data feeds (forty thousand files totaling forty million transactions per day, each loaded within two minutes of real-time).
The most important criterion is destiny, as in control thereof. Don't ever let your ORM tool be an obstacle to getting your work done, or an excuse for not doing it right. Ultimately, all good SQL is hand-written and hand-tuned, but some decent tools can help you get a good first draft quickly.
I treat this issue the same way that I do my UI design. I write all my UIs directly in code, but I might use a visual designer to prototype some essential elements that I have in mind, then I tear apart the code it generates in order to kickstart my own.
So, use an ORM tool in any of its manifestations as a way to get a decent example--look at how it solves many of the issues that arise (key generation, associations, navigation, etc.). Tear apart its output, make it your own, then reuse the heck out of it.

Why should you use an ORM? [closed]

If you were to motivate the "pros" of an ORM - why you would use one - to management or a client, what would those reasons be?
Try and keep one reason per answer so that we can see which one gets voted up as the best reason.
The most important reason to use an ORM is so that you can have a rich, object oriented business model and still be able to store it and write effective queries quickly against a relational database. From my viewpoint, I don't see any real advantages that a good ORM gives you when compared with other generated DAL's other than the advanced types of queries you can write.
One type of query I am thinking of is a polymorphic query. A simple ORM query might select all shapes in your database. You get a collection of shapes back. But each instance is a square, circle or rectangle according to its discriminator.
Another type of query would be one that eagerly fetches an object and one or more related objects or collections in a single database call. e.g. Each shape object is returned with its vertex and side collections populated.
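To make the polymorphic case concrete, here is a small self-contained sketch (C#; the class model is illustrative, and the ORM call appears only in a comment since the session API is assumed):

    using System;
    using System.Collections.Generic;

    // Illustrative hierarchy that an ORM would map with a discriminator column.
    public abstract class Shape
    {
        public int Id { get; set; }
        public abstract double Area();
    }
    public class Square : Shape
    {
        public double Side { get; set; }
        public override double Area() { return Side * Side; }
    }
    public class Circle : Shape
    {
        public double Radius { get; set; }
        public override double Area() { return Math.PI * Radius * Radius; }
    }

    public static class PolymorphicQueryDemo
    {
        public static void Main()
        {
            // With an ORM this collection would come from something like
            // session.Query<Shape>() -- one SELECT, each row materialized
            // as the subclass its discriminator column names.
            List<Shape> shapes = new List<Shape>
            {
                new Square { Id = 1, Side = 2.0 },
                new Circle { Id = 2, Radius = 1.0 },
            };

            foreach (Shape s in shapes)
                Console.WriteLine("{0}: {1:F2}", s.GetType().Name, s.Area());
        }
    }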
I'm sorry to disagree with so many others here, but I don't think that code generation is a good enough reason by itself to go with an ORM. You can write or find many good DAL templates for code generators that do not have the conceptual or performance overhead that ORM's do.
Or, if you think that you don't need to know how to write good SQL to use an ORM, again, I disagree. It might be true that from the perspective of writing single queries, relying on an ORM is easier. But with ORMs it is far too easy to create poorly performing routines when developers don't understand how their queries work with the ORM and what SQL they translate into.
Having a data layer that works against multiple databases can be a benefit. It's not one that I have had to rely on that often though.
In the end, I have to reiterate that in my experience, if you are not using the more advanced query features of your ORM, there are other options that solve the remaining problems with less learning and fewer CPU cycles.
Oh yeah, some developers do find working with ORM's to be fun so ORM's are also good from the keep-your-developers-happy perspective. =)
Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.
Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
Generating boilerplate code for basic CRUD operations. Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
You can move to different database software easily because you are developing to an abstraction.
Development happiness, IMO. ORM abstracts away a lot of the bare-metal stuff you have to do in SQL. It keeps your code base simple: fewer source files to manage and schema changes don't require hours of upkeep.
I'm currently using an ORM and it has sped up my development.
So that your object model and persistence model match.
To minimise duplication of simple SQL queries.
The reason I'm looking into it is to avoid the generated code from VS2005's DAL tools (schema mapping, TableAdapters).
The DAL/BLL I created over a year ago was working fine (for what I had built it for) until someone else started using it to take advantage of some of the generated functions (which I had no idea were there).
It looks like it will provide a much more intuitive and cleaner solution than the DAL/BLL solution from http://wwww.asp.net
I was thinking about creating my own SQL Command C# DAL code generator, but the ORM looks like a more elegant solution.
Abstract the SQL away 95% of the time, so that not everyone on the team needs to know how to write super-efficient, database-specific queries.
I think there are a lot of good points here (portability, ease of development/maintenance, focus on OO business modeling etc), but when trying to convince your client or management, it all boils down to how much money you will save by using an ORM.
Do some estimations for typical tasks (or even larger projects that might be coming up) and you'll (hopefully!) get a few arguments for switching that are hard to ignore.
Compilation and testing of queries.
As the tooling for ORMs improves, it becomes easier to verify the correctness of your queries quickly through compile-time errors and tests.
Compiling your queries helps developers find errors faster. Right? Right. This compilation is made possible because developers now write queries in code using their business objects or models, instead of just strings of SQL or SQL-like statements.
If you use the correct data access patterns in .NET, it is easy to unit test your query logic against in-memory collections. This speeds up the execution of your tests because you don't need to access the database, set up data in the database, or even spin up a full-blown data context. [EDIT] This isn't as true as I thought, as unit testing in memory can present difficult challenges to overcome. But I still find these integration tests easier to write than in previous years. [/EDIT]
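A minimal sketch of that idea, with xUnit assumed as the test framework and an illustrative Order model: because the query logic is written against IEnumerable<T>, it runs equally well against a plain in-memory list.

    using System.Collections.Generic;
    using System.Linq;
    using Xunit;   // assumed test framework

    public class Order
    {
        public decimal Total;
        public bool Shipped;
    }

    public static class OrderQueries
    {
        // Query logic isolated from any data context.
        public static IEnumerable<Order> LargeUnshipped(IEnumerable<Order> orders)
        {
            return orders.Where(o => !o.Shipped && o.Total > 1000m);
        }
    }

    public class OrderQueryTests
    {
        [Fact]
        public void FindsOnlyLargeUnshippedOrders()
        {
            var orders = new List<Order>
            {
                new Order { Total = 1500m, Shipped = false },   // should match
                new Order { Total = 1500m, Shipped = true  },
                new Order { Total =  500m, Shipped = false },
            };

            Assert.Single(OrderQueries.LargeUnshipped(orders));
        }
    }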
This is definitely more relevant today than a few years ago when the question was asked, but that may only be the case for Visual Studio and Entity Framework where my experience lies. Plugin your own environment if possible.
.NET Tiers, using CodeSmith templates:
http://nettiers.com/default.aspx?AspxAutoDetectCookieSupport=1
Why code something that can be generated just as well?
Convince them of how much time / money you will save when changes come in and you don't have to rewrite your SQL, since the ORM tool will do that for you.
I think one con is that an ORM requires some updating of your POJOs, mainly related to schema, relations, and queries. In scenarios where you are not supposed to make changes to the model objects (perhaps because they are shared among more than one project, or between client and server), you will need to split them into two levels, which will require additional effort.
I am an Android developer, and as you know mobile apps are usually not huge, so this additional effort to segregate a pure model from an ORM-affected model does not seem worthwhile.
I understand that the question is a generic one, but mobile apps also come under the generic umbrella.

What is a good balance in an MVC model to have efficient data access?

I am working on a few PHP projects that use MVC frameworks, and while they all have different ways of retrieving objects from the database, it always seems that nothing beats writing your SQL queries by hand, both for speed and for cutting down on the number of queries.
For example, one of my web projects (written by a junior developer) executes over 100 queries just to load the home page. The reason is that in one place, a method will load an object, but later on deeper in the code, it will load some other object(s) that are related to the first object.
This leads to the other part of the question: what are people doing in situations where one part of the code needs the values of only a few columns from a table, and another part needs something else? Right now (in the same project), there is one get() method for each object, and it does a "SELECT *" (or lists all the columns in the table explicitly), so that anytime you need the object for any reason, you get the whole thing.
So, in other words, you hear all the talk about how SELECT * is bad, but if you use an ORM class that comes with a framework, that is usually exactly what it wants to do. Are you stuck choosing between an ORM with SELECT * and writing the specific SQL queries by hand? It just seems to me that we're stuck between convenience and efficiency, and if I hand-write the queries, then whenever I add a column I most likely have to add it in several places in the code.
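To make the trade-off concrete, here is the kind of column projection I mean, sketched in C#/LINQ terms for illustration (our actual code is PHP; the model and names are made up): the query asks for just the columns the page needs instead of materializing whole objects.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Article
    {
        public int Id;
        public string Title;
        public string Body;      // large column most pages don't need
        public bool Published;
    }

    public static class ProjectionDemo
    {
        public static void Main()
        {
            // In-memory stand-in for a table; against a real ORM-backed
            // source the same Select() projection would translate to
            // "SELECT Id, Title" instead of "SELECT *".
            var articles = new List<Article>
            {
                new Article { Id = 1, Title = "Hello", Body = "...", Published = true }
            };

            var headlines = articles
                .Where(a => a.Published)
                .Select(a => new { a.Id, a.Title })
                .ToList();

            foreach (var h in headlines)
                Console.WriteLine("{0}: {1}", h.Id, h.Title);
        }
    }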
Sorry for the long question, but I'm explaining the background to get some mindsets from other developers rather than maybe a specific solution. I know that we can always use something like Memcached, but I would rather optimize what we can before getting into that.
Thanks for any ideas.
First, assuming you are proficient at SQL and schema design, there are very few instances where any abstraction layer that removes you from the SQL statements will exceed the efficiency of writing the SQL by hand. More often than not, you will end up with suboptimal data access.
There's no excuse for 100 queries just to generate one web page.
Second, if you are using the Object Oriented features of PHP, you will have good abstractions for collections of objects, and the kinds of extended properties that map to SQL joins. But the important thing to keep in mind is to write the best abstracted objects you can, without regard to SQL strategies.
When I write PHP code this way, I always find that I'm able to map the data requirements for each web page to very few, very efficient SQL queries if my schema is proper and my classes are proper. And not only that, but my experience is that this is the simplest and fastest way to implement. Putting framework stuff in the middle between PHP classes and a good solid thin DAL (note: NOT embedded SQL or dbms calls) is the best example I can think of to illustrate the concept of "leaky abstractions".
I got a little lost with your question, but if you are looking for a way to do database access, there are a couple of ways to do it. Your MVC framework could use Zend Framework, which comes with database access abstractions.
Also keep in mind that you should design your system well to ensure there is no contention in the database: if your queries are scattered across the PHP pages, they may lock tables, and the overall web application will deteriorate in performance and become slower over time.
That is why it is sometimes preferable to use stored procedures: the logic is in one place and can be tuned when needed, though others may argue that it is easier to debug if the query statements are in the front-end.
No ORM framework will even get close to hand-written SQL in terms of speed. Although 100 queries seems unrealistic (maybe you are exaggerating a bit), even if you had the creator of the ORM framework writing the code, it would always be far from the speed of good old SQL.
My advice is to look at the whole picture, not only speed:
Does the framework improve code readability?
Is your team comfortable with writing SQL and mixing it with code?
Do you really understand how to optimize the framework queries? (I think a get() for each object is not the optimal way of retrieving them)
Do the queries (after optimization) of the framework present a bottleneck?
I've never developed anything with PHP, but I think you could mix both approaches (ORM and plain SQL): after a thorough profiling of the app you can determine the real bottlenecks, and only then replace that ORM code with hand-written SQL. (Usually in Ruby you use ActiveRecord, then profile the application with something like New Relic, and finally, if you have a complicated AR query, you replace it with some SQL.)
Regards
Trust your experience.
To avoid repeating yourself so much in the code, you could write some simple model functions with your own SQL. This is what I do all the time, and I am happy with it.
Much of the "convenience" stuff was written for people who need magic because they cannot do it by hand or just don't have the experience.
And after all it's a question of style.
Don't hesitate to add your own layer, or to exchange or extend a given layer with your own stuff. Keep it clean, make a good design and some documentation, so you feel at home when you come back later.

SQL Server - Using CLR integration to consume a Web Service

There are a few tutorials on the web that describe consuming a Web Service using SQL Server 2005's CLR integration. For the most part, the process seems pretty convoluted. I've run into several issues, including the need to change my database's trust level and to use the sgen tool to create a static XmlSerializer assembly; and I still haven't gotten it working right... (I'm sure I just need to put a little more time and energy into it.)
What are the security, performance, and maintenance implications when going to this type of architecture? This would likely be a fairly heavily-used process, and ease of maintenance is relatively important.
I do have the freedom to choose whether to integrate this into SQL Server as a UDF or to have it be a stand-alone .NET library for console/Web applications. Is SQL CLR integration with external assemblies worth the trouble?
I think you have answered your own question. I personally find that anything calling a web service is more than likely better suited to exist OUTSIDE of SQL Server. The complications, the elevated trust levels, and, as you mentioned, the overall convoluted process make it a hard-to-document and hard-to-maintain solution.
The short answer is, no, SQL CLR Integration is probably not worth the trouble.
The longer answer has several points, beginning with programming CLR in the database. It's a fine tool, when used correctly, but it does increase memory consumption and can lead to performance issues if not done correctly. I use it in my database for very specialized functionality, such as adding RegEx ability, but it's used sparingly, with well-tested code to prevent as many issues as possible from cropping up.
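For reference, the RegEx case mentioned above is typically a small scalar wrapper along these lines (a minimal sketch; the function name is illustrative):

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public static class RegexFunctions
    {
        // Exposes Regex.IsMatch to T-SQL. System.Text.RegularExpressions
        // lives in System.dll, a supported SQLCLR library, so the assembly
        // can stay marked SAFE.
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
        {
            if (input.IsNull || pattern.IsNull) return SqlBoolean.Null;
            return new SqlBoolean(Regex.IsMatch(input.Value, pattern.Value,
                                                RegexOptions.CultureInvariant));
        }
    }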
A second point is, as you pointed out, that you've got to modify security, opening up potential risks.
Use a stand-alone application to load the data into your server. You'll have more control, less risk, and a much easier time of it.
I have been writing CLR procedures that call web services on both Exchange and AD, and I agree with the posts above. It works, but we quickly ran into out-of-memory issues because of the special way memory is handled by the CLR inside SQL Server. As you can imagine, performance is OK for small queries but does not scale at all.
Generally, your database performance determines the performance of your application, and I think putting such logic in your database is a no-no if you don't have complete control over what you are doing.
Use the CLR for simple text manipulations and other calculations that do not depend on external resources.
