What to choose: Stored Procedures or Dynamic SQL in PostgreSQL

This is a Postgres-specific question. I am in the middle of the classic design situation where I have to decide whether to use stored procedures or dynamic SQL (prepared statements). I have read many blogs on the subject and have come to the conclusion that, with current implementations of advanced database systems, there isn't any specific attribute that would weigh one over the other.
Hence my question is PostgreSQL-specific.
What I want to ask is, are there advantages or disadvantages of using Stored Procedures in Postgres?
More about my design: As we are using Postgres-specific functions like width_bucket and relying on various other features Postgres provides, such as partitioning and inheritance, it is unlikely that we would switch to any other database provider in the future. Our queries will be complex, involving the building of graphs and reports from real-time and non-real-time data.
There will also be some analytics built in. Moreover, we are also planning to shard and partition our database.
I want viewpoints on the use of stored procedures with the type of system and environment I have described above, specific to PostgreSQL.
I would also like to understand how query optimization and execution works in Postgres.

Ok, so your question is whether to create SQL on the client side and send it to the server, or to use stored procedures. Note that if you use stored procedures, you usually still have to create the SQL that calls them, so it is not purely an either/or choice. Really, this is about a relational interface vs stored procedures.
Additionally it is worth noting that a key question is whether this is a database owned by an application or something that many applications may use. In the former, you may not worry about encapsulation, but in the latter you want to think about your database as having a service interface.
So if it is "my application has a database and all material use goes through my application" then go with dynamic SQL against the underlying tables.
If your database serves more than one application, however, you want to make sure you can change your database structure without breaking any of those applications. This usually means encapsulating access behind some sort of abstract interface, which can be a set of VIEWs or stored procedures.
Views have the advantage that they can be directly manipulated in SQL, and are very flexible. This allows wide-open retrieval (and, with some work, storage) of the data behind them. The application does not need to know how the data is physically stored, just how to access it.
Stored procedures have the same benefit of encapsulation but provide a much more limited interface. They also have the problem that people usually use them in ways that require a fixed number of arguments, so adding an argument requires closely coordinated updates to the DB and the application (Oracle's edition-based redefinition is a solution to this problem, but PostgreSQL has nothing similar). However, with a little work one can discover arguments and handle them appropriately at run-time.
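To make the two interface styles concrete, here is a minimal PostgreSQL sketch; the table, view, and function names are hypothetical, not from the question:

```sql
-- Hypothetical schema: the physical table is hidden behind a view and a
-- function, so it can be restructured later without breaking callers.
CREATE TABLE account_data (
    id      serial  PRIMARY KEY,
    owner   text    NOT NULL,
    balance numeric NOT NULL DEFAULT 0
);

-- The view is the read interface: clients query it, not the table.
CREATE VIEW accounts AS
    SELECT id, owner, balance FROM account_data;

-- The function is the write interface; its signature is the contract,
-- which is exactly where the argument-coordination problem shows up.
CREATE FUNCTION deposit(p_account integer, p_amount numeric)
RETURNS numeric
LANGUAGE plpgsql AS $$
DECLARE
    v_balance numeric;
BEGIN
    UPDATE account_data
       SET balance = balance + p_amount
     WHERE id = p_account
    RETURNING balance INTO v_balance;
    RETURN v_balance;
END;
$$;
```

With this setup the client code only ever mentions `accounts` and `deposit(...)`, so the `account_data` table can later be split or reorganized behind them.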
All in all this is a wide question and the specifics will be more important than generalities.

Does it make sense to use an OR-mapper?
I am putting this question out there on Stack Overflow because this is the best place I know of to find smart developers willing to give their assistance and opinions.
My reasoning is as follows:
1.) Where does the SQL belong?
a.) In every professional project I have worked on, security of the data has been a key requirement. Stored Procedures provide a natural gateway for controlling access and auditing.
b.) Issues with Applications in production can often be resolved between the tables and stored procedures without putting out new builds.
2.) How do I control the SQL that is generated? I would be trusting the ORM's parse trees to generate efficient SQL.
I have quite a bit of experience optimizing SQL in SQL-Server and Oracle, but would not feel cheated if I never had to do it again. :)
3.) What is the point of using an OR-Mapper if I am getting my data from stored procedures?
I have used the repository pattern with a homegrown generic data access layer.
If a collection needed to be cached, I cached it. I also have experience using EF on a small CRUD application, and experience helping to tune an NHibernate application that was having performance issues. So I am a little biased, but willing to learn.
For the past several years we have all been hearing a lot of respectable developers advocating the use of specific OR-Mappers (Entity-Framework, NHibernate, etc...).
Can anyone tell me why someone should move to an ORM for mainstream development on a major project?
edit: http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html seems to have a strong discussion on this topic but it is out of date.
Yet another edit:
Everyone seems to agree that Stored Procedures are to be used for heavy-duty enterprise applications, due to their performance advantage and their ability to add programming logic nearer to the data.
I am seeing that the strongest argument in favor of OR mappers is developer productivity.
I suspect a large motivator for the ORM movement is developer preference towards remaining persistence-agnostic (don’t care if the data is in memory [unless caching] or on the database).
ORMs seem to be outstanding time-savers for local and small web applications.
Maybe the best advice I am seeing is from client09: to use an ORM setup, but use Stored Procedures for the database intensive stuff (AKA when the ORM appears to be insufficient).
I was pro-SP for many, many years and thought it was the ONLY right way to do DB development, but the last 3-4 projects I have done were completed in EF 4.0 without SPs, and the improvement in my productivity has been truly awe-inspiring - I can do things in a few lines of code now that would have taken me a day before.
I still think SP's are important for some things, (there are times when you can significantly improve performance with a well chosen SP), but for the general CRUD operations, I can't imagine ever going back.
So the short answer for me is, developer productivity is the reason to use the ORM - once you get over the learning curve anyway.
A different approach... With the rise of the NoSQL movement, you might want to try an object/document database instead to store your data. That way you basically avoid the hell that is O/R mapping. Store the data the way your application uses it, and do transformations behind the scenes in a worker process that moves it into a more relational/OLAP format for further analysis and reporting.
Stored procedures are great for encapsulating database logic in one place. I've worked on a project that used only Oracle stored procedures, and am currently on one that uses Hibernate. We found that it is very easy to develop redundant procedures, as our Java developers weren't versed in PL/SQL package dependencies.
As the DBA for the project, I find that the Java developers prefer to keep everything in the Java code. You run into the occasional "Why don't I just loop through all the objects that were just returned?", which caused a number of "Why isn't the index taking care of this?" issues.
With Hibernate your entities can contain not only their linked database properties, but can also contain any actions taken upon them.
For example, we have a Task Entity. One could Add or Modify a Task among other things. This can be modeled in the Hibernate Entity in Named Queries.
So I would say go with an ORM setup, but use procedures for the database intensive stuff.
A downside of keeping your SQL in Java is that you run the risk of developers using non-parameterized queries leaving your app open to a SQL Injection.
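The same injection risk exists for dynamic SQL built inside the database. As a sketch in SQL Server's T-SQL (the `Users` table and `@userInput` variable are hypothetical), here is the unsafe concatenation next to the parameterized alternative:

```sql
-- Vulnerable: user input is concatenated into the SQL text, so a value
-- like  ' OR 1=1 --  changes the meaning of the query.
-- EXEC('SELECT * FROM Users WHERE Name = ''' + @userInput + '''');

-- Safe: sp_executesql passes the value as a typed parameter, so its
-- content can never be interpreted as SQL.
DECLARE @userInput nvarchar(100) = N'O''Brien';
EXEC sp_executesql
     N'SELECT * FROM Users WHERE Name = @name',
     N'@name nvarchar(100)',
     @name = @userInput;
```

The principle is the same whether the SQL lives in Java (use `PreparedStatement` placeholders) or in the database itself.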
The following is just my private opinion, so it's rather subjective.
1.) I think that one needs to differentiate between local applications and enterprise applications. For local and some web applications, direct access to the DB is okay. For enterprise applications, I feel that the better encapsulation and rights management makes stored procedures the better choice in the end.
2.) This is one of the big issues with ORMs. They are usually optimized for specific query patterns, and as long as you stick to those, the generated SQL is typically of good quality. However, for complex operations which need to be performed close to the data to remain efficient, my feeling is that hand-written SQL is still the way to go, and in this case the code goes into SPs.
3.) Dealing with objects as data entities is also beneficial compared to direct access to "loose" datasets (even if those are typed). Deserializing a result set into an object graph is very useful, no matter whether the result set was returned by a SP or from a dynamic SQL query.
If you're using SQL Server, I invite you to have a look at my open-source bsn ModuleStore project, it's a framework for DB schema versioning and using SPs via some lightweight ORM concept (serialization and deserialization of objects when calling SPs).

SQL Server CLR stored procedures in data processing tasks - good or evil?

In short - is it a good design solution to implement most of the business logic in CLR stored procedures?
I have read much about them recently but I can't figure out when they should be used, what are the best practices, are they good enough or not.
For example, my business application needs to
parse a large fixed-length text file,
extract some numbers from each line in the file,
according to these numbers apply some complex business rules (involving regex matching, pattern matching against data from many tables in the database and such),
and as a result of this calculation update records in the database.
There is also a GUI for the user to select the file, view the results, etc.
This application seems to be a good candidate to implement the classic 3-tier architecture: the Data Layer, the Logic Layer, and the GUI layer.
The Data Layer would access the database
The Logic Layer would run as a WCF service and implement the business rules, interacting with the Data Layer
The GUI Layer would be a means of communication between the Logic Layer and the User.
Now, thinking of this design, I can see that most of the business rules may be implemented in a SQL CLR and stored in SQL Server. I might store all my raw data in the database, run the processing there, and get the results. I see some advantages and disadvantages of this solution:
Pros:
The business logic runs close to the data, meaning less network traffic.
Processing all data at once, possibly utilizing parallelism and an optimal execution plan.
Cons:
Scattering of the business logic: some part is here, some part is there.
Questionable design solution, may encounter unknown problems.
Difficult to implement a progress indicator for the processing task.
I would like to hear all your opinions about SQL CLR. Does anybody use it in production? Are there any problems with such design? Is it a good thing?
I do not do it - CLR in SQL Server is great for many things (calculating hashes, doing string manipulation that SQL is just bad at, regexes to validate field values, etc.), but complex logic, IMHO, has no business in the database.
It is a single point of performance problems and also VERY expensive to scale up. Plus, either I put all the logic in there or - well - I have a serious maintenance problem.
Personally I prefer to have business functionality not dependent on the database. I only use CLR stored procedures when I need advanced data querying (to produce a format that is not easy to do in SQL). Depending on what you are doing, I tend to get better performance results with standard stored procs anyway, so I personally only use them for my advanced tasks.
My two cents.
HTH.
Generally, you probably don't want to do this unless you can get a significant performance advantage or there is a compelling technical reason to do it. An example of such a reason might be a custom aggregate function.
Some good reasons to use CLR stored procedures:
You can benefit from a unique capability of the technology such as a custom aggregate function.
You can get a performance benefit from a CLR Sproc - perhaps a fast record-by-record processing task where you can read from a fast forward cursor, buffer the output in core and bulk load it to the destination table in batches.
You want to wrap a bit of .Net code or a .Net library and make it available to SQL code running on the database server. An example of this might be the Regex matcher from the OP's question.
You want to cheat and wrap something unmanaged and horribly insecure so as to make it accessible from SQL code without using extended stored procedures (XPs). Microsoft has stated that XPs are deprecated, and many installations disable them for security reasons.
From time to time you don't have the option of changing the client-side code (perhaps you have an off-the-shelf application), so you may need to initiate external actions from within the database. In this case you may need to have a trigger or stored procedure interact with the outside world, perhaps querying the status of a workflow, writing something out to the file system or (more extremely) posting a transaction to a remote mainframe system through a screen-scraper library.
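As a sketch of the ".NET library wrapper" case above, this is roughly how a CLR regex helper gets exposed to T-SQL; the assembly path, class name, and table name here are hypothetical:

```sql
-- Register the compiled .NET assembly with SQL Server.
CREATE ASSEMBLY RegexLib
    FROM 'C:\libs\RegexLib.dll'
    WITH PERMISSION_SET = SAFE;
GO
-- Bind a SQL function to a static method Utils.RegexMatch(string, string)
-- assumed to exist in that assembly.
CREATE FUNCTION dbo.RegexMatch(@input nvarchar(max), @pattern nvarchar(400))
RETURNS bit
AS EXTERNAL NAME RegexLib.[Utils].RegexMatch;
GO
-- Now plain SQL can do regex filtering server-side:
SELECT *
  FROM dbo.ImportedLines
 WHERE dbo.RegexMatch(LineText, N'^[0-9]{8}') = 1;
```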
Bad reasons to use CLR stored procs:
Minor performance improvements on something that would normally be done in the middle tier. Note that disk traffic is likely to be much slower than network traffic unless you are attempting to stream huge amounts of data across a network connection.
CLR sprocs are cool and you want to put them on your C.V.
Can't write set-oriented SQL.

Which is most flexible and why: db views, db tables, stored procedures, or objects in tables?

I want to know the best practice for using db views, db tables, stored procedures and objects in tables... Which of these is most flexible and why? Can you explain?
Each tool has its uses. Your choices will depend on the nature of the application, and its security, performance, and agility requirements.
Nowadays many programmers use Data Access Layers (DALs) for this sort of thing. Many DALs allow you to specify views and stored procedures to call. But you can also run queries against the tables directly, without the need for stored procedures or views.
Unless you are using an object database, you will be dealing with tables rather than objects. Most applications nowadays use table-based database systems, because they are so common, and you can use DALs to manage the object-relational impedance mismatch.
Stored procedures are used when high-performance is needed, and programmatic things need to be accomplished on the database itself (the addition of a timestamp value perhaps, or the addition/subtraction of child records). A good DAL will provide high performance without necessarily requiring the use of stored procedures.
Views are used to manage the interface between the database and the consumer of the data. In particular, the data can be filtered for security purposes. In large database scenarios, a DBA designs and creates the tables and manages the views and stored procedures that the user is allowed to use to access the data.
If you are looking for ultimate flexibility, most of what you need to do can be accomplished in the DAL, without the need for views or stored procedures. But again, it depends on what your application's requirements are. I would say that the larger your application and user base is, the more likely you are to use views and stored procedures in your application.
I would say that, for the most part, stored procedures are a relic of the '90s:
completely database-dependent
bad language choice for general purpose programming, regardless if it's plpgsql, t-sql or something else
hard to debug
low code scalability, a problem shared with any procedural programming language
code versioning issues
That's not to say that they (like triggers, views and rules) have nothing to offer: large-scale reporting and data aggregation is one example of a task they handle fairly well. For the rest, logic is better placed in the business logic layer (a service, domain entities... whatever), where a variety of tools and more advanced programming paradigms are available.
Ditto for views and triggers.
In e.g. a Java environment, JPA does much better 90+% of the time:
learn one query language and apply it to any database
the business logic is more focused, in one place in the application, the BLL
the code is easier to read and write and it's easier to find people who understand it
it's possible to express logic spanning multiple databases in a single unit of code
...and the list goes on.

What's the benefit of using a lot of complex stored procedures

For a typical 3-tiered application, I have seen that in many cases a lot of complex stored procedures are used in the database. I cannot quite see the benefit of this approach. In my personal understanding, this approach has the following disadvantages:
Transactions become coarse.
Business logic goes into database.
Lots of computation is done in the database server, rather than in the application server. Meanwhile, the database still needs to do its original work: maintain data. The database server may become a bottleneck.
I can guess there may be two benefits:
Changing the business logic without recompiling. But SPs are much harder to maintain and test than Java/C# code.
Reducing the number of DB connections. However, in the common case, the database bottleneck is hard disk I/O rather than network I/O.
Could anyone please tell me the benefits of using a lot of stored procedures rather than letting the work be done in business logic layer?
Basically, the benefit is #2 of your problem list - if you do a lot of processing in your database backend, then it's handled there and doesn't depend on the application accessing the database.
Sure - if your application does all the right things in its business logic layer, things will be fine. But as soon as a second and a third application need to connect to your database, suddenly they too have to make sure to respect all the business rules etc. - or they might not.
Putting your business rules and business logic in the database ensures that no matter how an app, a script, a manager with Excel accesses your database, your business rules will be enforced and your data integrity will be protected.
That's the main reason to have stored procs instead of code-based BLL.
Also, by using views for reads and stored procs for updates/inserts, the DBA can remove any direct permissions on the underlying tables. Your users no longer need rights on the tables themselves, and thus the data in your tables is better protected from inadvertent or malicious changes.
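A sketch of that locked-down permission setup (object and role names are made up for illustration, and exact GRANT syntax varies slightly by DBMS):

```sql
-- The application role gets no direct table access at all.
REVOKE ALL ON orders FROM app_role;

-- Reads go through a view...
GRANT SELECT ON v_orders TO app_role;

-- ...and writes go through stored procedures, so every change passes
-- through code the DBA controls and can audit.
GRANT EXECUTE ON add_order TO app_role;
GRANT EXECUTE ON update_order TO app_role;
```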
Using a stored proc approach also gives you the ability to monitor and audit database access through the stored procs - no one will be able to claim they didn't alter that data - you can easily prove it.
So all in all: the more business-critical your data, the more protection layers you want to build around it. That's what stored procs are for - and they don't need to be complex, either; most of them can be generated from the table structure using code generation, so it's not a big typing effort.
Don't fear the DB.
Let's also not confuse business logic with data logic which has its rightful place at the DB.
Good systems designers will encompass flexible business logic through data logic, i.e. abstract business rule definitions which can be driven by the (non)existence of data rows or by their attributes.
Just FYI, the most successful and scalable "enterprise/commercial" software implementations I have worked with put all projection queries into views and all data management either into DB procedures or into triggers on staged tables.
The network between the app server and the SQL server is very often the bottleneck.
Stored procedures are needed when you need to do complex queries.
For example, say you want to collect some data about an employee by surname, and the data in the DB looks like a tree: you have 3 records about this employee in table A, 10 records in table B for each record in table A, and 100 records in table C for each record in table B - and you want only 5 special records from table C about that employee. Without stored procedures you get a lot of query traffic between the app server and the SQL server, and a lot of code in the app server. With a stored procedure that accepts the employee surname, fetches those 5 records and returns them to the app server, you 1) decrease traffic by hundreds of times, and 2) greatly simplify the app server code.
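A sketch of such a procedure for the tree described above (MySQL-style syntax; the join columns and the "special" flag are assumptions, since the question doesn't define the schema):

```sql
-- One round trip instead of walking the A -> B -> C tree from the app
-- server: the joins run on the database and only the wanted rows
-- cross the network.
CREATE PROCEDURE get_special_records(IN p_surname VARCHAR(100))
BEGIN
    SELECT c.*
      FROM A a
      JOIN B b ON b.a_id = a.id
      JOIN C c ON c.b_id = b.id
     WHERE a.surname = p_surname
       AND c.is_special = 1;   -- assumed marker for the 5 wanted rows
END;
```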
The life time of our data exceeds that of our applications. Also data gets shared between applications. So many applications will insert data into the database, many applications will retrieve data from it. The database is responsible for the completeness, integrity and correctness of the data. Therefore it needs to have the authority to enforce the business rules relating to the data.
Taking your specific points:
Transactions are Units of Work. I fail to see why implementing transactions in stored procedures should change their granularity.
Business logic which applies to the data belongs with the data: that maximises cohesion.
It is hard to write good SQL and to learn to think in sets, so it may appear that the database is the bottleneck. In fact, if we are undertaking lots of work which relates to the data, the database is probably the most efficient place to do it.
As for maintenance: if we are familiar with PL/SQL, T-SQL, etc maintenance is easier than it might appear from the outside. But I concede that tool support for things like refactoring lags behind that of other languages.
You listed one of the main ones: putting business logic in the DB often gives the impression of making it easier to maintain.
Generally, complex SP logic in the DB allows for a cheaper implementation of the application code, which may be beneficial if it's a transitional application (say, being ported from legacy code), if the code needs to be implemented in several languages (for instance, to market on different platforms or devices), or because the problem is simpler to solve in the DB.
One other reason is that there is often a general "best practice" of encapsulating all access to the DB in SPs, for security or performance reasons. Depending on your platform and what you are doing with it, this may or may not be marginally true.
I don't think there are any. You are very correct that moving the BL to the database is bad, but not for everything. Try taking a look at Domain-Driven Design; it is the antidote to massive numbers of SPROCs. I think you should be using your database as somewhere to store your business objects, nothing more.
However, SPROCs can be much more efficient on certain, simple functions. For instance, you might want to increase the salary to every employee in your database by a fixed percentage. This is quicker to do via a SPROC than getting all the employees from the db, updating them and then saving them back.
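The salary example as a set-based sketch: one statement on the server versus fetching, mutating, and saving every employee object through the ORM (table and column names are assumed):

```sql
-- The entire raise is a single UPDATE; the rows never leave the server.
CREATE PROCEDURE raise_salaries(IN p_percent DECIMAL(5,2))
BEGIN
    UPDATE employees
       SET salary = salary * (1 + p_percent / 100);
END;

-- e.g. CALL raise_salaries(10.00);  -- a 10% raise for everyone
```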
I worked on a project where literally everything was done at the database level. We wrote a lot of stored procedures and did a lot of business validation/logic in the database. Most of the time it became a big overhead for us to debug.
The advantages, I felt, were:
Taking advantage of the full DB feature set.
Database-intensive activities like lots of inserts/updates can be done better at the DB level: call an SP and let it do all the work instead of hitting the DB several times.
New DB servers can accommodate complex operations, so they no longer see this as a bottleneck. Oh yeah, we used Oracle.
Looking at it now, I think a few things could have been done better at the application level and fewer at the DB level.
It depends almost entirely on the context.
Doing work on the server rather than on the clients is generally a bad idea, as it makes your server less scalable. However, you have to balance this against the expected workload (if you know you will only ever have 100 users in a closed environment, you may not need a scalable server) and against network traffic costs (if you have to read a lot of data to apply calculations/processes to, it can be cheaper/faster overall to run those calculations on the server and only send the results over the net).
Also, if you have custom client applications (as opposed to web browsers etc) it makes it very easy to push updates out to your clients, because you don't need to recompile and deploy the client code, you simply upgrade the database stored procedures.
Of course, using stored procedures rather than executing dynamically compiled SQL statements can be more efficient (the procedure is precompiled, and the SQL text doesn't need to be sent to the server), and it aids encapsulation, giving the database better integrity/security. But by the sound of it, you're talking about masses of business logic, not simple efficiency and security measures.
As with most things, a sensible compromise/balance is needed. Stored Procedures should be used enough to enhance efficiency and security, but you don't want your server to become unscalable.
"there are following disadvantages on this approach:
...
Business logic goes into database."
Insofar as by "business logic" you mean "enforcement of business rules", the DBMS is EXACTLY where business logic belongs.

If all my SQL Server database access is done through stored procedures

If all my SQL Server database access is done through stored procedures, and I plan on continuing that practice, is using LINQ to SQL and/or the Entity Framework for future projects an unnecessary layer of complexity that doesn't add much value?
Related question: is Microsoft trying to steer developers away from relying on stored procs for data access?
No. LINQ2SQL adds a lot of value in terms of being able to easily map your database entities to classes in your code and work with those classes using native language constructs. You can easily map the CRUD operations of the generated entity class onto your stored procedures if you want. I find that some things no longer require stored procedures to work easily, so I have moved away from using them, but you are not forced to. Essentially, LINQ2SQL can replace much, if not all, of your DAL, saving you from having to write that code.
I use LINQ2SQL for calling my stored procedures as well, just because it's so fast to generate .NET code I can call from my app - it's just drag and drop, basically done in seconds. Still, I think you need to ask yourself how much time you spend maintaining those stored procedures. If it's a lot, you would save time by using LINQ2SQL for your CRUD calls. I use sprocs only when doing multiple-step operations on the database.
LINQ to SQL supports mapping lots of operations (including CRUD) to stored procedures, but you lose composability - i.e. you can't just add (exp).Where(x=>x.IsActive). One option there is table-valued functions (UDFs) in place of stored procedures that query data; this also presents a more rigid metadata model (rather than SET FMTONLY ON, which is hit'n'miss).
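A sketch of the table-valued-function alternative (SQL Server syntax; the schema is hypothetical): unlike a stored procedure's opaque result set, the function's output is still a table the caller's query can compose over:

```sql
-- Inline TVF: the body is a single SELECT the optimizer can inline.
CREATE FUNCTION dbo.OrdersForCustomer(@customerId int)
RETURNS TABLE
AS RETURN (
    SELECT OrderId, OrderDate, Total
      FROM dbo.Orders
     WHERE CustomerId = @customerId
);
GO
-- Extra predicates compose at the database, which a proc can't offer:
SELECT *
  FROM dbo.OrdersForCustomer(42)
 WHERE OrderDate >= '20110101';
```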
That way, your query methods are composable at the database; but note that Entity Framework does not support this, even though LINQ-to-SQL does.
Just to address your related question: The benefits of using stored procedures are not as prominent as they once were. Though I would not go as far as saying stored procedures are evil, as some have said in the past (http://www.tonymarston.net/php-mysql/stored-procedures-are-evil.html - this is a very nice article though), I would say that the use of dynamic sql, as long as it's done in a well defined structured way, is perfectly acceptable these days.
I don't think Microsoft is trying to steer developers away from using stored procs, but rather that dynamic SQL should be seen as an acceptable option.
You lose the ability to write LINQ queries, which is the best part of LINQ to SQL.
