For the first time in years I've been doing some T-SQL programming in SQL Server 2008 and had forgotten just how bad the language really is:
Flow control (all the begin/end stuff) feels clunky
Exception handling is poor. Exceptions don't bubble up the way they do in every other language, there's no re-throwing unless you code it yourself, and the raiserror function isn't even spelt correctly (this caused me some headaches!)
String handling is poor
The only sequence type is a table. I had to write a function to split a string on a delimiter, storing the parts in a table along with a value indicating their position in the sequence (sketched below).
If you need to do a lookup in the stored proc, then manipulating the results is painful. You either have to use cursors or hack together a while loop with a nested lookup if the results contain some sort of ordering column.
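For example, the only way I've found to "re-throw" is to capture the error details in the CATCH block and raise them again myself (the inner proc name here is invented):

BEGIN TRY
    EXEC dbo.SomeProc;  -- inner call that may fail
END TRY
BEGIN CATCH
    -- manual "re-throw": capture the details, then raise them again
    DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE(),
            @sev INT = ERROR_SEVERITY(),
            @state INT = ERROR_STATE();
    RAISERROR(@msg, @sev, @state);
END CATCH

And a minimal sketch of the kind of split function I mean, returning the parts keyed by their position (names are illustrative):

CREATE FUNCTION dbo.SplitString (@input NVARCHAR(MAX), @delim NCHAR(1))
RETURNS @parts TABLE (Position INT IDENTITY(1,1), Part NVARCHAR(MAX))
AS
BEGIN
    DECLARE @pos INT = CHARINDEX(@delim, @input);
    WHILE @pos > 0
    BEGIN
        INSERT INTO @parts (Part) VALUES (LEFT(@input, @pos - 1));
        SET @input = SUBSTRING(@input, @pos + 1, LEN(@input));
        SET @pos = CHARINDEX(@delim, @input);
    END;
    INSERT INTO @parts (Part) VALUES (@input); -- trailing segment
    RETURN;
END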
I realize I could code my stored procedures in C#, but this would require enabling CLR integration on the server, which isn't an option at my workplace.
Does anyone know if there are any alternatives to T-SQL within SQL Server, or if there are any plans to introduce one? Surely there's got to be a more modern alternative...
PS: This isn't intended to start a flame-war, I'm genuinely interested in what the options are.
There is nothing wrong with T-SQL; it does the job it was intended for (except perhaps for the addition of control flow structures, but I digress!).
Perhaps take a look at LINQ? You can write CLR stored procedures, but I don't recommend this unless it's for some feature that's missing (or for heavy string handling).
All other database stored procedure languages (PL/SQL, SQL/PSM) have about the same issues. Personally, I think these languages are exactly right for what they are intended for: they are best used for data-driven logic, especially if you want to reuse it across multiple applications.
So I guess my counter-question to you is: why do you want your program to run as part of the database server process? Isn't what you're trying to do better solved at the application or middleware level? There you can use any language or data-processing tool of your choosing.
From my point of view, the only alternative to T-SQL within SQL Server is to not use SQL Server.
Regarding your point about handling strings with delimiters: where do these strings come from?
You could try Integration Services and SSIS packages for converting data from one form to another.
There is also a nice way to access non-SQL data over linked servers.
I started to learn functions and stored procedures in Microsoft SQL Server and noticed that everything they do can also be done with plain queries. I'm sure they exist for a reason, so I'd like to ask:
What can be done using functions / procedures that's impossible to do with a query?
In which cases one should use procedures/functions and not queries?
There's nothing magic about functions or procedures - there's nothing you cannot do in an ad-hoc query as well.
They can be used to reuse some code - write the function once and use it everywhere, instead of writing the same T-SQL code over and over and over again.
And they can be used to combine code that belongs together (like withdraw amount x from account #1 and deposit it into account #2) into a single, reusable procedure which can also handle transactions internally.
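For example, a minimal sketch of that transfer procedure (all table and column names invented for illustration):

CREATE PROCEDURE dbo.TransferFunds
    @FromAccount INT, @ToAccount INT, @Amount DECIMAL(18,2)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.Accounts SET Balance = Balance - @Amount WHERE AccountId = @FromAccount;
        UPDATE dbo.Accounts SET Balance = Balance + @Amount WHERE AccountId = @ToAccount;
        COMMIT TRANSACTION;  -- both updates succeed or neither does
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);  -- surface the original error to the caller
    END CATCH
END

Callers just EXEC dbo.TransferFunds @FromAccount = 1, @ToAccount = 2, @Amount = 100.00 instead of repeating the two updates and the transaction handling everywhere.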
So basically: nothing magic - but using functions and procedures (like in any other programming language) can help reuse and centralize some code, and make your life easier.
I have an application that uses PostgreSQL but also interacts with a third-party-controlled database on MSSQL. The data are sometimes tied together closely enough that it becomes desirable to do things like:
select thing_from_pg, thing_from_ms_crossover_function(thing_from_pg) -- etc
Currently I implement thing_from_ms_crossover_function in plperl. Is there a way to do this in plpgsql or something, so that I don't need to start a plperl interpreter for such cases?
Another option is obviously to access both databases from my client app, but that becomes far less convenient than the view syntax above.
You have two basic options, or rather three.
The first is to use DBI-Link and then access this via your pl/pgsql or pl/perl function. The nice thing about DBI-Link is that it is relatively old and mature. If it works for you, I would start there.
The second option is to use foreign data wrappers (a rough sketch follows below).
The third option is to write a more general framework in something like pl/perl that you can call from pl/pgsql. However at that point you are basically looking at re-inventing DBI-Link so I think you are better off starting with DBI-Link and modifying it as needed.
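For the foreign data wrapper route, the setup looks roughly like this (PostgreSQL syntax; tds_fdw is one wrapper that can reach MSSQL, and every name below is illustrative, not from your setup):

CREATE EXTENSION tds_fdw;

CREATE SERVER mssql_srv FOREIGN DATA WRAPPER tds_fdw
    OPTIONS (servername 'mssql.example.com', port '1433', database 'vendordb');

CREATE USER MAPPING FOR CURRENT_USER SERVER mssql_srv
    OPTIONS (username 'app_user', password 'secret');

CREATE FOREIGN TABLE ms_things (id integer, name text)
    SERVER mssql_srv OPTIONS (table_name 'dbo.things');

-- now plain SQL (and pl/pgsql) can join against the MSSQL table directly,
-- with no plperl interpreter involved
SELECT t.thing_from_pg, m.name
FROM local_table t
JOIN ms_things m ON m.id = t.ms_id;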
I need to profile reference fields in a database to understand the patterns they are composed of. This needs to be done at a character level as there will be no spaces or punctuation in the reference fields.
As an example I'm looking for a solution that will take input like:
ABA1235DV6778
ABA1235DV6788
ABA2335DV6778
And suggest patterns like:
ABA\d\d35DV67\d\d
This will be used later to validate those reference fields, once I understand the permissible values in those columns.
I have looked at the profiling functionality in SSIS but it seems to lack granularity. Does anybody know how I can tune the profiling in SSIS 2008 or have an efficient function for SQL Server 2008 that can be used to achieve this?
Any help would be greatly appreciated,
Niall
It's not really clear from your post exactly what logic you want to apply to the strings. I'm guessing you want to use some form of edit distance calculation to identify similar strings, then generate a regular expression that matches them all. Those are typically tasks that would be implemented in an external program written in an appropriate language, not in SSIS or SQL Server. It is certainly not something you can do with pre-existing SSIS functionality.
So I would forget SSIS for now and work out the best way to implement your algorithm in .NET (or whatever other language you're comfortable with). Once you've done that you can decide whether to:
Write a self-contained executable and call it from an Execute Process task
Write a .NET DLL and use it in a Script Task, Script Component or CLR stored procedure
Write your own custom SSIS component
Write a complete program instead of using SSIS
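That said, if you want a rough first pass inside SQL Server before committing to custom code, one cheap profiling trick is to collapse every digit to a single placeholder and count the resulting signatures. It's coarser than what you asked for (ABA1235DV6778 becomes ABA9999DV9999, so it won't spot the constant 35 and 67), but it surfaces the gross shapes in the column. A sketch, assuming a hypothetical dbo.Refs(RefValue) table:

SELECT p.Signature, COUNT(*) AS Occurrences
FROM dbo.Refs r
CROSS APPLY (
    -- collapse every digit to '9'; SQL Server 2008 has no TRANSLATE,
    -- so nested REPLACEs have to do
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
           REPLACE(REPLACE(REPLACE(REPLACE(
               r.RefValue, '0','9'),'1','9'),'2','9'),'3','9'),'4','9'),
               '5','9'),'6','9'),'7','9'),'8','9') AS Signature
) p
GROUP BY p.Signature
ORDER BY Occurrences DESC;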
We have a database with ~100K business objects in it. Each object has about 40 properties which are stored amongst 15 tables. I have to get these objects, perform some transforms on them and then write them to a different database (with the same schema.)
This is ADO.Net 3.5, SQL Server 2005.
We have a library method to write a single property. It figures out which of the 15 tables the property goes into, creates and opens a connection, determines whether the property already exists and does an insert or update accordingly, and closes the connection.
My first pass at the program was to read an object from the source DB, perform the transform, and call the library routine on each of its 40 properties to write the object to the destination DB. Repeat 100,000 times. Obviously this is egregiously inefficient.
What are some good designs for handling this type of problem?
Thanks
This is exactly the sort of thing that SQL Server Integration Services (SSIS) is good for. It's documented in Books Online, same as SQL Server is.
Unfortunately, I would say that you need to forget your client-side library, and do it all in SQL.
How many times do you need to do this? If only once, and it can run unattended, I see no reason why you shouldn't reuse your existing client code. Automating the work of human beings is what computers are for. If it's inefficient, I know that sucks, but if you're going to do a week of work setting up an SSIS package, that's inefficient too. Plus, your client-side solution could contain business logic or validation code that you'd have to remember to carry over to SQL.
You might want to research CREATE ASSEMBLY, moving your client code across the network to reside on your SQL box. This will avoid network latency, but could destabilize your SQL Server.
Bad news: you have many options
use flatfile transformations: extract all the data into flat files, manipulate them using grep, awk, sed, C or Perl into the required insert/update statements, and execute those against the target database.
PRO: fast. CON: extremely ugly, and a nightmare for maintenance; don't do this if you need it for longer than a week or more than a couple dozen executions.
use pure SQL: I don't know much about SQL Server, but I assume it has a way to access one database from within the other, so one of the fastest ways to do this is to write it as a collection of insert/update/merge statements fed by select statements.
PRO: fast, one technology only. CON: requires a direct connection between the databases, and you might hit the limits of SQL (or of the available SQL knowledge) pretty fast, depending on the kind of transformation.
use T-SQL, or whatever iterative language the database provides; everything else is similar to the pure SQL approach.
PRO: pretty fast, since you don't leave the database. CON: I don't know T-SQL, but if it is anything like PL/SQL, it is not the nicest language for complex transformations.
use a high-level language (Java, C#, VB...): load your data into proper business objects, manipulate those, and store them in the database. Pretty much what you seem to be doing right now, although it sounds like there are better ORMs available, e.g. NHibernate.
PRO: even complex manipulations can be implemented in a clean, maintainable way, and you can use all the fancy tools (good IDEs, testing frameworks, CI systems) to support you while developing the transformation.
CON: it adds a lot of overhead (retrieving the data out of the database, instantiating the objects, and marshalling the objects back into the target database). I'd go this way if it is a process that is going to be around for a long time.
use an ETL tool: there are specialized tools for extracting, transforming and loading data. They often support various databases and have many strategies readily available for deciding whether an update or an insert is in order.
PRO: sorry, you'll have to ask somebody else for that; so far I have nothing but bad experience with these tools.
CON: a highly specialized tool that you need to master. In my personal experience: slower in implementation and execution of the transformation than handwritten SQL, and a nightmare for maintainability, since everything is hidden away in proprietary repositories; for IDE, version control, CI and testing you are stuck with whatever the tool provider gives you, if anything.
Building on the high-level-language option, you could further glorify the architecture by using messaging and web services, which could be relevant if you have more than one source database or more than one target database. Or you could manually implement a multithreaded transformer in order to gain throughput. But I guess I am leaving the scope of your question.
I'm with John: SSIS is the way to go for any repeatable process that imports large amounts of data. It should be much faster than the 30 hours you are currently getting. You could also write pure T-SQL code to do this if the two databases are on the same server or are linked servers. If you go the T-SQL route, you may need a hybrid of set-based and looping code that runs in batches (of, say, 2000 records at a time) rather than locking up the table for the whole time a large insert would take.
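For example, the batched T-SQL hybrid could look roughly like this (assumes the databases are on the same server and the objects are keyed by an integer Id; every name is invented):

DECLARE @batchSize INT, @minId INT, @maxId INT;
SET @batchSize = 2000;
SELECT @minId = MIN(Id), @maxId = MAX(Id) FROM SourceDb.dbo.Objects;

WHILE @minId <= @maxId
BEGIN
    -- update properties of objects that already exist in the target
    UPDATE t
       SET t.Prop1 = s.Prop1      -- ...repeat for the remaining properties
      FROM TargetDb.dbo.Objects t
      JOIN SourceDb.dbo.Objects s ON s.Id = t.Id
     WHERE s.Id BETWEEN @minId AND @minId + @batchSize - 1;

    -- insert the ones that don't exist in the target yet
    INSERT INTO TargetDb.dbo.Objects (Id, Prop1)
    SELECT s.Id, s.Prop1
      FROM SourceDb.dbo.Objects s
     WHERE s.Id BETWEEN @minId AND @minId + @batchSize - 1
       AND NOT EXISTS (SELECT 1 FROM TargetDb.dbo.Objects t WHERE t.Id = s.Id);

    SET @minId = @minId + @batchSize;
END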
LINQ simplifies database programming, no doubt, but does it have a downside? Inline SQL requires one to communicate with the database in a certain way that opens the database to injections. Inline SQL must also be syntax-checked, have a plan built, and then executed, which takes precious cycles. Stored procedures have also been a rock-solid standard in great database application programming. Many programmers I know use a data layer that simplifies development, however, not to the extent LINQ does. Is it time to give up on SPs and go LINQ?
LINQ to SQL actually presents some alarming performance problems in the database. Basically, it creates multiple execution plans based on the length of the parameter you are using. I posted about it a while back on my blog LINQ to SQL may cause performance problems.
Now, is that to say that LINQ doesn't have a place? Hardly. LINQ definitely has a place in the development toolkit, just like stored procedures. Ultimately, you want to use stored procedures when performance is absolutely necessary and use an ORM tool in any other situation.
As far as inline SQL goes, there are ways to execute inline SQL so that the plan is only built once and is never recompiled. Most ORMs should take care of this aspect of performance tuning as well and using these methods is usually the safest way to execute your SQL since it forces you to use parameterized queries.
Like most database solutions, the right answer depends on the problem you're trying to solve. If you favor development speed over database/application performance, then using LINQ or another DAL/ORM tool is the best way to go. If you favor performance over ease of development, then using stored procedures and pure datasets is going to be your best bet. LLBLGen even provides a LINQ to LLBLGen layer so you can use LINQ to query LLBLGen's objects and have LLBLGen actually handle building your queries and avoid some of the downfalls of LINQ.
Your basic premise is flawed:
Inline SQL requires one to communicate with the database in a certain way that opens the database to injections.
No it doesn't. Hard-coding user-inputted values into a SQL statement does, but you could do that with stored procedures as well.
Parameterizing your queries guards against injection attacks, and inline SQL can be parameterized just as easily as stored procedures.
Inline SQL must also be syntax-checked, have a plan built, and then executed.
All SQL (SPs and inline) must be syntax-checked and have a plan built on its first call. Thereafter, the exact text of the request & the execution plan are cached. If another request with the exact same text (not counting parameters) is received, the cached execution plan is used.
So, if you hard-code values into inline SQL, the text won't match, and it will have to re-parse the query. However, if you use parameters, the text of the query will match, and you will get a cache hit. In which case, it doesn't matter whether the query is inline SQL or an SP.
In other words, the only problem with inline SQL is that it is easy to do something slow & insecure. But making inline SQL fast & secure is no more work than using an SP.
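To illustrate (table name invented): the first two statements below have different text, so each compiles its own plan, while the parameterized form reuses one cached plan for every value:

-- hard-coded values: two distinct statement texts, two plans
SELECT * FROM dbo.Orders WHERE CustomerId = 42;
SELECT * FROM dbo.Orders WHERE CustomerId = 43;

-- parameterized: identical text on every call, one cached plan
EXEC sp_executesql
    N'SELECT * FROM dbo.Orders WHERE CustomerId = @id',
    N'@id INT',
    @id = 42;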
Which brings us to LINQ, which always uses parameters, even if you hard-code the values into the LINQ statement, making "fast & secure" inline SQL trivial.
LINQ also has the advantage over SPs of keeping all your code in one spot, instead of scattered over two different machines.
If you're interested in benchmarking, Rico Mariani has an excellent 5-part study that covers the qualitative and quantitative differences.
He may be an MS guy, but he's known as a performance nut - his benchmarks are thorough and well thought out.
This is a performance run by Maximilian Beller. According to him, LINQ is much, much slower.
Read his comprehensive study
Just think about changing a column's name - now change the (n) SPs and (x) views.
Do everything that is expensive on the database (like searches, sorting, etc.) and you won't notice a problem.
Also, if you want to display a large grid without paging ... then use a dataset - that one is faster.
StackOverflow also uses linq2sql - do you see a problem? :)
Use an ORM - it's the way to go on most applications.
PS: also, about micro-benchmarks (like "let's select 10,000 rows with an ORM") - DON'T DO IT. That's not what you use an ORM for. If you want to select 10,000 rows, use ADO.
It depends on what you're doing. LINQ is going to be less efficient at the actual data/set manipulation than a real database. But you'll save a lot in not having to connect to the database over a network.
If your database is on the same machine or is formally 'well-connected', you're probably better off using it.
But if you're getting back a large result set from a remote db that could mean significant transmission time, or if it's a really short query that won't justify the overhead, LINQ would likely be better.
Because of the structure of LINQ to SQL, there is no possible way it can be faster than using raw SQL, either your own well-formed queries or a stored procedure. What LINQ buys you is not speed but type safety and organization; in short, most of the benefits that ORMs generally grant you.
LINQ to SQL is not about speed; it's about building a more maintainable software system. It's about all the stuff dedicated Software Engineers and Architects care about, stuff like loose coupling and layering.
That's not to say that you can't build some really unmaintainable code with LINQ -- nobody is keeping you from shooting yourself in the foot but you -- but done properly, LINQ can help tremendously. I'm not saying LINQ is a silver bullet, however. It has a host of issues that make it difficult to use in many enterprise situations -- which is why MS offers Entity Framework (ADO.NET 3.0). Of course, even that's not perfect given the recent EF Vote of No Confidence.
Is LINQ to SQL or even EF better than raw SQL? I'd say a resounding Hells Yeah. Are there other solutions that might work better? Maybe.