Hiring a SQL Server OLTP Specialist: what experience or requirements should I look for? [closed] - sql-server

I'm running into database performance issues with an OLTP project I'm working on. Another developer and I have reached the end of our accumulated performance knowledge, and we're looking for someone to join the team and help us speed up our application.
For some background: we've made schema changes to denormalize parts of the data, optimized every query, run multiple database tuning advisors to get our indexes right, and tuned SQL Server's configuration options.
We don't need somebody to come in and tell us joins can be slow and what deadlocking is, we need somebody who knows what to do after exhausting all the steps listed above.
Does anybody have any tips or experiences to share about hiring OLTP DBAs? What kind of questions can we ask the DBA during the interview process?
It's an odd situation to be in: we know we need somebody who knows more than the current team, but we don't know what questions to ask because we don't know what the next steps are. Does that make sense?

OK, this tells me something:
For some background: we've made schema changes to denormalize parts of the data, optimized every query, run multiple database tuning advisors to get our indexes right, and tuned SQL Server's configuration options.
You've already matched or exceeded what 90% of people who call themselves DBAs will be able to do.
The problem is that a lot of DBAs aren't really programmers; they are more on the system administration side of things. You need a DBA-programmer who is not only really good at T-SQL, but who knows your other programming language(s) as well.
I spend a significant portion of my time on database tuning issues like this, and the solution often involves significant redesigns all the way from the front end back to the database schema. You can't solve these problems in isolation, and without complete control (and total understanding) of the entire architecture, you'll never get the performance you need.
You might be the best person for the job, and it might be smarter for you to hire somebody to take busy work off your plate so you can concentrate on the OLTP performance problems.

You have to be very careful here: you can end up hiring a guru DBA, have him improve the database performance significantly, and still have issues with your application that are rooted in its architecture.
A few ideas:
Take the most complex query you optimized, give it to the candidate DBA (along with a QA environment to run it in), and have him/her optimize it again. Have them describe what they did and how they did it.
Make sure this person understands hardware, when you would use multiple filegroups, raid arrays, data partitioning, 64 vs 32 bit performance etc...
Look for someone who also has some software architecture background.
Ask them a few harder SQL Server questions, e.g.: What is the OVER clause? Are GUIDs good primary keys, and why or why not? Is an int identity column preferable?
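For reference, a minimal T-SQL sketch of the OVER clause (the table and columns are invented for illustration); a strong candidate should be able to explain both the windowed aggregate and the ranking function:
    SELECT
        OrderID,
        CustomerID,
        TotalDue,
        SUM(TotalDue) OVER (PARTITION BY CustomerID) AS CustomerTotal,          -- windowed aggregate, no GROUP BY needed
        ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS RecencyRank  -- ranking per customer
    FROM dbo.Orders;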

De-tune your DB back to roughly where it was before your own optimizations and give it to them. See if they can tune it to perform as well as, or better than, your changes did.
Ask why they chose that technique.
Take up references and find out about the environments they've come from which are most likely to match your own.

A good DBA will be able to tell you in the interview at a high level what steps they would take next. You should pay more attention to the thought processes here, rather than the solution to the problem. When the DBA has given some solutions, go back and quiz them and ask them why the problem should be solved in those ways.
This method will very quickly distinguish the men from the boys, as it were.

How close are you to the maximum performance of the DB? It is very easy to create an OLTP problem that is unsolvable with this technology. As Eric said, a total architecture redesign might be in order. More RAM, just add more RAM :)

Without seeing the database it is hard to say what the best way to optimize would be. Given what you have done, you likely need to hire a database designer, one with experience designing and tuning databases in the size range you have. In asking how they would approach the problem, see whether the candidate would first look at the poorly performing queries and run Profiler to see what is happening and to identify them. The person should be able to answer specific questions on parameter sniffing and how to avoid it, what methods can be used to avoid cursors, why statistics need to be updated, and what makes a query sargable.
There are some common things I would look at in performance tuning. Is the network maxed out (sometimes it isn't the database)? Is the overall design poorly thought out? Are you using SQL code that generally performs badly? If all your searches allow a wildcard as the first character, for instance, it isn't even possible to make them fast. If your joins are on multi-column natural keys, they are slower than they should be. Are you storing more than one piece of information in a field, causing lots of manipulation to get data back out? Are you using cursors? Are you using functions? Are you reusing code when you shouldn't be? Are you always returning only the minimum information needed? Are you closing connections? Are you getting deadlocks? Are your table rows too wide? Do you have too many records in particular tables (purging old records or moving them to an archive database can make a huge difference)? How much of your code is row-oriented rather than set-oriented? These are examples of the kinds of things an experienced database person would look at, and thus the kinds of things they should talk about in the interview.
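To make the sargability and leading-wildcard points concrete, a small hedged T-SQL sketch (table and column names invented):
    -- non-sargable: the function on the column and the leading wildcard defeat any index
    SELECT CustomerID FROM dbo.Customers WHERE YEAR(CreatedDate) = 2009;
    SELECT CustomerID FROM dbo.Customers WHERE LastName LIKE '%smith';
    -- sargable rewrites that an index on CreatedDate or LastName can actually seek on
    SELECT CustomerID FROM dbo.Customers WHERE CreatedDate >= '20090101' AND CreatedDate < '20100101';
    SELECT CustomerID FROM dbo.Customers WHERE LastName LIKE 'smith%';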
Some bad code examples (you know what you have already optimized) can give you a good idea of their approach to tuning. You want someone who is methodical and has depth of SQL knowledge.
There are some good books on performance tuning; I would suggest getting them and getting familiar with them before interviewing.

Related

How can I learn to make realistic assumptions about database performance?

I'm primarily a Java developer who works with Hibernate, and in some of my use cases I have queries which perform very slowly compared to what I expect. I've talked to the local DBAs and in many cases they claim the performance can't be improved because of the nature of the query.
However, I'm somewhat hesitant to take them at their word. What resources can I use to learn when I have to suck it up and find a different way of getting the information I want (or learn to live with the speed), and when I can call bullshit on the DBAs?
You are at an interesting juncture. Most Java developers use ORM tools because they don't know anything about databases, they don't want to learn anything about databases and especially they don't want to learn anything about the peculiarities of a specific proprietary DBMS. ORMs ostensibly shield us from all this.
But if you really want to understand how a query ought to perform you are going to have to understand how the Oracle database works. This is a good thing: you will definitely build better Hibernate apps if you work with the grain of the database.
Oracle's documentation set includes a volume on Performance Tuning. This is the place to start. Find out more. As others have said, the entry level tool is EXPLAIN PLAN. Another essential read is Oracle's Database Concepts Guide. A vital aspect of tuning is understanding the logical and physical architecture of the database. I also agree with DCooke's recommendation of Tom Kyte's book.
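For instance, the entry-level workflow with EXPLAIN PLAN looks roughly like this (the table and bind names are invented; DBMS_XPLAN ships with recent Oracle versions):
    EXPLAIN PLAN FOR
        SELECT * FROM orders WHERE customer_id = :cust_id;
    -- then display the plan the optimizer would use
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);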
Bear in mind that there are contractors and consultants who make a fine living out of Oracle performance tuning. If it was easy they would be a lot poorer. But you can certainly give yourself enough knowledge to force those pesky DBAs to engage with you properly.
DCookie, OMG, and APC all get +1 for their answers. I've been on the DBA side of Hibernate apps and as Tom Kyte puts it, there is no "fast=true" parameter that can take inefficient SQL and make it run faster. Once in a while I've been able to capture a problematic Hibernate query and use the Oracle SQL analyzer to generate a SQL Profile for that statement that improves performance. This profile is a set of hints that "intercept" the optimizer's generation of an execution plan and force it (with no change to the app) to a better one that would normally be overlooked by the optimizer for some reason. Still, these findings are the exception rather than the rule for poorly-performing SQL.
One thing you can do as a developer who presumably understands the data better than the ORM layer is to write efficient views to address specific query problems and present these views to Hibernate.
A very specific issue to watch out for is Oracle DATE (not TIMESTAMP) columns that end up in the Hibernate-generated queries and are compared to bind variables in a WHERE clause - the type mismatch with Java timestamp data types will prevent the use of indexes on these columns.
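A hedged illustration of that pitfall (table, column and bind names invented): if the bind arrives as a TIMESTAMP, the DATE column gets promoted and the index on it is typically ignored; casting the bind back to DATE keeps the comparison index-friendly.
    -- index on orders.order_date is often skipped: order_date is implicitly converted to TIMESTAMP
    SELECT * FROM orders WHERE order_date = :java_ts;
    -- casting the bind variable instead leaves the column (and its index) untouched
    SELECT * FROM orders WHERE order_date = CAST(:java_ts AS DATE);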
What is the basis for your query performance expectations? Gut feel? If you're going to argue with the DBAs, you'll need to know (at a minimum) what your queries are, understand EXPLAIN PLAN, and be able to point out how to improve things. And, as @OMG Ponies points out, Hibernate may be doing a poor job of constructing queries - then what do you do?
Not easy? Perhaps a better approach would be to take a slightly less adversarial stance with the DBA staff and ask politely what it is about the queries that inhibits performance improvements, and whether they have any suggestions about how you could refactor them to perform better.
Posting my comment as an answer:
That's the trade-off with using an ORM - you're at its mercy for how it constructs the query that is shipped to the database. LINQ is the only one that interested me, because you can use it in tandem with stored procedures for situations like these. I'm surprised the DBAs don't tell you to ditch the ORM if you want better speed...
The EXPLAIN plan will give you an idea of the efficiency, but not really a perspective on speed. For Oracle, you'd need to use tkprof (assuming available to you) to analyze what is going on.
It may also be a table structure (normalization) issue. Mostly, this is exactly why I don't go with Hibernate - you should always be able to write your own, optimal queries.

What should every developer know about databases? [closed]

Whether we like it or not, many if not most of us developers either regularly work with databases or may have to work with one someday. And considering the amount of misuse and abuse in the wild, and the volume of database-related questions that come up every day, it's fair to say that there are certain concepts that developers should know - even if they don't design or work with databases today.
What is one important concept that developers and other software professionals ought to know about databases?
The very first thing developers should know about databases is this: what are databases for? Not how do they work, nor how do you build one, nor even how do you write code to retrieve or update the data in a database. But what are they for?
Unfortunately, the answer to this one is a moving target. In the heyday of databases, the 1970s through the early 1990s, databases were for the sharing of data. If you were using a database and you weren't sharing data, you were either involved in an academic project or you were wasting resources, including yourself. Setting up a database and taming a DBMS were such monumental tasks that the payback, in terms of data exploited multiple times, had to be huge to match the investment.
Over the last 15 years, databases have come to be used for storing the persistent data associated with just one application. Building a database for MySQL, or Access, or SQL Server has become so routine that databases have become almost a routine part of an ordinary application. Sometimes, that initial limited mission gets pushed upward by mission creep, as the real value of the data becomes apparent. Unfortunately, databases that were designed with a single purpose in mind often fail dramatically when they begin to be pushed into a role that's enterprise wide and mission critical.
The second thing developers need to learn about databases is the whole data centric view of the world. The data centric world view is more different from the process centric world view than anything most developers have ever learned. Compared to this gap, the gap between structured programming and object oriented programming is relatively small.
The third thing developers need to learn, at least in an overview, is data modeling, including conceptual data modeling, logical data modeling, and physical data modeling.
Conceptual data modeling is really requirements analysis from a data centric point of view.
Logical data modeling is generally the application of a specific data model to the requirements discovered in conceptual data modeling. The relational model is used far more than any other specific model, and developers need to learn the relational model for sure. Designing a powerful and relevant relational model for a nontrivial requirement is not a trivial task. You can't build good SQL tables if you misunderstand the relational model.
Physical data modeling is generally DBMS specific, and doesn't need to be learned in much detail, unless the developer is also the database builder or the DBA. What developers do need to understand is the extent to which physical database design can be separated from logical database design, and the extent to which producing a high speed database can be accomplished just by tweaking the physical design.
The next thing developers need to learn is that while speed (performance) is important, other measures of design goodness are even more important, such as the ability to revise and extend the scope of the database down the road, or simplicity of programming.
Finally, anybody who messes with databases needs to understand that the value of data often outlasts the system that captured it.
Whew!
Good question. The following are some thoughts in no particular order:
Normalization, to at least the second normal form, is essential.
Referential integrity is also essential, with proper cascading delete and update considerations.
Good and proper use of check constraints. Let the database do as much work as possible (a short sketch of this and the referential-integrity point follows this list).
Don't scatter business logic in both the database and middle tier code. Pick one or the other, preferably in middle tier code.
Decide on a consistent approach for primary keys and clustered keys.
Don't over index. Choose your indexes wisely.
Consistent table and column naming. Pick a standard and stick to it.
Limit the number of columns in the database that will accept null values.
Don't get carried away with triggers. They have their use but can complicate things in a hurry.
Be careful with UDFs. They are great but can cause performance problems when you're not aware how often they might get called in a query.
Get Celko's book on database design. The man is arrogant but knows his stuff.
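To ground the referential-integrity and check-constraint points above, a minimal T-SQL sketch (the tables and names are invented for illustration, not taken from any real schema):
    CREATE TABLE dbo.Customers (
        CustomerID int IDENTITY(1,1) PRIMARY KEY,
        Name nvarchar(100) NOT NULL
    );
    CREATE TABLE dbo.Orders (
        OrderID int IDENTITY(1,1) PRIMARY KEY,
        CustomerID int NOT NULL
            REFERENCES dbo.Customers (CustomerID) ON DELETE CASCADE,  -- cascade is a deliberate choice, not a default
        Quantity int NOT NULL CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0)  -- let the engine reject bad data
    );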
First, developers need to understand that there is something to know about databases. They're not just magic devices where you put in the SQL and get out result sets, but rather very complicated pieces of software with their own logic and quirks.
Second, that there are different database setups for different purposes. You do not want a developer making historical reports off an on-line transactional database if there's a data warehouse available.
Third, developers need to understand basic SQL, including joins.
Past this, it depends on how closely the developers are involved. I've worked in jobs where I was developer and de facto DBA, where the DBAs were just down the aisle, and where the DBAs are off in their own area. (I dislike the third.) Assuming the developers are involved in database design:
They need to understand basic normalization, at least the first three normal forms. Anything beyond that, get a DBA. For those with any experience with US courtrooms (and random television shows count here), there's the mnemonic "Depend on the key, the whole key, and nothing but the key, so help you Codd."
They need to have a clue about indexes, by which I mean they should have some idea what indexes they need and how they're likely to affect performance. This means not having useless indices, but not being afraid to add them to assist queries. Anything further (like the balance) should be left for the DBA.
They need to understand the need for data integrity, and be able to point to where they're verifying the data and what they're doing if they find problems. This doesn't have to be in the database (where it will be difficult to issue a meaningful error message for the user), but has to be somewhere.
They should have the basic knowledge of how to get a plan, and how to read it in general (at least enough to tell whether the algorithms are efficient or not).
They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.
They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.
I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.
Basic Indexing
I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not designing the database and just have to write some queries, it's still vital to understand, at a minimum:
What's indexed in your database and what's not:
The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
The concept of coverage (why you shouldn't just write SELECT *);
The difference between a clustered and non-clustered index;
Why more/bigger indexes are not necessarily better;
Why you should try to avoid wrapping filter columns in functions.
Designers should also be aware of common index anti-patterns, for example:
The Access anti-pattern (indexing every column, one by one)
The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).
The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for by far the most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.
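As a hedged T-SQL illustration of coverage and the clustered/non-clustered distinction (names invented): the INCLUDE columns let the first query be answered entirely from the non-clustered index, while SELECT * forces lookups back into the clustered index.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, TotalDue);
    -- covered: every referenced column lives in the index
    SELECT OrderDate, TotalDue FROM dbo.Orders WHERE CustomerID = 42;
    -- not covered: SELECT * drags in columns the index does not carry, forcing key lookups
    SELECT * FROM dbo.Orders WHERE CustomerID = 42;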
Normalization
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").
If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.
At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.
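For example, with a normalized design the "total sales per region" report mentioned above stays a plain join and GROUP BY. The table and column names below are invented for illustration:
    SELECT r.RegionName, SUM(o.TotalDue) AS TotalSales
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
    JOIN dbo.Regions r ON r.RegionID = c.RegionID
    GROUP BY r.RegionName;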
How Indexes Work
It's probably not the most important, but for sure the most underestimated topic.
The problem with indexes is that SQL tutorials usually don't mention them at all, and all the toy examples work without any index.
Even more experienced developers can write fairly good (and complex) SQL without knowing more about indexes than "An index makes the query fast".
That's because SQL databases do a very good job working as black-box:
Tell me what you need (gimme SQL), I'll take care of it.
And that works perfectly to retrieve the correct results. The author of the SQL doesn't need to know what the system is doing behind the scenes--until everything becomes sooo slooooow.....
That's when indexing becomes a topic. But that's usually very late and somebody (some company?) is already suffering from a real problem.
That's why I believe indexing is the No. 1 topic not to forget when working with databases. Unfortunately, it is very easy to forget it.
Disclaimer
The arguments are borrowed from the preface of my free eBook "Use The Index, Luke". I am spending quite a lot of my time explaining how indexes work and how to use them properly.
I just want to point out an observation: it seems that the majority of responses assume that "database" is interchangeable with "relational database". There are also object databases and flat-file databases. It is important to assess the needs of the software project at hand. From a programmer's perspective, the database decision can be delayed until later. Data modeling, on the other hand, can be done early on and lead to much success.
I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.
On the other hand, the type of database required can be determined based on many different factors, including environment, user volume, and available local hardware such as hard-drive space.
Avoiding SQL injection and how to secure your database
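A small, hedged T-SQL illustration of the injection point (dbo.Users and the parameter names are made up): concatenating user input into dynamic SQL is the classic hole, while sp_executesql keeps the input as data rather than executable SQL.
    DECLARE @UserName nvarchar(50);
    SET @UserName = N'O''Brien';  -- stands in for untrusted user input
    -- vulnerable: the value is spliced straight into the SQL text
    DECLARE @sql nvarchar(max);
    SET @sql = N'SELECT * FROM dbo.Users WHERE UserName = ''' + @UserName + N'''';
    EXEC (@sql);
    -- safer: the value travels as a typed parameter, never as SQL text
    EXEC sp_executesql
        N'SELECT * FROM dbo.Users WHERE UserName = @name',
        N'@name nvarchar(50)',
        @name = @UserName;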
Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."
There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN (or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are O(n^2). Other algorithms involve B-tree lookups and are O(n log n).
This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.
If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.
Also, the corollary: More Indexes are Not Better.
Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.
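As a hedged T-SQL sketch of the "stop, explain, adjust" advice above (all object names invented): look at the plan first, and only then add the index the plan says is missing, keeping the corollary about write cost in mind.
    SET STATISTICS PROFILE ON;  -- or inspect the graphical/estimated plan instead
    SELECT o.OrderID, c.Name
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
    WHERE c.Name = N'Acme';
    -- if the plan shows a full scan of Orders feeding the join, try an index and re-check the plan
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);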
I think every developer should understand that databases require a different paradigm.
When writing a query to get at your data, a set-based approach is needed. Many people with an iterative background struggle with this. And yet, when they embrace it, they can achieve far better results, even though the solution may not be the one that first presented itself to their iteratively focused minds.
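A tiny hedged example of that mindset shift (the schema is invented): the iterative instinct is to fetch each row and update it in a loop or cursor; the set-based statement does it in one pass.
    -- set-based: one statement, no cursor, no row-by-row round trips
    UPDATE dbo.Employees
    SET Salary = Salary * 1.10
    WHERE DepartmentID = 42;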
Excellent question. Let's see: first, no one should consider querying a database who does not thoroughly understand joins. That's like driving a car without knowing where the steering wheel and brakes are. You also need to know data types and how to choose the best one.
Another thing that developers should understand is that there are three things you should have in mind when designing a database:
Data integrity - if the data can't be relied on, you essentially have no data. This means do not put required integrity logic only in the application, as many other sources may touch the database. Constraints, foreign keys and sometimes triggers are necessary for data integrity. Don't fail to use them because you don't like them or don't want to be bothered to understand them.
Performance - it is very hard to refactor a poorly performing database and performance should be considered from the start. There are many ways to do the same query and some are known to be faster almost always, it is short-sighted not to learn and use these ways. Read some books on performance tuning before designing queries or database structures.
Security - this data is the life-blood of your company, it also frequently contains personal information that can be stolen. Learn to protect your data from SQL injection attacks and fraud and identity theft.
When querying a database, it is easy to get the wrong answer. Make sure you understand your data model thoroughly. Remember that actual decisions are often made based on the data your query returns. When it is wrong, the wrong business decisions are made. You can kill a company with bad queries, or lose a big customer. Data has meaning; developers often seem to forget that.
Data almost never goes away, think in terms of storing data over time instead of just how to get it in today. That database that worked fine when it had a hundred thousand records, may not be so nice in ten years. Applications rarely last as long as data. This is one reason why designing for performance is critical.
Your database will probably need fields that the application doesn't need to see: things like GUIDs for replication, date-inserted fields, etc. You may also need to store a history of changes, who made them and when, and be able to restore bad changes from that storehouse. Think about how you intend to do this before you come asking a web site how to fix the problem where you forgot to put a WHERE clause on an update and updated the whole table.
Never develop in a newer version of a database than the production version. Never, never, never develop directly against a production database.
If you don't have a database administrator, make sure someone is making backups and knows how to restore them and has tested restoring them.
Database code is code, there is no excuse for not keeping it in source control just like the rest of your code.
Evolutionary Database Design. http://martinfowler.com/articles/evodb.html
These agile methodologies make database change process manageable, predictable and testable.
Developers should know what it takes to refactor a production database in terms of version control, continuous integration and automated testing.
The Evolutionary Database Design process has administrative aspects; for example, a column may only be dropped after some grace period, across all databases running this codebase.
At least know, that Database Refactoring concept and methodologies exist.
http://www.agiledata.org/essays/databaseRefactoringCatalog.html
Classification and process description makes it possible to implement tooling for these refactorings too.
About the following comment to Walter M.'s answer:
"Very well written! And the historical perspective is great for people who weren't doing database work at that time (i.e. me)".
The historical perspective is in a certain sense absolutely crucial. "Those who forget history are doomed to repeat it." Cf. XML repeating the hierarchical mistakes of the past, graph databases repeating the network mistakes of the past, OO systems forcing the hierarchical model upon users while everybody with even just a tenth of a brain should know that the hierarchical model is not suitable for general-purpose representation of the real world, etcetera, etcetera.
As for the question itself:
Every database developer should know that "Relational" is not equal to "SQL". Then they would understand why they are being let down so abysmally by the DBMS vendors, and why they should be telling those same vendors to come up with better stuff (e.g. DBMSs that are truly relational) if they want to go on sucking hilarious amounts of money out of their customers for such crappy software.
And every database developer should know everything about the relational algebra. Then there would no longer be a single developer left who had to post these stupid "I don't know how to do my job and want someone else to do it for me" questions on Stack Overflow anymore.
From my experience with relational databases, every developer should know:
- The different data types:
Using the correct type for the correct job will make your DB design more robust, your queries faster and your life easier.
- Learn about 1xM and MxM:
This is the bread and butter of relational databases. You need to understand one-to-many and many-to-many relations and apply them when appropriate.
- The "K.I.S.S." principle applies to the DB as well:
Simplicity always works best. Provided you have studied how databases work, you will avoid unnecessary complexity that would lead to maintenance and speed problems.
- Indices:
It's not enough to know what they are. You need to understand when to use them and when not to.
also:
Boolean algebra is your friend
Images: Don't store them in the DB. Don't ask why.
Test DELETE with SELECT
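On the last point, a small sketch of what testing a DELETE with a SELECT means in practice (the table name is invented):
    -- first, see exactly which rows the predicate catches
    SELECT * FROM dbo.Orders WHERE OrderDate < '20050101';
    -- only when that result set looks right, reuse the identical WHERE clause
    DELETE FROM dbo.Orders WHERE OrderDate < '20050101';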
I would like everyone, both DBAs and developer/designer/architects, to better understand how to properly model a business domain, and how to map/translate that business domain model into both a normalized database logical model, an optimized physical model, and an appropriate object oriented class model, each one of which is (can be) different, for various reasons, and understand when, why, and how they are (or should be) different from one another.
I would say strong basic SQL skills. I've seen a lot of developers so far who know a little about databases but are always asking for tips about how to formulate a quite simple query. Queries are not always that easy and simple. You do have to use multiple joins (inner, left, etc.) when querying a well normalized database.
I think a lot of the technical details have been covered here and I don't want to add to them. The one thing I want to say is more social than technical, don't fall for the "DBA knowing the best" trap as an application developer.
If you are having performance issues with a query, take ownership of the problem too. Do your own research and push the DBAs to explain what's happening and how their solutions address the problem.
Come up with your own suggestions too after you have done the research. That is, I try to find a cooperative solution to the problem rather than leaving database issues to the DBAs.
Simple respect.
It's not just a repository
You probably don't know better than the vendor or the DBAs
You won't support it at 3 a.m. with senior managers shouting at you
Consider Denormalization as a possible angel, not the devil, and also consider NoSQL databases as an alternative to relational databases.
Also, I think the Entity-Relationship model is a must-know for every developer, even if you don't design databases. It will let you understand thoroughly what your database is all about.
Never insert data with the wrong text encoding.
Once your database becomes polluted with multiple encodings, the best you can do is apply some combination of heuristics and manual labor.
Aside from syntax and conceptual options they employ (such as joins, triggers, and stored procedures), one thing that will be critical for every developer employing a database is this:
Know, with specificity, how your engine is going to execute the query you are writing.
The reason I think this is so important is simply production stability. You should know how your code performs so you're not stopping all execution in your thread while you wait for a long function to complete, so why would you not want to know how your query will affect the database, your program, and perhaps even the server?
This is actually something that has hit my R&D team more times than missing semicolons or the like. The presumption is that the query will execute quickly because it does on their development system with only a few thousand rows in the tables. Even if the production database is the same size, it is more than likely going to be used a lot more, and thus suffer from other constraints like multiple users accessing it at the same time, or something going wrong with another query elsewhere, thus delaying the result of this query.
Even simple things like how joins affect performance of a query are invaluable in production. There are many features of many database engines that make things easier conceptually, but may introduce gotchas in performance if not thought of clearly.
Know your database engine execution process and plan for it.
For a middle-of-the-road professional developer who uses databases a lot (writing/maintaining queries daily or almost daily), I think the expectation should be the same as any other field: You wrote one in college.
Every C++ geek wrote a string class in college. Every graphics geek wrote a raytracer in college. Every web geek wrote interactive websites (usually before we had "web frameworks") in college. Every hardware nerd (and even software nerds) built a CPU in college. Every physician dissected an entire cadaver in college, even if she's only going to take my blood pressure and tell me my cholesterol is too high today. Why would databases be any different?
Unfortunately, they do seem different, today, for some reason. People want .NET programmers to know how strings work in C, but the internals of your RDBMS shouldn't concern you too much.
It's virtually impossible to get the same level of understanding from just reading about them, or even working your way down from the top. But if you start at the bottom and understand each piece, then it's relatively easy to figure out the specifics for your database. Even things that lots of database geeks can't seem to grok, like when to use a non-relational database.
Maybe that's a bit strict, especially if you didn't study computer science in college. I'll tone it down some: You could write one today, completely, from scratch. I don't care if you know the specifics of how the PostgreSQL query optimizer works, but if you know enough to write one yourself, it probably won't be too different from what they did. And you know, it's really not that hard to write a basic one.
The order of columns in a non-unique index is important.
The first column should be the column that has the most variability in its content (i.e. the highest cardinality).
This is to aid SQL Server's ability to create useful statistics on how to use the index at runtime.
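A minimal sketch of that suggestion (object names invented), with the high-cardinality column leading the composite key:
    -- CustomerID has many distinct values, OrderStatus only a handful
    CREATE NONCLUSTERED INDEX IX_Orders_Customer_Status
        ON dbo.Orders (CustomerID, OrderStatus);
Column order should also reflect how your queries filter; a query that filters only on the trailing, low-cardinality column generally cannot seek on this index.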
Understand the tools that you use to program the database!!!
I wasted so much time trying to understand why my code was mysteriously failing.
If you're using .NET, for example, you need to know how to properly use the objects in the System.Data.SqlClient namespace. You need to know how to manage your SqlConnection objects to make sure they are opened, closed, and when necessary, disposed properly.
You need to know that when you use a SqlDataReader, it is necessary to close it separately from your SqlConnection. You need to understand how to keep connections open when appropriate and how to minimize the number of hits to the database (because they are relatively expensive in terms of computing time).
Basic SQL skills.
Indexing.
Deal with different incarnations of DATE/ TIME/ TIMESTAMP.
JDBC driver documentation for the platform you are using.
Deal with binary data types (CLOB, BLOB, etc.)
For some projects, an object-oriented model is better.
For other projects, a relational model is better.
The impedance mismatch problem, and the common deficiencies of ORMs.
RDBMS Compatibility
Check whether the application needs to run against more than one RDBMS. If yes, it might be necessary to:
avoid RDBMS-specific SQL extensions
eliminate triggers and stored procedures
follow strict SQL standards
convert field data types
change transaction isolation levels
Otherwise, these questions should be treated separately and different versions (or configurations) of the application would be developed.
Don't depend on the order of rows returned by an SQL query.
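A tiny hedged illustration (names invented): without an ORDER BY, the row order is an accident of the execution plan and can change at any time, so state it explicitly whenever it matters.
    SELECT CustomerID, Name
    FROM dbo.Customers
    ORDER BY Name, CustomerID;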
Three (things) is the magic number:
Your database needs version control too.
Cursors are slow and you probably don't need them.
Triggers are evil*
*almost always

How do you continue to improve your SQL skills? [closed]

How do SQL developers go about keeping up on current techniques and trends in the SQL world? Are there any blogs, books, articles, techniques, etc that are being used to keep up to date and in the know?
There are a lot of opportunities out there for OO, procedural, and functional programmers to take part in a variety of open source projects, but it seems to me that the FOSS avenue is a bit more closed for SQL developers.
Thoughts?
Find challenging questions that test your TRANSACT-SQL knowledge ... personally I enjoy Joe Celko's SQL Puzzles and Answers.
The thing about the SQL language is that it is pretty much a static target. Pretty soon you are looking at increasing your understanding of set theory and the problem domain itself rather than the details of the language.
The real meat is on either side of the language, in either the databases themselves (how to store, retrieve, and organize large data sets) or in the applications (with ORMs and such)
I skimmed the answers and apparently nobody has mentioned Stephane Faroult's work.
I strongly suggest you should consider "The Art of SQL" (http://www.amazon.com/Art-SQL-Stephane-Faroult/dp/0596008945), I found it really interesting and - amazingly enough, even fun to read.
How about http://sqlblog.com/
I improve by analyzing slow and complex queries and looking for ways to improve them. This can be done in SQL Server by examining the query plan and looking for bottlenecks. I also find the Visual QuickStart Guide to be good for quick reference.
Joe Celko's SQL Puzzles and Answers and SQL for Smarties are the two best generic SQL books out there. Both are great sources of ideas for that tricky problem you used to think you needed a cursor or some client library to solve. For any truly interested SQL geek, these books are also pretty good for casual reading rather than as a mere desk reference. Two thumbs up.
While not a SQL Server expert, in general I find that community-based events are great ways to keep up on current patterns. The underlying result of participating in a community of developers/DBAs/marketing pros/insert profession here is that you are learning new thought patterns and exercising critical thought. This is a great way to grow as whatever professional you are.
There aren't current techniques and trends in SQL. There's only that stuff you should already know but don't. The proper way to learn that stuff, is pain... so much pain.
Join a mailing list for the DB flavour you use...or lurk on stackoverflow ;)
Most "current" stuff is not SQL itself, but how the database stores the information, and how to retrieve it more quickly. Check out this other thread: What are some references, lessons and or best practices for SQL optimization training
The only real bleeding edge is in query planning, index structures, sort algorithms, things like that, not the SQL itself.
The fact that you asked this question is already a good sign. Avoiding complacency is "piece of advice #1". There is no substitute for writing and optimizing SQL. Practical use is the best way to stay sharp, but there is a risk of a "forest for the trees" scenario, where we tend to use what is comfortable and familiar. Trying new tactics, examining new approaches, and looking for new ways to train our brains to think about sets, SQL, relational theory, and staying on top of new developments in the particular dialects we employ are all hallmarks of good SQL developers.
There are many good blogs out there these days. I work mostly in the Microsoft arena, so I like SQLTeam.com.
Usenet is a good place to hang out and make a contribution. There are many SQL-related newsgroups. Often, you will find that working on someone else's problem helps you learn a new tactic or forces you to research a dusty corner of the language that you do not encounter every day. ISPs seem destined to shut all of the Usenet down, though, because of nefarious use, so this one may be going the way of the Dodo bird.
Also, some IRC servers have vibrant SQL channels where you can make the same sort of difference (just take a thick skin with you).
Lastly, this very website might be another place to hang, where you can read over the answers to difficult questions, see how that might apply in your own world, practice the techniques, and internalize them. Contribute too, because seeing how others vote your solutions up or down is 100% pure honest feedback.
Of course, there are many wonderful books out there, too. Anything by Celko is a winner, and on the SQL Server side, Kalen Delaney and Ron Soukup have written some winners.
Best thing I've run into is working on other people's SQL code. Especially legacy business code. You want to test your skills against something, start changing some "voodoo code" that no one else understands. :)
Beyond that, I just try to keep an eye on changes with new releases of SQL and see if there's anything I can take advantage of.
Glean tips when using phpMyAdmin;
it's nice and verbose.
Here is one with some interesting SSIS information.
http://blogs.conchango.com/jamiethomson/default.aspx
There is also some good information in the Wiki here:
http://wiki.lessthandot.com/index.php/Main_Page
For those who say SQL never changes: SQL Server 2005 and 2008 have some huge changes in T-SQL that help solve some difficult problems that were horrible to do in SQL Server 2000 and are much easier once you learn the new syntax, so yes, there is stuff to keep up with.
Also performance tuning and SSIS are extremely complex subjects with much to learn.
I do find that developers who choose not to learn advanced SQL skills tend to write poorly performing SQL code, and once the number of records grows in their databases, the applications they wrote tend to become glacially slow and very difficult to fix at that point. Right now I'm working with developers to fix some bad code they wrote that is causing timeouts on the site on virtually every query. Obviously, this is now an emergency, and it would have been easy to write the code in a more efficient manner at the start if the developer had better SQL skills.
I've never heard of the term "SQL developer." SQL should be a skill in your toolbox, like sorting, whatever framework you like, JavaScript, and so on. The best way to continue to improve your SQL skills is to keep using them.
As a developer, and not a DBA, I keep an eye on various developer resources, and that often is DB related, but I don't specifically 'try to keep up'.
I know plenty, but I also know that there is so much more that there is to know. And in every project I have to learn something new. And every project also involves me taking a different approach to a similar task I'd encountered in the past.
Should I ever get to the stage where I think I'm doing the same things all the time, perhaps I'll make a conscious effort to take specific steps. But currently, and for the foreseeable future, I'm learning organically, on the job, and as my projects dictate.
Reading:
Books - Celko (also read across to some Oracle-biased books)
Blogs - the above mentioned, plus SSWUG
Webinars and Conferences - Best way to keep up with vendor-specific stuff like
SSIS/SSRS/SSAS
Practice:
Improving code (mine and others)
Refactoring
Mentoring/training other developers
Honestly, it's one of those things that you just get better at with time. Read as much as you can to know what's possible. Some things will take a while to really understand. I was scared off by subqueries for a long time, until I pretty much had no choice but to use them.
When you get more experience and need to do more complex things, you will just learn your way.
SqlServerCentral - great source of articles, scripts, advice
Unfortunately, to access the articles you need to register (it is free, though).
I guess one thing they could learn from Stack Overflow is to remove the login barrier.
We've written a full tutorial, and you can test your SQL skills at a separate site (also created by me, in the interests of full disclosure).
SQL developers, or DBAs?
Aside from learning different dialects of SQL (Oracle, SQL Server, etc) in your day to day work, SQL doesn't actually change all that much. Sure you can bring in more advanced concepts as you develop your skills, work out where to implement stored procedures, etc, but in the end it's just SQL. The most important thing is to get your schema correct and maintainable.
Now administering the databases is a whole different thing, with a range of tools, and the database software itself getting updated every few years. Oracle at least have newsletters and websites and magazines that presumably include lots of information and examples and best practice scenarios.
To be honest, I don't see much a need for extreme SQL skills. Once I can create transactions (for DB consistency) and basic triggers (for cross-table consistency), I'm usually fine keeping program logic... in the program, and not putting it into whatever database I'm using. I've not found much depth to SQL worth investigating for a lifetime, unlike general programming, which keeps expanding in depth.

Which DB should I select if the performance of Postgres is low?

In a web app that supports more than 5000 users, Postgres is becoming the bottleneck.
It takes more than 1 minute to add a new user (even after optimizations, and on Win 2k3).
So, as a design issue, which other DBs might be better?
Most likely, it's not PostgreSQL, it's your design. Changing shoes most likely will not make you a better dancer.
Do you know what is causing slowness? Is it contention, time to update indexes, seek times?
Are all 5000 users trying to write to the user table at the same exact time as you are trying to insert 5001st user? That, I can believe can cause a problem. You might have to go with something tuned to handling extreme concurrency, like Oracle.
MySQL (I am told) can be optimized to do faster reads than PostgreSQL, but both are pretty ridiculously fast in terms of # transactions/sec they support, and it doesn't sound like that's your problem.
P.S.
We were having a little discussion in the comments to a different answer -- do note that some of the biggest, storage-wise, databases in the world are implemented using Postgres (though they tend to tweak the internals of the engine). Postgres scales for data size extremely well, for concurrency better than most, and is very flexible in terms of what you can do with it.
I wish there was a better answer for you, 30 years after the technology was invented, we should be able to make users have less detailed knowledge of the system in order to have it run smoothly. But alas, extensive thinking and tweaking is required for all products I am aware of. I wonder if the creators of StackOverflow could share how they handled db concurrency and scalability? They are using SQLServer, I know that much.
P.P.S.
So as chance would have it I slammed head-first into a concurrency problem in Oracle yesterday. I am not totally sure I have it right, not being a DBA, but what the guys explained was something like this: We had a large number of processes connecting to the DB and examining the system dictionary, which apparently forces a short lock on it, despite the fact that it's just a read. Parsing queries does the same thing.. so we had (on a multi-tera system with 1000s of objects) a lot of forced wait times because processes were locking each other out of the system. Our system dictionary was also excessively big because it contains a separate copy of all the information for each partition, of which there can be thousands per table. This is not really related to PostgreSQL, but the takeaway is -- in addition to checking your design, make sure your queries are using bind variables and getting reused, and pressure is minimal on shared resources.
Please change the OS under which you run Postgres - the Windows port, though immensely useful for expanding the user base, is still not on a par with the (much older and more mature) Un*x ports (and especially the Linux one).
I think your best choice is still PostgreSQL. Spend the time to make sure you have properly tuned your application. Once you're confident you have reached the limits of what can be done with tuning, start caching everything you can. After that, start thinking about moving to an asynchronous master-slave setup... Also, are you running OLAP-type workloads on the same database you're doing OLTP on?
Let me introduce you to the simplest, most practical way to scale almost any database server if the database design is truly optimal: just double your ram for an instantaneous boost in performance. It's like magic.
PostgreSQL scales better than most; if you are going to stay with a relational DB, Oracle would be the one. ODBMSs scale better, but they have their own issues, in that setting one up is closer to programming.
Yahoo uses PostgreSQL; that should tell you something about its scalability.
As highlighted above, the problem is not with the particular database you are using (i.e. PostgreSQL) but with one of the following:
Schema design - maybe you need to add, remove, or refine your indexes
Hardware - maybe you are asking too much of your server; you said 5k users, but then again very few of them are probably querying the DB at the same time
Queries - perhaps poorly written, resulting in lots of inefficiency
A pragmatic way to find out what is happening is to analyse the PostgreSQL log files and find out which queries are the:
Most frequently executed
Longest running
etc. etc.
A quick review will tell you where to focus your efforts and you will most likely resolve your issues fairly quickly. There is no silver bullet, you have to do some homework but this will be small compared to changing your db vendor.
Good news ... there are lots of utilities to analyse your log files that are easy to use and produce easy-to-interpret results; here are two:
pgFouine - a PostgreSQL log analyzer (PHP)
pgFouine: Sample report
PQA (ruby)
PQA: Sample report
First, I would make sure the optimizations are, indeed, useful. For example, if you have many indexes, sometimes adding or modifying a record can take a long time.
I know there are several big projects running over PostgreSQL, so take a look at this issue.
I'd suggest looking here for information on PostgreSQL's performance: http://enfranchisedmind.com/blog/2006/11/04/postgres-for-the-win
What version of PG are you running? As the releases have progressed, performance has improved greatly.
I had the same issue previously at my current company. When I first joined, their queries were huge and very slow; it took 10 minutes to run them. I was able to optimize them down to a few milliseconds, or 1 to 2 seconds. I learned many things during that time, and I will share a few highlights.
Check your query first. Doing an inner join of all the tables that you need will always take some time. One thing that I would suggest is to always start off with the table that lets you cut your data down to just what you need.
e.g. SELECT * FROM (SELECT * FROM person WHERE name ILIKE '%abc') AS person;
If you look at the example above, this cuts your results down to something that you know you need, and you can refine them further by doing an inner join. This is one of the best ways to speed up your query, but there is more than one way to skin a cat. I cannot explain all of them here because there are just too many, but from the example above you just need to adapt the idea to suit your needs.
It depends on your Postgres version; older Postgres does less internal optimization of the query. One example is that on Postgres 8.2 and below, IN statements are slower than on 8.4.
EXPLAIN ANALYZE is your friend. If your query is running slowly, do an EXPLAIN ANALYZE to determine which part of it is causing the slowness (a small sketch follows at the end of this answer).
Vacuum your database. This ensures that the statistics on your database roughly match the actual data. A big difference between the statistics and the actual data will result in your query running slowly.
If none of these help, try modifying your postgresql.conf. Increase the shared memory and experiment with the configuration to better suit your needs.
Hope this helps; of course, these are just Postgres optimizations.
By the way, 5000 users are not much. My databases contain about 200k to a million users.
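Picking up the EXPLAIN ANALYZE tip above, a minimal PostgreSQL sketch (the table and column are invented):
    EXPLAIN ANALYZE
    SELECT * FROM person WHERE email = 'someone@example.com';
    -- a Seq Scan over a large table here is a red flag; an index can turn it into an Index Scan
    CREATE INDEX idx_person_email ON person (email);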
If you do want to switch away from PostgreSQL, Sybase SQL Anywhere is number 5 in terms of price/performance on the TPC-C benchmark list. It's also the lowest price option (by far) on the top 10 list, and is the only non-Microsoft and non-Oracle entry.
It can easily scale to thousands of users and terabytes of data.
Full disclosure: I work on the SQL Anywhere development team.
We need more details: What version are you running? What is the memory usage of the server? Are you vacuuming the database? Your performance problems might be unrelated to PostgreSQL.
If you have many more reads than writes, you may want to try MySQL, assuming that the problem really is with Postgres; but note that your problem (adding a user) is a write problem.
Still, you may want to look into your database design, and possibly consider sharding. For a really large database you may still have to look at the above 2 issues regardless.
You may also want to look at non-RDBMS database servers, or document-oriented ones like Mensia and CouchDB, depending on the task at hand. No single tool will manage all tasks, so choose wisely.
Just out of curiosity, do you have any stored procedures that may be causing this delay?

What are the general guidelines and best practices to keep in mind while designing database for an application? [closed]

My question is about database modeling. I tried to look for this question among other database-design questions on SO but haven't found it, so here I am asking:
What are the general guidelines and best practices to keep in mind while designing database for an application ?
What are the best resources/books/University Lectures available on Database Design Concepts ?
Thanks.
Just some things I've learned from experience (I'm sure some will disagree, but I've been querying, designing and programming databases for 30+ years and have seen the effects of stupid design up close and personal):
There are three critical things to consider in all database design - data integrity (without this you essentially have no data), security and performance. All other considerations take a back seat to these three.
Never create a table without a way to uniquely identify a record.
There really are very few true natural keys that work as a primary key. If you don't have control over whether it will change, do not use it as a primary key (you don't really want to change the company name through 27 child tables, do you?). Use a surrogate key instead. Using a surrogate key does not exempt you from the need to set unique indexes if you could have used a unique composite key; always set these indexes if you can determine a way to have a unique composite. Duplicate records are the bane of an application's existence. It seems obvious, but never ever consider a name to be a key field; names are not, and never will be, unique.
Do not use a GUID as your primary key, as it can kill performance. If you need a GUID for replication, also consider having an int or bigint primary key.
Do not design as if you will be changing database backends unless you know up front that you will be. Virtually all the good techniques for performance tuning are database-specific; don't harm your own ability to tune your database for a non-existent requirement.
Avoid entity-attribute-value (EAV) table structures. They are miserable to query.
Add all things you need to ensure data integrity into your database design, things like defaults, constraints, triggers, etc. are necessary to avoid having useless data. Do not rely on the application code to do this or you will be sorry.
Others mentioned normalization, I agree you must understand this thoroughly even if you later decide to denormalize.
Do not stack views on top of views if you want any kind of performance at all. Every database I've seen that does this is eventually a huge performance problem.
Consider what data you will need to manage the database as well as what the application needs. If you are going to be serious about databases you need to understand database auditing and your database should implement ways to find out who made what change and when and what the old data was. You'll thank me the first time someone malicious changes the data or someone deletes all the records in a table accidentally.
Really think through how the data will be queried when designing. It can make a huge difference in the design.
Do not store more than one piece of information in a field. It might look cool to put a comma delimited list into one field rather than add a related table but it is a really bad idea.
Elegance is often the enemy of performance in databases. Pick performance over elegance every time and you won't go wrong.
Avoid the use of database keywords in the naming of objects. Your programmers will thank you. Pick a naming convention and be consistent in always using it. If a field appears in multiple tables, make sure it has the same name (exception: if an id field has two possible foreign keys in the same table, use the id field name plus a prefix to identify the difference between, say, Sales_person_id and Customer_person_id), the same data type, and the same length, if applicable, in all of them. Fix misspellings right away; you really don't want to spend the next ten years remembering that in tablea it is persnoid instead of personid.
Read about database refactoring (search on amazon for some good books) and consider how to be able to do this in your design. Few databases are designed to be refactored and being able to do so is critical towards being able to fix database problems that arise from badly thought out designs or changes to business requirements.
While you are reading, read about performance tuning, you'll learn a tremendous amount about what to avoid in designing the database.
I'm sure there's more but this is enough to start with.
One additional thing I wanted to add: do not design your database as if the data-entry application page is the most critical thing. Data is often queried more often than it is written, even in a transactional database. Really think about how easy it will be to get data back out of the database (oh, so that's why the EAV model is so bad!) and what effect the design will have on reporting. This is especially critical because I often see that the people doing the reporting are not the people who designed the database, or the reporting tasks come later in the project than creating the data entry. Databases are not easy to refactor; consider the whole life cycle of the data when designing a database. Think about things like storing moment-in-time values: you can't find out how much an order was for two years later by multiplying the quantity ordered by the price in the products table, because that wasn't the price at the time of the order. Reporting needs this type of information, but it is often too late to get it by the time the reports are written, when the design was done badly. Stuff that works fine when you are handling one record at a time can be a disaster when you need to look at thousands or millions of records. Not every application is going to create a separate reporting database, so really think about this.
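A hedged sketch of the moment-in-time point (names invented): capture the price on the order line when the order is placed, instead of relying on whatever the products table says later.
    CREATE TABLE OrderDetail (
        OrderID int NOT NULL,
        ProductID int NOT NULL,
        Quantity int NOT NULL,
        UnitPriceAtOrder decimal(10, 2) NOT NULL,  -- copied from Products.Price at order time
        PRIMARY KEY (OrderID, ProductID)
    );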
DEPENDS
This question is like asking "what is the best car to buy?"; it really depends on many factors, including the amount of data, the number of concurrent users, what you are trying to do, etc. FYI, normalization is good for some database uses, but bad for others (data warehouses).
Give us a better idea of how you intend to use the data, and you'll get some better recommendations.
While I agree with others that your question right now is much too broad and can't really be answered (except for the "it depends" approach :-)), there is one book I would wholeheartedly recommend for anyone beginning database design in general:
Michael Hernandez: Database Design for Mere Mortals(R): A Hands-On Guide to Relational Database Design
It's a really hands-on, no-frills, down to earth book and introduces all the major and important concepts in a very understandable, very approachable fashion. Well written, interesting, very sound and useful - highly recommended!
Marc
Your question is too broad. Normalization and denormalization are the most commonly applied concepts.
The best thing to do is to start with a well normalized database. The wikipedia article has some good information on that along with some good references.
Typically you'll end up denormalizing parts of your database for better performance, but you almost always want to start with it in 4th normal form.
Look at wikipedia article about database normalization. There is also further reading section.
If you are designing a new database for a brand new application, you could try an ORM library (like JPA implementations in Java) that relieves you of hand-crafting the database design, because these tools generate the database from the domain model. If you don't have any experience in this field, a database generated with ORM tools will likely be much better than yours.
Consider all your use cases. Think about every single possible way someone might want to get to data, and plan for those. Wear your designer, developer, tester, and user hats.
Try to think of database tables as representing physical objects.
Normalize, as others have said.
