DB2 performance analysis tools for developers

As a developer using DB2 for the first time, I'm not familiar with what the best database performance analysis tools are for it.
I'm wondering what others have found useful in terms of tools that come with DB2, and any third-party tools available for it.
For example, are any particularly good for things like query planning, CPU measurement, index usage, etc.?

You don't specify which version/release of DB2 you're running, or whether you're running the mainframe (z/OS) version or DB2 for Linux, UNIX, and Windows (also known as DB2 for LUW).
If you're running DB2 on z/OS, talk to your DBA and you'll find out exactly which monitoring and analysis tools have been licensed.
If it's DB2 for LUW you're using, there are various structures and routines you can query directly in DB2 to capture detailed performance information. IBM adds more of these features with each new DB2 release (e.g. version 9.5 vs. 9.7), so be sure to consult the documentation for your specific release. Here is the monitoring guide for 9.5 and here is the 9.7 monitoring guide.
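For illustration, here's a rough sketch of pulling the most expensive dynamic SQL statements out of the administrative views from Python using the ibm_db driver. The view and column names (SYSIBMADM.TOP_DYNAMIC_SQL and friends) vary between releases, and the connection details are placeholders, so verify everything against the monitoring guide for your version before relying on it:

```python
# Sketch: list the slowest dynamic SQL statements from DB2 LUW's
# administrative views using the ibm_db Python driver.
# Assumptions: ibm_db is installed, and SYSIBMADM.TOP_DYNAMIC_SQL exists in
# your release with these column names (check the monitoring guide).
import ibm_db

conn = ibm_db.connect(
    "DATABASE=SAMPLE;HOSTNAME=dbhost;PORT=50000;PROTOCOL=TCPIP;"
    "UID=dbuser;PWD=secret", "", "")

sql = """
SELECT NUM_EXECUTIONS, AVERAGE_EXECUTION_TIME_S, STMT_TEXT
FROM   SYSIBMADM.TOP_DYNAMIC_SQL
ORDER  BY AVERAGE_EXECUTION_TIME_S DESC
FETCH  FIRST 10 ROWS ONLY
"""

stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row["NUM_EXECUTIONS"], row["AVERAGE_EXECUTION_TIME_S"],
          str(row["STMT_TEXT"])[:80])
    row = ibm_db.fetch_assoc(stmt)

ibm_db.close(conn)
```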
The challenge will be to capture and analyze that performance data in some useful way. BMC, CA, DBI, IBM, and even HP have very good third-party tools to help you do that. Some of them are even free.
On the open-source side, monitors from GroundWork Open Source and Hyperic HQ Open Source have some DB2 support, but you'll need to spend some time configuring either of those environments to access your DB2 server.
Many of the tools mentioned above track some combination of DB2 health and performance indicators, and may even alert you when something about DB2 or its underlying server has entered a problem state. You will face choices over which criteria should trigger alerts, versus the KPIs you simply want to capture without ever alerting.
There are a lot of monitoring tools out there that can be taught how to watch DB2, but one of the most versatile and widely used is RRDtool, either on its own with a collection of custom DB2 scripts, or as part of a Cacti or Munin installation, which automates many (but not all) aspects of working with RRDtool. The goal of RRDtool is to capture any kind of numeric time-series data so it can be rendered into various graphs; it has no built-in alert capabilities. Implementing RRDtool involves choosing and describing the data points you intend to capture and allocating RRDtool data files to store them. I use it a lot to identify baseline performance and resource utilization trends for a database or an application. The PNG images it produces can be integrated into a wide variety of IT dashboards, provided those dashboards are customizable.
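To make that workflow concrete, here's a minimal sketch of the create/update/graph cycle, driven from Python via the rrdtool command line. The metric name, step size, and retention are illustrative placeholders, not recommendations:

```python
# Sketch of the RRDtool workflow: define a data source, allocate the RRD file,
# feed it samples, and render a PNG graph for a dashboard.
# The metric ("db_connections"), step, and retention are illustrative only.
import subprocess

RRD = "db2_metrics.rrd"

# One GAUGE data source sampled every 300s, kept as 5-minute averages for ~30 days.
subprocess.run(["rrdtool", "create", RRD, "--step", "300",
                "DS:db_connections:GAUGE:600:0:U",
                "RRA:AVERAGE:0.5:1:8640"], check=True)

def record(value):
    """Push one sample; in practice a cron job would collect this from the database."""
    subprocess.run(["rrdtool", "update", RRD, "N:%s" % value], check=True)

record(42)

# Render the last 24 hours as a PNG that a dashboard can embed.
subprocess.run(["rrdtool", "graph", "connections.png", "--start", "-86400",
                "DEF:conns=%s:db_connections:AVERAGE" % RRD,
                "LINE1:conns#0000FF:active connections"], check=True)
```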

Related

Extract & transform data from Sql Server to MongoDB periodically

I have a Sql Server database which is used to store data coming from a lot of different sources (writers).
I need to provide users with some aggregated data; however, in SQL Server this data is stored in several different tables and querying it is too slow (a join of 5 tables, each with several million rows, one-to-many).
I'm currently thinking that the best way is to extract data, transform it and store it in a separate database (let's say MongoDB, since it will be used only for read).
I don't need the data to be live, just not older than 24 hours compared to the 'master' database.
But what's the best way to achieve this? Can you recommend any tools for it (preferably free) or is it better to write your own piece of software and schedule it to run periodically?
I recommend avoiding the NIH (not invented here) trap here; reading and transforming data is a well-understood exercise. There are several free ETL tools available, with different approaches and focus. Pentaho (formerly Kettle) and Talend are UI-based examples. There are other ETL frameworks, like Rhino ETL, that merely hand you a set of tools to write your transformations in code. Which one you prefer depends on your knowledge and, unsurprisingly, preference. If you are not a developer, I suggest using one of the UI-based tools. I have used Pentaho ETL in a number of smaller data warehousing scenarios; it can be scheduled using operating system tools (cron on Linux, Task Scheduler on Windows). More complex scenarios can make use of the Pentaho PDI repository server, which allows central storage and scheduling of your jobs and transformations. It has connectors for several database types, including MS SQL Server. I haven't used Talend myself, but I've heard good things about it and it should be on your list too.
The main advantage of sticking with a standard tool is that once your demands grow, you'll already have the tools to deal with them. You may be able to solve your current problem with a small script that executes a complex select and inserts the results into your target database. But experience shows those demands seldom stay the same for long, and once you have to incorporate additional databases or maybe even some information in text files, your scripts become less and less maintainable, until you finally give in and redo your work in a standard toolset designed for the job.
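For reference, the "small script" approach could look roughly like this: run the expensive join once on SQL Server with pyodbc and push the flattened result into MongoDB with pymongo. Connection strings, table names, and columns are placeholders, and the cron/Task Scheduler job that runs it is assumed:

```python
# Sketch: run the aggregating join on SQL Server, store the denormalized
# result in MongoDB for fast reads. Names and connection details are
# placeholders; a scheduler is assumed to run this within the 24-hour window.
import pyodbc
from pymongo import MongoClient

src = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;"
    "DATABASE=master_db;UID=etl;PWD=secret")
dst = MongoClient("mongodb://mongohost:27017")["reporting"]["daily_aggregates"]

cursor = src.cursor()
cursor.execute("""
    SELECT c.customer_id, c.name, SUM(o.amount) AS total_amount, COUNT(*) AS order_count
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.name
""")

columns = [col[0] for col in cursor.description]
docs = [dict(zip(columns, row)) for row in cursor.fetchall()]

# Rebuild the read-only collection: drop the old snapshot, insert the fresh one.
dst.drop()
if docs:
    dst.insert_many(docs)

print("loaded %d aggregate documents" % len(docs))
```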

Oracle XE or Postgres?

Should I use Postgres or Oracle XE for a web application, and why?
Both are available for free and can be used commercially.
Why should I use one database over the other?
Well, with Oracle XE, if you hit the limits, you have to either buy Oracle or migrate your entire application to a different database. With PostgreSQL, there are no limits, and the software is completely free and open-source.
A few things to be aware of if you choose Oracle XE:
It will only use one CPU if you have a multiprocessor server
It will only use up to a maximum of 1 gigabyte of memory
It has a database size limit of 4 gigabytes for user data
Only available for 32-bit Windows and 32-bit Linux
If those limitations aren't an issue for you and you like the Oracle approach, then give it a shot; otherwise consider an open-source server like Postgres or MySQL, which have none of the aforementioned limitations.
If you do choose XE and later find your requirements have changed, the next version up of Oracle is USD 900 for 5 seats, plus an additional USD 180 per seat. This is in fact a bit cheaper than MS SQL Server, as far as I know.
There are some good reasons to choose Oracle, particularly if you're a Java developer, e.g. you can write stored procedures in Java, and I think there's native support for Java web services. Ultimately, however, you need to weigh the cost against the requirements of your application. MySQL and Postgres will allow you to scale your application without any cost (other than hardware, obviously).
You haven't provided much information, but unless you're already an Oracle expert, I see no reason to choose Oracle XE over PostgreSQL. PostgreSQL will always be free and is far more capable and more scalable.
And you can choose to run PostgreSQL on Windows, Mac OS X or Linux. I think Oracle XE is limited to Windows and Linux.
When choosing a DB vendor, be careful to match the demands of your app against the strengths of the vendor's product. And watch out for its weaknesses.
For example, if you know your writes will be as frequent as your reads and both will occur simultaneously, then you'll want to know how each vendor you consider handles concurrency. Vendors that rely on elaborate lock managers with complex lock escalation schemes are likely to bring you grief if you expect heavy load on your app. You'll spend more time trying to work around the DB's lock manager than actually solving your problems.
That's one example. Every DB has its strengths and weaknesses to consider. Do your research, find a site that compares vendors, and make a choice that balances your needs against that. If you can get an eval copy, all the better to run some proof-of-concept tests against. Write scripts that pummel the DB in a way similar to what you expect your app to produce, and go from there. While you're at it, get the query plans for the SQL in your scripts from each vendor and see what you can learn from that about how each vendor's optimizer works.
There's more that can be said, but hopefully you get the gist.
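As a concrete example of the query-plan suggestion, here's a rough sketch of pulling a plan from the PostgreSQL side with psycopg2. The DSN and table name are placeholders, and Oracle XE exposes plans through a different mechanism (EXPLAIN PLAN / DBMS_XPLAN), which isn't shown here:

```python
# Sketch: capture the PostgreSQL query plan for one of your test statements.
# The connection string and table are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=app user=app password=secret host=pghost")
cur = conn.cursor()

# EXPLAIN ANALYZE actually executes the statement, so run it against test data.
cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,))
for (line,) in cur.fetchall():
    print(line)

cur.close()
conn.close()
```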
What is your platform of choice? If it's Windows, I'd go with Oracle or MySQL as I have heard only bad things about running PostgreSQL on Windows machines.
Also, Oracle has a more GUI approach to configuration, so if you're not a Linux hacker, you may like it better.
Postgres, on the other hand, has a far superior SQL console, and its console tools in general are more developed and easier to use. Oracle has more tools, but the decent ones (like the PL/SQL navigator) are not free and the free ones simply suck.
Remember that choosing a database is a long-term decision, so consider possible changes in your application's requirements as well. If you are not prepared to eventually spend money on Oracle, better to go with Postgres; it's the safe option, and good enough for most projects.

Swapping out databases?

It seems like the goal of a lot of ORM tools and custom data access layers (DAO pattern, etc.) is to abstract the database to the point where you could supposedly swap out the entire database system with minimal work.
Following the common DAL patterns is usually a good idea in code, but it seems like it would never be minimal work to swap out a database. (Cost, training, data migration, etc.)
Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code? Is it worth it to worry about abstracting the actual database from your code?
Question 1: Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code?
Yes, we tried it. Our customer is using a large MS Access based Delphi client/server application. After about five years we considered switching to SQL Server. We analyzed the problem and concluded that swapping the database would be very costly and provide only a few advantages. The customer decided not to swap the database. The application is still running fine and the customer is still happy.
Note that:
MS Access is only being used for data storage and report generation.
The server application ensures that MS Access is only accessed on the server. Normal multi-user MS Access applications transfer large chunks of the Access database over the network, resulting in slow and unreliable database functionality. This is not the case for this application: Client <> Server <> MS Access. Only the server application communicates with the MS Access database; in fact, the server has exclusive access to it, and no other computer can open the MS Access database. Conclusion: MS Access is being used as a true RDBMS (Relational DataBase Management System) - please no flaming about MS Access being inferior and unstable - it has been running fine for more than 10 years.
The most important issues you will have to consider:
SQL statements (SELECT, UPDATE, DELETE, INSERT, CREATE TABLE): make sure they are compatible with the target SQL database. It's amazing how much the various RDBMSs differ in the details (date formats, number formats, search formats, string formats, join syntax, CREATE TABLE syntax, stored procedures, user-defined functions, (auto) primary keys, etc.)
Report generation: Depending on your database you might be using a different reporting tool. Our customer has over 200 complex reports. Converting all these reports is very time consuming.
Performance: all RDBMSs perform differently in different environments. Normally performance optimizations are very much RDBMS-dependent.
Costs: the costs of tools, developers, server and user licenses varies greatly. It ranges from free to very expensive. Free does not mean cheap and expensive does not always equate to good. A cost/value comparison will have to be made.
Experience: making the best use of your RDBMS requires experience. If you have to develop for an "unknown" RDBMS your productivity will suffer.
Question 2: Is it worth it to worry about abstracting the actual database from your code?
Yes. In an ideal world, swapping a database would just be adjusting the data connection string. In the real world this is not possible because all databases are different. They all have tables and SQL support but the differences are in the details. If you can keep the differences of the databases shielded through abstraction - please do so. Make a list of the databases you need to support. Check the selected database systems for the differences. Provide centralized code to handle the differences. Support one RDBMS and provide stubs for future support of other RDBMS.
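To sketch what "centralize the differences" might look like in practice, here's a minimal example: one small dialect object owns the SQL that varies per engine, with a stub left for a future RDBMS. The class and method names are invented for illustration, not taken from any particular framework:

```python
# Minimal sketch of centralizing RDBMS differences behind one dialect object.
# Names are illustrative; real differences (dates, joins, procs) go the same way.
from abc import ABC, abstractmethod

class SqlDialect(ABC):
    @abstractmethod
    def select_top(self, n: int, columns: str, table: str) -> str: ...
    @abstractmethod
    def autoincrement_pk(self) -> str: ...

class SqlServerDialect(SqlDialect):
    def select_top(self, n, columns, table):
        return f"SELECT TOP {n} {columns} FROM {table}"
    def autoincrement_pk(self):
        return "INT IDENTITY(1,1) PRIMARY KEY"

class PostgresDialect(SqlDialect):
    def select_top(self, n, columns, table):
        return f"SELECT {columns} FROM {table} LIMIT {n}"
    def autoincrement_pk(self):
        return "SERIAL PRIMARY KEY"

class OracleDialect(SqlDialect):
    """Stub for future support, as suggested above."""
    def select_top(self, n, columns, table):
        raise NotImplementedError("Oracle support not implemented yet")
    def autoincrement_pk(self):
        raise NotImplementedError("Oracle support not implemented yet")

# Application code asks the dialect instead of hard-coding vendor SQL:
dialect = PostgresDialect()
print(dialect.select_top(10, "id, name", "customers"))
```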
I disagree that the purpose is to be able to swap out databases, and I think you are correct in showing some suspicion about ORMs leading towards that goal.
However, I would still use an ORM, as it abstracts away the details of data access. Isn't this the goal of object oriented programming? Keep your concerns separated.
I think the primary use case for database abstraction (via ORM tools) is to be able to ship a product that works with multiple database brands. I believe it's a rarer occurrence for a company to switch between database vendors, but that's still one of the use cases.
I've worked jobs where we started out using MySQL for monetary reasons (think a startup) and, once we started making money, wanted to switch to Oracle. We didn't end up making the switch, but it was nice to have the option.
Still, ORM tools are not completely leak-free abstractions, and I know our migration still would have been painful and costly. It totally depends on what you are building, but it has been my experience that -- for performance reasons, usually -- you end up either working around your ORM solution or exploiting vendor-specific features at some point.
The only time I've seen a database switch was from HSQL during early development to Oracle as the project progressed. The ORM made this easy.
I often use the DAO pattern to swap out data services (from a database to web service or to swap a web service to a test stub).
For ORM, I don't think the goal is to enable you to switch databases - it is to hide the complexities of different database implementations from you and remove the need to worry about the fine details of translating between relational and object representations of your data.
By having someone smart write an ORM that handles caching, only updates fields that have changed, groups updates, etc., I don't need to. Although in the cases where I need something special, I can still revert to SQL if I want.
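As a rough sketch of the DAO idea mentioned above (swapping a database-backed implementation for a test stub), the application can code against a small interface like this. The names and schema are made up for illustration:

```python
# Sketch of the DAO pattern: application code sees only the interface, so the
# SQL-backed DAO and an in-memory test stub are interchangeable.
from abc import ABC, abstractmethod
import sqlite3

class UserDao(ABC):
    @abstractmethod
    def find_by_id(self, user_id): ...
    @abstractmethod
    def save(self, user_id, name): ...

class SqliteUserDao(UserDao):
    def __init__(self, path="app.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    def find_by_id(self, user_id):
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
        return {"id": row[0], "name": row[1]} if row else None
    def save(self, user_id, name):
        self.conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name))
        self.conn.commit()

class InMemoryUserDao(UserDao):
    """Test stub: no database at all, just a dict."""
    def __init__(self):
        self.users = {}
    def find_by_id(self, user_id):
        return self.users.get(user_id)
    def save(self, user_id, name):
        self.users[user_id] = {"id": user_id, "name": name}

def register_user(dao: UserDao, user_id, name):
    """Application code only depends on the interface, so the backend can be swapped."""
    if dao.find_by_id(user_id) is None:
        dao.save(user_id, name)
```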

When is it time to change database backends?

Is there a general rule of thumb to follow when storing web application data to know what database backend should be used? Is it the number of hits per day, the number of rows of data, or are there other metrics I should consider when choosing?
My initial idea is that the order for this would look something like the following (but not necessarily, which is why I'm asking the question).
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
It's not quite that easy. The only general rule of thumb is that you should look for another solution when the current one can't keep up anymore. That could include using different software (not necessarily in any globally fixed order), hardware or architecture.
You will probably get a lot more benefit out of caching data using something like memcached than switching to another random storage backend.
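For illustration, the cache-aside pattern with memcached looks roughly like this, assuming the python-memcached client; the key format, TTL, and loader function are placeholders:

```python
# Sketch of cache-aside with memcached: check the cache first, fall back to the
# database on a miss, then store the result with a TTL.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def get_user_profile(user_id, load_from_db):
    key = "user_profile:%d" % user_id
    profile = mc.get(key)
    if profile is None:                 # cache miss: hit the database once
        profile = load_from_db(user_id)
        mc.set(key, profile, time=300)  # keep it for 5 minutes
    return profile
```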
If you think you are ever going to need one of the heavyweights (SQL Server, Oracle), you should start with one of those at the beginning. Data migrations are extremely difficult. In the long run it will cost you less to just start at the top and stay there.
I think you're being overly specific in your rankings. You can pretty much start with flat files and the like for very small data sets, go up to something like DBM for slightly bigger ones that don't require SQL-like syntax, and go to some kind of SQL database after that.
But who wants to do all that rewriting? If the application will benefit from access to joins, stored procedures, triggers, foreign key validation, and the like--just use a SQL database regardless of the dataset size.
Which one should depend more on the client's existing installations and what DBA skills are available than on the amount of data you're holding.
In other words, the size of your database is far from the only consideration, and maybe not the most important one.
There is no blanket answer to this, but ALMOST always, using flat files is not a good idea. You have to parse through them (I suppose) and they do not scale well. Starting with a proper database, like Oracle or SQL Server (or MySQL or Postgres if you are looking for free options), is a good idea. For very little overhead, you will save yourself a lot of effort and headache later on. They also allow you to structure your data in a non-stupid fashion, leaving you free to think about WHAT you will do with the data rather than HOW you will be getting it in/out.
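And "starting with a proper database" can cost almost nothing: Python's built-in sqlite3 module needs no server at all, and most of the SQL carries over if you later move up to MySQL or Postgres. A small sketch with a made-up schema:

```python
# Sketch: starting with SQL instead of flat files, using the stdlib sqlite3
# module (no server to install). The schema is made up for illustration.
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS page_hits (
        id        INTEGER PRIMARY KEY,
        url       TEXT NOT NULL,
        hit_time  TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO page_hits (url) VALUES (?)", ("/index.html",))
conn.commit()

for url, hits in conn.execute(
        "SELECT url, COUNT(*) FROM page_hits GROUP BY url"):
    print(url, hits)

conn.close()
```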
It really depends on your data, and how you intend to use it. At one of my previous positions, we used Postgres because of its native geo-location and timezone extensions, which allowed us to manage our data using polygonal datatypes. We needed to do that, and we also wanted to use stored procedures, views and the like.
Now, another place I worked at used MySQL simply because the data was normalized, standard row by row data.
SQL Server, for a long time, had a 4 GB database limit (see SQL Server 2000), but despite that limitation it remains a very stable platform for small to medium applications for which the old data is purged.
Now, from working with Oracle and SQL Server 05/08, all I can tell you is that if you want the cream of the crop for stability, scalability and flexibility, then these two are your best bet. For enterprise applications, I strongly recommend them (merely because that's what we use where I work now).
Other things to consider:
Language integration (ASP.NET session storage, role management, etc.)
Query types (SELECT, UPDATE, DELETE) (although this is more of a schema design issue than a DBMS issue)
Data storage requirements
Your application's utilization of the database is the most critical consideration. Mainly, which queries are used most often (SELECT, INSERT or UPDATE)?
Say you use SQLite: it is geared toward smaller applications, but for a web application you might want a bigger one like MySQL or SQL Server.
The way you write scripts and your web application platforms also matters. If you're developing on a Microsoft platform, then SQL Server is a better alternative.
Typically, I go with what is commonly accepted by whichever framework I am using. So, if I'm doing .NET => SQL Server, Python (via Django or Pylons) => MySQL or SQLite.
I almost never use flat files though.
There is more to choosing an RDBMS solution than just "back-end horsepower". The ability to have commitment control, for example, so you can roll back a failed transaction, is one reason.
Unless you are building a mega-transaction-rate application, most database engines would be adequate - so it becomes a question of how much you want to pay for the software, whether it runs on the hardware and operating system environment you want, and what expertise you have in managing that software.
That progression sounds painful. If you're going to include MS products (especially the for-pay SQL Server) in there anywhere, you may as well use the whole stack, since you only have to pay for the last of these:
SQL Server Compact -> SQL Server Express -> SQL Server Enterprise (clustered).
If you target your app at SQL Server Compact initially, all your SQL code is guaranteed to scale up to the next version without modification. If you get bigger than SQL Server Enterprise, then congratulations. That's what they call a good problem to have.
Also: go back and check the SO podcasts. I believe they talked about this briefly.
This question depends on your situation really.
If you have control over the server you're deploying to and can install whatever services you need, then the time to install a MySQL or MSSQL Express server and code against an existing database framework, versus coding against a flat file structure, is not worth the effort of considering.
What about FireBird? Where would that fit into that list?
And let's not forget the requirements that the "customer" of your solution must also have in place. If you're writing a commercial application for small companies, then Oracle might not be a good choice... but if you're writing a customized solution for a large enterprise which must share data among multiple campuses, and has a good-sized IT department, then the decision of Oracle vs. SQL Server would come down to what the customer most likely already has deployed.
Data migration nowadays isn't that bad since we have those great tools from Embarcadero, so I would instead let the customer's needs drive the decision.
If you have the option, SQL Server is a good choice from the word go, predominantly because you have access to solid procedures and functions and the database backup facilities are totally reliable. Wrapping up as much of your logic as you can inside the database itself (rather than in whatever language you are using) helps security and performance - indeed there's a good argument to be made for always using procedures for insert/update logic, as these help protect you against injection attacks.
If I have the choice, the only time I'd consider MySQL in preference is with a large, fairly simple database used predominantly for read access. This isn't to decry MySQL, which has improved markedly of late and which I happily use if I don't have the choice, but for more complex systems with update/insert activity MSSQL is generally the superior option.
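On the procedures-and-injection point above, the key is that user input travels as bound parameters rather than being spliced into SQL text. A rough sketch of calling a SQL Server stored procedure that way from Python with pyodbc; the procedure name, parameters, and connection string are placeholders:

```python
# Sketch: call a stored procedure with bound parameters via ODBC call syntax,
# so the values never become part of the SQL text. Names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;"
    "DATABASE=app;UID=app;PWD=secret")
cur = conn.cursor()

cur.execute("{CALL usp_insert_order (?, ?)}", ("CUST-042", 199.95))
conn.commit()
```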
I think your list is subjective but I will play your game.
Flat Files
BDB
SQLite
MySQL
PostgreSQL
SQL Server
Oracle
Teradata

What are the advantages of VistaDB

I have seen references to VistaDB over the years, and with tools like SQLite, Firebird, MS SQL et al. I have never had a reason to consider it.
What are the benefits of paying for VistaDB vs using another technology? Things I have thought of:
1. Compact Framework Support. SQLite+MSSQL support the CF.
2. Need migration path to a 'more robust' system. Firebird+MSSQL.
3. Need more advanced features such as triggers. Firebird+MSSQL
The VistaDB client runtime is free. The runtime will never "expire at 3am" as you put it. Only the developer tools are licensed in that manner. You need 1 license per developer, simple. We even offer a really inexpensive Lite version with no Visual Studio tools.
Some other benefits
100% managed code - there are no interop or other unmanaged calls in the engine. This is a big deal to some, and others couldn't care less.
No registry access required - Most other in proc databases require registry access to look for parent controls, or permissions. VistaDB only does what you tell it to do, and will even run in Medium Trust.
XCopy deployment for the runtime and your database (single file). You can xcopy your application, the runtime, and your database and run. Nothing to install or configure on the machine, no special privileges needed (we can run in Medium Trust or higher).
Isolated storage - You can put your entire database into Isolated Storage and run it from there directly. This makes it very easy to build secure click once applications that write databases in a domain friendly way for corporate environments. There is no need to store the user data on a shared drive or worry about permission mapping.
CLR Triggers / CLR Procs - You can write CLR Code and use them as Triggers or Stored Procs. We have just recently introduced changes to make it even easier to maintain a single CLR Assembly that can run in both VistaDB and SQL Server 2005/2008.
T-SQL Procs - VistaDB T-SQL Procs are compatible with SQL Server 2005/2008. Any procedure that works in our engine will run in SQL Server. That does not mean anything that runs there will port to us. We are a subset of the functionality in SQL Server. But we are also the only way to run T-SQL Procs without SQL Server (SQL CE can't do it).
I personally think one of the biggest features is the ability to upsize to SQL Server later. All of the VistaDB types, syntax, and CLR Procs, T-SQL procs, etc all will run on SQL Server. (You can't take everything from SQL Server down to VistaDB though, it is a subset)
32/64 bit Deployment - VistaDB is a single assembly deployment that runs both 32 and 64 bit without changes. SQL CE requires two different runtimes depending upon the OS, and cannot run under IIS at all. Access has no 64 bit runtime, and the most recent 32 bit runtime can only be deployed through MSI. The 32 bit version of Windows has the runtime, the 64 bit version does not.
Relational Integrity - VistaDB also actually enforces your constraints and foreign keys. You can specify cascade update and delete operations. The person who commented that we are like SQLite is wrong in this regard: it parses constraints but does not enforce them.
EDIT: They do have support for FK's now in SQLite. But they are not compiled in by default, and do not use the same syntax as SQL Server.
Medium Trust - The ability to run on a medium trust web server is another feature that many will not care about, but it is a big deal. Many third party controls can't even run in Medium Trust. We can run the complete engine within Medium Trust because of our commitment to 100% managed code and least permission required.
Full disclosure: I am the owner of VistaDB, so I may be biased. :)
Well, the main thing is that it is pure managed code - for what that is worth; it runs not only on typical Windows machines running .NET, but also wherever you run the Compact Framework, and even on Mono. Here are some noteworthy bullet points from their homepage:
Small (< 1 MB) footprint, truly embedded, ZeroClick
Microsoft SQL Server 2005 compatible data types and T-SQL syntax
None of the SQL CE limits
Single user, or multi-user locally or over a shared network.
Partially trusted shared hosting is no problem.
Royalty-free distribution - single CPU deployment of SQL Server costs more than a site license of VistaDB!
One thing worth noting is that Rob Howard's company, Telligent, uses it as the default database for their new CMS software, Graffiti.
I have played with it here and there but have yet to build anything against it.
For me the most interesting feature of VistaDB is that it can run in a Medium Trust environment, which makes it a perfect solution for creating small-to-medium .NET websites that can be deployed to the server by copying and pasting (xcopy deployment).
And almost all Windows shared hosting providers (like GoDaddy) won't let you run your websites in Full Trust mode. They also won't install any 3rd-party binaries into the GAC for you, like System.Data.SQLite.dll if you wish to use SQLite, for example.
I hadn't seen VistaDB before, it does look pretty cool.
Update: Received a comment from someone from VistaDB - their update model is only for getting new versions. Your old ones won't stop working if your license expires, which is good to know.
Keeping the original post here as IMHO the warning about expiring software licenses is still worth thinking about, even though VistaDB itself is fine.
It definitely seems 'more featureful' than SQLite, but I don't see anything there to justify the cost. The site seems to indicate that you can buy one license for $279, but it implies this is just a 1 year subscription. Would you have to then pay another $279 next year to stop your site falling over?
If so, remember to factor into the 'cost' how much inconvenience it's going to be when you get a call at 3am (Murphy's law, it's always 3am) from your panicking customers because their VistaDB license has expired :-(
I've had this experience personally with some expiring software, and it's never good. You can send your customers emails and messages and flash their entire screen blinking red saying "YOU NEED TO GET A NEW LICENSE BEFORE NEXT WEEK" and they'll still never do it, and you'll still get the pain at 3am when it does expire.
