Virtualized SQL Server: Why not?

The IT department where I work is trying to move to 100% virtualized servers, with all the data stored on a SAN. They haven't done it yet, but the plan eventually calls for moving the existing physical SQL Server machines to virtual servers as well.
A few months ago I attended the Heroes Happen Here launch event, and in one of the SQL Server sessions the speaker mentioned in passing that this is not a good idea for production systems.
So I'm looking for a few things:
What are the specific reasons why this is or is not a good idea? I need references, or don't bother responding. I could come up with a vague "I/O bound" response on my own via Google.
My recollection of the HHH speaker alone probably won't convince our IT department to change their minds. Can anyone point me directly to something more authoritative? And by "directly", I mean something more specific than just a vague Books Online comment. Please narrow it down a little.

I can say this from personal experience because I am dealing with this very problem as we speak. The place I am currently working as a contractor has this type of environment for their SQL Server development systems. I am trying to develop a fairly modest B.I. system on this environment and really struggling with the performance issues.
TLB misses and emulated I/O are very slow on a naive virtual machine. If your O/S has paravirtualisation support (which is still not a mature technology on Windows) you can use paravirtualised I/O (essentially a device driver that hooks into an API in the VM). Recent versions of the Opteron have support for nested page tables, which removes the need to emulate the MMU in software (which is really slow).
Thus, applications that run over large data sets and do lots of I/O, like (say) ETL processes, trip over the Achilles heel of virtualisation. If you have anything like a data warehouse system that might be hard on memory or disk I/O, you should consider something else. For a simple transactional application they are probably O.K.
To put this in perspective: the systems I am using are running on blades (an IBM server) on a SAN with 4x 2Gbit F/C links. This is a mid-range SAN. The VM has 4GB of RAM IIRC and now two virtual CPUs. At its best (when the SAN is quiet) this is still only half the speed of my XW9300, which has 5 SCSI disks (system, tempdb, logs, data, data) on 1 U320 bus and 4GB of RAM.
Your mileage may vary, but I'd recommend going with workstation systems like the one I described for developing anything I/O heavy in preference to virtual servers on a SAN.
Unless your resource usage requirements are beyond this sort of kit (in which case they are well beyond a virtual server anyway), this is a much better solution. The hardware is not that expensive - certainly much cheaper than a SAN, blade chassis and VMware licensing. SQL Server Developer Edition comes with Visual Studio Professional and above.
This also has the benefit that your development team is forced to deal with deployment right from the word go - you have to come up with an architecture that's easy to 'one-click' deploy. This is not as hard as it sounds. Redgate SQL Compare Pro is your friend here. Your developers also get a basic working knowledge of database administration.
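As a minimal sketch of what a re-runnable deployment script can look like (the table name here is hypothetical, and a tool like SQL Compare can generate much of this for you):

    -- Idempotent schema step: safe to run on every deployment
    IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Customer')
    BEGIN
        CREATE TABLE Customer (
            CustomerID int IDENTITY(1,1) PRIMARY KEY,
            Name       nvarchar(100) NOT NULL
        );
    END;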
A quick trip onto HP's website got me a list price of around $4,600 for an XW8600 (their current Xeon-based model) with a quad-core Xeon chip, 4GB of RAM, and 1x 146GB and 4x 73GB 15k SAS hard disks. Street price will probably be somewhat less. Compare this to the price of a SAN, blade chassis and VMware licensing, plus the cost of backup for that setup. For backup you can provide a network share where people can drop compressed DB backup files as necessary.
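If you go the network-share route, a hedged sketch of the backup step looks like this (the database name and paths are made up; WITH COMPRESSION needs SQL Server 2008 or later, so on older versions you would compress the .bak file afterwards instead):

    -- Back up straight to the shared drop folder
    BACKUP DATABASE DevWarehouse
    TO DISK = '\\fileserver\db-backups\DevWarehouse.bak'
    WITH INIT, COMPRESSION;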
EDIT: This whitepaper on AMD's website discusses some benchmarks on a VM. From the benchmarks in the back, heavy I/O and MMU workloads really clobber VM performance. Their benchmark (to be taken with a grain of salt as it is a vendor-supplied statistic) suggests a 3.5x speed penalty on an OLTP benchmark. While this is vendor-supplied, one should bear in mind:
It benchmarks naive virtualisation and compares it to a para-virtualised solution, not to bare-metal performance.
An OLTP benchmark will have a more random-access I/O workload, and will spend more time waiting for disk seeks. A more sequential disk access pattern (characteristic of data warehouse queries) will have a higher penalty, and a memory-heavy operation (SSAS, for example, is a biblical memory hog) that has a large number of TLB misses will also incur additional penalties. This means that the slow-downs on this type of processing would probably be more pronounced than the OLTP benchmark penalty cited in the whitepaper.
What we have seen here is that TLB misses and I/O are very expensive on a VM. A good architecture with paravirtualised drivers and hardware support in the MMU will mitigate some or all of this. However, I believe that Windows Server 2003 does not support paravirtualisation at all, and I'm not sure what level of support is delivered in Windows Server 2008. It has certainly been my experience that a VM will radically slow down a server working on an ETL process or SSAS cube build compared to relatively modest-spec bare-metal hardware.

SAN: of course, and clustering too. But regarding virtualization: you will take a performance hit (which may or may not be worth it to you):
http://blogs.technet.com/andrew/archive/2008/05/07/virtualized-sql-server.aspx
http://sswug.org has had some notes about it in their daily newsletter lately

I wanted to add this series of articles by Brent Ozar:
Why Your Sysadmin Wants to Virtualize Your Servers
Why Would You Virtualize SQL Server?
Reasons Why You Shouldn't Virtualize SQL Server
It's not exactly authoritative in the sense I was hoping for (coming from the team that builds the server, or an official manual of some kind), but Brent Ozar is pretty well respected and I think he does a great job covering all the issues here.

We are running a payroll system for 900+ people on VMware with no problems. This has been in production for 10 months. It's a medium-sized load as far as DBs go, and we pre-allocated drive space in the VM to prevent I/O issues. You have to defrag both the VM host and the VM slice on a regular basis in order to maintain acceptable performance.

Here's some VMware testing on it:
http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf
Granted, they do not compare it to physical machines. But, you could probably do similar testing with the tools they used for your environment.
We currently run SQL Server 2005 in a VMware environment. BUT, it is a very lightly loaded database and it is great. It runs with no problems.
As most have pointed out, it will depend on your database load.
Maybe you can convince the IT department to do some good testing before blindly implementing it.

No, I can't point to any specific tests or anything like that, but I can say from experience that putting your production database server on a virtual machine is a bad idea, especially if it has a large load.
It's fine for development. Possibly even testing (on the theory that if it runs fine under load on a virtual box, it's going to run fine in production), but not for production.
It's common sense, really. Do you want your hardware running two operating systems plus SQL Server, or one operating system and SQL Server?
Edit:
My experience biased my response. I have worked with large databases under heavy constant load. If you have a smaller database under light load, virtualization may work fine for you.

There is some information concerning this in Conor Cunningham's blog article Database Virtualization - The Dirty Little Secret Nobody is Talking About.... To quote:
Within the server itself, there is surprisingly little knowledge of a lot of things in this area that are important to performance. SQL Server's core engine assumes things like:
all CPUs are equally powerful
all CPUs process instructions at about the same rate.
a flush to disk should probably happen in a bounded amount of time.
The post goes on to elaborate on these issues further. I think it's a good read, given the scarcity of information available on this issue in general.
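If you want to sanity-check that last assumption on your own virtualized host, one hedged starting point is SQL Server's file-stats DMV (available since SQL Server 2005), which reports cumulative I/O stall times per database file; stall times that grow out of proportion to the I/O counts suggest the bounded-flush-time assumption is being violated:

    -- Cumulative I/O stalls per database file
    SELECT DB_NAME(database_id) AS database_name,
           file_id,
           num_of_writes,
           io_stall_write_ms,
           num_of_reads,
           io_stall_read_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL);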

Note there are some specialty virtualization products out there that are made for databases and might be worth looking into, instead of a general product like VMware.
Our company (over 200 SQL servers) is currently in the process of deploying HP Polyserve on some of our servers:
HP PolyServe Software for Microsoft SQL Server enables multiple Microsoft SQL Server instances to be consolidated onto substantially fewer servers and centralized SAN storage. HP PolyServe's unique "shared data" architecture delivers enterprise class availability and virtualization-like flexibility in a utility platform.
Our primary reason for deploying it is to make hardware replacement easier: add the new box to the "matrix", shuffle around where each SQL instance resides (seamlessly), then remove the old box. Transparent to the application teams, because the SQL instance names don't change.

Old Question with Old Answers
The answers in this thread are years old. Most of the negative points in this entire thread are technically still correct, but much less relevant. The overhead cost of virtualization and SANs is much less of a factor now than it used to be. A correctly configured virtualization host, guest, network, and SAN can provide good performance, with the benefits of virtualization and operational flexibility, including good recovery scenarios that are only available by being virtual.
However, in the real world it only takes one minor configuration detail to bring the whole thing to its knees. In practice your biggest challenge with virtual SQL servers is convincing and working with the people responsible for the virtualization to get it set up just right.
Ironically, in 100 percent of the cases where we took production off of virtualization and moved it back to dedicated hardware, performance went through the roof on the dedicated hardware. In all of these cases it was not the virtualization but the way it was set up. By going back to dedicated hardware we actually proved that the virtualization would have been a much better use of resources, by factors of 5 or more. Modern software is usually designed to scale out across nodes, so virtualization works to your advantage on that front as well.

The biggest concern to me when virtualising software is normally licensing.
Here's an article on it for MS SQL. I'm not sure about your situation, so I can't pick out any salient points.
http://www.microsoft.com/sql/howtobuy/virtualization.mspx

SQL Server is supported in a virtual environment. In fact I would recommend it, seeing that one of the licensing options is per socket. This means you can put as many SQL Server instances in a virtualized (e.g. Windows Server 2008 Datacenter) system as you like and pay only per processor socket the machine has.
It's better than that, because Datacenter is licensed per socket with unlimited virtual machine licenses as well.
I'd recommend clustering your Hyper-V on two machines however so that if one fails the other can pick up the slack.

I would think that the possibility of something bad happening to the data would be too great.
As a dead simple example, let's say you ran a SQL Server box in Virtual Server 2005 R2 with undo disks turned on (so the main "disk" file stays the same and all changes are made to a separate file which can be purged or merged later). Then something happens (usually you run into the 128GB limit, or whatever the size is), and some middle-of-the-night clueless admin has to reboot and figures out he can't do so until he removes the undo disks. You're screwed - even if he keeps the undo disk files for later analysis, the possibility of merging the data together is pretty slim.
So echoing the other posts in this thread - for development it's great, but for production it's not a good idea. Your code can be rebuilt and redeployed (that's another thing: VMs for source control aren't a good idea either), but your live production data is way more important.

Security issues that can be introduced when dealing with virtualization should also be considered. Virtualization Security is a good article by PandaLabs that covers some of the concerns.

You are looking at this from the wrong angle. First, you are not going to find white papers from vendors about why you should not virtualize, or why you should.
Every environment is different and you need to do what works in your environment. With that said, there are some servers that are perfect for virtualization and there are some that should not be virtualized. For example, if your SQL Servers are doing millions and millions of transactions per second - say your server was located at the NYSE or the NASDAQ and millions and millions of dollars depended on it - you probably should not virtualize it. Make sure you understand the ramifications of virtualizing a SQL Server.
I've seen people virtualize SQL over and over just because virtualization is cool, then complain later on when the VM server does not perform as expected.
What you need to do is set a benchmark, fully test the solution you want to deploy, and show what it can and can't do so you don't run into any surprises. Virtualization is great - it is good for the environment and saves money through consolidation - but you need to show your supervisors why you should or should not virtualize your SQL Servers, and only you can do this.

Related

Could Stackoverflow run on SQL Web Edition

I am starting a new web venture that may need to scale to a high number of users.
I am confident with the SPLA licensing for SQL Web Edition, but want to know if I will need to factor in upgrading to Standard, Enterprise or Datacenter (pretty sure it won't be this one).
I know one doesn't need to scale before they need to, but this would affect the architecture of the site and the business plan.
I know the processor limit is per physical processor, not thread, so that doesn't worry me. However some of the mirroring and backup features would worry me. Does SO rely on these features?
So the question is: could a site like Stack Overflow run on SQL Web? What aspects of maintenance and high availability would be impossible to achieve?
At first blush, SQL Web Edition is limited to 4 processors, 64 gigs of RAM and a petabyte+ database size.
The real question is: does your design require features that are not included, such as table and index partitions, online index de-fragmentation, database mirroring, or replication?
Without knowing more about your app, the design, and the hardware, and without running lots of load tests, it's speculative at best for someone to say whether your site will run on a given version of SQL.
As @home pointed out, you can compare version features here: http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx Part of what you pay for in the fancier versions of SQL are 'high-availability' features. Whether SO can run on a 4-proc box with 64 gigs of RAM is only part of the answer. What happens to the site when there is a hardware failure? When maintenance needs to be done? And so on...
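As a small aside, when you do run those load tests it's worth confirming which edition you're actually testing against; the built-in SERVERPROPERTY function reports this directly:

    -- Edition, engine edition and version of the connected instance
    SELECT SERVERPROPERTY('Edition')        AS Edition,
           SERVERPROPERTY('EngineEdition')  AS EngineEdition,
           SERVERPROPERTY('ProductVersion') AS ProductVersion;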
With that said, if you're starting a venture based on MS technology, the BizSpark program is a great way to get 3 years of access to all Microsoft software. The dev licenses convert into 'go live' licenses upon graduation. This way you can figure out what you need while delaying the costs till your needs are clearer.
EBarr is correct.
Here is some practical advice: once you have so many customers that Web Edition is not enough for you, your business idea is working fantastically and you will have no worries.
Modern hardware is so insanely fast that it can serve huge amounts of load. This was different 10 years ago.
When you max out even simple hard- and software, you are already rich.

Best C language key/value database around for massive amounts of entries

I am trying to create a key/value database with 300,000,000 key/value pairs of 8 bytes each (both for the key and the value). The requirement is to have a very fast key/value mechanism which can query about 500,000 entries per second.
I tried BDB, Tokyo DB, Kyoto DB, and LevelDB, and they all perform very badly when it comes to databases of that size. (Their performance is not even close to their benchmarked rate at 1,000,000 entries.)
I cannot store my database in memory because of hardware limitations (32 bit software), so memcached is out of the question.
I cannot use external server software either (only a database module), and there is no need for multi-user support at all. Of course server software cannot handle 500,000 queries per second from a single endpoint anyway, so that leaves out Redis, Tokyo Tyrant, etc.
David Segleau, here. Product Manager for Berkeley DB.
The most common problem with BDB performance is that people don't configure the cache size, leaving it at the default, which is pretty small. The second most common problem is that people write application behavior emulators that do random look-ups (even though their application is not really completely random), which forces them to read data out of cache. The random I/O then takes them down a path of conclusions about performance that are based on the simulated application rather than on the actual application behavior.
From your description, I'm not sure if you're running into these common problems or maybe into something else entirely. In any case, our experience is that Berkeley DB tends to perform and scale very well. We'd be happy to help you identify any bottlenecks and improve your BDB application throughput. The best place to get help in this regard would be on the BDB forums at: http://forums.oracle.com/forums/forum.jspa?forumID=271. When you post to the forum it would be useful to show the critical query segments of your application code and the db_stat output showing the performance of the database environment.
It's likely that you will want to use BDB HA/Replication in order to load-balance the queries across multiple servers. 500K queries/second is probably going to require a larger multi-core server or a series of smaller replicated servers. We've frequently seen BDB applications with 100-200K queries/second on commodity hardware, but 500K queries per second on 300M records in a 32-bit application is likely going to require some careful tuning. I'd suggest focusing on optimizing the performance of the queries on the BDB application running on a single node, and then using HA to distribute that load across multiple systems in order to scale your queries/second throughput.
I hope that helps.
Good luck with your application.
Regards,
Dave
I found a good benchmark comparison web page that basically compares 5 renowned databases:
LevelDB
Kyoto TreeDB
SQLite3
MDB
BerkeleyDB
You should check it out before making your choice: http://symas.com/mdb/microbench/.
P.S. - I know you've already tested them, but you should also consider that your configuration for each of those tests may not have been optimized, as the benchmark suggests otherwise.
Try ZooLib.
It provides a database with a C++ API that was originally written for a high-performance multimedia database for educational institutions called Knowledge Forum. It could handle 3,000 simultaneous Mac and Windows clients (also written in ZooLib - it's a cross-platform application framework), all of them streaming audio and video and working with graphically rich documents created by the teachers and students.
It has two low-level APIs for actually writing your bytes to disk. One is very fast but is not fault-tolerant. The other is fault-tolerant but not as fast.
I'm one of ZooLib's developers, but I don't have much experience with ZooLib's database component. There is also no documentation - you'd have to read the source to figure out how it works. That's my own damn fault, as I took on the job of writing ZooLib's manual over ten years ago, but barely started it.
ZooLib's primary developer Andy Green is a great guy and always happy to answer questions. What I suggest you do is subscribe to ZooLib's developer list at SourceForge, then ask on the list how to use the database. Most likely Andy will answer you himself, but maybe one of our other developers will.
ZooLib is Open Source under the MIT License, and is really high-quality, mature code. It has been under continuous development since 1990 or so, and was placed in Open Source in 2000.
Don't be concerned that we haven't released a tarball since 2003. We probably should, as this leads lots of potential users to think it's been abandoned, but it is very actively used and maintained. Just get the source from Subversion.
Andy is a self-employed consultant. If you don't have time but you do have a budget, he would do a very good job of writing custom, maintainable top-quality C++ code to suit your needs.
I would too, if it were any part of ZooLib other than the database, which as I said I am unfamiliar with. I've done a lot of my own consulting work with ZooLib's UI framework.
300 M * 8 bytes = 2.4GB. That will probably fit into memory (if the OS does not restrict the address space to 31 bits)
Since you'll also need to handle overflow (either by a rehashing scheme or by chaining), memory gets even tighter; for linear probing you probably need > 400M slots, and chaining will increase the sizeof item to 12 bytes (bit fiddling might gain you a few bits). That would increase the total footprint to circa 3.6 GB.
In any case you will need a specially crafted kernel that restricts its own "reserved" address space to a few hundred MB. Not impossible, but a major operation. Escaping to a disk-based thing would be too slow, in all cases. (PAE could save you, but it is tricky.)
IMHO your best choice would be to migrate to a 64-bit platform.
500,000 entries per second without holding the working set in memory? Wow.
In the general case this is not possible using HDDs, and it is difficult even with SSDs.
Have you any locality properties that might help to make the task a bit easier? What kind of queries do you have?
We use Redis. Written in C, it's only slightly more complicated than memcached by design. We've never tried to use that many rows, but for us latency is very important, and it handles those latencies well and lets us store the data on disk.
Here is a benchmark blog entry comparing Redis and memcached.
Berkeley DB could do it for you.
I achieved 50,000 inserts per second about 8 years ago, with a final database of 70 billion records.

Which DB Server should I use?

I have to develop a new (desktop) app for a small business. This business currently has an Access database with millions of records. The file size is about 1.5 GB. The boss told me that searching on this DB is very slow. The DB consists of a single table with about 20 fields.
I also think the overall DB design isn't great. I'm thinking of using another DB server with a new design to improve both performance and efficiency.
Considering this is a relatively small business, I don't want to spend much for a DB license, so I want to ask you what would you do.
Continue to use Access, maybe improving and optimizing the DB in some way
Buy a DB server license (in this case, which one?)
Something else? (Any ideas?)
Things like SQL Server Express, MySQL and PostgreSQL are available for free, no license purchase necessary.
For improving search speeds, you will probably also want to look at things like what indexes are defined for the table, what exactly searches are doing, et cetera.
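For example (table and column names invented for illustration), a single index on the searched column is often the difference between a full scan of millions of rows and a quick seek:

    -- Works as-is in SQL Server, MySQL and PostgreSQL
    CREATE INDEX idx_orders_customer_name ON orders (customer_name);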
2nd the recommendation for Firebird. We've been using it for about 5 years and never had an issue. Cross-platform, embedded & server deployments... brilliant. Oh, and Free as in Beer. Mozilla Public License.
Put me down as another recommendation for Firebird. We use it with our commercial Point of Sale product. We have it installed at over 1,000 sites, with databases as large as 40+ Gigabytes. It's fast, stable, simple, easy to deploy, and requires no management.
You could replace the Access database with a SQL Server database that will scale well moving forward. You can use SQL Server Express, which is free and supports databases up to 4GB, I believe.
SQL Server Express. Free for database size up to 10 GB.
SQL Server Express is a perfect fit for this. http://www.microsoft.com/express/database/
I warmly recommend MySQL. It's free for most uses and is easy to install on both Windows and Linux.
There are also lots of great free tools to manage its content: tables, users, indexes etc.
I would look into whether the big table needs to be broken up into smaller ones (rarely needed, but still) and also what indexes are on it. As for database software, I would recommend PostgreSQL. It is free, easy to use (and I consider it easy to set up, though others beg to differ), and it is fast enough for enterprise applications.
You can take a look at Firebird
Firebird is one of the best databases for desktop applications and will always be free.
Some tools exist to convert databases from Access to Firebird.
I also recommend Firebird.
Its key advantages for your scenario are (off the top of my head):
embedded version. You can ship it with your application - no separate installation kit needed, no .NET dependencies etc.
later on you can scale seamlessly to the full client-server model. No code changes required.
very small footprint
the entire database is stored in a single file. Much easier to deploy compared with other solutions.
you can have your server on any platform you want: Windows, Linux, Mac OS X etc. Of course, you can have your client on the same platforms too, but since you mentioned Access, I suppose that you have a Windows application.
no need for server administration. It just works.
I recommend PostgreSQL as well (especially as an alternative to MySQL)
No-one ever seems to mention it, but Oracle also do a free-as-in-free-beer version of their database: Oracle Express Edition (aka XE). It is limited to 1 CPU, 1GB RAM and 4GB of user data but that sounds plenty big enough for your application.
As for your database design, just one table sounds more like a spreadsheet than a database application. Probably you have lots of denormalised data. Splitting those out into smaller de-duplicated tables might well speed up certain queries. However if you only have twenty columns there may not be a lot of scope for tuning.
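As a made-up illustration of that splitting: a supplier name repeated on every row of the big table would move into its own table, with the big table keeping only a key:

    CREATE TABLE supplier (
        supplier_id   int PRIMARY KEY,
        supplier_name varchar(100) NOT NULL
    );
    -- The big table then stores supplier_id instead of repeating supplier_name,
    -- so the duplicated text is stored once and joins stay cheap.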
As for recommendation, the question is, which products do you know? If you have a familiarity with Access then I suggest you try to optimize your existing database. I have worked with Access databases which store several million rows and they performed well enough. After all, there is no guarantee that moving the same design to a different product will automatically make things run x times faster. Another advantage of Access as a tool is that it comes with a built-in front-end tool. If you move from Access you may need to think about re-building your application.
Wow, folks have a lot of platform recommendations for you.
Let me just say this.
If you feel like there are design issues as well as platform issues, why not try the design changes first? These are changes you are likely to make anyway.
If they make no performance difference in Access, you are no worse off, since these changes improve maintainability on any platform.
Then you can try other platforms with the knowledge that you have a solid design, and that you have not wasted any time.
SQLite can also be a candidate.
I would recommend not going with Express anything - yes, you're small now, but what if the business takes off and needs to handle much higher loads in the future? Surely that's the way you want it to go. You do not want to have to switch DB layers or run two of them.
I would look at MySQL or Firebird (descended from Borland InterBase); both are very high quality.

SQLite as a production database for a low-traffic site?

I'm considering using SQLite as a production database for a site that would receive perhaps 20 simultaneous users, but with the potential for a peak that could be many multiples of that (since the site would be accessible on the open internet and there's always a possibility that someone will post a link somewhere that could drive many people to the site all at once).
Is SQLite a possibility?
I know it's not an ideal production scenario. I'm only asking if this is within the realm of being a realistic possibility.
SQLite doesn't support much write concurrency (a write locks the whole database), so you may have problems running it on a production website. If you're looking for a 'lighter' database, perhaps consider trying a contemporary document store like CouchDB.
By all means, continue to develop against SQLite, and you're probably fine to use it initially. If you find your application has more users down the track, you're going to want to transition to Postgres or MySQL however.
The author of SQLite addresses this on the website:
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015), it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
So I think the long and short of it is, go for it, and if it's not working well for you, making the transition to an enterprise-class database is fairly trivial anyway. Do take care of your schema, however, and design your database with growth and efficiency in mind.
Here's a thread with some more independent comments around using SQLite for a production web application. It sounds like it has been used with some mixed results.
Edit (2014):
Since this answer was posted, SQLite has gained a multi-threaded mode and a write-ahead logging (WAL) mode, which may influence your evaluation of its suitability for low-to-medium traffic sites.
Charles Leifer has written a blog post about SQLite's WAL (write-ahead logging) feature, with some well-considered opinions on appropriate use cases.
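For reference, switching a database to WAL is a single pragma, and the setting persists in the database file itself (SQLite 3.7.0 or later):

    PRAGMA journal_mode=WAL;  -- returns 'wal' when the switch succeeds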
The small excerpt from the SQLite website says it all.
Is the data separated from the application by a network? → choose client/server
Many concurrent writers? → choose client/server
Big data? → choose client/server
Otherwise → choose SQLite!
SQLite "just works" (until it doesn't of course)
We often use SQLite for internal databases; the employee directory, our calendar of events, and other intranet services all run on lightweight databases. It would be major overkill to run these apps, at the scale we do, on a "real" database like MySQL. This is especially true when you factor in that they're running alongside 4 other virtual machines on a single mid-range computer.
At one point we had an outward-facing site that ran on an SQLite db for months with only a single reboot required. Obviously it was very low traffic, but it puttered along nicely for what it did.
We encountered a similar choice in an environment with absolutely no writes, and we selected SQLite.
See my blog post on the subject:
Well, the main assumption which makes this solution theoretically possible is that our SQLite database is totally read-only. Our server code should never change it. This would solve any locking problems, as there are no read locks. We could find nowhere on the internet anyone saying there is a problem in high-throughput reading of SQLite when there are no writes - it could be possible!
I think it would depend mostly on what your read/write ratio will be. If it's mostly reading from the database, you may be okay. Multi-user writing in SQLite can be a problem because of how it locks the database.
People speak about concurrency problems, but SQLite has a way to cache incoming requests and have them wait for some time. It doesn't time out immediately.
I've read things about the default timeout setting being zero, meaning it times out immediately, and that's nonsense as a default. Maybe people didn't adjust this setting?
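For what it's worth, the default busy timeout is indeed zero, and raising it is one line per connection (the pragma form needs SQLite 3.7.15 or later; older versions expose the same thing via the C function sqlite3_busy_timeout()):

    PRAGMA busy_timeout = 5000;  -- wait up to 5 seconds for a lock before returning SQLITE_BUSY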
Depends on the usage of the site. If most of the time you're just reading data, you can pretty much use anything for a DB and cache the data in the application to achieve good performance.
I am using it in a very low traffic web server (it is a genomic database) and I don't have any problems. But there are only SELECT statements, no writing to the DB involved.
To add to an already brilliant answer: since you are working with a serverless solution in this case, you can say goodbye to replication or any sort of horizontal scaling of your db, as well as other advanced options. It also isn't the best choice if you have multiple users updating the exact same chunk of information. If you were to shard the database in the future, you would have to migrate the data and move to something else. Also, if you have a load balancer and multiple systems involved, it would be difficult to maintain data centrality using SQLite. These are just some of the reasons why it isn't recommended. It's great for smaller projects, and great for development.
It seems like with queuing you could also avoid a lot of the concurrent-write problems with SQLite. Instead of writing directly to the SQLite db, you would write to a queue that in turn writes sequentially to the SQLite db in first-in-first-out order. Not sure if your application gets to the point where you would need this, or whether it would be worth writing versus just moving to a client/server DB... but it's a thought.

What Are the Pros and Cons of Filemaker? [closed]

A potential customer has asked me to look at some promotional flyers for a couple of apps which fall into the contact management / scheduler category. Both use Filemaker as their backend. It looks like these two apps are sold as web apps. At any rate I had not heard of Filemaker in about ten years, so it was surprising to see it pop up twice in the same sitting. I think it started out as a Mac platform db system.
I am more partial to SQL Server, MySQL, etc., but before making any comments on FileMaker I'd like to know some of the pros and cons of the system. It must be more than Access for Macs, but I have never run across it as a player in the client/server or web app arena.
Many thanks
Mike Thomas
Calling FileMaker Pro "Access for the Mac" is kind of like saying Mac OS X is "Windows for the Mac". They're both in the same category of software: they're integrated programming environments. It's like you have MySQL, PHP, HTML and your editor put together in a GUI. Comparing the two, they both have pros and cons. Here are the pros and cons of using FileMaker Pro vs PHP/MySQL/HTML, in my experience.
Pros:
Easy to get started
Easy to deploy locally, turn on sharing and connect from another client
Cross-platform (Mac OS X, Windows, iOS)
There are many plugins available to extend functionality
Includes starter solutions
Anyone with access can edit the program
For the most part, drag and drop programming
Changing field/database/script names after the fact is free
Has some neat built in tricks like built in graphs, tab controls, web viewers
Built-in support for importing/exporting Excel, CSV and tab-formatted files
Cons:
Inflexible: it does what it does well, but if you need more, you're out of luck for the most part
Expensive compared to the free alternative: it costs about $100 per year for a local user and $150 per developer; if you are using it as a website you need specialized hosting, which tends to cost more. In addition, the server part of the software is about $300-$800 a year
The plugins required to extend functionality can be expensive as well
Pretty much only drag and drop programming, you can only use predefined script steps, relationships are made by making a graph
Source control is a problem
Lack of scalability
Unable to copy and paste/import or export some items from solutions
Requires the mouse to access functionality
Layout design is fairly static and dated (this is improving with FileMaker 12 and above)
In general I would say that if you're developing exclusively for the web, or for a large organization, FileMaker Pro probably isn't the best fit. It's difficult to have multiple people developing the same solution. On the other hand, for a smaller organization in need of a customizable in-house database it could be a great boon. You can build rather complicated applications very quickly with it, if you're willing to deal with its deficiencies.
Pros:
It's cheap
Cons:
It's cheap(ly made)
It's non-standard (easy to find MySQL/Oracle/MSSQL/Access experts, but nobody knows FileMaker)
Using subpar and/or nonstandard technologies only creates technology debt. I've never found a respectable dev who actually enjoyed using (or wanted to use) this niche product.
In my opinion this product exists because it is Access for Macs, and it gained enough of a userbase and enough existing applications that people bought each upgrade to keep it in business. There are many products on the market that still exist because their users are locked in, not because they're a good choice.
I'll admit to bias on this subject -- I work with one of the larger FileMaker development shops out there, and have written the odd book on the subject. We actually employ many respectable developers who love using FMP. I'll try to keep it brief. :-)
FileMaker Pro is a rapid app development tool. It's primarily client-server, though it has some very respectable web publishing capabilities which work well for many applications. It is not SQL-based, but does have ODBC and JDBC interfaces, as well as an XML/HTTP interface.
As far as lock-in, FileMaker Inc has grown sales steadily, with very significant growth in new users who are attracted to the platform's solidity and ease of use.
I think Matt Haughton nailed it -- for the right applications, FMP is simply the best choice going. That said, your customer is looking at apps written in FMP Pro, and you need to evaluate those apps on their own merit. They may be good instances of FMP development, or they may not.
To know more about FMP's fitness for the task, we'd need to hear more about the proposed application and user base. Are these indeed web apps, or client-server? How many users will be using it? Do they work at one or two sites, or are they spread across the Internet?
Happy to elaborate further if there's more interest.
FileMaker is designed to integrate very simply with other databases and client applications. If you are looking at building a complicated distributed system, look elsewhere.
FileMaker is NOT good to use as a front-end to another datasource due to the design goals of the External SQL Data Sources (ESS) feature set, and it is NOT good to use as a back-end to anything other than the FM client due to slow and buggy ODBC drivers. The nature of FileMaker's architecture means it doesn't scale very well with complicated solutions, regardless of how well it can integrate with other systems.
Here's a developer's perspective on some limitations I've found when teaming FileMaker with other back-ends and ODBC clients:
The ODBC driver is limited, slow, and leaks memory on the client side. The xdbc_listener.exe has similar memory-leaking issues on the server side and will eventually crash when it uses a certain amount of RAM. We have a scheduled script to restart it each night.
FileMaker needs to load all related databases into memory before it can connect to a database. If it's a complicated database, opening and closing a connection can be quite slow (1-2 seconds) depending on how it is structured, and more so if the database references tables in other FM databases, because they need to be loaded as well. I get around this by creating persistent connections that stay open for the lifetime of the application. Although we try to minimize the number of open connections, we have yet to see a performance hit on the server.
The ODBC driver interprets queries in strange ways. For example I ran a query on 76k rows to UPDATE table_1 SET field_1 = 1 and it took 5 mins to perform the query because I think it split the one query into 46k update queries, one for each row. I know this because I watched it update the rows one-by-one in the FM client. So I don't trust the ODBC driver at all.
Here's another example of 3 different queries and how long they took searching on two date fields:
    SELECT id FROM table WHERE datefield1 = {d '2014-03-26'}  -- 0.5 seconds
    SELECT id FROM table WHERE datefield2 = {d '2014-03-26'}  -- 0.5 seconds
    SELECT id FROM table WHERE datefield1 = {d '2014-03-26'} OR datefield2 = {d '2014-03-26'}  -- 1 minute 13 seconds!
We had problems with how FileMaker cached data from an SQL Express database. We tried to run the command to clear the cache, but it didn't always work (spent a lot of time investigating this).
FileMaker uses pessimistic locking of records; before editing (from the client or as part of an odbc transaction) FileMaker attempts to lock the row first.
The FileMaker Server service "prefers" being stopped using the Admin Console (though the Admin Console may sometimes be unable to stop it either). If the FileMaker Server service stops any other way (including power loss, via the management console, or even a normal system shutdown) then some of your databases may become corrupt. Same if a client crashes during an operation, or if the network connection is lost suddenly. The solution for a power loss is to write a batch script to try and automate the shutdown, and then buy a UPS and program it to execute your script before the juice runs out. And hope it works. Otherwise backup hourly using the built-in scheduler. Aside: SQL server doesn't have this problem because it can roll back uncommitted transactions.
Performing backups with the built-in scheduler actually suspends operations to the database during the backup process. I.e., if it's a large database, it might take a minute to back up, and users will notice the pause because they won't be able to edit/insert, etc.
If you're using the FileMaker PHP API, take note that you can't use AND and OR together in the same request.
Running an intensive query using the ODBC driver might be fast on its own, but run the same query simultaneously (as in a multi-user environment) and it will slow down by about 300% exponentially. You will run into speed issues if you’re expecting a large volume of intensive queries to hit the database at the same time.
We have found that when the FileMaker ODBC driver says it has finished an update/insert operation, it still does not guarantee the transaction is committed; it appears that FileMaker will continue to hold the changes in the server cache until the auto-enter calculated fields are evaluated/indexed and then it saves to disc, meaning there may be more of a delay until the record is actually committed. So really the ODBC write operations are not always immediate writes, but rather eventual writes. This delay will be especially evident in complicated tables with many calculated fields and triggers.
Calculated fields may slow down execution and reading via the ODBC driver, depending on what is being evaluated. Try to read stored values whenever possible.
Using BLOB containers: Not Recommended. Storing documents such as PDFs in a container field will inflate your database file size, take longer to backup and complicate the retrieval and editing of those files via ODBC. It’s much easier to store files on a network share and write to the file on disk.
If you must use FM as a front-end solution to another database, make sure to carefully read FileMaker's Introduction to External SQL Sources.
Also refer to the appropriate version of the FileMaker ODBC Guide found on their website.
Just a few comments on the subject
FileMaker is certainly cheaper than some enterprise solutions in licensing costs. However, the real cost benefit is in development time. The development life-cycle is typically orders of magnitude lower than other enterprise platforms (whatever the licensing costs of those platforms). By this I mean days instead of weeks, or weeks rather than months to develop some feature.
There is a strong argument that FileMaker is Access for the Mac. While this was a valid argument a few years ago, FileMaker has come into its own in recent years. It's worth noting that FileMaker is cross platform and used extensively on Windows as well as Mac. That being said there are still huge similarities and differences between FileMaker and Access, the truth is none of them have any bearing on your situation.
While FileMaker is non-standard it does support live connection to MySQL, MS SQL Server and Oracle.
Also, there are numerous FileMaker developers - not as many as for more standard platforms, but they are definitely about. If you let me know where you are, I can put you in touch with a selection of developers in your area.
The important point I want to make is that in the correct context FileMaker is the best thing in the world at what it does - if you try to do something that it's not meant to do, you'll get stuck. However, it could support offices in 4 locations; it can be done and is being done.
Before you go and rewrite your system in some other platform you should get in touch with a FileMaker expert and see what they have to say about what you've currently got, writing more details on this site and having non-experts answer positively or negatively won't help you. In the end it has to be a business choice of costs vs. benefits.
No need to list any more "Cons" - but here is a significant "Pro": FileMaker Go. Once you have your database set up, download an iPad/iPhone app (free for FM12) and run it from a mobile device. The database can be stored locally on the iPad/iPhone or synced back to a host PC.
I'm sure this mobile solution is possible elsewhere - but the fundamental point is that an entry-level user (and I mean NO previous database experience) can create an impressive solution within a few weeks.
Personal experience: the main database runs FM 11 hosted on a PC under my desk, with 4 researchers scattered across the city collecting data on iPads, all syncing back to my PC. The previous solution was using paper and entering the data by hand.
FileMaker is an interesting app :) It started as an end-user tool and it is still one of very few database apps that a non-programmer can actually use. But somehow FileMaker's developers managed to make it very scalable. There's no other platform where one can start with a useful tool and end up with a client-server app for the whole company. In the old days they used to have a splash screen that captured this very idea.
I.e. something as simple as a file cabinet that can grow quite big.
All FileMaker pros and cons come from its origin. As an end-user tool it's very much unlike other DBMS apps. No SQL. No real programming: scripts are basically macros that repeat user actions in a slightly more general way, with variables and some logic. Lots of limitations; e.g. a list view cannot have a sidebar; a dynamic value list is always sorted alphabetically; to open a Save As dialog and read back the file name you'll need a plug-in; and so on. For a programmer this can be very frustrating, because most of his assumptions will be wrong. And existing apps written by non-programmers are not exactly paragons of clarity and solid design.
But if you manage to overcome the obstacles you'll find a rather good RAD for client-server, single-user, web, and mobile apps, that stays rather usable over WAN, with such niceties as runtime and kiosk mode.
Having said that, I'm not quite sure about generic contact management and scheduling apps in FileMaker. If this is what they are, then they should be unlocked, so the customer can make changes; or they have to be niche apps that do for the customer what nothing else does.
FileMaker is enormously powerful and versatile, with excellent multi-user support. You can create wonderful solutions in FileMaker with document management, a web interface, an iPhone interface, automated publishing support, scheduled scripts, PDF/Excel/HTML reports, XML support, caller-ID record lookup, and integration of web data (UPS & FedEx linked to an order record, for example). Extensible with plugins. It's like being in the Home Depot of data. Don't try to build Amazon; other than that, what can't you build with it, and with faster app dev than almost anywhere else?
It has been more than a year now since I first ran across FM and started using it to develop solutions for various clients. The following is my FM experience:
the learning curve is much less than with the hard-coded, industry-standard technology;
it fits in well with industry-standard platforms because of its ODBC and JDBC connectivity. Your data is not locked into FM, and other data formats can get into FM;
it fits well in both front-end and back-end solutions;
FM can match enterprise platforms given the right database design and deployment, i.e. workgroup- or department-oriented solutions. This gives data to its workgroup owner and makes it available to other workgroups or departments;
FM fits well with rapid application development that employs prototyping;
FM has many more capabilities for you to discover...
I suggest you try it yourself and I'm sure you'll love the stuff FM can offer!
Happy computing...
A little research has made me think that FileMaker is indeed Access for the Mac, but perhaps a little more robust. I worked with Access for years, never really liked it, and am glad to be away from it (I always held a grudge against MSFT for killing FoxPro, which I did like).
It is hard for me to imagine it as a good solution for a web based app used by offices in four locations around the country, plus many others logging on from home, etc.
Using it does not make much sense when MySQL, SQL Server, etc are available for the data storage and ASP.NET, PHP, Ruby etc are there for the programming.
Mike Thomas
While the comparison to "Access for Mac" is inevitable, there are some important distinctions that have to be made.
FileMaker databases can be shared out to more than one person provided one of two things happens. One, a person on your network opens the DB and shares it from their computer, acting as the host. Two, you buy and install FileMaker Server, which hosts the DBs.
Also it's been my experience that while FileMaker developers LOVE FM, they're having to learn other technologies because more and more government agencies (my primary employer the past 10 years) are moving off of FM and into SQL Server, Oracle and to some extent Access and open source. FileMaker skills are becoming less and less in demand in the public sector, so getting support for these applications is harder and consequently, more expensive.
That being said, we have a FM server and FM 5.5 clients running an application that has been rock solid for the past 5 years.
I've been using FM for more than a year now. I've been doing and providing solutions for SMBs using the SQL standard for several years. I love that SQL stuff, but just a year ago I ran across FM Pro 9 and gave it a try. Amazingly, I got all I wanted in just a short time. In my experience as a developer, FM Pro impressed me with the way it does things.
True enough, FM is not an industry database standard, but a good number of its features can compensate for what "standard" is being required for. FM Pro has live connectivity to MySQL, MS SQL Server and Oracle. For me, it doesn't make sense to speak about standards if you can move your data around from FM to other platforms and vice-versa.
Well, this note alone can't be that convincing. It's good to try it for yourself... especially now that FM has its new version 10. Believe me... you'll love it...
Happy computing.
Two points seem to dominate this discussion and need consideration:
Non-Standard and what Government Agencies are doing.
Let's consider the small business owner or the single user, both of whom are creating databases to meet their needs.
Now it doesn't matter what the government is doing, this is your database for your employees. Do what you want (as long as its legal, of course).
Non-standard? Well, often this is the best idea, since what you want to do works for you. Name your fields and tables as you like, and later on rename them as you prefer. Don't try this with dbf or SQL... Anyone remember those 'standard' file names, bks1999.dbf and bks2000.dbf? Keep in mind that 'standards' exist because someone else wrote them before you arrived, not because they are the best possible idea.
And yes, there are a lot of 'bad' FileMaker solutions, but they are working and supporting hundreds of thousands of people. But try to improve one of these bad solutions and compare that effort to improving a similarly bad dbf solution. A renamed field filters effortlessly through thousands of scripts, and scripts in related FileMaker files. In a dbf solution it can become a nightmare, as each instance has to be manually retyped.
One real test would be to compare how easily Filemaker can work with SQL, etc. as compared to other applications. That might be interesting. I've never done that but I bet I could create a working file in very little time that works with such data.
I have always said that every developer should use and be familiar with all of the tools.
25 years with Filemaker Pro, 3 years with FoxPro, 2 with 4D, etc.
Lots of comments here say FileMaker is non-standard. But what is "standard"? By "standard", many people mean that a database supports Structured Query Language (SQL) (ISO standard 9075), and FileMaker has supported and continues to support SQL. How a database engine supports SQL is proprietary to each database. It might be open source, such as MySQL, but SQL is a standard to support, not the underlying language of how it is accomplished.
When most people talk about databases, they are only talking about the backend tables and schema. The front end user interface is frequently something else. And most of them now render those results as html pages via open standards like PHP. Again, FileMaker fully supports PHP calls and Apache or IIS (depending on which OS platform you are on).
So I would disagree with people saying FileMaker is non-standard.
What is unique about FileMaker is its tight integration between the schema and the User Interface. This is similar to Apple's tight integration between hardware and the Operating system, which has some nice benefits. Interestingly, FileMaker is owned by Apple, but I guess that is another topic.
Generally, FileMaker's User Interface is considerably easier to use than most open standards and most people stick to FileMaker's client User Interface instead of web interfaces. There are still a number of things supported only in FileMaker User Interface that can't be duplicated in a web browser.
FileMaker really makes rapid application development much easier with its close integration of schema and user interface. This makes development cost a whole lot less in most cases.
FileMaker's database services can be spread among up to 3 machines giving it primitive load balancing abilities with web services. While FileMaker easily supports hundreds of users, if you go into thousands of simultaneous users, many SQL only databases (eg Oracle, MS SQL Server, MySQL, Postgres) are designed to better spread out the load across more machines. Basically, if you have high simultaneous transactions, FileMaker is not your solution. For example, a company with many point of sale terminals from all over the county hitting it at the same time.
While FileMaker supports SQL and PHP, using it only that way is a waste of the money spent on the license for the FileMaker User Interface. It would not be a cost effective solution to develop a web front end and pay the full FileMaker license cost for only a backend. So, FileMaker's support of PHP and SQL is best combined with companies that have an in-house solution for staff, but also want to integrate that with their web development team for outside customers.
One last note is that FileMaker's tight integration of schema and User Interface makes security much easier. Obviously you have to set up the groups and users and I usually integrate FileMaker with Active Directory (or Open Directory). But when you use the FileMaker Client and Server connections, turning on encryption security is a single checkbox on the server. FileMaker handles all of the certificates and uses an AES 256bit cipher (at least since version 11, maybe before then too). Currently, the US Government considers that approved for up to and including the first level of Top Secret communications. In typical SQL systems, there is a lot of work to configure security on the database end as well as the user interface end of things and it is much more work than a single checkbox.
FileMaker's target audience has been small to medium sized companies, usually with 5 to 200 users, and it is a well priced product for rapid application development of databases for companies of that size.
And I can't end this comment without commenting on how easy it is to create and deploy a mobile solution on iOS devices like iPads and iPhones. FileMaker Go is a free app for use on these mobile devices and they fully support the same user interface and security. In fact, I am aware of one company that uses FileMaker as a front end interface for their Oracle database simply for access on iPhones. Expect a lot more in the mobile market in the future and FileMaker is clearly targeting mobile users.
Just to add my 2¢ to the answers already given: everything everyone has written in the voted answers about FileMaker is true. The product is robust enough to warrant both positive and negative opinions.
I'm not a pro enough to speak to your concerns but there are a number of large complex applications written in FMP that you may want to look at. Jungle Software is a good place to start.
The downside to FMP for me, as a user of some of those apps, is that they come with a stack of files. The runtime of an FMP application isn't packaged as a bundle, so it can look a bit complex with a large app. We did some tests a long time back because FMP had a reputation of being slow. At that time (12 years ago) FMP needed to index the db or it was slow, but once it was indexed it was as fast as anything else we tested. Its big upside for semi-pros is that it is very easy to do basic stuff and end up with a working tool. My experience with Access was extremely negative, so I wouldn't compare it at all with FMP.
In the end it doesn't really matter what it was written in; if the software does what you want and is stable, buy it. If it doesn't, don't. It is very easy to get data in and out of FMP, so the proprietary-ness of the db format doesn't really enter into it.
