I am trying to create a key/value database with 300,000,000 key/value pairs of 8 bytes each (both for the key and the value). The requirement is to have a very fast key/value mechanism which can query about 500,000 entries per second.
I tried BDB, Tokyo Cabinet, Kyoto Cabinet, and LevelDB, and they all perform very badly on databases of that size. (Their performance is not even close to their benchmarked rates at 1,000,000 entries.)
I cannot store my database in memory because of hardware limitations (32 bit software), so memcached is out of the question.
Nor can I use external server software (only an embedded database module), and there is no need for multi-user support at all. In any case, server software cannot handle 500,000 queries per second from a single endpoint anyway, so that rules out Redis, Tokyo Tyrant, etc.
David Segleau here, Product Manager for Berkeley DB.
The most common problem with BDB performance is that people don't configure the cache size, leaving it at the default, which is pretty small. The second most common problem is that people write application behavior emulators that do purely random look-ups (even though their real application is not completely random), which forces them to read data out of cache. The random I/O then takes them down a path of conclusions about performance that are based on the simulated application rather than the actual application behavior.
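For what it's worth, here's a minimal sketch of what explicit cache configuration looks like with the BDB C++ API (the cache size, path, file name, and DB_HASH access method are placeholder assumptions for illustration, not a tuned configuration):

    #include <db_cxx.h>

    // Sketch: give the environment an explicit cache instead of the small
    // default. Size it to your working set, keeping in mind what a 32-bit
    // address space can actually accommodate.
    void open_env_with_cache() {
        DbEnv env(0);
        env.set_cachesize(1, 512 * 1024 * 1024, 1); // 1.5 GB total, 1 region
        env.open("/path/to/env", DB_CREATE | DB_INIT_MPOOL, 0);

        Db db(&env, 0);
        db.open(nullptr, "kv.db", nullptr, DB_HASH, DB_CREATE, 0);
        // ... run queries against db ...
        db.close(0);
        env.close(0);
    }

Note that set_cachesize must be called before the environment is opened.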
From your description, I'm not sure whether you're running into one of these common problems or something else entirely. In any case, our experience is that Berkeley DB tends to perform and scale very well. We'd be happy to help you identify any bottlenecks and improve your BDB application throughput. The best place to get help in this regard is the BDB forums at: http://forums.oracle.com/forums/forum.jspa?forumID=271. When you post to the forum, it would be useful to show the critical query segments of your application code and the db_stat output showing the performance of the database environment.
It's likely that you will want to use BDB HA/Replication in order to load-balance the queries across multiple servers. 500K queries/second is probably going to require a larger multi-core server or a series of smaller replicated servers. We've frequently seen BDB applications with 100-200K queries/second on commodity hardware, but 500K queries per second on 300M records in a 32-bit application is likely going to require some careful tuning. I'd suggest focusing on optimizing the performance of the queries in the BDB application running on a single node, and then using HA to distribute that load across multiple systems in order to scale your queries/second throughput.
I hope that helps.
Good luck with your application.
Regards,
Dave
I found a good benchmark comparison web page that basically compares 5 renowned databases:
LevelDB
Kyoto TreeDB
SQLite3
MDB
BerkeleyDB
You should check it out before making your choice: http://symas.com/mdb/microbench/.
P.S. I know you've already tested them, but you should also consider that your configuration in each of those tests may not have been optimized, as the benchmark results suggest otherwise.
Try ZooLib.
It provides a database with a C++ API; it was originally written as a high-performance multimedia database for educational institutions called Knowledge Forum. It could handle 3,000 simultaneous Mac and Windows clients (also written in ZooLib - it's a cross-platform application framework), all of them streaming audio and video and working with graphically rich documents created by the teachers and students.
It has two low-level APIs for actually writing your bytes to disk. One is very fast but is not fault-tolerant. The other is fault-tolerant but not as fast.
I'm one of ZooLib's developers, but I don't have much experience with ZooLib's database component. There is also no documentation - you'd have to read the source to figure out how it works. That's my own damn fault, as I took on the job of writing ZooLib's manual over ten years ago, but barely started it.
ZooLib's primary developer Andy Green is a great guy and always happy to answer questions. What I suggest you do is subscribe to ZooLib's developer list at SourceForge, then ask on the list how to use the database. Most likely Andy will answer you himself, but maybe one of our other developers will.
ZooLib is Open Source under the MIT License, and is really high-quality, mature code. It has been under continuous development since 1990 or so, and was placed in Open Source in 2000.
Don't be concerned that we haven't released a tarball since 2003. We probably should, as this leads lots of potential users to think it's been abandoned, but it is very actively used and maintained. Just get the source from Subversion.
Andy is a self-employed consultant. If you don't have time but you do have a budget, he would do a very good job of writing custom, maintainable top-quality C++ code to suit your needs.
I would too, if it were any part of ZooLib other than the database, which as I said I am unfamiliar with. I've done a lot of my own consulting work with ZooLib's UI framework.
300M * 8 bytes = 2.4 GB. That will probably fit into memory (if the OS does not restrict the address space to 31 bits).
Since you'll also need to handle overflow (either by a rehashing scheme or by chaining), memory gets even tighter. For linear probing you probably need > 400M slots; chaining will increase the size of an item to 12 bytes (bit fiddling might gain you a few bits). That would increase the total footprint to circa 3.6 GB.
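To make the slot layout concrete, here is a minimal sketch of such a fixed-size linear-probing table (my own illustration; the hash mix and the convention that key 0 means "empty" are assumptions). As written, each slot is 16 bytes, so 400M slots would be about 6.4 GB; reaching the tighter estimates above requires packing the slots harder:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Sketch: open-addressing hash table with 8-byte keys and values.
    // Assumes the load factor stays below 1 (the table is never completely
    // full) and that 0 is never used as a real key.
    class FixedHash {
    public:
        explicit FixedHash(std::size_t slots) : slots_(slots) {}

        void put(std::uint64_t key, std::uint64_t value) {
            std::size_t i = mix(key) % slots_.size();
            while (slots_[i].key != 0 && slots_[i].key != key)
                i = (i + 1) % slots_.size();        // linear probing
            slots_[i].key = key;
            slots_[i].value = value;
        }

        bool get(std::uint64_t key, std::uint64_t& out) const {
            std::size_t i = mix(key) % slots_.size();
            while (slots_[i].key != 0) {            // probe until an empty slot
                if (slots_[i].key == key) { out = slots_[i].value; return true; }
                i = (i + 1) % slots_.size();
            }
            return false;
        }

    private:
        struct Slot { std::uint64_t key = 0; std::uint64_t value = 0; };

        static std::size_t mix(std::uint64_t k) {   // 64-bit finalizer-style mix
            k ^= k >> 33; k *= 0xff51afd7ed558ccdULL; k ^= k >> 33;
            return static_cast<std::size_t>(k);
        }

        std::vector<Slot> slots_;
    };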
In any case you will need a specially crafted kernel that restricts its own "reserved" address space to a few hundred MB. Not impossible, but a major operation. Escaping to something disk-based would be too slow in all cases. (PAE could save you, but it is tricky.)
IMHO your best choice would be to migrate to a 64-bit platform.
500,000 entries per second without holding the working set in memory? Wow.
In the general case this is not possible with HDDs, and it is difficult even with SSDs.
Do you have any locality properties that might help make the task a bit easier? What kinds of queries do you have?
We use Redis. Written in C, it's only slightly more complicated than memcached, by design. We've never tried that many rows, but latency is very important to us; Redis handles those latencies well and lets us store the data on disk.
Here is a benchmark blog entry comparing Redis and memcached.
Berkeley DB could do it for you.
I achieved 50,000 inserts per second about 8 years ago, with a final database of 70 billion records.
I have to develop a new (desktop) app for a small business. This business currently has an Access database with millions of records. The file size is about 1.5 GB. The boss told me that searching on this DB is very slow. The DB consists of a single table with about 20 fields.
I also think the overall DB design isn't great. I thought to use another DB server with a new design to improve both performance and efficiency.
Considering this is a relatively small business, I don't want to spend much on a DB license, so I want to ask what you would do:
Continue to use Access, maybe improving and optimizing the DB in some way
Buy a DB server license (in this case, which one?)
? (any idea?)
Things like SQL Server Express, MySQL and PostgreSQL are available for free, no license purchase necessary.
For improving search speeds, you will probably also want to look at things like what indexes are defined for the table, what exactly searches are doing, et cetera.
Seconding the recommendation for Firebird. We've been using it for about 5 years and never had an issue. Cross-platform, embedded & server deployments... brilliant. Oh, and free as in beer. Mozilla Public License.
Put me down as another recommendation for Firebird. We use it with our commercial Point of Sale product. We have it installed at over 1,000 sites, with databases as large as 40+ Gigabytes. It's fast, stable, simple, easy to deploy, and requires no management.
You could replace the Access database with a SQL Server database that will scale well moving forward. You can use SQL Server Express, which is free and supports databases up to 4 GB, I believe.
SQL Server Express. Free for database size up to 10 GB.
SQL Server Express is a perfect fit for this. http://www.microsoft.com/express/database/
I warmly recommend MySQL. It's free for many uses and is easy to install on both Windows and Linux.
There are also lots of great free tools to manage its content: tables, users, indexes, etc.
I would look into whether the big table needs to be broken up into smaller ones (rarely needed, but still) and also what indexes are on it. As for database software, I recommend PostgreSQL. It is free, easy to use (and I consider it easy to set up, though others beg to differ), and fast enough for enterprise applications.
You can take a look at Firebird
Firebird is one of the best databases for desktop applications, and it will always be free.
Tools exist to convert databases from Access to Firebird.
I also recommend Firebird.
Its key advantages for your scenario are (off the top of my head):
embedded version. You can ship it with your application - no separate installation kit needed, no .NET dependencies etc.
later on you can scale seamlessly to the full client-server model. No code changes required.
very small footprint
the entire database is stored in a single file. Much easier to deploy compared with other solutions.
you can have your server on any platform you want: Windows, Linux, MacOSX etc. Of course, you can have your client also on the same platforms but since you mentioned Access, I suppose that you have a Windows application.
no need for server administration. It just works.
I recommend PostgreSQL as well (especially as an alternative to MySQL)
No-one ever seems to mention it, but Oracle also do a free-as-in-free-beer version of their database: Oracle Express Edition (aka XE). It is limited to 1 CPU, 1GB RAM and 4GB of user data but that sounds plenty big enough for your application.
As for your database design, just one table sounds more like a spreadsheet than a database application. Probably you have lots of denormalised data. Splitting those out into smaller de-duplicated tables might well speed up certain queries. However if you only have twenty columns there may not be a lot of scope for tuning.
As for recommendation, the question is, which products do you know? If you have a familiarity with Access then I suggest you try to optimize your existing database. I have worked with Access databases which store several million rows and they performed well enough. After all, there is no guarantee that moving the same design to a different product will automatically make things run x times faster. Another advantage of Access as a tool is that it comes with a built-in front-end tool. If you move from Access you may need to think about re-building your application.
Wow, folks have a lot of platform recommendations for you.
Let me just say this.
If you feel like there are design issues as well as platform issues, why not try the design changes first? These are changes you are likely to make anyway.
If they make no performance difference in Access, you are no worse off, since these changes improve maintainability on any platform.
Then you can try other platforms with the knowledge that you have a solid design, and that you have not wasted any time.
SQLite can also be a candidate.
I would recommend not going with Express anything. Yes, you're small, but what if the business takes off and needs to handle much higher loads in the future? Surely that's the way you want it to go. You do not want to have to switch DB layers or run two of them.
I would look at MySQL or Firebird (descended from Borland InterBase); both are of very high quality.
I have experience with various versions of SQL Server and Oracle, and my general sense is that, deserved or not, Oracle probably has the better reputation as the preferred database, although I sense MS has been closing the gap for some time, sometimes even claiming that it outperforms Oracle in situations x, y, and whatever, a close cousin to z.
A friend of mine who works for the gov't has told me that "they can't use either of those databases because they aren't robust enough," or something to that effect, and that they had to use IBM's DB2 database.
I would expect this to be a very difficult question to answer in such broad terms, but could someone just give me an idea as to which DB products are generally regarded as being powerful enough (or whatever word you want to apply) for large, high-volume enterprises?
If you want to throw in your perspective of how reality differs from general public perception, I'd be interested, too.
My gut tells me that any of these three products could probably be used to successfully implement the largest of enterprises if you know how to design and implement it, but again, I am looking for a little bit of what one product might have over the others, versus public perceptions, deserved or undeserved.
...and if you close this question, help me understand why "subjective" is a tag. :-)
Your gut is largely correct and a good argument can be made for any of the three. Keep in mind that what is used in any given Government, Corporate or other enterprise environment is largely a function of who got to the decision maker first or who the decision maker trusts. That is, the person setting policy may not be a domain expert at all but nonetheless has the power to dictate policy because of their position in the hierarchy. If a given government department will only use DB2, I'd suggest that you take that "evidence" that DB2 is the "best" or "most robust" database with a very large grain of salt.
I have worked many contracts and in the past 10 years I have seen Oracle and PostgreSQL more than anything.
IBM's DB2 is usually used in large enterprise data farms. For scale I have had Oracle systems handle Petabytes of data.
I have had to build systems specifically for database migration and upward scaling. I can tell you it is sometimes a matter of being locked into the current database system by applications and operations; meaning it is more cost-effective to scale up the system/database in place than to, say, migrate your Oracle DBs to DB2.
The SQL engine is just one part of the equation; RAM, the disk I/O subsystem (SAN), CPUs, fibre, etc. all have to be included.
All of the different engines have different things to offer, like RAC, compression of data and backups, partitioning, indexed/materialized views, and so on.
Design and proper indexing are also very important... a crappy design on any of these systems will be a crappy system in general.
I think our company probably qualifies as one of the "big boys". We are one of the largest solar manufacturing companies in the world. We are exclusively a SQL Server 2005 and 2008 shop. Within the last year we evaluated Oracle as a possible replacement and rejected it. SQL Server manages everything from plant floor operations to ERP. We couldn't be happier with it.
A potential customer has asked me to look at some promotional flyers for a couple of apps which fall into the contact management / scheduler category. Both use Filemaker as their backend. It looks like these two apps are sold as web apps. At any rate I had not heard of Filemaker in about ten years, so it was surprising to see it pop up twice in the same sitting. I think it started out as a Mac platform db system.
I am more partial to SQL Server, MySQL, etc., but before making any comments on FileMaker, I'd like to know some of the pros and cons of the system. It must be more than Access for Macs, but I have never run across it as a player in the client/server or web app arena.
Many thanks
Mike Thomas
Calling FileMaker Pro "Access for the Mac" is kind of like calling Mac OS X "Windows for the Mac". They're both in the same category of software: integrated programming environments. It's as if you took MySQL, PHP, HTML, and your editor and put them together in a GUI. Comparing the two, they both have pros and cons. Here are the pros and cons of using FileMaker Pro vs. PHP/MySQL/HTML, in my experience.
Pros:
Easy to get started
Easy to deploy locally, turn on sharing and connect from another client
Cross-platform (Mac OS X, Windows, iOS)
There are many plugins available to extend functionality
Includes starter solutions
Anyone with access can edit the program
For the most part, drag and drop programming
Changing field/database/script names after the fact is free
Has some neat built in tricks like built in graphs, tab controls, web viewers
Built-in support for importing/exporting Excel, CSV, and tab-formatted files
Cons:
Inflexible: it does what it does well, but if you need more, you're out of luck for the most part
Expensive compared to the free alternative: it costs about $100 per year for a local user and $150 per developer; if you are using it for a website you need specialized hosting, which tends to cost more. In addition, the server part of the software is about $300-$800 a year
The plugins required to extend functionality can be expensive as well
Pretty much only drag-and-drop programming; you can only use predefined script steps, and relationships are made by building a graph
Source control is a problem
Lack of scalability
Unable to copy and paste/import or export some items from solutions
Requires the mouse to access functionality
Layout design is fairly static and dated (this is improving with FileMaker 12 and above)
In general I would say that if you're developing exclusively for the web, or for a large organization, FileMaker Pro probably isn't the best fit. It's difficult to have multiple people developing the same solution. On the other hand, for a smaller organization in need of a customizable in-house database it could be a great boon. You can build rather complicated applications very quickly with it, if you're willing to deal with its deficiencies.
Pros:
It's cheap
Cons:
It's cheap(ly made)
It's non-standard (it's easy to find MySQL/Oracle/MSSQL/Access experts, but nobody knows Filemaker)
Using subpar and/or nonstandard technologies only creates technical debt. I've never found a respectable dev who actually enjoyed using (or wanted to use) this niche product.
In my opinion this product exists because it is Access for Macs, and it gained enough of a user base and existing applications that enough people bought each upgrade to keep it in business. There are many products on the market that still exist because their users are locked in, not because they're a good choice.
I'll admit to bias on this subject -- I work with one of the larger FileMaker development shops out there, and have written the odd book on the subject. We actually employ many respectable developers who love using FMP. I'll try to keep it brief. :-)
FileMaker Pro is a rapid app development tool. It's primarily client-server, though it has some very respectable web publishing capabilities which work well for many applications. It is not SQL-based, but does have ODBC and JDBC interfaces, as well as an XML/HTTP interface.
As far as lock-in, FileMaker Inc has grown sales steadily, with very significant growth in new users who are attracted to the platform's solidity and ease of use.
I think Matt Haughton nailed it -- for the right applications, FMP is simply the best choice going. That said, your customer is looking at apps written in FMP Pro, and you need to evaluate those apps on their own merit. They may be good instances of FMP development, or they may not.
To know more about FMP's fitness for the task, we'd need to hear more about the proposed application and user base. Are these indeed web apps, or client-server? How many users will be using it? Do they work at one or two sites, or are they spread across the Internet?
Happy to elaborate further if there's more interest.
FileMaker is designed to integrate very simply with other databases and client applications. If you are looking at building a complicated distributed system, look elsewhere.
FileMaker is NOT good to use as a front-end to another data source, due to the design goals of the External SQL Data Sources (ESS) feature set, and it is NOT good to use as a back-end to anything other than the FM client, due to slow and buggy ODBC drivers. The nature of FileMaker's architecture means it doesn't scale very well with complicated solutions, regardless of how well it can integrate with other systems.
Here's a developer's perspective on some limitations I've found when teaming FileMaker with other back-ends and ODBC clients:
The ODBC driver is limited, slow, and leaks memory on the client side. The xdbc_listener.exe has similar memory-leak issues on the server side and will eventually crash once it uses a certain amount of RAM. We have a scheduled script to restart it each night.
FileMaker needs to load all related databases into memory before it can connect to a database. If it's a complicated database, opening and closing a connection can be quite slow (1-2 seconds) depending on how it is structured, and more so if the database references tables in other FM databases, because they need to be loaded as well. I get around this by creating persistent connections that stay open for the lifetime of the application (see the sketch below). Although we try to minimize the number of open connections, we have yet to see a performance hit on the server.
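To illustrate the persistent-connection workaround, here is a bare-bones ODBC sketch (plain ODBC, nothing FileMaker-specific; the DSN name and credentials are placeholders):

    #ifdef _WIN32
    #include <windows.h>
    #endif
    #include <sql.h>
    #include <sqlext.h>

    // Sketch: open one connection at application startup and reuse it,
    // instead of paying the 1-2 second connect cost on every query.
    static SQLHENV env = SQL_NULL_HENV;
    static SQLHDBC dbc = SQL_NULL_HDBC;

    void open_persistent_connection() {
        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
        SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
        SQLDriverConnect(dbc, NULL,
                         (SQLCHAR*)"DSN=FileMakerDSN;UID=user;PWD=secret;",
                         SQL_NTS, NULL, 0, NULL, SQL_DRIVER_NOPROMPT);
        // Reuse dbc for all statements; free both handles only at shutdown.
    }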
The ODBC driver interprets queries in strange ways. For example, I ran a query on 76k rows, UPDATE table_1 SET field_1 = 1, and it took 5 minutes to complete, because I think it split the one query into 76k individual update queries, one for each row. I know this because I watched it update the rows one by one in the FM client. So I don't trust the ODBC driver at all.
Here's another example of 3 different queries and how long they took searching on two date fields:
SELECT id FROM table
WHERE datefield1 = {d '2014-03-26'}
0.5 seconds
SELECT id FROM table
WHERE datefield2 = {d '2014-03-26'}
0.5 seconds
SELECT id FROM table
WHERE datefield1 = {d '2014-03-26'} OR datefield2 = {d '2014-03-26'}
1 minute 13 seconds!
We had problems with how FileMaker cached data from an SQL Express database. We tried to run the command to clear the cache, but it didn't always work (spent a lot of time investigating this).
FileMaker uses pessimistic locking of records; before editing (from the client, or as part of an ODBC transaction), FileMaker attempts to lock the row first.
The FileMaker Server service "prefers" to be stopped using the Admin Console (though the Admin Console may sometimes be unable to stop it either). If the FileMaker Server service stops any other way (including power loss, via the management console, or even a normal system shutdown), then some of your databases may become corrupt. The same applies if a client crashes during an operation, or if the network connection is lost suddenly. The solution for a power loss is to write a batch script to try to automate the shutdown, then buy a UPS and program it to execute your script before the juice runs out. And hope it works. Otherwise, back up hourly using the built-in scheduler. Aside: SQL Server doesn't have this problem because it can roll back uncommitted transactions.
Performing backups with the built-in scheduler actually suspends operations on the database during the backup process; i.e., if it's a large database, the backup might take a minute, and users will notice the pause because they won't be able to edit/insert, etc.
If you're using the FileMaker PHP API, take note that you can't use AND and OR together in the same request.
Running an intensive query through the ODBC driver might be fast on its own, but run the same query simultaneously from several clients (as in a multi-user environment) and it slows down by roughly 300%, getting worse as concurrency grows. You will run into speed issues if you're expecting a large volume of intensive queries to hit the database at the same time.
We have found that when the FileMaker ODBC driver says it has finished an update/insert operation, it still does not guarantee the transaction is committed; it appears that FileMaker will continue to hold the changes in the server cache until the auto-enter calculated fields are evaluated/indexed, and only then does it save to disk, meaning there may be more of a delay until the record is actually committed. So really the ODBC write operations are not always immediate writes, but rather eventual writes. This delay will be especially evident in complicated tables with many calculated fields and triggers.
Calculated fields may slow down execution and reading via the ODBC driver, depending on what is being evaluated. Try to read stored values whenever possible.
Using BLOB containers: Not Recommended. Storing documents such as PDFs in a container field will inflate your database file size, take longer to backup and complicate the retrieval and editing of those files via ODBC. It’s much easier to store files on a network share and write to the file on disk.
If you must use FM as a front-end solution to another database, make sure to carefully read FileMaker's Introduction to External SQL Sources.
Also refer to the appropriate version of the FileMaker ODBC Guide, found on their website.
Just a few comments on the subject
FileMaker is certainly cheaper than some enterprise solutions in licensing costs. However, the real cost benefit is in development time. The development life-cycle is typically orders of magnitude lower than other enterprise platforms (whatever the licensing costs of those platforms). By this I mean days instead of weeks, or weeks rather than months to develop some feature.
There is a strong argument that FileMaker is Access for the Mac. While this was a valid argument a few years ago, FileMaker has come into its own in recent years. It's worth noting that FileMaker is cross platform and used extensively on Windows as well as Mac. That being said there are still huge similarities and differences between FileMaker and Access, the truth is none of them have any bearing on your situation.
While FileMaker is non-standard it does support live connection to MySQL, MS SQL Server and Oracle.
Also, there are numerous FileMaker developers; not as many as for more standard platforms, but they are definitely around. If you let me know where you are, I can put you in touch with a selection of developers in your area.
The important point I want to make is that in the correct context FileMaker is the best thing in the world at what it does; if you try to do something that it's not meant to do, you'll get stuck. As for your scenario: it could support offices in four locations; it can be done and is being done.
Before you go and rewrite your system in some other platform you should get in touch with a FileMaker expert and see what they have to say about what you've currently got, writing more details on this site and having non-experts answer positively or negatively won't help you. In the end it has to be a business choice of costs vs. benefits.
No need to list any more "Cons", but here is a significant "Pro": FileMaker Go. Once you have your database set up, download an iPad/iPhone app (free for FM 12) and run it from a mobile device. The database can be stored locally on the iPad/iPhone or synced back to a host PC.
I'm sure this mobile solution is possible elsewhere - but the fundamental point is that an entry-level user (and I mean NO previous database experience) can create an impressive solution within a few weeks.
Personal experience: main database running FM 11 hosted on a PC under my desk; 4 researchers scattered across the city collecting data on iPads, all syncing back to my PC. The previous solution was using paper and entering the data by hand.
FileMaker is an interesting app :) It started as an end-user tool, and it is still one of very few database apps that a non-programmer can actually use. But somehow FileMaker's developers managed to make it very scalable. There's no other platform where one can start with a useful personal tool and end up with a client-server app for the whole company. In the old days they used to have a splash screen that captured this very idea (I only found an imperfect version):
I.e. something as simple as a file cabinet that can grow quite big.
All FileMaker pros and cons come from its origin. As an end-user tool it's very much unlike other DBMS apps. No SQL. No real programming: scripts are basically macros that repeat user actions in a slightly more general way, with variables and some logic. Lots of limitations; e.g. a list view cannot have a sidebar; a dynamic value list is always sorted alphabetically; to open a Save As dialog and read back the file name you'll need a plug-in; and so on. For a programmer this can be very frustrating, because most of his assumptions will be wrong. And existing apps written by non-programmers are not exactly paragons of clarity and solid design.
But if you manage to overcome the obstacles, you'll find a rather good RAD tool for client-server, single-user, web, and mobile apps that stays quite usable over a WAN, with such niceties as a runtime and a kiosk mode.
Having said that, I'm not quite sure about generic contact management and scheduling apps in FileMaker. If this is what they are, then they should be unlocked, so the customer can make changes; or they have to be niche apps that do for the customer what nothing else does.
Filemaker is enormously powerful and versatile. Excellent multi-user support. You can create wonderful solutions in Filemaker with document management, web interface, iphone interface, automated publishing support, scheduled scripts, PDF/Excel/HTML reports, XML support, caller ID record lookup, integration of web data (UPS & Fedex linked to order record for example). Extensible with plugins. It's like being in the Home Depot of data. Don't try to build Amazon; other than that what can't you build with it, and faster app dev than most anywhere else?
It has been more than a year now since I came across FM and started using it to develop solutions for various clients. The following is my FM experience:
the learning curve is much shorter than with hard-coded industry-standard technology;
it fits well with industry-standard platforms because of its ODBC and JDBC connectivity; your data is not locked into FM, and data in other formats can get into FM;
it works well for both front-end and back-end solutions;
FM can match enterprise platforms given the right database design and deployment, i.e. workgroup- or department-oriented solutions: data belongs to its workgroup owner and is made available to other workgroups or departments;
FM fits well with rapid application development that employs prototyping;
FM has many more capabilities than I can list here...
I suggest you try it yourself and I'm sure you'll love the stuff FM can offer!
Happy computing...
A little research has made me think that FileMaker is indeed Access for the Mac, but perhaps a little more robust. I worked with Access for years, never really liked it, and am glad to be away from it (I always held a grudge against MSFT for killing FoxPro, which I did like).
It is hard for me to imagine it as a good solution for a web based app used by offices in four locations around the country, plus many others logging on from home, etc.
Using it does not make much sense when MySQL, SQL Server, etc are available for the data storage and ASP.NET, PHP, Ruby etc are there for the programming.
Mike Thomas
While the comparisons to "Access for Mac" is inevitable, there are some important distinctions that have to be made.
FileMaker databases can be shared out to more than one person provided 1 of 2 things happen. One, a person on your network opens the DB and shares it from their computer, acting as the host. Two, you buy and install FileMaker server which hosts the DBs.
Also it's been my experience that while FileMaker developers LOVE FM, they're having to learn other technologies because more and more government agencies (my primary employer the past 10 years) are moving off of FM and into SQL Server, Oracle and to some extent Access and open source. FileMaker skills are becoming less and less in demand in the public sector, so getting support for these applications is harder and consequently, more expensive.
That being said, we have a FM server and FM 5.5 clients running an application that has been rock solid for the past 5 years.
I've been using FM for more than a year now. I've been providing solutions for SMBs using the SQL standard for several years. I love the SQL stuff, but just a year ago I came across FM Pro 9 and gave it a try. Amazingly, I got all I wanted in a short time. In my experience as a developer, FM Pro impressed me with the way it does things.
True enough, FM is not an industry database standard, but a good number of its features can compensate for whatever "standard" is required. FM Pro has live connectivity to MySQL, MS SQL Server, and Oracle. For me, it doesn't make sense to talk about standards if you can move your data from FM to other platforms and vice versa.
Well, this note alone can't be very convincing. It's best to try it for yourself... especially now that FM has its new version 10. Believe me... you'll love it...
Happy computing.
Two points seem to dominate this discussion and need consideration:
Non-Standard and what Government Agencies are doing.
Let's consider the small business owner, or the single user, both of whom are creating databases to meet their own needs.
Now it doesn't matter what the government is doing; this is your database for your employees. Do what you want (as long as it's legal, of course).
Non-standard? Well, often that is the best idea, since what you want to do works for you. Name your fields and tables as you like, and rename them later as you prefer. Don't try that with dbf or SQL... Anyone remember those "standard" file names, bks1999.dbf, bks2000.dbf? Keep in mind that "standards" exist because someone else wrote them before you arrived, not because they are the best possible idea.
And yes, there are a lot of "bad" FileMaker solutions, but they are working and supporting hundreds of thousands of people. But try to improve one of those bad solutions and compare the effort to improving a similarly bad dbf solution. A renamed field filters effortlessly through thousands of scripts, including scripts in related FileMaker files. In a dbf solution it can become a nightmare, as each instance has to be manually retyped.
One real test would be to compare how easily Filemaker can work with SQL, etc. as compared to other applications. That might be interesting. I've never done that but I bet I could create a working file in very little time that works with such data.
I have always said that every developer should use and be familiar with all of the tools.
25 years with Filemaker Pro, 3 years with FoxPro, 2 with 4D, etc.
Lots of comments here call FileMaker non-standard. But what is "standard"? By "standard", many people mean that a database supports Structured Query Language (SQL) (ISO Standard 9075), and FileMaker has supported and continues to support SQL. How a given database engine implements SQL support is proprietary to that database. It might be open source, as with MySQL, but SQL is a standard to support, not the underlying language in which support is implemented.
When most people talk about databases, they are only talking about the back-end tables and schema. The front-end user interface is frequently something else, and most of them now render results as HTML pages via open standards like PHP. Again, FileMaker fully supports PHP calls and Apache or IIS (depending on which OS platform you are on).
So I would disagree with people saying FileMaker is non-standard.
What is unique about FileMaker is its tight integration between the schema and the User Interface. This is similar to Apple's tight integration between hardware and the Operating system, which has some nice benefits. Interestingly, FileMaker is owned by Apple, but I guess that is another topic.
Generally, FileMaker's User Interface is considerably easier to use than most open standards and most people stick to FileMaker's client User Interface instead of web interfaces. There are still a number of things supported only in FileMaker User Interface that can't be duplicated in a web browser.
FileMaker really makes rapid application development much easier with its close integration of schema and user interface. This makes development cost a whole lot less in most cases.
FileMaker's database services can be spread among up to 3 machines, giving it primitive load-balancing abilities with web services. While FileMaker easily supports hundreds of users, if you go into thousands of simultaneous users, many SQL-only databases (e.g. Oracle, MS SQL Server, MySQL, Postgres) are designed to better spread the load across more machines. Basically, if you have a high volume of simultaneous transactions, FileMaker is not your solution; for example, a company with many point-of-sale terminals from all over the country hitting it at the same time.
While FileMaker supports SQL and PHP, using it only that way is a waste of the money spent on the license for the FileMaker User Interface. It would not be a cost effective solution to develop a web front end and pay the full FileMaker license cost for only a backend. So, FileMaker's support of PHP and SQL is best combined with companies that have an in-house solution for staff, but also want to integrate that with their web development team for outside customers.
One last note is that FileMaker's tight integration of schema and user interface makes security much easier. Obviously you have to set up the groups and users, and I usually integrate FileMaker with Active Directory (or Open Directory). But when you use the FileMaker client and server connections, turning on encryption is a single checkbox on the server. FileMaker handles all of the certificates and uses an AES 256-bit cipher (at least since version 11, maybe before then too). Currently, the US Government considers that approved for up to and including the first level of Top Secret communications. In typical SQL systems there is a lot of work to configure security on the database end as well as the user-interface end, much more work than a single checkbox.
FileMaker's target audience has been small to medium sized companies, usually with 5 to 200 users, and it is a well priced product for rapid application development of databases for companies of that size.
And I can't end this comment without commenting on how easy it is to create and deploy a mobile solution on iOS devices like iPads and iPhones. FileMaker Go is a free app for use on these mobile devices and they fully support the same user interface and security. In fact, I am aware of one company that uses FileMaker as a front end interface for their Oracle database simply for access on iPhones. Expect a lot more in the mobile market in the future and FileMaker is clearly targeting mobile users.
Just to add my 2¢ to the already given answers: Everything everyone has written in the voted answers is true about Filemaker. The product is robust enough to warrant both positive and negative opinions.
I'm not enough of a pro to speak to your concerns, but there are a number of large, complex applications written in FMP that you may want to look at. Jungle Software is a good place to start.
The downside to FMP for me, as a user of some of those apps, is that they come with a stack of files. The runtime of an FMP application isn't packaged as a bundle, so it can look a bit complex with a large app. We did some tests a long time back because FMP had a reputation for being slow. At that time (12 years ago) FMP needed to index the DB or it was slow, but once it was indexed it was as fast as anything else we tested. Its big upside for semi-pros is that it is very easy to do basic stuff and end up with a working tool. My experience with Access was extremely negative, so I wouldn't compare it at all with FMP.
In the end it doesn't really matter what it was written in; if the software does what you want and is stable, buy it. If it doesn't, don't. It is very easy to get data in and out of FMP, so the proprietary nature of the DB format doesn't really enter into it.
The IT department where I work is trying to move to 100% virtualized servers, with all the data stored on a SAN. They haven't done it yet, but the plan eventually calls for moving the existing physical SQL Server machines to virtual servers as well.
A few months ago I attended the Heroes Happen Here launch event, and in one of the SQL Server sessions the speaker mentioned in passing that this is not a good idea for production systems.
So I'm looking for a few things:
What are the specific reasons why this is or is not a good idea? I need references, or don't bother responding. I could come up with a vague "I/O bound" response on my own via google.
The HHH speaker recollection alone probably won't convince our IT department to change their minds. Can anyone point me directly to something more authoritative? And by "directly", I mean something more specific than just a vague Books OnLine comment. Please narrow it down a little.
I can say this from personal experience, because I am dealing with this very problem as we speak. The place where I am currently working as a contractor has this type of environment for their SQL Server development systems. I am trying to develop a fairly modest B.I. system in this environment and really struggling with the performance issues.
TLB misses and emulated I/O are very slow on a naive virtual machine. If your O/S has paravirtualisation support (which is still not a mature technology on Windows), you can use paravirtualised I/O (essentially a device driver that hooks into an API in the VM). Recent versions of the Opteron have support for nested page tables, which removes the need to emulate the MMU in software (which is really slow).
Thus, applications that run over large data sets and do lots of I/O, like (say) ETL processes, trip over the Achilles' heel of virtualisation. If you have anything like a data warehouse system that might be hard on memory or disk I/O, you should consider something else. For a simple transactional application they are probably OK.
To put this in perspective, the systems I am using are running on blades (an IBM server) on a SAN with 4x 2Gbit F/C links. This is a mid-range SAN. The VM has 4GB of RAM (IIRC) and now two virtual CPUs. At its best (when the SAN is quiet) this is still only half the speed of my XW9300, which has 5 SCSI disks (system, tempdb, logs, data, data) on 1 U320 bus and 4GB of RAM.
Your mileage may vary, but I'd recommend going with workstation systems like the one I described for developing anything I/O heavy in preference to virtual servers on a SAN.
Unless your resource usage requirements are beyond this sort of kit (in which case they are well beyond a virtual server anyway) this is a much better solution. The hardware is not that expensive - certainly much cheaper than a SAN, blade chassis and VMWare licencing. SQL Server developer edition comes with V.S. Pro and above.
This also has the benefit that your development team is forced to deal with deployment right from the word go - you have to come up with an architecture that's easy to 'one-click' deploy. This is not as hard as it sounds. Redgate SQL Compare Pro is your friend here. Your developers also get a basic working knowledge of database administration.
A quick trip to HP's website got me a list price of around $4,600 for an XW8600 (their current Xeon-based model) with a quad-core Xeon chip, 4GB of RAM, and 1x146GB plus 4x73GB 15k SAS hard disks. Street price will probably be somewhat less. Compare this to the price of a SAN, blade chassis, and VMware licensing, plus the cost of backup for that setup. For backup you can provide a network share where people can drop compressed DB backup files as necessary.
EDIT: This whitepaper on AMD's website discusses some benchmarks on a VM. From the benchmarks in the back, heavy I/O and MMU workloads really clobber VM performance. Their benchmark (to be taken with a grain of salt, as it is a vendor-supplied statistic) suggests a 3.5x speed penalty on an OLTP benchmark. While this is vendor-supplied, one should bear in mind:
It benchmarks naive virtualisation and compares it to a paravirtualised solution, not to bare-metal performance.
An OLTP benchmark will have a more random-access I/O workload, and will spend more time waiting for disk seeks. A more sequential disk access pattern (characteristic of data warehouse queries) will have a higher penalty, and a memory-heavy operation (SSAS, for example, is a biblical memory hog) that has a large number of TLB misses will also incur additional penalties. This means that the slow-downs on this type of processing would probably be more pronounced than the OLTP benchmark penalty cited in the whitepaper.
What we have seen here is that TLB misses and I/O are very expensive on a VM. A good architecture with paravirtualised drivers and hardware support in the MMU will mitigate some or all of this. However, I believe that Windows Server 2003 does not support paravirtualisation at all, and I'm not sure what level of support is delivered in Windows 2008 server. It has certainly been my experience that a VM will radically slow down a server when working on an ETL process and SSAS cube builds compared to relatively modest spec bare-metal hardware.
A SAN, of course, and clustering. But regarding virtualization, you will take a performance hit (which may or may not be worth it to you):
http://blogs.technet.com/andrew/archive/2008/05/07/virtualized-sql-server.aspx
http://sswug.org has had some notes about it in their daily newsletter lately
I wanted to add this series of articles by Brent Ozar:
Why Your Sysadmin Wants to Virtualize Your Servers
Why Would You Virtualize SQL Server?
Reasons Why You Shouldn't Virtualize SQL Server
It's not exactly authoritative in the sense I was hoping for (coming from the team that builds the server, or an official manual of some kind), but Brent Ozar is pretty well respected and I think he does a great job covering all the issues here.
We are running a payroll system for 900+ people on VMWare with no problems. This has been in production for 10 months. It's a medium sized load as far as DB goes, and we pre-allocated drive space in VM to prevent IO issues. You have to defrag both the VM Host and the VM slice on a regular basis in order to maintain acceptable performance.
Here's some VMware testing on it:
http://www.vmware.com/files/pdf/SQLServerWorkloads.pdf
Granted, they do not compare it to physical machines. But, you could probably do similar testing with the tools they used for your environment.
We currently run SQL Server 2005 in a VMWARE environment. BUT, it is a very lightly loaded database and it is great. Runs with no problems.
As most have pointed out, it will depend on your database load.
Maybe you can convince the IT Department to do some good testing before blindly implementing.
No, I can't point to any specific tests or anything like that, but I can say from experience that putting your production database server on a virtual machine is a bad idea, especially if it has a large load.
It's fine for development. Possibly even for testing (on the theory that if it runs fine under load on a virtual box, it's going to run fine in production), but not for production.
It's common sense, really. Do you want your hardware running two operating systems and your SQL Server, or one operating system and SQL Server?
Edit:
My experience biased my response. I have worked with large databases under heavy constant load. If you have a smaller database under light load, virtualization may work fine for you.
There is some information concerning this in Conor Cunningham's blog article Database Virtualization - The Dirty Little Secret Nobody is Talking About.... To quote:
Within the server itself, there is surprisingly little knowledge of a lot of things in this area that are important to performance. SQL Server's core engine assumes things like:
all CPUs are equally powerful
all CPUs process instructions at about the same rate.
a flush to disk should probably happen in a bounded amount of time.
The post goes on to elaborate on these issues somewhat further. I think it's a good read considering the scarcity of information available on this issue in general.
Note there are some specialty virtualization products out there that are made for databases that might be worth looking into instead of a general product like VMWare.
Our company (over 200 SQL servers) is currently in the process of deploying HP Polyserve on some of our servers:
HP PolyServe Software for Microsoft SQL Server enables multiple Microsoft SQL Server instances to be consolidated onto substantially fewer servers and centralized SAN storage. HP PolyServe's unique "shared data" architecture delivers enterprise class availability and virtualization-like flexibility in a utility platform.
Our primary reason for deploying it is to make hardware replacement easier: add the new box to the "matrix", shuffle around where each SQL instance resides (seamlessly), then remove the old box. Transparent to the application teams, because the SQL instance names don't change.
Old Question with Old Answers
The answers in this thread are years old. Most of the negative points in this entire thread are technically still correct, but much less relevant. The overhead cost of virtualization and SANs is much less of a factor now than it used to be. A correctly configured virtualization host, guest, network, and SAN can provide good performance with the benefits of virtualization and operational flexibility, including good recovery scenarios that are only possible by being virtual.
However, in the real world it only takes one minor configuration detail to bring the whole thing to its knees. In practice your biggest challenge with virtual SQL servers is convincing and working with the people responsible for the virtualization to get it set up just right.
The irony: in 100 percent of the cases where we took production off virtualization and moved it back to dedicated hardware, performance went through the roof on the dedicated hardware. In all of these cases it was not the virtualization but the way it was set up. By going back to dedicated hardware we actually proved that the virtualization would have been a much better use of resources, by factors of 5 or more. Modern software is usually designed to scale out across nodes, so virtualization works to your advantage on that front as well.
The biggest concern to me when virtualising software is normally licensing.
Here's an article on it for MS SQL. Not sure about your situation so can't pick out any salient points.
http://www.microsoft.com/sql/howtobuy/virtualization.mspx
SQL Server is supported in a virtual environment. In fact I would recommend it seeing that one of the licensing options is per socket. This means you can put as many SQL Server instances in a virtualized (e.g. Windows 2008 Server Datacenter) system as you like and pay only per processor socket the machine has.
It's better than that because DataCenter is licensed per socket with unlimited Virtual machine licenses as well.
I'd recommend clustering your Hyper-V on two machines however so that if one fails the other can pick up the slack.
I would think that the possibility of something bad happening to the data would be too great.
As a dead simple example, let's say you ran a SQL Server box in Virtual Server 2005 R2 with undo disks turned on (so the main "disk" file stays the same and all changes are made to a separate file which can be purged or merged later). Then something happens (usually you run into the 128GB limit, or whatever the size is) and some middle-of-the-night clueless admin has to reboot and figures out he can't do so until he removes the undo disks. You're screwed; even if he keeps the undo disk files for later analysis, the odds of merging the data back together are pretty slim.
So, echoing the other posts in this thread: for development it's great, but for production it's not a good idea. Your code can be rebuilt and redeployed (that's another thing: VMs for source control aren't a good idea either), but your live production data is way more important.
Security issues that can be introduced when dealing with virtualization should also be considered. Virtualization Security is a good article by PandaLabs that covers some of the concerns.
You are looking at this from the wrong angle. First, you are not going to find white papers from vendors about why you should "not" virtualize, or why you should.
Every environment is different, and you need to do what works in your environment. With that said, there are some servers that are perfect for virtualization and some that should not be virtualized. For example, if your SQL Servers are doing millions and millions of transactions per second, as if they were located at the NYSE or NASDAQ with millions and millions of dollars depending on them, you probably should not virtualize. Make sure you understand the ramifications of virtualizing a SQL Server.
I've seen where people virtualize SQL over and over just because Virtualization is cool. Then complain later on when the VM server does not perform as expected.
What you need to do is set a benchmark, fully test the solution you want to deploy, and show what it can and can't do so you don't run into any surprises. Virtualization is great; it is good for the environment and saves money through consolidation. But you need to show your supervisors why you should not virtualize your SQL Servers, and only you can do that.