Store large amount of images on multiple servers - database

I would like to know what is the best solution for storing large amount of images on multiple servers like google, facebook.
It seems that storing in filesystem is better then inside a database but what about using a noSQL DB like cassandra.
Do Google/Facebooke store the same image in multiple servers for the load balancing.
How does it work? What is the best solution?
Thx a lot

There's nothing wrong with the approach you're taking. As mentioned, there are caveats, however, the possibilities do exist, and a lot of people and companies are successfully storing files in Apache Cassandra.
zjffdu/cassandra-fs is the first solution i'd look into. Now, this was last developed 2 years ago, so I'd be a bit cautious on it working the first time, out of the box. Apache Cassandra is now at version 1.0.x, with 1.1.x on the way. 2 years ago, that was version 0.6.x maybe? A lot has changed & improved in 24 months.
semantico/cassandra-fs a fork ... last touched 7 months ago
favoritas37/cassandra-fs another fork ... last touched 3 months ago and indicates compatibility with the 1.0.5 branch of Cassandra
The principal behind this is to take a file, break it into a set of chunks and store those chunks as columns in a row. When retrieving, pull each column, reassemble the file and voila.
Cassandra FAQ: large file and blog storage
...files of around 64Mb and smaller can be easily stored in the database without splitting them into smaller chunks...
Lucene indexes in Cassandra
...its files are broken down into blocks (whose sizes are capped), where each block (see FileBlock) is stored as the value of a column in the corresponding row...
You'll get more positive feedback on the Cassandra mailing list and on the IRC channel.
Finally, this is from 2009, and written by folks at Facebook, which should go some way to help answer more of the fundamental questions you have: Cassandra - A Decentralized Structured Storage System.

Note, I know this is an old question, I just want to counter balance some misconceptions about cost as I'm doing this right now as a test.
Unlike what DavidB thinks, it does not cost millions - even if you were to run dedicated hosted hardware, you'd easily be under a couple thousand/month (BTDT, one of my clients is running a 8 node cluster for about $800/month). That said, that's a maintenance headache you want to avoid, and Cassandra on EC2 is far easier to deal with.
You could easily run a substantial production cloud on EC2 for less than $1000/month and you can do R&D clouds for less than $100/month (I spend about $52 last month for an 10 machine test cluster). I highly recommend using TurnKey Linux to manage & provision your R&D farm, as their tools will allow you to migrate instances from your desktop to pretty much any virtualized hosting platform in a few minutes (and vice versa). Plus they have really slick integration with EC2.
For really serious levels of traffic, Pintrest once stated they spend $15 to $50/hour depending on server load, auto-scaling to meet traffic demands, see http://www.theregister.co.uk/2012/04/30/inside_pinterest_virtual_data_center/ for details
The real cost is in setup and manage of your distributed Cassandra instance. Luckily, NetFlix has just release a ton of manage tools just for this. You can find them here: https://github.com/netflix - there are also a ton of interesting videos about NetFlix's use of AWS, particularly moving stuff from Cassandra to S3 - see their blog here http://techblog.netflix.com/2012/12/videos-of-netflix-talks-at-aws-reinvent.html

If you want to store in a "cloud" environment you're best going with a cloud solution that has the resources such as Google App Engine or Amazon Web Services. You're not going to be able to setup your own if that is the question. It will costs millions of dollars and resources to manage them. And yes, Google and Facebook use thousands of servers to distribute their data in "clouds".

Related

Resolving Overloaded Webserver Issues

I am new to the area of web development and currently interviewing companies, the most favorite questions among what people ask is:
How do you scale your webserver if it
starts hitting a million queries?
What would you do if you have just one
database instance running at that
time? how do you manage that?
These questions are really interesting and I would like to learn about them.
Please pour in your suggestions / practices (that you follow) for such scenarios
Thank you
How to scale:
Identify your bottlenecks.
Identify the correct solution for the problem.
Check to see you you can implement the correct solution.
Identify alternate solution and check
Typical Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Load balancing
Split tiers/components out onto more/other hardware
Offload work through caching/cdn
Database Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Replication (active or passive)
Clustering (if DBMS supports it)
Sharding
At the most basic level, scaling web servers consists of writing your app in such a way that it can run on > 1 machine, and throwing more machines at the problem. No matter how much you tune them, the eventual scaling will involve a farm of web servers.
The database issue is way more sticky to deal with. What is your read / write percentage? What kind of application is this? OLTP? OLAP? Social Media? What is the database? How do we add more servers to handle the load? Do we partition our data across multiple dbs? Or replicate all changes to loads of slaves?
Your questions call more questions, i.e. in an interview, if someone just "has the answer" to a generic question like you've posted, then they only know one way of doing things, and that way may or may not be the best one.
There are a few approaches I'd take to the first question:
Are there hardware upgrades that may get things up enough to handle the million queries in a short time? If so, this is likely an initial point to investigate.
Are there software changes that could be made to optimize the performance of the server? I know IIS has a ton of different settings that could be used to improve performance to some extent.
Consider going into a web farm situation rather than use a single server. I actually did have a situation where I worked once where we did have millions of hits a minute and it was thrashing our web servers rather badly and taking down a number of sites. Our solution was to change the load balancer so that a few of the servers served up the site that would thrash the servers so that other servers could keep the other sites up as this was in the fall and in retail this is your big quarter. While some would start here, I'd likely come here last as this can be opening a bit can of worms compared to the other two options.
As for the database instance, it would be a similar set of options to my mind though I may do the multi-server option first as redundancy may be an important side benefit here that I'm not sure it is as easy with a web server. I may be way off, but that is how I'd initially tackle this.
Use a caching proxy
If you serve identical pages to all visitors (say, a news site) you can reduce load by an order of magnitude by caching generated content with a caching proxy such as Varnish or Apache Traffic Server.
The proxy will sit between your server and your visitors. If you get 10,000 hits to your front page it will only have to be generated once, the proxy will send the same response to the other 9999 visitors without asking your app server again.
probably before developer starting to develop the system,
they will consider the specification of the server
maybe you can decrease use of SEO and block it from search engine to craw it
(which is the task that taking a lot of resource)
try to index everything well and avoid to making search easily
Deploy it on the cloud, make sure your web server and webapp cloud ready and can scale across different nodes. I recommend cherokee web server (very easy to load balance across different servers, and benchmarks proves faster than Apache,). For ex, google cloud (appspot) needs your web app to be Python or Java
Use caching proxy eg. Nginx.
For database use memcache on some queries which are suppose to be repeated.
If the company wants data to be private , build a private cloud , Here , Ubuntu is doing very good job at it fully free and opensource : http://www.ubuntu.com/cloud/private

The difficulty of choosing right database for analytics

I need some help deciding which database we should choose for our project. We are developing a web application that collects data about user's behavior and analyses that (bad explanation, but I can't provide much more detail; web analytics data is one of our core datasets). We have estimated that we will insert approx 200 million rows per week into database + data calculated from that raw data. The data must be retained for at least six months.
I have spent last week and half gathering information about different solutions, but there seems to be so many that I feel lost. Most promising ones I found are Cassandra, Hbase and Hive. I also looked at MongoDb, Redis and some others, but they looked like they suited different needs or community wasn't that active.
The whole app will be run in Amazon's EC2. As a startup company pay-as-you-go pricing model fits us like a glove. The easier the database is to manage in the cloud, the better.
Scalability is important. The amount of data we will generate varies quite much and will grow over time.
We can't pay huge licensing fees. Otherwise we would probably use something like http://www.vertica.com/.
We need to do all sorts of analysis on data, and the easier they are write the better. I thought about using Map/Reduce for the task; Hbase seems to have better support for this than Cassandra, and Hive has it's own query language. Real-time analysis isn't needed; we can calculate results once a day and shovel those back to database for fast retrieval.
Compression support would be nice, but not necessary (disk space is cheap :).
I also though about using MySql (because we will use that for all the user information etc. anyway), but scaling will be much harder in the future and I think at some point we would have to move to some other db anyway. We are also more than willing to commit some time and effort to push the selected database forward in terms of development.
We have decided to go on with Hadoop(& Hive/Hbase) as our primary data store. Main reasons for this are:
It is proven technology, and many big sites are using it (Facebook...).
Lot's of documentation around and even Hadoop books have been written.
Hive provides nice SQL-like query language and command line, so even guys who don't know Java/Python/etc. can write queries easily.
It's free and community people seem to be helpful :)

Pros & Cons of Google App Engine [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
[An Updated List 21st Aug 09]
Help me Compile a List of all the Advantages & Disadvantages of Building an Application on the Google App Engine
Pros:
No need to buy servers or server space (no maintenance).
Makes solving the problem of scaling easier.
Free up to a certain level of consumed resources.
Cons:
Locked into Google App Engine ?
Developers have read-only access to the filesystem on App Engine.
App Engine can only execute code called from an HTTP request (except for scheduled background tasks).
Users may upload arbitrary Python modules, but only if they are pure-Python; C and Pyrex modules are not supported.
App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. (Update - App Engine now supports cursors for accessing larger queries)
Java applications may only use a subset (The JRE Class White List) of the classes from the JRE standard edition.
Java applications cannot create new threads.
Known Issues!! : http://code.google.com/p/googleappengine/issues/list
Hard limits
Apps per developer - 10
Time per request - 30 sec
Files per app - 3,000
HTTP response size - 10 MB
Datastore item size - 1 MB
Application code size - 150 MB
Update Blob store now allows storage of files up to 50MB
Pro or Con?
App Engine's infrastructure removes many of the system administration and development challenges of building applications to scale to millions of hits. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary.
While other services let users install and configure nearly any *NIX compatible software, App Engine requires developers to use Python or Java as the programming language and a limited set of APIs. Current APIs allow storing and retrieving data from a BigTable non-relational database; making HTTP requests; sending e-mail; manipulating images; and caching. Most existing Web applications can't run on App Engine without modification, because they require a relational database.
Pros:
Scalable
Easy and cheaper (in short term).
Nice option for start-ups/individuals.
Suitable for apps that just store and retrieve data.
Cons:
Not suitable for CPU intensive calculations. They are slower and expensive.
Scalability doesn't matter much cuz if an app works at Google scale then probably it makes enough money to run on its own servers.
They have lots of limitations thrown here and there, as a result deep data analysis is difficult. Like you cannot produce a social graph using GAE.
I would say its not meant for serious businesses and expensive in long run.
(A huge new) PRO: GAE now supports MySQL :
https://developers.google.com/cloud-sql/
Pros:
built-in ui for unified logs
built-in web interface for task queues
built-in indexes on list of primary objects.
Cons:
loose logs very fast
VERY expensive
VERY expensive
VERY expensive
Un-hackable. Scales because you're obligated to code in a way that scales.
Longer development cycles. Sometimes you just want to hack something together and throw it away after 5 hors. With appengine you have to proper code it and write a lot of stuff to make it sure it scales. You can't just do a "find . | grep .avi | xargs ffmpeg -compress ...." :)
You will loose hours trying to do the simplest tasks like sending push notifications to APNS (iPhone). Although it's fine if you only want to support android in the future.
Terrible to make cleanups on the database. It's a HUGE pain in the ass to fix rows in the database, mainly because terribly slow, but it also requires a lot of code to loop properly within it's time constraints.
It was a pain to port Lucene to work on it's "filesystem".
Slow for what you pay.
Even MORE expensive if your app has spikes of traffic. My app has those spikes if a user that has many followers makes an action and we have to push notifications to his followers. Because of that I have to keep 10 inactive servers always on ($$$$$) to handle spikes.
Appengine isn't too bad due to the fact that I have the option to burn $$$$ instead of being concerned about scalability and fixing bottlenecks to reduce server usage. Sometimes it worth it.
My advice to people starting new products is to go with hetzner.de which is where I host my other products servers. It's cheap and extremely hackable. I have one server at hetzner that is handling 3x more traffic than the product that I have on appengine. The difference in price is $100 a month versions $2700 a month!
I have system admin experience, so the bottom line is that I would never choose appengine over having my own ROOT server. Don't be that bored software engineer wanting to experiment new things instead of building great products!
Pro: Unlimited scalabity to your application and scales with demand.
Con: Not available in some countries (Argentina).
Edit
Available worldwide, but only through Google Groups for App Engine.
When assessing pros and cons, I think it is important to clarify the market for which one is representing. Developers looking for a cost-effective solution to help them with the steep part of their planned hockey-stick growth curve will weigh heavily the cons already listed. For a small business owner, however, GAE is a God-send. These folks most often are looking to "the cloud" as a means to more effectively run their business (i.e. sell physical product and services). For the SMB, GAE the pros already listed can be much more valuable compared to the hockey-stick seeking dev, whilst the cons weight in at a fraction of the devs' measure. I don't see the GAE team doing anything related to SMB positioning, so I guess answers like this are me just pulling on Superman's cape, and spitting into the wind. Really GAE should be absolutely ruling the SMB space now. If not (I have no insights re: user base), then its is a greatly lamentable failure.
I believe , GAE is yet to mature in terms of providing the basic features for serious business such as Datastore with complex primary key, java.awt.* support, these are just a few I'm naming.
Other than the free space and to build some "Hobby" websites, I strongly feel GAE is NOT the place java guys should looking into.
I'm having applications built on the JSP/Servlets and MySQL, thinking about migrating to GAE, but I find I will be spending more "value time" on the migration than just buying a space from some java hosting provider such as EATJ, etc (Sorry not marketing, just an experience).
Another big issue I've got is migration of my existing mySQL data into GAE, bulkupload is really pathetic and has no client support.
No support for Local Db to Server DB upload.
Once the GAE is ready with "all the Cons" mentioned by above, then I'll think we can look in to this migration.
You are force to own a cell phone line, and your country+carrier must be able to receive international SMSs.
(I hate cell phones, and my mom's or co-workers won't get the SMSs)
Con: No Other RDBMS or NoSQL databases are not possible ....
Con: All your base are belong to us
... On a serious note:
Con: You don't control the environment your application runs in. The same cons as with outsourcing any component. Fun for toys, not for business (yet) IMHO.
Various things like API for Google proprietary backends such as their database system and other 'lockdowns' and frameworks that mean your code is tied, in some loose sense to their system can create cost issues later if you want to migrate from GAE. Of course, you could abstract these.
I like GAE, AppJet and others. They are cool. But everything has its place. If you want freedom and the ability to control your language's modules, API, syntax/stdlib versions and whatnot ... don't relinquish control to a service provider.
The lack of standards for environments and specifications for what your app can expect worries me in the cloud arena.
common sense stuff really.
Con: Limited to Java and Python

What Are the Pros and Cons of Filemaker? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
A potential customer has asked me to look at some promotional flyers for a couple of apps which fall into the contact management / scheduler category. Both use Filemaker as their backend. It looks like these two apps are sold as web apps. At any rate I had not heard of Filemaker in about ten years, so it was surprising to see it pop up twice in the same sitting. I think it started out as a Mac platform db system.
I am more partial to SQL Server, MY SQL, etc, but before make any comments on Filemaker, I'd like to know some of the pros and cons of the system. It must be more than Access for Mac's, but I have never run across it as a player in the client / server or web app arena.
Many thanks
Mike Thomas
Calling Filemaker Pro, Access for the Mac is kind of like saying, Mac OS X is Windows for the Mac. They're both in the same category of software, they're integrated programming environments. It's like you have MySQL, PHP, HTML and your editor put together in a GUI. Comparing the two, they both have pros an cons. Here are the pros and cons of using Filemaker Pro vs PHP/MySQL/HTML in my experience.
Pros:
Easy to get started
Easy to deploy locally, turn on sharing and connect from another client
Cross-platform (Mac OS X, Windows, iOS)
There are many plugins available to extend functionality
Includes starter solutions
Anyone with access can edit the program
For the most part, drag and drop programming
Changing field/database/script names after the fact is free
Has some neat built in tricks like built in graphs, tab controls, web viewers
Built in support for importing exporting excel, cvs, tab-formatted
Cons:
Inflexible: it does what it does well, but if you need more your out of luck for the most part
Expensive compared to the free alternative: It costs about $100 per year for a local user, $150 per developer, if you are using it as a website you need specialized hosting, which tends to cost more. In addition the server part of the software is about $300-$800 a year
The plugins required to extend functionality can be expensive as well
Pretty much only drag and drop programming, you can only use predefined script steps, relationships are made by making a graph
Source control is problem
Lack of scalability
Unable to copy and paste/import or export some items from solutions
Requires the mouse to access functionality
Layout design is fairly static and dated (this is improving with the Filemaker 12 and above)
In general I would say that if you're developing exclusively for the web or a large organization Filemaker Pro probably isn't the best fit. It's difficult to have multiple people developing on the same solution. On the other hand, for a smaller organization in need of a customizable in-house database it could be a great boon. You can build rather complicated applications very quickly with it if your willing to deal with it's deficiencies.
Pros:
It's cheap
Cons:
It's cheap(ly made)
It's non-standard (easy to find
MySQL/Oracle/MSSQL/Access experts
but nobody knows Filemaker)
Using subpar and/or nonstandard technologies only creates technology debt. I've never found a respectable dev that actually enjoyed (or wanted to) using this niche product.
In my opinion this product exists because it is Access for Macs, and it gained enough of a userbase and existing applications that enough people bought each upgrade to keep it in business. There are many products on the market that still exist because it's users are locked in, not because it's a good choice.
I'll admit to bias on this subject -- I work with one of the larger FileMaker development shops out there, and have written the odd book on the subject. We actually employ many respectable developers who love using FMP. I'll try to keep it brief. :-)
FileMaker Pro is a rapid app development tool. It's primarily client-server, though it has some very respectable web publishing capabilities which work well for many applications. It is not SQL-based, but does have ODBC and JDBC interfaces, as well as an XML/HTTP interface.
As far as lock-in, FileMaker Inc has grown sales steadily, with very significant growth in new users who are attracted to the platform's solidity and ease of use.
I think Matt Haughton nailed it -- for the right applications, FMP is simply the best choice going. That said, your customer is looking at apps written in FMP Pro, and you need to evaluate those apps on their own merit. They may be good instances of FMP development, or they may not.
To know more about FMP's fitness for the task, we'd need to hear more about the proposed application and user base. Are these indeed web apps, or client-server? How many users will be using it? Do they work at one or two site, or are they spread across the Internet?
Happy to elaborate further if there's more interest.
FileMaker is designed to integrate very simply with other databases and client applications. If you are looking at building a complicated distributed system, look elsewhere.
FileMaker is NOT good to use as a front-end to another datasource due to the design goals of the External SQL Data Sources (ESS) feature set, and it is NOT good to use as a back-end to anything other that the FM client due to slow and buggy ODBC drivers. The nature of FileMaker's architecture means it doesn't scale very well with complicated solutions regardless of how well it can integrate with other systems.
Here's a developer's perspective on some limitations I've found when teaming FileMaker with other back-ends and ODBC clients:
The ODBC driver is limited, slow, and leaks memory on the client-side. The xdbc_listender.exe has similar memory leaking issues on the server side and will eventually crash when it uses a certain amount of RAM. We have a scheduled script to restart it each night.
FileMaker needs to load all related databases into memory before it can connect to a database. If its a complicated database, opening and closing a connection can be quite slow (1-2 seconds) depending on how it is structured, and more so if the database references tables in other FM databases because they need to be loaded as well. I get around this by creating persistent connections that stay open for the lifetime of the application. Although we try to minimize the number of open connections, we have yet to see a performance hit on the server.
The ODBC driver interprets queries in strange ways. For example I ran a query on 76k rows to UPDATE table_1 SET field_1 = 1 and it took 5 mins to perform the query because I think it split the one query into 46k update queries, one for each row. I know this because I watched it update the rows one-by-one in the FM client. So I don't trust the ODBC driver at all.
Here's another example of 3 different queries and how long they took searching on two date fields:
SELECT id FROM table
WHERE datefield1 = {d '2014-03-26'}
.5 seconds
SELECT id FROM table
WHERE datefield2 = {d '2014-03-26'}
.5 seconds
SELECT id FROM table
WHERE datefield1 = {d '2014-03-26'} OR datefield2 = {d '2014-03-26'}
1 minute 13 seconds!
We had problems with how FileMaker cached data from an SQL Express database. We tried to run the command to clear the cache, but it didn't always work (spent a lot of time investigating this).
FileMaker uses pessimistic locking of records; before editing (from the client or as part of an odbc transaction) FileMaker attempts to lock the row first.
The FileMaker Server service "prefers" being stopped using the Admin Console (though the Admin Console may sometimes be unable to stop it either). If the FileMaker Server service stops any other way (including power loss, via the management console, or even a normal system shutdown) then some of your databases may become corrupt. Same if a client crashes during an operation, or if the network connection is lost suddenly. The solution for a power loss is to write a batch script to try and automate the shutdown, and then buy a UPS and program it to execute your script before the juice runs out. And hope it works. Otherwise backup hourly using the built-in scheduler. Aside: SQL server doesn't have this problem because it can roll back uncommitted transactions.
Performing backups with the built-in scheduler actually suspends operations to the database during backup process. ie, if its a large database, then it might take a minute to backup and users will notice the pause because they wont be able to edit/insert, etc.
If you're using the FileMaker PHP API, take note that you can't use AND and OR together in the same request.
Running an intensive query using the ODBC driver might be fast on its own, but run the same query simultaneously (as in a multi-user environment) and it will slow down by about 300% exponentially. You will run into speed issues if you’re expecting a large volume of intensive queries to hit the database at the same time.
We have found that when the FileMaker ODBC driver says it has finished an update/insert operation, it still does not guarantee the transaction is committed; it appears that FileMaker will continue to hold the changes in the server cache until the auto-enter calculated fields are evaluated/indexed and then it saves to disc, meaning there may be more of a delay until the record is actually committed. So really the ODBC write operations are not always immediate writes, but rather eventual writes. This delay will be especially evident in complicated tables with many calculated fields and triggers.
Calculated fields may slow down execution and reading via the ODBC driver, depending on what is being evaluated. Try to read stored values whenever possible.
Using BLOB containers: Not Recommended. Storing documents such as PDFs in a container field will inflate your database file size, take longer to backup and complicate the retrieval and editing of those files via ODBC. It’s much easier to store files on a network share and write to the file on disk.
If you must use FM as a front-end solution to another database, make sure to carefully read FileMaker's Introduction to External SQL Sources.
Also refer to the the appropriate version FileMaker ODBC Guide found on their website.
Just a few comments on the subject
FileMaker is certainly cheaper than some enterprise solutions in licensing costs. However, the real cost benefit is in development time. The development life-cycle is typically orders of magnitude lower than other enterprise platforms (whatever the licensing costs of those platforms). By this I mean days instead of weeks, or weeks rather than months to develop some feature.
There is a strong argument that FileMaker is Access for the Mac. While this was a valid argument a few years ago, FileMaker has come into its own in recent years. It's worth noting that FileMaker is cross platform and used extensively on Windows as well as Mac. That being said there are still huge similarities and differences between FileMaker and Access, the truth is none of them have any bearing on your situation.
While FileMaker is non-standard it does support live connection to MySQL, MS SQL Server and Oracle.
Also, there are numerous FileMaker developers not as much as more standard platforms, but they are definitely about, if you let me know where you are I can put you in touch with a selection of developers in your area.
The important point I want to make is that in the correct context FileMaker is the best thing in the world at what it does - if you try to do something that it's not meant to do, you'll get stuck. However, it could support offices in 4 locations, it can and is being done.
Before you go and rewrite your system in some other platform you should get in touch with a FileMaker expert and see what they have to say about what you've currently got, writing more details on this site and having non-experts answer positively or negatively won't help you. In the end it has to be a business choice of costs vs. benefits.
No need to list anymore "Cons" - but here is a significant "Pro" - Filemaker Go. Once you have your database setup, download a ipad/iphone app (free for FM12) and run it from a mobile device. The database can be stored locally on the ipad/iphone or synced back to a host PC.
I'm sure this mobile solution is possible elsewhere - but the fundamental point is that an entry-level user (and I mean NO previous database experience) can create an impressive solution within a few weeks.
Personal experience: main database running FM 11 hosted on PC under my desk - 4 researchers scattered across the city collecting data on ipads - all syncing back to my PC. Previous solution was using paper and entering in data by hand.
FileMaker is an interesting app :) It started as an end-user tool and it still is one of very few database apps that a non-programmer can actually use. But somehow FileMaker developers managed to make it very scalable. There's no other platform where one can start with a useful tool and end up with a client-server app that for the whole company. In old days they used to have a splash screen that captured this very idea (I only found an imperfect version):
I.e. something as simple as a file cabinet that can grow quite big.
All FileMaker pros and cons come from its origin. As an end-user tool it's very much unlike other DBMS apps. No SQL. No real programming: scripts are basically macros that repeat user actions in a slightly more general way with variables and some logic. Lots of limitations; e.g. a list view cannot have a sidebar; a dynamic value list is always sorted alphabetically; to open a Save As dialog and read back the file name you'll need a plug-in; and so on. For a programmer this can be very frustrating, because most his assumptions will be wrong. And existing apps written by non-programmers are not exactly paragons of clarity and solid design.
But if you manage to overcome the obstacles you'll find a rather good RAD for client-server, single-user, web, and mobile apps, that stays rather usable over WAN, with such niceties as runtime and kiosk mode.
Having said that, I'm not quite sure about generic contact management and scheduling apps in FileMaker. If this is what they are, then they should be unlocked, so the customer can make changes; or they have to be niche apps that do for the customer what nothing else does.
Filemaker is enormously powerful and versatile. Excellent multi-user support. You can create wonderful solutions in Filemaker with document management, web interface, iphone interface, automated publishing support, scheduled scripts, PDF/Excel/HTML reports, XML support, caller ID record lookup, integration of web data (UPS & Fedex linked to order record for example). Extensible with plugins. It's like being in the Home Depot of data. Don't try to build Amazon; other than that what can't you build with it, and faster app dev than most anywhere else?
It has been more than a year now since I run through FM and use it in developing solutions for various clients. The following are my FM experience:
learning curve is much less than using the hard coded industry standard technology;
it can fit well as to industry standards platforms because of it's ODBC and JDBC connectivity. Your data is not locked in FM and other data format can get in FM;
it fits well as front end and back end solutions.
FM can match enterprise platform having a right database design and deployment i.e. workgroup or department oriented solutions. This is data to it's workgroup owner and make it available for other workgroups or departments;
FM is fits well for rapid application development that employs prototyping;
FM has many more capabilities you therein...
I suggest you try it yourself and I'm sure you'll love the stuff FM can offer!
Happy computing...
A little research has made me think that FileMaker is indeed Access for Mac, but perhaps a little more robust. I worked with Access for years, never really liked it, and am glad to be away from it (I always held a grudge for MSFT killing FoxPro, which I did like).
It is hard for me to imagine it as a good solution for a web based app used by offices in four locations around the country, plus many others logging on from home, etc.
Using it does not make much sense when MySQL, SQL Server, etc are available for the data storage and ASP.NET, PHP, Ruby etc are there for the programming.
Mike Thomas
While the comparisons to "Access for Mac" is inevitable, there are some important distinctions that have to be made.
FileMaker databases can be shared out to more than one person provided 1 of 2 things happen. One, a person on your network opens the DB and shares it from their computer, acting as the host. Two, you buy and install FileMaker server which hosts the DBs.
Also it's been my experience that while FileMaker developers LOVE FM, they're having to learn other technologies because more and more government agencies (my primary employer the past 10 years) are moving off of FM and into SQL Server, Oracle and to some extent Access and open source. FileMaker skills are becoming less and less in demand in the public sector, so getting support for these applications is harder and consequently, more expensive.
That being said, we have a FM server and FM 5.5 clients running an application that has been rock solid for the past 5 years.
i've been using FM for more than a year now. i'm doing and providing solutions for SMBs using the SQL standard for several years. i love those SQL stuff, but just a year a ago i run through FM Pro 9 and have it a try. amazingly, i got all i wanted in just a short time. in my experience as developer, FM Pro impressed me the way it does things.
true enough, FM is not an industry database standard but a good number of its features can compensate to what "standard" is being required of. FM pro has live connectivity to MySQL, MS SQL Server and Oracle. for me, it doesn't make sense to speak about standard if you can move your data around from FM to other platforms and vice-versa.
well, this note can't make that much convincing. it's good to try it for yourself... especially now that FM has its new version 10. believe me... you'll love it...
happy computing.
Two points seem to dominate this discussion and need consideration:
Non-Standard and what Government Agencies are doing.
Let's consider the small business owner or the single user both of whom a creating databases to meet their needs.
Now it doesn't matter what the government is doing, this is your database for your employees. Do what you want (as long as its legal, of course).
Non-Standard, well often this is the best idea since what you want to do works for you. Name your fields and tables as you like and later on rename this as you prefer. Don't try this with dbf or sql... Anyone remember those 'standard' file names bks1999.dbf bks2000.dbf Keep in mind that 'standards' exist because someone else wrote them before you arrived, not because they are the best possible idea.
And yes, there are a lot of 'bad' Filemaker solutions but they are working and supporting hundreds of thousands of people. But try to improve one of these bad solutions and compare that effort to improve a similarly bad dbf solution. A renamed field filters effortlessly through thousands of scripts and scripts in related Filemaker files. In a dbf solution it can become a nightmare as each instance has to be manually retyped.
One real test would be to compare how easily Filemaker can work with SQL, etc. as compared to other applications. That might be interesting. I've never done that but I bet I could create a working file in very little time that works with such data.
I have always said that every developer should use and be familiar with all of the tools.
25 years with Filemaker Pro, 3 years with FoxPro, 2 with 4D, etc.
Lots of comments about FileMaker being non-standard. But what is "standard"? By "standard", many people mean that a database supports Structured Query Language (SQL) (ISO Standard 9075) and FileMaker has and continues to support SQL. How every database engine supports SQL is proprietary to every database. Now it might be open source such as MySQL, but SQL is a standard to support, not the underlying language of how it is accomplished.
When most people talk about databases, they are only talking about the backend tables and schema. The front end user interface is frequently something else. And most of them now render those results as html pages via open standards like PHP. Again, FileMaker fully supports PHP calls and Apache or IIS (depending on which OS platform you are on).
So I would disagree with people saying FileMaker is non-standard.
What is unique about FileMaker is its tight integration between the schema and the User Interface. This is similar to Apple's tight integration between hardware and the Operating system, which has some nice benefits. Interestingly, FileMaker is owned by Apple, but I guess that is another topic.
Generally, FileMaker's User Interface is considerably easier to use than most open standards and most people stick to FileMaker's client User Interface instead of web interfaces. There are still a number of things supported only in FileMaker User Interface that can't be duplicated in a web browser.
FileMaker really makes rapid application development much easier with its close integration of schema and user interface. This makes development cost a whole lot less in most cases.
FileMaker's database services can be spread among up to 3 machines giving it primitive load balancing abilities with web services. While FileMaker easily supports hundreds of users, if you go into thousands of simultaneous users, many SQL only databases (eg Oracle, MS SQL Server, MySQL, Postgres) are designed to better spread out the load across more machines. Basically, if you have high simultaneous transactions, FileMaker is not your solution. For example, a company with many point of sale terminals from all over the county hitting it at the same time.
While FileMaker supports SQL and PHP, using it only that way is a waste of the money spent on the license for the FileMaker User Interface. It would not be a cost effective solution to develop a web front end and pay the full FileMaker license cost for only a backend. So, FileMaker's support of PHP and SQL is best combined with companies that have an in-house solution for staff, but also want to integrate that with their web development team for outside customers.
One last note is that FileMaker's tight integration of schema and User Interface makes security much easier. Obviously you have to set up the groups and users and I usually integrate FileMaker with Active Directory (or Open Directory). But when you use the FileMaker Client and Server connections, turning on encryption security is a single checkbox on the server. FileMaker handles all of the certificates and uses an AES 256bit cipher (at least since version 11, maybe before then too). Currently, the US Government considers that approved for up to and including the first level of Top Secret communications. In typical SQL systems, there is a lot of work to configure security on the database end as well as the user interface end of things and it is much more work than a single checkbox.
FileMaker's target audience has been small to medium sized companies, usually with 5 to 200 users, and it is a well priced product for rapid application development of databases for companies of that size.
And I can't end this comment without commenting on how easy it is to create and deploy a mobile solution on iOS devices like iPads and iPhones. FileMaker Go is a free app for use on these mobile devices and they fully support the same user interface and security. In fact, I am aware of one company that uses FileMaker as a front end interface for their Oracle database simply for access on iPhones. Expect a lot more in the mobile market in the future and FileMaker is clearly targeting mobile users.
Just to add my 2¢ to the already given answers: Everything everyone has written in the voted answers is true about Filemaker. The product is robust enough to warrant both positive and negative opinions.
I'm not a pro enough to speak to your concerns but there are a number of large complex applications written in FMP that you may want to look at. Jungle Software is a good place to start.
The down side to FMP for me as a user of some of those apps is that they come with a stack of files. The runtime of a FMP application isn't packaged as a bundle so it can look a bit complex with a large app. We did some tests a long time back because FMP had a reputation of being slow. At that time (12 years ago) FMP needed to index the db or it was slow but once it was indexed it was as fast as anything else we tested. It's big upside for semi pros is that it is very easy to do basic stuff and end up with working tool. My experience with Access was extremely negative so I wouldn't compare it at all with FMP.
In the end it doesn't really mater what it was written in, if the software does what you want and is stable buy it. If it doesn't don't. It is very easy to get data in and out of FMP so the proprietaryness of the db format doesn't really enter into it.

Anyone using CouchDB? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I've followed the CouchDB project with interest over the last couple of years, and see it is now an Apache Incubator project. Prior to that, the CouchDB web site was full of do not use for production code type disclaimers, so I'd done no more than keep an eye on it. I'd be interested to know your experiences if you've been using CouchDB either for a live project, or a technology pilot.
After 18 Months of prototypes, testing and waiting for CouchDb to get ready we moved an internal application over to CouchDB in December 2008. So far I'm very happy with that move. It gets rid of a lot of filesystem objects for us (PDFs and JPEGs, now stored as attachments in CouchDB). This enables us to get rid of NFS and easier cluster/replicate our frontend webservers.
To what degree CouchDB is ready for you depends very much on the culture of your organization. We have an in-house development team maintaining several internal Erlang applications. Since CouchDB is written in Erlang and the codebase is of quite decent quality we felt confident that we could fix show stopper issues in CouchDB should the need arise - or at least get our data back out. We also hired one of the CouchDB core team as an consultant - just in case.
But CouchDB for sure isn't 1.0 yet. There are crashes in the Web worker processes all the time (if you misuse them). Replication breaks for us and we don't get error messages about it. Documentation is still very lacking. Still I'm confident that it will not eat our data and development moves forward with reasonable pace.
To give you an idea about our application: currently our biggest database is about 512000 records taking 7.5 GB of diskspace.
I use the CouchDB to power a Facebook application (over 35k monthly active users). For a while it was using MySQL but after porting the entire project over from Perl to Erlang, I decided to go for the gold and migrate all of the data into CouchDB and use that instead.
CouchDB has been a great data store to work with. I think that it is on track to becoming a major player in web-based services.
I got to know one of the people (Jan) working on it a while ago (like 6 months) and have been playing with it ever since. I found the community around CouchDB to be both very knowledgable and helpful so that whenever I ran into an issue it was resolved in a matter of minutes or hours at least.
We just kicked off a project the other week which basically requires us to store data in the non-relational way and due to CouchDB's document oriented store we selected it as one of the technologies to use. So this is actually the first time that I will run it in production, but I'm still pretty confident about it. :)
Just an update here (2009-10-25):
Our first CouchDB install is 20 GB, it hosts 40 million records. It's been running in production since January 2009, and it's been great. Read (GET) speed is outstanding and we use it as a store for complex data, and then it's just pull.
Our second couchdb installment has two databases, one is 160,000,000+ documents (210 GB), and growing between 150,000-300,000 documents a day. The other is only 35,000,000 documents (7 GB). This setup has a lot more reads and writes and initial tests are performing very well.
View building on the 160,000,000 document database took roughly a week, but since then we upgraded to a larger Amazon EC2 instance and we are also getting ready to update to CouchDB 0.10.x (from 0.9.1) as this release includes a lot of performance improvements in view building.
I am using couchdb in a few scenarios, as a document store for http://devk.it (under development) and in a much larger scale as a template store for a distributed email delivery system.
CouchDB is very slick for what it does, but I was not able to get it to run at as high of a concurrency level as I would have expected. Also note that the maximum document size is fairly limiting at 1MB due to the hardcoded max input buffer size in mochiweb. You can however alter a header file and recompile to get around this limit.
I'm using CouchDB to store (and serve) article ratings on my blog. It's not exactly heavy traffic but it's been rock solid so far.
Also planning on adding comments sometime which I'll most likely also store in CouchDB.
I've found it quite easy to get started with, on OSX you can just download CouchDBX to get started quickly. I use a Sinatra backend with RestClient to interact with 'the couch' through straight HTTP verbs and such.
Great fun.
At the moment I'm working with CouchDB for a computer science thesis. I'm writing about my progresses and opinions on my blog, http://metalelf0dev.blogspot.com. I think the project is well done, but the existing documentation isn't organized as it should. A quick tutorial about the Futon web interface could be really useful for starters IMHO :)
I used couchdb twice in production. First was the wiki likes project and I think that couchdb was perfect candidate for that role. Saving the version of all docs helps a lot.
The second project was quite query loaded and idea was dumping social data first, then query it with various filters. It was looked like standard CouchDB query features looks a bit pure for our needs. But we add Lucene like a full text indexer and after that doing many queries during Lucene part. And that solution looks good enough.

Resources