I am trying to decide which add-on DB to use with my application when I deploy it on AppHarbor. I have two choices: JustOneDB or Cloudant. I am planning to develop a web and mobile application that will have to work with terabytes of data.
I am looking for the easiest way to deploy my database without having to partition the DB and the tables myself. I want a DB that can handle a very large amount of data but takes the work of designing the sharding and partitioning architecture off the developer.
I also want a solution that will allow me to easily back up my large database and easily restore it.
From what I've read, Cloudant and JustOneDB are the two most popular ones, and those are available as add-ons on AppHarbor for easy deployment.
I need your recommendations on which one I should go with, and on the pros and cons of each. I am developing my application in ASP.NET and C# inside Visual Studio.
There's a recent post on the Cloudant blog about using the MyCouch .Net library with Cloudant databases:
https://cloudant.com/blog/how-to-customize-quorum-with-cloudant-using-mycouch/
Cloudant also offers free hosting as long as your monthly bill does not exceed $5, and it can work with Apache CouchDB's replication if you want to develop locally and sync to the cloud for production/deployment. Multi-master replication isn't something many other databases offer.
Best of luck with your application!
MyCouch.Cloudant was just released. Besides support for the core CouchDB and Cloudant features, the MyCouch.Cloudant NuGet package adds support for Searches. More Cloudant-specific features will be added to it. It's written in C# and supports .NET 4.0, .NET 4.5 and Windows Store apps.
You will find more info about MyCouch in the GitHub repo.
You should probably also consider MongoDB and RavenDB.
If you're just starting out, your first concern should probably be to find a database that'll let you quickly get started and build the application you have in mind. When the application becomes a success and actually attracts terabytes of data, you can start worrying about how to scale it. If the application is soundly architected, adapting it to use an appropriate datastore should not be a monumental task.
I am developing an analytics tool similar to Google Analytics that will store keywords, visits and pages in a database.
So the database can grow very quickly because I want to have many people using it.
How should I set up the database? One database for all the accounts and all the websites being monitored? Or would it be better to have one database for every account?
Also, I am planning to start with one dedicated server but I'm sure that I will need more than one server in the future so I have to build it keeping that in mind.
I also know that if I use a separate database for every account, I will have to run upgrade scripts on all of them whenever the schema of the app changes.
What kind of database do you plan to use? There is a BIG difference between relational (PostgreSQL, MySQL) and "NoSQL" (MongoDB, CouchDB).
I'm only going to talk about PostgreSQL on the relational side since it's the only database I have experience with.
First, I would keep everything in one database. There's no benefit in using a database per account.
Second, you should be absolutely sure you WILL outgrow a single machine. Given the kind of application, you'll be dealing with a lot more writes than reads, so master-slave replication will only serve for high availability, and multi-master replication with PostgreSQL is NOT easy.
As of my last research, the least painful way to do that was to use a tool like Postgres-XC, which is designed to be write-scalable, but I have no idea how production-ready it is.
Another solution is using tools like Bucardo or SkyTools. No experience with SkyTools but I had a lot of trouble getting Bucardo to work last year.
The last solution is to do sharding. The naive way to shard is to do something like:
shard number = id % 10
However, with this scheme you would need to rebalance your cluster whenever you add or remove a shard.
It would also require that you write your application to be "shard-aware" so that it directs queries to the correct shard.
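To make the rebalancing cost concrete, here is a minimal Python sketch of the naive modulo scheme. The function names are made up for illustration, and the ids are assumed to be plain integers. It counts how many ids would land on a different shard when a cluster grows from 10 to 11 shards.

```python
def shard_for(record_id: int, shard_count: int) -> int:
    """Naive modulo sharding: map a numeric id to a shard index."""
    return record_id % shard_count


def moved_fraction(ids, old_count: int, new_count: int) -> float:
    """Fraction of ids whose shard changes when the shard count changes."""
    ids = list(ids)
    moved = sum(
        1 for i in ids if shard_for(i, old_count) != shard_for(i, new_count)
    )
    return moved / len(ids)


if __name__ == "__main__":
    # A shard-aware application would call shard_for() before every query.
    print(shard_for(1234, 10))
    # Growing from 10 to 11 shards relocates the vast majority of rows.
    print(moved_fraction(range(1100), 10, 11))
```

For ids 0 to 1099, about 91% of the rows change shard, which is why adding a single machine under this scheme means moving nearly all of the data.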
Anyway like I said before, make sure you will NEED to shard/clusterize first.
Now for the "NoSQL" side: I have no experience with any of the solutions, but I do know that MongoDB and CouchDB handle sharding themselves, so it's way easier with those. However, you give up quite a lot.
I'll expand a bit on Vincent's answer.
As for sharding, we have had good experience with PL/Proxy. With sharding you can outgrow a single machine without issues (read or write).
As for replication, Londiste from SkyTools is very easy to set up and use. With it you also get PgQ, a quite nice messaging solution for Postgres.
Requirements for archival type software
1. Data/Image/possibly video.... upload/search/retrieval/edit from web.
2. Easily implemented user defined Custom Fields
3. Easy backup.
4. Low cost ... either opensource or very low cost
I am a very novice programmer. My primary goal is to manage a collection and publish it to the web.
Options
A. Open source software such as CollectiveAccess
Problems: Custom fields not supported. Continued support? Portability of database?
B. Use Microsoft Access and then use MVC or other development platforms to eventually publish to the web.
Problems: Difficult to integrate with the web?
C. Design my own MVC database application.
Problems: Difficult for a novice programmer? Custom fields and upload of various data formats difficult to implement?
Sounds like you are looking for a Digital Assets Management system. I found ResourceSpace (http://www.resourcespace.org/) and Razuna (http://www.razuna.org/) very useful for similar projects - both fall into your A category.
Hi there,
As mentioned here before, Razuna will satisfy your requirements quite well.
It can manage images, documents, videos and audio. It can share folders and collections on the web with access permissions, and it lets you search across the different kinds of assets as well.
Moreover, it can handle the metadata of all these assets. It not only reads metadata but also WRITES it. Furthermore, you can set custom fields for each asset type, and users get a web interface to work with.
Razuna supports different databases (H2, MySQL, MS SQL and Oracle (soon DB2)) and lets you migrate from one DB to another with ease (backup / restore option).
Best of all: it is available under an open source license for you to deploy and enjoy today. You can get it at http://razuna.org.
Kind Regards,
Nitai
PS: I'm the main developer and founder of Razuna.
With the rise of non-SQL databases on high-traffic websites, I'm interested in using one for my project. I've heard several names, like Voldemort, MongoDB and CouchDB. But which of these non-SQL databases are production ready? I've looked at the download pages, and it seems that none of them is production ready because none has reached version 1.0 yet. Are there any others besides these three that can be recommended for production use?
What do you mean by production ready? As far as I know, all of them are being used on live systems.
You should make your choice based on how the features they provide fit your needs.
You can also add Tokyo Cabinet to the list as well as the mnesia database provided by the Erlang VM.
I think you need to start out from your project requirements to see what kind of database you really need. There are many non-relational DBMSs out there, and they differ a lot in what kinds of problems they are good at solving. I think the article Should you go Beyond Relational Databases? by Martin Kleppmann is a good starting point for finding out what you need. There are also a lot of Stack Overflow threads on similar topics; these are my favorites:
The Next-gen Databases
Non-Relational Database Design
When shouldn’t you use a relational database?
Good reasons NOT to use a relational database?
When you have narrowed down what you actually need you can take a deeper look into the alternatives to see which DBMS are production ready for your use case. Production readiness isn't a yes/no thing: people may successfully deploy some solution that for example lacks in tool support - in another project this could be a no-go.
As for version numbers, different projects have different takes on this, so you can't just compare the version numbers. I'm involved in the graph database project Neo4j, and even though it has been in production use for 5+ years by now, we still haven't released a version 1.0 final.
I'm tempted to answer "use SIRA_PRISE".
It's definitely non-SQL.
And its current version is 1.2, meaning that someone like you must definitely assume it's "production-ready".
But perhaps I shouldn't be answering at all.
Nice article comparing rdbms with 'next gen' and listing some providers:
Is the Relational Database Doomed?
http://readwrite.com/2009/02/12/is-the-relational-database-doomed
I would suggest using ArangoDB.
ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.
ArangoDB is supposed to grow with the application—the project may start as a simple single-server prototype, nothing you couldn’t do with a relational database equally well. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end—this is where Foxx, ArangoDB’s integrated Javascript application framework, comes into play.
Another unique feature is ArangoDB’s query language AQL — it makes querying powerful and convenient. AQL enables you to describe complex filter conditions and joins in a readable format, much in the same way as SQL.
You can model your data in several ways:
in key/value pairs
as collections of documents
as graphs with nodes, edges, and properties for both
You can access data in ArangoDB:
using the general HTTP REST API via curl/wget, or your browser
via the ArangoDB shell (“arangosh”)
using a programming language specific client library
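As a rough illustration of the HTTP route combined with AQL, the Python sketch below builds (but does not send) a request to the cursor endpoint. It assumes a server on the default local port 8529 and a hypothetical "users" collection with "age" and "name" attributes. The helper name build_aql_request is made up for this example.

```python
import json
import urllib.request

# Assumption: a local ArangoDB server listening on its default port.
BASE_URL = "http://localhost:8529"


def build_aql_request(query: str, bind_vars: dict) -> urllib.request.Request:
    """Prepare a POST to the cursor endpoint carrying an AQL query."""
    body = json.dumps({"query": query, "bindVars": bind_vars}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/_api/cursor",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical collection "users"; @minAge is filled in from the bind variables.
AQL = "FOR u IN users FILTER u.age >= @minAge SORT u.name RETURN u.name"
request = build_aql_request(AQL, {"minAge": 21})

# Sending it needs a running server (and credentials if authentication is on):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["result"])
```

Passing values through bind variables rather than pasting them into the query string keeps user input out of the AQL text, much like parameterized SQL.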
Server requirements for ArangoDB:
ArangoDB runs on Linux, OS X and Microsoft Windows.
It runs on 32bit and 64bit systems, though using a 32bit system will limit you to using only approximately 2 to 3 GB of data with ArangoDB.
I'm hoping you can help.
I'm looking for a zero-config multi-user database that my WinForms application can easily upload to a web server folder (together with 1 or 2 classic ASP pages) and am looking for some suggestions/recommendations.
The idea is that the database will be used to collect feedback entered by people filling in the asp pages. The pages will write to the database using javascript.
The database will subsequently be downloaded again for processing once the responses are in.
In Summary:
It will mostly run in MS Windows environments.
I have a modest budget for this and do not mind paying for such a database.
No runtime licensing costs.
Should be xcopy - Once uploaded to a website folder it should be operational.
It should not have a dotnet CLR dependency.
It should support a reasonable level of concurrent access. Average respondent count would be around 20-30 but one never knows.
Should be a reasonable size so that uploads/downloads to and from the site will be reasonably fast.
Would appreciate your suggestions/comments
Many thanks
Abz
To clarify - this is a desktop commercial application for feedback management in a vertical market. It uses SQL Server as the backing store.
The application currently provides feedback management from email and paper feedback. I now want to add web feedback capability. Getting users to make their SQL servers accessible to a website is not an option at this time, as I want to make getting up and running as painless as possible.
I intend to release a web based implementation of the software in the near future but for now am looking at the above as a pragmatic way to provide web based feedback collection.
SQLite comes to mind. It meets all of your stated requirements, is open source, and has a liberal license (public domain).
http://sqlite.org/
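To show what "zero config" means in practice, here is a small sketch using Python's built-in sqlite3 module. Python is used only for illustration; the actual pages in the question would go through whatever SQLite driver is available to them. The table layout, file name and helper names are assumptions. The whole database is one ordinary file that can be copied to and from a web folder.

```python
import os
import sqlite3
import tempfile


def open_feedback_db(path: str) -> sqlite3.Connection:
    """Open (or create) the single-file database and make sure the table exists."""
    # The timeout makes a writer wait briefly if another connection holds the lock.
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "respondent TEXT NOT NULL, "
        "comment TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def add_feedback(conn: sqlite3.Connection, respondent: str, comment: str) -> None:
    """Insert one response inside a transaction, using bound parameters."""
    with conn:
        conn.execute(
            "INSERT INTO feedback (respondent, comment) VALUES (?, ?)",
            (respondent, comment),
        )


# Hypothetical file name; a temporary directory keeps the demo self-contained.
db_path = os.path.join(tempfile.mkdtemp(), "feedback.db")
conn = open_feedback_db(db_path)
add_feedback(conn, "alice", "Works well")
add_feedback(conn, "bob", "Needs more options")
rows = conn.execute(
    "SELECT respondent, comment FROM feedback ORDER BY id"
).fetchall()
conn.close()
```

SQLite serializes writers with file locking, so with 20 to 30 respondents a short busy timeout like the one above is usually enough, though it is worth testing under the expected load.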
I would use a 'normal' database (say MySQL, PostgreSQL, Firebird, etc.) on the server. Instead of copying files to the server, your WinForms application would create custom tables (or even custom databases). After collecting the data you could just pull it back into your application using plain old SQL.
Why reinvent the wheel? If you want to collect feedback and the like from users of your app, and they are connected to the internet, it might be a better idea, and cheaper in the long term, to use a service like Wufoo. We recently switched from a homegrown setup to Wufoo and are very pleased. Check it out.
Otherwise you might want to take a look at SQLite or Firebird. Both of them are very robust and have ADO.NET providers. Firebird scales from a single user to a full-blown client-server system and has no .NET dependency.
If you really don't want a DB/SQL Solution, you could try simple text files and ftp/xcopy files down and parse them into the back-office server as needed. ASP/VBScript or ASP.NET can create the files to store the basic feedback comments. Need to consider security of course!
We have a new Django-powered project that is likely to see heavy traffic (meaning heavy DB interaction), so we need to consider database scalability in advance. After some research, the following questions are still not clear to us:
Coarse-grained: how do we assign one DB table (a Django model) to a specific DB (possibly on another server)?
Fine-grained: how do we assign a group of table rows to a specific DB (so-called sharding, possibly also on another DB server)?
How do we direct writes and reads to different DBs (which would help with future MySQL master/slave replication)?
We are looking for a solution that:
is transparent to the application code (meaning we don't need additional code in views.py)
works at the ORM level (meaning it only needs to be specified in models.py)
is compatible with the current (or future) Django release (to keep changes minimal when upgrading Django later)
I'm still doing the research and will share what I find in this thread later.
Hopefully someone with experience can answer. Thanks.
Don't forget about caching either. Using memcached to relieve your DB of load is key to building a high performance site.
As alex said, django-core doesn't support your specific requests for those features, though they are definitely on the todo list.
If you don't do this in the application layer, you're basically asking for performance trouble. There aren't any really good open source automation layers for this sort of task, since it tends to break SQL axioms. If you're really concerned about it, you should be coding the entire application for it, not simply hoping that your ORM will take care of it.
There is a GSoC project by Alex Gaynor that will, in the future, allow multiple databases to be used in one Django project. But right now there is no working cross-RDBMS solution.
There is no solution for this right now either.
And again, there is no cross-RDBMS solution. But if you are using MySQL, you can try an excellent third-party Django application called mysql_replicated. It makes it easy to set up a master-slave replication scenario.
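For readers on later Django versions: the multiple-database work mentioned above shipped in Django 1.2 as "database routers". The sketch below assumes three hypothetical aliases in the DATABASES setting ("default" as the primary, "replica" as a read-only copy, "analytics_db" as a separate server) and a hypothetical app label "analytics". A router is a plain class, so the example runs without Django installed by using a small stand-in for a model.

```python
from types import SimpleNamespace


class ReplicationRouter:
    """Route one app to its own database; split reads and writes for the rest."""

    # Hypothetical mapping of app label -> database alias.
    pinned_apps = {"analytics": "analytics_db"}

    def _home_db(self, app_label):
        return self.pinned_apps.get(app_label, "default")

    def db_for_read(self, model, **hints):
        # Pinned apps read from their own database, everything else from the replica.
        return self.pinned_apps.get(model._meta.app_label, "replica")

    def db_for_write(self, model, **hints):
        # Writes always go to the app's home database (the primary by default).
        return self._home_db(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects that live in the same home database.
        return self._home_db(obj1._meta.app_label) == self._home_db(
            obj2._meta.app_label
        )

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Create tables only in the app's home database; the replica is
        # assumed to receive its schema through replication.
        return db == self._home_db(app_label)


# Stand-ins for model classes so the sketch can be exercised without Django.
Article = SimpleNamespace(_meta=SimpleNamespace(app_label="blog"))
PageView = SimpleNamespace(_meta=SimpleNamespace(app_label="analytics"))
router = ReplicationRouter()
```

The router is activated by listing its dotted path in the DATABASE_ROUTERS setting, so views.py stays untouched. This covers the table-level and read/write questions; row-level sharding still has to be handled in application code.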
For some reason, we are using Django together with SQLAlchemy here. Maybe a combination of Django and SQLAlchemy would also work for your needs.