I was looking into the possibility of using CouchDB. I heard that it was similar to Lotus Notes, which everyone loves to hate. Is this true?
Development of Lotus Notes began over 20 years ago, with version 1 released in 1989. It was developed by Ray Ozzie, currently Chief Software Architect for Microsoft.
Lotus Notes (the client) and Domino (the server) have been around for a long time and are mature, well-featured products. Together they offer:
A full client-server stack for rapid design and deployment of document-oriented database applications.
A full public key infrastructure for security and encryption.
A robust replication model and active-active clustering across heterogeneous platforms (someone once demonstrated a Domino cluster made up of an Xbox and a huge AIX server).
A built-in native directory for managing users that can also be accessed over LDAP.
A built-in native mail system that can scale to millions of users with multi-GB mail files, accessed live on the server or replicated locally for offline access. It can interface with standard internet mail through SMTP and also has built-in POP and IMAP access. The mail infrastructure is a core feature available to all applications built on Notes Domino (any document in a database can be mailed to any other database with a simple doc.send() command).
A built-in HTTP stack that allows server-hosted databases to be accessed over the web.
A host of integration options for accessing, transferring and interoperating with RDBMS and ERP systems, including a tightly coupled DB2 integration that allows Notes databases to be backed by a relational store where desired.
Backwards compatibility has always been a strong feature of Notes Domino, and it is not uncommon to find databases developed for version 3 running flawlessly in the most up-to-date versions. IBM puts a huge amount of effort into this, and it has a large bearing on how the product currently operates.
-
CouchDB was created by Damien Katz, who started developing it in 2004. He had previously worked for IBM on Notes Domino, developing templates and eventually completely rewriting one of its core features, the formula engine, for ND6.
CouchDB shares with Notes Domino the basic concept of a document-oriented database with views.
In this model, "documents" are just arbitrary collections of values that are stored somehow. In CouchDB the documents are JSON objects of arbitrary complexity. In Notes the values are simple name-value pairs, where the values can be strings, numbers, dates or arrays of those.
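To make the contrast concrete, here is a minimal Python sketch in which plain dicts stand in for the two kinds of document (all field names are hypothetical). The helper fits_notes_model is illustrative only and is not part of either product. It checks for the flat Notes shape described above, where each item is a string, number or date, or a list of those.

```python
from datetime import date

# Hypothetical sample documents; the field names are illustrative only.
# A CouchDB document is a JSON object and may nest objects and arrays freely.
couch_doc = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "address": {"city": "Boston"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# A Notes document is a flat set of named items: strings, numbers,
# dates, or lists of those.
notes_doc = {
    "Form": "Order",
    "Customer": "Ada",
    "City": "Boston",
    "SKUs": ["A1", "B7"],
    "Quantities": [2, 1],
    "Ordered": date(2009, 6, 1),
}

SCALARS = (str, int, float, date)


def fits_notes_model(doc):
    """Return True if every value is a scalar or a flat list of scalars."""
    for value in doc.values():
        if isinstance(value, list):
            if not all(isinstance(item, SCALARS) for item in value):
                return False
        elif not isinstance(value, SCALARS):
            return False
    return True


print(fits_notes_model(notes_doc))  # True
print(fits_notes_model(couch_doc))  # False: "customer" is a nested object
```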
Views are indexes of the documents in the database that display certain values, calculate others and exclude unwanted documents. Once an index is built, it is updated incrementally to reflect documents that have been created, updated or deleted, rather than being rebuilt from scratch.
In CouchDB, views are built by running a map function on each document in the database. The map function calls emit() once for every index entry it wants to create for the given document, passing a key and a value, each of which can be an arbitrarily complex JSON value. CouchDB can then run an optional second function, the reduce function, over the mapped index of the view.
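The map/reduce flow can be simulated in a few lines of Python. This is a simplified sketch and does not reproduce CouchDB's own implementation. Real views are usually JavaScript functions stored in a design document, with built-in reducers such as _sum and _count available. The sketch also ignores CouchDB's rereduce step and its JSON key collation. The function and document names are hypothetical.

```python
from collections import defaultdict


def build_view(docs, map_fn):
    """Apply map_fn to every document and collect the emitted rows, sorted by key."""
    rows = []
    for doc in docs:
        emitted = []
        # map_fn may call emit zero, one or many times per document
        map_fn(doc, lambda key, value: emitted.append((key, value)))
        rows.extend((key, doc["_id"], value) for key, value in emitted)
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def reduce_by_key(rows, reduce_fn):
    """Group rows by key and reduce each group's values (like querying with group=true)."""
    groups = defaultdict(list)
    for key, _doc_id, value in rows:
        groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


# Python stand-in for a JavaScript map function such as:
#   function (doc) { (doc.tags || []).forEach(function (t) { emit(t, 1); }); }
def by_tag(doc, emit):
    for tag in doc.get("tags", []):
        emit(tag, 1)


docs = [
    {"_id": "a", "tags": ["couchdb", "notes"]},
    {"_id": "b", "tags": ["couchdb"]},
    {"_id": "c"},  # emits nothing, so it does not appear in the view
]

rows = build_view(docs, by_tag)
print(rows)  # [('couchdb', 'a', 1), ('couchdb', 'b', 1), ('notes', 'a', 1)]
print(reduce_by_key(rows, sum))  # {'couchdb': 2, 'notes': 1}
```

A single document can produce zero, one or many rows. This gives a map function more freedom than a view that computes a fixed set of columns for each selected document.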
In Notes Domino, views are built by running a selection formula (written in the Notes Domino formula language) on each document in the database. The selection formula simply determines whether the document should be in the view. Notes Domino view design also defines a number of columns for the view; each column has a formula that is run against the selected document to determine the value for that column.
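For comparison, here is a similarly hypothetical Python sketch of the Notes approach. A selection predicate decides whether a document belongs in the view, and each column is a separate function applied to each selected document. The comments show roughly equivalent formula-language expressions (SELECT Form = "Order" and @Sum(Quantities)). The sketch ignores Notes features such as categorized columns and showing multiple values as separate entries.

```python
def build_notes_view(docs, select, columns):
    """Keep documents for which select(doc) is true and compute one value per column."""
    rows = [tuple(column(doc) for column in columns) for doc in docs if select(doc)]
    return sorted(rows)  # sorted on the first column, then the next, and so on


def select_orders(doc):
    # Selection formula:  SELECT Form = "Order"
    return doc.get("Form") == "Order"


columns = [
    lambda doc: doc["Customer"],         # column formula:  Customer
    lambda doc: sum(doc["Quantities"]),  # column formula:  @Sum(Quantities)
]

docs = [
    {"Form": "Order", "Customer": "Bob", "Quantities": [5]},
    {"Form": "Memo", "Subject": "Lunch"},
    {"Form": "Order", "Customer": "Ada", "Quantities": [2, 1]},
]

print(build_notes_view(docs, select_orders, columns))  # [('Ada', 3), ('Bob', 5)]
```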
CouchDB is able to produce much more sophisticated view indexes than Notes Domino can.
CouchDB also has a replication system.
-
Summary (TL;DR): CouchDB is brand-new software whose core is conceptually similar to, but far more sophisticated than, the design used in Lotus Notes Domino. Lotus Notes Domino is a mature, fully featured product that can be deployed today. CouchDB is starting from scratch, building a solid foundation for future feature development. Lotus Notes Domino continues to gain new features, but does so on a 20-year-old platform that strives to maintain backwards compatibility. There are features in Notes Domino that you might wish were in CouchDB, but there are also features in Notes Domino that are anachronistic in today's world.
It is the Notes application and UI that people usually hate, not the architecture behind it.
Damien Katz worked at Iris (Lotus), but he was not the guy behind the Notes Database. He is well-known in the Lotus Notes community for redesigning the Notes Formula Engine.
There are definitely some similarities between CouchDB and Lotus Notes, such as their document-oriented, non-relational data, and replication capabilities, but they are more disparate than similar. CouchDB is a database server and Lotus Notes is an enterprise-level collaboration platform.
@Lex, you should perhaps say which version of Notes/Domino you are working on, because your comments are incorrect.
"No transaction support" - Domino has transaction logging. If you need more complex transaction handling, that is also available in code.
"not well suited for handling multiple data transactions" - Actually, it handles them just fine. You have document locking and replication conflict resolution. It depends a lot on how you set up your application to handle workflow.
"No separation between production/dev environments." - False. The only way this could be true is in a badly deployed environment. Developers should normally have no access to deploy design changes to the production environment. They work from a template that does not replicate to the main servers. Once updates are done and approved, the administrator deploys them: they take the template, sign it with a controlled signature that is allowed to run on production, put the template in place and update the design of the related applications.
"The more data lotus notes contains, the more views will likely get created" - This comment makes absolutely no sense whatsoever. I don't believe you have used Notes/Domino in any professional capacity.
"lotus script is not object oriented" - Yes, you make good points there. However, that doesn't mean the language is flawed. IBM has also made a large number of improvements in 8.x and 8.5.1, for example built-in web services support (point it at a WSDL and the LotusScript code is generated for you). 8.5.1 also has a lot of new Designer features such as code templates, auto-completion, LSDoc pop-up help on your own functions, etc.
You also only touch on LotusScript, yet you can also code in:
Java, SSJS/Dojo (XPages), JavaScript, @Formula language, web services (SOAP/REST), the C API and Eclipse plug-ins (RCP). Output is available in JSON as well as XML.
The 8.5.1 Designer client is now free to download if you want to try it out.
So while I believe I am not in a position to comment on CouchDB, you are most certainly not in a position to comment on Notes/Domino.
The Lotus Notes client/Domino server combination consists of an object ("document") storage mechanism (not relational), a fully integrated certificate-based security model and user management, and conflict resolution for syncing offline/online changes to data. It's a platform for distributed applications.
"CouchDB is a document-oriented, Non-Relational Database Management Server (NRDBMS)."
CouchDB is accessible via a REST-style API.
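As a rough illustration of that REST-style API, the Python sketch below only builds standard-library urllib Request objects for a few core CouchDB endpoints. These are PUT and GET on /{db}/{docid}, plus a view query under /{db}/_design/{ddoc}/_view/{view}. The server address and the database, design document and view names are assumptions. Authentication, which CouchDB servers usually require, is left out.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed local server; 5984 is CouchDB's default port. A real server will
# usually also need credentials, which are omitted from this sketch.
COUCH = "http://127.0.0.1:5984"


def put_document(db, doc_id, body):
    """PUT /{db}/{docid} creates a document (updating one also needs its current _rev)."""
    return Request(
        f"{COUCH}/{db}/{quote(doc_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(db, doc_id):
    """GET /{db}/{docid} returns the document as JSON."""
    return Request(f"{COUCH}/{db}/{quote(doc_id, safe='')}", method="GET")


def query_view(db, design_doc, view):
    """GET a view, asking CouchDB to group reduced results by key."""
    return Request(f"{COUCH}/{db}/_design/{design_doc}/_view/{view}?group=true", method="GET")


req = put_document("crm", "contact-1", {"name": "Ada", "city": "Boston"})
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:5984/crm/contact-1
# Against a running server, urllib.request.urlopen(req) would send it.
```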
There's a podcast interview with Jan Lehnardt of the CouchDB team here.
Without going back and listening to it again, I believe that Damien Katz, who was the initiator and is still the lead developer of CouchDB, was also the guy behind the Notes database. So there's a sense in which CouchDB is a better Notes DB, I guess. He explains some of the differences in his blog.
It's similar to how Notes deals with data in that everything is a document of arbitrary structure, and you have views over those documents instead of the tables and records you'd have in a relational database. Replication and so on also have some similarities.
There isn't anything wrong with the Notes server architecture; people don't hate that so much. It's more the implementation and bloat that come with Notes.
CouchDB has no front end either, just a server component. The Notes client sucks, and that is what people REALLY hate. Have you ever tried to email uh I mean "memo" something from Notes? Not pleasant :(
Comparing Apples & Oranges
Lotus Notes Domino hasn't changed much, and there is no NoSQL service option, on-premises or in the cloud, for Notes Domino v12 or any earlier version. Domino is not cloud-based technology.
When it comes to NoSQL, Domino uses NoSQL for its own application solutions built in Domino. There was an attempt with Domino Access Services, which is based on Java 6; its REST API still uses Vectors in v12. This service is OK but not robust; it provided a way to interface with data in an NSF. Remember, Domino is key-value pair storage and is very slow on large data sets because of its security model: each document is checked for readers and authors on every search to determine whether the user can view it. Domino is still Web 1.0.
With CouchDB one can build an app on mobile and deploy it. There is no way to do the same with Notes/Domino because of the Domino server. Domino development also only supports MS Windows, and the IDE is based on older versions of Eclipse; to this day (v12) there is no way to use dual monitors with the Domino IDE. Ask any Domino developer: they hate being forced to use an IDE on a specific platform that cannot keep up with the industry.
CouchDB has gone through many changes as well. A brief history:
CouchDB started by Damien Katz, IBM Lotus Domino engineer
Apache project BigCouch is born; scalability and clustering added
Cloudant is born; BigData and IBM funding and IBM Cloud offering
CouchDB 2.0 is born; Cloudant + BigData merged back into CouchDB
CouchDB 3.0 is born; enhanced security and prep for FoundationDB
CouchDB 4.0 is born; architecture changed to Apple's FoundationDB
https://www.dataengineeringpodcast.com/couchdb-document-database-episode-124/
I am currently building a knowledge graph for an e-commerce company, and it mainly consists of the product category hierarchies, properties, and relations among them. Besides the common relational queries, we care about the following points very much:
Master-slave cluster support. This graph database will be used for online search query processing, so high availability is crucial to us. The data volume won't reach millions of nodes, so we don't need a distributed cluster that spreads data across multiple machines. Rather, we need multiple machines that can serve reads simultaneously, so that the service won't go down even if one of the machines is offline.
Fast online query performance. Reasoning about relations can be done offline, so its performance is not that important. But we need to run a lot of online queries like "find the nodes whose property P equals value V", so we need good performance for online query processing. This database will be read-intensive and won't change very much after its initialization.
Community and documentation. Since our team is new to graph databases, we need user-friendly documentation for deployment and development and an active community for solving problems.
Based on the requirements above, I investigated some candidates:
Neo4j. We first tried Neo4j since it's the most popular one in the field. Actually, I liked it, especially the Cypher query language. But we are about to abandon it because the community edition does not support clustering, and we currently don't have the budget to pay for the enterprise edition.
OrientDB. OrientDB is about the second most popular one on the market, and it seems to support clustering in its community edition. I use the word "seems" because it is not clearly stated on its website. Can anyone clear this up? Besides, I found a negative article about OrientDB that makes me hesitate: http://orientdbleaks.blogspot.jp/2015/06/the-orientdb-issues-that-made-us-give-up.html
Titan. Titan is also great, but since its original company has been acquired and its original developers are now working on a different product, its future development and maintenance are in doubt.
ArangoDB. This one seems to be very fast, according to the performance report (https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/), but I don't know if its online query processing ability is good enough, and its cluster support is also unknown to me.
As for documentation and community, I really have no idea since these are the kind of things that you only get to know after you start doing it.
To sum up, based on my requirements, I think OrientDB and ArangoDB may be my candidates, but I don't know which one to choose for the reasons stated above. Or is there another suitable candidate that I'm missing?
Thanks.
Max working for ArangoDB here. ArangoDB does not only do online queries for graphs, but due to its multi-model nature you can mix graph queries with document queries (using secondary indexes), key lookups and joins. It has a sophisticated query engine with an optimizer that is fully aware of the ArangoDB cluster structure and can optimize and distribute query executions across all instances.
In a cluster, sharding, synchronous replication and self-healing are all fully automatic with configurable parameters. Deployment of an ArangoDB cluster is particularly simple (literally two clicks) on Apache Mesos or DC/OS, but is also relatively straightforward with other orchestration frameworks. ArangoDB on DC/OS additionally allows you to scale up and down via the graphical user interface or REST API calls, and failed tasks are automatically replaced.
As to performance, all our benchmarks show very good results, and the just-released version 3.1 even has vertex-centric indexes, which are particularly important for graph queries.
We do our best to provide extensive documentation, which you can find at https://www.arangodb.com/documentation/ . We have a user manual, a manual for our query language AQL, and one for the HTTP/REST API. Furthermore, we have tutorials, frequently asked questions and a "Cookbook" for standard tasks, and we try to answer questions on Stack Overflow and GitHub issues in a timely manner.
All of this is included in the Community Edition, which is available with the Apache 2.0 open source license.
If you have more questions, do not hesitate to reach out to our team or to me personally.
OrientDB Community Edition is free, open source software, built upon by a community of developers and constantly improving. Features such as horizontal scaling, fault tolerance, clustering, sharding and replication aren’t disabled in OrientDB Community Edition.
For more information about cluster, take a look at the official OrientDB guide: http://orientdb.com/docs/last/Tutorial-Clusters.html
Hope it helps.
Regards
Neo4j Enterprise Edition can be used under the AGPL license. I am surprised a lot of people aren't aware of this. If you are using Neo4j Enterprise as a server and communicating with it via REST or the Bolt protocol (Apache licensed), then you don't have to worry about releasing the code of the system connecting to it under the AGPL.
If you are using it embedded, then you have to adhere to the AGPL, but then why would you need Neo4j Enterprise in that situation?
Remember to clone and compile Neo4j Enterprise from GitHub if you plan on using it under the AGPL; don't download the trial.
Neo Technology gives great support, and that is essentially what you are paying for with the enterprise subscription.
At first I found a P2P CRM at http://www.ajatus.info/, but it has been discontinued for years. It is also not natural to have a local web server. Worst of all, its data is hard to integrate with other data sources because it uses CouchDB.
So I drafted a P2P CRM proposal and am thinking of implementing it.
Features:
Decentralization
Free (free software, with no additional cost for related software)
Run Immediately (No installation needed, no configuration needed)
Social networking support.
Email and Contacts friendly
Basic architecture: four independent pieces of software.
1. Personal CRM
A Silverlight CRM application with a built-in SQL CE database. This is a complete package that runs with no installation needed.
2. Central CRM
The central server is for performance and to simplify support; it could be based on a typical SQL Server database such as that of Splendid/Tiger CRM. This is also a complete package.
3. CRM Bridge
A bridge to synchronize the personal CRM and the central CRM. This will be an open source project that lets ANY CRM synchronize with the client. It is to be built on the MS Sync Framework. (MS Live Sync could be a better solution when it is ready and available on the XP platform.)
4. Social Collector
A social data collector that gathers data from social networks and other data sources. There is a good project on CodePlex (http://semsync.codeplex.com/) that collects and synchronizes all contact information in one place.
Scenarios:
Personal only.
Client to central CRM directly (in the DB layer).
Personal with synchronization to the central CRM.
Any suggestions?
Ying
If Java is an option for you, the JXTA framework will help you with the P2P features of your application.
Sorry, but I feel your base analysis is somewhat flawed.
1. "not natural to have a local web server"
By whose rules?
If you ask an internet application vendor (the cloud computing hawkers), they will tell you it is not natural.
The whole point of P2P is to rethink those values.
If it makes sense to put a web server on the local machine, put it on the local machine.
Remember the original vision: "The network is the computer",
not "Large data centers are the computer".
2. "CouchDB is hard to integrate"!
I think that is misinformed.
CouchDB has a RESTful JSON API that makes it about as easy to integrate as you can get.
What you really mean is that CouchDB doesn't fit into the Visual Studio development system the way SQL Server does. That is true, but it doesn't make it hard to integrate its data with other data.
There are some replication options you might want to look at.
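One such option is CouchDB's /_replicate endpoint. You POST a JSON body to it that names the source and the target. The sketch below only builds the requests, and the host names and the crm database are hypothetical. Replication is one-way, so a two-way sync between a personal and a central database needs one request in each direction. Documents edited on both sides surface as conflicts that the application has to resolve.

```python
import json
from urllib.request import Request


def replicate(server, source, target, continuous=False):
    """Build a POST /_replicate request asking `server` to copy changes from source to target."""
    body = {"source": source, "target": target}
    if continuous:
        # keep following new changes instead of stopping once caught up
        body["continuous"] = True
    return Request(
        f"{server}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database; replication is one-way, so sync needs two requests.
LOCAL = "http://127.0.0.1:5984"
LOCAL_DB = f"{LOCAL}/crm"
CENTRAL_DB = "http://central.example.com:5984/crm"

push = replicate(LOCAL, LOCAL_DB, CENTRAL_DB, continuous=True)  # personal -> central
pull = replicate(LOCAL, CENTRAL_DB, LOCAL_DB, continuous=True)  # central -> personal
print(push.full_url)  # http://127.0.0.1:5984/_replicate
```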
To be honest, what you are offering isn't much different from MS CRM with a social plug-in module.
I think it would be difficult to get traction in the OSS space, and you're going to need help for a project that size.
Requirements for archival-type software:
1. Data/image/possibly video upload/search/retrieval/edit from the web.
2. Easily implemented user-defined custom fields
3. Easy backup.
4. Low cost ... either opensource or very low cost
I am a very novice programmer. My primary goal is to manage a collection and publish it to the web.
Options
A. Open source software such as CollectiveAccess
Problems: Custom fields not supported. Continued support? Portability of the database?
B. Use Microsoft Access and then use MVC or other development platforms to eventually publish to the web.
Problems: Difficult to integrate with the web?
C. Design my own MVC database application.
Problems: Difficult for a novice programmer? Custom fields and upload of various data formats difficult to implement?
Sounds like you are looking for a Digital Assets Management system. I found ResourceSpace (http://www.resourcespace.org/) and Razuna (http://www.razuna.org/) very useful for similar projects - both fall into your A category.
Hi there,
As mentioned here before, Razuna will satisfy your requirements quite well.
It can manage images, documents, videos and audio. It will share folders and collections on the web with access permissions and will let you search across the different kinds of assets as well.
Moreover, it can handle metadata for all of these assets. It will not only read metadata but also WRITE it. Furthermore, you can set custom fields for each asset type, and users will have a web interface to work with.
Razuna supports different databases (H2, MySQL, MS SQL and Oracle, with DB2 coming soon) and lets you migrate from one database to another with ease (backup/restore option).
Best of all, it is available under an open source license for you to deploy and enjoy today. You can get it at http://razuna.org.
Kind Regards,
Nitai
PS: I'm the main developer and founder of Razuna.
I am evaluating two object databases, db4o (http://www.db4o.com) and Eloquera Database (http://eloquera.com) for a coming project. I have to choose one. My basic requirement is scalability, multi user support and easy type evolution for RAD.
Please share your real world experience.
If you have both, can you compare these two? Which do you prefer?
For the last 2 years I've been using DB4O, and I'm now switching to Eloquera.
My reasons, in order:
I'm building a commercial product, and the royalty-based licensing on DB4O is WAY too high; DB4O said we could "talk about it", but I'm a very small development shop, and giving away a huge chunk of each sale I make just doesn't make sense when there's a perfectly good alternative.
I'm using Db4oTool.exe to modify my assemblies in a post-build step, and it really slows down the build process. Eloquera doesn't need to modify my assemblies.
I found a bug in the DB4O code, and it took many, many months before the fix was integrated into their codebase. I have found bugs in Eloquera, and they fixed them in a day or two.
DB4O is not yet on .NET 4 (although they finally have an early beta). DB4O is the ONLY thing holding me back from using VS2010 (and .NET 4). I tried migrating to VS2010, but VS2010 automatically converts all unit tests to .NET 4, so all of my persistence-related unit tests immediately failed.
DB4O is not really designed to be thread-safe.
DB4O has many features and API conventions that are obviously ported from Java.
Robert
Eloquera (www.eloquera.com) was originally designed and developed for use in the web environment, as a native .NET application written in C#.
Unlike many other databases, Eloquera wasn’t ported from Java.
As part of its architecture, Eloquera natively supports:
Simultaneous user access
Security settings
Genuine client/server architecture, with a desktop mode available.
Maximum database size of 1 TB+; at large data scales Eloquera maintains fast query response, using patent-pending technologies including a virtual file system, indexing and an adaptive cache. Eloquera has state-of-the-art reflection written in MSIL that allows it to outperform many databases that use Microsoft’s standard reflection.
In-memory databases for fast data processing
SQL and LINQ: since most users on the web come from the relational database world, it was natural for Eloquera to support both
EF support is due next month
Unlike some databases, Eloquera does not blindly put objects into the database: if you change fields from int;int; to long;, it will not keep returning wrong query results because it still sees two int;int; fields. Instead, it notifies the user to update the definition
Eloquera provides native indexing for properties and fields. Most databases do not provide property indexing.
I might argue with Carl's claim that DB4O is the easiest database on the market, since Eloquera can do the same things from an API perspective.
Eloquera is younger than Versant and still has some enterprise features coming.
Last month the Eloquera R&D department began work on Eloquera Parallel Server, which will provide horizontal scaling that will arguably be an order of magnitude cheaper than Versant’s VOD.
Some distinguishing points:
Eloquera is FREE for commercial use. You are not required to pay any royalties. All features above you have for FREE.
Commercial support is available for Eloquera.
Eloquera is designed for the modern world with a modern architecture. It was not adapted piecemeal to market needs over time; these capabilities are a natural part of Eloquera’s architecture.
If you are interested to hear user experiences with db4o, I suggest you also ask in our db4o user forums.
While db4o was originally developed for embedded use in applications with limited resources (and now runs very well on constrained platforms like Android, CompactFramework and Silverlight) I know that we do have many users that are happily using db4o for web applications.
There is indeed some truth to the db4o-bashing post by leatrop: the db4o server core currently allows only one thread at a time to enter for storing and querying tasks in a particular database.
However, there are a couple of ways to make db4o applications scale very well:
Since the setup cost for a db4o database is very low (one single API call), it is possible to work with multiple databases. You can use the db4o replication system (dRS) to distribute objects between multiple databases. It is also possible to create backups of db4o databases while they are running and to replicate these backups to multiple machines. The approach of using multiple databases (for time slices of data or for different use cases in your application) can be very nice for backup and debugging purposes: you don't need to copy the entire database if you want to test only some aspects of your live app.
If you still find that db4o does not scale well enough for your number of concurrent users or database size, you can later switch to our high-end object database, Versant VOD. It was built to run in the cloud and has a proven track record with thousands of concurrent users and multi-terabyte databases. VOD for .NET also comes with a LINQ provider, so the interfaces of db4o and VOD are compatible.
My recommendation: Start with db4o. It is the easiest object database to get started with and to develop with. Just store any object with one line of code, without setting up schemas or mapping files. Use LINQ to query (or native queries, if you work with Java).
db4o is open source and it's free (under the GPL).
I'm creating a second-generation social media platform based entirely on JavaFX and db4o. We are able to do things with db4o that would be impossible with any other database. Semantic OWL ontologies, complex relationships between objects and our user-definable canvas make db4o an amazing fit for us. We have no worries about scaling either and have found several solutions. Carl is one of the most intelligent people in software; this is obvious when you learn about his product.
Mike Tallent
CEO
Objectwheel
With the rise of non-SQL database usage on high-traffic websites, I'm interested in using one for my project. I've heard several names, like Voldemort, MongoDB and CouchDB. But which of these non-SQL databases are production ready? I've looked at the download pages, and it seems that none of them is production ready because none has reached version 1.0 yet. Are there any others besides these three that are recommended for production use?
What do you mean by production ready? As far as I know, all of them are being used on live systems.
You should make your choice based on how the features they provide fit your needs.
You can also add Tokyo Cabinet to the list as well as the mnesia database provided by the Erlang VM.
I think you need to start from your project requirements to see what kind of database you really need. There are many non-relational DBMSs out there, and they differ a lot in what kinds of problems they are good at solving. I think the article Should you go Beyond Relational Databases? by Martin Kleppmann is a good starting point for finding out what you need. There are also a lot of Stack Overflow threads on similar topics; these are my favorites:
The Next-gen Databases
Non-Relational Database Design
When shouldn’t you use a relational database?
Good reasons NOT to use a relational database?
When you have narrowed down what you actually need you can take a deeper look into the alternatives to see which DBMS are production ready for your use case. Production readiness isn't a yes/no thing: people may successfully deploy some solution that for example lacks in tool support - in another project this could be a no-go.
As for version numbers, different projects take different approaches, so you can't just compare version numbers. I'm involved in the graph database project Neo4j, and even though it has been in production use for 5+ years by now, we still haven't released a final version 1.0.
I'm tempted to answer "use SIRA_PRISE".
It's definitely non-SQL.
And its current version is 1.2, meaning that someone like you must definitely assume it's "production-ready".
But perhaps I shouldn't be answering at all.
Nice article comparing rdbms with 'next gen' and listing some providers:
Is the Relational Database Doomed?
http://readwrite.com/2009/02/12/is-the-relational-database-doomed
I suggest you use ArangoDB.
ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.
ArangoDB is supposed to grow with the application—the project may start as a simple single-server prototype, nothing you couldn’t do with a relational database equally well. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end—this is where Foxx, ArangoDB’s integrated Javascript application framework, comes into play.
Another unique feature is ArangoDB’s query language AQL — it makes querying powerful and convenient. AQL enables you to describe complex filter conditions and joins in a readable format, much in the same way as SQL.
You can model your data in several ways:
in key/value pairs
as collections of documents
as graphs with nodes, edges, and properties for both
You can access data in ArangoDB:
using the general HTTP REST API via curl/wget, or your browser
via the ArangoDB shell (“arangosh”)
using a programming language specific client library
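With the HTTP route, an AQL query runs when you POST it to a database's /_api/cursor endpoint. The minimal Python sketch below only builds that request. The server address, the shop database and the products collection are assumptions, and credentials are omitted. The filter is a simple "property equals value" lookup.

```python
import json
from urllib.request import Request

# Assumed local server; 8529 is ArangoDB's default port. Credentials are omitted.
ARANGO = "http://127.0.0.1:8529"


def aql_cursor(db, query, bind_vars):
    """Build a POST /_db/{db}/_api/cursor request that runs an AQL query."""
    body = {"query": query, "bindVars": bind_vars}
    return Request(
        f"{ARANGO}/_db/{db}/_api/cursor",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical "products" collection: find documents whose category equals a value.
req = aql_cursor(
    "shop",
    "FOR p IN products FILTER p.category == @category RETURN p",
    {"category": "laptops"},
)
print(req.full_url)  # http://127.0.0.1:8529/_db/shop/_api/cursor
```

Passing the value as a bind parameter (@category) avoids AQL injection, because the value is never concatenated into the query string. Fast lookups of this kind would also need an index on the filtered attribute.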
Server requirements for ArangoDB:
ArangoDB runs on Linux, OS X and Microsoft Windows.
It runs on 32bit and 64bit systems, though using a 32bit system will limit you to using only approximately 2 to 3 GB of data with ArangoDB.