Related
What is the best ORM db combination for simple data structures. That is data that contains names as identifiers and locations, but whose main interaction will be numerical data for times(sports durations), and currency related data.
I initially want to create a sports data base that will take names and statistics. Secondarily I plan to start into an investment and stock analysis db.
Which ORM suits storing many numerical types and have strong query functions?
I really am not biased to db engine (most likely use sqlite or mongo) so any suggestions to best network less db server to suit said ORM appreciated.
I had reviewed several options but I don't want to influence any suggestion or opinion. But for reference.
Genstone/Glass - Smalltalk/Pharo/Squeak
Magma - Pharo/Squeak
SQLalchemy - Python
Sequel - Ruby
Access/Excel - Micorosoft
I am learning scheme but haven't seen an ORM on offer via Racket or Chicken at the moment.
Dabo - python
I disagree that there is no need for an ORM with NoSQL, e.g., mongodb. If there is any difference in the data store from the way objects are created, modified, inter-related, found and deleted in the programming environment then that different needs to be made as small and non-obtrusive as possible. This it the job of an ORM when working with a RDBMS. But in principle the problem of mapping objects in one or more languages to a persistent store is much broader than just the subset when the persistent store is a relational database.
Today with multiple levels of distributed and local store the problem is larger, not smaller. Data can be spread from process memory to local shared memory to local disk stores which may be an arbitrary mix of SSD and HD and from there to distributed memory (e.g., memcache) and remote possibly replicated stores. Not to mention mobile, local, cloud.
THe problem that ORM is made to solve is deeper and wider today.
I wrote my first ORM in 1987 from Objective C to a relational database core (file level). I then worked for an object database company a few years interfacing languages to their ODBMS. Even with an object database there was some mismatch and need for language specific powerful but transparent interfaces.
In my case I have to say that the best ORM I have used is The Sharp Factory
It can handle thousands of tables and creates a repository, interfaces, entities and all of the code needed to interact with the databse.
The downside is that it only supports C#.
I am currently researching the best practises (at a reasonably high level) for application design for highly maintainable systems which result in minimal friction to change. By "Data Tier" I mean database design, Object Relation Mappers (ORMs) and general data access technologies.
From your experiences, what have you found to be the common mistakes & bad practises when it comes to data tier development and what measures have you taken / put in place / or can recommend to make the data tier a better place to be from a developer perspective?
An example answer may include: What is the most common causes of a slow, poorly scalable and extendible data tiers? + What measures can be taken (be it in design or refactoring) to cure this issue?
I am looking for war stories here and some real world advice that I can build into publicly available guidance documents and samples.
Magic.
I have used Hibernate, which automatically stores and fetches objects from a database. It also supports lazy loading, so that a related object is only retrieved from the database when you ask for it. This works in some magic way I don't understand.
This all works fine as long as it works, but when it breaks down it is impossible to track it down. I think we had a problem when we combined Hibernate with AOP, that somehow the object was not yet initialized by Hibernate when our code was executed. This problem was very hard to debug, because Hibernate works in such mysterious ways.
Object-Relational mapping is bad practice. By this, I mean that it tends to produce data schemas that can only loosely be described as "relational", and so they scale poorly and exhibit poor data integrity.
This is because properly relational schemas have been through the process of normalisation, whereas the results of O-R Mapping are normally object classes implemented as database tables. These will not normally have been normalised, but will instead have been designed for the immediate convenience of the OO developer.
Of course, in cases where the persistent data requirements are minimal, this is unimportant.
However, I once worked for a shipping company that had grown by taking over several other companies, and had outsourced development of an integrated operational system (to replace the various company-specific systems it had inherited) to a company using an OO methodology, with a data schema produced by O-R mapping. The performance characteristics of the system being developed were so poor, and the data schema so complex, that the shipping company dropped it after something like two years of development - before it even went live!
This was a direct consequence of the O-R mapping; the worst complexity in the schema (and the consequently poor performance) was caused by the existence of tables created solely as artifacts of the OO design process - they reflected screen layouts, not data relationships.
I am evaluating two object databases, db4o (http://www.db4o.com) and Eloquera Database (http://eloquera.com) for a coming project. I have to choose one. My basic requirement is scalability, multi user support and easy type evolution for RAD.
Please share your real world experience.
If you have both, can you compare these two? Which do you prefer?
For the last 2 years I've been using DB4O, and I'm now switching to Eloquera.
My reasons, in order:
I'm building a commercial product, and the royalty based licensing on DB4O is WAY to high; DB4O said we could "talk about it", but I'm a very small development shop and giving away a huge chunk of each sale I make just doesn't make any sense when there's a perfectly good alternative.
I'm using the Db4oTool.exe to modify my assmeblies in a post-build step, and it really slows down the build process. Eloquera doesn't need to modify my assemblies.
I found a bug in the DB4O code, and it took many many months before it was integrated into their codebase. I have found bugs in Eloquera and they fixed them in a day or two
DB4O is not yet on .NET 4 (although they finally have an early beta). DB4O is the ONLY thing holding me back from using VS2010 (and .NET 4). I tried migrating to VS2010 but VS2010 automatically converts all unit tests to .NET 4, so all of my persistence related unit tests immediately failed.
DB4O is not really designed to be thread-safe.
DB4O has features and many API features that are obviously ported from Java.
Robert
Eloquera ( www.eloquera.com ) originally designed and developed for use in the Web environment and it’s designed as native .NET application in C#.
Eloquera wasn’t ported from Java as many other databases.
Eloquera natively as part of architecture supports:
Simultaneous user access
Security settings
Has genuine C/S architecture, has desktop mode available.
Max database size 1TB+, in a large data scale Eloquera maintains the fast query response; it has patents pending technologies including virtual file system, indexing, and adaptive cache. Eloquera has state of the art reflection written in MSIL that allows Eloquera to outperform many databases that use Microsoft’s standard reflection.
Supports in-memory database for the fast data processing
Since most of the users in the Web come from relational database world it was natural for Eloquera to support SQL and LINQ
EF support is due next month
Unlike some databases Eloquera does not put blindly objects in the database, if you change fields from int;int; to long; it will not keep querying with a wrong results because it still sees two int;int; - it will notify the user to update the definition
Eloquera provides a native indexing for properties and fields. Most of the databases do not provide properties indexing.
I might argue with Carl regarding DB4O the easiest database on the market, since Eloquera can do the same things from API perspective.
Eloquera is younger than Versant and still has some enterprise features coming.
Last month Eloquera R&D department got engaged with Eloquera Parallel Server to provide horizontal scaling that arguably will be magnitude cheaper than Versant’s VOD.
Some of the distinguished points
Eloquera is FREE for commercial use. You are not required to pay any royalties. All features above you have for FREE.
Eloquera has a commercial support available.
Eloquera is designed for the modern world with modern architecture. It was not adapting from time to time to market needs. It is natural part of Eloquera’s architecture.
If you are interested to hear user experiences with db4o, I suggest you also ask in our db4o user forums.
While db4o was originally developed for embedded use in applications with limited resources (and now runs very well on constrained platforms like Android, CompactFramework and Silverlight) I know that we do have many users that are happily using db4o for web applications.
Indeed there is some correctness to the db4o-bashing-post by leatrop: The db4o server core currently only allows one thread to enter for storing and querying tasks in a particular database.
However there are a couple of ways to make db4o applications scale very well:
Since the setup costs for db4o databases is very low (one single API call) it is possible to work with multiple databases. You can use the db4o replication system (dRS) to distribute objects between multiple databases. It is also possible to create backups of db4o databases while they are running and to replicate these backups to multiple machines. The approach of using multiple databases (for timeslices of data or for different usecases in your application) can be very nice for backup and debugging purposes. You don't need to copy the entire database if you want to test only some aspects of your live app.
If you still find that db4o does not scale good enough for concurrent users or database sizes, you can later switch to our high-end object database Versant VOD. It was built to run in the cloud and it has a proven track record to work for thousands of concurrent users with multi-terabyte databases. VOD for .NET also comes with a LINQ provider, so the interfaces of db4o and VOD are compatible.
My recommendation: Start with db4o. It is the easiest object database to get started with and to develop with. Just store any object with one line of code, without setting up schemas or mapping files. Use LINQ to query (or native queries, if you work with Java).
db4o is open source and it's free (under the GPL).
I'm creating a 2nd generation Social Media Platform completely based on Javafx and Db4o. We are able to do things with db4o that would be impossible with any other database. Semantic OWL Ontologies and Complex relationships with Objects and Our User Definable Canvas make Db4o an amazing fit for us. We have no worries about scaling either and have found several solutions. Carl is one of the most intelligent people in software. This fact is obvious when you learn about his product.
Mike Tallent
CEO
Objectwheel
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.
What exactly is NoSQL? Is it database systems that only work with {key:value} pairs?
As far as I know MemCache is one of such database systems, am I right?
What other popular NoSQL databases are there and where exactly are they useful?
Thanks, Boda Cydo.
I'm not agree with the answers I'm seeing, although it's true that NoSQL solutions tends to break the ACID rules, not all are created from that approach.
I think first you should define what is a SQL Solution and then you can put the "Not Only" in front of it, that will be more accurate definition of what is a NoSQL solution.
With this approach in mind:
SQL databases are a way to group all the data stores that are accessible using Structured Query Language as the main (and most of the time only) way to communicate with them, this means it requires that the database support the structures that are common to those systems like "tables", "columns", "rows", "relationships", etc.
Now, put the "Not Only" in front of the last sentence and you will get a definition of what means "NoSQL". NoSQL groups all the stores created as an attempt to solve problems which cannot fit into the table/column/rows structures or even in SQL Statements, in most of the cases these databases will not support relationships, they're abandoning the well known structures just because the problems have changed since their conception.
If you have a text file, and you create an API to store/retrieve/organize this information, then you have a NoSQL database in your hands.
All of these means that there are several solutions to store the information in a way that traditional SQL systems will not allow to achieve better performance, flexibility, etc etc. Every NoSQL provider tries to solve a different problem and that's why you wont be able to compare two different solutions, for example:
djondb is a document store created to be used as
NoSQL enterprise solution supporting transactions, consistency, etc.
but sacrifice performance of its counterparts.
MongoDB is a document store (similar to
djondb) which accomplish great performance but trades some of the
ACID properties to achieve this.
CouchDB is another document store which
solves the queries slightly different providing views to retrieve the
information without doing a full query every time.
...
As you may have noticed I only talked about the document stores, that's because I wanted to show you that 3 different document stores implementations have different approach, therefore you should keep in mind the golden rule of NoSQL stores "Use the right tool for the right job".
I'm the creator of djondb and I've been doing a lot of research even before trying to start my own NoSQL implementation, but this is a field where the concepts will keep changing the way we see the information storage.
From wikipedia:
NoSQL is an umbrella term for a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. Data stores that fall under this term may not require fixed table schemas, and usually avoid join operations. The term was first popularised in early 2009.
The motivation for such an architecture was high scalability, to support sites such as Facebook, advertising.com, etc...
To quickly get a handle on NoSQL systems, see this blog post I wrote: Visual Guide to NoSQL Systems. Essentially, NoSQL systems sacrifice either consistency or availability in favor of tolerance to network partitions.
What is NoSQL ?
NoSQL is the acronym for Not Only SQL. The basic qualities of NoSQL databases are schemaless, distributed and horizontally scalable on commodity hardware. The NoSQL databases offers variety of functions to solve various problems with variety of data types, where “blob” used to be the only data type in RDBMS to store unstructured data.
1 Dynamic Schema
NoSQL databases allows schema to be flexible. New columns can be added anytime. Rows may or may not have values for those columns and no strict enforcement of data types for columns. This flexibility is handy for developers, especially when they expect frequent changes during the course of product life cycle.
2 Variety of Data
NoSQL databases support any type of data. It supports structured, semi-structured and unstructured data to be stored. Its supports logs, images files, videos, graphs, jpegs, JSON, XML to be stored and operated as it is without any pre-processing. So it reduces the need for ETL (Extract – Transform – Load).
3 High Availability Cluster
NoSQL databases support distributed storage using commodity hardware. It also supports high availability by horizontal scalability. This features enables NoSQL databases get the benefit of elastic nature of the Cloud infrastructure services.
4 Open Source
NoSQL databases are open source software. The usage of software is free and most of them are free to use in commercial products. The open sources codebase can be modified to solve the business needs. There are minor variations in the open source software licenses, users must be aware of license agreements.
5 NoSQL – Not Only SQL
NoSQL databases not only depend SQL to retrieve data. They provide rich API interfaces to perform DML and CRUD operations. These are APIs are move developer friendly and supported in variety of programming languages.
Take a look at these:
http://en.wikipedia.org/wiki/Nosql#List_of_NoSQL_open_source_projects
and this:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
I used something called the Raima Data Manager more than a dozen years ago, that qualifies as NoSQL. It calls itself a "Set Oriented Database" Its not based on tables, and there is no query "language", just an C API for asking for subsets.
It's fast and easier to work with in C/C++ and SQL, there's no building up strings to pass to a query interpreter and the data comes back as an enumerable object rather than as an array. variable sized records are normal and don't waste space. I never saw the source code, but there were some hints at the interface that internally, the code used pointers a lot.
I'm not sure that the product I used is even sold anymore, but the company is still around.
MongoDB looks interesting, SourceForge is now using it.
I listened to a podcast with a team member. The idea with NoSQL isn't so much to replace SQL as it is to provide a solution for problems that aren't solved well with traditional RDBMS. As mentioned elsewhere, they are faster and scale better at the cost of reliability and atomicity (different solutions to different degrees). You wouldn't want to use one for a financial system, but a document based system would work great.
Here is a comprehensive list of NoSQL Databases: http://nosql-database.org/.
I'm glad that you have had success with RDM John! I work at Raima so it's great to hear feedback. For those looking for more information, here are a couple of resources:
Video Overview of RDM's General Architecture
Free Evaluation Download of RDM