Is it possible to use clusters in clusters with orientdb? - database

I was going through the OrientDB tutorial yesterday and think it looked really promising.
I like the possibility to add entities to different clusters and my question is if it is possible to have a hierarchy of clusters?
As an example, if I add a User to the cluster California I would like it to also be a part of the cluster USA without having to specifically add it to that cluster.
I am thinking about clusters as a way to divide data in sets and subsets, but maybe that is wrong?

This is what Classes are. You have real inheritance between classes, so querying a base class will give you all the instances of that class and its subclasses.
Clusters are a more physical concepts, think to them like a file or a table. A Class uses one or more clusters to store instances.

Related

Is it possible to store graph databases within another database of any form?

I was wondering if it is possible to store graph related data (for example a guitar node having various other nodes of data connected to it like specs and description) inside another database?
Imagine a graph inside a block in a tabular database. Or a key of guitar which has various nodes connected with a value pair. If possible, what are the performance limitations, drawbacks and etc.?
you can take most business scenarios and model them in multiple ways so yes, in theory, you could define one solution that uses a graph DB and another solution that uses a relational DB.
Whether you can actually model the business scenario for specific DB types would be down to the details of the requirements, as would the pros and cons.
I'm afraid your question is too general to provide an answer that gives any specifics on performance limitations, drawbacks, etc.

What is a good web application SQL Server data mart implementation in ElasticSearch?

Coming from a RDBMS background and trying to wrap my head around ElasticSearch data storage patterns...
Currently in SQL Server, we have a star schema data mart, RecordData. Rows are organized by user ID, geographic location that pertains to the rest of the searchable record, title and description (which are free text search fields).
I would like to move this over to ElasticSearch, and have read about creating a separate index per user. If I understand this correctly, with this suggestion, I would be creating a RecordData type in each user index, correct? What is a recommended naming convention for user indices that will be simple for Kibana analysis?
One issue I have with this recommendation is, how would you organize multiple web applications on the ES server? You wouldn't want to have all those user indices all over the place?
Is it so bad to have one index per application, and type per SQL Server table?
Since in SQL Server, we have other tables for user configuration, based on user ID's, I take it that I could then create new ES types in user indices for configuration. Is this a recommended pattern? I would rather not have two data base systems for this web application.
Suggestions welcome, thank you.
I went through the same thing, and there are a few things to take into account.
Data Modeling
You say you use a star schema today. Elasticsearch is typically appropriate for denormalized data where the totality of the information resides in each document unlike with a star schema. If you can live with denormalized, that is fine but I assume that since you already have star schema, denormalized data is not an option because you don't want to go and update millions of documents each time the location name change for example(if i understand the use case). At least in my use case that wasn't an option.
What are Elasticsearch options for normalized data?
This leads us to think of how to put star schema like data in a system like Elasticsearch. There are a few options in the documentation, the main ones i focused were
Nested Objects - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html . In nested objects the entire information is kept in a single document, meaning one location and its related users would be in a single document. That may make it not optimal becasue the document will be huge and again, a change in the location name will require to update the entire document. So this is better but still not optimal.
Parent - Child Relationship - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html . In this case the location and the User records would be kepts in separate indices similarly to a relational database. This seems to be the right modeling for what we need. The only major issue with this option is the fact that Kibana 4 does not provide ways to manipulate/aggregate documents based on parent/child relationship as of this writing. So if you main driver for using Elasticsearch is Kibana(this was mine), that kind of eliminates the option. If you want to benefit from the elasticsearch speed as an engine this seems to be the desired option for your use case.
In my opinion once you got right the data modeling all of your questions will be easier to answer.
Regarding the organization of the servers themselves, the way we organize that is by having a separate cluster of 3 elasticsearch nodes behind a Load Balancer(all of that is hosted on a cloud) and then have all your Web Applications connect to that cluster using the Elasticsearch API.
Hope that helps.

Grails: Best approach to dealing with an existing database

I'm writing a Grails application that will be pulling data from an existing Oracle database. If I were designing this from scratch I could hold all the information in two or three domain models because logically that's how the data should be arranged. However, this is a pre-existing database that has the data I need spread across approximately 25-30 tables. So I am wondering which of the following approaches would be considered best. I don't want to do tons of extra work to take advantage of what Grails has to offer, but at the same time I'd like to take advantage of as much of Grails as possible.
Create domain models for all 25-30 tables and then gather the data into two or three classes.
Create the two or three domain models and populate them "manually" with SQL calls
Since I'm new to Grails and how it handles data, something else that I haven't thought of yet.
There is one answer to all of your queries:
database-reverse-engineer plugin
You can configure the way you want to reverse engineer the tables to domain classes. Refer docs as well.

How graph databases store data to a persistent storage?

How graph databases store data to a persistent storage?
PKV
I would expect that every implementation of a graph database uses a different approach.
To take one example, look at Neo4j's NeoStore class, and the other kinds of store it refers to. It seems that Neo4j uses multiple files, each containing fixed-length records; one for nodes, one for keys of properties of nodes, one for values of properties of nodes, etc. Records in each contain indexes to refer to records in the others. It seems overcomplicated to me, but it evidently seemed like a good idea to the guys who wrote it!
To know more about how OrientDB stores graphs look at: http://code.google.com/p/orient/wiki/Concepts#Storage

Pros/cons of document-based databases vs. relational databases

I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:
CRUD of entities with some fields which have unique index on it
ecommerce web app like eBay (better description here).
And I'm begining to think that a Document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a Document based database (maybe my imagination is too limited).
Can you explain to me if I am asking pears from an elm when I try to use a Document oriented database for these requirements?
You need to think of how you approach the application in a document oriented way. If you simply try to replicate how you would model the problem in an RDBMS then you will fail. There are also different trade-offs that you might want to make. ([ed: not sure how this ties into the argument but:] Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?)
One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?
Another angle you should think about is eventual consistency, where you will get into a consistent state eventually, but you may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is of transferring money from bank accounts. How does this actually happen in the real world - through a single atomic transactions or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
So lets look at your examples:
CRUD of entities with some fields with unique index on it.
If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.
So we need to look at the real world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is where you need a unique sequential identifier. This is tough to solve in a replicated environment. In fact if the unique id is required to be strictly sequential with respect to time created it's impossible if you need the id straight away. You need to relax at least one of those constraints.
ecommerce web app like ebay
I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.
Is there a need to normalize the data?
Yes: Use relational.
No: Use document.
I am in the same boat, I am loving couchdb at the moment, and I think that the whole functional style is great. But when exactly do we start to use them in ernest for applications. I mean, yes we can all start to develop applications extremely quickly, cruft free with all those nasty hang-ups about normal form being left in the wayside and not using schemas. But, to coin a phrase "we are standing on the shoulders of giants". There is a good reason to use RDBMS and to normalise and to use schemas. My old oracle head is reeling thinking about data without form.
My main wow factor on couchdb is the replication stuff and the versioning system working in tandem.
I have been racking my brain for the last month trying to grok the storage mechanisms of couchdb, apparently it uses B trees but doesn't store data based on normal form. Does this mean that it is really really smart and realises that bits of data are replicated so lets just make a pointer to this B tree entry?
So far I am thinking of xml documents, config files, resource files streamed to base64 strings.
But would I use couchdb for structural data. I don't know, any help greatly appreciated on this.
Might be useful in storing RDF data or even free form text.
A possibility is to have a main relational database that stores definitions of items that can be retrieved by their IDs, and a document database for the descriptions and/or specifications of those items. For example, you could have a relational database with a Products table with the following fields:
ProductID
Description
UnitPrice
LotSize
Specifications
And that Specifications field would actually contain a reference to a document with the technical specifications of the product. This way, you have the best of both worlds.
Document based DBs are best suiting for storing, well, documents. Lotus Notes is a common implementation and Notes email is an example. For what you are describing, eCommerce, CRUD, etc., realtional DBs are better designed for storage and retrieval of data items/elements that are indexed (as opposed to documents).
Re CRUD: the whole REST paradigm maps directly to CRUD (or vice versa). So if you know that you can model your requirements with resources (identifiable via URIs) and a basic set of operations (namely CRUD), you may be very near to a REST-based system, which quite a few document-oriented systems provide out of the box.

Resources