Which NoSQL database for heterogeneous records?

I'm looking at different options for storing log entries for easier querying/reporting.
Currently I write scripts that parse the logs and pull out the data, but that data is in ever greater demand, so it has become worth putting the log data in a database.
Log entries are composed of key-value pairs, such as {"timestamp":"2012-04-24 12:34:56.789", "Msg":"OK"} (simplified example).
I'm sure that eventually the log format will be extended to, say, {"timestamp":"2012-04-24 12:34:56.789", "Msg":"OK", "Hostname":"Bubba"}, which means that the "schema" or "document definition" will need to change. Also, we're a Windows + .NET shop.
Hence, I was primarily looking for some NoSQL engine and found RavenDB attractive to use from .NET.
However, I have a hard time finding information about how it, and other NoSQL databases, work with heterogeneous records.
What would be a good fit in your opinion?

With RavenDB you can just store the different types of docs and it will be able to handle the "changes" in schema. Because it is in fact "schema-free", you can write indexes that will only index the fields that are there. See this blog post for some extra info. It's talking about migrations, but the same applies here.
Also, the dynamic fields option will help you here. So, given a document with arbitrary properties:
public class LogEntry
{
    public string Id { get; set; }
    public List<Attribute> Attributes { get; set; }
}

public class Attribute
{
    public string Name { get; set; }
    public string Value { get; set; }
}
You can write queries like this:
var logs = session.Advanced.LuceneQuery<LogEntry>("LogEntry/ByAttribute")
    .WhereEquals("Msg", "OK")
    .ToList();
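As an illustration (not from the original answer), here is a minimal sketch of what a "LogEntry/ByAttribute" index using RavenDB's dynamic fields could look like. The class name LogEntry_ByAttribute maps to that index name; the exact namespace may vary with the client version.

using System.Linq;
using Raven.Client.Indexes; // namespace may differ between RavenDB client versions

public class LogEntry_ByAttribute : AbstractIndexCreationTask<LogEntry>
{
    public LogEntry_ByAttribute()
    {
        // CreateField emits one index field per attribute name, so fields such as
        // "Msg" or "Hostname" become queryable even though LogEntry never declares them.
        Map = entries => from entry in entries
                         select new
                         {
                             _ = entry.Attributes.Select(a => CreateField(a.Name, a.Value))
                         };
    }
}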

Related

Dapper.Contrib: How to get a row by filtering on column other than ID?

My class is like below:
[Table("tblUser")]
public class User
{
[Key]
public int Id { get; set; }
public string Title { get; set; }
}
Using Dapper.Contrib, is there a way to get the User record by Title instead of Id?
If I query like below, it works. But I want to filter on the Title column, which is not a key.
await connection.GetAsync<User>(Id);
Looking at the documentation, Dapper.Contrib does not support retrieval of records using criteria other than key. In other words, it does not support any kind of predicate system in its current implementation.
Using GetAll, you can filter further with LINQ. But remember that this filtering is not executed on the RDBMS; it runs on the application side, in memory. That means the entire data set is loaded first and then filtered.
Personally, I would use plain Dapper (bypassing Contrib) for such a specific scenario. The rest of the project can still use Contrib.
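To make the two options concrete, here is a rough sketch (not from the original answer), assuming an open connection and a hypothetical title value:

using System.Linq;
using System.Threading.Tasks;
using Dapper;
using Dapper.Contrib.Extensions;

// Option A: Dapper.Contrib GetAll + LINQ. The whole tblUser table is read first,
// and the Title filter runs in application memory.
var byTitleInMemory = (await connection.GetAllAsync<User>())
    .Where(u => u.Title == "Manager")
    .ToList();

// Option B: plain Dapper. The Title filter is executed by SQL Server.
var byTitleInSql = (await connection.QueryAsync<User>(
    "SELECT Id, Title FROM tblUser WHERE Title = @Title",
    new { Title = "Manager" })).ToList();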

best event sourcing db strategy

I want to setup a small event sourcing lib.
I read a few tutorials online, everything understood so far.
The only problem is that these tutorials use two different database strategies, without any explanation of why they chose the one they use.
So, I want to ask for your opinion.
And importantly, why do you prefer the solution you chose?
Solution 1 is the DB structure where you create one table for each event type.
Solution 2 is the DB structure where you create only one generic table and save the events as a serialized string in one column.
In both cases I'm not sure how they handle event changes, maybe they create a whole new one.
Kind regards
I built my own event sourcing lib and opted for option 2. Here's why:
You query the event stream by aggregate id, not event type.
Reproducing the events in order would be a pain if they were all in different tables.
It would make upgrading events a bit of a pain.
There is an argument that you could store events in a table per aggregate, but that depends on the requirements of the project.
I do have some posts about how event streams are used that you may find helpful.
6 Code Smells With Your CQRS Events and How to Avoid Them
Aggregate Root – How to Build One for CQRS and Event Sourcing
How to Upgrade CQRS Events Without Busting Your Event Stream
Solution 2 is the DB structure where you create only one generic table and save the events as a serialized string in one column.
This is by far the best approach, as replaying events is simpler. Now my two cents on event sourcing: it is a great pattern, but you should be careful because not everything is as simple as it seems. In a system I was working on we saved the stream of events per aggregate, but we still had a set of normalized tables, because we just could not accept that in order to get the latest state of an object we would have to run through all the events (snapshots help but are not a perfect solution). So yes, event sourcing is a fine pattern: it gives you complete versioning of your entities and a full audit log, and it should be used for exactly that, not as a replacement for a set of normalized tables. But that is just my two cents.
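To make option #2 concrete, here is a small sketch of the generic record shape and the replay step, with an in-memory list standing in for the table and Newtonsoft.Json for serialization; all names here are illustrative, not from the original answers.

using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

public class StoredEvent
{
    public Guid AggregateId { get; set; }
    public int Version { get; set; }        // per-aggregate sequence number
    public string EventType { get; set; }   // e.g. assembly-qualified type name
    public string Payload { get; set; }     // the event serialized to a string
    public DateTime CreatedUtc { get; set; }
}

public class InMemoryEventStore
{
    private readonly List<StoredEvent> table = new List<StoredEvent>();

    public void Append(Guid aggregateId, int version, object @event)
    {
        table.Add(new StoredEvent
        {
            AggregateId = aggregateId,
            Version = version,
            EventType = @event.GetType().AssemblyQualifiedName,
            Payload = JsonConvert.SerializeObject(@event),
            CreatedUtc = DateTime.UtcNow
        });
    }

    // Replay: load the stream for one aggregate, ordered by version,
    // and deserialize each payload back to its original CLR type.
    public IEnumerable<object> Load(Guid aggregateId)
    {
        return table
            .Where(e => e.AggregateId == aggregateId)
            .OrderBy(e => e.Version)
            .Select(e => JsonConvert.DeserializeObject(e.Payload, Type.GetType(e.EventType)));
    }
}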
I think the best solution is to go with #2. You can even save your current state together with the related event in the same transaction if you use a transactional DB like MySQL.
I really don't like or recommend solution #1.
If your concern pushing you toward #1 is event versioning/upgrading, then declare a new class for each change. Don't be too lazy, or too obsessed with reuse. Let the subscribers know about changes; give them the event version.
If your concern pushing you toward #1 is something like querying/interpreting events, then you can easily push your events to a NoSQL DB or event store later, at any time (from the original DB).
Also, the pattern I use for my event sourcing lib is something like this:
public interface IUserCreated : IEventModel
{
}

public class UserCreatedV1 : IUserCreated
{
    public string Email { get; set; }
    public string Password { get; set; }
}

public class UserCreatedV2 : IUserCreated
{
    // Full name added to user creation. Wrt issue: OA-143
    public string Email { get; set; }
    public string Password { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class EventRecord<T> where T : IEventModel
{
    public string SessionId { get; set; }      // Can be set in emitter.
    public string RequestId { get; set; }      // Can be set in emitter.
    public DateTime CreatedDate { get; set; }  // Can be set in emitter.
    public string EventName { get; set; }      // Extracted from class or interface name.
    public string EventVersion { get; set; }   // Extracted from class name.
    public T EventModel { get; set; }          // Can be set in emitter.
}

public interface IEventModel { }
So: make event versioning and upgrading explicit, both in the domain and in the codebase. Implement handling of new events in subscribers before deploying the producer of the new events. And, if it is not required, don't allow external subscribers to consume domain events directly; put an integration layer or something like that in between.
I hope these thoughts are useful to you.
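Building on the classes above, one way to make upgrading explicit is a small upcaster that converts old events to the latest version as they are read; the empty-string defaults are an illustrative assumption, not part of the original answer.

public static class UserCreatedUpcaster
{
    // Converts an old V1 event to the current V2 shape.
    public static UserCreatedV2 Upcast(UserCreatedV1 old)
    {
        return new UserCreatedV2
        {
            Email = old.Email,
            Password = old.Password,
            FirstName = string.Empty, // unknown for events emitted before the change
            LastName = string.Empty
        };
    }

    // Normalizes whatever version comes off the stream to the latest one.
    public static UserCreatedV2 ToLatest(IUserCreated evt)
    {
        var v2 = evt as UserCreatedV2;
        if (v2 != null) return v2;

        var v1 = evt as UserCreatedV1;
        if (v1 != null) return Upcast(v1);

        throw new System.NotSupportedException(evt.GetType().Name);
    }
}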
I read about an event-sourcing approach that consists of:
having two tables: aggregate and event;
based on your use cases, either:
a. create a record in the aggregate table, generating an ID, version = 0 and an event type, and create an event in the event table;
b. retrieve the aggregate record and its events by ID or event type, apply the business case, then update the aggregate table (version and event type) and create an event in the event table.
Although this approach updates some fields in the aggregate table, it leaves the event table append-only and improves performance, as you always have the latest version of an aggregate in the aggregate table.
I would go with #2, and if you really want an efficient way to search by event type, I would just add an index on that column.
There are two strategies for accessing the data about a subject in this case:
1) current state and 2) event sequencing.
With current state we process the events but keep only the last state of the subject.
With event sequencing we keep the events and rebuild the current state by processing the events every time we need the state.
Event sequencing is more reliable, as we can track everything that happened to produce the current state, but it's definitely not efficient. It is common sense to also keep intermediate states (snapshots), not only the last one, to avoid reprocessing all the events all the time. Now we have both reliability and performance.
Cryptocurrencies use event sequencing plus local snapshots; the "local" is because blockchains are distributed and the data is replicated.
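As a sketch of the snapshot idea described above (the account/deposit names are mine, not from the answer): rebuild the current state from the latest snapshot plus only the events recorded after it.

using System.Collections.Generic;

// Illustrative aggregate state and event.
public class AccountState
{
    public decimal Balance { get; set; }
    public int Version { get; set; }
}

public class Deposited
{
    public decimal Amount { get; set; }
}

public static class AccountProjector
{
    // Start from the latest snapshot (or an empty state if none exists yet)
    // and apply only the events that happened after that snapshot was taken.
    public static AccountState Rebuild(AccountState latestSnapshot,
                                       IEnumerable<Deposited> eventsAfterSnapshot)
    {
        var state = latestSnapshot ?? new AccountState();
        foreach (var e in eventsAfterSnapshot)
        {
            state.Balance += e.Amount;
            state.Version++;
        }
        return state;
    }
}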

Modelling hierarchical data in RavenDb

In RavenDb, I have to store hierarchical data and I need to query it recursively. The performance is the biggest concern here.
What I have is similar to the following one:
public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Category Parent { get; set; }
}
In this case, if I store the parent category inside the document itself, it will be hard for me to manage the data, as I will be duplicating the categories all over the place.
So, to make that easy, I can store this as below:
public class Category
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public string Name { get; set; }
}
But in that case I'm not sure what the performance will be like, as I will have millions of records and I need to build the category tree from this reference.
Is there an established way in RavenDb to model this type of data when performance is the biggest concern?
Hierarchies are usually best modeled in one document that defines the hierarchy. In your situation that would be to define the categories tree, where the categories themselves can be represented by standalone documents (and thus hold Name, Description etc., and allow other collections to reference them), or not.
Modeled from code a Category document would look something like this:
public class Category
{
    public string Id { get; set; }
    public string Name { get; set; }
    // other meta-data that you want to store per category, like image etc.
}
And the hierarchy tree document can be serialized from a class like the following, where this class can have methods for making nodes in it easily accessible:
public class CategoriesHierarchyTree
{
    public class Node
    {
        public string CategoryId { get; set; }
        public List<Node> Children { get; set; }
    }

    public List<Node> RootCategories { get; private set; }

    // various methods for looking up and updating the tree structure
}
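As an illustration of the "methods for making nodes easily accessible" mentioned above, helpers like these could live on CategoriesHierarchyTree; the method names are my assumptions, not RavenDB API, and everything runs on the in-memory tree.

public Node FindNode(string categoryId)
{
    return Find(RootCategories, categoryId);
}

private static Node Find(List<Node> nodes, string categoryId)
{
    if (nodes == null) return null;
    foreach (var node in nodes)
    {
        if (node.CategoryId == categoryId) return node;
        var match = Find(node.Children, categoryId);
        if (match != null) return match;
    }
    return null;
}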
This approach of hierarchy-tree has several important advantages:
One transactional scope - when the tree changes, the tree changes in one transaction, always. You cannot be affected by multiple concurrent changes to the tree, since you can leverage optimistic concurrency when editing this one document. With the approach you propose it is impossible to guarantee that, and therefore harder to guarantee the completeness and correctness of the hierarchy tree over time. If you think of a hierarchy as a tree, it actually makes a lot of sense to have each change lock the entire tree until it completes. The hierarchy tree is one entity.
Caching - the entire hierarchy can be quickly and efficiently cached, even using aggressive caching which will minimize the times the server is accessed with queries on hierarchy.
All operations are done entirely in-memory - since it's one document, i.e. one object, all queries on the hierarchy (who is the parent of a node, list of children, etc.) are made entirely in-memory and effectively cost close to nothing to perform. Using an index with Recurse() to answer such queries is orders of magnitude costlier (network and computational costs). You mention performance is the biggest concern - so this is a winner.
Multiple parents per category, no denormalization - if a category document is saved outside the hierarchy tree, as demonstrated above, you can effectively put a category under multiple parents without the need to denormalize. All category data is in one place, in a document outside of the tree, and the tree only holds a reference to the category.
I highly recommend going with this approach. It is a bit of a shift from the relational mindset, but it's well worth it, even when the tree grows big.
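To tie the transactional-scope point to code, here is a rough sketch of loading and saving the tree document with optimistic concurrency enabled; the document id and the surrounding setup are assumptions on my part.

// Assumes an existing IDocumentStore and that the whole tree is stored
// under an id such as "categories/hierarchy".
using (var session = documentStore.OpenSession())
{
    // Concurrent edits of the tree will now fail with a concurrency exception
    // instead of silently overwriting each other.
    session.Advanced.UseOptimisticConcurrency = true;

    var tree = session.Load<CategoriesHierarchyTree>("categories/hierarchy");
    // ... look up / move nodes entirely in memory ...
    session.SaveChanges(); // the whole tree changes in one transactional scope
}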

How to design domain with entity referencing entity on another sql server with NHibernate persistance

I need to design domain that has two simple entities:
public class User
{
    public virtual int Id { get; protected set; }
    public virtual string Email { get; protected set; }
    public virtual Country Country { get; protected set; }
    ...
}

public class Country
{
    public virtual int Id { get; protected set; }
    public virtual string Name { get; protected set; }
    ...
}
It's all nice and clear in the domain world, but the problem is that User and Country are persisted in two different databases on two different servers (though they are both MSSQL 2005 servers).
So, how should I correctly implement persistence of entities across different SQL servers in NHibernate?
Using IDs instead of objects in references? Yeah, that's simple, but it hits hard on the whole domain model, making the domain object more like a DTO. And it will require that IUserRepository gets its hands on ICountryRepository to load the User entity.
Linked servers? Hm... Somehow I don't like it (distributed transactions and no XML columns). What should I be aware of when using them, and more importantly, how should I configure NHibernate to work effectively with linked servers?
Maybe some other solution?
I've heard of people using the schema property in a class mapping to contain the linked server name (like otherserver.dbo), but I don't know anyone who hasn't run into one problem or another when doing that.
There are a few DDD bootstrapping frameworks that allow you to transparently map entities to different databases (resulting in multiple ISessionFactories, which it will manage for you). NCommon is one I would recommend. This assumes, however, that Country only exists in one database, and User only exists in another.
As for transactions... well, if you use a TransactionScope and configure DTS, that might work. NCommon uses a UnitOfWork API that also wraps TransactionScope.
You would have to change User so that Country is just an ID. Here's why: you'd end up with two session factories, one that has a mapping for Country and the other that has a mapping for User. If you don't make that change, NHibernate will complain that there is no mapping for Country when you save User (since they are stored in two different DBs).
Now you could instruct NHibernate to ignore the Country property and keep Country so your domain doesn't change. However, when you load User from the database next time, Country will be null.
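A hedged sketch of what the "Country as an ID" shape could look like, with the Country resolved through a repository backed by the second session factory; the interfaces and names are illustrative, not from the original answer.

// Illustrative only: User holds just the CountryId, and Country is resolved
// through a repository using the session factory for the other server.
public class User
{
    public virtual int Id { get; protected set; }
    public virtual string Email { get; protected set; }
    public virtual int CountryId { get; protected set; } // mapped in the Users database
}

public interface ICountryRepository
{
    Country GetById(int id); // uses the session factory pointing at the Countries database
}

public class UserService
{
    private readonly ICountryRepository countries;

    public UserService(ICountryRepository countries)
    {
        this.countries = countries;
    }

    // Stitches the Country back onto the user after it was loaded from the other server.
    public Country GetCountryFor(User user)
    {
        return countries.GetById(user.CountryId);
    }
}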
You could use NHibernate.Shards from NHContrib.

Trouble updating one to many relationships when using ria services with nhibernate

I am working on a silverlight application and I am using RIA data services and nHibernate.
Currently, I have an entity with a one to many relationship to another entity.
public class Employer
{
    [Key]
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

public class Person
{
    [Key]
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    [Include]
    [Association("PersonCurrentEmployer", "CurrentEmployerId", "Id", IsForeignKey = true)]
    public virtual Employer CurrentEmployer { get; set; }

    public virtual int? CurrentEmployerId { get; set; }
}
The property CurrentEmployerId is set for no insert and no update in the mappings.
On the Silverlight side, I set the CurrentEmployer property of the person to an existing employer on the client side and submit the changes.
personEntity.CurrentEmployer = megaEmployer;
dataContext.SubmitChanges();
On the server side, the person entity's CurrentEmployerId is set to megaEmployer.Id but the CurrentEmployer is null. Because I am using the CurrentEmployer property and not the CurrentEmployerId to save the relationship, the relationship isn't changed.
Is there a way to force RIA to send the CurrentEmployer object with the save or do I have to use the CurrentEmployerId on the server side to load the employer and set it to the CurrentEmployer?
The reason you're not seeing your CurrentEmployer on the client side is that you don't have your association set up correctly.
RIA Services doesn't work with references in the usual way, so referencing your Employer on the client side doesn't work. RIA Services works with entity sets and creates the "references" based on the association attributes. Your Employer needs a property with an association back to the Person, as follows:
public class Employer
{
    private Person person;

    [Key]
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual int PersonID { get; set; }

    [Include]
    [Association("PersonCurrentEmployer", "PersonID", "Id", IsForeignKey = false)]
    public virtual Person Person
    {
        get
        {
            return this.person;
        }
        set
        {
            this.person = value;
            if (value != null)
            {
                this.PersonID = value.Id;
            }
        }
    }
}
Is there a way to force RIA to send the CurrentEmployer object with the save or do I have to use the CurrentEmployerId on the server side to load the employer and set it to the CurrentEmployer?
I'm running into this problem as well. Basically, you either have to use the [Composition] attribute (which I wouldn't recommend), or load the entity from the database, server-side. Composition muddies up the client data model and doesn't take care of all the cases you need to worry about. (There is a lot more on Composition in the RIA forums at forums.silverlight.net.)
[UPDATE] Once you implement a 2nd level cache, the worry about reading supporting entities from the database mostly goes away, as they will be loaded from cache. Also, if you only need a proxy so that NHibernate doesn't complain, then look into Get/Load (can never remember which), which will return an NH proxy and result in a single-column-and-entity select from the database. (If you try to access another property of the proxy, NH will select the rest. You can find more on this on Ayende's blog.) [/UPDATE]
The biggest problem I'm having is getting NHib to actually save and load the relationship. (I'm also using Fluent.) The response from the responsible parties has so far been "waah, you can't do that; it looks like RIA wasn't developed with NHib in mind", which is a crap answer, IMHO. Instead of helping me figure out how to map it, they're telling me I'm doing it wrong for having a foreign key in my entity (NHib shouldn't care that I have my FK in my entity)...
I want to share what I did to make this work, because 'official' support for this scenario was ... let's just say unhelpful at best, and downright rude at worst.
Incidentally, you had the same idea I had: making the foreign key not insert/update. BUT I've also made it Generated.Always(), so it will always read the value back.
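For reference, roughly what that mapping could look like in Fluent NHibernate; the column name and the pairing of References with a read-only generated property are my assumptions about the setup, not the author's actual code.

using FluentNHibernate.Mapping;

public class PersonMap : ClassMap<Person>
{
    public PersonMap()
    {
        Id(x => x.Id);
        Map(x => x.Name);

        // The association writes the column...
        References(x => x.CurrentEmployer, "CurrentEmployerId");

        // ...while the FK property itself is never written by NHibernate and is
        // read back ("generated") after every insert/update.
        Map(x => x.CurrentEmployerId, "CurrentEmployerId")
            .Not.Insert()
            .Not.Update()
            .Generated.Always();
    }
}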
Additionally, I override DomainService.Submit() and DomainService.ExecuteChangeSet(). I start an NHibernate transaction in Submit (though I'm not yet sure this does what I expect it to do).
Instead of putting my save logic in the InsertSomeEntity() or UpdateSomeEntity() methods, I'm doing it all inside ExecuteChangeSet. This is because of NHibernate and its need to have the entity graph fully bi-directional and hydrated before performing actions in NHibernate. This includes loading entities from the database or session when a child item comes across the wire from RIA Services. (I started down the path of writing methods to fetch the various other pieces of the graph as the specialized methods needed them, but I found it easier to do it all in a single method. Moreover, I was running into the problem of RIA wanting me to perform the inserts/updates against the child objects first, which for new items is a problem.)
I want to make a comment about the composition attribute. I still stand by my previous comment about not recommending it for standard child entity collections, HOWEVER, it works GREAT for supporting NHibernate Components, because otherwise RIA will never send back the parent instance (of the composition), which is required for NHibernate to work right.
I didn't provide any code here because I would have to do some heavy redacting, but it's not a problem for me to do if you would like to see it.
