Creating a dynamic database structure with CouchDB?

I'm new to CouchDB and want to give it a try, but before I do I want to know whether I can create a dynamic database structure in CouchDB.
E.g.:
The user starts with a blank thread and chooses whatever structure he/she wants (e.g. title, body and tags) and fills it in.
When he clicks "save thread", the database structure for it is created, nested if necessary.
Then the user can fetch the thread from the database and read it.
Questions:
Is this dynamic creation of database structure possible?
I also read that you have to predefine views that will be used to get the documents back out. But how can you predefine views for data that doesn't exist yet, when you have no idea what data and structure the user is going to create?

Yes. A CouchDB document appears from the outside as a JSON object into which you can put whatever you want, except for a few reserved field names (those starting with an underscore, such as _id and _rev) used for handling document IDs and revisions.
These "predefined" views are themselves just documents, so you can modify them dynamically.
If what you need is closer to full-text search, there are also ways to integrate Solr with CouchDB, which gives you a more dynamic approach to queries.
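To make that concrete, here is a minimal sketch against CouchDB's HTTP API, using Python and requests. The server address, credentials, and the "threads" database, document and view names are illustrative assumptions: a schema-free document is saved with whatever fields the user chose, and a view is added afterwards as an ordinary design document.

```python
import requests

COUCH = "http://admin:secret@localhost:5984"  # assumed local CouchDB and credentials

# Create a database for threads; no structure is declared up front.
requests.put(f"{COUCH}/threads")

# Save a thread with whatever fields the user chose; only underscore-prefixed
# fields such as _id and _rev are reserved.
requests.put(f"{COUCH}/threads/thread-1", json={
    "title": "My first thread",
    "body": "Hello CouchDB",
    "tags": ["couchdb", "dynamic"],
})

# Views live in design documents, which are themselves documents,
# so they can be created or updated after the data already exists.
requests.put(f"{COUCH}/threads/_design/threads", json={
    "views": {
        "by_tag": {
            "map": "function (doc) { if (doc.tags) doc.tags.forEach(function (t) { emit(t, doc.title); }); }"
        }
    }
})

# Query the view for everything tagged "couchdb".
print(requests.get(f"{COUCH}/threads/_design/threads/_view/by_tag",
                   params={"key": '"couchdb"'}).json())
```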

Related

How to handle a single data point table in your database

This is sort of a random question, but I am building a backend in Express and MongoDB and I need to store data for a settings page. This would contain assorted one-off global settings that an admin user would input. However, they need to be saved in the DB so they are constant across all users.
Right now I have a single collection/table with just one record stored, and I just update that specific record whenever the settings are updated.
It just feels a little goofy to create a full schema and collection for one piece of data, but I can't think of any other way to do it. Is this the normal way of doing this?
Thanks
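The single-record pattern the question describes usually boils down to an upsert against one well-known _id, so the first save creates the document and later saves update it in place. A minimal sketch of that pattern (shown here with pymongo rather than the Node driver the question mentions; the database, collection, and field names are assumptions):

```python
from pymongo import MongoClient

# Assumed local MongoDB and illustrative names ("app", "settings", "global").
db = MongoClient("mongodb://localhost:27017")["app"]

def save_settings(changes: dict) -> None:
    # Keep exactly one settings document and upsert into it.
    db.settings.update_one({"_id": "global"}, {"$set": changes}, upsert=True)

def load_settings() -> dict:
    return db.settings.find_one({"_id": "global"}) or {}

save_settings({"maintenance_mode": False, "items_per_page": 25})
print(load_settings())
```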

Best practice for handling large sets of documents in MongoDB. Referencing VS Embedding

I am working on a webapp at the moment where I intend to store data that looks something like this:
Organizations
Users within such organization
Documents under each user
My question is whether it would be better to store this data across multiple databases and reference each piece of data where necessary, or to have one database that stores organizations, where each organization document holds an array of documents for each user, and each user document holds an array of documents. For the first option, I would have a database for organizations, users, and then user data. In the organization document, for example, I would store/reference users as document IDs as opposed to their actual documents.
I hope this makes sense, but yeah, my question is: which would be the better practice? To spread the data across multiple DBs and reference it, or to have each piece of data directly store the data it needs to reference?
Thank you!
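For illustration only, here is a rough sketch of the two shapes the question compares, using pymongo with assumed connection details and collection names; it is not a recommendation, just the two layouts side by side.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]  # assumed connection and names

# Option 1: embedding - one collection, each organization holds its users,
# and each user holds their documents. Simple reads, but the organization
# document grows without bound and is rewritten on every nested change.
db.organizations_embedded.insert_one({
    "name": "Acme",
    "users": [
        {"name": "Alice", "documents": [{"title": "CV", "body": "..."}]},
    ],
})

# Option 2: referencing - separate collections that point at each other by _id,
# closer to the "multiple databases" layout described in the question.
org_id = db.organizations.insert_one({"name": "Acme"}).inserted_id
user_id = db.users.insert_one({"name": "Alice", "org_id": org_id}).inserted_id
db.documents.insert_one({"title": "CV", "body": "...", "user_id": user_id})

# Fetching one user's documents under option 2 is one extra query:
docs = list(db.documents.find({"user_id": user_id}))
```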

What is a good web application SQL Server data mart implementation in ElasticSearch?

Coming from a RDBMS background and trying to wrap my head around ElasticSearch data storage patterns...
Currently in SQL Server we have a star schema data mart, RecordData. Rows are organized by user ID and the geographic location that pertains to the rest of the searchable record, plus title and description (which are free-text search fields).
I would like to move this over to ElasticSearch, and have read about creating a separate index per user. If I understand this correctly, with this suggestion, I would be creating a RecordData type in each user index, correct? What is a recommended naming convention for user indices that will be simple for Kibana analysis?
One issue I have with this recommendation is, how would you organize multiple web applications on the ES server? You wouldn't want to have all those user indices all over the place?
Is it so bad to have one index per application, and type per SQL Server table?
Since in SQL Server we have other tables for user configuration, keyed by user ID, I take it that I could then create new ES types in the user indices for configuration. Is this a recommended pattern? I would rather not have two database systems for this web application.
Suggestions welcome, thank you.
I went through the same thing, and there are a few things to take into account.
Data Modeling
You say you use a star schema today. Elasticsearch is typically appropriate for denormalized data, where the totality of the information resides in each document, unlike with a star schema. If you can live with denormalized data, that is fine, but I assume that since you already have a star schema, denormalization is not an option, because you don't want to go and update millions of documents each time a location name changes, for example (if I understand the use case). At least in my case that wasn't an option.
What are Elasticsearch options for normalized data?
This leads us to think about how to put star-schema-like data into a system like Elasticsearch. There are a few options in the documentation; the main ones I focused on were:
Nested Objects - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html . With nested objects the entire record is kept in a single document, meaning one location and its related users would live in a single document. That may not be optimal, because the document can get huge and, again, a change to the location name requires updating the entire document. So this is better, but still not optimal (a minimal mapping sketch follows after these two options).
Parent - Child Relationship - more details at https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html . In this case the location and the user records would be kept as separate documents, similar to tables in a relational database. This seems to be the right modeling for what we need. The only major issue with this option is that Kibana 4 does not provide ways to manipulate/aggregate documents based on a parent/child relationship as of this writing. So if your main driver for using Elasticsearch is Kibana (it was mine), that pretty much eliminates the option. If you want to benefit from Elasticsearch's speed as an engine, this seems to be the desired option for your use case.
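As a rough illustration of the nested option, here is a minimal index mapping and query sketch (Python with requests against a local cluster). It assumes a recent Elasticsearch where mapping types are gone, so it will not match the older guide pages linked above exactly, and the index and field names are made up.

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Nested option: the whole record lives in one document, but "users" is a
# nested field, so each user object is indexed separately and can be queried
# on its own fields without cross-matching between users.
mapping = {
    "mappings": {
        "properties": {
            "location": {"type": "keyword"},
            "users": {
                "type": "nested",
                "properties": {
                    "user_id": {"type": "keyword"},
                    "title": {"type": "text"},
                    "description": {"type": "text"},
                },
            },
        }
    }
}
requests.put(f"{ES}/recorddata", json=mapping)

# Querying inside the nested objects requires a nested query.
query = {
    "query": {
        "nested": {
            "path": "users",
            "query": {"match": {"users.title": "engineer"}},
        }
    }
}
print(requests.post(f"{ES}/recorddata/_search", json=query).json())
```

With this layout, a change to the location name still means reindexing every document that embeds it, which is the trade-off described above.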
In my opinion, once you get the data modeling right, all of your questions will be easier to answer.
Regarding the organization of the servers themselves, the way we organize that is by having a separate cluster of three Elasticsearch nodes behind a load balancer (all of it hosted in the cloud), and then having all the web applications connect to that cluster using the Elasticsearch API.
Hope that helps.

Designing a generic unstructured data store

The project I have been given is to store and retrieve unstructured data from a third-party. This could be HR information – User, Pictures, CV, Voice mail etc or factory related stuff – Work items, parts lists, time sheets etc. Basically almost any type of data.
Some of these items may be linked, so a User may have a picture, for example. I don't need to examine the content of the data, as my storage solution will receive the data as XML and send it out as XML. It's down to the recipient to convert the XML back into a picture or sound file etc. The recipient may request all Users, so I need to be able to find User records and their related "child" items such as pictures, or the recipient may just want pictures.
My database is MS SQL and I have to stick with that. My question is: are there any patterns or existing solutions for handling unstructured data in this way?
I've done a bit of Googling and found some sites that talk about this kind of problem, but they are more interested in drilling into the data to allow searches on its content. I don't need to know the content, just what type it is (picture, User, job sheet etc.).
To those who have given their comments:
The problem I face is the linking of objects together. A User object may be added to the data store, then at a later date the user's picture may be added. When the User is requested, I will need to return both the User object and its associated Picture. The user may update their picture, so you can see I need to keep relationships between objects. That is what I was trying to get across in the second paragraph. The problem is that my solution must be very generic, as I should be able to store anything and link objects according to the end users' requirements, e.g. Users, Pictures and emails, or Work items, Parts lists etc. I see that Microsoft has developed Zentity, which looks like it may be useful, but I don't need to drill into the data contents, so it's probably overkill for what I need.
I have been using Microsoft Zentity since version 1, and whilst it is excellent at storing huge amounts of structured data and allowing (relatively) simple access to that data, if your data structure is likely to change then recreating the 'data model' (and the regression testing) would probably remove the benefits of using such a system.
Another point worth noting is that Zentity requires FILESTREAM storage, so you would need to have the correct version of SQL Server installed (2008, I think) with FILESTREAM storage enabled.
Since you deal with XML, it isn't really unstructured data. Microsoft SQL Server 2005 or later has an XML column type that you can use.
Now, if you don't need to access XML nodes and you think you never will, go with plain varbinary(max). For your information, storing XML content in an XML-typed column lets you not only retrieve XML nodes directly through database queries, but also validate XML data against schemas, which may be useful to ensure that the content you store is valid.
Don't forget to use FILESTREAM (SQL Server 2008 or later) if your XML data grows in size (2 MB+). This is probably your case, since voice mail or pictures can easily be larger than 2 MB, especially when they are Base64-encoded inside an XML file.
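As a rough sketch of the XML-column approach combined with the linking requirement from the clarification above, something like the following could work (Python with pyodbc; the connection string, table, and column names are purely illustrative assumptions, and it assumes SQL Server 2005 or later for the XML type):

```python
import pyodbc

# Connection string is an assumption; adjust driver/server/database as needed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=DataStore;Trusted_Connection=yes;"
)
cur = conn.cursor()

# One generic table: every object records its type name, its XML payload,
# and an optional link to a parent object (User -> Picture, and so on).
cur.execute("""
IF OBJECT_ID('dbo.StoredObject') IS NULL
CREATE TABLE dbo.StoredObject (
    Id         INT IDENTITY PRIMARY KEY,
    ObjectType NVARCHAR(50) NOT NULL,      -- 'User', 'Picture', 'WorkItem', ...
    ParentId   INT NULL REFERENCES dbo.StoredObject(Id),
    Content    XML NOT NULL
)""")
conn.commit()

# Store a User, then a Picture linked to that User.
cur.execute(
    "INSERT INTO dbo.StoredObject (ObjectType, Content) OUTPUT INSERTED.Id VALUES (?, ?)",
    "User", "<User><Name>Alice</Name></User>")
user_id = cur.fetchone()[0]
cur.execute(
    "INSERT INTO dbo.StoredObject (ObjectType, ParentId, Content) VALUES (?, ?, ?)",
    "Picture", user_id, "<Picture href='alice.jpg'/>")
conn.commit()

# "Give me this User and its child items" is then a simple lookup on ParentId.
cur.execute("SELECT Id, ObjectType, Content FROM dbo.StoredObject "
            "WHERE Id = ? OR ParentId = ?", user_id, user_id)
print(cur.fetchall())
```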
Since your data is quite free-form and changeable, your best bet is to put it on a plain old file system, not a relational database. By all means store some meta-information in SQL where it makes sense to search through structured data relationships, but if your main data content is not structured with data relationships then you're doing yourself a disservice by using an SQL database.
The filesystem is blindingly fast at looking up files and streaming them, especially if this is an intranet application. All you need to do is share a folder and apply sensible file permissions, and a large chunk of unnecessary development disappears. If you need to deliver this over the web, consider using WebDAV with IIS.
A reasonably clever file and directory naming convention, with a small piece of software you write to help people get to the right path, will hands down beat any SQL database for both access speed and sequential data streaming. Filesystem paths and file names will always beat any clever SQL index for data-location speed. And plain old files are the ultimate unstructured, flexible data store.
Use SQL for what it's good for. Use files for what they are good for. Best tools for the job and all that...
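As a sketch of the "naming convention plus a small piece of software" idea, here is a minimal Python example; the share path and the <type>/<owner>/<id>.xml layout are hypothetical assumptions, not part of the answer above.

```python
from pathlib import Path

# Assumed layout: <root>/<object-type>/<owner-id>/<object-id>.xml
ROOT = Path(r"\\fileserver\datastore")  # hypothetical intranet share

def object_path(object_type: str, owner_id: str, object_id: str) -> Path:
    return ROOT / object_type.lower() / owner_id / f"{object_id}.xml"

def save(object_type: str, owner_id: str, object_id: str, xml_payload: bytes) -> Path:
    path = object_path(object_type, owner_id, object_id)
    path.parent.mkdir(parents=True, exist_ok=True)  # directories double as the "index"
    path.write_bytes(xml_payload)
    return path

def load_all(object_type: str, owner_id: str) -> list[bytes]:
    # "Give me all of this user's pictures" is just a directory listing.
    return [p.read_bytes() for p in (ROOT / object_type.lower() / owner_id).glob("*.xml")]
```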
You don't really need any pattern for this implementation. Store all your data in a BLOB entry. Read from it when required and then send it out again.
You would probably need to investigate other infrastructure aspects, like periodically cleaning up the DB to remove expired entries.
Maybe I'm not understanding the problem clearly.
So am I right in saying that all you need to store is a blob of XML with whatever binary information is contained within? Why can't you have a Users table and then a linked (foreign key) table with user objects in it, linked by userId?

Copying entities between multiple databases with NHibernate

I have a desktop (winforms) application that uses a Firebird database as a data store (in embedded mode) and I use NHibernate for ORM. One of the functions we need to support is to be able to import / export groups of data to/from an external file. Currently, this external file is also a database with the same schema as the main database.
I've already got NHibernate set up to look at multiple databases and I can work with two databases at the same time. The problem, however, is copying data between the two databases. I have two copy strategies: (1) copy with all the same IDs for objects [aka import/export] and (2) copy with mostly new IDs [aka duplicate / copy]. I say "mostly new" because there are some lookup items that will always be copied with the same ID.
Copying everything with new IDs is fine, because I'll just have a "CopyForExport" method that can create copies of everything and not assign new IDs (or wipe out all the IDs in the object tree).
What is the "best practices" way to handle this situation and to copy data between databases while keeping the same IDs?
Clarification: I'm not trying to synchronize two databases, just exporting a subset (user-selectable) of data for transfer to someone else (who will then import that subset into their own database).
Further Clarification: I think I've isolated the problem down to this:
I want to use the ISession.SaveOrUpdate feature of NHibernate, so I set up my entities with an identity generator that isn't "assigned". However, I have a problem when I want to override the generated identity (for copying data between multiple databases in the same process).
Is there a way to use a Guid.Comb or UUID generator, but still be able to sometimes specify my own identifier (for transferring to a different database connection with the same schema)?
I found the answer to my own question:
The key is the ISession.Replicate method. This allows you to copy object graphs between data stores and keep the same identifier. To create new identifiers, I think I can use ISession.Merge, but I still have to verify this.
There are a few caveats, though: my test class has a reference to a parent object (many-to-one relationship), and I had to make the class non-lazy-loading to get Replicate to work properly. If I didn't have it set to eager-load (i.e. non-lazy), it would only replicate the object and not the parent object, even with cascade="all" in my hbm.xml file.
The Java Hibernate docs have a reference to replicate() (section 10.9), but the NHibernate documentation doesn't.
This makes sense for the Replicate behavior, because we want fully hydrated entities before transferring them to another data store. What's weird, though, is that even with both sessions open (one to each data store), it didn't hydrate the object when I wanted to replicate it.
You can use FBCopy for this. Just define which tables and columns you want copied and it will do the job. You can also add an optional WHERE clause for each table, so it only copies the rows you want.
While copying, it makes sure the order in which data is exported is maintained, so that foreign keys do not break. It also supports generators.
