Databases for easy comparison - database

We have an application which has metadata information stored in database (some tables with relations between). The metadata can be edited through web app or directly manipulating values in SQL Server database.
The problem: metadata changes and needs to be merged between different environments (test, staging, production, etc.). There are tools (e.g. RedGate) that help but it is still quite a lot of work to compare databases if autogenerated ID's are being used (as it is now in our DB, and yes, one way is to use natural keys to make comparison easier).
However, our metadata may be stored not necessarily in SQL database - it could be stored as documents in NOSQL databases (MongoDB, CouchDB, RavenDB) or even simple XML databases (maybe Berkeley DB XML?). Storing as XML file seems would work (as it easier to compare and merge files rather than databases) but may not be a good option as there needs to some concurrency mechanisms, some degree of transaction support.
We do not need replication to other servers, there is no need for high availability, etc.
The requirements to store data:
some kind of ACID
Should run on Windows
Easy comparison (bi-directional sync)
(optional) GUI to see what is in database
(optional) export to file (JSON, XML)
What are the options?

Why conflate the storage with the representation you are performing the diff on?
I'd keep everything in SQL, but when it came time to compare, select all the important data (not the ids) into a XML format, and use an XML differencing tool (or a csv format, and use a plain text comparer).

I have never used it but CouchDB has built-in support for birectional syncing between db's.

Related

I have text data, and want to get it into AWS

I have what is essentially a traditional relational database, consisting of four tables, all related with IDs. Currently this database resides in four tab-delimited text files, in an S3 bucket. Very little, if any, data will ever be added to these tables. It is an unchanging reference database. So it will be exclusively read from, never added to or edited.
I would like to access this database in an Alexa skill. I've built a few skills already, using NodeJS, so I know how that all works. But I'm anxious to learn how to link up a skill with a back-end DB. This skill will need to do SQL SELECT statements against this DB, based-on user-provided parameters, and based on the query filter be able to pull a set of records into an array that can be used by my skill's lambda function.
Each of the current text files holds one of four tables. The largest table is about 35k rows. Whole DB is maybe 5 Mb, 90% of which is one of the four. Like I said, they are all connected with ID columns like a traditional RDBMS. This will not be for commercial purposes. Probably.
I am already familiar with SQL Server, it's the DB I know, and I'm comfortable with SQL Server Express and can whip something up there, but I'm open to learning NoSQL or some other method if it's more appropriate for this use case. And as this is mostly a learning exercise, if something is "just as good", it's good for me to know.
What is my best DB solution?
* NoSQL such as DynamoDB?
* Some sort of MySQL?
* SQL Server?
* Leave them as tab-delimited text and use them from the Lambda function directly?
Thanks, I don't want to start down the wrong road here.
A few options...
S3 Select
S3 Select (in Preview at the time of writing this) "enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases – in many cases you can get as much as a 400% improvement."
DynamoDB
The benefit of using DynamoDB is that there is no need to run a database server -- it is a fully-managed service. While it doesn't support SQL syntax, it is very fast and can suit many use-cases.
In fact, most projects should consider using a NoSQL database like DynamoDB for every situation, unless there is a particular reason to use SQL (such as business reporting).
Cost is based upon storage and provisioned capacity (which can scale-up and down based on demand).
SQL Database
Yes, you can certainly run an SQL database, either through Amazon RDS (Relational Database Service) or on your own EC2 instance (eg MySQL or even Apache Derby. However, you are then paying for the server even when it isn't being used.
Using Microsoft SQL Server is probably too much for your use-case (and more expensive than using an open-source product).
I wonder if you could incorporate SQLite in your app, which would provide SQL capabilities without much overhead?
Do it in memory
5 MB is, quite frankly, not much data. You could simply load all the data into memory and do your manipulations from there. While the load might consume a few cycles, data access will be very quick after that.

Sync, not import, CSV files with SQLite

In MySQL I have observed that under table options, the storage engine can be configured as a CSV file. This gives me a nice clean csv file in the database directory. This file doesn't come with column names but this isn't a deal breaker.
Can SQLite be configured to point to or sync with a csv file? I am not just trying to Import data from a csv file into the SQLite database.
If this is not possible in SQLite, I welcome alternative suggestions.
Big picture: This is a portable database that deals with tables of radically different sizes. Some may have 10 lines. Other a few hundred, others a few ten-thousand. Because of the nature of the data, most tables are manually maintained and this is best suited by spreadsheet like interfaces. Some are the result of an automated process, but in general, they still require some manual characterization in a few columns to Join (via SQL) to the other tables.
Previously I performed all this in Microsoft Access, but because I now need a cross platform open-source approach, I am exploring alternatives. I have had reasonable productivity with MySQL, but I would something a little smaller, simpler, and more portable for my users.

MS Access database with no relations

Can anyone recommend a tool or suggest the approach when dealing with MS Access database with no relationships between tables?
As part of data migration project I am creating data mapping definition rules but it becomes more and more difficult and time consuming to correctly identify source tables/fields for extraction.
I have many tables with the same data appearing in different places. Furthermore, as there were no validation rules when data was input, many entries contain spelling errors or generally do not match expected data type. Most of the tables however already have the keys (primary & foreign) created.
I am looking for a quick solution to rebuild the database (*.mdb), ideally with a use of some software which could identify all potential data issues, suggest corrections, allow for adjustments and finally left off with fully relational database where the data can easily be identified and not scattered all over the place.
I have some general knowledge of databases and SQL but didn't use Access much before so I'm trying to save myself some of the time. And - if it matters - I don't care about database performance at all... Only the data itself. I will be extracting it to *.csv files later anyway...
Comments, suggestions and/or other considerations will be appreciated.
Thanks in advance
J.
I don't believe there is any software that will analyze an Access database and use some kind of artificial intelligence to generate a new database with good data and strong relationships.
My recommendation though is to export all the data into SQL Server (or even MySQL) and then work with it there. It's much easier to manipulate the data with a real query language instead of trying to scrub data in Access.
You can do mass updates, comparisons, joins, etc. with SQL Server. You can query the schema easily (write queries to see if a field appears in a table), change schemas/table definitions with code, etc.
Then once you're done you can use jobs (SSIS) to export the data to CSV.
(You can download SQL Express if you don't have/can't afford SQL Server.)

What nosql database is ideal to use for storing code/snippets?

I want to store code similar to how jsfiddle stores code. I currently use Postgres for my main database but I'm wondering if it's more ideal to be using a NoSQL database?
Code snippets for now will have just one author, but in the future there may be multiple authors and I want the ability for reverting as well.
I know there are key/value databases and document-oriented databases. Which specific noSQL db would suite my needs? Or should I still stick with my Postgres db?
FYI:
I'm using django
The users will be permanently stored in postgres ( I'm using openID )
You can't choose a non-relational data strategy without defining what you want to do with your data.
Relational database design comes from rules of normalization, which you can apply once you know your data alone. But non-relational database design depends on your queries more than your data.
But without knowing anything about your application, my first recommendation would be to stick with PostgreSQL. Store your code snippets in text blobs, and meta-data about the code (authorship, date, language, project, etc.) in additional columns alongside the text blob. Also you can consider using GIST indexes to allow for flexible searching.
You might also consider Apache Solr, which is technically similar to a document-oriented DBMS, though it is usually presented as a fulltext search engine.
As for NoSQL databases, the only ones I'm familiar with are XML (doesn't scale well and has bad concurrency), and local databases such as Paradox, dBase, FoxProx and Access. I would not recommend any of these.
I think that the idea that it's a NoSQL database should be a smaller factor in your decision. Consider these things instead.
Redundancy. Can you run it on two servers at the same time or does it support failover? (SQL Server, Interbase, Firebird)
Concurrency. Will you host this app on the web? How will it handle 10 concurrent operations? (PostGres, MySql, Interbase, Firebird)
Speed. How long is acceptable for a lookup or post?
Embeddability. Is this a desktop application? An embedded database can make things easier. (Local databases such as Paradox, dBase, FoxPro, Access, Interbase, Firebird or SQLite)
Portability. Desktop apps may run on Mac, Linux, Windows. (SQLite)
Sounds like a relatively uncomplicated application which could be implemented in a traditional relational database or a NoSQL without too many problems.
However if you're keeping the userbase info in PostgreSQL, it would seem simplest to just stick with that as a single storage method. Using both an SQL database and a NoSQL adds complexity, makes joining across the datasets hard (so eg. you couldn't make a query to do something like ‘list users along with their most recent document’), and makes it impossible to ensure consistency between the two datasets.
What do you get for this trouble? You want versioning. CouchDB will give you revision control, but it's questionable whether you should be using that for UI-level versioning (eg because compacting the database will lose your old versions).

What advantages does a Document-based database have over a Relational database?

For example: Microsoft SQL Server vs. CouchDB.
The main benefit for me with CouchDB is that you can access it from pretty much anywhere! What advantages does a document based database have over a relational one?
Where would a document based db be a better choice over a relational?
I wouldn't say "accessing it from anywhere" is an advantage of CouchDB over SQL Server. Both are fully accessible from a variety of clients.
The key differentiating factor is the fundamental concept of how data is persisted as tables & columns (SQL Server) versus documents (CouchDB). In addition, CouchDB is designed to leverage multiple copies with replication/map-reduce in a highly forgiving fashion. SQL Server can do the same level of fault tolerance but true map-reduce is non-existant in it (it's ability to deal with sets mimics the capabilities fundamentally however - see GROUPING SETS keyword).
You should note this post which really shows that map reduce has its place, but you need to pick the right tool for the job:
http://gigaom.com/2009/04/14/mapreduce-vs-sql-its-not-one-or-the-other/

Resources