Choosing a column DB

I am trying to evaluate a column database for my application.
I have evaluated InfiniDB and Infobright. Can you suggest some other column databases?

Your question might suggest that column-store databases are NoSQL databases.
They are two different kinds of database (ways of representing data) and should not be confused.
I personally use LucidDB, which works like a charm. I use it for Business Intelligence, which is what LucidDB is optimized for. One of the most active members of its community is Nicholas Goodman, someone really influential in the world of open-source BI.
For this use case it is really well thought out, and I would recommend it.

There's a good list on Wikipedia.

How to present a database design?

I am doing a project at university and it includes a MySQL database. I have a design for the database in terms of a list of tables and their respective fields.
In what form should I present this design? Just the list of tables and content? In an ERD? How do you present your designs?
To clarify: whatever you answer, I expect not only a description of how you present your design, but also which tools you use to create the diagrams/lists/tables etc.
ERD is the only way to go. As they say, a picture is worth a thousand words.
But don't try to put the whole database on one diagram. It will, in all but the most trivial cases, be overwhelming to your audience to try to digest the entire database design in one go. Instead, break the diagrams into subject areas depicting only the most relevant tables in each diagram. For example, a point-of-sale system might have separate diagrams for Inventory, Sales, Accounting, Customer Management, Security, Auditing, and Reporting. Some tables will show up in more than one subject area -- this is to be expected.
As far as tooling, nothing beats ERwin, but it is really expensive and only available for Windows. Visio is ubiquitous in a corporate environment, but is only available on Windows and is not exactly cheap either. Macs offer some really nice diagramming tools; most of them are not free.
Dia is a decent, free, and cross-platform diagramming tool. It is a bit quirky, though, and I have not had much success making the diagrams look as nice as I want them to look.
For MySQL, I have played with fabFORCE dbDesigner and it is not bad, but I did find its support for multiple subject areas to be a bit lacking at the time -- perhaps they've improved it since. But it is free and works on Windows and Linux.
For the actual presentation, I create images from these diagramming tools and pull them into presentation software (PowerPoint, Keynote, or OpenOffice Impress). These presentations can be exported to PDF and distributed to the audience; they won't need anything more than a PDF viewer to review the information later.
Let's look at this from your professor's perspective. If I were him/her:
I would require an ERD. Without it, I cannot see one of the most fundamental aspects of a database design: how the tables are related.
I would also expect some basic use cases/requirements. What problems are you trying to solve with this database design?
I would want to see what indexes are in place, especially on the foreign key columns. I would want to see expected row counts for all tables to determine whether indexes are even required.
I would want to see column data types to determine whether they meet the requirements. I would want to see which columns accept NULL values, since NULLs can often cause problems if you're not careful.
If I were using SQL Server, I would probably create a diagram in SSMS to display a somewhat basic ERD. Visio can be used as well. I might use Visio to create my use cases, or perhaps Microsoft Word.
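As a rough illustration of the checklist in this answer (relationships, data types, NULL-ability, and indexes on foreign key columns), here is a minimal sketch that writes those decisions down as DDL and reads them back. It uses Python's built-in sqlite3 module rather than MySQL or SQL Server so it runs anywhere; the table and column names (customer, purchase, and so on) are hypothetical. SQLite does not index foreign key columns automatically, hence the explicit CREATE INDEX.

```python
import sqlite3

# Hypothetical two-table schema; names are illustrative only.
# The DDL records what a reviewer asked for: types, NULL-ability,
# the relationship between tables, and an index on the foreign key.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    phone       TEXT              -- NULL allowed: phone is optional
);

CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
    created_at  TEXT NOT NULL
);

-- SQLite does not index foreign key columns automatically.
CREATE INDEX idx_purchase_customer ON purchase(customer_id);
"""


def build(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def describe(conn: sqlite3.Connection, table: str) -> list:
    """Return (column, declared type, nullable) for each column of `table`.

    `table` is assumed to be a trusted, internal name (it is interpolated).
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    return [(name, ctype, not notnull and not pk)
            for _, name, ctype, notnull, _, pk in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for table in ("customer", "purchase"):
        print(table, describe(conn, table))
    print(conn.execute("PRAGMA index_list(purchase)").fetchall())
```

Printing describe() next to the ERD gives the reviewer the column-level detail that a diagram usually leaves out.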
MySQL Workbench will make you pretty graphics for presentation, among many other sophisticated features.
Depends on the audience. ERD certainly isn't the only answer and may not be the best. You should choose a medium that your audience will understand.
Don't forget to discuss design aspects that can't fit into an ERD:
1) how inheritance/aggregation relationships from your analytical model are implemented in your database.
2) how you are going to support hierarchies of your objects in the relational database (if you have any).
3) a list of relationships that are in your analytical model but are not supported by the relational design.
4) ETL processes, change tracking, schema-change tracking, and resource-based security.
5) storage partitioning and maintenance aspects (one of the goals being to optimize backup time).
6) in-production testing (test island data) and easy cloning of the database for test environments.

Why don't people simply use "object databases"?

Instead of JDO, Hibernate, or iBATIS, why can't we simply use object databases?
http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems
Even if these object databases would sometimes suffice to store and retrieve the data for an application, most of the time there are other constraints:
You already have an installed relational db and hired an admin for it.
You need programs like Crystal Reports to do some stuff with your data.
You don't want to rely on a database that isn't as widespread as a relational one.
The reason is clearly laid out here by Mark Harrison amongst others. In short, relational DBs have historical momentum, and are technically superior for a lot of stuff. Also relational DBs just work better, at least in 2009 (check out the other answers to the question I referenced).
At the same time, you do need JDO, ActiveRecord, or something similar to avoid writing the standard object-to-relational translation code yourself.
Because most developers do not know enough about them, most customers already have an installed relational database and have hired an admin for it, and the best object databases are quite specialized and commercial. Here is a suitable database benchmark you can use to test them and compare results across the best-known DBMSs.
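To see what those object-to-relational translations look like when written by hand, here is a minimal sketch in Python with the built-in sqlite3 module. The Book class and book table are hypothetical; a mapping layer such as Hibernate, JDO or ActiveRecord exists to generate the equivalent of save() and load() for you.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:  # hypothetical domain class
    title: str
    author: str
    book_id: Optional[int] = None


def save(conn: sqlite3.Connection, book: Book) -> Book:
    """Object -> row: INSERT new objects, UPDATE existing ones."""
    if book.book_id is None:
        cur = conn.execute(
            "INSERT INTO book (title, author) VALUES (?, ?)",
            (book.title, book.author),
        )
        book.book_id = cur.lastrowid
    else:
        conn.execute(
            "UPDATE book SET title = ?, author = ? WHERE book_id = ?",
            (book.title, book.author, book.book_id),
        )
    return book


def load(conn: sqlite3.Connection, book_id: int) -> Optional[Book]:
    """Row -> object: the SELECT column order must match the field order."""
    row = conn.execute(
        "SELECT title, author, book_id FROM book WHERE book_id = ?", (book_id,)
    ).fetchone()
    return Book(*row) if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE book (book_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL, author TEXT NOT NULL)"
    )
    book = save(conn, Book("Example Title", "Example Author"))
    print(load(conn, book.book_id))
```

Every new class, column or relationship means another pair of hand-written functions like these, which is the repetitive work a mapping layer removes.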
Because objects are all about hiding data and databases are all about making data public.
From that point of view, one could even say that "an OO DBMS" is a contradiction in terms.

Database schema for a site like SO?

Since I took the basic undergrad course in database design and SQL, I haven't really touched anything like this.
So my question is: what would the database schema for a site like this one usually look like? What would you generally expect to find? For instance, how are questions and answers stored?
Are there tools that allow you to design it, or is it just something the devs come up with?
Stack Overflow used the MediaWiki database schema as a template before they went through a database refactoring a few weeks back.
There are a few principles in database design which I'm sure you know if you took that course. They basically all say that your data should not be duplicated across multiple tables and that all your columns should be integral to the table they appear in. Then there are common entities applied in all software design, like objects and relations. Managing them is the easiest part because it comes instinctively. Then there is optimization for scalability and performance, which shouldn't be hard if the first steps are done right. This is usually done together with the software team that is writing code against your database.
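For the question of how questions and answers might be stored, here is a deliberately simplified sketch. It is not Stack Overflow's actual schema; the tables (users, questions, answers, tags, question_tags) are assumptions chosen to illustrate the principles in this answer: each fact is stored once, answers point at their question through a foreign key, and the many-to-many relationship between questions and tags goes through a join table. It uses Python's built-in sqlite3 module.

```python
import sqlite3

# Simplified, hypothetical Q&A schema (NOT Stack Overflow's real one).
SCHEMA = """
CREATE TABLE users (
    user_id      INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL
);
CREATE TABLE questions (
    question_id        INTEGER PRIMARY KEY,
    author_id          INTEGER NOT NULL REFERENCES users(user_id),
    title              TEXT NOT NULL,
    body               TEXT NOT NULL,
    accepted_answer_id INTEGER          -- NULL until an answer is accepted
);
CREATE TABLE answers (
    answer_id   INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(question_id),
    author_id   INTEGER NOT NULL REFERENCES users(user_id),
    body        TEXT NOT NULL
);
CREATE TABLE tags (
    tag_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
-- Join table: a question has many tags, a tag labels many questions.
CREATE TABLE question_tags (
    question_id INTEGER NOT NULL REFERENCES questions(question_id),
    tag_id      INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (question_id, tag_id)
);
"""


def answers_for(conn: sqlite3.Connection, question_id: int) -> list:
    """All answers to one question, with the answerer's name."""
    return conn.execute(
        """SELECT u.display_name, a.body
           FROM answers a JOIN users u ON u.user_id = a.author_id
           WHERE a.question_id = ? ORDER BY a.answer_id""",
        (question_id,),
    ).fetchall()


def questions_tagged(conn: sqlite3.Connection, tag: str) -> list:
    """Titles of questions carrying a given tag."""
    rows = conn.execute(
        """SELECT q.title FROM questions q
           JOIN question_tags qt ON qt.question_id = q.question_id
           JOIN tags t ON t.tag_id = qt.tag_id
           WHERE t.name = ? ORDER BY q.question_id""",
        (tag,),
    )
    return [title for (title,) in rows]


def demo() -> sqlite3.Connection:
    """Build the schema in memory and insert a tiny sample thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.execute("INSERT INTO questions VALUES (1, 1, 'Schema for a Q&A site?', 'How?', NULL)")
    conn.execute("INSERT INTO answers VALUES (10, 1, 2, 'Use a join table for tags.')")
    conn.execute("INSERT INTO tags VALUES (1, 'database-design')")
    conn.execute("INSERT INTO question_tags VALUES (1, 1)")
    conn.execute("UPDATE questions SET accepted_answer_id = 10 WHERE question_id = 1")
    return conn


if __name__ == "__main__":
    conn = demo()
    print(answers_for(conn, 1))
    print(questions_tagged(conn, "database-design"))
```

A real site would add votes, comments, edit history and indexes on the foreign key columns; the point here is only the shape of the relationships.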
You can use tools to design a database, but they are normally just templates for creating the right shapes in a diagram.
The logical structure will be designed in the same way an architect would design a building, using their best knowledge and experience.
Also, always "work in pencil" until you are happy.

How would you design your database to allow user-defined schemas?

If you have to create an application like, let's say, a blog application, creating the database schema is relatively simple. You have to create some tables, tblPosts, tblAttachments, tblComments, tblBlaBla… and that's it (OK, I know that's a bit simplified, but you understand what I mean).
What if you have an application where you want to allow users to define parts of the schema at runtime? Let's say you want to build an application where users can log any kind of data. One user wants to log his working hours (startTime, endTime, project Id, description), the next wants to collect cooking recipes, others maybe stock quotes, the weekly weight of their babies, their monthly food expenses, the results of their favorite football teams, or whatever else you can think of.
How would you design a database to hold all those very different kinds of data? Would you create a generic schema that can hold all kinds of data, would you create new tables reflecting the user's data schema, or do you have another great idea?
If it's important: I have to use SQL Server / Entity Framework
Let's try again.
If you want them to be able to create their own schema, then why not build the schema using, oh, I dunno, the CREATE TABLE statement? You have a full-boat, fully functional, powerful database that can do amazing things like define schemas and store data. Why not use it?
If you were just going to do some ad-hoc properties, then sure.
But if it's "carte blanche, they can do whatever they want", then let them.
Do they have to know SQL? Umm, no. That's your UIs task. Your job as a tool and application designer is to hide the implementation from the user. So present lists of fields, lines and arrows if you want relationships, etc. Whatever.
Folks have been making "end user", "simple" database tools for years.
"What if they want to add a column?" Then add a column; databases do that, most good ones at least. If not, create the new table, copy the old data, drop the old one.
"What if they want to delete a column?" See above. If yours can't remove columns, then remove it from the logical view of the user so it looks like it's deleted.
"What if they have eleventy zillion rows of data?" Then they have eleventy zillion rows of data, and operations take eleventy zillion times longer than if they had 1 row of data. If they have eleventy zillion rows of data, they probably shouldn't be using your system for this anyway.
The fascination of "Implementing databases on databases" eludes me.
"I have Oracle here, how can I offer fewer features and make it slower for the user??"
Gee, I wonder.
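A minimal sketch of the "just use CREATE TABLE" approach from this answer, including adding a column and the create-copy-drop route for removing one. It uses Python's sqlite3 module; the helper names and the type whitelist are hypothetical. Because table and column names cannot be passed as query parameters, the sketch validates user-supplied identifiers against a strict pattern before quoting them, which a real UI layer would also have to do. Recent SQLite versions (3.35 and later) also support ALTER TABLE ... DROP COLUMN directly; the rebuild is shown because it works on engines that don't.

```python
import re
import sqlite3

# Hypothetical whitelist mapping UI field types to SQLite column types.
TYPES = {"text": "TEXT", "integer": "INTEGER", "number": "REAL", "date": "TEXT"}
# Identifiers cannot be bound as query parameters, so validate them strictly.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def q(name: str) -> str:
    """Validate and quote a user-supplied table or column name."""
    if not IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return f'"{name}"'


def create_user_table(conn, table: str, fields: dict) -> None:
    """The user's 'new list' screen ends up as a plain CREATE TABLE."""
    cols = ", ".join(f"{q(name)} {TYPES[kind]}" for name, kind in fields.items())
    conn.execute(f"CREATE TABLE {q(table)} (id INTEGER PRIMARY KEY, {cols})")


def add_column(conn, table: str, name: str, kind: str) -> None:
    conn.execute(f"ALTER TABLE {q(table)} ADD COLUMN {q(name)} {TYPES[kind]}")


def columns(conn, table: str) -> list:
    return [row[1] for row in conn.execute(f"PRAGMA table_info({q(table)})")]


def drop_column(conn, table: str, name: str) -> None:
    """Create the new table, copy the old data, drop the old one, rename."""
    info = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({q(table)})")]
    keep = [(col, ctype) for col, ctype in info if col != name]
    defs = ", ".join(
        "id INTEGER PRIMARY KEY" if col == "id" else f"{q(col)} {ctype}"
        for col, ctype in keep
    )
    names = ", ".join(q(col) for col, _ in keep)
    tmp = q(table + "_new")
    conn.execute(f"CREATE TABLE {tmp} ({defs})")
    conn.execute(f"INSERT INTO {tmp} ({names}) SELECT {names} FROM {q(table)}")
    conn.execute(f"DROP TABLE {q(table)}")
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {q(table)}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_user_table(conn, "work_log",
                      {"start_time": "date", "end_time": "date", "description": "text"})
    add_column(conn, "work_log", "project_id", "integer")
    conn.execute('INSERT INTO "work_log" (start_time, project_id) VALUES (?, ?)', ("09:00", 7))
    drop_column(conn, "work_log", "end_time")
    print(columns(conn, "work_log"))  # ['id', 'start_time', 'description', 'project_id']
```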
There's no way you can predict how complex their data requirements will be. Entity-Attribute-Value is one typical solution many programmers use, but it might not be sufficient, for instance if the user's data would conventionally be modeled with multiple tables.
I'd serialize the user's custom data as XML or YAML or JSON or similar semi-structured format, and save it in a text BLOB.
You can even create inverted indexes so you can look up specific values among the attributes in your BLOB. See http://bret.appspot.com/entry/how-friendfeed-uses-mysql (the technique works in any RDBMS, not just MySQL).
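A minimal sketch of the serialize-plus-inverted-index idea, using Python's json and sqlite3 modules. It is a simplified variation on the technique in the FriendFeed article rather than a copy of it: one generic attr_index table instead of one index table per property, and the set of indexed attributes (INDEXED) is a hypothetical choice. The application, not the database, keeps the index in sync with the serialized body.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL              -- the user's record, serialized as JSON
);
-- Inverted index: (attribute, value) -> entity, maintained by the application.
CREATE TABLE attr_index (
    attr_name  TEXT NOT NULL,
    attr_value TEXT NOT NULL,       -- JSON-encoded so 7 and "7" stay distinct
    entity_id  INTEGER NOT NULL REFERENCES entities(id),
    PRIMARY KEY (attr_name, attr_value, entity_id)
);
"""
# Hypothetical choice of which attributes are searchable.
INDEXED = {"project_id", "team"}


def put(conn: sqlite3.Connection, record: dict) -> int:
    """Store the whole record as one JSON text, then index selected attributes."""
    cur = conn.execute("INSERT INTO entities (body) VALUES (?)", (json.dumps(record),))
    entity_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attr_index (attr_name, attr_value, entity_id) VALUES (?, ?, ?)",
        [(k, json.dumps(v), entity_id) for k, v in record.items() if k in INDEXED],
    )
    return entity_id


def find(conn: sqlite3.Connection, name: str, value) -> list:
    """Look up records through the index, then deserialize their bodies."""
    rows = conn.execute(
        """SELECT e.body FROM attr_index i JOIN entities e ON e.id = i.entity_id
           WHERE i.attr_name = ? AND i.attr_value = ? ORDER BY e.id""",
        (name, json.dumps(value)),
    )
    return [json.loads(body) for (body,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    put(conn, {"project_id": 7, "description": "meeting"})
    put(conn, {"project_id": 8, "hours": 2})
    print(find(conn, "project_id", 7))
```

Only attributes listed in INDEXED can be looked up efficiently; everything else stays opaque inside the JSON text, which is the flexibility-versus-queryability trade-off this answer describes.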
Also consider using a document store such as Solr or MongoDB. These technologies do not need to conform to relational database conventions. You can add new attributes to any document at runtime, without needing to redefine the schema. But it's a tradeoff -- having no schema means your app can't depend on documents/rows being similar throughout the collection.
I'm a critic of the Entity-Attribute-Value anti-pattern.
I've written about EAV problems in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Here's an SO answer where I list some problems with Entity-Attribute-Value: "Product table, many kinds of products, each product has many parameters."
Here's a blog I posted the other day with some more discussion of EAV problems: "EAV FAIL."
And be sure to read this blog "Bad CaRMa" about how attempting to make a fully flexible database nearly destroyed a company.
I would go for a Hybrid Entity-Attribute-Value model, so like Antony's reply, you have EAV tables, but you also have default columns (and class properties) which will always exist.
Here's a great article on what you're in for :)
As an additional comment, I knocked up a prototype for this approach using Linq2Sql in a few days, and it was a workable solution. Given that you've mentioned Entity Framework, I'd take a look at version 4 and their POCO support, since this would be a good way to inject a hybrid EAV model without polluting your EF schema.
On the surface, a schema-less or document-oriented database such as CouchDB or SimpleDB for the custom user data sounds ideal. But I guess that doesn't help much if you can't use anything but SQL and EF.
I'm not familiar with the Entity Framework, but I would lean towards the Entity-Attribute-Value (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model) database model.
So, rather than creating tables and columns on the fly, your app would create attributes (or collections of attributes) and then your end users would complete the values.
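A minimal sketch of the Entity-Attribute-Value layout this answer describes: the application defines attributes at runtime and users fill in values, without any CREATE TABLE per user schema. It uses Python's sqlite3 module; the eav_* table names and the helper functions are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE eav_entity (
    entity_id INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL           -- e.g. 'work_hours', 'recipe'
);
-- Attributes are rows created at runtime, instead of columns.
CREATE TABLE eav_attribute (
    attribute_id INTEGER PRIMARY KEY,
    kind         TEXT NOT NULL,
    name         TEXT NOT NULL,
    UNIQUE (kind, name)
);
CREATE TABLE eav_value (
    entity_id    INTEGER NOT NULL REFERENCES eav_entity(entity_id),
    attribute_id INTEGER NOT NULL REFERENCES eav_attribute(attribute_id),
    val          TEXT,                -- every value is stored as text
    PRIMARY KEY (entity_id, attribute_id)
);
"""


def define_attribute(conn, kind: str, name: str) -> int:
    """The app (or the user, via the UI) declares a new attribute."""
    return conn.execute(
        "INSERT INTO eav_attribute (kind, name) VALUES (?, ?)", (kind, name)
    ).lastrowid


def new_entity(conn, kind: str, values: dict) -> int:
    """The end user completes values for already-defined attributes."""
    entity_id = conn.execute("INSERT INTO eav_entity (kind) VALUES (?)", (kind,)).lastrowid
    for name, val in values.items():
        row = conn.execute(
            "SELECT attribute_id FROM eav_attribute WHERE kind = ? AND name = ?",
            (kind, name),
        ).fetchone()
        if row is None:
            raise KeyError(f"attribute {name!r} is not defined for {kind!r}")
        conn.execute("INSERT INTO eav_value VALUES (?, ?, ?)", (entity_id, row[0], str(val)))
    return entity_id


def get_entity(conn, entity_id: int) -> dict:
    """Reassemble one logical 'row' by joining values back to attribute names."""
    rows = conn.execute(
        """SELECT a.name, v.val FROM eav_value v
           JOIN eav_attribute a ON a.attribute_id = v.attribute_id
           WHERE v.entity_id = ?""",
        (entity_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for attr in ("start_time", "end_time", "project_id"):
        define_attribute(conn, "work_hours", attr)
    eid = new_entity(conn, "work_hours", {"start_time": "09:00", "project_id": 7})
    print(get_entity(conn, eid))
```

Note that get_entity() returns project_id as the string '7': every value lives in one text column, so the database can no longer enforce data types, NOT NULL or foreign keys per attribute. That is one of the EAV problems raised in this thread.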
But, as I said, I don't know what the Entity Framework is supposed to do for you, and it may not let you take this approach.
Not as a critical comment, but it may save you some time to point out that this is one of those Don Quixote / Holy Grail type issues. There has been an eternal quest, probably for over 50 years, to make a user-friendly database design interface.
The only quasi-successful ones that have gained any significant traction that I can think of are 1. Excel (and its predecessors), 2. Filemaker (the original, not its current flavor), and 3. (possibly, but doubtfully) Access. Note that the first two are limited to basically one table.
I'd be surprised if our collective conventional wisdom is going to help you break the barrier. But it would be wonderful.
Rather than re-implement SQL Server's "CREATE TABLE" statement, which was done many years ago by a team of programmers who were probably better than you or I, why not work on exposing SQL Server in a limited way to the users? Let them create their own schema in a limited way and leverage the power of SQL Server to do it properly.
I would just give them a copy of SQL Server Management Studio, and say, "go nuts!" Why reinvent a wheel within a wheel?
Check out this post: you can do it, but it's a lot of hard work :) If performance is not a concern, an XML solution could work too, though that is also a lot of work.

Defining the database schema in the application or in the database?

I know that the title might sound a little contradictory, but what I'm asking is with regards to ORM frameworks (SQLAlchemy in this case, but I suppose this would apply to any of them) that allow you to define your schema within your application.
Is it better to change the database schema directly and then update the column types in your program manually, or does it make more sense to define the tables in your application and then use the ORM framework's table generation functions to make the schema and then build the tables on the database side for you?
Bear in mind that applications and databases tend to live in an M:M relationship in any but the most trivial cases. If your application is at all likely to have interfaces to other systems, reports, data extracts or loads, or data migrated onto or off it from another system, then the database has more than one stakeholder.
Be nice to the other stakeholders in your application. Take the time and get the schema right and put some thought into data quality in the design of your application. Keep an eye on anyone else using the application and make sure you don't break bits of the schema that they depend on without telling them. This means that the database has a life of its own to a greater or lesser extent. The more integration, the more independent the database.
Of course, if nobody else uses or cares about the data, feel free to ignore my advice.
My personal belief is that you should design the database on its own merits. The database is the best place to model your domain data. The database is also the biggest source of slowdown in applications, and letting your ORM design your database seems like a bad idea to me. :)
Of course, I've only got a couple of big projects behind me. I'm still learning daily. :)
The best way to define your database schema is to start with modeling your application domain (domain driven design anyone?) and seeing what tables take shape based on the domain objects you define.
I think this is the best way because the database is really just a place to persist information from the application; it should never lead the design. It's also not the only place to persist information. We have users who want to work from flat files or the database, for instance. They could also use XML files. So starting with your domain objects and then generating tables (or flat-file or XML schemas or whatever) from there will lead to a much better design in the end.
While this may depend on you using an object-oriented language, using an ORM tool like Hibernate/NHibernate, SubSonic, etc. can really make this transition easy for you up to, and including generating the database creation scripts.
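To make "generating the database creation scripts from domain objects" concrete, here is a toy sketch of what that generation step does, using Python dataclasses and the built-in sqlite3 module. It is not the API of Hibernate, SubSonic or SQLAlchemy; ddl_for() and the Customer class are hypothetical, and the type mapping only covers a few built-in types.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin

# Hypothetical mapping from Python types to SQLite column types.
TYPE_MAP = {int: "INTEGER", str: "TEXT", float: "REAL", bool: "INTEGER"}


def ddl_for(cls: type) -> str:
    """Generate a CREATE TABLE statement from a dataclass (toy ORM behaviour)."""
    columns = []
    for f in fields(cls):
        py_type, nullable = f.type, False
        # Optional[X] is Union[X, None]: unwrap it and allow NULLs.
        if get_origin(py_type) is Union and type(None) in get_args(py_type):
            py_type = next(a for a in get_args(py_type) if a is not type(None))
            nullable = True
        column = f"{f.name} {TYPE_MAP[py_type]}"
        if f.name == "id":
            column += " PRIMARY KEY"  # convention: a field named id is the key
        elif not nullable:
            column += " NOT NULL"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"


@dataclass
class Customer:  # hypothetical domain object
    id: int
    name: str
    email: Optional[str]
    credit_limit: float


if __name__ == "__main__":
    ddl = ddl_for(Customer)
    # CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    #                        email TEXT, credit_limit REAL NOT NULL)
    print(ddl)
    sqlite3.connect(":memory:").execute(ddl)
```

Notice what the generator cannot infer from the class: which columns need indexes, whether a natural or surrogate key fits better, or whether a view would serve better than a table. Those are the decisions other answers in this thread argue you should make yourself.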
In reference to performance, performance should be one of the last things you look at in an application, it should never drive the design. After you get a good schema up and running based on your domain you can always make tweaks to improve its performance.
A lot depends on your skill level with the specific database product you're going to use. Think of it as the difference between a "manual" and an "automatic" transmission car. ORMs provide you with that "automatic" transmission: just start designing your classes, and let the ORM worry about getting them stored in the database somehow.
Sounds good. The problem with most ORMs is that in their quest to be PI ("persistence ignorant"), they often don't take advantage of specific database features that can provide elegant solutions for a given task. Notice, I didn't say ALL ORMs, just most.
My take is to design the conceptual data model first yourself. Then you can go in either direction, up towards the application space, or down towards the physical database. But remember, only YOU know if it's more advantageous to use a view instead of a table, should you normalize or de-normalize a table, what non-clustered index(es) make sense with this table, is a natural or surrogate key more appropriate for this table, etc... Of course, if you feel that these questions are beyond your grasp, then let the ORM help you out.
One more thing: you really need to separate the application design from the database design. They are almost never the same. How important is that data? Could another application be designed to use that data? It's a lot easier to refactor an application than it is to refactor a database with a billion rows of data spread across thousands of tables.
Well, if you can get away with it, doing it in the application is probably the best way, since it's a perfect example of the DRY principle.
Having said that, however, getting away with it is always going to be hard to pull off, since you're practically choosing to give up most database-specific optimizations (more so with querying, but it still applies to schemas: indexes, etc.).
You'll probably end up changing the schema by hand anyway, and then you'll be stuck with a brittle database schema that's going to be the source of your worst nightmares :)
My 2 Cents
Design each based on their own requirements as much as possible. Trying to keep them in too rigid sync is a good illustration of increased coupling/decreased cohesion.
Come to think of it, ORMs can easily be used to spread coupling (even though it can be avoided to some degree).