Related
I am doing a project in the university and it includes a MySQL database. I have a design for the database in terms of a list of tables and their respective fields.
In what form should I present this design? Just the list of tables and content? In an ERD? How do you present your designs?
To clarify - whatever you answer, I expect not only specification of how you present your design, but also which tools do you use the create the diagrams/list/tables etc.
ERD is the only way to go. As they say, a picture is worth a thousand words.
But don't try to put the whole database on one diagram. It will, in all but the most trivial cases, be overwhelming to your audience to try to digest the entire database design in one go. Instead, break the diagrams into subject areas depicting only the most relevant tables in each diagram. For example, a point-of-sale system might have separate diagrams for Inventory, Sales, Accounting, Customer Management, Security, Auditing, and Reporting. Some tables will show up in more than one subject area -- this is to be expected.
As far as tooling, nothing beats ErWin, but it is really expensive and only available for Windows. Visio is ubiquitous in a corporate environment, but is only available on Windows and is not exactly cheap either. Macs offer some really nice diagramming tools; most of them are not free.
Dia is a decent, free, and cross-platform diagramming tool. It is a bit quirky, though; and I have not had much success making the diagrams look as nice I want them to look.
For MySQL, I have played with fabFORCE dbDesigner and it is not bad, but I did find its support for multiple subject areas to be a bit lacking at the time -- perhaps they've improved it since. But it is free and works on Windows and Linux.
For the actual presentation, I create images from these diagramming tools and pull them into presentation software (PowerPoint, KeyNote, or OpenOffice Impress). These presentations can be exported to PDF and distributed to the audience; they won't need anything more than a PDF viewer to review the information later.
Let's look at this from your professor's perspective. If I were him/her:
I would require an ERD. Without it, I cannot see one of the most fundamental issues of a database design, how are the tables related.
I would also expect some basic use cases/ requirements. What problems are you trying to solve with this database design?
I would want to see what indexes are in place, especiall on the foreign key columns. I would want to see expected row counts in all tables to determine if indexes are even required.
I would want to see column data types to determine if they meet the requirements. I would want to see what columns accept NULL values, since that often can cause problems if you're not careful.
If I were using SQL Server, I would probably create a diagram in SSMS to display a somewhat basic ERD. Visio can be used as well. I might use Visio to create my use cases, or perhaps Microsoft Word.
mysql workbench will make you pretty graphics for presentation amongst other many sophisticated features.
Depends on the audience. ERD certainly isn't the only answer and may not be the best. You should choose a medium that your audience will understand.
Don't forget to discuss design aspects that can't fit to ERD:
1) how inheritance/aggregation relationships from your analytical model implemented in your db.
2) how you are going to support hierarchies of your objects in the rdb (if you have any)
3) list relationships that are in your analytical model but are not supported by the rdb design.
4) ETL process, track changes, track schema changes, security based on resource.
5) storage partitioning and maintenance aspects (one of the goal optimize backup time)
6) in prod test (test island data) and easy cloning db for test environment
I asked this over on Superuser and it was suggested I try this here:
Can anyone recommend a quality source to learn databases? I am changing careers and have no background in computers but this is what I have chosen to do now.
I was thinking of taking an intro course at a community college but I have no problem teaching myself with a book and some software. I am looking to accelerate my learning curve and don't want to spend an entire semester on an introductory course if there is something better out there.
Any suggestions are welcome. Thank you.
Edit:
Thank you for the feedback. The MS site and Oracle look promising.
I am pursuing a career in software development. I have taken C++ and C# at a community college and was accepted to a masters program for the spring. What level of database knowledge/implementation is required to program in C++ at the master's level and not get handed your lunch?
I guess I need to know how databases are used in programming. I don't have the common core of experience in order to explain the question more thoroughly without tangentially departing into illusory ideals of a programming career.
At any rate, what you have provided is enough to get me going and it will, I am sure, uncover further areas of interest.
Thank you kindly.
There's a whole lot of aspects of "learning databases":
administering databases
designing databases (modeling)
implementing databases (might require adding business logic into triggers and stored procedures).
using databases (and this might be 'writing a query' to aspects of data mining)
And then there's what type of database. Most people these days assume RDBMS, but there's the push towards NoSQL and tuples stores, etc.
There's no one good answer without a little more information about what you're trying to do.
In addition to what people here would consider as database-related jobs (DBA, database developer, data modelling analyst), there are a lot of semi-technical jobs around that refer to themselves as "database people". In particular, you get report analyst -- often using tools like Crystal Reports, Cognos, SSRS -- and business analysts who work on large customer databases, often defining more how the data should be used in various marketing activities than doing much coding. So it would be up to you to decide if you wish to have a more "business" aspect to your career goal, or to work in data entry/management, or do development and design, or to become a specialist admin certified in a particular vendor's software.
In all cases, however, given you start from scratch, some understanding in how relational database systems work would be beneficial. Someone mentioned Element K courses. Personally, I've found the SQL Zoo interactive online course particularly engaging, and have used it with new staff who had to get up to speed with SQL skills in a short time. Also, this tutorial looks well done. For self-study, I recommend the Head First book series, as in Head First SQL. All these recommendations are extremely SQL-centric, though.
Microsoft also offers tutorials that go with Access and SQL Server Express, but I'd warn not to become locked in. SQLite is a free and extremely lightweight RDBMS, which is perfect for trying things out, either using a programming language (if you know one -- if not, I recommend Python).
As for recommending a provider, you're not even saying which country you're in (though from mentioning "community college" I guess it's the US), what your budget and mobility are like etc, so it's hard to decide. Maybe a mentor or advisor at your community college could help?
The IEEE Computer Society has some online Element K database courses available for members. IEEE runs memberships by the calendar year. Professional membership dues are $50 for a half year and $99 for a full year of professional membership. Student memberships are $40 and $20 for full and half year, respectively. This might be less than the cost of one course.
ACM has some online database courses in their catalog also. They also use Element K, so it seems likely they are the same courses. A full membership is also $99 per year, student is $19 per year. (Their membership year starts when you join.)
I assume your goal is to get a job, since you mention you are changing careers.
I understand you wish to accelerate your learning curve. Make sure you do lots of hands-on work, then, to make sure you know how to do things and have skills in databases. The skills, as they say, pay the bills.
I don't know what your background in computer programming, computer science, or information science / managed information science is. If you have some background into those topics, you might be able to make it into the stuff I recommend below.
Most databases are relational in businesses. Depending on the business, they run different datbases. The two really big ones are Oracle and SQL Server.
I've known a couple of friends who learned databases (and other enterprise software) by going online and reading as much documentation as they could put their hands (mouse?) on the Oracle and SQL Server websites, and for other enterprise software. They went and applied for jobs, had enough business sense / charisma, could read/write American business English well, and were good enough to get the job with little/no experience, and have done very well. It pays their bills, and is a stepping stone to bigger things. You might be able to do the same. Go on the Oracle or Microsoft websites, and grab the documentation. Read as much of it as you can, circle words you don't know, try to find the definitions. Find a local guru (that's the main advantage of the community college, they have smart people with communication skills who can help), and talk with them.
If your a tremendous reader, you could do this in a month.
EDIT:
As to how much "database" knowledge you need to know to get into your master's...
It depends on the program; I'm going to assume your'e going for a professional program and not a academic one. Depending on the master's program, you may/may not need to know databases. Consult your professor or advisor to see if this is something you could tackle right away.
You need to know, ultimately, how to "talk" to a database. Think of this like trying to do business with folks in a foreign country:
You'll need to learn the language (relational databases use a language called SQL; make sure you know how to formulate a basic query (SELECT), and modify the data in the database (INSERT / UPDATE / DELETE)).
Make sure you know the culture (what is a table, what is a row, what is a column, what is a tuple, what are keys (foreign/primary/candidate/alternate/super), what are triggers, what are the normal forms, what is an ERD diagram, what is relational theory)
Make sure you have safe transportation (learn APIs to communicate with the database in C++ and C#. Learn how you can run SQL queries on the database from your app and get resulting data back inside your app. Perhaps learn about ORM tools and how to get/set data that way).
The above is what an intro course covers.
Does anyone know of any design patterns for interfacing with relational databases? For instance, is it better to have SQL inline in your methods, or instantiate a SQL object where you pass data in and it builds the SQL statements? Do you have a static method to return the connection string and each method just gets that string and connects to the DB, performs its action, then disconnects as needed or do you have other structures that are in charge of connecting, executing, disconnecting, etc?
In otherwords, assuming the database already exists, what is the best way for OO applications to interact with it?
Thanks for any help.
I recommend the book Patterns of Enterprise Application Architecture by Martin Fowler for a thorough review of the most common answers to these questions.
There are a few patterns you can use:
Repository Pattern
Active Record Pattern
I personally would hate to work with a database without an ORM. NHibernate is preferable but iBatis is also an option for existing databases (not to say that NH can't handle existing databases).
In general, the best way for OO applications to interface with a relational database is through an ORM; while this isn't a design pattern per se, it's a type of tool that has a specific usage pattern, so it's similar enough. Object Relational Mapping (ORM) tools provide a mapping between a database and a set of objects in memory; usually, these tools provide means for managing things such as sessions, connections, and transactions. A good example of an ORM that works fantastically well would be Hibernate (NHibernate on .NET).
In my experience it is best to have no SQL statements at all (most ORMs will allow that), and it is best not to have any knowledge of connection details (connection string, etc). Even better if you can have the exact same piece of code working with any major db vendor.
POEAA has a wealth of knowledge on the issue if you intend to roll your own.
Everything you will identify by googling for DAL describes a design pattern. Seems like standing in the middle of the forest and asking to see a tree. There are dozens if not thousands.
Here's a quote from a book I'm reading to start with for looking for resources.
... it is impossible to discuss ORM without talking about patterns and best practices for building persistence layers. Then again, it is also impossible to discuss ORM patterns without calling out the gurus in the industry, namely Martin Fowler, Eric Evans, Jimmy Nilsson, Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, the last four of whom are known in the industry as the Gang of Four (GoF). The purposes of this chapter are to explain and expand some of the patterns created by these gurus and to provide concrete examples using language that every developer can understand.
Since I took the basic undergrad course in databases design and SQL I haven't really touched anything like this.
So my question is - how would the database schema for a site like this one would usually look like? What are you generally expected to find? For instance, how are questions and answers stored?
Are there some tools which allow you to design it? or is it just something the devs come up with?
Stack Overflow used the MediaWiki database schema as a template before they went through a database refactoring a few weeks back.
There are few principles in database design which im sure you pretty much know if you took that course. They all say basically that your data should not be duplicated across multiple tables and all your columns should be integral to the table they appear in. Then there are common entities aplied in all software design like objects and relations. Managing them is easiest part because it comes instinctively. Then there is optimization for scalability and performance which shouldn't be hard if first steps are done right. And this is usually done together with software team which is writing code for your db.
You can use tools to design a database, but they are normally just templates for creating the right shapes in a diagram.
The logical structure will be designed in the same way arn architect would design a building, using their best knowledge and experiences.
Also, always "work in pencil" until you are happy.
If you have to create an application like - let's say a blog application, creating the database schema is relatively simple. You have to create some tables, tblPosts, tblAttachments, tblCommets, tblBlaBla… and that's it (ok, i know, that's a bit simplified but you understand what i mean).
What if you have an application where you want to allow users to define parts of the schema at runtime. Let's say you want to build an application where users can log any kind of data. One user wants to log his working hours (startTime, endTime, project Id, description), the next wants to collect cooking recipes, others maybe stock quotes, the weekly weight of their babies, monthly expenses they spent for food, the results of their favorite football teams or whatever stuff you can think about.
How would you design a database to hold all that very very different kind of data? Would you create a generic schema that can hold all kind of data, would you create new tables reflecting the user data schema or do you have another great idea to do that?
If it's important: I have to use SQL Server / Entity Framework
Let's try again.
If you want them to be able to create their own schema, then why not build the schema using, oh, I dunno, the CREATE TABLE statment. You have a full boat, full functional, powerful database that can do amazing things like define schemas and store data. Why not use it?
If you were just going to do some ad-hoc properties, then sure.
But if it's "carte blanche, they can do whatever they want", then let them.
Do they have to know SQL? Umm, no. That's your UIs task. Your job as a tool and application designer is to hide the implementation from the user. So present lists of fields, lines and arrows if you want relationships, etc. Whatever.
Folks have been making "end user", "simple" database tools for years.
"What if they want to add a column?" Then add a column, databases do that, most good ones at least. If not, create the new table, copy the old data, drop the old one.
"What if they want to delete a column?" See above. If yours can't remove columns, then remove it from the logical view of the user so it looks like it's deleted.
"What if they have eleventy zillion rows of data?" Then they have a eleventy zillion rows of data and operations take eleventy zillion times longer than if they had 1 row of data. If they have eleventy zillion rows of data, they probably shouldn't be using your system for this anyway.
The fascination of "Implementing databases on databases" eludes me.
"I have Oracle here, how can I offer less features and make is slower for the user??"
Gee, I wonder.
There's no way you can predict how complex their data requirements will be. Entity-Attribute-Value is one typical solution many programmers use, but it might be be sufficient, for instance if the user's data would conventionally be modeled with multiple tables.
I'd serialize the user's custom data as XML or YAML or JSON or similar semi-structured format, and save it in a text BLOB.
You can even create inverted indexes so you can look up specific values among the attributes in your BLOB. See http://bret.appspot.com/entry/how-friendfeed-uses-mysql (the technique works in any RDBMS, not just MySQL).
Also consider using a document store such as Solr or MongoDB. These technologies do not need to conform to relational database conventions. You can add new attributes to any document at runtime, without needing to redefine the schema. But it's a tradeoff -- having no schema means your app can't depend on documents/rows being similar throughout the collection.
I'm a critic of the Entity-Attribute-Value anti-pattern.
I've written about EAV problems in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Here's an SO answer where I list some problems with Entity-Attribute-Value: "Product table, many kinds of products, each product has many parameters."
Here's a blog I posted the other day with some more discussion of EAV problems: "EAV FAIL."
And be sure to read this blog "Bad CaRMa" about how attempting to make a fully flexible database nearly destroyed a company.
I would go for a Hybrid Entity-Attribute-Value model, so like Antony's reply, you have EAV tables, but you also have default columns (and class properties) which will always exist.
Here's a great article on what you're in for :)
As an additional comment, I knocked up a prototype for this approach using Linq2Sql in a few days, and it was a workable solution. Given that you've mentioned Entity Framework, I'd take a look at version 4 and their POCO support, since this would be a good way to inject a hybrid EAV model without polluting your EF schema.
On the surface, a schema-less or document-oriented database such as CouchDB or SimpleDB for the custom user data sounds ideal. But I guess that doesn't help much if you can't use anything but SQL and EF.
I'm not familiar with the Entity Framework, but I would lean towards the Entity-Attribute-Value (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model) database model.
So, rather than creating tables and columns on the fly, your app would create attributes (or collections of attributes) and then your end users would complete the values.
But, as I said, I don't know what the Entity Framework is supposed to do for you, and it may not let you take this approach.
Not as a critical comment, but it may help save some of your time to point out that this is one of those Don Quixote Holy Grail type issues. There's an eternal quest for probably over 50 years to make a user-friendly database design interface.
The only quasi-successful ones that have gained any significant traction that I can think of are 1. Excel (and its predecessors), 2. Filemaker (the original, not its current flavor), and 3. (possibly, but doubtfully) Access. Note that the first two are limited to basically one table.
I'd be surprised if our collective conventional wisdom is going to help you break the barrier. But it would be wonderful.
Rather than re-implement sqlservers "CREATE TABLE" statement, which was done many years ago by a team of programmers who were probably better than you or I, why not work on exposing SQLSERVER in a limited way to the users -- let them create thier own schema in a limited way and leverage the power of SQLServer to do it properly.
I would just give them a copy of SQL Server Management Studio, and say, "go nuts!" Why reinvent a wheel within a wheel?
Check out this post you can do it but it's a lot of hard work :) If performance is not a concern an xml solution could work too though that is also alot of work.