So I've just read a bunch of the CakePHP questions here on Stack Overflow about saving related model data, but I am not finding what I'm looking for. Beyond the obvious technical issue, I have the distinct feeling that I am doing it wrong. My question is this: if you have an organizations table and a users table, and you want to link them with a lookup table, so that neither is associated with the other except through the lookup table, how would you do it? Is a lookup table advisable in CakePHP, or is that a horrible holdover from my SQL days that needs to die? What is best practice here? Is HABTM what I need? Furthermore, how do you learn this stuff? I try things I think might work, but they turn out kludgy at best.
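For reference, here is a minimal sketch of the lookup-table layout described above, written in Python with SQLite rather than CakePHP so it can be run on its own. The sample rows and the helper users_in are made up for illustration. The join table name organizations_users is an assumption based on the alphabetical naming that CakePHP's HABTM convention expects, as far as I know.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The lookup (join) table: the only place where the two sides are associated.
CREATE TABLE organizations_users (
    organization_id INTEGER NOT NULL REFERENCES organizations(id),
    user_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (organization_id, user_id)
);
""")

# Illustrative sample data (hypothetical names).
conn.executemany("INSERT INTO organizations (id, name) VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO organizations_users VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def users_in(org_name):
    """Return the names of all users linked to the given organization."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM users u
        JOIN organizations_users ou ON ou.user_id = u.id
        JOIN organizations o ON o.id = ou.organization_id
        WHERE o.name = ?
        ORDER BY u.name
        """,
        (org_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Neither base table carries a column pointing at the other; a user can belong to several organizations and vice versa, which is the many-to-many shape HABTM is meant for.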
I am an inexperienced computer science student, and while making projects for different courses a few conceptual questions came up.
Say I am to develop a website similar to IMDb, but for music, from scratch, and I want to list some artists on the front page.
The database schema is already done with all its relationships and attributes, and there is a table artists.
Should my server-side artist class contain all table columns and relationships at creation time even if they are not necessarily needed at that time?
Or should I construct these objects with minimal parameters (like id, name) and get all the rest when needed (resulting in more individual sql statements) via helper-methods?
I know that there may be no definitive answer except 'it depends', or that it boils down to personal preference, but maybe there is a consensus after all.
If someone could name or link to resources to read up on things like this I would be very grateful; I didn't know exactly what to search for. Thanks.
PS: For people wondering why I don't ask these questions in the CS course: the sessions are mostly held by students/assistants who only had to pass the course and don't have that much experience themselves.
I am not sure what this means so I am answering assuming this does not exist in the question. Will edit answer when clarification is given.
Or should I construct these objects with minimal parameters (like id, name) and get all the rest when needed (resulting in more individual sql statements) via helper-methods?
Actual answer starts here
It does not boil down to personal preference but to whether or not you can find a practical reason to do something. All design patterns follow practicality rather than personal preference. Even if there is a consensus, you can always ask why.
If there are already 100 tables in the database and my web application can get by with just 2 of them, I don't see a reason why I should sit down and model all 100 tables in my web application's domain model. It's just not logical.
There may be cases where a big application is being created and we are 99% sure that we will eventually need to model all of it. That may require us to model a few more classes up front (say 5 instead of 2) so that our future work is not hindered.
There is also the concern of data integrity. Do those 2 tables depend on some other table? Do other tables depend on them? If there is a dependency, then you might need to include those tables as well.
FYI, such questions are better suited to Programmers Stack Exchange.
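To make the two options from the question concrete, here is a minimal sketch in Python with SQLite of the "minimal parameters plus fetch on demand" approach. The schema (an artists table with an albums child table), the Artist class and list_artists are all assumptions made up for the example, not anything from the asker's project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, biography TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
INSERT INTO artists VALUES (1, 'Artist One', 'A long biography...');
INSERT INTO albums VALUES (1, 1, 'First Album'), (2, 1, 'Second Album');
""")


class Artist:
    """Built from id and name only; related rows are fetched on first use."""

    def __init__(self, artist_id, name):
        self.id = artist_id
        self.name = name
        self._albums = None  # None means "not loaded yet"

    @property
    def albums(self):
        # One extra SQL statement, issued only if somebody asks for the albums.
        if self._albums is None:
            rows = conn.execute(
                "SELECT title FROM albums WHERE artist_id = ? ORDER BY id",
                (self.id,),
            ).fetchall()
            self._albums = [row[0] for row in rows]
        return self._albums


def list_artists():
    """The front page needs names only, so select just id and name."""
    rows = conn.execute("SELECT id, name FROM artists ORDER BY name").fetchall()
    return [Artist(artist_id, name) for artist_id, name in rows]


front_page = list_artists()
```

The trade-off is the one the question already names: the listing query stays cheap, but touching albums for many artists in a loop costs one statement per artist, so a detail page may be better served by loading everything in one go.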
I am starting on an ASP.NET MVC 3 General Management System (Project Management being the first component). I have been reading up a bit on RavenDB and it sounds pretty interesting. One of the biggest things I like about it is that I would not need any type of ORM to handle the data from the DB. This will make my code a lot cleaner and quicker. However, having worked exclusively with MySQL for the past 6+ years, I tend to think very relationally about my data. There are a few things that NoSQL does not seem well suited for. I want to throw these out there; maybe they can be handled in a NoSQL solution and I am just thinking too relationally (then again, maybe this project should be done with MySQL). These are the issues I am thinking of:
Unique Identifiers: I am going to want unique identifiers for a lot of things. For things like projects, the name should be unique and I could use that. However, when it comes to tasks under a project, the title may not be unique, and this is where I would use an auto-increment field, but I can't do that in RavenDB (from what I can tell).
Linking: For fields like status and type I would normally just link with a foreign key. For one-to-many relationships, I can store the text itself instead of trying to link with a foreign key (which you don't have in NoSQL), but with many-to-many linking that becomes a problem. For example, I intend to have a tagging system (like on here) where most items can have one or more tags attached, and I can then search for items by those tags. Is there a way to do this in NoSQL?
Is an RDBMS really the best tool for the job here, or am I just not thinking the "NoSQL" way, and I can accomplish this with NoSQL (RavenDB)?
I know this is an old post. Perhaps the docs weren't as good when it was originally written. But for reference, in case others stumble here:
Raven comes with a HiLo document id generation strategy by default. Storing a new document without specifying an id yourself will get an auto incrementing id such as "projects/1", "projects/2", etc. Read more here.
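To illustrate the general HiLo idea (this is a toy sketch in Python, not RavenDB's actual implementation or client API): a client reserves a whole block of numbers from a shared counter, then hands out ids from that block locally without another round trip. BlockServer, HiLoGenerator and the block size of 4 are all hypothetical names and values chosen for the example.

```python
import itertools


class BlockServer:
    """Stands in for the shared store: hands out block numbers 1, 2, 3, ..."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_hi(self):
        return next(self._counter)


class HiLoGenerator:
    """Generates ids like 'projects/1' from locally cached blocks."""

    def __init__(self, server, collection, capacity=4):
        self.server = server
        self.collection = collection
        self.capacity = capacity  # ids per reserved block (assumed value)
        self._hi = None
        self._lo = capacity  # block "exhausted", forces a fetch on first use

    def next_id(self):
        if self._lo >= self.capacity:
            # Current block used up: reserve a fresh one from the server.
            self._hi = self.server.next_hi()
            self._lo = 0
        self._lo += 1
        number = (self._hi - 1) * self.capacity + self._lo
        return f"{self.collection}/{number}"


server = BlockServer()
client_a = HiLoGenerator(server, "projects")
client_b = HiLoGenerator(server, "projects")

ids_a = [client_a.next_id() for _ in range(2)]
ids_b = [client_b.next_id() for _ in range(2)]
```

With a single client the ids come out as a plain incrementing sequence. With several clients they stay unique but can have gaps, because each client works through its own reserved block.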
The best guidance on the different ways to handle document relationships is here in the documentation. For the situation you described, you don't really need a separate document at all. You can simply embed a string array of tag names into each item. Documents are not flat, they can be structured. And yes, you can still query on them.
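As a rough picture of what "embed the tags in each item" means, here is a sketch using plain Python dictionaries as stand-ins for documents. It does not use the RavenDB client; the document shapes, ids and helper names are assumptions for illustration. The second helper shows, very loosely, the kind of tag-to-items lookup an index gives you.

```python
from collections import defaultdict

# Each document carries its own list of tag names; there is no join table.
items = [
    {"id": "tasks/1", "title": "Write spec", "tags": ["planning", "docs"]},
    {"id": "tasks/2", "title": "Fix login bug", "tags": ["bug", "urgent"]},
    {"id": "projects/1", "title": "Website redesign", "tags": ["planning"]},
]


def items_with_tag(docs, tag):
    """Scan the documents and return the ids of those carrying the tag."""
    return [doc["id"] for doc in docs if tag in doc["tags"]]


def build_tag_index(docs):
    """Precompute tag -> list of document ids, similar in spirit to an index."""
    index = defaultdict(list)
    for doc in docs:
        for tag in doc["tags"]:
            index[tag].append(doc["id"])
    return index


tag_index = build_tag_index(items)
```

The many-to-many relationship is still there (one tag, many items; one item, many tags), it is just stored on the item side and resolved by the index instead of by a linking table.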
Hopefully you've discovered this on your own since the original post.
Ayende wrote a post, "Modeling reference data in RavenDB", which answers some of your questions about linking. You will have copies of the data in both the reference document and your other documents, and that redundancy is "ok" for document databases. You can still build indexes or query based on either the Id or the text that you store.
I would favor SQL for a transactional system, such as an Accounts Receivable application, where you need to perform ad hoc queries. With a document database you really need to think through how you will be fetching your data and build indexes up front to answer those questions. RavenDB also has a dynamic indexing function that learns from and caches the queries that are fired at the database.
For project management, where the majority of items would be tasks, I would think RavenDB would fit your needs.
I'm working with a client who has a piece of custom website software that does something I haven't seen before. It has a MySQL database backend, but most of the tables are auto-generated by the PHP code. This allows end users to create tables and fields as they see fit. So it's a database within a database, but obviously without all the features available in the 'outermost' database. There are a couple of tables that are basically mappings of auto-generated table names and fields to user-friendly table names and fields.* This makes queries feel very unintuitive :P
They are looking for some additional features, ones that are immediately available when you use the database directly, such as data type enforcement, foreign keys, unique indexes, etc. But since this is a database within a database, all those features have to be added into the PHP code that runs the database. The first thing that came to my mind is Inner Platform Effect* -- but I don't see a way to get out of database emulation and still provide them with the features they need!
I'm wondering, could I create a system that gives users nerfed ability to create 'real' tables, thus gaining all the relational features for free? In the past, it's always been the developer/admin who made the tables, and then the users did CRUD operations through the application. I just have an uncomfortable feeling about giving users access to schema operations, even when it is through the application. I'm in uncharted territory.
Is there a name for this kind of system? Internally, in the code, this is called a 'collection' system. The set of 'virtual' tables and fields within the database is called a 'taxonomy'. Is this similar to CCK or the taxonomy modules in Drupal? I'm looking for examples of software that does this kind of thing, so I can see what the pitfalls and benefits are. Basically I'm looking for more outside information about this kind of system.
* Note this is not a simple key-value mapping of the kind the Wikipedia article on the inner-platform effect describes. These work like actual tuples of multiple cells, like simple database tables.
I've done this; you can make it pretty simple or go completely nuts with it. You do run into problems, though, when you put it into customers' hands: are we going to ask them to figure out primary keys, unique constraints and foreign keys?
So assuming you want to go ahead with that in mind, you need some type of data dictionary, aka meta-data repository. You have a start, but you need to add the ideas that columns are collected into tables, then specify primary and foreign keys.
After that, generating DDL is fairly trivial. Loop through tables, loop through columns, build a CREATE TABLE command. The only hitch is that you need to sequence the tables so that parents are created before children. That is not hard: implement a topological ordering (http://en.wikipedia.org/wiki/Topological_ordering).
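A minimal sketch of that loop in Python, assuming a hypothetical in-memory data dictionary (the dictionary layout, table names and helper names are mine, not from any particular product). It uses graphlib.TopologicalSorter from the standard library (Python 3.9+) for the ordering and SQLite only to show that the generated statements run.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical data dictionary: columns, primary key, and foreign keys as
# column -> (parent table, parent column). Deliberately listed out of order.
dictionary = {
    "order_lines": {
        "columns": {"id": "INTEGER", "order_id": "INTEGER", "qty": "INTEGER"},
        "primary_key": "id",
        "foreign_keys": {"order_id": ("orders", "id")},
    },
    "orders": {
        "columns": {"id": "INTEGER", "customer_id": "INTEGER", "total": "REAL"},
        "primary_key": "id",
        "foreign_keys": {"customer_id": ("customers", "id")},
    },
    "customers": {
        "columns": {"id": "INTEGER", "name": "TEXT"},
        "primary_key": "id",
        "foreign_keys": {},
    },
}


def creation_order(meta):
    """Order tables so that every parent comes before its children."""
    graph = {
        table: {parent for parent, _ in spec["foreign_keys"].values()
                if parent != table}  # ignore self-references
        for table, spec in meta.items()
    }
    return list(TopologicalSorter(graph).static_order())


def create_table_sql(table, spec):
    """Build one CREATE TABLE statement from a dictionary entry.

    Names are interpolated as-is; a real system must validate or quote
    anything that comes from end users.
    """
    parts = [f"{col} {col_type}" for col, col_type in spec["columns"].items()]
    parts.append(f"PRIMARY KEY ({spec['primary_key']})")
    for col, (parent, parent_col) in spec["foreign_keys"].items():
        parts.append(f"FOREIGN KEY ({col}) REFERENCES {parent} ({parent_col})")
    return f"CREATE TABLE {table} ({', '.join(parts)})"


order = creation_order(dictionary)
conn = sqlite3.connect(":memory:")
for table in order:
    conn.execute(create_table_sql(table, dictionary[table]))
```

A circular set of foreign keys makes static_order raise graphlib.CycleError, which is a reasonable point to tell the user that their table design cannot be created in one pass.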
At the second level, you first have to examine the existing database and then sometimes only issue ALTER TABLE ADD COLUMN... commands. So it starts to get complicated.
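A small sketch of that second level, again in Python with SQLite and with made-up table and column names: read the columns the table already has, compare them to what the dictionary wants, and emit ALTER TABLE only for what is missing. It assumes SQLite's PRAGMA table_info for the introspection step; other databases expose this through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# What the (hypothetical) data dictionary now says the table should contain.
desired = {"id": "INTEGER", "name": "TEXT", "email": "TEXT"}


def missing_column_sql(connection, table, desired_columns):
    """Return ALTER TABLE statements for columns not yet in the table."""
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in connection.execute(f"PRAGMA table_info({table})")}
    return [
        f"ALTER TABLE {table} ADD COLUMN {col} {col_type}"
        for col, col_type in desired_columns.items()
        if col not in existing
    ]


statements = missing_column_sql(conn, "customers", desired)
for statement in statements:
    conn.execute(statement)
```

This only covers added columns. Dropped or retyped columns, changed keys and renamed tables each need their own comparison, which is where the complication mentioned above comes from.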
Then things continue to get more complicated as you consider allowing DEFAULTS, specifying indexes, and so on. The task is finite, but can be much larger than it seems.
You may wish to consider how much of this you really want to support, and then make a value judgment about coding it up.
My triangulum project does this: http://code.google.com/p/triangulum-db/ but it is only at Alpha 2 and I would not recommend using it in a Production situation just yet.
You may also look at Doctrine, http://www.doctrine-project.org/, they have some sort of text-based dictionary to build databases out of, but I'm not sure how far they've gone with it.
I am practicing SQL, and suddenly I have so many tables. What is a good way to organize them? Is there a way to put them into different directories?
Or is the only option to create a tablespace as explained here?
It depends what you mean by organise - tablespaces are really focused on organising storage.
For organising tables, grouping them into different SCHEMAS may be more useful.
This is more like the concept of a 'namespace' - i.e. schema1.people is not the same as schema2.people.
It often pays off to separate Operational and Configuration data into different schemas.
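To show the namespace point in runnable form, here is a sketch in Python using SQLite, where attached databases play the role of schemas. That substitution is an assumption made for the sake of a self-contained example; in Oracle you would create separate schemas (users) instead, and the names schema1 and schema2 are just the ones used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two independent in-memory databases, each acting as its own namespace.
conn.execute("ATTACH DATABASE ':memory:' AS schema1")
conn.execute("ATTACH DATABASE ':memory:' AS schema2")

# Same table name in both namespaces, but they are different tables.
conn.execute("CREATE TABLE schema1.people (name TEXT)")
conn.execute("CREATE TABLE schema2.people (name TEXT)")
conn.execute("INSERT INTO schema1.people VALUES ('only in schema1')")

count_schema1 = conn.execute("SELECT COUNT(*) FROM schema1.people").fetchone()[0]
count_schema2 = conn.execute("SELECT COUNT(*) FROM schema2.people").fetchone()[0]
```

The row inserted into schema1.people does not show up in schema2.people, which is all that "schema1.people is not the same as schema2.people" means.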
If you are talking about organising tables within a schema - and in a real world application, having hundreds of tables in one schema is not unknown - then all you can really do is come up with good naming conventions.
Some places group tables with prefixes at the start of the table name. Personally, I think this leads to duplication - EMP_ADDRESSES and CUST_ADDRESSES rather than a properly linked Addresses.
It depends why you want to organise them and why (and when) you're creating them. If the number is just overwhelming when you look in, say, user_tables, then splitting into tablespaces won't help much, as you'd need to specify which one you wanted to query each time. And there isn't really a 'directory' equivalent.
If you're creating practice tables just to experiment with mini projects, then one option might be to create a new Oracle user for each project and create all the related tables under that user's schema. Then you'd only see the relevant tables when logged in as that user, while working on that project. This has the advantage of allowing you to reuse table names, which can simplify things a bit if you're doing lots of similar projects.
You should also probably be thinking about tidying up a bit, dropping tables when you're sure you've finished that bit of experimentation.
They are already organised, because they are in a database and you have a repository.
I'm implementing a web-based application using Silverlight with a SQL Server DB on the back end for all the data that the application will display. I want to ensure that the application is easily scalable, and I feel the way to do this is to make the database loosely coupled and not tie everything up with foreign keys. I've tried searching for some examples, but to no avail.
Does anyone have any information or good starting points/samples/examples to help me get off the ground with this?
Help greatly appreciated.
Kind regards,
I think you're mixing up your terminology a bit. "Loosely coupled" refers to the desirability of having software components that aren't so dependent upon each other that they can't function or even compile without being together in the same program. I've never seen the term used to describe the relationships between tables in the same database.
I think if you search on the terms "normalization" and "denormalization" you'll get better results.
Unless you're doing massive amounts of inserts at a time, like with a data warehouse, use foreign keys. Normalization scales like crazy, and you should take advantage of that. Foreign keys are fast, and the constraint really only holds you back if you're inserting millions upon millions of records at a time.
Make sure that you're using integer keys that have a clustered index on them. This should make joining tables very fast. The issues you can get yourself wrapped up in without foreign keys are many and frustrating. I just spent all weekend dealing with exactly that, and we had made a conscious choice not to have foreign keys (we have terabytes of data, though).
Before you even think of such a thing, you need to think about data integrity. Foreign keys exist so that you cannot put records into tables if the primary data they are based on is not there. If you do not use foreign keys, you will sooner or later (probably sooner) end up with worthless data because, for instance, you no longer really know which customer an order is attached to. Foreign keys are data protection; you should never consider not using them.
And even though you think all your data will come from your application, in real life this is simply not true. Data gets in from multiple applications, from imports of large amounts of data, and from the query window (when someone decides to update all the prices, they aren't going to do that one price at a time through the user interface). Data can get into a database from many sources and must be protected at the database level. To do less is to put your entire application and data at risk.
Interesting comment about database security when data is input through external sources like database scripts.
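Here is a small runnable sketch of that protection, in Python with SQLite instead of SQL Server so it needs no setup. The customers/orders tables and the values are made up. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; SQL Server enforces declared constraints without such a switch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Known Customer');
""")

# An order for an existing customer is accepted.
conn.execute("INSERT INTO orders VALUES (1, 1, 9.99)")

# An order for a customer that does not exist is rejected by the database
# itself, no matter which application or script tried to insert it.
try:
    conn.execute("INSERT INTO orders VALUES (2, 42, 5.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The second insert here plays the part of the import script or query window: it bypasses any validation in the application, and the constraint still stops the orphaned row.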