I am working on a group project and we are having a discussion about whether to calculate data that we want from an existing database and store it in a new database to query later, or calculate the data from the existing database every time we need to use it. I was wondering what the pros and cons may be for either implementation. Is there any advice you could give?
Edit: Here is a more elaborate explanation. We have a large database that has a lot of information being submitted to it daily. We are building a system to track certain points of data. For example, we are getting the count of how many times a user does something that is entered in the database. Using this example (our actual idea is a bit more complex), we are discussing two methods of getting the count of actions per user. The first method is to create a database that stores the users and their action count, and query this database every time we need the action count. The second method would be to query the large database and count the actions per user every time we need to use it. I hope this explanation helps. Thoughts?
Edit 2: Two more things that may be useful to point out are: 1) I only have read access to the large database, and 2) my ultimate goal is to display this information on a web page for end users.
This is a generic question about optimization by caching. The following was my answer to essentially the same question. Even though that question provided a bunch of different details, none of them were specific enough to merit a non-generic answer either:
The more you want to calculate at query time, the more you want views, calculated columns, and stored or user routines. The more you want to calculate at normalized base update time, the more you want cascades and triggers. The more you want to calculate at some other (scheduled or ad hoc) time, the more you use snapshots, aka materialized views, and updated denormalized bases. You can combine these. Any time the database is accessed, it can be enabled by and restricted by stored routines or other APIs.
Until you can show that they are inadequate, views and calculated columns are the simplest.
The whole idea of a DBMS is to store a representation of your application state as the database (which normalization reduces the redundancy of) and then you query and let the DBMS implement and optimize calculation of the answer. You haven't presented a reason for not doing that in the most straightforward way possible.
[sic]
Always make sure an application is reading its own personal ("external") database that is a view of "the" ("conceptual") database, so that when you change the implementation of the former (plus the rest of some combined interface) by the latter (plus the rest of some combined mechanisms) your applications do not have to change ("logical independence"). Here the applications are your users' and your trackers'.
Ultimately you must instrument and guesstimate. When it is worth it, you start caching, preferably as much as possible in terms of high-level notions like views and snapshots and as little as possible in non-DBMS code. One of the benefits of the relational model is that it is easy to describe a straightforward relational interface in terms of another straightforward relational interface. You protect your applications from change by offering an interface that hides secrets of implementation, or which of a family of interfaces is the current one.
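To make that concrete for the action-count example in the question, here is a rough sketch in a MySQL-ish dialect; all table and column names (user_actions, action_count, and so on) are invented for illustration, and since you only have read access to the big database, the snapshot table would live in a schema you own:

-- A view: the count is calculated at query time from the base table.
CREATE VIEW user_action_counts AS
SELECT user_id, COUNT(*) AS action_count
FROM user_actions
GROUP BY user_id;

-- A snapshot (a poor man's materialized view, since MySQL has no native ones):
-- the count is precalculated on a schedule and read cheaply later.
CREATE TABLE user_action_counts_snapshot (
    user_id      INT PRIMARY KEY,
    action_count INT NOT NULL,
    refreshed_at TIMESTAMP NOT NULL
);

-- Refresh step, run by a scheduler (a MySQL EVENT or a cron job):
REPLACE INTO user_action_counts_snapshot (user_id, action_count, refreshed_at)
SELECT user_id, COUNT(*), NOW()
FROM user_actions
GROUP BY user_id;

Start with the view; switch the web page's reads over to the snapshot only if measurement shows the live aggregation is too slow.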
People seem to avoid building user interfaces that pull their information (names, field types, etc. as well as relationships) from a database; they instead hard-code forms (and tables, etc.) that have pretty much the same data names and types and things.
Am I making sense?
For instance, imagine an enumerated field in MySQL: why not just have the UI construct a drop-down list whenever it encounters an ENUM? Why put the same values in both the database and the code?
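As far as I can tell, MySQL even exposes an ENUM's allowed values through the catalogue, so the UI could build the drop-down at runtime with something like this (the table and column names are just for illustration):

-- Returns a string such as: enum('red','green','blue')
-- The UI would parse that string to populate the drop-down.
SELECT COLUMN_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'widgets'
  AND COLUMN_NAME = 'colour';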
Perhaps I'm just missing something; perhaps there are projects out there that do this: sort of super-CRUD interfaces that can be pointed at any database and will build from it a fully-functional, relationally-aware user interface. Are there?
I'm possibly not quite conforming to the stackoverflow norms with this question; I shall summarise:
Can you please tell me of a project that constructs its user interface (solely) from analysis of the database schema?
Why is this not a common way to do it — surely it is good to only define data structure in one place (i.e. the database)?
Thank you, and may joyous code-love rain upon your IDE.
I'd like to point out that, last time I checked, .NET and Qt (and probably other environments) make it possible to use "database-aware widgets" (sometimes shortened to just data-aware widgets), which is probably the best pragmatic solution available. What I mean by data-aware widgets is that the widgets themselves know that they're linked directly to database fields, so you would have a combobox that knows that it's backed by an enum and fetches the possible values directly from the database at runtime, just like you suggested.
This is a really neat utility, and used well, it probably won't hurt anything. It still requires that you spend some time laying out widgets manually on a form, but then if you update the database to add a new value to that enum, you don't have to rebuild your app to see it show up in the UI.
But the reason most usability experts will cringe when they hear your question is because programmers tend to think that, well, why not just generate the entire UI, form layout and everything, from the database? And this is where it starts to get really nasty, really fast.
Let's say you have a simple Person table, with first_name, last_name, email_address, street_address, city, state, zip, and phone_number. You want to automatically generate a UI based on these fields. How do you sort the fields? I mean, ideally, first name and last name should be right next to one another. And it would look very silly if you had city and state before street address. So you have to add a new column to the table to specify sort order if you go with the quickest method, or a new table mapping each field to its order index.
What if you want to group parts of the information separately? Then you have to add more UI-specific cruft into your database layout (to do this generically, you'll need a new table specifying which UI fields belong to which UI groupboxes). So you've only solved two problems and already your database layout has gotten twice as ugly, plus now instead of a simple O(1) layout operation when you load the UI, you've gotta do several database queries to find out what fields exist and dynamically lay them out while applying the correct widget order... and we haven't even dealt with sizing (should every field be the maximum size to fit its possible contents, or should all text fields be the same width? Wouldn't it be nice if you could say that some text fields should be one width and height, and some should be another combination? etc), or text justification, or formatting, or any other really common elementary usability requirements that will require further sacrifices from the clarity and simplicity of your database schema.
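Just to make that cruft concrete, this is roughly the sort of UI metadata you end up bolting on next to your real schema (all names here are invented, not taken from any particular tool):

-- Extra tables that exist only so the generated UI can lay itself out.
CREATE TABLE ui_group (
    group_id   INT PRIMARY KEY,
    title      VARCHAR(100) NOT NULL,  -- e.g. 'Name', 'Address', 'Contact'
    sort_order INT NOT NULL
);

CREATE TABLE ui_field_layout (
    table_name   VARCHAR(64) NOT NULL,
    column_name  VARCHAR(64) NOT NULL,
    group_id     INT NOT NULL,
    sort_order   INT NOT NULL,          -- first_name before last_name, street before city...
    widget_width INT NULL,              -- and it only grows from here
    PRIMARY KEY (table_name, column_name),
    FOREIGN KEY (group_id) REFERENCES ui_group (group_id)
);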
Most visual database editors do this; phpMyAdmin, for instance.
Because the database structure isn't always a very good logical structure for a user to be using, especially in the case of databases that have been denormalised on purpose for efficiency reasons.
Yup, this route has already been traveled.
Simply pointing at a database will create an oversimplified UI, not giving much more than the CRUD of an Access UI. That's why Naked Objects (I'm one of its committers) builds its metamodel from a POJO domain model. This allows the UI to expose any public methods as menus ... we call this "behaviourally complete".
Per the comment about the UI not being suitable for end-users, I have two points:
Distinguish between power users and casual users. Most internal apps are for the former (we use Alan Cooper's term "sovereign application" for this), who understand the domain and don't want fancy UI stuff getting in the way. Most external apps, e.g. public web sites, are for the latter.
For the latter, there's nothing to prevent the autogenerated UI of a tool like Naked Objects from being replaced with a custom or semi-customized viewer. One such viewer is Scimpi; I'm also working on an Eclipse RCP viewer that'll expose extension points. But even here, the auto-generated UI is still very valuable for the development team and business analysts for exploration and prototyping.
Hope some of the above has piqued your interest. If you want more, google around, or you might want to check out my book on domain-driven design and Naked Objects, at pragprog.com.
HTH
Dan
List of projects that implement this idea.
.NET
dotObjects
Naked Objects
TrueView
Java
Domain Object Explorer
JMatter
Naked Objects
Sanssouci
Trails
Lablz
C++
Typical Objects
Specifically to your second question: a lot of it really depends on your data model. Some are very complicated and would lead to unintuitive user interfaces. Perhaps for simple CRUD-based systems, having your UI be a front end to the database would be preferable; in that case, I think some of these tools would be great. However, for more complicated systems where some DB data needs to be hidden from the users, it would be better if your UI didn't mirror the DB schema.
Microsoft Access has used this model for years - the database and UI development are very closely tied. You can auto-generate a form directly from a table definition with smart defaults and search built in. The model works well for developing applications with few concurrent users such as custom applications for small businesses where the amount of data stored is small.
If you are scaling to larger relational DBs with a number of concurrent users, or to large databases, then reliability and performance become more important, and separately constructed UIs and databases make more sense. When more users are involved they often have different requirements, so decoupling the UI from the DB schema makes development more efficient.
Just a note on Java "projects that implement this idea" - tynamo is the new version of Trails framework
There are many systems that build an interface for you to edit stuff directly from table information. End-user interfaces, however, must be tweaked a little bit. You may not want to reveal to the user every field in your table.
Frameworks that make good use of the MVC design pattern can let you do all kinds of things with your models, which are the preferred way to build new systems (rather than creating database tables directly).
To answer your questions specifically:
django allows you to construct forms (and a complete admin CMS) out of models.
It is a common thing to do.
Naked Objects is about one step removed from this. They base the UI on an object model, and then persist the object model.
I think you are forgetting to consider the user in your design process if you are thinking like that. Bad mistake. Users don't like it when the interface changes; they would especially not like it if it changed frequently, as they then wouldn't know what to do. Further, if you generate your UI on the fly based on the database structure, what order would the objects be in? UIs need to have objects in an order that makes sense to the users, not the database designers.
Further, in a well-designed database there are fields that are not meant for the users to see: things like numeric keys, insert date, last updated, etc. You don't want to automatically expose these to the users, and you certainly don't want them to have the ability to mess with the data in such fields.
Finally, if you don't think about the functionality of the page, then you aren't doing your job. A UI needs to be more than just a list of fields that can be edited. You need to have constraints on who can see what, checks of the data before inserting it into the database, business rules that need to be applied. You can't just autogenerate a lot of this (and you shouldn't even if you could!). Design needs thought and care.
Now as to drop-down lists: of course you can generate them from the database and not the code; in fact, it is the better choice. Just make a query the source for your particular object, not a list generated in code.
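For example, if the statuses live in a lookup table, the drop-down's source is simply a query along these lines (the table and columns are hypothetical):

-- Bind the drop-down to this result set instead of a hard-coded list.
SELECT status_id, status_name
FROM order_status
ORDER BY status_name;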
You can do it with the help of this cool tool from a developer in the Philippines; it is called COBALT. You can download it here.
Sorry if this seems a little crazy, but I've been messing around with NHibernate for a while and have come across a scenario that may not be possible to solve with NHibernate...
I have one database that contains a load of static data; imagine it like a huge product lookup database. Just to point out, this is an example scenario; my actual one is a bit more complex but works on a similar principle. It is hosted on a completely different box, so I can't do "database2.table1.somecolumn", which I noticed as a possible way round the issue if the two DBs were on the same box and server.
Anyway, I also have another DB which contains data relating to users. So imagine a user has bought a load of stuff from Generic Website A: you have a list of IDs pertaining to what they have bought and an amount of how many they bought, as well as some other information, but the actual data relating to the product is stored in the other database...
So if you imagine you want to combine this data into a PreviousPurchasedProduct model which contained all the information from the 1st database and the additional data from the 2nd DB, you would have to do a query similar to this (if they were all on one box):
SELECT db1.products.*, db2.purchases.*
FROM db2.purchases
INNER JOIN db1.products ON db2.purchases.product_id = db1.products.id
WHERE db2.purchases.user_id = XXX;
Now, first of all, is it possible to map this sort of thing even though they are in separate DB hosts? I'm guessing not, and if that's the case, can you achieve this flexibility via a child class? So, having a product class that purely works off db1 and a derived class that takes the purchases info and only works from db2.
Also, is it possible to restrict the db1 portion of data from INSERT/UPDATE/DELETE statements? I'm pretty sure you can in the default mappings, but as this would be outside of the per-class scope I'm not sure what flexibility I have...
Thanks for reading my waffle :D
I would recommend you first understand how to solve this problem without NHibernate. Once you have one or more clean solutions that work without NHibernate, come back and update your question to say something like "how do I represent this SQL in NHibernate?".
Querying across databases can quickly bring database vendor specific quirks into play and you never mention which database vendor(s) you are dealing with. If both databases are the same vendor, you may be able to link the databases somehow using database vendor specific techniques (but linking isn't necessarily a good solution, so you'll want to try to uncover alternatives as well).
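As one illustration of such a vendor-specific technique: if both servers happened to be MySQL, the FEDERATED storage engine can expose the remote products table locally so an ordinary join works. This is only a sketch; the table definition, account, and connection string are invented, and FEDERATED has to be enabled on your server:

-- Local proxy for db1.products on the other box; the column list must match the remote table.
CREATE TABLE products_remote (
    id   INT NOT NULL,
    name VARCHAR(255),
    PRIMARY KEY (id)
)
ENGINE=FEDERATED
CONNECTION='mysql://readonly_user:password@db1-host:3306/db1/products';

-- The original query then becomes a plain local join:
SELECT p.*, pu.*
FROM purchases pu
INNER JOIN products_remote p ON pu.product_id = p.id
WHERE pu.user_id = 123;  -- placeholder user id

If the vendors differ, or linking proves too fragile, periodically copying the static product data into the purchases database is often the saner alternative, and it tends to map onto NHibernate much more naturally.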
I am working on a new web app and I need to store any changes to the database in audit table(s). The purpose of such audit tables is that, later on, in a real physical audit we can ascertain what happened in a situation, who edited what, and what the state of the DB was at the time of, e.g., a complex calculation.
So mostly the audit tables will be written and not read. Reports may be generated sometimes, though.
I have looked at the available solutions:
AuditTrail - simple, and that is why I am inclined towards it; I can understand its single-file code.
Reversion - looks simple enough to use, but I am not sure how easy it would be to modify if needed.
rcsField - seems to be very complex and too much for my needs.
I haven't tried any of these, so I wanted to hear some real experiences and which one I should be using. E.g. which one is faster, uses less space, and is easier to extend and maintain?
Personally, I prefer to create audit tables in the database and populate them through triggers, so that any change, even an ad hoc query from the query window, is stored. I would never consider an audit solution that is not based in the database itself. This is important because people who are making malicious changes to the database or committing fraud are not likely to do so through the web interface, but on the backend directly. Far more of this stuff happens from disgruntled or larcenous employees than outside hackers.
If you are using an ORM already, your data is at risk because the permissions are at the table level rather than the sp level where they belong. Therefore it is even more important that you capture any possible change to the data, not just what came from the GUI. We have a dynamic proc to create audit tables that is run whenever new tables are added to the database. Since our audit tables record only the changes and not the whole record, we do not need to change them every time a field is added.
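As a minimal sketch of what that looks like, in MySQL syntax just for concreteness (this is hand-written for an imaginary customer table, not the dynamic proc I mentioned, and a real setup would cover INSERT and DELETE as well):

CREATE TABLE customer_audit (
    audit_id    BIGINT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT          NOT NULL,
    column_name VARCHAR(64)  NOT NULL,
    old_value   TEXT,
    new_value   TEXT,
    changed_by  VARCHAR(128) NOT NULL,
    changed_at  TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER customer_audit_upd
AFTER UPDATE ON customer
FOR EACH ROW
BEGIN
    -- Record only the columns that actually changed, not the whole row.
    IF NOT (NEW.email <=> OLD.email) THEN
        INSERT INTO customer_audit (customer_id, column_name, old_value, new_value, changed_by)
        VALUES (OLD.id, 'email', OLD.email, NEW.email, USER());
    END IF;
    IF NOT (NEW.name <=> OLD.name) THEN
        INSERT INTO customer_audit (customer_id, column_name, old_value, new_value, changed_by)
        VALUES (OLD.id, 'name', OLD.name, NEW.name, USER());
    END IF;
END//
DELIMITER ;

Because the trigger fires for every UPDATE, it catches changes made from the application, an ORM, or an ad hoc query alike; USER() records the connecting account that made the change.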
Also, when evaluating possible solutions, make sure you consider how hard it will be to revert the data to undo a specific change. Once you have audit tables, you will find that this is one of the most important things you need to do with them. Also consider how hard it will be to maintain the information as the database schema changes.
Choosing a solution because it appears to be the easiest to understand is not generally a good idea. That should be the lowest of your selection criteria, after meeting the requirements, security, etc.
I can't give you real experience with any of them but would like to make an observation.
I assume by AuditTrail you mean AuditTrail on the Django wiki. If so, I think you'll want to instead look at HistoricalRecords developed by the same author (Marty Alchin aka #gulopine) in his book Pro Django. It should work better with Django 1.x.
This is the approach I'll be using on an upcoming project, not because it necessarily beats the others from a technical standpoint, but because it matches the "real world" expectations of the audit trail for that application.
As I stated in my question, rcsField seems to be too much for my needs, which are simple: I want to store any changes to my tables, and maybe come back to those changes later to generate some reports.
So I tested AuditTrail and Reversion
Reversion seems to be a more full-blown application with many features (which I do not need). Also, as far as I know, it saves data in a single table in XML or YAML format, which I think will generate too much data in a single table, and to read that data I may not be able to use already-present DB tools.
AuditTrail wins in that regard: for each table it generates a corresponding audit table, so changes can be tracked easily, the per-table data is smaller, and it can be easily manipulated and used for report generation.
So I am going with AuditTrail.
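For instance, with a per-table audit table a report is just ordinary SQL, something like the query below. The table and column names are only illustrative; the actual schema AuditTrail generates may differ. With the single-table XML/YAML approach I would have to deserialize the stored blobs first.

-- Who changed article 42, and when (hypothetical audit table).
SELECT changed_at, changed_by, title, status
FROM article_audit
WHERE article_id = 42
ORDER BY changed_at DESC;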
I've thought about this too much now with no obviously correct solution. It might be a real wood-for-the-trees situation, so I need stackoverflow's help.
I'm trying to enforce database filtering on a regional basis. My system has various users and each one is assigned to a regional office. I only want users to be able to see data that is associated with their regional office.
Put simply my application is: Java App -> JPA (hibernate) -> MySQL
The database contains object from all regions, but I only want the users to be able to manipulate objects from their own region. I've thought about the following ways of doing it:
1) Modify all database queries so they read something like select * from tablex where region="myregion". This is nasty. It doesn't work too well with JPA, e.g. the EntityManager.find() method only accepts a primary key. Of course I can go native, but I only have to miss one select statement and my security is shot.
2) Use a MySQL proxy to filter results. Kind of funky, but then the MySQL proxy just sees the raw call and doesn't really know how it should be filtering it (i.e. which region the user that made this request belongs to). OK, I could start a proxy for each region, but it starts getting a little messy...
3) Use separate schemas for each region. Yeah, simple; I'm using Spring, so I could use the RoutingDataSource to route the requests via the correct datasource (one datasource per schema). Of course, the problem now is that somewhere down the line I'm going to want to filter by region and some other category. Oops.
4) ACL - not really sure about this. If I did a select * from tablex; would it quietly filter out objects I don't have access to, or would a load of access exceptions be thrown?
But am I thinking too much about this? This seems like a really common problem. There must be some easy solution I'm just too dumb to see. I'm sure it'll be something close to / or in the database as you want to filter as near to source as possible, but what?
Not looking to be spoonfed - any links, keywords, ideas, or commercial/open-source product suggestions would be really appreciated! Thanks.
I've just been implementing something similar (REALbasic talking to MySQL) over the last couple of weeks for a hierarchical multi-company extension to an accounting package.
There's a large body of existing code which composes SQL statements so we had to live with that and just do a lot of auditing to ensure the restrictions were included in each table as appropriate. One gotcha was related lookups where lookup tables were normally only used in combination with a primary table but for some maintenance GUIs would load the lookup table itself, directly.
There's a danger of giving away implied information such as revealing that Acme Pornstars are a client of some division of the company ;-)
The only solution for that part was very careful construction of DB diagrams to show all implied relationships and lots of auditing and grepping source code, with careful commenting to indicate areas which had been OK'd as not needing additional restrictions.
The one pattern I've come up with to make this more generalised in future is, rather than explicit region=currentRegionVar type searches, using an arbitrary entityID which is supplied by a global CurrentEntityForRole("blah") function.
This abstraction allows for sharing of some data as well as implementing pseudo-entities which represent other restriction boundaries.
I don't know enough about Java and Spring to be able to tell but is there a way you could use views to provide a single-key lookup, where the views are restricted by the region filter?
The desire to provide aggregations and possible data sharing was why we didn't go down the separate database route.
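A rough sketch of that view idea in MySQL terms (all names invented): the view bakes the region predicate in, and the application account is only granted access to the view, so code that reads from it cannot forget the filter.

-- One view per region (a view keyed off the connecting account is another option).
CREATE VIEW tablex_emea AS
SELECT *
FROM tablex
WHERE region = 'EMEA';

-- Assuming the 'emea_app' account already exists and has no direct rights on tablex:
GRANT SELECT ON mydb.tablex_emea TO 'emea_app'@'%';

Your JPA entities could then be mapped against the view name, which keeps the single-key find() usable while the filtering stays as close to the data as you wanted.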
Good Question.
Seems like #1 is the best since it's the most flexible.
Region happens to be what you're filtering on today, but it could be region + department + colour of hair tomorrow.
If you start carving up the data too much it seems like you'll be stuck working harder than necessary to glue them all back together for reporting.
I am having the same problem. It is hard to believe that such a common task (filtering a list of model entities based on the user profile) does not have a 'standard' way, pattern, or best practice for doing it.
I've found pgacl, a PostgreSQL module. Basically, you do your query like you normally would, and then you tack on an acl_access() predicate to work as a filter.
Maybe there is something similar for MySQL.
I suggest you use ACLs. They are more flexible than the other choices. Use Spring Security; you can use it without using the Spring Framework. Read the tutorial.