I need to store a change history for data in a database. For example, at some point a user modifies some property of a record. The expected result is that we can get the change history for a record, like
Tom changed title to 'Title one'
James changed name to 'New name'
Steve added new_tag 'tag23'
Based on these change histories we should be able to reconstruct every version of the data.
Any good ideas on how to achieve this? I'm not limited to a traditional relational database.
These are commonly called audit tables. How I generally manage this is with triggers on the database. For every insert/update on a source table, the trigger copies the data into another table with the same name plus an _AUDIT suffix (the naming convention does not matter; it's just what I use). Oracle provides something called journal tables. Using Oracle Designer (or manually) you can achieve the same thing, and developers often put _JN at the end of the journal/audit table name. It works the same way, with triggers on the source table copying data into the audit table.
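A minimal sketch of that trigger approach, in SQL Server flavoured SQL with made-up table and column names:

-- hypothetical source table
CREATE TABLE articles (
    id    INT PRIMARY KEY,
    title VARCHAR(200),
    name  VARCHAR(100)
);

-- audit copy of the source table, plus columns describing the change
CREATE TABLE articles_AUDIT (
    id         INT,
    title      VARCHAR(200),
    name       VARCHAR(100),
    changed_by VARCHAR(100),
    changed_at DATETIME DEFAULT GETDATE()
);

-- on every insert/update, copy the new row version into the audit table
CREATE TRIGGER trg_articles_audit
ON articles
AFTER INSERT, UPDATE
AS
BEGIN
    INSERT INTO articles_AUDIT (id, title, name, changed_by)
    SELECT id, title, name, SUSER_SNAME()
    FROM inserted;
END;

Comparing consecutive rows in articles_AUDIT then gives you both the change history ("Tom changed title to ...") and every version of the record.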
EDIT:
I should also note that you can create a new separate schema to manage just your audit tables or you can keep them in your schema with the source tables. I do both, it just depends on the situation.
I wrote an article about various options: http://blog.schauderhaft.de/2009/11/29/versioned-data/
If you are not tied to a relational database, there are things called 'append only' databases (I think), which never change data, but only append new versions. For your case this sounds kind of perfect. Unfortunately I don't know of any implementation.
I am working on a project to create a simplified version of the SQLite database. I got stuck when trying to figure out how it manages to store the data of multiple tables with different schemas in a single file. I suppose it must be using some indexes to map the data of the different tables. Can someone shed more light on how it's actually done? Thanks.
Edit: I suppose there is already an explanation in the docs, but I'm looking for an easier way to understand it better and faster.
The schema is the list of all entities (tables, views, etc.) in the database as a whole, rather than the database consisting of many schemas on a per-entity basis.
The data itself is stored in pages, each page being owned by an entity. It is these pages that are saved.
The default page size is 4K. You will notice that the file size will always be a multiple of 4K. You could also, with a little experimentation, create a database with some tables, note its size, then add some data, and, if the added data does not require another page, see that the size of the file stays the same. This demonstrates that it's all about pages rather than a linear/contiguous stream of data.
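If you want to check this yourself, SQLite exposes both numbers through pragmas, and page_size * page_count should match the file size:

PRAGMA page_size;   -- bytes per page, 4096 by default in current versions
PRAGMA page_count;  -- number of pages currently in the database file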
The schema itself is saved in a table called sqlite_master. This table has the columns :-
type (the type, e.g. table, index, view, trigger),
name (the name given to the entity),
tbl_name (the table to which the entity applies),
rootpage (the number of the entity's first page),
sql (the SQL used to generate the entity, if any).
Note that another schema table, sqlite_temp_master, may also exist if there are temporary tables.
For example, SELECT * FROM sqlite_master; lists one row per entity. For the full details, see section 2.6, 'Storage Of The SQL Database Schema', of the SQLite file format documentation.
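As a purely illustrative sketch (the table and index names here are invented), a small database might report something like :-

CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE INDEX idx_customers_name ON customers(name);

SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master;
-- type   name                tbl_name   rootpage  sql
-- -----  ------------------  ---------  --------  --------------------------------
-- table  customers           customers  2         CREATE TABLE customers (id IN...
-- index  idx_customers_name  customers  3         CREATE INDEX idx_customers_na...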
My application has a database table that is used to record the attendance of employees, and the column attendance_status has only three possible values, "present", "absent" and "on_leave", with NULL as the default.
How do I add it to the database? So far I have come up with two possible ways.
Create another table attendance_status with status_id and status_value and add the above values to it. And then use the id in the application for all SQL queries.
Probably the bad way. Hardcode the values (maybe in a config file) and use it throughout the app's SQL queries.
Am I missing the right way? How should this be approached?
Either will work, but Option 1 will give you flexibility in the event that the requirements change, and it is the standard data model. I would, however, name my columns a little differently: id, value, name. The references then become attendance_status.id and attendance_status.value. The third column is available for use in displays, reports or whatever; value is on_leave and name is On leave.
Option 2 works provided the data input point is totally closed. If someone codes new functionality, there is the risk that he or she will invent something different to mean the same thing, like onLeave.
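A sketch of Option 1 with those columns (generic SQL; the attendance table itself and its columns are assumptions for illustration, and exact syntax varies by DBMS):

CREATE TABLE attendance_status (
    id    INT PRIMARY KEY,
    value VARCHAR(20) NOT NULL UNIQUE,  -- e.g. 'on_leave', used in code/queries
    name  VARCHAR(50) NOT NULL          -- e.g. 'On leave', used for display
);

INSERT INTO attendance_status (id, value, name) VALUES
    (1, 'present',  'Present'),
    (2, 'absent',   'Absent'),
    (3, 'on_leave', 'On leave');

-- hypothetical attendance table referencing the lookup; NULL = not yet recorded
CREATE TABLE attendance (
    employee_id     INT  NOT NULL,
    attendance_date DATE NOT NULL,
    status_id       INT  NULL REFERENCES attendance_status(id),
    PRIMARY KEY (employee_id, attendance_date)
);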
The setup
I have the following database setup:
CentralDB
Table: Stores
Table: Users
Store1DB
Table: Orders
Store2DB
Table: Orders
Store3DB
Table: Orders
Store4DB
Table: Orders
... etc
CentralDB contains the users, logging and a Stores table with the name of each store database and general information about each store such as address, name, description, image, etc...
All the StoreDBs use the same structure, just different data.
It is important to know that the list of stores will shrink and increase in the future.
The main client communicating with this setup is an API REST Service which gets passed a STOREID in the Header of each request telling it which database to connect to. This works flawlessly so far.
The reasoning
Whenever we need to do database maintenance on one store, we don't want all other stores to be down.
Backup management should be per store
Not having to write the WHERE storeID=x every time and for every table
Performance: each store could run on its own database server if the need arises
The goal
I need my REST API Service to somehow get all orders from all stores in one query.
Will you help me figure out a way to do this without hardcoding all storedb names? I was thinking about a stored procedure on the CentralDB but I was hoping there would be other solutions. In any case it has to be very efficient.
One option would be to have a list of databases stored in a "system" table in CentralDB.
Then you could create a stored procedure that would read the database names from that table, loop through them with a cursor and generate dynamic SQL that UNIONs the results from all the databases. This way you would get a single recordset of results.
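A rough T-SQL sketch of that approach, assuming SQL Server, a system table dbo.StoreDatabases(DatabaseName) in CentralDB, and an Orders table in each store database (the table and procedure names are illustrative):

CREATE PROCEDURE dbo.GetAllOrders
AS
BEGIN
    DECLARE @db SYSNAME;
    DECLARE @sql NVARCHAR(MAX);
    SET @sql = N'';

    DECLARE db_cursor CURSOR FOR
        SELECT DatabaseName FROM dbo.StoreDatabases;

    OPEN db_cursor;
    FETCH NEXT FROM db_cursor INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @sql <> N''
            SET @sql = @sql + N' UNION ALL ';
        -- tag each row with the store database it came from
        SET @sql = @sql + N'SELECT ''' + @db + N''' AS StoreDb, o.* FROM '
                        + QUOTENAME(@db) + N'.dbo.Orders o';
        FETCH NEXT FROM db_cursor INTO @db;
    END;
    CLOSE db_cursor;
    DEALLOCATE db_cursor;

    EXEC sp_executesql @sql;
END;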
However, this database design is IMHO flawed. There is no reason to use multiple databases to store data that belongs to the same "domain". All the reasons that you have mentioned can be addressed with a single database and a proper database design. Having multiple databases will create multiple problems in the long term:
you will need to change the structure of all the DBs when you modify your database model
you will need to create/drop databases when stores are added to/removed from your system
you will need to have items and other entities that are "common" to all the stores duplicated in all the DBs
what about reporting requirements (e.g. get sales data for stores 1 and 2 together, etc.)? This will require creating complex union queries...
etc...
In the long term, managing and maintaining this model will be a big pain.
I'd maintain a set of views that UNION ALL all the data. Every time a store is added or deleted those views must be updated, but this can be automated.
The views provide the illusion to the application that there is only one database.
What I would not do is have each SQL query or procedure look up all the database names and build dynamic SQL. That would entail lots of code duplication and an unnecessary loss of performance, and the approach is error prone. Better to generate the code once, in a central place, and have all other SQL code reference that generated code.
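A sketch of such a generated view, using the store database names from the question (the view name AllOrders is made up):

-- regenerated by a script whenever a store database is added or removed
CREATE VIEW dbo.AllOrders
AS
    SELECT 'Store1' AS StoreDb, o.* FROM Store1DB.dbo.Orders o
    UNION ALL
    SELECT 'Store2' AS StoreDb, o.* FROM Store2DB.dbo.Orders o
    UNION ALL
    SELECT 'Store3' AS StoreDb, o.* FROM Store3DB.dbo.Orders o;

The REST service then just queries dbo.AllOrders in CentralDB (filtering on StoreDb when it only needs one store), and the script that regenerates the view is the only place that knows the list of store databases.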
I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility itself will have no knowledge of how data is kept in the destination (table structure), but I would like to support writing a SQL statement that returns data from the destination using a complex query with multiple joins, as long as the data comes back in a standardized format (field names) that the utility can recognize in an ADO query.
What I would like to do is then modify the live data in this ADO query. However, since there are multiple join statements, I'm not sure if it's possible to do this. I know that at least with the BDE (which I've never used), it was very strict and you had to return all fields (*) and such. ADO, I know, is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complementary table which is joined on the same primary key field; one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in the Small table). So a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing fields which have no influence on the joins is usually no problem.
Appending is trickier. You can limit the Append to one of the tables with:
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
  inherited;
  // tell ADO which of the joined tables should receive inserts/updates
  TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without a Requery you won't get much further.
The better way would be to set the values via a procedure call, e.g. in BeforePost, followed by Requery and Abort.
If your view were persistent (a view defined in the database rather than an ad-hoc query), you would be able to use INSTEAD OF triggers.
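A minimal sketch of that last option (SQL Server syntax; the view and trigger names are invented, and record_uid is assumed to be supplied by the client rather than generated):

CREATE VIEW dbo.vw_small_detail
AS
    SELECT ts.record_uid, ts.SomeField, td.SomeOtherField
    FROM table_small ts
    JOIN table_detail td ON td.record_uid = ts.record_uid;
GO

CREATE TRIGGER dbo.trg_vw_small_detail_insert
ON dbo.vw_small_detail
INSTEAD OF INSERT
AS
BEGIN
    -- split the single appended row into its two underlying tables
    INSERT INTO table_small (record_uid, SomeField)
    SELECT record_uid, SomeField FROM inserted;

    INSERT INTO table_detail (record_uid, SomeOtherField)
    SELECT record_uid, SomeOtherField FROM inserted;
END;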
Jerry,
I encountered the same problem with Firebird, and from experience I can tell you that it can be done (with a little added complexity) by using CachedUpdates. A very good resource is this one: http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, so this is no longer an issue for me, but it is still an interesting subject to know about.
The new structure of the application consists of attaching multiple "Field Links" to various fields from the original set of data. Each of these links references the original field name and a SQL statement which is to be executed when that field is imported. Multiple field links can be attached to a single field and can therefore execute multiple statements, placing the value in various tables, etc. The end goal was an app with which I can easily and repeatedly export a common dataset from an original source to any outside source with a different data structure, without having to recompile the app.
However, the concept of cached updates was not appealing to me, simply because of the fact pointed out in the link in RBA's answer: the data can be changed in the database in the meantime. So I will instead integrate my own method of customizable data pushes.
We have a program in which each user is given their own Access database. We'd like to merge these all together into a single SQL Server database.
The problem is that, using the SQL Server import/export wizard, the primary/foreign keys do not get updated. So for instance if one user has this table:
1 Apple
2 Banana
and another user has this:
1 Coconut
2 Cheeseburger
the resulting table looks like this:
1 Apple
2 Banana
1 Coconut
2 Cheeseburger
Similarly, anything that referenced Banana by its primary key (2) is now referencing both Banana and Cheeseburger, which will not make the vegans very happy.
Is there any way to automatically update the primary/foreign key references when importing, other than writing an extremely long and complex import-script?
If you need to keep them fully compartmentalized, you have to assign some kind of partitioning column to each table. Is there a reason you need your SQL Server to have the same referential integrity as Access? Are you just importing to SQL Server for read-only reporting? In that case, I would not bother with RI. The queries will all require a partitionid/siteid/customerid. You could enforce that for single-entity access by wrapping tables with a table-valued UDF which requires the partitionid. For cross-site access that doesn't work.
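A sketch of that wrapping idea, where the merged table, its columns and the PartitionId column added during import are all assumptions for illustration:

-- inline table-valued function that forces callers to supply the partition
CREATE FUNCTION dbo.fn_Items (@PartitionId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT i.ItemId, i.ItemName
    FROM dbo.Items i
    WHERE i.PartitionId = @PartitionId
);

Callers then write SELECT * FROM dbo.fn_Items(1) and cannot accidentally mix rows that came from different source databases.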
If you are just loading to SQL Server for reporting, I would also consider altering the data model to support reporting (i.e. a dimensional model is sometimes better than a normalized model) instead of worrying about transaction processing.
I think we need to know more about the underlying goals.
More information about the requirements is needed.
My basic question is: do you need to preserve the original record key? E.g. 1:apple in table T of user-database A; 1:coconut in table T of user-database B. Table T is assumed to have the same structure in all database instances. Reasons I can suppose you may want to preserve the original data: (a) you may have a requirement to reference the original data (maybe a visual check against previous reporting), and/or (b) there may be a data dependency in the application itself.
If the answer is 'no', then you are probably interested only in preserving all of the distinct data values. Let the SQL table be built using a new key and constrain the SQL table field so that it contains unique data. This approach preserves the original table structure (but not the original key value or its 'location') and may suffice to meet your requirement.
If the answer is 'yes', I do not see a way around creating an index that preserves a pointer to the original database and the key that was created in its table T. This approach would seem to require an application modification.
The best approach in this case is probably to split the incoming data into two tables: one to identify the database and original key, another to identify the distinct data values. For example: (database) table D has records such as 'A:1:a', 'A:2:b', 'B:1:c', 'B:2:d', 'B:15:a', 'C:8:a'; (data) table T1 has records such as 'a:apple', 'b:banana', 'c:coconut', 'd:cheeseburger', where 'A' identifies the original database, 1 is the original key in database 'A', and 'a' is the value that links records in table D and table T1. (Otherwise you have a lot of redundant data in the one table, e.g. A:1:apple, B:15:apple, C:8:apple.) Also, T1 has a structure similar to the original T and seems to be more directly useful in the application.
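A sketch of that split, using made-up table and column names:

-- distinct data values, keyed by a new surrogate key
CREATE TABLE item_values (
    value_key  INT PRIMARY KEY,
    item_value VARCHAR(100) NOT NULL UNIQUE   -- 'apple', 'banana', ...
);

-- mapping from (source database, original key) to the shared value
CREATE TABLE source_records (
    source_db    VARCHAR(50) NOT NULL,        -- 'A', 'B', ...
    original_key INT         NOT NULL,        -- key as it was in that database
    value_key    INT         NOT NULL REFERENCES item_values(value_key),
    PRIMARY KEY (source_db, original_key)
);

-- e.g. ('A', 1) and ('B', 15) can both point at the value_key for 'apple'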
Ended up creating an SSIS project for this. SSIS is a visual programming tool made by Microsoft (part of the Business Intelligence Development Studio that comes with SQL Server) designed for solving exactly these sorts of problems.
Why not let Access use its replication manager to merge the databases? This will allow you to identify the conflicts and resolve them before importing to SQL Server. I'm fairly confident it will retain the foreign key relationships. If I understand your situation correctly, and the databases are the same structure with different data, you could load the combined database to the application and verify the data before moving to SQL Server.
What version of Access are you using? Here's a link for Access 2000; adjust the search terms to fit your version.
http://technet.microsoft.com/en-us/library/cc751054.aspx