Content Version Control - database

I'm just starting a project and would like to have a small content manager with version control. However, I don't know the best way to model the database.
I have content table which contains the following fields:
id primary key serial,
content_id int (field to identify different contents),
title varchar,
content longtext,
version int default '1',
create_date date,
I have seen that some CMSs keep the old revisions in a separate table from the current revision. What's the best way? Is there an optimized way?
Thanks!

I designed something like this and here's the gist of it;
I create a mirror table for every table that needs row-level version control. Let's say you have a CUSTOMER table. Your mirror version control table will be VER_CUSTOMER.
Every table that needs row-level version control has a column called RECORD_ID (GUID).
When a record is inserted into that table, I generate a new GUID and populate that field. The new record is also inserted into the VER_CUSTOMER table, with RECORD_ID added to the table's natural PK.
When a record is updated, I generate a new GUID again and populate RECORD_ID with it. The updated record also goes into the VER_CUSTOMER table.
When a record is deleted, I mark the record in the CUSTOMER table as DELETED (I do not physically delete it). I have an IS_DELETED column on every table, and I set that column to TRUE when a delete is attempted. Again, a copy of the deleted record also goes into the VER_CUSTOMER table.
So for every transaction on that table, you have a corresponding record in the VER_CUSTOMER table, with RECORD_ID plus the table's natural PK as its PK. For example, if the CUSTOMER table's PK is CUST_ID, the PK of VER_CUSTOMER will be the composite of CUST_ID and RECORD_ID.
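A minimal sketch of the mirror-table scheme described above, using Python's built-in sqlite3 module so it can be run as-is. The column list (name), the helper function names, and the choice to also issue a fresh GUID on delete (so the deleted snapshot gets its own composite key) are assumptions made for illustration, not part of the original design.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    record_id  TEXT NOT NULL,              -- GUID, regenerated on every change
    is_deleted INTEGER NOT NULL DEFAULT 0  -- soft-delete flag
);
CREATE TABLE ver_customer (
    cust_id    INTEGER NOT NULL,
    name       TEXT NOT NULL,
    record_id  TEXT NOT NULL,
    is_deleted INTEGER NOT NULL,
    PRIMARY KEY (cust_id, record_id)       -- natural PK + RECORD_ID
);
""")

def _snapshot(cust_id):
    # Copy the current state of the row into the mirror table.
    conn.execute(
        "INSERT INTO ver_customer "
        "SELECT cust_id, name, record_id, is_deleted FROM customer WHERE cust_id = ?",
        (cust_id,),
    )

def insert_customer(cust_id, name):
    conn.execute(
        "INSERT INTO customer (cust_id, name, record_id) VALUES (?, ?, ?)",
        (cust_id, name, str(uuid.uuid4())),
    )
    _snapshot(cust_id)

def update_customer(cust_id, name):
    conn.execute(
        "UPDATE customer SET name = ?, record_id = ? WHERE cust_id = ?",
        (name, str(uuid.uuid4()), cust_id),
    )
    _snapshot(cust_id)

def delete_customer(cust_id):
    # Soft delete. Assumption: a new GUID is issued here too, so the
    # snapshot does not collide with the previous version's key.
    conn.execute(
        "UPDATE customer SET is_deleted = 1, record_id = ? WHERE cust_id = ?",
        (str(uuid.uuid4()), cust_id),
    )
    _snapshot(cust_id)

insert_customer(1, "Acme")
update_customer(1, "Acme Ltd")
delete_customer(1)
versions = conn.execute(
    "SELECT name, is_deleted FROM ver_customer WHERE cust_id = 1 ORDER BY rowid"
).fetchall()
```

In a real database the snapshot step would more likely live in a trigger than in application code, so that no write path can skip it.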
Hope this helps...

This already exists, without a database:
gitit (written in Haskell, uses git or darcs as a backend)
ikiwiki (written in Perl, can use various version control systems as a backend)
They're both open source, and both have a plugin architecture, so can be customised for your specific needs. (However, I've only used gitit.)
I would however note that git is not perfect at versioning large binary files, and darcs is terrible at it. Something to watch out for.

Related

Data Versioning/Auditing in SQL Database best patterns

I have a Job table where I post the Job description, posted date, qualifications etc.. with below schema
Job(Id ##Identity PK, Description varchar (200), PostedOn DateTime, Skills Varchar(50))
Other attributes of jobs that we would like to track, such as Department, Team, etc., will be stored in another table as attributes of the Job:
JobAttributesList(Id ##Identity PK, AttributeName varchar(50))
JobAttributes(JobID ##Identity PK, AttributeID FK REFERENCES JobAttributesList.Id, AttributeValue varchar(50))
Now, if a job description changes, we do not want to lose the old one, so we need to keep track of versions. What are the best practices? We may have to scale later by adding more versioning tables.
One strategy would be to use a History table for each table we want to version, but that would add more and more tables as we add versioning requirements, and I feel it's schema duplication.
There is a difference between versioning and auditing. Versioning only requires that you keep the old versions of the data somewhere. Auditing typically requires that you also know who made a change.
If you want to keep the old versions in the database, do create an "old versions" table for each table you want to version, but don't create a new table for every different column change you want to audit.
I mean, you can create a new table for every column, whose only columns are audit_id, key, old_column_value, created_datetime, and it can save disk space if the original table is very wide, but it makes reconstructing the complete row for a given date and time extraordinarily expensive.
You could also keep the old data in the same table, and always do inserts, but over time that becomes a performance problem as your OLTP table gets way, way too big.
Just have a single history table with all the columns of the original table, which you only ever insert into; you can do that inside an UPDATE/DELETE trigger on the original table. You can tell which columns have changed either by adding a bit flag for every column, or by working it out at select time, comparing the data in one row with the data in the previously audited row for the given key.
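As a rough illustration of that single history table, here is a sketch using Python's sqlite3 module. SQLite is only a stand-in for whatever database you actually use, and the job_history table, the operation column, and the trigger names are hypothetical. The triggers copy the old row into the history table on every update and delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (
    id          INTEGER PRIMARY KEY,
    description TEXT,
    skills      TEXT
);
-- Same columns as job, plus audit bookkeeping.
CREATE TABLE job_history (
    audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    id          INTEGER NOT NULL,
    description TEXT,
    skills      TEXT,
    operation   TEXT NOT NULL,                      -- 'U' or 'D'
    audited_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER job_after_update AFTER UPDATE ON job
BEGIN
    INSERT INTO job_history (id, description, skills, operation)
    VALUES (OLD.id, OLD.description, OLD.skills, 'U');
END;
CREATE TRIGGER job_after_delete AFTER DELETE ON job
BEGIN
    INSERT INTO job_history (id, description, skills, operation)
    VALUES (OLD.id, OLD.description, OLD.skills, 'D');
END;
""")

conn.execute("INSERT INTO job VALUES (1, 'Build reports', 'SQL')")
conn.execute("UPDATE job SET description = 'Build dashboards' WHERE id = 1")
conn.execute("DELETE FROM job WHERE id = 1")

history = conn.execute(
    "SELECT description, operation FROM job_history ORDER BY audit_id"
).fetchall()
```

Each history row holds the values as they were before the change, so the current row plus the history rows gives the full sequence of versions.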
I would absolutely not recommend creating a trigger which concatenates all of the values cast to varchar and dumps it all into a single, universal audit table with an "audited_data" column. It will be slow to write, and impossible to usefully read.
If you want to use this for actual auditing, and not just versioning, then either the user making the change must be captured in the original table so it is available to the trigger, or you need people to be connecting with specific logins, in which case you can use transport information like original_login(), or you need to set a value like context_info or session_context on the client side.

Adding new dimensions to data warehouse (adding new columns to fact table)

I am building an OLAP database and am running into some difficulty. I have already setup a fact table that includes columns for sales data, like quantity, sales, cost, profit, etc. The current dimensions I have are Date, Location, and Product. This means I have the foreign key columns for these dimension tables included in the fact table as well. I have loaded the fact table with this data.
I am now trying to add a dimension for salesperson. I have created the dimension, which has the salesperson's ID and their name and location. However, I can't edit the fact table to add the new column that will act as a foreign key to the salesperson dimension.
I want to use SSIS to do this, by using a look up on the sales database which the fact table is based on, and the salesperson ID, but I first need to add the Salesperson column to my fact table. When I try to do it, I get an error saying that it can't create a new column because it will be populated with NULLs.
I'm going to take a guess as to the problem you're having, but this is just a guess: your question is a little difficult to understand.
I'm going to make the assumption that you have created a Fact table with x columns, including links to the Date, Location, and Product dimensions. You have then loaded that fact table with data.
You are now trying to add a new column, SalesPerson_SK (or ID), to that table. You do not wish to allow NULL values in the database, so you clear the 'allow NULL' checkbox. However, when you attempt to save your work, the table errors out with the objection that it cannot insert NULL into the SalesPerson_SK column.
There are a few ways around this limitation. One, which is probably the best if you are still in the development stage, is to issue the following command:
TRUNCATE TABLE dbo.FactMyFact
which will remove all data from the table, allowing you to make your changes and reload the table with the new column included.
If, for some reason, you cannot do so, you can alter the table to add the column but include a default constraint that will put a default value into your fact table, essentially a dummy record that says, "I don't know what this is"
ALTER TABLE FactMyFact
ADD Salesperson_SK INT NOT NULL
CONSTRAINT DF_FactMyFact_SalesPersonSK DEFAULT 0
If you do not wish to put a default value into the table, simply create the column and allow NULL values, either by checking the box on the design page or by issuing the following command:
ALTER TABLE FactMyFact
ADD Salesperson_SK INT NULL
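The default-value route can be tried end to end with the sketch below. It uses Python's sqlite3 module purely as a stand-in (the question is about SQL Server, whose syntax and error text differ), and the table names, the 'Unknown' row with key 0, and the source_sales lookup are made up for illustration. It shows the NOT NULL column being rejected without a default, accepted with one, and then backfilled from the source system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_salesperson (salesperson_sk INTEGER PRIMARY KEY, name TEXT);
INSERT INTO dim_salesperson VALUES (0, 'Unknown'), (1, 'Alice');
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, quantity INTEGER);
INSERT INTO fact_sales VALUES (100, 3), (101, 5);
""")

# Adding a NOT NULL column with no default is refused.
try:
    conn.execute("ALTER TABLE fact_sales ADD COLUMN salesperson_sk INTEGER NOT NULL")
    rejected_without_default = False
except sqlite3.OperationalError:
    rejected_without_default = True

# With a default pointing at the 'Unknown' dimension row, it succeeds.
conn.execute(
    "ALTER TABLE fact_sales ADD COLUMN salesperson_sk INTEGER NOT NULL DEFAULT 0"
)

# Hypothetical lookup result from the source sales system: sale_id -> salesperson key.
source_sales = {100: 1}
conn.executemany(
    "UPDATE fact_sales SET salesperson_sk = ? WHERE sale_id = ?",
    [(sk, sale_id) for sale_id, sk in source_sales.items()],
)

rows = conn.execute(
    "SELECT sale_id, salesperson_sk FROM fact_sales ORDER BY sale_id"
).fetchall()
```

Rows that the lookup cannot match keep the key 0 and therefore join to the 'Unknown' salesperson rather than dropping out of inner joins.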
This answer has been given based on what I think your problem is: let me know if it helps.
Inner join the dimension with the fact table, get the values from the dimension, and insert them into the fact table...
or else go the factless fact table way.

Table Structure for Multiple History

I want to create a table that keeps both the history of amendments and the history of the object itself.
For that I have created a two-column primary key (Id and UpdateDate).
I have three more date columns to maintain history, and a Status column for the actual object history:
Status, StatusFrom, StatusTo, UpdateDate and NextUpdateDate
UpdateDate and NextUpdateDate are for maintaining the history of amendments.
Is there a better way to maintain both the actual history of the record and its amendment history?
You're creating what is known as an "audit table". There are many ways to do this; a couple of them are:
1. Create a table with appropriate key fields and before/after fields for all columns that you're interested in on the source table, along with a timestamp so you know when the change was made.
2. Create a table with appropriate key fields, a modification timestamp, a field name, and before/after columns.
Method (1) has the problem that you end up with a lot of fields in the audit table - basically two for every field in your source table. In addition, if only one or two fields on the source table change then most of the fields on the audit table will be NULL which may waste space (depending on your database). It also requires a lot of special-purpose code to figure out which field changed when you go back to process the audit table.
Method (2) has the problem that you end up creating a separate row in the table for each field that is changed on your source table, which can result in a lot of rows in the audit table (one row for each field changed). Because each field change results in a new row being written to the audit table you also have the same key values in multiple rows which can use up a bunch of space just for the keys.
Regardless of how the audit table is structured it's usual to use a trigger to maintain them.
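To make the second layout (one audit row per changed field) concrete, here is a small Python sketch. The function name field_changes and the dictionary keys are hypothetical. It compares the before and after images of a record and yields one audit row per field that differs.

```python
from datetime import datetime, timezone

def field_changes(key, before, after, changed_at=None):
    """Return one audit row per field whose value differs between the two images."""
    changed_at = changed_at or datetime.now(timezone.utc)
    return [
        {
            "key": key,
            "changed_at": changed_at,
            "field": field,
            "before": before.get(field),
            "after": after.get(field),
        }
        for field in sorted(set(before) | set(after))
        if before.get(field) != after.get(field)
    ]

old = {"status": "Draft", "owner": "ann", "title": "Spec"}
new = {"status": "Approved", "owner": "bob", "title": "Spec"}
audit_rows = field_changes(42, old, new)
```

Two fields changed here, so two audit rows are produced, each repeating the key and timestamp. That repetition is the space cost mentioned above for this layout.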
I hope this helps.

How I can make Recycle Bin for Database Application?

I have a database application and I want to allow the user to restore deleted records. Just as Windows has a Recycle Bin for files, I want to do the same thing for database records. Assume that I have a lot of related tables with a lot of fields.
Edit:
let's say that I have the following structures:
Reports table
RepName primary key
ReportData
Users table
ID primary key
Name
UserReports table
RepName primary key
UserID primary key
IsDeleted
Now, if I put an IsDeleted field in the UserReports table, the user can't add the same record again once it is marked as deleted, because the record already exists and adding it would create a duplicate.
Note: I always use surrogate primary key.
Add a timestamp column, 'deleted_at'. When the user deletes an entry, put the current time there. Make this column part of your unique constraint.
In every query remember to search only for records that have null in deleted_at field.
Some frameworks (like ActiveRecord) make it trivial to do.
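Here is a sketch of the deleted_at idea in Python with sqlite3. One caveat: in SQLite, NULLs count as distinct inside a UNIQUE constraint, so simply adding deleted_at to the constraint would not stop two live copies of the same row. This sketch therefore swaps in a partial unique index that covers only live rows, which is a variant of the suggestion rather than the exact constraint described. Table, index, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_reports (
    id         INTEGER PRIMARY KEY,   -- surrogate key
    rep_name   TEXT NOT NULL,
    user_id    INTEGER NOT NULL,
    deleted_at TEXT                   -- NULL means the row is live
);
-- Uniqueness is enforced only among rows that are not deleted.
CREATE UNIQUE INDEX ux_user_reports_live
    ON user_reports (rep_name, user_id)
    WHERE deleted_at IS NULL;
""")

def add_report(rep_name, user_id):
    conn.execute(
        "INSERT INTO user_reports (rep_name, user_id) VALUES (?, ?)",
        (rep_name, user_id),
    )

def soft_delete(rep_name, user_id):
    conn.execute(
        "UPDATE user_reports SET deleted_at = CURRENT_TIMESTAMP "
        "WHERE rep_name = ? AND user_id = ? AND deleted_at IS NULL",
        (rep_name, user_id),
    )

add_report("sales", 1)
try:
    add_report("sales", 1)          # second live copy is refused
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

soft_delete("sales", 1)
add_report("sales", 1)              # allowed again after the soft delete

live = conn.execute(
    "SELECT COUNT(*) FROM user_reports WHERE deleted_at IS NULL"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM user_reports").fetchone()[0]
```

Restoring from the recycle bin is then an update that sets deleted_at back to NULL, which the index will refuse if a live copy already exists.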

What to do when we may need to save slave data first

In a one-to-many relationship, what's the best way to handle data so that it's flexible enough for the user to save the slave data before saving the master table data? Options I can think of:
reserving the row ID of the master so that I can save the slave data with the reserved master id
saving the slave data in a temporary table so that when we save the master data we can "import" the data from the temporary table
other??
An example is a ticket form with multiple file uploads, where the user can upload the files before sending the ticket information:
Master table
PK
ticket description
Slave table
PK
Master_FK
File
Are your id's autogenerated?
You have several choices all with possible problems.
First, don't define an FK relationship. But then how do you account for records in a partial state and those that never get married up to the real record? And how do you intend to marry up the records when the main record is inserted?
Second, insert a record into the master table first where everything is blank except the id. This leaves enforcing all required fields to the user application, which I'm not wild about from a data integrity standpoint.
Third, and most complex but probably safest: use 3 tables. Create the master record in a table that only contains the master record id, and return that to your application on opening the form to create a new record. Create a PK/FK relationship from it to both the original master table and the foreign key table. Remove the autogeneration of the id from the original master table and insert the id from the new master table instead when you insert the record. Insert the new master table id when you insert records into the original FK table as well. At least this way you can continue to have all the required fields marked as required in the database, but the relationship is between the new table and the other table, not between the original table and the other table. This won't affect querying (as long as you have proper indexing), but it will make things more complicated if you delete records, as you could leave some hanging out if you aren't careful. You would also have to consider whether there are other processes (such as data imports from another source) that insert records into the main table; these would have to be adjusted, as the id would no longer be autogenerated.
In Oracle (maybe others?) you can defer a constraint's validation until COMMIT time.
So you could insert the child rows first. (You'd need the parent key, obviously.)
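The deferred-constraint idea can be tried out with SQLite, which also supports deferrable foreign keys. The sketch below uses Python's sqlite3 module as a stand-in for Oracle, with hypothetical ticket and ticket_file tables. The child row is inserted before its parent, and the foreign key is only checked at COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE ticket (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE ticket_file (
    id        INTEGER PRIMARY KEY,
    ticket_id INTEGER NOT NULL
              REFERENCES ticket (id) DEFERRABLE INITIALLY DEFERRED,
    file_name TEXT NOT NULL
);
""")

# Child first, parent second, all inside one transaction.
conn.execute("INSERT INTO ticket_file (ticket_id, file_name) VALUES (7, 'log.txt')")
conn.execute("INSERT INTO ticket (id, description) VALUES (7, 'Crash on startup')")
conn.commit()  # FK is checked here and passes

# A child whose parent never arrives is rejected at commit time.
conn.execute("INSERT INTO ticket_file (ticket_id, file_name) VALUES (8, 'orphan.txt')")
try:
    conn.commit()
    orphan_rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    orphan_rejected = True

file_count = conn.execute("SELECT COUNT(*) FROM ticket_file").fetchone()[0]
```

Note that this only helps when the child and parent rows are written in the same transaction and the parent key is known up front, as the answer points out.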
Why can't you create the master row and flag it as incomplete?
In the case of uploads you will have to create temporary storage for uncommitted uploads, so that once an upload has started you save all new files in a separate table. Once the user is ready to submit the ticket, you save the ticket and append the files from the temp table.
Also, you can create a fake record with some fixed id in the master table, if that is possible. You then have to make sure that the fake record does not appear in queries elsewhere.
Third, you can create a stored procedure which generates an id for the primary table and increments the identity counter. If the user aborts the operation, the reserved id will not affect anything. It is just as if you had created a master record and then deleted it. You can create temporary records in the master table as well.
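The first of these options, temporary storage for uploads, might look roughly like the sketch below in Python with sqlite3. The pending_file table, the form_token used to tie uploads to an unsaved form, and the function names are assumptions for illustration. Files are staged under the token and moved to the real child table in the same transaction that creates the ticket.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE ticket_file (
    id        INTEGER PRIMARY KEY,
    ticket_id INTEGER NOT NULL REFERENCES ticket (id),
    file_name TEXT NOT NULL
);
-- Staging area for uploads made before the ticket exists.
CREATE TABLE pending_file (
    id         INTEGER PRIMARY KEY,
    form_token TEXT NOT NULL,
    file_name  TEXT NOT NULL
);
""")

def stage_file(form_token, file_name):
    with conn:
        conn.execute(
            "INSERT INTO pending_file (form_token, file_name) VALUES (?, ?)",
            (form_token, file_name),
        )

def submit_ticket(form_token, description):
    # One transaction: create the master row, import the staged files, clear staging.
    with conn:
        cur = conn.execute(
            "INSERT INTO ticket (description) VALUES (?)", (description,)
        )
        ticket_id = cur.lastrowid
        conn.execute(
            "INSERT INTO ticket_file (ticket_id, file_name) "
            "SELECT ?, file_name FROM pending_file WHERE form_token = ? ORDER BY id",
            (ticket_id, form_token),
        )
        conn.execute("DELETE FROM pending_file WHERE form_token = ?", (form_token,))
    return ticket_id

token = str(uuid.uuid4())          # issued when the form is opened
stage_file(token, "screenshot.png")
stage_file(token, "trace.log")
new_id = submit_ticket(token, "Printer on fire")

attached = conn.execute(
    "SELECT file_name FROM ticket_file WHERE ticket_id = ? ORDER BY id", (new_id,)
).fetchall()
```

Staged rows from abandoned forms never reach ticket_file, but they do need a periodic cleanup job, for example one keyed on an upload timestamp.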

Resources