How to update translation table for translation dictionary database

I am implementing a database for a translation dictionary, and am using the design indicated here.
Is there any way to update an entry in the translation table? Or would you need to have a primary key as well in order to facilitate any updates? Ideally, there wouldn't need to be updates, but it is conceivable a translation could be incorrect and need to be changed.
It seems you could delete the incorrect translation and insert a new one. In my case, I have a server DB, and an Android app that will pull down the languages it needs, and the associated words and translations, into a local DB. In this case, while it may be simple to delete the incorrect translation on the server, how would the client know, unless it deleted and repopulated the entire translation table?
Is adding a surrogate primary key, plus a UNIQUE constraint on the two word_id columns, the best way around this?

You can update an entry in the translation table with a statement such as:
UPDATE TRANSLATION_EN_DE
SET ID_DE = 3
WHERE ID_DE = 2
  AND ID_EN = 1;
I would not have one table per language though.
Add a new table for unique languages, and add its primary key to a words table that holds all languages.
Then your translation table would be "word_from" and "word_to".
It will make your design and code much simpler.
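A minimal sketch of that shape (all names here are illustrative, not taken from the linked design):

-- One row per language
CREATE TABLE LANGUAGES (
    ID   INTEGER PRIMARY KEY,
    CODE VARCHAR(8) NOT NULL UNIQUE   -- e.g. 'en', 'de'
);

-- All words for all languages in a single table
CREATE TABLE WORDS (
    ID          INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL REFERENCES LANGUAGES (ID),
    WORD_TEXT   VARCHAR(100) NOT NULL
);

-- One translation table instead of one table per language pair
CREATE TABLE TRANSLATIONS (
    WORD_FROM INTEGER NOT NULL REFERENCES WORDS (ID),
    WORD_TO   INTEGER NOT NULL REFERENCES WORDS (ID),
    PRIMARY KEY (WORD_FROM, WORD_TO)
);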
To propagate changes to the client you'd probably want to version all of the changes in a new column on all tables to take account of new words/translations, spelling corrections, possible removal of words/translations, and have the client record the version number up to which it has retrieved data.
Since you might have deletes that you want to propagate you'll need to use a "soft delete" flag in the tables, because otherwise there would be no record in the table to hold the version number.
You'd probably also want a table holding those version numbers as a unique key, with text describing the changes that have taken place and a timestamp for the change. Remove the timestamp columns from all other tables.
So when you make a new batch of changes: create a new version record, make all of the required changes, and then commit everything in a single transaction. Then the entire change set becomes visible to other database users, and they can very efficiently check whether they are up to date, and retrieve only the relevant changes.
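As a sketch, with illustrative names (the original doesn't prescribe any):

CREATE TABLE CHANGE_VERSION (
    ID          INTEGER PRIMARY KEY,
    DESCRIPTION VARCHAR(200) NOT NULL,
    CHANGED_AT  TIMESTAMP NOT NULL
);

-- Every versioned table carries these two columns:
--   VERSION_ID INTEGER NOT NULL REFERENCES CHANGE_VERSION (ID)
--   IS_DELETED BOOLEAN NOT NULL DEFAULT FALSE   -- soft delete
-- The client remembers the highest version it has seen and pulls only the delta:
SELECT *
FROM TRANSLATIONS
WHERE VERSION_ID > :client_version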

Related

DB - Is table with just one column the right way?

I am trying to build a DB structure for a multi-language admin panel, and one of the entities is Meal_Plans, which will also be referenced by other tables in the design. At the moment I can't see any useful attributes other than id that wouldn't have to be translated (even "active" won't be needed, because all of the Meal Plans will be active by default), so the right way of doing things would seem to be:
TABLE Meal_Plans
    id

TABLE MealPlan_Translations
    mealplan_id
    language_code
    name
    description
    PRIMARY KEY (mealplan_id, language_code)
Is having a table with just one column legit? Because referencing mealplan_id inside MealPlan_Translations won't be correct, given that it won't be a unique value in that table.
Thanks for your help
Such a structure makes sense. It captures the concept of a MealPlan being an entity; you also keep the door open for possible future additions to the model.
Another option would be to use a sequence only for generating MealPlan ids and capture them only in the MealPlan_Translations table. Specifics depend on the DB you're using (e.g. the MSSQL sequence docs).
This option is also viable, but it doesn't allow a situation where a MealPlan doesn't have a translation (which may or may not be OK, depending on the domain you're modelling).
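A sketch of the first option, using the question's names (the types are assumptions):

CREATE TABLE Meal_Plans (
    id INTEGER PRIMARY KEY
);

CREATE TABLE MealPlan_Translations (
    mealplan_id   INTEGER NOT NULL REFERENCES Meal_Plans (id),
    language_code CHAR(2) NOT NULL,
    name          VARCHAR(100) NOT NULL,
    description   VARCHAR(500),
    PRIMARY KEY (mealplan_id, language_code)
);

Note that the foreign key is perfectly legal even though mealplan_id is not unique in MealPlan_Translations; uniqueness is only required on the referenced side (Meal_Plans.id).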

Concept of "version control" for database table rows (Not referring to storing scripts in GIT/SVN)

I require a data store that will maintain not only a history of changes made to data (easy to do) but also store any number of proposed changes to data, including chained proposals (ie. proposal-on-proposal).
Think of these "changes" as really long-running transactions which are saved to the database and have a lifespan of anywhere between minutes and years.
They are created (proposed) and then either rolled back (essentially deleted) or committed; when committed, they become the effective data visible to third parties.
Of course this all requires some form of conflict resolution as proposed changes can be in contradictory states (eg. Change A proposes to delete a record but change B proposes to update it - if change A is committed first then change B will have to revert)
I have found no off-the-shelf product that can do this. The closest was Oracle Workspace Manager but it did not provide for change-on-change or the ability to see proposed deletes. The only way I have been able to achieve this is to have a set of common columns on my versioned tables:
Root ID: Required - set once to the same value as the primary key when the first version of a record is created. This represents the primary key across all of time and is copied into each version of the record. You should consider the Root ID when naming relation columns (eg. PARENT_ROOT_ID instead of PARENT_ID). As the Root ID is also the primary key of the initial version, foreign keys can be created against the actual primary key - the actual desired row will be determined by the version filters defined below.
Change ID: Required - every record is created, updated, deleted via a change
Copied From ID: Nullable - null indicates newly created record, not-null indicates which record ID this row was cloned/branched from when updated/deleted
Effective From Date/Time: Nullable - null indicates proposed record, not-null indicates when the record became current. Unfortunately a unique index cannot be placed on Root ID/Effective From as there can be multiple null values for any Root ID. (Unless you want to restrict yourself to a single proposed change per record)
Effective To Date/Time: Nullable - null indicates current or proposed, not-null indicates when it became historical. Not technically required but helps speed up queries finding the current data. This field could be corrupted by hand-edits but can be rebuilt from the Effective From Date/Time if this occurs.
Delete Flag: Boolean - set to true when it is proposed that the record be deleted upon becoming current. When deletes are committed, their Effective To Date/Time is set to the same value as the Effective From Date/Time, filtering them out of the current data set.
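Pulled together as DDL, this might look like the following (the table name and types are illustrative; the question doesn't give DDL):

-- Common versioning columns on every versioned table
CREATE TABLE ITEM (
    ID             INTEGER PRIMARY KEY,
    ROOT_ID        INTEGER NOT NULL,   -- identity across all versions of the record
    CHANGE_ID      INTEGER NOT NULL,   -- the change that created this version
    COPIED_FROM_ID INTEGER,            -- null = newly created record
    EFFECTIVE_FROM TIMESTAMP,          -- null = still proposed
    EFFECTIVE_TO   TIMESTAMP,          -- null = current or proposed
    DELETE_FLAG    BOOLEAN NOT NULL DEFAULT FALSE
    -- ... the table's actual data columns ...
);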
The query to get the current state of data at a point in time would be:
SELECT * FROM table
WHERE EFFECTIVE_FROM <= :Now
  AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
The query to get the current state of data according to a change would be:
SELECT * FROM table
WHERE CHANGE_ID IN :ChangeIds
   OR (EFFECTIVE_FROM <= :Now
       AND (EFFECTIVE_TO IS NULL OR EFFECTIVE_TO > :Now)
       AND ROOT_ID NOT IN (SELECT ROOT_ID FROM table
                           WHERE CHANGE_ID IN :ChangeIds))
Note that this 2nd query contains the 1st time-based query to overlay the current data with the proposed changed data.
The Change ID column refers to the primary key of a change table, which also contains a nullable parent ID column providing the change-on-change functionality; hence the 2nd query refers to change IDs, not a single change ID. I am filtering multiple versions in a change-on-change scenario in the client rather than in SQL, so it's not seen in those queries (the client holds a linked list of change IDs in memory, and if more than one version of a row is retrieved it uses that list to determine which version to use).
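A sketch of that change table, plus one way to collect a chain of change IDs in SQL, assuming a database that supports recursive CTEs (the original walks the chain in the client instead):

CREATE TABLE CHANGE (
    ID        INTEGER PRIMARY KEY,
    PARENT_ID INTEGER REFERENCES CHANGE (ID),  -- null = top-level change
    NAME      VARCHAR(100) NOT NULL
);

-- Walk from a leaf change up through its ancestors.
WITH RECURSIVE chain (ID) AS (
    SELECT ID FROM CHANGE WHERE ID = :LeafChangeId
    UNION ALL
    SELECT c.PARENT_ID
    FROM CHANGE c JOIN chain ON chain.ID = c.ID
    WHERE c.PARENT_ID IS NOT NULL
)
SELECT ID FROM chain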
Does anybody know of an off-the-shelf product that I could use? It is a large amount of work handling this versioning myself and introduces all manner of issues.
There does not appear to be any off-the-shelf database or database plugin that does what I need. So I ended up utilising Oracle features to implement a solution.
The final table structure is slightly different - "Delete Flag" turned into "Change Action" which is either Add, Remove or Modify.
A global temporary table was used to store the current connection's change identifier/date-time settings, and a stored procedure was created to populate it after connecting. This is referred to as 'context'.
Views joining versioned tables to this temporary, connection-specific context table are created programmatically for every versioned table, including instead-of insert/update/delete triggers which perform the required data versioning.
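A rough Oracle-flavoured sketch of that arrangement (names like VERSION_CONTEXT and ITEM_V are made up here, and the instead-of triggers that perform the actual versioning are omitted):

-- Connection-specific context: one row per session
CREATE GLOBAL TEMPORARY TABLE VERSION_CONTEXT (
    CHANGE_ID INTEGER,
    AS_OF     TIMESTAMP
) ON COMMIT PRESERVE ROWS;

-- Generated per versioned table: expose only the rows visible in the context
CREATE OR REPLACE VIEW ITEM_V AS
SELECT i.*   -- the real view also computes the Change Action column
FROM ITEM i CROSS JOIN VERSION_CONTEXT ctx
WHERE i.CHANGE_ID = ctx.CHANGE_ID
   OR (i.EFFECTIVE_FROM <= ctx.AS_OF
       AND (i.EFFECTIVE_TO IS NULL OR i.EFFECTIVE_TO > ctx.AS_OF)
       AND i.ROOT_ID NOT IN (SELECT ROOT_ID FROM ITEM
                             WHERE CHANGE_ID = ctx.CHANGE_ID));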
The result is that you treat the versioned tables like normal tables (and don't use the suffix _ROOT_ID for foreign keys) for select, insert, update and delete.
Only the Change Action is returned in the views and this is the only field that distinguishes a versioned table from a normal one.
Revert (which doesn't have a SQL keyword) is achieved by a double delete. That is, if we update a record and then want to undo that update, we issue a delete command, which deletes the proposed row, and the record reverts to the current version. Delete is the most fitting SQL keyword for this; the alternative is to make a specific revert stored procedure.
A virtual Change Action of None exists in the views which indicates the record is not affected by the current context.
This all works quite effectively making the concept of versioning largely transparent, the only custom action required is setting the connection after connecting to the database.

General database design: Is it ever considered "okay" to create a non-normalized table on purpose?

After-edit: Wow, this question got long. Please forgive =\
I am creating a new table consisting of over 30 columns. These columns are largely populated by selections made from dropdown lists and their options are largely logically related. For example, a dropdown labeled Review Period will have options such as Monthly, Semi-Annually, and Yearly. I came up with a workable method to normalize these options down to numeric identifiers by creating a primitives lookup table that stores values such as Monthly, Semi-Annually, and Yearly. I then store the IDs of these primitives in the table of record and use a view to join that table out to my lookup table. With this view in place, the table of record can contain raw data that only the application understands while allowing external applications and admins to run SQL against the view and return data that is translated into friendly information.
It just got complicated. Now these dropdown lists are going to have non-logically-related items. For example, the Review Period dropdown list now needs to have options of NA and Manual. This blows my entire grouping scheme out of the water.
Similar constructs that have been used in this application have resorted to storing repeated string values across multiple records. This means you could have hundreds of records with the string 'Monthly' stored in the table's ReviewPeriod column. The thought of this happening has made me cringe since I've started working here, but now I am starting to think that non-normalized data may be the best option here.
The only other way I can think of doing this using my initial method while allowing it to be dynamic and support the constant adding of new options to any dropdown list at any time is this: When saving the data to the database, iterate through every single property of my business object (.NET class in this case) and check for any string value that exists in the primitives table. If it doesn't, add it and return the auto-generated unique identifier for storage in the table of record. It seems so complicated, but is this what one is to go through for the sake of normalized data?
Anything is possible. Nobody is going to haul you off to denormalization jail and revoke your DBA card. I would say that you should know the rules and what breaking them means. Once you have those in hand, it's up to you and your best judgement to do what you think is best.
I came up with a workable method to normalize these options down to numeric identifiers by creating a primitives lookup table that stores values such as Monthly, Semi-Annually, and Yearly. I then store the IDs of these primitives in the table of record and use a view to join that table out to my lookup table.
Replacing text with ID numbers has nothing at all to do with normalization. You're describing a choice of surrogate keys over natural keys. Sometimes surrogate keys are a good choice, and sometimes surrogate keys are a bad choice. (More often a bad choice than you might believe.)
This means you could have hundreds of records with the string 'Monthly' stored in the table's ReviewPeriod column. The thought of this happening has made me cringe since I've started working here, but now I am starting to think that non-normalized data may be the best option here.
Storing the string "Monthly" in multiple rows has nothing to do with normalization. (Or with denormalization.) This seems to be related to the notion that normalization means "replace all text with id numbers". Storing text in your database shouldn't make you cringe. VARCHAR(n) is there for a reason.
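For illustration, a natural-key version of the lookup (hypothetical names, not from the question):

-- The lookup table holds the text itself as the key
CREATE TABLE review_periods (
    review_period VARCHAR(20) PRIMARY KEY   -- 'Monthly', 'Semi-Annually', ...
);

-- The table of record stores readable text, still constrained by the FK
CREATE TABLE reviews (
    id            INTEGER PRIMARY KEY,
    review_period VARCHAR(20) NOT NULL REFERENCES review_periods (review_period)
);

Queries against reviews are readable without a join, and the foreign key still restricts the column to the approved set of values.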
The only other way I can think of doing this using my initial method while allowing it to be dynamic and support the constant adding of new options to any dropdown list at any time is this: When saving the data to the database, iterate through every single property of my business object (.NET class in this case) and check for any string value that exists in the primitives table. If it doesn't, add it and return the auto-generated unique identifier for storage in the table of record.
Let's think about this informally for a minute.
Foreign keys provide referential integrity. Their purpose is to limit the values allowed in a column. Informally, the referenced table provides a set of valid values. Values that aren't in that table aren't allowed in the referencing column of other tables.
But no matter what the user types in, you're going to add it to that table of valid values.
If you're going to accept everything the user types in the first place, why use a foreign key at all?
The main problem here is that you've been poorly served by the people who taught you (mis-taught you) the relational model. (And, probably, equally poorly by the people who taught you SQL.) I hope you can unlearn those mistaken notions quickly, and soon make real progress.

Are created and modified the two fields every database table should have?

I recently realized that I add some form of row-creation timestamp and possibly an "updated on" field to most of my tables. Suddenly I started thinking that perhaps every table in the database should have created and modified fields that are set in the model behind the scenes.
Does this sound correct? Are there any types of high-load tables (like sessions) or massive sized tables that this wouldn't be a good idea for?
I wouldn't put those fields (which I generally call audit fields) on every database table. If it's a low-traffic, high-value table (like Users, for instance), it goes on, no question. I'd also add creator and modifier. If it's a table that gets hit a lot (an operation history table, say), then maybe the benefit isn't worth the cost of increased insert time and storage space.
It's a call you'll need to make separately for each table.
Obviously, there isn't a single rule.
Most of my tables have date-related things, DateCreated, DateModified, and occasionally a Revision to track changes and so on. Do whatever makes sense. Clearly, you can invent cases where it's appropriate and cases where it is not. If you're asking whether you should add them "by default" to most tables, I'd say "probably".
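For concreteness, a typical shape for those audit fields might be (a sketch; the names and the trigger-vs-model choice are yours):

CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by  VARCHAR(50) NOT NULL,
    modified_at TIMESTAMP,               -- set on every update
    modified_by VARCHAR(50)              -- by the model layer or a trigger
);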

trigger insertions into same table

I have many tables in my database which are interrelated. I have a table (table one) into which data is inserted and whose id auto-increments. Once that row has an ID, I want to insert it into another table (table three) along with another set of IDs that come from a form (that data will also be going into a table, so it could come from that table) - the same form that the data inserted into the first table came from.
The two IDs together make the primary key of the third table.
How can I do this? It's to show that more than one ID is joined to a single ID for something else.
Thanks.
You can't do that through a trigger, as the trigger only has available to it the data you already inserted, not data that is currently only residing in your user interface.
Normally how you handle this situation is that you write a stored proc that inserts the row, returns the id value (using scope_identity() in SQL Server, but I'm sure other databases have a method to return the auto-generated id as well), and then uses that value to insert into the other table with the other values you need for that table. You would of course want to wrap the whole thing in a transaction.
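A minimal SQL Server-flavoured sketch of that pattern (all table, column, and procedure names are hypothetical):

CREATE PROCEDURE InsertWithLink
    @name     VARCHAR(100),
    @other_id INT
AS
BEGIN
    BEGIN TRANSACTION;

    -- Insert into table one; its id column auto-increments
    INSERT INTO table_one (name) VALUES (@name);

    -- Capture the id generated by the insert above
    DECLARE @new_id INT = SCOPE_IDENTITY();

    -- Use it together with the form's id for the link row
    INSERT INTO table_three (one_id, other_id) VALUES (@new_id, @other_id);

    COMMIT TRANSACTION;
END;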
I think you can probably do what you're describing (just write the INSERTs to table 3 in the table 1 trigger), but you'll have to put the additional info for the table 3 rows into your table 1 row, which isn't very smart.
I can't see why you would do that instead of writing the INSERTs in your code, where someone reading it can see what's happening.
The trouble with triggers is that they make it easy to hide business logic in the database. I think (and I believe I'm in the majority here) that it's easier to understand, manage, maintain and generally all-round deal with an application where all the business rules exist in the same general area.
There are reasons to use triggers (for propagating denormalised values, for example), just as there are reasons for using stored procedures. I'm going to assert that they are largely related to performance-critical areas. Or should be.
