In the context of a database, we sometimes need to check values against statements like "the customer name is non-empty" or "the customer's number of purchases is positive".
But do such statements constitute rules or policies?
In general, how would you define these concepts, their differences, and their relations?
Thanks in advance.
I think I know what you're talking about; I've run into such distinctions before (even though the English words are not all that different) and here is how I think it plays out in most business computing areas.
A rule in such a context is something that--whether it's a structural fact or a business-imposed statement--will not change, or at least stands only a very small chance of changing. Most statements of the form "X cannot be null" represent rules. "Null" typically doesn't make much sense to a business user; usually you arrive at these rules by examining the way that your model is constructed. A change to a rule has far-reaching consequences to the way that your database and any supporting applications are built.
A policy is more like a business instruction. "Preferred customers get 10% off" may be a policy, but as you know, things like this tend to change. A change to a policy may impact the way your application works, but not its fundamental architecture or underpinnings.
Pragmatically speaking--and it sounds like you may already know this--you want to make policies relatively easy to change. Rules, while they may change, are typically more involved: changing a rule often requires changing code, UIs, mental models, ways of thought, and so on.
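To make the distinction concrete, here is a minimal SQL sketch (table and column names are hypothetical) of a rule baked into the schema versus a policy stored as changeable data:

-- Rule: structural, enforced by the schema itself.
-- Changing it means changing DDL, code, and UIs.
CREATE TABLE customer (
    id             INTEGER PRIMARY KEY,
    name           VARCHAR(100) NOT NULL,    -- "the customer name is non-empty"
    purchase_count INTEGER NOT NULL DEFAULT 0,
    CHECK (name <> ''),
    CHECK (purchase_count >= 0)              -- "the number of purchases is positive"
);

-- Policy: a business instruction, stored as data.
-- "Preferred customers get 10% off" becomes 15% with a simple UPDATE, not a schema change.
CREATE TABLE discount_policy (
    customer_tier    VARCHAR(20) PRIMARY KEY,
    discount_percent NUMERIC(5,2) NOT NULL
);

INSERT INTO discount_policy VALUES ('preferred', 10.00);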
I hope this helps.
In the context of a database, I would argue that it's a rule to have a username, while it's a policy (potentially overridden by administrative or other approval) to allow customers to have a lower assigned discount if they have less than a set number of purchases.
Rule: All users must have a username.
Rule: All users must have a password.
Rule: All users must have a valid email address.
Rule: All users must have a valid credit card on file.
Policy: All users begin with a 0% discount rate on purchases.
Policy: All users are required to pay for shipping.
Rules are outward-facing statements backed by validation. Policies are internal rules backed by consequence.
It could be a policy that later on down the road, a user can change a username (depending on how the software was written), or that the discount and shipping rates assigned on signup may be adjusted to create customer opportunities.
In my estimation then, a rule requires hard validation, while a policy by nature is subject to intervention and/or manipulation.
HTH
Jared
GDPR specifies that personal data must:
Those measures may include pseudonymisation provided that those purposes can be fulfilled in that manner. Where those purposes can be fulfilled by further processing which does not permit or no longer permits the identification of data subjects, those purposes shall be fulfilled in that manner.
In a normal workflow this data is pseudonymized: there is a table in the database holding the personal data, with an ID that is used as a foreign key in the other tables. But in case of a security breach, if the database is stolen, the personal data is no longer pseudonymized.
Does this mean that we need to have another database with the personal data?
EDIT
Added article 32
Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate:
(a) the pseudonymisation and encryption of personal data;
...
[including inter alia as appropriate]
Disclaimer:
I'm not a lawyer or authority on this topic, just sharing my thinking on this from the perspective of a developer who has worked with 'pseudonymised' user databases.
The Oxford English Dictionary definition of pseudonym is:
A fictitious name, especially one used by an author.
‘I wrote under the pseudonym of Evelyn Hervey’
So in the context of GDPR, a pseudonym seems likely to mean some made-up name for an individual that doesn't identify the individual unless combined with some other information. A tangible example might be, as you suggest, a user ID which indexes that individual's personal data in some table.
Ok, so to your question, should this table be isolated in its own database?
The regulation provides its own definition of pseudonymisation which provides some clarification here:
(5) ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;
Why the emphasis on separation?
We know that GDPR is concerned with protecting user privacy.
If pseudonyms are only used in a context that also allows a correspondence to be drawn between the pseudonyms and the individuals they reference, then no privacy has been provided.
So some separation is needed. My reading is that the degree of separation required, and the level of security necessary to enforce it, should be a function of the sensitivity of the data you're holding and the fallout mitigations afforded in case some isolated part of your system is compromised.
So for your example: if storing personal data in a separate database, for whatever reason, allows you to limit some discrete part of your system to only accessing user IDs, then if that part of the system were compromised you've only exposed user IDs, and we might expect that to be viewed more favorably in the eyes of GDPR.
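As a rough illustration (the schemas and names are hypothetical, and this is not legal advice), the separation might look like this: identifying data lives in its own database or schema with tight access control, and the rest of the system only ever sees the pseudonymous ID:

-- Store 1: identifying data, tightly access-controlled.
CREATE TABLE identity_store.person (
    person_id INTEGER PRIMARY KEY,    -- the pseudonym
    full_name VARCHAR(200) NOT NULL,
    email     VARCHAR(200) NOT NULL
);

-- Store 2: operational data, references people only by pseudonym.
CREATE TABLE app.purchase (
    purchase_id  INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL,    -- no direct identifying data here
    amount       NUMERIC(10,2) NOT NULL,
    purchased_at TIMESTAMP NOT NULL
);

If only the operational store is compromised, the attacker learns purchase histories keyed to opaque IDs, not names or email addresses.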
I face the following problem.
I'm creating a database for (say) human beings' info. All human beings may be classified into one of three categories: adult female, adult male, child. It is clear that parameters like "height" and "weight" are applicable to all of the categories. The parameter "number of children" is applicable only to adults, while the parameter "number of pregnancies" is applicable to females only. Also, each parameter may be classified as mandatory or optional depending on the category (for example, for adults the parameter "number of ex-partners" is optional).
When I load (say) "height" and "weight", I check whether the info in these two fields is self-consistent. I.e., I mark as a mistake the record which has height=6'4'' and weight=10 lb (obviously, this is physically impossible). I have several similar verification rules.
When I insert a record about a human being, I need to reflect the following characteristics of the info:
the maximum possible info for the category of this particular human being (including all the optional parameters).
the required minimum of information for the category (i.e., mandatory fields only)
what has actually been inserted for this particular human being (i.e., it is possible to insert whatever I have for this person, whether or not it falls short of the required minimum of info). The non-trivial issue here is that a field "XXX" may have a NULL value because I have never inserted anything there OR because I have intentionally inserted exactly a NULL value. The same logic applies to fields that have a default value. So it should be recorded somewhere that I have processed this particular field.
what amount of the inserted information has been verified (i.e., even if I load some 5 fields, I may check only 3 fields for self-consistency while ignoring the other 2).
So my question is how to technically organize it. Currently, all these required features are either hardcoded with no unified logic or broken into completely independent blocks. I need to create a unified approach.
I have some naive ideas in my head in this regard. For example, for each category of human beings, I can create and store a list of possible fields (I call it a "template"). I can mark those fields that are mandatory.
When I insert a record about a human being, I copy the template and mark which fields from this template have actually been processed. At the next stage, I can mark in this copy of the template those fields that will currently be verified.
The verification module is adjusted accordingly: for each verification procedure I store a list of the fields used in that particular verification procedure. Then I call only those verification procedures whose fields are actually marked "to be verified" in the copy of the template for the particular human being that is to be verified (see the previous passage). A minimal sketch of this template idea follows.
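For illustration only, the template idea might look like this as tables (all names are hypothetical):

-- Which fields exist for each category, and whether they are mandatory.
CREATE TABLE field_template (
    category   VARCHAR(20) NOT NULL,   -- 'adult_female', 'adult_male', 'child'
    field_name VARCHAR(50) NOT NULL,
    mandatory  BOOLEAN NOT NULL,
    PRIMARY KEY (category, field_name)
);

-- Per-person copy of the template: tracks processed/verified status,
-- distinguishing "never touched" from "intentionally set to NULL".
CREATE TABLE person_field_status (
    person_id  INTEGER NOT NULL,
    field_name VARCHAR(50) NOT NULL,
    processed  BOOLEAN NOT NULL DEFAULT FALSE,
    verified   BOOLEAN NOT NULL DEFAULT FALSE,
    PRIMARY KEY (person_id, field_name)
);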
As you see, this is the most straightforward way to solve this problem. But my guess is that there are a lot of quite standardized approaches that I'm not aware of. I really doubt that I'm the first in the world to solve such a problem. I don't like my solution because it is really painful to write the code that correctly reflects, in this copied template, all the "updates" happening to a record.
So, I ask you to share your opinion on how you would solve this problem.
I think there are two questions here:
how do I store polymorphic data in a database?
how do I validate complex business rules?
You should address them separately - trying to solve both at once is probably too hard.
There are a few approaches to polymorphic data in RDBMSes - ORMs use the term inheritance mapping, for instance. The three solutions here - table per class hierarchy, table per subclass and table per concrete class - are "pure" relational solutions. You can also use the "Entity-Attribute-Value" design, or use a document approach (storing data in structured formats such as XML or JSON) - these are not "pure" relational options, but have their place.
Validating complex business rules is often done using rule engines - these are super cool bits of technology, but you have to be sure that your problem really fits with their solution - deciding to invest in a rules engine means your project changes into a rules engine project, not a "humans" project. Alternatively, most mainstream solutions to this embody the business logic about the entities in the application's business logic layer. It sounds like you're outgrowing this.
This exact problem, both in health terms and in terms of a financial instrument, is used as a primary example in Martin Fowler's book Analysis Patterns. It is an extensive topic. As @NevilleK says, you are trying to deal with two questions, and it is best to deal with them separately. One ultra-simplified way of approaching these problems is:
1. Storage of polymorphic data - only put mandatory data that is common to the category in the category table. Put optional data in a separate table in a 1-1 relationship to the category table. Entries are made in these optional tables only if there is a value to be recorded. The record of the verification of the data can also be put in these additional tables.
2. Validate complex business rules - it is useful to consider the types of error that can arise. There are a number of ways of classifying the errors, but the one I have found most useful is: (a) type errors, where one can tell that the value is in error just by looking at the data - e.g. 1980-02-30; (b) context errors, where one can detect the error only by reference to previously captured data - e.g. DoB 1995-03-15 with date of marriage 1996-08-26 (a one-year-old cannot marry); and (c) lies to the system - where the data type is ok and the context is ok, but the information can only be detected as incorrect at a later date when more information comes to light, e.g. if I register my DoB as 1990-12-31 when it is something different. This latter class of error typically has to be dealt with by procedures outside the system being developed. A sketch of how the first two classes might be enforced follows this list.
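A minimal sketch of how the first two error classes might be enforced (Postgres-flavored SQL, hypothetical names); lies to the system, by definition, cannot be caught by constraints:

CREATE TABLE person_record (
    person_id        INTEGER PRIMARY KEY,
    date_of_birth    DATE NOT NULL,   -- type errors like 1980-02-30 are rejected
                                      -- by the DATE type itself
    date_of_marriage DATE,
    -- context error: marriage must come a plausible number of years after birth
    CHECK (date_of_marriage IS NULL
           OR date_of_marriage >= date_of_birth + INTERVAL '16 years')
);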
I would use the Party Role pattern (Silverston):
Party
    id
    name

Individual : Party
    current_weight
    current_height

PartyRole
    id
    party_id
    from_date
    to_date (nullable)

AdultRole : PartyRole
    number_of_children

FemaleAdultRole : AdultRole
    number_of_pregnancies
Postgres has a temporal extension such that you could enforce that a party could only play one role at a time (yet maintain their role histories).
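A rough SQL rendering of the above schema (a sketch only; the names follow the pattern, not any canonical implementation):

CREATE TABLE party (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE individual (
    party_id       INTEGER PRIMARY KEY REFERENCES party(id),
    current_weight NUMERIC(5,1),
    current_height NUMERIC(5,1)
);

CREATE TABLE party_role (
    id        INTEGER PRIMARY KEY,
    party_id  INTEGER NOT NULL REFERENCES party(id),
    from_date DATE NOT NULL,
    to_date   DATE               -- NULL while the role is current
);

CREATE TABLE adult_role (
    party_role_id      INTEGER PRIMARY KEY REFERENCES party_role(id),
    number_of_children INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE female_adult_role (
    adult_role_id         INTEGER PRIMARY KEY REFERENCES adult_role(id),
    number_of_pregnancies INTEGER NOT NULL DEFAULT 0
);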
Use table inheritance. For simplicity, use Single Table Inheritance (which allows NULLs); to avoid NULLs, use Class Table Inheritance.
I have, for example, a table named Cars_Import.
I need a view that grabs the data to be imported; that view is run and does the work of importing the data into the Cars_Import table.
My problem is that I can't give the view the same name; I have to differentiate it, because SQL Server treats same-named objects as a conflict no matter what type of object each one is.
So, for generally accepted best practices in naming conventions: when you have 2 objects that really relate to each other, and knowing it's not good practice to append things like tbl, or vw for view, etc. to the name, what would you suggest here as the view name related to Cars_Import?
I wouldn't want the view to, for example, just have the name switched around, such as Import_Cars; that would work but seems messy to me.
So what's the advice here on naming the table and its related view, which will grab all the data we need from that table? There is no business logic; it just grabs the data, and we're going to import it into a data warehouse, all the data as-is initially.
Views are actually the one place where I don't mind a prefix or suffix that describes what the object is. Unlike a table and a stored procedure, which are quite obviously distinct because they are used differently, tables and views are largely interchangeable. So I find that this differentiation can be helpful when reverse engineering or troubleshooting code (and I'm talking about when you come across the name in a piece of code, not when browsing the objects through Object Explorer, which makes things much more obvious by definition).
Your naming scheme is up to you, and you're largely not going to get a "correct" answer here, other than that you should apply your convention consistently and unilaterally, and do what you can to make sure your entire team buys into it and follows it as well. But I will say that I wouldn't balk at something like this:
Table: dbo.Cars_Import
View: dbo.View_Cars_Import
But to me, this seems to imply that the view may just be something that sits over the table (say, prettifying output, adding or hiding columns, etc.), not something that feeds the table. So I kind of agree with @HABO that maybe there is a better way to name this view that describes what it does.
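For example (a sketch; the staging table and column names are hypothetical), you could name the view for the role it plays in the import rather than for its object type:

-- The name describes the view's role: it is the source feeding Cars_Import.
CREATE VIEW dbo.Cars_Import_Source
AS
SELECT s.VIN, s.Make, s.Model, s.ModelYear
FROM staging.RawCarData AS s;   -- hypothetical staging table

-- The import step then reads from the view:
INSERT INTO dbo.Cars_Import (VIN, Make, Model, ModelYear)
SELECT VIN, Make, Model, ModelYear
FROM dbo.Cars_Import_Source;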
I don't usually design databases, so I'm having some doubts about how to normalize (or not) a table for registering users. The fields I'm having doubts about are:
locationTown: I plan to normalize for countries and have a separate table for them, but should I do the same for towns? I guess users would type this in when registering, rather than choosing from a dropdown. Can one normalize when the input may be coming from users?
maritalStatus: I would have a choice of about 5 or so different statuses.
Also, does anyone know of a good place to find real world database schema/normalizing examples?
Thanks
locationTown - just store it directly in the user table. Otherwise you will have to search for an existing town, taking typos and letter case into account. Also, some people use non-standard characters and languages (Kraków vs. Krakow vs. Cracow; see also: romanization). If you really want to have a table of towns, at least provide an auto-complete box so that users are more likely to choose an existing town. Otherwise, prepare for lots of duplicates or near-duplicates.
maritalStatus - this, on the other hand, should be in a separate table. Or more accurately: use a single character or a number to represent the marital status. An extra table mapping this to a human-readable form is just for convenience (remember about i18n), and a foreign key constraint makes sure incorrect statuses aren't used.
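A minimal sketch of that suggestion (names are hypothetical):

-- Lookup table: one row per status code; the label is for convenience/i18n.
CREATE TABLE marital_status (
    code  CHAR(1) PRIMARY KEY,      -- 'S', 'M', 'D', 'W', ...
    label VARCHAR(30) NOT NULL
);

CREATE TABLE app_user (
    id                  INTEGER PRIMARY KEY,
    username            VARCHAR(50) NOT NULL,
    location_town       VARCHAR(100),    -- stored directly, as suggested above
    marital_status_code CHAR(1) REFERENCES marital_status(code)
);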
I wouldn't worry about it too much - database normalization (3NF, et al) has been over-emphasized in academia and isn't overly practical in industry. In addition, we would need to see your whole schema in order to judge where these implementations are appropriate. Focus on indexing commonly-used columns before you worry about normalization.
You might want to take a look at this SO question before you dive in any further.
Being stuck with a legacy database schema that no longer reflects your data model is every developer's nightmare. Yet with all the talk of refactoring code for maintainability I have not heard much of refactoring outdated database schemas.
What are some tips on how to transition to a better schema without breaking all the code that relies on the old one? I will propose a specific problem I am having to illustrate my point but feel free to give advice on other techniques that have proven helpful - those will likely come in handy as well.
My example:
My company receives and ships products. Now a product receipt and a product shipment have some very different data associated with them so the original database designers created a separate table for receipts and for shipments.
In my one year working with this system I have come to the realization that the current schema doesn't make a lick of sense. After all, both a receipt and a shipment are basically a transaction, they each involve changing the amount of a product, at heart only the +/- sign is different. Indeed, we frequently need to find the total amount that the product has changed over a period of time, a problem for which this design is downright intractable.
Obviously the appropriate design would be to have a single Transactions table with the Id being a foreign key of either a ReceiptInfo or a ShipmentInfo table. Unfortunately, the wrong schema has already been in production for some years and has hundreds of stored procedures, and thousands of lines of code written off of it. How then can I transition the schema to work correctly?
Here's a whole catalogue of database refactorings:
http://databaserefactoring.com/
That's a very difficult thing to work around; a couple of quick options after refactoring the database are:
Create views that match the original schema but pull from the new schema; you may need triggers here (e.g., INSTEAD OF triggers) so any updates through the views can be handled. A sketch of this approach follows this list.
Create the new schema and put in triggers on each side to maintain the other side.
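A sketch of the first option in T-SQL (all table and column names are hypothetical): the old Receipts table becomes a view over a new Transactions table, with an INSTEAD OF trigger so legacy INSERTs keep working:

-- New schema: one table for all stock movements.
CREATE TABLE dbo.Transactions (
    TransactionId INT IDENTITY PRIMARY KEY,
    ProductId     INT NOT NULL,
    Quantity      INT NOT NULL,    -- positive = receipt, negative = shipment
    OccurredAt    DATETIME NOT NULL
);
GO

-- Compatibility view with the old table's name and shape.
CREATE VIEW dbo.Receipts
AS
SELECT TransactionId AS ReceiptId, ProductId, Quantity, OccurredAt
FROM dbo.Transactions
WHERE Quantity > 0;
GO

-- Legacy code can still INSERT INTO Receipts.
CREATE TRIGGER dbo.Receipts_Insert ON dbo.Receipts
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.Transactions (ProductId, Quantity, OccurredAt)
    SELECT ProductId, Quantity, OccurredAt FROM inserted;
END;
GO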
This book (Refactoring Databases) has been a God-send to me when dealing with legacy database schemas, including when I had to deal with almost the exact same issue for our inventory database.
Also, having a system in place to track changes to the database schema (like a series of alter scripts stored in the source control repository) helps immensely in figuring out code-to-database dependencies.
Stored procedures and views are your friend here. Even if the system doesn't use them, change it to use them, then refactor the database underneath.
Your receipts and shipments then become views.
Beware, receipts and shipments are actually two very different beasts in most systems I have worked with. Receipts are linked to suppliers, while shipments are linked to customers (or customer/ship-to locations). At the inventory level, they are often represented the same.
Is all data access limited to stored procedures? If not, the task could be nearly impossible. If so, you just have to make sure your data migration scripts work well transitioning from the old to the new schema, and then make sure your stored procedures honor their inputs and outputs.
Hopefully none of them have "select *" queries. If they do, use 'sp_help tablename' to get the complete list of columns, copy that out and replace each * with the complete column list, just to make sure you don't break client code.
I would recommend making the changes gradually, and do lots of integration testing. It's hard to do a significant remodel without introducing a few bugs.
The first thing is to map out the table schema. I already did that for a legacy database using Enterprise Architect: you can select the DB and it will generate every table and field for you. Then you will need to split everything into categories - for example, all your receipt and shipment products together, and client stuff in another category. Once everything is cleared up, you will be able to refactor fields by creating new tables, new relationships, and new fields. Of course, this will require a lot of changes if everything is accessed without stored procedures.
I don't think it's obvious that the id of the Transactions table should be a foreign key to either ReceiptInfo or ShipmentInfo. Think of it the other way around: in an object-oriented model you would have a Transaction table, and ReceiptInfo or ShipmentInfo should have a foreign key to the Transaction table. If you are lucky, there will be only 1 or 2 points in the code where new records in ReceiptInfo or ShipmentInfo are created. There you should add code that adds an entry to the Transaction table and after that creates the entry in ReceiptInfo or ShipmentInfo with the foreign key to Transaction. A sketch of this direction follows.
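A minimal sketch of that foreign-key direction (names are hypothetical):

CREATE TABLE Transactions (
    TransactionId INT PRIMARY KEY,
    ProductId     INT NOT NULL,
    Quantity      INT NOT NULL,    -- the +/- sign distinguishes receipt from shipment
    OccurredAt    DATETIME NOT NULL
);

-- The detail tables point at the transaction, not the other way around.
CREATE TABLE ReceiptInfo (
    TransactionId INT PRIMARY KEY REFERENCES Transactions(TransactionId),
    SupplierId    INT NOT NULL
);

CREATE TABLE ShipmentInfo (
    TransactionId INT PRIMARY KEY REFERENCES Transactions(TransactionId),
    CustomerId    INT NOT NULL
);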
Sometimes you can create new tables that have better structures and then create views with the names of your old tables, based on the data in the new tables. That way, your code doesn't break while you start to move to a better structure. Be careful with this, though, as sometimes you move from a non-relational table to a relational structure where you have multiple records while the code is expecting only one. This is particularly true if you have developers who use subqueries.
Then, as each thing is changed, it can move away from the views to the real tables. Eventually you can drop the views. This at least allows you to work incrementally, keeping things working as you move stuff while starting to fix things to use a better design.