I have two database tables:
report(id, description) (key: id) and
registration(a, b, id_report) (key: (a, b));
id_report is a foreign key that references report(id).
In the table registration there is the functional dependency a -> id_report.
So the table registration is in 1NF but not in 2NF.
Despite this, I cannot find insert/update/delete problems in the table registration. Is that possible?
Thanks
You said in a comment that you couldn't "find how problems could occur." Here's how.
Let's say your table "registration" starts off with data like this.
a  b   id_report
----------------
1  10  13
1  11  13
1  12  13
2  27  14
2  33  14
The functional dependency a->id_report still holds. When we know the value for "a", we find one and only one value for "id_report".
But the dbms can't directly enforce that dependency. That means the dbms will permit this update statement to run without error.
update registration
set id_report = 15
where a = 1 and b = 10;
a  b   id_report
----------------
1  10  15
1  11  13
1  12  13
2  27  14
2  33  14
Now your data is broken. When we know the value for "a", we now find two values for "id_report". In the earlier table, knowing that "a" equaled 1 meant we knew that "id_report" equaled 13. We no longer know that; if "a" equals 1, id_report might be either 13 or 15.
A table can be denormalized and still not have any existing referential integrity issues.
The reason to normalize is to make it more difficult or impossible to create insert, update and delete anomalies. It is possible, but pretty hard, to manage all of the redundant data such that it remains consistent.
It's still a better idea to use a database in 3NF (or higher, if applicable) so that you aren't relying on programmers and users to keep you out of trouble. Sooner or later mistakes will happen.
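If you want the dbms to enforce that dependency for you, the textbook fix is to decompose registration into 2NF by moving the a -> id_report fact into its own table. A minimal sketch, assuming integer keys (the question doesn't give column types) and a hypothetical table name registration_report:

create table registration_report (
    a         integer primary key,
    id_report integer not null references report(id)
);

create table registration (
    a integer not null references registration_report(a),
    b integer not null,
    primary key (a, b)
);

With this split the update anomaly above cannot happen: each value of "a" maps to exactly one id_report, stored exactly once.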
How can I create a view using a table which has multiple foreign keys referencing the same table on a single field? I have a Product table and a Reference table; there are around 5 foreign keys in the Product table referencing the RefCodeKey field in the Reference table. How can I create a view which shows the product reference codes, joining Product and Reference?
I have a product table as follows:

PK  PTK  PC      PN      RCKey  PSKey  PCKey  PCAKey
1   1    500000  Prod A  5      12     14     98
2   1    500001  Prod B  5      12     14     98
3   1    500002  Prod C  5      11     13     145
4   4    500002  Prod C  10     11     13     76
5   3    500002  Prod C  10     11     13     95
6   1    500005  Prod D  5      12     14     137
I have a Reference Code table as follows:

RefCodeKey  RefCodeType               Code      Label     Status
1           ParentTypeKey             assembly  assembly  Active
2           ParentTypeKey             WHL       WHL       Active
3           ParentTypeKey             TIRE      TIRE      Active
4           ParentTypeKey             TIRE      TIRE      Active
5           RegionCodeKey             1         COMP 1    Active
6           RegionCodeKey             2         COMP 2    Active
7           RegionCodeKey             3         COMP 3    Active
8           RegionCodeKey             4         COMP 4    Active
9           RegionCodeKey             9         COMP 5    Active
10          RegionCodeKey             0         COMP 6    Active
11          ProductStatusKey          CLOSED    CLOSED    Active
12          ProductStatusKey          ACTIVE    ACTIVE    Active
13          ProductClassificationKey  DropShip  DropShip  Active
14          ProductClassificationKey  INFO NA   INFO NA   Active
How can I create a view that displays a result as shown below?

PC      PN      RCKey   PSKey   PCKey
500000  Prod A  COMP 1  ACTIVE  INFO NA
500001  Prod B  COMP 1  ACTIVE  INFO NA
500002  Prod C  COMP 1  CLOSED  DropShip
500002  Prod C  COMP 6  CLOSED  DropShip
500002  Prod C  COMP 6  CLOSED  DropShip
500005  Prod D  COMP 1  ACTIVE  INFO NA
This is a common reporting pattern wherever the database architect has employed the "one true lookup table" model. I'm not going to get bogged down in the merits of that design. People like Celko and Phil Factor are far more erudite than me at commenting on these things. All I'll say is that having reported off over sixty enterprise databases in the last 15 years, that design is pervasive. Rightly or wrongly, you're probably going to see it over and over again.
There is currently insufficient information to definitively answer your question. The answer below makes assumptions about what I think the most likely missing information is.
I'll assume your product table is named PRODUCT
I'll assume your all-powerful lookup table is called REFS
I'll assume RefCodeKey in REFS has a unique constraint on it, or is the primary key
I'll assume the REFS table is relatively small (say < 100,000 rows). I'll come back to this point later.
I'll assume that the foreign keys in the PRODUCT table are nullable. This affects whether we INNER JOIN or LEFT JOIN.
SELECT prod.PC
,prod.PN
,reg_code.label as RCKey
,prod_stat.label as PSKey
,prod_clas.label as PCKey
FROM PRODUCT prod
LEFT JOIN REFS reg_code ON prod.RCKey = reg_code.RefCodeKey
LEFT JOIN REFS prod_stat ON prod.PSKey = prod_stat.RefCodeKey
LEFT JOIN REFS prod_clas ON prod.PCKey = prod_clas.RefCodeKey
;
The trick is that you can refer to the REFS table as many times as you like. You just need to give it a different alias and join it to the relevant FK each time. For example reg_code is an alias. Give your aliases meaningful names to keep your code readable.
Note: Those RCKey/PSKey/PCKey names are really not good names. They'll come back to bite you. They don't represent the key; they represent a description of the thing in question. If it's a region code, call it region_code.
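Since the question asks for a view, here is a sketch wrapping the query above in CREATE VIEW, using more meaningful output names (the view name product_ref_codes is my invention):

CREATE VIEW product_ref_codes AS
SELECT prod.PC
      ,prod.PN
      ,reg_code.label  AS region_code
      ,prod_stat.label AS product_status
      ,prod_clas.label AS product_classification
FROM PRODUCT prod
LEFT JOIN REFS reg_code  ON prod.RCKey = reg_code.RefCodeKey
LEFT JOIN REFS prod_stat ON prod.PSKey = prod_stat.RefCodeKey
LEFT JOIN REFS prod_clas ON prod.PCKey = prod_clas.RefCodeKey;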
The reason I'm assuming the REFS table is relatively small is that if it's really large (I've seen one with 6 million lookup values across hundreds of codesets) and indexed to take RefCodeType into consideration, you might get better performance by adding a filter for RefCodeType to each of your LEFT JOINs. For example:
LEFT JOIN REFS prod_clas ON prod.PCKey = prod_clas.RefCodeKey
AND prod_clas.RefCodeType = 'ProductClassificationKey'
I'm using COLUMNS_UPDATED() in a trigger to identify those columns whose values should be written to an audit table. The trigger / auditing had been working fine for multiple years. I noticed yesterday that the auditing is no longer working consistently.
I've listed the first forty columns of the table in question at the bottom for reference, along with the ORDINAL_POSITION from INFORMATION_SCHEMA.COLUMNS. The table has a total of 109 columns.
I added print COLUMNS_UPDATED() to my trigger to get some debug info.
When I update CurrentOnFleaTick, the 9th column, I see this printed:
0x0001000000000000000000000000
This is expected - the 9th column should be represented as the least significant bit of the second byte. Similarly, if I update HasAttackedAnotherAnimalExplanation I see this:
0x0000010000000000000000000000
Again, expected - the 17th column should be represented as the least significant bit of the third byte.
But... when I update HouseholdIncludesCats, I see this:
0x0000000200000000000000000000
Not expected! Where you see the 2 there should be a 1, as HouseholdIncludesCats' ordinal position is 25, making it the first column represented in the fourth byte, which should be represented by the least significant bit of that byte.
I narrowed things down by updating every column between HasAttackedAnotherAnimalExplanation and HouseholdIncludesCats and found that the 'off by one' problem I'm having starts with HouseTrainedId, ordinal position 24. When updating HouseTrainedId I'm expecting
0x0000800000000000000000000000
but instead I get
0x0000000100000000000000000000
which I believe is wrong, and it is what I expect to be getting for updates to the HouseholdIncludesCats column.
I do not believe the mask should skip ahead. The mask is currently not using the most significant bit of the 3rd byte.
I did recently drop a column, but I don't have a record of its ordinal position. Based on the original code that would have created the table, I believe the ordinal position of the column that was dropped was NOT 24. (I think it was 7... It had been defined after the BreedIds.)
I'm not necessarily looking for a deep root cause determination. If there was something I could do to reset whatever internal data SQL Server uses that'd be fine. Sort of like a rebuild index idea for table metadata? Is there something like that that might fix this?
Thanks in advance for helpful answers! :)
COLUMN_NAME ORDINAL_POSITION
PetId 1
AdopterUserId 2
AdoptionDeadline 3
AgeMonths 4
AgeYears 5
BreedIds 6
Color 7
CreatedOn 8
CurrentOnFleaTick 9
CurrentOnHeartworm 10
CurrentOnVaccinations 11
FoodTypeId 12
GenderId 13
GuardianForMonths 14
GuardianForYears 15
HairCoatLength 16
HasAttackedAnotherAnimalExplanation 17
HasAttackedAnotherAnimalId 18
HasBeenReferredByShelter 19
HasHadTraining 20
HasMedicalConditions 21
HasRecentlyBittenExplanation 22
HasRecentlyBittenId 23
HouseTrainedId 24
HouseholdIncludesCats 25
HouseholdIncludesChildren5to10 26
HouseholdIncludesChildrenUnder5 27
HouseholdIncludesDogs 28
HouseholdIncludesOlderChildren 29
HouseholdIncludesOtherPets 30
HouseholdOtherPets 31
KnowsCommandDown 32
KnowsCommandPaw 33
KnowsCommandSit 34
KnowsCommandStay 35
KnowsOtherCommands 36
LastUpdatedOn 37
LastVisitedVetOn 38
ListingCodeId 39
LitterTypeClumping 40
So... I thought I had googled enough before posting this, but I guess I hadn't. I found this:
https://www.sqlservercentral.com/forums/topic/columns_updated-and-phantom-fields
Using COLUMNPROPERTY() to get the ColumnID is definitely the way to go.
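For anyone who hits this later, a minimal sketch of that fix inside the trigger (T-SQL; dbo.Pets is a stand-in for the real table name). COLUMNPROPERTY returns the internal column id, which is what the bits of COLUMNS_UPDATED() actually track; unlike ORDINAL_POSITION, it is not renumbered when a column is dropped:

DECLARE @colId int = COLUMNPROPERTY(OBJECT_ID('dbo.Pets'), 'HouseTrainedId', 'ColumnId');

-- the column's bit lives in byte (@colId - 1) / 8 + 1 of the mask, at bit (@colId - 1) % 8
IF SUBSTRING(COLUMNS_UPDATED(), (@colId - 1) / 8 + 1, 1) & POWER(2, (@colId - 1) % 8) <> 0
    PRINT 'HouseTrainedId was updated';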
I'm doing some optimizations and I need some technical help regarding optimization.
Consider the following query:
SELECT *
FROM Employees E
JOIN Data D ON E.idDataType = D.idDataType
WHERE E.idDataType = 1
Does it matter (from optimization point of view) if I filter the parent table or the child table? Which one would be faster?
E.idDataType = 1 vs. D.idDataType = 1
Employees:

idEmployee  Name  Data1  idDataType  AttributeValue2
1           A     X      3           xx
2           B     T      2           xx

PrimaryKey(idEmployee);
ForeignKey(idDataType) References Data(idDataType);
Data table:

idDataType  Description
1           etc
2           etc
3           etc
4           etc

PrimaryKey(idDataType);
No, it does not matter; the result will be the same.
Even if the two forms could have different execution times, the database takes care of that: through transitive predicate generation the optimizer derives the equivalent filter on the other table, so both versions are normally executed with the same plan.
The * probably has a greater impact; selecting every column typically costs more than where you place this filter.
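To illustrate (a sketch; exact plans depend on the dbms and version, but mainstream optimizers apply transitive closure to equality predicates):

-- filter on the child table ...
SELECT E.idEmployee, E.Name
FROM Employees E
JOIN Data D ON E.idDataType = D.idDataType
WHERE E.idDataType = 1;

-- ... or on the parent table: the join condition lets the optimizer
-- derive the matching filter on the other side, so the plans come out the same
SELECT E.idEmployee, E.Name
FROM Employees E
JOIN Data D ON E.idDataType = D.idDataType
WHERE D.idDataType = 1;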
I have a database of test data that have been collected on behalf of agents. The test data are grouped together (after the fact) into result sets. As the tests come in, they are stored in the database with the ID of the corresponding agent:
TEST_ID  TEST_OWNER  TIMESTAMP  RESULT_ID
1        1           0          null
2        1           15         null
3        2           30         null
4        2           32         null
5        1           34         null
The result sets are generated at a later time, in such a way as to group tests that took place during a similar time frame. This judgment cannot be made as the tests come in.
RESULT_ID
1
2
3
All of the tests in a result set must belong to the same owner. I can ensure this (in code) as I assign the result IDs to the tests in my later operation, but some things would be easier if I had a TEST_OWNER field in my result set table.
Would adding this field be a violation of some normalization goal? The TEST_OWNER information will be duplicated, even though one instance of it is really implicit. I'm not a DBA, and I don't want to do things that are bad style.
Jim, I am not completely sure if you are saying this is a table in your DB:
TEST_ID  TEST_OWNER  TIMESTAMP  RESULT_ID
1        1           0          null
2        1           15         null
3        2           30         null
4        2           32         null
5        1           34         null
If so, the first thing I would do is pull the result attribute out of this table to achieve normalization. Or is this your Result table?
Regardless, are these results being derived from other data in the DB? If so, I don't see the need to duplicate things and also store the calculated results. Just derive as needed and keep the DB clean, as in the sketch below.
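For example, since all tests in a result set must belong to the same owner, the owner can always be derived instead of stored twice. A sketch, assuming your first table is named TEST and leaving your schema otherwise unchanged:

CREATE VIEW RESULT_SET_OWNER AS
-- one row per result set, provided the same-owner rule holds
SELECT DISTINCT RESULT_ID, TEST_OWNER
FROM TEST
WHERE RESULT_ID IS NOT NULL;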
If you need further info I need a better understanding of what you are presenting.
This question came up based on the responses I got to the question
Getting weird issue with TO_NUMBER function in Oracle
Since everyone suggested that storing numeric values in VARCHAR2 columns is not good practice (which I totally agree with), I am wondering about a basic design choice our team has made and whether there is a better way to design it.
Problem statement: We have many tables where we want to offer a certain number of custom fields. The number of custom fields required is known, but what kind of attribute is mapped to each column is up to the user.
E.g., I am putting down a hypothetical scenario below.
Say you have a Laptop table which stores 50 attribute values for every laptop record. Each laptop's attributes are defined by the admin who creates the laptop.
One user creates a laptop product, let's say lap1, with attributes String, String, numeric, numeric, String.
A second user creates laptop lap2 with attributes String, numeric, String, String, numeric.
Currently the data in our design gets persisted as follows:
Laptop table:

Id  Name  field1  field2  field3  field4  field5
1   lap1  lappy   lappy   12      13      lappy
2   lap2  lappy2  13      lappy2  lapp2   12

This example kind of simulates our requirement and our design.
Now, if somebody is looking up records for lap2 and doing a comparison on field2, we need to apply TO_NUMBER:
select * from laptop
where name='lap2'
and TO_NUMBER(field2) < 15
TO_NUMBER fails in some cases when the query plan decides to apply TO_NUMBER before the other filters.
QUESTIONS
Is this a valid design?
What are the other alternative ways to solve this problem?
One of our teammates suggested creating tables on the fly for such cases. Is that a good idea?
How do popular ORM tools give custom fields or flex fields handling?
I hope I was able to make sense in the question.
Sorry for such a long text..
This is a common problem and there is no perfect solution. A couple of solutions:
1.
Define X fields of type varchar2, Y fields of type number, and Z fields of type date. That comes out as potentially 3 times the number of custom fields, but you will never have any conversion problems again.
Your example would come out like this:
Id  Name  field_char1  field_char2  field_char3  ...  field_num1  field_num2  ...
1   lap1  lappy        lappy        lappy        ...  12          13
2   lap2  lappy2       lappy2       lapp2        ...  13          12
In your example you have the same number of numeric values and character values on both rows but it doesn't have to be this way: the third row could have no numeric field for example.
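A sketch of that layout, trimmed to two fields per type (the table name laptop_typed and the column sizes are my invention):

CREATE TABLE laptop_typed (
  id          NUMBER PRIMARY KEY,
  name        VARCHAR2(50) NOT NULL,
  field_char1 VARCHAR2(100),
  field_char2 VARCHAR2(100),
  field_num1  NUMBER,
  field_num2  NUMBER,
  field_date1 DATE,
  field_date2 DATE
);

-- numeric comparisons no longer need TO_NUMBER:
SELECT * FROM laptop_typed WHERE name = 'lap2' AND field_num1 < 15;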
2.
Define X fields of type varchar2 and apply a bijective function to store number or date fields (for example, a date could be stored as YYYYMMDDHH24MISS). You will also need an extra field that defines the context of the row. You would apply the TO_NUMBER or TO_CHAR function only to rows of the right type.
Your example:
Id  Name  context  field1  field2  field3  field4  field5
1   lap1  type A   lappy   lappy   12      13      lappy
2   lap2  type B   lappy2  13      lappy2  lapp2   12
You could query the table using DECODE or CASE:
SELECT *
FROM laptop
WHERE CASE WHEN context = 'TYPE A' THEN to_number(field3) END = 12
The second design is the one used in the Oracle Financials ERP (among others). The context allows you to define CHECK constraints with this design (for example CHECK (CASE WHEN context = 'TYPE A' THEN to_number(field3) END > 0)) to ensure integrity, as sketched below.
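A sketch of such a constraint, assuming the table above is named laptop (the constraint name is my invention; rows of other contexts pass because the CASE yields NULL, and a CHECK constraint accepts NULL):

ALTER TABLE laptop ADD CONSTRAINT laptop_type_a_field3_pos
  CHECK (CASE WHEN context = 'TYPE A' THEN TO_NUMBER(field3) END > 0);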
This is a common scenario with shrink-wrapped apps, where it represents the only opportunity for customizing the data model. But from a purist point of view it is bad practice. Because if a column can contain '27-MAY-2010' or 178.50 or 'Red badger' then clearly it is dependent on something external to the database to give it meaning.
But using an XMLType is even worse because you lose what little structure you have. It becomes difficult to query on the flexible columns. Still there are some scenarios where this is the appropriate solution: mainly when we're not interested in the individual elements, just the collection of properties.
So, what is the best way of dealing with it? Customised functions to go with your custom columns:
SQL> create or replace function get_number
2 ( p_str in varchar2 )
3 return number
4 deterministic
5 is
6 return_value number;
7 begin
8 begin
9 return_value := to_number(trim(p_str));
10 exception
11 when others then
12 return_value := null;
13 end;
14 return return_value;
15 end;
16 /
Function created.
SQL>
We can build a function-based index on this column, for performance:
SQL> create index t42_flex_idx on t42 ( get_number( flex_col))
2 /
Index created.
SQL>
So given this test data ....
SQL> select * from t42
2 /
ID FLEX_COL
---------- ------------------------------
1 27-MAY-2010
2 138.50
3 Red badger
2 23
SQL>
... here's how it works:
SQL> select * from t42
2 where get_number(flex_col) < 50
3 /
ID FLEX_COL
---------- ------------------------------
2 23
SQL>
If all of the column types are decided at the time the table is created, then generating tables on the fly sounds good to me.
However, if two users are using the same table with different fields, you could create new tables just for the custom fields and join them to the main table. This is more of an object oriented approach.
Could you create an XML graph in the code layer and store it in a SYS.XMLTYPE field type?
http://www.oracle-base.com/articles/9i/XMLTypeDatatype.php
This would allow you to strongly type (in XML) your values and retain meaningful structure.
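A sketch of what that could look like, with EXTRACTVALUE pulling typed values back out (the table, column, and element names are all my invention):

CREATE TABLE laptop_x (
  id    NUMBER PRIMARY KEY,
  name  VARCHAR2(50),
  props SYS.XMLTYPE
);

INSERT INTO laptop_x VALUES (1, 'lap1',
  SYS.XMLTYPE.CREATEXML('<props><ram_gb>12</ram_gb><color>lappy</color></props>'));

-- the element names keep the meaning that anonymous field1..field5 columns lose
SELECT id, name
FROM laptop_x
WHERE TO_NUMBER(EXTRACTVALUE(props, '/props/ram_gb')) < 15;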