I have a table with about 20 columns. It works well: it is related to another table many-to-many, and the design is good so far.
Now I need to store another type of data which has almost the same columns: 15 columns are basically the same, but 5 columns are completely different between the two types. So how do I do it?
1. Do I store it in one table (adding new columns to the first table, so that some 5 columns would be NULL depending on which type of data we save), or
2. do I split it into 2 tables, so that each table repeats the columns of the other, or
3. some other way?
COLUMNS CASE 1 - STORE IN ONE TABLE
ID INTEGER
Description NVARCHAR(max)
Value FLOAT
ValueEUR FLOAT
ValueUSD FLOAT
YearOfBuilt SMALLINT
New BIT
Requirements NVARCHAR(max)
FIELD A1
FIELD A2
FIELD A3
FIELD A4
FIELD A5
FK_TypeID INT
Do I add 5 new fields for the other document type to the above table:
FIELD B1 FLOAT
FIELD B2 FLOAT
FIELD B3 FLOAT
FIELD B4 FLOAT
FIELD B5 FLOAT
In this case, if I save data for type 1, the saved field values would be NULL in columns FIELD B1 ... FIELD B5.
Well, it would work, but then some fields would be empty, depending on the type.
Or do I make the table design as in case 2 below?
COLUMNS CASE 2 - STORE IN TWO TABLES
TABLE 1
ID INTEGER
Description NVARCHAR(max)
Value FLOAT
ValueEUR FLOAT
ValueUSD FLOAT
YearOfBuilt SMALLINT
New BIT
Requirements NVARCHAR(max)
FIELD A1
FIELD A2
FIELD A3
FIELD A4
FIELD A5
FK_TypeID INT
TABLE 2 - Repeats the same columns but excludes the columns not needed for this table
ID INTEGER
Description NVARCHAR(max)
Value FLOAT
ValueEUR FLOAT
ValueUSD FLOAT
YearOfBuilt SMALLINT
New BIT
Requirements NVARCHAR(max)
FIELD B1 FLOAT
FIELD B2 FLOAT
FIELD B3 FLOAT
FIELD B4 FLOAT
FIELD B5 FLOAT
FK_TypeID INT
Now, in case 2, I would repeat the same columns but exclude the ones not needed. This means the same column names, but the data is stored so as to save space (no NULL values).
Which approach would be better?
Or is there another way to do it? I am not a beginner in database design.
Generally we split into two tables - this is called Database Normalisation: https://en.wikipedia.org/wiki/Database_normalization
Especially so if the shared columns are related in some way, e.g. if they are contact details for two different kinds of people (say customers vs employees), then you can name the new table after that (e.g. the new table would be called contact_details or something).
Basically: you're not saving anything (not space, not performance, nothing) by storing it all in the same table. All you're doing is making the data more confusing for yourself and other developers to understand in the future.
Also, think of the future queries you'll write - every one of them will have to filter out the data that's not relevant, which makes every single query more complex. Better to store different things in different tables and make it easy to understand at a glance what's going on in your database and code.
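As a minimal sketch of that split, assuming your case-2 layout (the FIELD A*/B* types are placeholders, assumed FLOAT here, since the question doesn't give them):

-- Sketch of case 2: one table per document type, shared columns repeated.
CREATE TABLE DocumentTypeA (
    ID           INT PRIMARY KEY,
    Description  NVARCHAR(max),
    Value        FLOAT,
    ValueEUR     FLOAT,
    ValueUSD     FLOAT,
    YearOfBuilt  SMALLINT,
    New          BIT,
    Requirements NVARCHAR(max),
    FieldA1      FLOAT,  -- types of A1..A5 are assumptions
    FieldA2      FLOAT,
    FieldA3      FLOAT,
    FieldA4      FLOAT,
    FieldA5      FLOAT
);

CREATE TABLE DocumentTypeB (
    ID           INT PRIMARY KEY,
    Description  NVARCHAR(max),
    Value        FLOAT,
    ValueEUR     FLOAT,
    ValueUSD     FLOAT,
    YearOfBuilt  SMALLINT,
    New          BIT,
    Requirements NVARCHAR(max),
    FieldB1      FLOAT,
    FieldB2      FLOAT,
    FieldB3      FLOAT,
    FieldB4      FLOAT,
    FieldB5      FLOAT
);

Note that once each type has its own table, the FK_TypeID discriminator arguably becomes redundant, since the table itself identifies the type.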
Let's say I have a canvas to which I can add various objects, such as a:
Drawing
Image
Chart
Note
Table
For each object I need to store the dimensions and the layer order, for example something like this:
ObjectID
LayerIndex
Dimensions ((x1, y1), (x2, y2))
Each of the objects has vastly different properties and so is stored in a different table (or class or whatever). Would it be possible to store this in a relational database, and if so, how could it be done? In JSON it would be something like this:
// LayerIndex is the array index
// No need to store ObjectID, since the object is stored within the array itself
Layers = [
  {Type: Drawing, Props: <DrawingPropertyObj>, Dimensions: [(1,2), (3,4)]},
  {Type: Chart,   Props: <ChartPropertyObj>,   Dimensions: [(3,4), (10,4)]},
  {Type: Table,   Props: <TablePropertyObj>,   Dimensions: [(10,20), (30,44)]},
  ...
]
The one option I thought of is storing a FK to each table, but in that case, I could potentially join this to N different tables for each object type, so if there are 100 object types, ...
A "strict" relational database doesn't suit this task well because you're left with a choice of:
Different tables for each object type with a columns for each attribute that applies to that particular object type
A single table for all object types, with columns for each attribute, most of which aren't used for any given object type
A child table, one row for each attribute
Before moving on to a good general solution, let's discuss these:
1. Different tables for each object type
This is a non-starter. The problems are:
high maintenance cost: you must create a new table every time you add a new object type to your app
painful queries: you must join to every table, either horizontally - every table joined into one enormously long row - or vertically, in a series of unioned queries, leading to a sparse array (see option 2)
2. A single table for all object types
Although you're dealing with a sparse array, if most object types use most of the attributes (i.e. it's not that sparse), this is a good option. However, if the number of different attributes across your domain is high, and/or most attributes aren't used by all types, you have to add columns when introducing a new type. Although that's better than adding tables, it still requires a schema change for each new type = high maintenance.
3. A child table
This is the classic approach, but the worst to work with: you either have to run a separate query to collect all the attributes for each object (slow, high maintenance), or write separate queries for each object type, joining to the child table once for each attribute to flatten out the many rows into one row per object. That effectively reproduces option 1, but with an even higher maintenance cost for writing the queries.
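To illustrate that flattening (table and attribute names here are hypothetical), reassembling one object type from a child attribute table takes one join per attribute:

select o.id,
       t.propertyvalue as title,
       v.propertyvalue as chart_values
from   object o
join   attribute t on t.object_id = o.id and t.propertyname = 'title'
join   attribute v on v.object_id = o.id and v.propertyname = 'values'
where  o.type = 'chart';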
None of those are great options. What you want is:
One row per object
Simple queries
Simple schema
Low maintenance
A document database, such as Elasticsearch, gives you all of this out of the box, but you can achieve the same effect with a relational database by relaxing "strictness" and saving the whole object as JSON in a single column:
create table object (
id int, -- typically auto incrementing
-- FK to parent - see below
json text -- store object as json
);
BTW, Postgres would be a good choice, because it has native support for JSON via the json datatype.
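For instance, a sketch of the Postgres variant using jsonb, which also lets you query inside the stored object (key names here follow the question's JSON shape):

create table object (
  id   serial primary key,
  json jsonb  -- store object as json
);

-- find all drawings with a given colour, without any per-type columns
select id
from   object
where  json ->> 'Type' = 'Drawing'
and    json -> 'Props' ->> 'color' = '#00FF00';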
I have used this several times in my career, always successfully. I added a column for the object's class type (in a Java context):
create table object (
id int,
-- FK to parent - see below
class_name text,
json text
);
and used a JSON library to deserialize the JSON, using the specified class, into an object of that class. Whatever language you're using will have a way of achieving this idea.
As for the hierarchy, a relational database does this well. From the canvas:
create table canvas (
id int,
-- various attributes
);
If objects are not reused:
create table object (
id int,
canvas_id int not null references canvas,
class_name text,
json text,
layer int not null
);
If objects are reused:
create table object (
id int,
class_name text,
json text
);
create table canvas_object (
canvas_id int not null references canvas,
object_id int not null references object,
layer int not null
);
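Rebuilding a canvas is then a single query; for the reused-objects layout above, a sketch:

select o.id, o.class_name, o.json, co.layer
from   canvas_object co
join   object o on o.id = co.object_id
where  co.canvas_id = 1
order  by co.layer;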
You have many options, as shown below.
There is not much difference in which one you pick, but I would avoid the multi-table design, which is the one you described: an object type with 100 properties would be scattered across 101 tables for no gain, meaning 101 disk page accesses for each object type read. That's unnecessary (if those pages are cached the problem is smaller, but it is still waste).
Even the dual table is not really necessary if you don't need to filter on things like 'all objects with color=red', but I guess performance is not so pressing as to reach that point; other things matter more, or other bottlenecks have more influence on performance. So pick whichever of the no-more-than-dual-table options fits you best.
Single table - flexible schema per object type
objlayerindex  type     props                                     x0  y0  x1  y1
---------------------------------------------------------------------------------
0              drawing  {color:#00FF00,background-color:#00FFFF}   1   2   3   4
1              chart    {title:2021_sales,values:[[0,0],[3,4]]}   11  22  33  44
In props, keys are used for flexibility: different objects of the same type may have different keys. For example, a chart without a subtitle can simply omit that key.
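As DDL, this option might look like the following sketch (names taken from the table above; props kept as free text holding JSON):

create table layer_object (
  objlayerindex int primary key,
  type          text not null,
  props         text,  -- flexible key/value bag, e.g. JSON
  x0 int, y0 int, x1 int, y1 int
);

insert into layer_object values
  (0, 'drawing', '{"color":"#00FF00","background-color":"#00FFFF"}',  1,  2,  3,  4),
  (1, 'chart',   '{"title":"2021_sales","values":[[0,0],[3,4]]}',    11, 22, 33, 44);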
Single table - fixed schema per object type
objlayerindex  type     props                       x0  y0  x1  y1
--------------------------------------------------------------------
0              drawing  #00FF00,#00FFFF              1   2   3   4
1              chart    2021_sales,"[[0,0],[3,4]]"  11  22  33  44
This schema is fixed: drawing always has color + background-color, chart always has title + values, etc. Less space is used, but changing the schema involves some work on already existing data.
Dual table
Main
objlayerindex  type     x0  y0  x1  y1
----------------------------------------
0              drawing   1   2   3   4
1              chart    11  22  33  44
Properties
objlayerindex  propertyname      propertyvalue
------------------------------------------------
0              color             #00FF00
0              background-color  #00FFFF
1              title             2021_sales
1              values            [[0,0],[3,4]]
Here we assume that property ordering is not important; if it is, an extra propertyindex column would be needed. For those who love normalization, it is also possible to move propertyname out of this table into a propertykey-propertydescription table and reference it by propertykey.
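A sketch of the dual-table layout in SQL, including the kind of property filter mentioned earlier:

create table main (
  objlayerindex int primary key,
  type          text not null,
  x0 int, y0 int, x1 int, y1 int
);

create table properties (
  objlayerindex int not null references main,
  propertyname  text not null,
  propertyvalue text,
  primary key (objlayerindex, propertyname)
);

-- e.g. 'all objects with color=red'
select m.*
from   main m
join   properties p on p.objlayerindex = m.objlayerindex
where  p.propertyname = 'color'
and    p.propertyvalue = 'red';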
Multi table
Main
objlayerindex  type     x0  y0  x1  y1
----------------------------------------
0              drawing   1   2   3   4
1              chart    11  22  33  44
Color
objlayerindex  colorcode
-------------------------
0              #00FF00
Background-Color
objlayerindex  colorcode
-------------------------
0              #00FFFF
Title
objlayerindex  title
------------------------
1              2021_sales
Values
objlayerindex  values
-------------------------
1              [[0,0],[3,4]]
Specifically this kind of data can be normalized one extra level:
Values
objlayerindex  datapoint  x  y
--------------------------------
1              0          0  0
1              1          3  4
You can also use non-relational formats.
Document (Json) Store
[
  {type:drawing, props:{color:#00FF00,background-color:#00FFFF}, position:[1,2,3,4]},
  {type:chart,   props:{title:2021_sales,values:[[0,0],[3,4]]},  position:[11,22,33,44]}
]
JSON is cited here because it is a popular and simple format, but different encodings can be used instead (CSV, Protocol Buffers, Avro, etc.).
The commission classification column should be able to store integers up to a maximum value of 99 and be named comm_id. The value of the Comm_id column should be set to a value of 10 automatically if no value is provided when a row is added. The benefits code column should also accommodate integer values up to a maximum of 99 and be named ben_id.
A new table, commrate, must be created to store the commission rate schedule and must contain the following columns:
comm_id: a numeric column similar to the one added to the ACCTMANAGER table
Comm_rank: a character field that can store a rank name, allowing up to 15 characters
Rate: a numeric field that can store two decimal digits (such as .01 or .03)
A new table, benefits, must be created to store the available benefit plan options and must contain the following columns:
ben_id: a numeric column similar to the one added to the ACCTMANAGER table
ben_plan: a character field that can store a single character value
ben_provider: a numeric field that can store a three-digit integer
active: a character field that can hold a value of Y or N
My code for Oracle is below.
The first statement is for the ACCTMANAGER table:
alter table ACCTMANAGER
add ( Comm_id number(2) default 10,
Ben_id number(2)
);
The reason I chose NUMBER(2) is that the column wants a max value of 99.
The second table:
create table COMMRATE
( Comm_id number(2) default 10,
Comm_rank varchar2(15),
Rate number(0,2)
);
I think that's right, but the problem I have is with comm_rank, because I can choose either VARCHAR2 or CHAR, and I prefer VARCHAR2.
The third table, for benefits:
create table BENEFITS
( Ben_id number(2),
Ben_plan char(1),
Ben_provider number(3),
Active varchar2(1)
);
For the last column, active, I chose VARCHAR2, but I think it might be better to choose CHAR because it's just one character that can hold a value of Y or N. Or should I pick CHAR?
To provide context, this question comes from the text Oracle 12c: SQL by Joan Casteel. The question is from chapter 3, which has not yet covered the concept of constraints; constraints are covered in chapter 4.
For the second table, I got:
CREATE TABLE commrate
( comm_id NUMBER(2) DEFAULT 10,
comm_rank VARCHAR2(15),
rate NUMBER(2,2) );
The only difference was the change from 0 to 2 in "rate NUMBER(2,2)", because with the syntax NUMBER(p,s), p indicates the total number of digits to the left and right of the decimal position.
.01 to .99 have 2 digits to the right of the decimal.
I think...
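A quick way to convince yourself is to try it (a sketch you can run in Oracle; the test table name is made up):

CREATE TABLE commrate_test
( rate NUMBER(2,2) );

INSERT INTO commrate_test VALUES (0.01);  -- succeeds
INSERT INTO commrate_test VALUES (0.99);  -- succeeds
INSERT INTO commrate_test VALUES (1.00);  -- fails: ORA-01438, value larger than specified precision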
I have a legacy database that I am doing some ETL work on. I have columns in the old table that are conditionally mapped to columns in my new table. The conditions are based on an associated column (a column in the same table that represents the shape of an object, we can call that column SHAPE). For example:
Column dB4D is mapped to column:
B4 if SHAPE=5
B3 if SHAPE=1
X if SHAPE=10
or else Y
I am using a Conditional Split to split the table based on SHAPE, and then 10-15 "copy column" transformations to take the old column (dB4D) and map it to the new column (B4, B3, X, etc.).
Some of these columns "overlap". For example, I have multiple legacy columns (dB4D, dB3D, dB2D, dB1D, dC1D, dC2D, etc) and multiple new columns (A, B, C, D, etc). In one of the "copy columns" (which are broken up by SHAPE) I could have something like:
If SHAPE=10
+--------------+--------------+
| Input Column | Output Alias |
+--------------+--------------+
| dB4D | B |
+--------------+--------------+
If SHAPE=5
+--------------+--------------+
| Input Column | Output Alias |
+--------------+--------------+
| dB4D | C |
+--------------+--------------+
I now need to bring these all together into one final staging table (or "destination"). No two rows will have the same shape, so there is no conflict. But I need to map dB4D (and other columns) to different new columns based on a value in another column. I have tried to merge them, but I can't merge multiple data sources. I have tried to join them, but then not all columns (or output aliases) would show up in the destination. Can anyone recommend how to resolve this issue?
Here is the current design that may help:
As inputs to your data flow, you have a set of columns dB4D, dB3D, dB2D, etc.
Your destination will only have column names that do not exist in your source data.
Based on the Shape column, you'll project the dB columns into different mappings for your target table.
If the Conditional Split logic makes sense as you have it, don't try to Union All it back together. Instead, just wire up 8 OLE DB Destinations. You'll probably have to change them from the "fast load" option to the table name option, which means singleton inserts, so hopefully the data volumes won't be an issue. If they are, then create 8 staging tables that do use the "fast load" option, and have a successor task to your Data Flow perform set-based inserts into the final table.
The challenge you'll run into with the Union All component is that if you make any changes to the source, the Union All rarely picks up on the change (the column changed from varchar to int, sorry!).
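If you end up with the staging tables, the successor task can be plain set-based inserts; a sketch with hypothetical table and column names:

-- One INSERT ... SELECT per shape-specific staging table,
-- run after the data flow completes.
INSERT INTO dbo.FinalDestination (A, B, C, D)
SELECT A, B, C, D FROM dbo.Staging_Shape5;

INSERT INTO dbo.FinalDestination (A, B, C, D)
SELECT A, B, C, D FROM dbo.Staging_Shape10;

-- ...and so on for the remaining shapes.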
Let's say I have items A, B, and C in Table1.
They all have attribute f1; however, A and B have f2, which does not apply to C.
Table1 would be designed as:
itemName   f1    f2
------------------------------------
A          100   50
A          43    90
B          66    10
C          23
There would be another table, Table2, containing all the possible values of f2:
itemName   f2 (possible value)
------------------------------------
A          50
A          90
A          77
B          10
Let's say now I want to add a record with the highest value of f2 into Table1, depending on the itemName. Things work fine for A and B. But in the case of C, when I loop through Table2, since there is no record of C in Table2, I cannot distinguish between a corrupted table and the fact that C just does not have attribute f2.
The only two ways I can think of to solve this issue are:
1. Adding a constraint in the code, like:
if (itemName == C)
    "Do not search Table2"
else
    search Table2
    if (no record)
        return "Corrupted Table"
Or
2. Adding another bool field "having_f2" in Table1 to help identify that f2 does not apply to C.
The above is just an example of where to put such business-logic constraints: in the DB or in the code.
Can you give me more opinions on the tradeoff between these two approaches? In other words, which one makes more sense?
Since this is basically field validation ("whether MyModel can have property f2 set to NULL (nonexistent)"), I would say you should do that in a validator of your model.
Only if that is impossible, add some columns to model tables.
The rule I use is the following: the database is used to store model data. You should try to store nothing else except data, if possible. In your case, has_f2 is not data, but a business rule.
Of course, there are exceptions to this rule. For example, sometimes business logic must be controlled by the user and in this case it is perfectly ok to store it in the database.
Regarding your second proposal: you can typically also just query for a NULL value in the table, which would be the same as adding and setting a boolean attribute (and is better with respect to redundancy). This would also be the way to detect whether the table is "corrupt". However, you can also start your query by collecting all "itemName" entries from Table2, possibly building an intersection with Table1, and inserting the cases of interest into Table1:
1.) Intersect the "itemName" from table1 and table2 => table3
2.) Join the table3 and table2 on "itemName", "f2" => insert each tuple into table1
Alternatively, you can also split Table1 into two tables, { "itemName", "f1" } and { "itemName", "f2" }, which would eliminate your problem.
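A sketch of that split (hypothetical names and types): C simply has no row in the f2 table, so "does not apply" is the absence of a row rather than a NULL, and the highest-f2 lookup becomes:

create table item_f1 (itemName varchar(10) not null, f1 int);
create table item_f2 (itemName varchar(10) not null, f2 int);

-- highest f2 per item; items without f2 (like C) just return no row
select itemName, max(f2) as max_f2
from   item_f2
group  by itemName;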
I have to add a coupon table to my DB. There are 3 types of coupons: percentage, amount, or 2-for-1.
So far I've come up with a coupon table that contains these 3 fields; if, say, the percentage value is not NULL, then it's that kind of coupon.
I feel it's not the proper way to do it. Should I create a CouponType table, and how would you see it? Where would you store these values?
Any help or clue appreciated!
Thanks,
Teebot
You're correct: I think a CouponType table would be a fit for your problem.
Two tables: Coupons and CouponTypes. Store the CouponTypeId inside the Coupons table.
So for example, you'll have a Coupon record called "50% off"; it would reference the percent-off CouponType record, and from there you could determine the logic to take 50% off the cost of the item.
So now you can create unlimited coupons. If it's a dollar-amount coupon type, it will take the "amount" column and treat it as a dollar amount; if it's a percent-off type, it will treat it as a percentage; and if it's an "x for 1" deal, it will treat the value as x.
- Table Coupons
- ID
- name
- coupon_type_id # (or whatever fits your style guidelines)
- amount # Example: 10.00 (treated as $10 off for amount type, treated as
# 10% for percent type or 10 for 1 with the final type)
- expiration_date
- Table CouponTypes
- ID
- type # (amount, percent, <whatever you decided to call the 2 for 1> :))
In the future you might have many more coupon types. You could also have different business logic associated with them - you never know. It's always useful to do things right in this case, so yes, definitely create a coupon type field and an associated dictionary table to go with it.
I would definitely create a CouponType lookup table. That way you avoid all the NULLs and allow for more coupon types in the future.
Coupon
coupon_id INT
name VARCHAR
coupon_type_id INT <- Foreign Key
CouponType
coupon_type_id INT
type_description VARCHAR
...
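A sketch of this in SQL, with a CASE expression showing how the single amount column can be interpreted per type (the 100.00 item price is just an example value):

CREATE TABLE CouponType (
  coupon_type_id   INT PRIMARY KEY,
  type_description VARCHAR(50)  -- 'amount', 'percent', 'x for 1'
);

CREATE TABLE Coupon (
  coupon_id       INT PRIMARY KEY,
  name            VARCHAR(100),
  coupon_type_id  INT REFERENCES CouponType (coupon_type_id),
  amount          DECIMAL(10,2),
  expiration_date DATE
);

-- price of a 100.00 item after applying each coupon
SELECT c.name,
       CASE ct.type_description
         WHEN 'amount'  THEN 100.00 - c.amount            -- treated as $ off
         WHEN 'percent' THEN 100.00 * (1 - c.amount/100)  -- treated as % off
         ELSE 100.00                                      -- 'x for 1' handled in app logic
       END AS price_after_coupon
FROM   Coupon c
JOIN   CouponType ct ON ct.coupon_type_id = c.coupon_type_id;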
Or I suppose you could just have a coupon type column in your Coupon table, CHAR(1).