Redundant info in new table?

A TASK is linked to a TASK_ROLE table that associates parties and roles with a specific TASK. I need another table, TASK_RATE, that stores the rate of every party that participates in a specific TASK. I could get those associations with a FK to TASK_ROLE, but the problem is that a party can have more than one role and so can be listed more than once.
TASK:
PKIDTASK
--------
1
2
3
TASK_ROLE:
PKIDTASK_ROLE | IDTASK | IDPARTY | IDROLE
--------------+--------+---------+-------
1             | 1      | 8       | 3
2             | 1      | 8       | 2
3             | 1      | 5       | 2
4             | 1      | 4       | 2
I thought about creating the table like this:
TASK_RATE:
PKIDTASK_RATE | IDTASK | IDPARTY
--------------+--------+--------
1             | 1      | 8
2             | 1      | 5
3             | 1      | 4
Here IDTASK + IDPARTY is unique, unlike in TASK_ROLE. But wouldn't this be redundant, since the PARTY-TASK associations are already defined in TASK_ROLE?
How do I solve this?

If you want to fully normalize your data model, you probably need a model along these lines.
Since parties are assigned to tasks, and these assignments have various other related data, you should start by resolving the many-to-many relationship between PARTY and TASK with an intersection table of party task assignments.
The other things you are tracking are offshoots of this intersection. For example, within each party task assignment, the party is assigned one (or more?) roles.
Similarly, for a given party task assignment there will be multiple rates over time, so the Party Task Rate table will need an effective date and an expiry date (plus time, if appropriate).
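The model described in this answer can be sketched as DDL. This is a minimal sketch using Python's built-in sqlite3; the table and column names (party_task, party_task_role, party_task_rate, effective_date, expiry_date) and the rate values are assumptions for illustration, not taken from the original diagram. It shows party 8 appearing once per task, keeping both of its roles, and having one rate per period.

```python
import sqlite3

# Hypothetical DDL for the model described above; all names are assumptions.
SCHEMA = """
CREATE TABLE party (party_id INTEGER PRIMARY KEY);
CREATE TABLE task  (task_id  INTEGER PRIMARY KEY);
CREATE TABLE role  (role_id  INTEGER PRIMARY KEY);

-- Intersection of PARTY and TASK: a party appears at most once per task.
CREATE TABLE party_task (
    party_task_id INTEGER PRIMARY KEY,
    task_id       INTEGER NOT NULL REFERENCES task(task_id),
    party_id      INTEGER NOT NULL REFERENCES party(party_id),
    UNIQUE (task_id, party_id)
);

-- Offshoot 1: the role(s) a party plays within one assignment.
CREATE TABLE party_task_role (
    party_task_id INTEGER NOT NULL REFERENCES party_task(party_task_id),
    role_id       INTEGER NOT NULL REFERENCES role(role_id),
    PRIMARY KEY (party_task_id, role_id)
);

-- Offshoot 2: the rate for one assignment, valid between two dates.
CREATE TABLE party_task_rate (
    party_task_id  INTEGER NOT NULL REFERENCES party_task(party_task_id),
    rate           NUMERIC NOT NULL,
    effective_date TEXT NOT NULL,   -- ISO dates compare correctly as text
    expiry_date    TEXT,            -- NULL means "still in effect"
    PRIMARY KEY (party_task_id, effective_date)
);
"""

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO task VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO party VALUES (?)", [(4,), (5,), (8,)])
    conn.executemany("INSERT INTO role VALUES (?)", [(2,), (3,)])
    # Parties 8, 5 and 4 on task 1, as in the question; party 8 is listed once...
    conn.executemany("INSERT INTO party_task VALUES (?, ?, ?)",
                     [(1, 1, 8), (2, 1, 5), (3, 1, 4)])
    # ...but keeps both of its roles (3 and 2) here.
    conn.executemany("INSERT INTO party_task_role VALUES (?, ?)",
                     [(1, 3), (1, 2), (2, 2), (3, 2)])
    # Made-up rates; party 8's rate changes on 2024-07-01.
    conn.executemany("INSERT INTO party_task_rate VALUES (?, ?, ?, ?)",
                     [(1, 50, "2024-01-01", "2024-06-30"), (1, 55, "2024-07-01", None),
                      (2, 40, "2024-01-01", None), (3, 45, "2024-01-01", None)])
    conn.commit()
    return conn

def rates_on(conn, task_id, day):
    """One (party_id, rate) row per party on the task, for the given ISO date."""
    return conn.execute("""
        SELECT pt.party_id, r.rate
        FROM party_task AS pt
        JOIN party_task_rate AS r ON r.party_task_id = pt.party_task_id
        WHERE pt.task_id = ?
          AND r.effective_date <= ?
          AND (r.expiry_date IS NULL OR r.expiry_date >= ?)
        ORDER BY pt.party_id""", (task_id, day, day)).fetchall()
```

Because the rate hangs off party_task rather than TASK_ROLE, a party with two roles still gets exactly one rate per period.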

Related

Dimension model (recursive / hierarchical) for Data Warehouse

I'm having difficulty connecting a dimension table (recursive/hierarchical) to a fact table, because of these issues:
The dimension table has a parent-child (self-referencing) structure.
The source table keeps growing.
id | item_name      | parent_id
---+----------------+----------
1  | classification | null
2  | category       | null
3  | group          | null
4  | modern         | 1
5  | modified       | 1
6  | tools          | 2
7  | meters         | 2
8  | metal          | 3
9  | plastic        | 3
10 | lead           | 8
11 | alloy          | 8
Denormalizing (flattening) this kind of table is not suitable: whenever a new entity type comes in, it would change the dimension structure.
What is the best approach for this type of dimension?
Please provide an example, including the query statement to use after connecting the fact and dimension tables.
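One common way to work with a parent-child (adjacency list) dimension without flattening it is a recursive common table expression. The minimal sketch below uses Python's sqlite3 with the rows from the question in a table assumed to be named item, plus a hypothetical fact table fact_sales with made-up amounts; it illustrates the technique and is not a complete warehouse design.

```python
import sqlite3

# Rows from the question: (id, item_name, parent_id)
ITEMS = [
    (1, "classification", None), (2, "category", None), (3, "group", None),
    (4, "modern", 1), (5, "modified", 1), (6, "tools", 2), (7, "meters", 2),
    (8, "metal", 3), (9, "plastic", 3), (10, "lead", 8), (11, "alloy", 8),
]

# Hypothetical fact rows: (item_id, amount); not from the question.
FACTS = [(10, 5), (11, 7), (9, 3), (6, 2)]

# Walk parent_id links from every root down, carrying the root and the path.
TREE_CTE = """
WITH RECURSIVE tree(id, root_id, path, depth) AS (
    SELECT id, id, item_name, 0 FROM item WHERE parent_id IS NULL
    UNION ALL
    SELECT i.id, t.root_id, t.path || ' > ' || i.item_name, t.depth + 1
    FROM item AS i JOIN tree AS t ON i.parent_id = t.id
)
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, item_name TEXT NOT NULL, "
                 "parent_id INTEGER REFERENCES item(id))")
    conn.execute("CREATE TABLE fact_sales (item_id INTEGER NOT NULL REFERENCES item(id), "
                 "amount INTEGER NOT NULL)")
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", ITEMS)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", FACTS)
    conn.commit()
    return conn

def paths(conn):
    """Every node with its full path from the root, however deep the tree grows."""
    return conn.execute(TREE_CTE + "SELECT id, path, depth FROM tree ORDER BY id").fetchall()

def totals_by_root(conn):
    """Join facts to the expanded dimension and roll amounts up to each top-level node."""
    return conn.execute(TREE_CTE + """
        SELECT r.item_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN tree AS t ON t.id = f.item_id
        JOIN item AS r ON r.id = t.root_id
        GROUP BY r.id, r.item_name
        ORDER BY r.id""").fetchall()
```

Because the CTE follows parent_id links at query time, new levels or entity types need no change to the dimension's columns. SQLite, PostgreSQL and MySQL 8 accept WITH RECURSIVE; SQL Server writes the same query with plain WITH.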

How to model arbitrarily ordering items in database?

I've accepted a new feature: users can re-order some items with a drag-and-drop UI, and each user's preferred order is saved to the database. What's the best way to do this?
After reading some questions on StackOverflow, I found this solution.
Solution 1: Use decimal numbers to indicate order
For example,
id item order
1 a 1
2 b 2
3 c 3
4 d 4
If I move item 4 between items 1 and 2, the order becomes:
id item order
1 a 1
4 d 1.5
2 b 2
3 c 3
In this way, each new order = (order[i-1] + order[i+1]) / 2.
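The midpoint rule in Solution 1 can be sketched in Python as below. The function names are hypothetical. The second helper shows a known limitation of floating-point keys: repeatedly dropping items into the same gap halves it each time, and a 64-bit float runs out of distinct values after roughly fifty such insertions.

```python
def order_between(prev_order, next_order):
    """New sort key for an item dropped between two neighbours."""
    # Parentheses matter: (a + b) / 2, not a + b / 2.
    return (prev_order + next_order) / 2

orders = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
orders["d"] = order_between(orders["a"], orders["b"])   # drop d between a and b
ranking = sorted(orders, key=orders.get)                 # ['a', 'd', 'b', 'c']

def insertions_until_exhausted(lo=1.0, hi=2.0):
    """Keep inserting right after `lo`; count how many fit before floats run out."""
    count = 0
    while True:
        mid = order_between(lo, hi)
        if not lo < mid < hi:   # midpoint rounded onto a neighbour: no room left
            return count
        hi, count = mid, count + 1
```

In practice this means an occasional renumbering pass is still needed, even with decimal keys.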
If I need to save the preference for every user, then I need another relationship table like this:
user_id item_id order
1 1 1
1 2 2
1 3 3
1 4 1.5
I need num_of_users * num_of_items records to save this preference.
However, there's another solution I can think of.
Solution 2: Save the order preference in a column in the User table
This is straightforward: add a column to the User table that records the order. Each value is parsed as an array of item_ids, ranked by their position in the array.
user_id  item_order
1        [1,4,2,3]
2        [1,2,3,4]
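Solution 2 can be sketched with Python's sqlite3 and json modules. The users table layout, the JSON text encoding and the helper names are assumptions; the values reproduce user 1's row above.

```python
import json
import sqlite3

def move_item(order, item_id, new_index):
    """Return a copy of `order` with item_id moved to 0-based position new_index."""
    rest = [i for i in order if i != item_id]
    rest.insert(new_index, item_id)
    return rest

def reorder(conn, user_id, item_id, new_index):
    """Read the user's JSON array, move one item, and write the whole array back."""
    (raw,) = conn.execute("SELECT item_order FROM users WHERE user_id = ?",
                          (user_id,)).fetchone()
    new_order = move_item(json.loads(raw), item_id, new_index)
    with conn:  # commit the single UPDATE (or roll back on error)
        conn.execute("UPDATE users SET item_order = ? WHERE user_id = ?",
                     (json.dumps(new_order), user_id))
    return new_order

conn = sqlite3.connect(":memory:")
# Hypothetical users table with the order stored as JSON text.
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, item_order TEXT NOT NULL)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, json.dumps([1, 2, 3, 4])), (2, json.dumps([1, 2, 3, 4]))])
conn.commit()
```

Calling reorder(conn, 1, 4, 1) turns [1,2,3,4] into [1,4,2,3]. Every move rewrites the whole array, and the database itself cannot check that the IDs in the array refer to existing items (there is no foreign key into a JSON string), nor easily answer questions such as which users rank item 3 first.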
Are there any limitations to this solution? Or are there other ways to solve this problem?
Usually, an explicit ordering concerns the presentation or some specific processing of the data. Hence, it's a good idea to separate entities from their presentation/processing. For example:
users
-----
user_id (PK)
user_login
...
user_lists
----------
list_id, user_id (PK)
item_index
item_index can be a simple integer value, either:
ordered continuously (1, 2 ... N): a DELETE/INSERT of the whole list is normally required to change the order
ordered discretely with a step (10, 20 ... N): you can insert new items without reordering the whole list
Another reason to separate entity data and lists: reordering a list should be done in a transaction, which may lead to row/table locks. With separate tables, only the data in the list table is affected.
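A minimal sketch of the discrete option and the transaction point, using Python's sqlite3. The answer's user_lists layout does not show an item column, so the table user_list_items(user_id, item_id, item_index), the step of 10 and the helper names are assumptions. A move normally updates one row; only when a gap is used up is the whole list renumbered, inside the same transaction.

```python
import sqlite3

STEP = 10  # assumed gap between neighbouring item_index values (10, 20 ... N)

def create(conn, user_id, item_ids):
    """Hypothetical list table; only this table is touched when a list is reordered."""
    conn.execute("CREATE TABLE IF NOT EXISTS user_list_items ("
                 "user_id INTEGER NOT NULL, item_id INTEGER NOT NULL, "
                 "item_index INTEGER NOT NULL, PRIMARY KEY (user_id, item_id))")
    with conn:
        conn.executemany("INSERT INTO user_list_items VALUES (?, ?, ?)",
                         [(user_id, item, (n + 1) * STEP) for n, item in enumerate(item_ids)])

def ordered(conn, user_id):
    """(item_id, item_index) pairs in display order."""
    return conn.execute("SELECT item_id, item_index FROM user_list_items "
                        "WHERE user_id = ? ORDER BY item_index", (user_id,)).fetchall()

def set_index(conn, user_id, item_id, index):
    conn.execute("UPDATE user_list_items SET item_index = ? "
                 "WHERE user_id = ? AND item_id = ?", (index, user_id, item_id))

def renumber(conn, user_id):
    """Rewrite the whole list as STEP, 2*STEP, ... to reopen the gaps."""
    for n, (item, _) in enumerate(ordered(conn, user_id)):
        set_index(conn, user_id, item, (n + 1) * STEP)

def move_after(conn, user_id, item_id, after_id=None):
    """Place item_id directly after after_id (or first when after_id is None)."""
    with conn:  # commit every UPDATE below together, or roll all of them back
        for _ in range(2):  # at most one renumbering pass is ever needed
            others = [p for p in ordered(conn, user_id) if p[0] != item_id]
            pos = 0 if after_id is None else [i for i, _ in others].index(after_id) + 1
            lo = others[pos - 1][1] if pos > 0 else 0
            hi = others[pos][1] if pos < len(others) else lo + 2 * STEP
            if hi - lo >= 2:  # an integer strictly between lo and hi exists
                set_index(conn, user_id, item_id, (lo + hi) // 2)
                return
            renumber(conn, user_id)  # no gap left here: renumber, then retry
```

For example, after create(conn, 1, [1, 2, 3, 4]), move_after(conn, 1, 4, 1) gives item 4 the index 15 and leaves every other row untouched.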

How to tell the difference between an entity and a column

Sometimes I have a hard time telling the difference between an entity and a column when I start making a diagram. I don't know when something is supposed to be an entity or a column. For example, in a game where a user can play alone or in a group, would you make those two different entities, User and GroupUser?
Also, if the User has levels, a status, and badges they earn as part of the game, would these also be entities, or would they just be part of the User entity?
An entity can be a person (e.g. Student), a place (e.g. Room Name), an object (e.g. Books), or an abstract concept (e.g. Course, Order) that is represented in your database; it normally becomes a table in your database.
Columns, on the other hand, are the attributes of your entity.
So, in your case you have a User entity and the possible columns or attributes (or fields) are
UserID, UserLevel, UserStatus, Badges, PlayStatus (values could be individual or group).
Badges, although it is a column here, could turn into an entity if keeping it as a column violates the normalization rules.
For example if you have this Table for User:
Table: Users
UserID  UserName     UserStatus  PlayStatus  Badges
------  -----------  ----------  ----------  --------------------------------
1       Surefire     Active      Single      Private, Warrior, Platoon Leader
2       FastMachine  Active      Group       Private, Warrior
3       BeatTheGeek  Inactive    Group       Private
The Badges column here violates 1NF (First Normal Form), which says there should be no repeating groups, in this case no multi-valued columns. So it could be normalized like this:
Table: Users
UserID  UserName     UserStatus  PlayStatus
------  -----------  ----------  ----------
1       Surefire     Active      Single
2       FastMachine  Active      Group
3       BeatTheGeek  Inactive    Group
Table: Badges
BadgeID  BadgeName
-------  --------------
1        Private
2        Indie
3        Warrior
4        Platoon Leader
5        Colonel
6        1 Star General
7        2 Star General
8        3 Star General
9        4 Star General
10       5 Star General
11       Hero
Table: UserBadgesHistory
UserID  BadgeID  ReceiveDate
------  -------  -----------
1       1        12/01/2013
1       3        12/05/2013
1       4        1/5/2014
2       1        2/5/2014
2       3        2/10/2014
3       2        11/10/2013
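The normalized tables can be sketched with Python's sqlite3. Column names are lowercase versions of the ones shown; dates are converted to ISO format assuming the originals are month/day/year, and the primary key assumes a user receives each badge only once. A join rebuilds the old comma-separated Badges value, and a question like "who has Warrior?" becomes a plain WHERE clause.

```python
import sqlite3

BADGE_NAMES = ["Private", "Indie", "Warrior", "Platoon Leader", "Colonel",
               "1 Star General", "2 Star General", "3 Star General",
               "4 Star General", "5 Star General", "Hero"]

# (UserID, BadgeID, ReceiveDate); dates converted to ISO assuming month/day/year.
HISTORY = [(1, 1, "2013-12-01"), (1, 3, "2013-12-05"), (1, 4, "2014-01-05"),
           (2, 1, "2014-02-05"), (2, 3, "2014-02-10"), (3, 2, "2013-11-10")]

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL,
                            user_status TEXT NOT NULL, play_status TEXT NOT NULL);
        CREATE TABLE badges (badge_id INTEGER PRIMARY KEY, badge_name TEXT NOT NULL UNIQUE);
        CREATE TABLE user_badges_history (
            user_id      INTEGER NOT NULL REFERENCES users(user_id),
            badge_id     INTEGER NOT NULL REFERENCES badges(badge_id),
            receive_date TEXT NOT NULL,
            PRIMARY KEY (user_id, badge_id));  -- assumes each badge is earned once
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)",
                     [(1, "Surefire", "Active", "Single"),
                      (2, "FastMachine", "Active", "Group"),
                      (3, "BeatTheGeek", "Inactive", "Group")])
    conn.executemany("INSERT INTO badges VALUES (?, ?)",
                     list(enumerate(BADGE_NAMES, start=1)))
    conn.executemany("INSERT INTO user_badges_history VALUES (?, ?, ?)", HISTORY)
    conn.commit()
    return conn

def badges_of(conn, user_id):
    """The set of badge names a user holds: the old comma list, rebuilt by a join."""
    rows = conn.execute("""
        SELECT b.badge_name FROM user_badges_history AS h
        JOIN badges AS b ON b.badge_id = h.badge_id
        WHERE h.user_id = ?""", (user_id,))
    return {name for (name,) in rows}

def users_with_badge(conn, badge_name):
    """A query that was awkward against the comma-separated column."""
    rows = conn.execute("""
        SELECT h.user_id FROM user_badges_history AS h
        JOIN badges AS b ON b.badge_id = h.badge_id
        WHERE b.badge_name = ? ORDER BY h.user_id""", (badge_name,))
    return [user_id for (user_id,) in rows]
```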
In general, an entity has multiple columns (i.e. attributes) of its own, and a column (or attribute) does not.
In your example, if the only data you're interested in storing is a User's current level, then level is unlikely to be an entity. This is because it would have only a single attribute of name/number. If you wanted to find all Users currently at level 4, you would simply do a query with level = 4.
On the other hand, if you had a reason to add additional data about the level, such as what abilities are associated with that level or the date a given User achieved the level, then you would want to make Level a separate entity.
A Level entity would have an ID, a number or name, and whatever other attributes you need as data.
ID | Prerequisite | Ability
----+--------------+--------------
1 | NULL | May gain foos
2 | Gain 10 foos | May gain bars
3 | Gain 20 bars | 30 free foos
In a fully normalized state, you would have another entity called UserLevel in which you would store data about, for example, when a certain User gained a level.
The UserLevel entity would contain the LevelID and the UserID as foreign keys (links back to the other entities), and a DateAchieved column for when the User achieved the level.
LevelID | UserID | DateAchieved
---------+--------+-------------
1 | 1 | 2014-02-01
1 | 2 | 2014-02-01
2 | 1 | 2014-02-05
3 | 1 | 2014-02-09
2 | 2 | 2014-02-11
4 | 1 | 2014-02-13
This shows User 1 and User 2 starting at Level 1 on the same day and leveling up at different rates.
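With a UserLevel history like the one above, a user's current level is the row with their most recent DateAchieved. A minimal sketch with Python's sqlite3, assuming the table name user_level and ISO date strings; the helper names are hypothetical.

```python
import sqlite3

USER_LEVEL = [  # (LevelID, UserID, DateAchieved) from the table above
    (1, 1, "2014-02-01"), (1, 2, "2014-02-01"), (2, 1, "2014-02-05"),
    (3, 1, "2014-02-09"), (2, 2, "2014-02-11"), (4, 1, "2014-02-13"),
]

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_level (level_id INTEGER NOT NULL, "
                 "user_id INTEGER NOT NULL, date_achieved TEXT NOT NULL, "
                 "PRIMARY KEY (level_id, user_id))")
    conn.executemany("INSERT INTO user_level VALUES (?, ?, ?)", USER_LEVEL)
    conn.commit()
    return conn

# Each user's row with the latest date_achieved is their current level.
CURRENT_LEVEL = """
SELECT ul.user_id AS user_id, ul.level_id AS level_id
FROM user_level AS ul
WHERE ul.date_achieved = (SELECT MAX(x.date_achieved) FROM user_level AS x
                          WHERE x.user_id = ul.user_id)
"""

def current_levels(conn):
    """Map user_id -> current level_id."""
    return dict(conn.execute(CURRENT_LEVEL).fetchall())

def users_at_level(conn, level_id):
    """All users whose current level is level_id."""
    rows = conn.execute("SELECT user_id FROM (" + CURRENT_LEVEL + ") AS cur "
                        "WHERE level_id = ? ORDER BY user_id", (level_id,))
    return [user_id for (user_id,) in rows]
```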

Should I add a common property of foreign keys to my table?

I have a database of test data that have been collected on behalf of agents. The test data are grouped together (after the fact) into result sets. As the tests come in, they are stored in the database with the ID of the corresponding agent:
TEST_ID  TEST_OWNER  TIMESTAMP  RESULT_ID
1        1           0          null
2        1           15         null
3        2           30         null
4        2           32         null
5        1           34         null
The result sets are generated later, grouping tests that took place during a similar time frame. This grouping cannot be decided as the tests come in.
RESULT_ID
1
2
3
All of the tests in a result set must belong to the same owner. I can ensure this (in code) as I assign the result IDs to the tests in my later operation, but some things would be easier if I had a TEST_OWNER field in my result set table.
Would adding this field be a violation of some normalization goal? The TEST_OWNER information will be duplicated, even though one instance of it is really implicit. I'm not a DBA, and I don't want to do things that are bad style.
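If the duplicated TEST_OWNER is kept, one way to let the database (rather than application code) guarantee that it agrees is a composite foreign key. This is a minimal sketch in Python's sqlite3; the table names result_set and test and the owners of the three result sets are assumptions, and it illustrates one option rather than a required design.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
    conn.executescript("""
        CREATE TABLE result_set (
            result_id  INTEGER PRIMARY KEY,
            test_owner INTEGER NOT NULL,
            UNIQUE (result_id, test_owner)       -- lets the pair be an FK target
        );
        CREATE TABLE test (
            test_id        INTEGER PRIMARY KEY,
            test_owner     INTEGER NOT NULL,
            test_timestamp INTEGER NOT NULL,     -- TIMESTAMP in the question
            result_id      INTEGER,              -- NULL until grouped: FK not checked
            FOREIGN KEY (result_id, test_owner)
                REFERENCES result_set (result_id, test_owner)
        );
    """)
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, NULL)",
                     [(1, 1, 0), (2, 1, 15), (3, 2, 30), (4, 2, 32), (5, 1, 34)])
    # Hypothetical owners for the three result sets.
    conn.executemany("INSERT INTO result_set VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
    conn.commit()
    return conn

def assign(conn, test_id, result_id):
    """Put a test into a result set; the composite FK rejects an owner mismatch."""
    with conn:
        conn.execute("UPDATE test SET result_id = ? WHERE test_id = ?",
                     (result_id, test_id))
```

Assigning test 4 (owner 2) to result set 1 (owner 1) raises sqlite3.IntegrityError, so the owner stored on the result set can never drift from its tests.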
Jim, I am not completely sure whether you are saying that the TEST_ID / TEST_OWNER / TIMESTAMP / RESULT_ID listing is a table in your DB.
If so, the first thing I would do is pull the result attribute out of this table to normalize it. Or is this your Result table?
Regardless, are these results derived from other data in the DB? If so, I don't see the need to duplicate things by also storing the (calculated) results. Just derive them as needed and keep the DB clean.
If you need further help, I need a better understanding of what you are presenting.

Explaining row and column dependencies

This is a simple and common scenario at work, and I'd appreciate some input.
Say I am generating a report for the owners of a pet shop, and they want to know how many of each pet each of their customers has bought. In this scenario my only tools are SQL and something that outputs my query to a spreadsheet.
As the shop owner, I might expect reports in the form:
Customer  Dog  Cat  Rabbit
1         2    3    0
2         0    1    1
3         1    2    0
4         0    0    1
And if one day I decided to stock Goldfish, the report should then come out as:
Customer  Dog  Cat  Rabbit  Goldfish
1         2    3    0       0
2         0    1    1       0
3         1    2    0       0
4         0    0    1       0
5         0    0    0       1
But, as you probably know, a query that works this way requires some form of dynamic code generation and is harder to write.
The simplest query would work along the lines of:
Cross join Customers and Pets, Outer join Sales, Group, etc.
and generate:
Customer  Pet       Quantity
1         Dog       2
1         Cat       3
1         Rabbit    0
1         Goldfish  0
2         Dog       0
2         Cat       1
2         Rabbit    1
...etc
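The "simplest query" described above can be sketched as follows, using Python's sqlite3. The schema (customers, pets, and sales with a quantity column) and the sales rows are assumptions chosen to reproduce the counts in the reports above. The cross join produces every customer/pet pair, and the outer join keeps pairs with no sales as 0.

```python
import sqlite3

def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE pets (pet_id INTEGER PRIMARY KEY, pet_name TEXT NOT NULL UNIQUE);
        CREATE TABLE sales (customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
                            pet_id INTEGER NOT NULL REFERENCES pets(pet_id),
                            quantity INTEGER NOT NULL);
        INSERT INTO customers VALUES (1), (2), (3), (4), (5);
        INSERT INTO pets VALUES (1, 'Dog'), (2, 'Cat'), (3, 'Rabbit'), (4, 'Goldfish');
        INSERT INTO sales VALUES (1, 1, 2), (1, 2, 3), (2, 2, 1), (2, 3, 1),
                                 (3, 1, 1), (3, 2, 2), (4, 3, 1), (5, 4, 1);
    """)
    return conn

LONG_FORM = """
SELECT c.customer_id, p.pet_name, COALESCE(SUM(s.quantity), 0) AS quantity
FROM customers AS c
CROSS JOIN pets AS p                          -- every customer/pet pair
LEFT JOIN sales AS s                          -- keep pairs with no sales
       ON s.customer_id = c.customer_id AND s.pet_id = p.pet_id
GROUP BY c.customer_id, p.pet_id, p.pet_name
ORDER BY c.customer_id, p.pet_id
"""
```

The column list is fixed no matter how many pets exist: adding Goldfish only adds rows.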
a) How would I explain to the shop owners that the report they want is 'harder' to generate? I'm not trying to say it's harder to read, but it is harder to write.
b) What is the name of the concept I am trying to explain to the customer (to aid with my Googling)?
The name of the concept is 'cross-tab' and can be accomplished in several ways.
MS Access has proprietary extensions to SQL to make this happen. SQL Server before 2005 needs a CASE trick, and 2005 and later have PIVOT, but I think you still need to know what the columns will be.
Some databases do support creating cross tables, but I think most need to know the columns in advance, so you'd have to modify the SQL (and use a database that supports such an extension).
Another alternative is to create a program that will postprocess the second "easy" table to get your clients the cross table as output. This is probably easier and more generic than having to modify SQL or dynamically generate it.
As for explaining the problem, you could show them in Excel how many steps are needed to get the desired result:
Source data (your second listing).
Select values from the pets column
Place each pet type found on a new column
Count values per each type per client
Fill the values
and then say that SQL gives you only the source data, so it's of course more work.
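The postprocessing program suggested in this answer, following the Excel steps above, might look like this minimal Python sketch. The function name and the (customer, pet, quantity) row shape are assumptions matching the second, "easy" listing.

```python
def pivot(rows):
    """Turn (customer, pet, quantity) rows into a header row plus one row per customer."""
    pets = []      # distinct pet values, in the order they first appear
    counts = {}    # customer -> {pet: total quantity}
    for customer, pet, quantity in rows:
        if pet not in pets:
            pets.append(pet)                             # each new pet type becomes a column
        per_pet = counts.setdefault(customer, {})
        per_pet[pet] = per_pet.get(pet, 0) + quantity    # count per type per client
    header = ["Customer"] + pets
    body = [[customer] + [counts[customer].get(pet, 0) for pet in pets]  # fill, 0 if none
            for customer in sorted(counts)]
    return header, body
```

Feeding it an extra (5, "Goldfish", 1) row adds a Goldfish column without any change to the code or the SQL.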
This concept is called pivoting.
SQL assumes that your data is represented as relations with a fixed structure.
For example, equality is a binary relation, "customer has this many pets of this type" is a ternary relation, and so on.
When you see this result set:
Customer  Pet       Quantity
1         Dog       2
1         Cat       3
1         Rabbit    0
1         Goldfish  0
2         Dog       0
2         Cat       1
2         Rabbit    1
what you are looking at is actually a relation: the set of all combinations of domain values for which the statement holds.
For example, customer 1 (domain: customer IDs) has exactly 2 (domain: non-negative integers) pets of kind dog (domain: pets).
We don't see rows like these in the result set:
Customer  Pet   Quantity
1         Dog   3
Pete      Wife  0.67
because the first row is false (customer 1 has 2 dogs, not 3), and the values in the second row are outside their domains.
The SQL paradigm implies that your relation (its columns) is defined when you issue a query, and each returned row is one complete tuple of that relation.
SQL Server 2005+ can map rows into columns (which is what you want), but you need to know the number of columns when writing the query, not when running it.
As a rule, the reports you are trying to build are built with reporting software which knows how to translate relational SQL resultsets into nice looking human readable reports.
I have always called this pivoting, but that may not be the formal name.
Whatever it's called, you can do almost all of this in plain SQL:
SELECT customers.customer, count(*) AS total,
  sum(CASE WHEN pet = 'dog' THEN 1 ELSE 0 END) AS dog,
  sum(CASE WHEN pet = 'cat' THEN 1 ELSE 0 END) AS cat
FROM customers JOIN pets ON pets.customer = customers.customer  -- join column assumed
GROUP BY customers.customer
Obviously what's missing is the dynamic columns. I don't know if this is possible in straight SQL, but it's certainly possible in a stored procedure to generate the query dynamically after first querying for the list of pets. The query is built as a string, and that string is then used to create a prepared statement.
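The dynamic approach described above can be sketched client-side in Python against SQLite, which has no stored procedures: read the pet list first, generate one SUM(CASE ...) column per pet, then run the generated statement. The schema and sales rows are assumptions chosen to match the reports in the question; pet ids are bound as parameters and pet names are quoted as identifiers rather than pasted into the SQL raw.

```python
import sqlite3

def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE pets (pet_id INTEGER PRIMARY KEY, pet_name TEXT NOT NULL UNIQUE);
        CREATE TABLE sales (customer_id INTEGER NOT NULL, pet_id INTEGER NOT NULL,
                            quantity INTEGER NOT NULL);
        INSERT INTO customers VALUES (1), (2), (3), (4), (5);
        INSERT INTO pets VALUES (1, 'Dog'), (2, 'Cat'), (3, 'Rabbit'), (4, 'Goldfish');
        INSERT INTO sales VALUES (1, 1, 2), (1, 2, 3), (2, 2, 1), (2, 3, 1),
                                 (3, 1, 1), (3, 2, 2), (4, 3, 1), (5, 4, 1);
    """)
    return conn

def quote_ident(name):
    """Quote a value for use as a column alias (double any embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'

def crosstab(conn):
    # 1. Query the list of pets first.
    pets = conn.execute("SELECT pet_id, pet_name FROM pets ORDER BY pet_id").fetchall()
    # 2. Generate one SUM(CASE ...) column per pet; pet ids are bound as parameters.
    columns = ["c.customer_id AS Customer"] + [
        f"SUM(CASE WHEN s.pet_id = ? THEN s.quantity ELSE 0 END) AS {quote_ident(name)}"
        for _, name in pets]
    sql = ("SELECT " + ", ".join(columns) +
           " FROM customers AS c"
           " LEFT JOIN sales AS s ON s.customer_id = c.customer_id"
           " GROUP BY c.customer_id ORDER BY c.customer_id")
    # 3. Run the generated statement; column names come from the aliases.
    cur = conn.execute(sql, [pet_id for pet_id, _ in pets])
    return [d[0] for d in cur.description], cur.fetchall()
```

Adding a new pet row changes the report's columns without touching the code, which is the behaviour the shop owner asked for.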
