Fully optional one to one relation in MySQL workbench - database

Fully optional one to one relation in MySQL workbench?
I'm only able to create a partially optional one to one relation.
My case is:
A GROUP can be assigned a PROBLEM
A PROBLEM can be assigned to a GROUP
EDIT1:
EDIT2: Maybe a better question would be if fully optional one to one relations should be avoided?

Let's see if any of this addresses your issue.
A GROUP can be assigned a PROBLEM
A PROBLEM can be assigned to a GROUP
Starting with a structure such as:
PROBLEM
id | title
1 | Prob1
2 | Prob2
GROUP
id | title
1 | Group1
2 | Group2
What is also important is to know whether a GROUP can be assigned more than one problem at a time or not. And whether one same problem can be assigned to more than one GROUP.
Let's say there is a strict optional 1:1 relationship. This means a group cannot have 2 problems assigned at the same time and that 1 same problem cannot be assigned to 2 groups.
A strict 1:1 would be implemented by adding the PK of table A as a FK of table B. If the FK is nullable then you will notice this is already an optional 1:1, as you may leave empty cells indicating 0 problems assigned (or 0 groups assigned).
PROBLEM
id | title
1 | Prob1
2 | Prob2
GROUP
id | title | problem
1 | Group1 | 2
2 | Group2 | null
In this example Group2 has been assigned no problem. Group1 has been assigned Prob2 and Prob1 has been assigned to no group.
You are not forced to assign anything but everything may have a 1:1 relationship.
This structure may imply quite a few empty (null) values. This is not best practices but would do the job. If you want to avoid null values then you may have to go for a N:M implementation.
PROBLEM
id | title
1 | Prob1
2 | Prob2
GROUP
id | title
1 | Group1
2 | Group2
GROUP_PROBLEM
group | problem
1 | 2
With this implementation alone you may have 1 group be assigned more than 1 problem and have 1 same problem be assigned to more than 1 group. But if you define a UNIQUE index for each of the two fields (group and problem) then you should fix this.

Related

Natural key when combining two source tables that use similar promary keys

I have two source tables, one is basically an invoice, the other is a migrated invoice. The same object should probably have been used for both, but I have this instead. They contain most of the same data.
I had thought to combine both into a dimension table, however both will use the same natural keys. How should I approach this?
One potential solution I thought of was using negative numbers for the migrated table, but then the natural keys won't align exactly with the source.
Do I just combine them in the fact table? Then I can't link back to the dimension table for either due to NULLs.
Or do I add an additional column or information to indicate which type of invoice it is?
EDIT
Simple models of the current tables below.
The dimension currently only contains the non migrated data, it has a primary key, however
if i merge the migrated invoice table in to this, it will appear as if the changes are being
made to the original invoices and not a second set of invoices
Dimension
surrogate_key| source_pk | Total | scd_from | scd_to
| | | |
1 | 1 | 100 | 01/01/2019 | 31/01/2019
2 | 1 | 150 | 01/02/2019 | 31/12/2019
3 | 2 | 50 | 01/01/2019 | 31/12/9999
source invoice table
pk | Total
___________________
1 | 150
2 | 50
source migrated invoice table
pk | total
___________________
1 | 200
2 | 300
If invoice and migrated invoice have same natural key but some of the fields have different values (your example shows Total amount different between them), then you have one row based on the natural key in the Dim but 2 different columns to represent the 2 sources. Based on your example, you need invoice_Total and migrated_invoice_Total columns in your DIM.

SQL Server 2005 & Up - Concatenated Value based on Value in Joined Table

Prelude: The design of this database is truly horrible - this isn't the first "crooked" question I've asked, and it won't be the last. The question is what it is, and I'm only asking because A) I only have a couple of years of experience with SQL Server, and B) I've already been pounding my face on my keyboard for a couple of days trying to find a viable solution.
Having said all of that...
We have a database with two tables relevant to this issue. The schema is ridiculous, so I'm going to paraphrase so that it can be understood:
T_Customer & T_Task
T_Task holds data about various work that has been performed on behalf of a customer in T_Customer.
In T_Customer, there is a field called "Sort_Type" (again, paraphrasing...). In this field, there is a concatenated string of various fields from T_Task which, in the order specified, determine how the customer's report is produced in the client program. There are a total of 73 possible fields in T_Task that can be chosen as Sort_Type in T_Customer, and the user can choose up to 5 of them in any given order. For example:
T_Customer
Customer_ID | Sort_Type
------------|-------------------------------
1 | 'Task_Date,Task_Type,Task_ID'
2 | 'Task_Type,Destination'
3 | 'Route,Task_Type,Task_ID'
T_Task
Task_ID | Customer_ID | Task_Type | Task_Date | Route | Destination
--------|-------------|-----------|-----------|---------|-------------
12345 | 1 | 1 | 01/01/2017| '1 to 2'| '2'
12346 | 1 | 1 | 01/02/2017| '3 to 4'| '4'
12347 | 2 | 2 | 12/31/2016| '6 to 2'| '2'
12348 | 3 | 3 | 01/01/2017| '4 to 1'| '1'
In this example, Customer #1's report would be sorted/totaled by the Task_Date, then by Task_Type, then by Task_ID; but not simply by doing an ORDER BY. This function requires one single value which can be ordered as a whole, single unit. As such...
Up until today, a field existed in T_Task called (paraphrasing....) 'MySort'. This field contained a concatenated string of fixed-width values filled in with zeroes and created according to the order and content of the values in T_Customer.Sort_Type. In this case:
Task_ID | Customer_ID | Task_Type | Task_Date | Route | Destination | MySort
--------|-------------|-----------|-----------|-------|-------------|-------
12345 | 1 | 1 | 01/01/2017| 1 to 2| 2 |'002017010100000000010000012345'
12346 | 1 | 1 | 01/02/2017| 3 to 4| 4 |'002017010200000000010000012346'
12347 | 2 | 2 | 12/31/2016| 6 to 2| 2 |'000000000000000000020000000002'
12348 | 3 | 3 | 01/01/2017| 4 to 1| 1 |'000040to0100000000030000012348'
During the printing phase of every single report, the program would search for the customer, find the values in T_Customer.Sort_Type, split them by commas, and then run an update on all of the tasks of that customer to update the value of MySort accordingly...
Can you guess what the problem is with this? Performance (not to mention chronic insanity)
I have been tasked with finding a more efficient way of performing this same task server-side, within SQL Server 2005 if possible, using whatever means will eventually allow me to return a result set including all of the details of the tasks requested, together with a concatenated string similar to the one used in the past (which the client program relies upon in order to sort and subtotal the report).
I've tried Views, UDFs in computed columns, and parameterized queries, but I know my limitations. I'm too inexperienced to know all of my options.
Question: Aside from quitting (not an option) or going berserk (considering it...), what methods might you use to solve this problem?
EDIT: Having received two questions about the MySort column already,
I'll explain a bit better.
T_Task.MySort =
REPLICATE('0',10 - LEN(T_Customer.Sort_Type[Value1])
+ CAST(T_Customer.Sort_Type[Value1] AS VARCHAR(10))
+
REPLICATE('0',10 - LEN(T_Customer.Sort_Type[Value2])
+ CAST(T_Customer.Sort_Type[Value2] AS VARCHAR(10))
+
REPLICATE('0',10 - LEN(T_Customer.Sort_Type[Value3])
+ CAST(T_Customer.Sort_Type[Value3] AS VARCHAR(10))
WHERE T_Customer.Customer_ID = T_Task.Customer_ID
...Up to T_Customer.Sort_Type[Value5].
Reminder: Those values are not constants at all, so the value of the
field MySort had to constantly be updated before printing a report.
The idea is to somehow remove the need to constantly update the field,
and instead return the string as part of the result set.
The resulting string should always be 50 characters in length. I
didn't do that here simply to save a bit of space and time - I chose
only 3 for the example. The real string would simply have another
twenty zeroes leading the value:
'00000000000000000000002017010100000000010000012345'

Two ways to store the same data

At work we're creating a form to allow property agents to submit their new developments. A simplified version of our form is the following:
Bedrooms: [Enter a number]
Quantity: [Enter a number]
Add Another | Save
We allow agents to add multiple rows. However at the moment we have absolutely zero validation for duplicates, which in my opinion allows our database to store identical data in two ways:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 1 |
| 1 | 3 | 1 |
| 1 | 3 | 3 |
Clearly a row could represent both one unit or a group of units.
I'm arguing that we should store the developments either one way of the other, but certainly not both. Unfortunately the back-end developers — I'm mostly front-end — are arguing that it's not a big deal, and to me that seems absurd.
For a simple example, by storing it as the above, a COUNT to obtain how many developments are for sale that have 3 bedrooms requires a SELECT COUNT(*) and consideration of the quantity field.
As a front-end developer it seems largely to be presentation logic, because transforming between rendering them as a list of single units, or grouping them together should be a front-end/API task, and the business logic should be one way or the other. Ultimately our table seems to be not normalised at all.
In my humble opinion there should be a unique index on development_id, bedrooms.
Am I right in my argument? Or horribly wrong?
Edit:
In clarification all of these are currently possible, all of which represent the same fact, and my argument is there should be only one way:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 1 |
| 1 | 3 | 1 |
| 1 | 3 | 1 |
Same as:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 1 |
| 1 | 3 | 2 |
Same as:
| development_id | bedrooms | quantity |
|----------------|----------|----------|
| 1 | 3 | 3 |
You're right, there should be only one way to record each fact in a database and duplicate rows should not be allowed. If each row represents the quantity of units that have a certain number of bedrooms in a particular development, then a unique key on development_id, bedrooms makes sense, and will prevent multiple entries for the same kind of unit in each development.
Funnily, you & backend colleagues/rivals are both right.
It's not a big deal, for real (in the shown circumstances).
Although it really violates DB normalization (in the shown circumstances).
From what you reveal, there's no need to split into multiple rows.
Although imagine it gains another attribute that distinct one three-bed from another, from now on. Say, apt plan. Or a timestamp, for whatever reason.
It immediately starts making sense then.
Another thing here: reads are generally non-blocking, writes are.
That means, on a mature RDBMS with row-level locks, the inserts (and reads for COUNT) won't be competing, while updates to a counter would.
Although I'm way far from thinking your realty agents combined would ever achieve even single-digit TPS in their additions, so you may consider an issue non-existent for a scale. :-)

Database design - storing a sequence

Imagine the following: there is a "recipe" table and a "recipe-step" table. The idea is to allow different recipe-steps to be reused in different recipes. The problem I'm having relates to the fact that in the recipe context, the order in which the recipe-steps show up is important, even if it does not follow the recipe-step table primary-key order, because this order will be set by the user.
I was thinking of doing something like:
recipe-step table:
id | stepName | stepDescription
-------------------------------
1 | step1 | description1
2 | step2 | description2
3 | step3 | description3
...
recipe table:
recipeId | step
---------------
1 | 1
1 | 2
1 | 3
...
This way, the order in which the steps show up in the step column is the order I need to maintain.
My concerns with this approach are:
if I have to add a new step between two existing steps, how do I query it? What if I just need to switch the order of two steps already in the sequence?
how do I make sure the order maintains its consistency? If I just insert or update something in the recipe table, it will pop up at the end of the table, right?
Is there any other way you would think of doing this? I also thought of having a previous-step and a next-step column in the recipe-step table, but I think it would be more difficult to make the recipe-steps reusable that way.
In SQL, tables are not ordered.
Unless you are using an ORDER BY clause, database engines are allowed to return records in any order they feel is fastest (for example, a covering index might have the data in a different order, and sometimes even SQLite creates temporary covering indexes automatically).
If the steps have a specific order in a specific recipe, then you have to store this information in the database.
I'd suggest to add this to the recipe table:
recipeId | step | stepOrder
---------------------------
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
2 | 4 | 1
2 | 2 | 2
Note:
The recipe table stores the relationship between recipes and steps, so it should be called recipe-step.
The recipe-step table is independent of recipes, so it should be called step.
You probably need a table that stores recipe information that is independent of steps; this table should be called recipe.

Summarizing across multiple columns in sql or crystal

I was wondering if there was a way to get a distinct count on a certain column based on the value of a second column while still getting a total count of the first column. This is an example of the issue I'm facing. I have a query that returns an i-Vent type, ID, Status, and linked medication orders for a pharmacy intervention system. The interventions are grouped by i-Vent type. The Status can be one of five values or NULL. I need to be able to count how many i-Vents were recorded as each of the six possible values for Status.
An example set may look similar to this:
________________________________________________________
Type | ID | Status | Linked Meds
________________________________________________________
IV2PO | 1234 | Accepted | pantoprazole IV
IV2PO | 1234 | Accepted | pantoprazole PO
IV2PO | 1235 | NULL | NULL
IV2PO | 1236 | Pending | metoclopramide IV
IV2PO | 1236 | Pending | metoclopramide PO
IV2PO | 1236 | Pending | Pharmacy Consult - IV2PO
Consult | 1237 | Rejected | NULL
________________________________________________________
The group summary should list IV2PO having a total count of 3 with a count of 1 for "Accepted", 1 for "NULL", and 1 for "Pending"; and Consult having a total count of 1 with a count of 1 for "Rejected".
Please take notice of the duplicate values caused by having more than one medication/order liked to an i-Vent.
Ultimately I'm building the final report in Crystal Reports so if there is a way to get the correct counts there that would be fine as well. I have a version of this which uses a subreport to get the linked medications/orders, but I'd like to find a better alternative to take less time to run and use fewer resources.
Does anyone know of a way to do this?
Thanks!
In Crystal Reports you can use Count distinct summary option
When creating a "Summary", using the Count function may not be desirable. It is often the case that a report must only return the number of unique contact records, as other tables (i.e. History) may contain multiple rows for each customer.
Select Insert | Summary.
Select the fieldname you wish to summarize.
Make sure to select Distinct Count as the Summary Operation.

Resources