Django: following the reverse direction of the one-to-one relationship - database

I have a question about the way Django models the one-to-one-relationship.
Suppose we have 2 models: A and B:
class B(models.Model):
bAtt = models.CharField()
class A(models.Model):
b = models.OneToOneField(B)
In the created table A, there is a field "b_id" but in the table B created by Django there is no such field as "a_id".
So, given an A object, it's certainly fast to retrieve the corresponding B object, simply through the column "b_id" of the A's row.
But how does Django retrieve the A object given the B object?
The worst case is to scan through the A table to search for the B.id in the column "b_id". If so, is it advisable that we manually introduce an extra field "a_id" in the B model and table?
Thanks in advance!

Storing the id in one of the tables is enough.
In case 1, (a.b) the SQL generated would be
SELECT * from B where b.id = a.b_id
and in case 2, the SQL would be:
SELECT * from A where a.b_id = b.id
You can see that the SQLs generated either way are similar and depend respectively on their sizes.
In almost all cases, indexing should suffice and performance would be good with just 1 id.

Django retrieves the field through a JOIN in SQL, effectively fusing the rows of the two tables together where the b_id matches B.id.
Say you have the following tables:
# Table B
| id | bAtt |
----------------
| 1 | oh, hi! |
| 2 | hello |
# Table A
| id | b_id |
-------------
| 3 | 1 |
| 4 | 2 |
Then a join on B.id = b_id will create the following tuples:
| B.id | B.bAtt | A.id |
------------------------
| 1 | oh, hi! | 3 |
| 2 | hello | 4 |
Django does this no matter which side you enter the relation from, and is thus equally effective :) How the database actually does the join depends on the database implementation, but in one way or another it has to go through every element in each table and compare the join attributes.

Other object-relational mappers require you to define relationships on both sides. The Django developers believe this is a violation of the DRY (Don't Repeat Yourself) principle, so Django only requires you to define the relationship on one end.
Always take a look at the doc ;)

Related

Ensuring that two column values are related in SQL Server

I'm using Microsoft SQL Server 2017 and was curious about how to constrain a specific relationship. I'm having a bit of trouble articulating so I'd prefer to share through an example.
Consider the following hypothetical database.
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+------+--------------+------------------+
Orders
+-----------------------------------+
| Id | CustomerId | AddressId |
+-----------------------------------+
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 1 | 3 | <--- Invalid Customer/Address Pair
+-----------------------------------+
Notice that the final Order links a customer to an address that isn't theirs. I'm looking for a way to prevent this.
(You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.)
From the related reading I was able to find, it seems that one method may be to enable a CHECK constraint targeting a User-Defined Function. This User-Defined Function would be something like the following:
WHERE EXISTS (SELECT 1 FROM Addresses WHERE Id = Order.AddressId AND CustomerId = Order.CustomerId)
While I imagine this would work, given the somewhat "generality" of the articles I was able to find, I don't feel entirely confident that this is my best option.
An alternative might be to remove the CustomerId column from the Addresses table entirely, and instead add another table with Id, CustomerId, AddressId. The Order would then reference this Id instead. Again, I don't love the idea of having to channel through an auxiliary table to get a Customer or Address.
Is there a cleaner way to do this? Or am I simply going about this all wrong?
Good question, however at the root it seems you are struggling with creating a foreign key constraint to something that is not a foreign key:
Orders.CustomerId -> Addresses.CustomerId
There is no simple built-in way to do this because it is normally not done. In ideal RDBMS practices you should strive to encapsulate data of specific types in their own tables only. In other words, try to avoid redundant data.
In the example case above the address ownership is redundant in both the address table and the orders table, because of this it is requiring additional checks to keep them synchronized. This can easily get out of hand with bigger datasets.
You mentioned:
However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
But that is why a relational database is relational. It does this so that distinct data can be kept distinct and referenced with relative IDs.
I think the best solution would be to simply drop this requirement.
In other words, just go with:
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+------+--------------+------------------+
Orders
+--------------------+
| Id | AddressId |
+--------------------+
| 1 | 1 |
| 2 | 3 |
| 3 | 3 | <--- Valid Order/Address Pair
+--------------------+
With that said, to accomplish your purpose exactly, you do have views available for this kind of thing:
create view CustomerOrders
as
select o.Id OrderId,
a.CustomerId,
o.AddressId
from Orders
join Addresses a on a.Id = o.AddressId
I know this is a pretty trivial use-case for a view but I wanted to put in a plug for it because they are often neglected and come in handy with organizing big data sets. Using WITH SCHEMABINDING they can also be indexed for performance.
You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
If you face performance problems, the first thing is to create or amend proper indexes. And DBMS are usually good at join operations (with proper indexes). But yes normalization can sometimes help in performance tuning. But it should be a last resort. And if that route is taken, one should really know what one is doing and be very careful not to damage more at the end of a day, that one has gained. I have doubts, that you're out of options here and really need to go that path. You're likely barking up the wrong tree. Therefore I recommend you take the "normal", "sane" way and just drop customerid in orders and create proper indexes.
But if you really insist, you can try to make (id, customerid) a key in addresses (with a unique constraint) and then create a foreign key based on that.
ALTER TABLE addresses
ADD UNIQUE (id,
customerid);
ALTER TABLE orders
ADD FOREIGN KEY (addressid,
customerid)
REFERENCES addresses
(id,
customerid);

Auto Generate Unique Number for (Non ID Column) in PostgreSQL Table

The PostgreSQL database we have is common multi tenant database.
Question is, need to auto generate a unique number in "customerNumber" column which needs to be in sequential order.
The trick here is, the sequence needs to be unique for each "hotelLocation".
For "hotelLocation"= 1, If we have numbers: 1,2,3 for
"customerNumber"
For "hotelLocation"= 2, We need have numbers: 1,2,3
for "customerNumber"
Following is the sample layout for table,
#Entity
public class CustomerInfo {
#Id
#GeneratedValue(...)
private Long idNumber;
String hotelLocation;
/** Looking for option where, this number needs to
auto generated on SAVE, and need to be in separate sequence
for each hotelLocation **/
private Long customerNumber;
}
So finally here's how output will look like,
+----------+---------------+----------------+
| idNumber | hotelLocation | customerNumber |
+----------+---------------+----------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 1 | 3 |
| 5 | 2 | 2 |
+----------+---------------+----------------+
I am ok with generating unique number both via Hibernate based or via Triggers also.
Searching across, i got following,
Hibernate JPA Sequence (non-Id)
But this one would keep generating in sequence without having separate sequence for each "hotelLocation"
Any solution to this will be very helpful. I am sure there are lot of people with multi tenant database looking for similar solution.
Thanks
You can do this easly with postgresql window function row_number().
Don't know your database but it should be something like this:
SELECT idNumber, hotelLocation,
row_number() OVER (PARTITION BY hotelLocation ORDER BY idNumber) AS
customerNumber FROM table
Check more on window functions here: Understanding Window Functions

Modifying an existing statement for SQL server

I have 2 databases on the same server. We will call our first database ActiDB and the second one will be DomiDB. There is a table on DomiDB called DomiActV7 which we will be working on. In DomiActV7 there is a column called Domain and another column that is empty called NumOfClicks.
Now lets take a look at the other database called ActiDB. There are 2 tables there that are important to us. First table is called ActV7 and the second one is called SeV7.
What I need to do is to get the NumOfClicks that is in the table SeV7 (Database ActiDB) in the other database (DomiDB) combined together for each domain and stored in the empty column called NumOfClicks.
Explaining upper text "Combined together for each domain": There is a list of emails in ActV7 and there is a list of domains in DomiActV7. Domain list should check ActV7 emails and find matching domain (ex.: test#testdomain.com is our email and testdomain.com is our domain. The query found the email with same domain in that email and will now use that rows ID that will connect us to the other table in the same database called SeV7) The ID we can get from that row where the email with maching domain is, is called SeV7OID. With this ID we can get into the other table SeV7 where we can find NumOfClicks.
Now that we found our way to NumOfClicks we need to somehow merge them together since there will be more emails with the same domain in previous tables which means that it will give us more IDs that will lead to more NumOfClicks.. I need to imput combined number of clicks back to the 1st table DomiActV7, also there is a specific number 65535 that should be treated as number 1. That is already implemented in the existing query tho.
The query that is made should be updated to look in ActV7 for Email then get the SeV7OID ID and with that ID it should find the NumOfclicks and input them in the other database. I hope I didn't complicate things too hard but I'm trying to explain the situation as good as I can. Also here is a link to older post with a simillar question: Link to the post
Every Columns data type is varchar(50), only SeV7OID and other IDs are INT.
What is wrong with this query that it does direct Email search in SeV7 which is wrong because there is more missing emails than there
is written in that table that is why we have to check from ActV7
which has all the emails stored.
Existing query that needs to be modified:
UPDATE DomiDB..DomiActV7 SET NumOfClicks = a.NumOfClicks FROM
DomiDB..DomiActV7 d JOIN (SELECT Domain, SUM(CASE WHEN e.NumOfClicks
= 65535 THEN 1 ELSE
e.NumOfClicks END) AS NumOfClicks FROM DomiDB..DomiActV7 d JOIN ActIDB..nact.SeV7 e ON '#'+d.Domain =
right(e.Email,len(d.domain)+1)) a ON a.Domain=d.domain
Example tables: DomiActV7 from DomiDB
+-----------------+-----------------+
| Domain | NumOfClicks |
+-----------------+-----------------+
|thisisadomain.net| |
+-----------------+-----------------+
| moreexamples.com| |
+-----------------+-----------------+
...............
ActV7 from ActiDB
+--------------------+-----------------+
| Email | SeV7OID (key) |
+--------------------+-----------------+
|example#examail.com | 1 |
+--------------------+-----------------+
|a#moreexamples.com | 3 |
+--------------------+-----------------+
.............
SeV7 from ActiDB
+--------------------+-----------------+
| SeV7OID (key) | NumOfClicks |
+--------------------+-----------------+
| 3 | 41 |
+--------------------+-----------------+
| 4 | 22 |
+--------------------+-----------------+
| 1 | 65535 |
+--------------------+-----------------+
..................
Does this produce expected result?
UPDATE d SET d.NumOfClicks = a.NumOfClicks
FROM DomiDB..DomiActV7 d JOIN
( SELECT d.domain,
SUM(CASE
WHEN e.NumOfClicks = 65535 THEN 1
ELSE e.NumOfClicks END) AS NumOfClicks
FROM
DomiDB..DomiActV7 d1
JOIN ActIDB..Actv7 aa ON '#'+d1.domain = right(aa.Email,len(d1.domain)+1)
JOIN ActIDB..SeV7 e ON e.SeV7OID = aa.SeV7OID
group by d1.domain
) a ON a.domain=d.domain

SQL Server Insert Using View

This is a simplified example of what I want to do. Assume there is table named contractor that looks like this:
name | paid_adjustment_amount | adj_date
Bob | 1000 | 4/7/2016
Mary | 2000 | 4/8/2016
Bill | 5000 | 4/8/2016
Mary | 4000 | 4/10/2016
Bill | (1000) | 4/12/2016
Ann | 3000 | 4/30/2016
There is a view of the contractor table, let's call it v_sum, that is just a SUM of the paid_adustment_amount grouped by name. So it looks like this:
name | total_paid_amount
Bob | 1000
Mary | 6000
Bill | 4000
Ann | 3000
Finally, there is another table called to_date_payment that looks like this:
name | paid_to_date_amount
Bob | 1000
Mary | 8000
Bill | 3000
Ann | 3000
Joe | 4000
I want to compare the information in the to_date_payment table to the v_sum view and insert a new row in the contractor table to show an adjustment. Something like this:
INSERT INTO contractor
SELECT to_date_payment.name,
to_date_payment.paid_to_date_amount - v_sum.total_paid_amount,
GETDATE()
FROM to_date_payment
LEFT JOIN v_sum ON to_date_payment.name = v_sum.name
WHERE to_date_payment.paid_to_date_amount - v_sum.total_paid_amount <> 0
OR v_sum.name IS NULL
Are there any issues with using a view for this? My understanding, please correct me if I'm wrong, is that a view is just a result set of a query. And, since the view is of the table I'm inserting new records into, I'm afraid there could be data integrity problems.
Thanks for the help!
In order to fully understand what you are doing, you should also provide the definition for v_sum. Generally speaking, views might provide some advantages, especially when indexed. More details can be found here and here.
Simple usage of views do not provide performance benefits, but they are very good of providing abstraction over tables.
In your particular case, I do not see any problem in JOINing with the view, but I would worry about potential problems related to:
1) JOIN using VARCHARs instead of integer - ON to_date_payment.name = v_sum.name - if possible, try to JOIN on integer (ids or foreign keys ids) values, as it is faster (indexes applied on integer columns will have a smaller key, comparisons are slightly faster).
2) OR in queries - usually leads to performance problems. One thing to try is to change the SELECT like this:
SELECT to_date_payment.name,
to_date_payment.paid_to_date_amount - v_sum.total_paid_amount,
GETDATE()
FROM to_date_payment
JOIN v_sum ON to_date_payment.name = v_sum.name
WHERE to_date_payment.paid_to_date_amount - v_sum.total_paid_amount <> 0
UNION ALL
SELECT to_date_payment.name,
to_date_payment.paid_to_date_amount, -- or NULL if this is really intended
GETDATE()
FROM to_date_payment
-- NOT EXISTS is usually faster than LEFT JOIN ... IS NULL
WHERE NOT EXISTS (SELECT 1 FROM v_sum V WHERE V.name = to_date_payment.name)
3) Possible undesired result - by default, arithmetic involving NULL returns NULL. When there is no match in v_sum, then v_sum.total_paid_amount is NULL and to_date_payment.paid_to_date_amount - v_sum.total_paid_amount will evaluate to NULL. Is this correct? Maybe to_date_payment.paid_to_date_amount - ISNULL(v_sum.total_paid_amount, 0) is intended.

Difference between a theta join, equijoin and natural join

I'm having trouble understanding relational algebra when it comes to theta joins, equijoins and natural joins. Could someone please help me better understand it? If I use the = sign on a theta join is it exactly the same as just using a natural join?
A theta join allows for arbitrary comparison relationships (such as ≥).
An equijoin is a theta join using the equality operator.
A natural join is an equijoin on attributes that have the same name in each relationship.
Additionally, a natural join removes the duplicate columns involved in the equality comparison so only 1 of each compared column remains; in rough relational algebraic terms:
⋈ = πR,S-as ○ ⋈aR=aS
While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.
The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);
This isn't very useful in practice. Consider this example.
Product(PName, Price)
====================
Laptop, 1500
Car, 20000
Airplane, 3000000
Component(PName, CName, Cost)
=============================
Laptop, CPU, 500
Laptop, hdd, 300
Laptop, case, 700
Car, wheels, 1000
The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.
| PNAME | PRICE | CNAME | COST |
--------------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Laptop | 1500 | wheels | 1000 |
| Car | 20000 | CPU | 500 |
| Car | 20000 | hdd | 300 |
| Car | 20000 | case | 700 |
| Car | 20000 | wheels | 1000 |
| Airplane | 3000000 | CPU | 500 |
| Airplane | 3000000 | hdd | 300 |
| Airplane | 3000000 | case | 700 |
| Airplane | 3000000 | wheels | 1000 |
JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:
Product JOIN Component ON Pname
The associated SQL query would be like this (you can play with all the examples here)
SELECT *
FROM Product
JOIN Component
ON Product.Pname = Component.Pname
and the result:
| PNAME | PRICE | CNAME | COST |
----------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Car | 20000 | wheels | 1000 |
Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.
Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:
Natural Join = the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.
Theta Join = this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:
Equi Join = the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.
Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.
#outis's answer is good: concise and correct as regards relations.
However, the situation is slightly more complicated as regards SQL.
Consider the usual suppliers and parts database but implemented in SQL:
SELECT * FROM S NATURAL JOIN SP;
would return a resultset** with columns
SNO, SNAME, STATUS, CITY, PNO, QTY
The join is performed on the column with the same name in both tables, SNO. Note that the resultset has six columns and only contains one column for SNO.
Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables S and SP are required):
SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;
The resultset will have seven columns, including two columns for SNO. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:
SNO, SNAME, STATUS, CITY, SNO, PNO, QTY
or perhaps this
S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
In other words, NATURAL JOIN in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to change SELECT to SELECT DISTINCT yourself).
** I don't quite know what the result of SELECT * FROM table_expression; is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.
Natural is a subset of Equi which is a subset of Theta.
If I use the = sign on a theta join is it exactly the same as just
using a natural join???
Not necessarily, but it would be an Equi. Natural means you are matching on all similarly named columns, Equi just means you are using '=' exclusively (and not 'less than', like, etc)
This is pure academia though, you could work with relational databases for years and never hear anyone use these terms.
Theta Join:
When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.
Equi Join:
When you make a query for join using equality operator only, then that join query comes under Equi join.
Example:
> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID;
> SELECT * FROM Emp INNER JOIN Dept USING(DeptID)
This will show:
_________________________________________________
| Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID |
| | | | |
Note: Equi join is also a theta join!
Natural Join:
a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.
Note: here, the join result has only one column for each pair of same named columns.
Example
SELECT * FROM Emp NATURAL JOIN Dept
This will show:
_______________________________
| DeptID | Emp.Name | Dept.Name |
| | | |
Cartesian product of two tables gives all the possible combinations of tuples like the example in mathematics the cross product of two sets . since many a times there are some junk values which occupy unnecessary space in the memory too so here joins comes to rescue which give the combination of only those attribute values which are required and are meaningful.
inner join gives the repeated field in the table twice whereas natural join here solves the problem by just filtering the repeated columns and displaying it only once.else, both works the same. natural join is more efficient since it preserves the memory .Also , redundancies are removed in natural join .
equi join of two tables are such that they display only those tuples which matches the value in other table . for example :
let new1 and new2 be two tables . if sql query select * from new1 join new2 on new1.id = new.id (id is the same column in two tables) then start from new2 table and join which matches the id in second table . besides , non equi join do not have equality operator they have <,>,and between operator .
theta join consists of all the comparison operator including equality and others < , > comparison operator. when it uses equality(=) operator it is known as equi join .
Natural Join: Natural join can be possible when there is at least one common attribute in two relations.
Theta Join: Theta join can be possible when two act on particular condition.
Equi Join: Equi can be possible when two act on equity condition. It is one type of theta join.

Resources