How do you select from a reference table with exclusivity? - database

I've got two tables (threads and user_threads). Essentially, a thread is an object with a name, and then a user_thread links a user to a thread. This was to illustrate a many-to-many relationship.
Given this setup, I'm trying to figure out how to get the threads shared exclusively by two users.
Threads looks like this
|------------------------|
| id | name |
| 1 | group1 |
| 2 | test group |
|------------------------|
user_threads looks like this
|---------------------------------|
| id | user | thread |
|---------------------------------|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 3 | 2 |
|---------------------------------|
So the issue that I'm running into is this - Given user 1 and user 2, I would like to return the mutual thread that is exclusive to them.
Querying with 1 and 2 should return thread 1. I've tried using a self join and mixing exclude, but SQL is not in my primary skill set. Is there any way to do this or do I need to restructure my tables?

One way is to select the threads that contain both users with a JOIN, and then exclude any thread that also contains other users.
SELECT ut1.thread FROM user_threads ut1
JOIN user_threads ut2 ON ut1.thread=ut2.thread
WHERE ut1."user" = 1 AND ut2."user" = 2
AND NOT EXISTS
(SELECT 1 FROM user_threads WHERE thread=ut1.thread AND "user" NOT IN (ut1."user", ut2."user"))
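To check the query above against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example is runnable (the question does not name an engine), and the function name `exclusive_threads` is illustrative.

```python
import sqlite3

def exclusive_threads(conn, user_a, user_b):
    """Return ids of threads whose only members are user_a and user_b."""
    rows = conn.execute(
        """
        SELECT ut1.thread
        FROM user_threads ut1
        JOIN user_threads ut2 ON ut1.thread = ut2.thread
        WHERE ut1."user" = ? AND ut2."user" = ?
          AND NOT EXISTS (
              SELECT 1 FROM user_threads
              WHERE thread = ut1.thread
                AND "user" NOT IN (ut1."user", ut2."user"))
        ORDER BY ut1.thread
        """,
        (user_a, user_b),
    ).fetchall()
    return [r[0] for r in rows]

# Sample data copied from the user_threads table in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE user_threads (id INTEGER PRIMARY KEY, "user" INTEGER, thread INTEGER)'
)
conn.executemany(
    'INSERT INTO user_threads (id, "user", thread) VALUES (?, ?, ?)',
    [(1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2), (5, 3, 2)],
)

print(exclusive_threads(conn, 1, 2))  # [1]
print(exclusive_threads(conn, 1, 3))  # [] (thread 2 also contains user 2)
```

Thread 2 is rejected for users 1 and 2 because the NOT EXISTS subquery finds the row for user 3.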

Related

Best design for refactoring multiple tables with the same columns but different FK

I currently have a database with multiple log tables. Each table is used to log the states of a process. A process log table has three basic columns: the table ID, the process State, and ProcessID, which is a FK to the respective process table.
Let's say I have these tables:
ProcessALog
ID | State | ProcessID
---|------------|---------
1 | Created | 24
2 | Created | 32
3 | Processing | 24
4 | Canceled | 24
5 | Processing | 32
ProcessBLog
ID | State | ProcessID
---|------------|---------
1 | Created | 12
2 | Processing | 12
3 | Deleted | 12
But I found a problem with this implementation: I would need to create another table every time I needed to log another process. I figured I could simplify this by having a central log table with another column named ProcessName to store the different processes, like so:
Log
ID | State | ProcessID | ProcessName
---|------------|-----------|-------------
1 | Created | 24 | ProcessA
2 | Created | 32 | ProcessA
3 | Processing | 24 | ProcessA
4 | Canceled | 24 | ProcessA
5 | Processing | 32 | ProcessA
1 | Created | 12 | ProcessB
2 | Processing | 12 | ProcessB
3 | Deleted | 12 | ProcessB
But having a central log table would mean that my ProcessID can't be a foreign key anymore.
How can I retain my foreign keys? Is this a good database design?

Traversing and Getting Nodes in Graph without Loop

I have a person table which keeps some personal info, like the table below.
+----+------+----------+----------+--------+
| ID | name | motherID | fatherID | sex |
+----+------+----------+----------+--------+
| 1 | A | NULL | NULL | male |
| 2 | B | NULL | NULL | female |
| 3 | C | 1 | 2 | male |
| 4 | X | NULL | NULL | male |
| 5 | Y | NULL | NULL | female |
| 6 | Z | 5 | 4 | female |
| 7 | T | NULL | NULL | female |
+----+------+----------+----------+--------+
Also I keep marriage relationships between people. Like:
+-----------+--------+
| HusbandID | WifeID |
+-----------+--------+
| 1 | 2 |
| 4 | 5 |
| 1 | 5 |
| 3 | 6 |
+-----------+--------+
With this information we can imagine the relationship graph.
The question is: how can I get all connected people, given the ID of any one of them?
For example:
When I give ID=1, it should return 1,2,3,4,5,6 (order is not important).
Likewise, when I give ID=6, it should return 1,2,3,4,5,6 (order is not important).
Likewise, when I give ID=7, it should return 7.
Please note: the relationships (edges) between person nodes may form loops anywhere in the graph. The example above shows a small part of my data; the person and marriage tables may contain thousands of rows, and we do not know where loops may occur.
Similar questions were asked in:
PostgreSQL SQL query for traversing an entire undirected graph and returning all edges found
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=118319
But I can't write working SQL for this. Thanks in advance. I am using SQL Server.
From SQL Server 2017 and Azure SQL DB you can use the new graph database capabilities and the new MATCH clause to answer queries like this, e.g.:
SELECT FORMATMESSAGE ( 'Person %s (%i) has mother %s (%i) and father %s (%i).', person.userName, person.personId, mother.userName, mother.personId, father.userName, father.personId ) msg
FROM dbo.persons person, dbo.relationship hasMother, dbo.persons mother, dbo.relationship hasFather, dbo.persons father
WHERE hasMother.relationshipType = 'mother'
AND hasFather.relationshipType = 'father'
AND MATCH ( father-(hasFather)->person<-(hasMother)-mother );
Full script available here.
For your specific questions, the current release does not include transitive closure (the ability to loop through the graph n number of times) or polymorphism (find any node in the graph), so answering these queries may involve loops, recursive CTEs or temp tables. I have attempted this in my sample script and it works for your sample data, but it's just an example; I'm not 100% sure it will work with other data.
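As a rough illustration of the recursive-CTE idea, here is a minimal sketch that finds everyone connected to a given person in the question's sample data. It runs on SQLite through Python's sqlite3 module, which is an assumption made for runnability: SQLite lets the recursive step use UNION, which discards rows already seen and so terminates on cyclic graphs. SQL Server requires UNION ALL in a recursive CTE, so there the same idea would more likely need a WHILE loop over a temp table. The names `connected_people`, `edges` and `reach` are illustrative.

```python
import sqlite3

def connected_people(conn, start_id):
    """Return the sorted ids of everyone reachable from start_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE
        edges(a, b) AS (          -- every relationship, in both directions
            SELECT HusbandID, WifeID FROM marriage
            UNION SELECT WifeID, HusbandID FROM marriage
            UNION SELECT ID, motherID FROM person WHERE motherID IS NOT NULL
            UNION SELECT motherID, ID FROM person WHERE motherID IS NOT NULL
            UNION SELECT ID, fatherID FROM person WHERE fatherID IS NOT NULL
            UNION SELECT fatherID, ID FROM person WHERE fatherID IS NOT NULL
        ),
        reach(id) AS (
            SELECT ?              -- the starting person
            UNION                 -- UNION drops repeats, so loops cannot recurse forever
            SELECT e.b FROM reach r JOIN edges e ON e.a = r.id
        )
        SELECT id FROM reach ORDER BY id
        """,
        (start_id,),
    ).fetchall()
    return [r[0] for r in rows]

# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (ID INTEGER PRIMARY KEY, name TEXT, "
    "motherID INTEGER, fatherID INTEGER, sex TEXT)"
)
conn.execute("CREATE TABLE marriage (HusbandID INTEGER, WifeID INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", None, None, "male"),
        (2, "B", None, None, "female"),
        (3, "C", 1, 2, "male"),
        (4, "X", None, None, "male"),
        (5, "Y", None, None, "female"),
        (6, "Z", 5, 4, "female"),
        (7, "T", None, None, "female"),
    ],
)
conn.executemany("INSERT INTO marriage VALUES (?, ?)", [(1, 2), (4, 5), (1, 5), (3, 6)])

print(connected_people(conn, 1))  # [1, 2, 3, 4, 5, 6]
print(connected_people(conn, 7))  # [7]
```

Treating every edge as bidirectional is what makes the traversal undirected, so starting from ID 6 yields the same set as starting from ID 1.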

Making an Object Dependent Number of Fields for a Table in MS.Access

I'm trying to make a database that will hold a table of objects, and these objects are comprised of objects from a second table. One table is a table of possible sets, and the second is a table of possible components. The table of sets has to include fields for each of its components, but each set has an unknown number of components. How do I make a table with fields (Component 1, Component 2, Component 3, ...) that are dependent on each set to decide how many of the fields it needs?
Is there a way to do this just using the Access interface or will I actually have to get into the code behind it?
I think it would also solve my problem if there were a way to make a field in a column that acted as an ArrayList so if anyone could think of how to do that please let me know.
Assuming that a component can be part of more than one set, what you need here is a many-to-many relationship.
In a database you don't do this with an arbitrary number of columns, you use a junction table.
When you need a tabular representation, you use a Pivot / Crosstab query.
Your data model could look like this:
Sets
+--------+----------+
| Set_ID | Set_Name |
+--------+----------+
| 1 | foo |
| 2 | bar |
+--------+----------+
Components
+--------------+----------------+
| Component_ID | Component_Name |
+--------------+----------------+
| 1 | aaa |
| 2 | bbb |
| 3 | ccc |
| 4 | ddd |
+--------------+----------------+
Junction table
+----------+----------------+
| f_Set_ID | f_Component_ID |
+----------+----------------+
| 1 | 2 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
+----------+----------------+
(f_ as in Foreign Key)
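The model above can be tried out with a minimal sketch like the following. It uses SQLite through Python's sqlite3 module rather than Access, purely so it is runnable; the junction table name `SetComponents` and the function `components_by_set` are hypothetical, and the grouping is done in Python instead of with an Access crosstab query.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Sets (Set_ID INTEGER PRIMARY KEY, Set_Name TEXT);
    CREATE TABLE Components (Component_ID INTEGER PRIMARY KEY, Component_Name TEXT);
    -- Junction table: one row per (set, component) pair.
    CREATE TABLE SetComponents (
        f_Set_ID INTEGER REFERENCES Sets(Set_ID),
        f_Component_ID INTEGER REFERENCES Components(Component_ID),
        PRIMARY KEY (f_Set_ID, f_Component_ID)
    );
    INSERT INTO Sets VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO Components VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'ddd');
    INSERT INTO SetComponents VALUES (1, 2), (1, 4), (2, 1), (2, 2), (2, 3);
    """
)

def components_by_set(conn):
    """Map each set name to the list of its component names."""
    rows = conn.execute(
        """
        SELECT s.Set_Name, c.Component_Name
        FROM Sets s
        JOIN SetComponents sc ON sc.f_Set_ID = s.Set_ID
        JOIN Components c ON c.Component_ID = sc.f_Component_ID
        ORDER BY s.Set_ID, c.Component_ID
        """
    )
    result = defaultdict(list)
    for set_name, component_name in rows:
        result[set_name].append(component_name)
    return dict(result)

print(components_by_set(conn))
# {'foo': ['bbb', 'ddd'], 'bar': ['aaa', 'bbb', 'ccc']}
```

Each set can have any number of components without changing the schema: adding a component to a set is just one more row in the junction table.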

What database technologies should I consider for building a scalable "running average" view?

We are working on an application where millions of users will be entering information at the same time. Suppose the application allows people to rate geographic regions on where they would like to live. Each participant is allowed to rate each region using a decimal value from 0-10. Each person belongs to one or more groups based upon attributes such as gender, and people that consider themselves active, or enjoy culture.
Every time a rating is made, we need a view that shows the average rating for each region/group. I'm aware that most DBs have an "average" function, but for our purposes we need to be able to use our own function, as we may use the geometric mean instead of the arithmetic mean.
Below are some tables which might be used. Note: for brevity, I did not include the relationship table PeopleGroups, which maps each person to the groups they are a member of.
Regions
+-----+------------+
| RID | NAME       |
+-----+------------+
| 1   | Florida    |
| 2   | California |
+-----+------------+
People
+-----+-------+
| PID | Name  |
+-----+-------+
| P1  | Alice |
| P2  | Bob   |
| P3  | Frank |
| P4  | Mary  |
+-----+-------+
Groups
+-----+----------+
| GID | Name     |
+-----+----------+
| G0  | Everyone |
| G1  | Women    |
| G2  | Men      |
| G3  | Active   |
| G4  | Culture  |
+-----+----------+
RegionScoresByPerson
+-----+-----+-------+
| RID | PID | Score |
+-----+-----+-------+
| 1   | P1  | 6     |
| 1   | P2  | 8     |
| 1   | P3  | 3     |
| 1   | P4  | 2     |
| 1   | P1  | 7     |
| 1   | P2  | 5     |
| 1   | P3  | 8     |
| 1   | P4  | 2     |
+-----+-----+-------+
Our current implementation uses a similar set of tables for storing ratings, but we don't calculate averages in real time. Any time we need the results (e.g. show me the average score for California for women), we have to pull all the information into memory and run the calculations manually.
I was wondering how I can leverage database technologies such as views, triggers, stored procedures, etc. to present a simple table that gives me scores for people and groups, so we don't have to run the calculations manually.
I would like a table like the following, where everything is handled by the DB. Any insert, update, or delete on the RegionScoresByPerson or Groups tables would automatically be reflected in this table. If it is not apparent, the rows marked with * are calculated rows. In this case I'm using a simple arithmetic average, but the design should allow for any type of function.
EID stands for entity ID (a person or group)
Besides deciding how to build such a view, I'm unsure of what sort of datatypes to use (and index) for People and Groups. I suppose I'd like the index to be integers, but that would prevent me from creating the table below because I couldn't distinguish between Person 1 and Group 1 -- Would having ID's such as P1 and G1 be a performance hit? I'm obviously concerned about the design being scalable.
ScoreView
+-----------+-----+-------+
| RID | EID | Score |
| 1 | P1 | 6 |
| 1 | P2 | 8 |
| 1 | P3 | 3 |
| 1 | P4 | 2 |
| 1 | P1 | 7 |
| 1 | P2 | 5 |
| 1 | P3 | 8 |
| 1 | P4 | 2 |
| 1 | G0 | 4.75 |*
| 1 | G1 | 4 |*
| 1 | G2 | … |*
| 1 | G3 | … |*
+-----------+-----+-------+
Apache Flume is the open source tool designed to solve this kind of problem. Also have a look at Google Cloud Dataflow.
https://flume.apache.org/
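Independently of the tools named above, the per-group scores described in the question can be prototyped at small scale with a custom aggregate function. The following is only a sketch under several assumptions: it uses SQLite via Python's sqlite3 module, the `PeopleGroups` rows are hypothetical (the question omits that table; the memberships here were picked to agree with the G0 = 4.75 and G1 = 4 rows in ScoreView), only the first four score rows are loaded, and `geomean` and `group_scores` are illustrative names. It recomputes on every query and says nothing about behaviour with millions of users.

```python
import math
import sqlite3

class GeometricMean:
    """Custom SQL aggregate: exp of the mean of logs (needs positive scores)."""

    def __init__(self):
        self.log_sum = 0.0
        self.count = 0

    def step(self, value):
        # math.log raises on 0 or negative input, so scores must be > 0 here.
        self.log_sum += math.log(value)
        self.count += 1

    def finalize(self):
        return math.exp(self.log_sum / self.count) if self.count else None

conn = sqlite3.connect(":memory:")
conn.create_aggregate("geomean", 1, GeometricMean)
conn.executescript(
    """
    CREATE TABLE RegionScoresByPerson (RID INTEGER, PID TEXT, Score REAL);
    CREATE TABLE PeopleGroups (PID TEXT, GID TEXT);  -- hypothetical memberships
    INSERT INTO RegionScoresByPerson VALUES
        (1, 'P1', 6), (1, 'P2', 8), (1, 'P3', 3), (1, 'P4', 2);
    INSERT INTO PeopleGroups VALUES
        ('P1', 'G0'), ('P2', 'G0'), ('P3', 'G0'), ('P4', 'G0'),
        ('P1', 'G1'), ('P4', 'G1'),
        ('P2', 'G2'), ('P3', 'G2');
    """
)

def group_scores(conn):
    """Return {(RID, GID): (arithmetic_mean, geometric_mean)}."""
    rows = conn.execute(
        """
        SELECT s.RID, pg.GID, AVG(s.Score), geomean(s.Score)
        FROM RegionScoresByPerson s
        JOIN PeopleGroups pg ON pg.PID = s.PID
        GROUP BY s.RID, pg.GID
        ORDER BY s.RID, pg.GID
        """
    ).fetchall()
    return {(rid, gid): (mean, gmean) for rid, gid, mean, gmean in rows}

for key, (mean, gmean) in group_scores(conn).items():
    print(key, mean, round(gmean, 3))
```

Because the aggregation runs at query time, inserts, updates and deletes are reflected automatically. The cost is that every read recomputes the aggregate, which is the part a scalable design would need to address.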

Fill sequence in sql rows

I have a table that stores a group of attributes and keeps them ordered in a sequence. The chance exists that one of the attributes (rows) could be deleted from the table, and the sequence of positions should be compacted.
For instance, if I originally have this set of values:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 2 | two | 2 |
| 3 | three | 3 |
| 4 | four | 4 |
+----+--------+-----+
And the second row was deleted, the position of all subsequent rows should be updated to close the gaps. The result should be this:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 3 | three | 2 |
| 4 | four | 3 |
+----+--------+-----+
Is there a way to do this update in a single query? How could I do this?
PS: I'd appreciate examples for both SQLServer and Oracle, since the system is supposed to support both engines. Thanks!
UPDATE: The reason for this is that users are allowed to modify the positions at will, as well as add or delete rows. Positions are shown to the user, and for that reason they should show a consistent sequence at all times (and this sequence must be stored, not generated on demand).
I'm not sure it works, but with Oracle I would try the following:
update my_table set pos = rownum;
This would work but may be suboptimal for large datasets:
SQL> UPDATE my_table t
2 SET pos = (SELECT COUNT(*) FROM my_table WHERE id <= t.id);
3 rows updated
SQL> select * from my_table;
ID NAME POS
---------- ---------- ----------
1 one 1
3 three 2
4 four 3
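An alternative that is not taken from the answers here is to close the gap at delete time: remove the row, then shift every later position down by one. The sketch below assumes positions were contiguous before the delete and uses SQLite via Python's sqlite3 module for runnability; the function name `delete_and_compact` is illustrative. The UPDATE itself is plain SQL and should carry over to both SQL Server and Oracle, though that has not been verified here.

```python
import sqlite3

def delete_and_compact(conn, row_id):
    """Delete one row and shift later positions down by one, in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT pos FROM my_table WHERE id = ?", (row_id,)
        ).fetchone()
        if row is None:
            return  # nothing to delete
        conn.execute("DELETE FROM my_table WHERE id = ?", (row_id,))
        conn.execute("UPDATE my_table SET pos = pos - 1 WHERE pos > ?", (row[0],))

# Sample data copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, pos INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "one", 1), (2, "two", 2), (3, "three", 3), (4, "four", 4)],
)

delete_and_compact(conn, 2)
print(conn.execute("SELECT id, name, pos FROM my_table ORDER BY pos").fetchall())
# [(1, 'one', 1), (3, 'three', 2), (4, 'four', 3)]
```

Unlike the COUNT(*)-based update, this touches only the rows after the deleted one and does not depend on id order matching position order.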
Do you really need the sequence values to be contiguous, or do you just need to be able to display the contiguous values? The easiest way to do this is to let the actual sequence become sparse and calculate the rank based on the order:
select id,
name,
dense_rank() over (order by pos) as pos,
pos as sparse_pos
from my_table
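(note: this query was written for Oracle, but dense_rank() over (...) is a standard window function that SQL Server 2005 and later also support)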
If you make the position sparse in the first place, this would even make re-ordering easier, since you could make each new position halfway between the two existing ones. For instance, if you had a table like this:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 100 |
| 2 | two | 200 |
| 3 | three | 300 |
| 4 | four | 400 |
+----+--------+-----+
When it becomes time to move ID 4 into position 2, you'd just change the position to 150.
Further explanation:
Using the above example, the user initially sees the following (because you're masking the position):
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 2 | two | 2 |
| 3 | three | 3 |
| 4 | four | 4 |
+----+--------+-----+
When the user, through your interface, indicates that the record in position 4 needs to be moved to position 2, you update the position of ID 4 to 150, then re-run your query. The user sees this:
+----+--------+-----+
| id | name | pos |
+----+--------+-----+
| 1 | one | 1 |
| 4 | four | 2 |
| 2 | two | 3 |
| 3 | three | 4 |
+----+--------+-----+
The only reason this wouldn't work is if the user is editing the data directly in the database. Though, even in that case, I'd be inclined to use this kind of solution, via views and instead-of triggers.
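The sparse-position idea described above can be sketched as follows. This assumes SQLite 3.25 or later (for window function support) through Python's sqlite3 module; `displayed_order` and the `shown_pos` alias are illustrative names, and the data is the sparse table from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, pos INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "one", 100), (2, "two", 200), (3, "three", 300), (4, "four", 400)],
)

def displayed_order(conn):
    """Return (id, name, contiguous position) computed from the sparse pos column."""
    return conn.execute(
        """
        SELECT id, name, DENSE_RANK() OVER (ORDER BY pos) AS shown_pos
        FROM my_table
        ORDER BY my_table.pos
        """
    ).fetchall()

# Move ID 4 between ID 1 (pos 100) and ID 2 (pos 200): one single-row update.
conn.execute("UPDATE my_table SET pos = 150 WHERE id = 4")

print(displayed_order(conn))
# [(1, 'one', 1), (4, 'four', 2), (2, 'two', 3), (3, 'three', 4)]
```

Deleting a row needs no renumbering at all under this scheme, since the displayed rank is recomputed from whatever sparse values remain.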
