Counting common visited countries - arrays

This is a simplified version of another question.
Consider this:
How many visited countries do John and Mary have in common? Same question for John and Alfred, and for Alfred and Mary.
Here is a Google Sheet to play with: https://docs.google.com/spreadsheets/d/1jWAXVGt2_E3fYo8WZSBP1Fp-vg3gYPKlG2ZxC-4SE34/edit?usp=sharing

Try this:
=ArrayFormula(sum(countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))*countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))))
As far as I can see, there are four correct answers to this question, depending on how you pose it, rather like in SQL:
(1) for every instance of person 1 with a country, how many instances of person 2 are there with the same country including duplicates (like a cross join)
(2) for every unique combination of person 1 with a country, how many instances of person 2 with the same country are there (like a left join)
(3) for every unique combination of person 2 with a country, how many instances of person 1 with the same country are there (like a right join)
(4) for each unique combination of person 1 with a country, is there at least one instance of person 2 with the same country (like an inner join)
I have gone for option (1).
The other three formulas, for options (2), (3) and (4) in that order, should be:
=ArrayFormula(sum((countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))>0)*countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))))
=ArrayFormula(sum(countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))*(countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))>0)))
=ArrayFormula(sum((countifs($A$2:$A$9,E$1,$B$2:$B$9,unique($B$2:$B$9))>0)*(countifs($A$2:$A$9,$D2,$B$2:$B$9,unique($B$2:$B$9))>0)))
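To make the difference between the four options concrete, here is a minimal Python sketch of the same arithmetic. The rows in `visits` are made up for illustration (they are not the contents of the linked sheet), and `common_counts` is a hypothetical helper name. `person1` plays the role of `E$1` and `person2` the role of `$D2`.

```python
from collections import Counter

# Hypothetical (person, country) rows; duplicates are allowed, as in columns A:B.
visits = [
    ("John", "France"), ("John", "France"), ("John", "Spain"),
    ("Mary", "France"), ("Mary", "Spain"), ("Mary", "Spain"), ("Mary", "Spain"),
    ("Alfred", "Italy"),
]

def common_counts(rows, person1, person2):
    """Return the four counts, keyed by option number (1) to (4)."""
    c1 = Counter(country for person, country in rows if person == person1)
    c2 = Counter(country for person, country in rows if person == person2)
    countries = {country for _, country in rows}  # plays the role of unique(B2:B9)
    return {
        # (1) product of both counts, duplicates included
        1: sum(c1[c] * c2[c] for c in countries),
        # (2) person1 reduced to "visited or not", person2 counted in full
        2: sum((c1[c] > 0) * c2[c] for c in countries),
        # (3) person1 counted in full, person2 reduced to "visited or not"
        3: sum(c1[c] * (c2[c] > 0) for c in countries),
        # (4) number of distinct countries both have visited
        4: sum((c1[c] > 0) * (c2[c] > 0) for c in countries),
    }

print(common_counts(visits, "John", "Mary"))
```

With this sample data the four options give four different numbers for John and Mary, which is why it matters which question you are actually asking.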

Related

Add columns to a set of results depending how many rows found

I don't have sample data that fits the example below, and it's more a theoretical question than a data-driven one...
I have a table called CustomerOrders. A query looks to see if any customers haven't ordered anything for more than 4 days (again, it's just an example but easier than explaining the real purpose).
If there are such customers, then the query searches a Communications table that records whether or not sales staff have noted that it's been four days or more since an order was received from that customer, and what action they're taking to address this.
Depending on the number of days since the last order, and the number of times sales staff have logged their acknowledgement (ideally it should be every day until they place an order), each customer appears in the results like this:
FirstName, LastName, LastOrderDate, NumDaysSince, SalesStaffCommentDate, SalesComment
At present, each comment that sales staff log about this date gap appears as a separate row in the result set, so the rows repeat one another except for the last two columns.
What I would prefer is for this result set to be set out as:
FirstName, LastName, LastOrderDate, NumDaysSince, SalesStaffCommentDate[1], SalesComment[1], SalesStaffCommentDate[2], SalesComment[2]
etc, with the number of additional comment and date columns showing the comments made, but all on one row.
But if the sales team only logged two comments on one customer, but ten comments on another, there is obviously a disparity between the number of columns that could be filled.
Is it possible to display the data in this way?
EDIT - thanks to @Larnu and @Smor so far.
To try and give a bit more data. This is how my data looks:
NAME        LASTORDERDATE  NUMDAYSSINCE  SALESSTAFFCOMMENTDATE  SALESCOMMENT
John Smith  2022-06-12     5             2022-06-15             Tried to call
John Smith  2022-06-12     5             2022-06-16             Call back later
John Smith  2022-06-12     5             2022-06-17             Not required
I want it to look like this:
John Smith 2022-06-12 5 2022-06-15 Tried to call 2022-06-16 Call back later 2022-06-17 Not required.
There may be anywhere from 1 to 10 entries before the customer orders again, which resets the counter to fewer than 4 days since their last order.
@Larnu, are you saying that the link you give allows me to present the data in this way? Ordinarily I would export this data to PBI and pivot it to display as I need it to, but for this bit of data I'm unable to do that, and so it needs to be in SQL.
Hope that clarifies things in case I was being a bit too vague.
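For illustration only, here is one possible way to get that one-row-per-customer layout in plain SQL: number each customer's comments, then spread them across columns with conditional aggregation. This is a sketch, not something confirmed in the thread. It runs against an in-memory SQLite database from Python, the table name `results` and the output column names are made up, and the number of comment columns has to be fixed in advance (three here; up to ten would be needed for the case described above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (          -- hypothetical stand-in for the current result set
    name TEXT, last_order_date TEXT, num_days_since INTEGER,
    comment_date TEXT, comment TEXT
);
INSERT INTO results VALUES
    ('John Smith', '2022-06-12', 5, '2022-06-15', 'Tried to call'),
    ('John Smith', '2022-06-12', 5, '2022-06-16', 'Call back later'),
    ('John Smith', '2022-06-12', 5, '2022-06-17', 'Not required');
""")

pivot_sql = """
WITH numbered AS (
    -- rn = 1 for a customer's first comment, 2 for the second, and so on
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY comment_date) AS rn
    FROM results
)
SELECT name, last_order_date, num_days_since,
       MAX(CASE WHEN rn = 1 THEN comment_date END) AS comment_date_1,
       MAX(CASE WHEN rn = 1 THEN comment END)      AS comment_1,
       MAX(CASE WHEN rn = 2 THEN comment_date END) AS comment_date_2,
       MAX(CASE WHEN rn = 2 THEN comment END)      AS comment_2,
       MAX(CASE WHEN rn = 3 THEN comment_date END) AS comment_date_3,
       MAX(CASE WHEN rn = 3 THEN comment END)      AS comment_3
FROM numbered
GROUP BY name, last_order_date, num_days_since
"""
rows = conn.execute(pivot_sql).fetchall()
print(rows)
```

Customers with fewer comments than there are column pairs simply get NULL in the unused columns, which is how the disparity between two and ten comments would show up.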

How to edit Relationship properties to only use part of the CDM Identifier in LDM?

I'm creating a conceptual data model for a simplified web store using Power Designer.
I'm having trouble specifying the relation between an Order and a Receipt. I would like a receipt to only have a part of the order's identifier in its primary key in the logical model (more specifically, only order_id). I am unable to achieve this by tweaking the relationship properties (see the screenshots below; the problematic relationship is marked with a green arrow).
Should I simply omit the relation in the conceptual model?
Conceptual data model
Logical data model
EDIT
If perhaps it wasn't clear how I envisioned my tables…
User
username     password       mail                    first_name  last_name  address
--------------------------------------------------------------------------------------------------------
hacker123    greenGrass     david.norton@gmail.com  David       Norton     West Shire 40, 1240 Neverland
musicman100  SuperPassword  john.stewart@gmail.com  John        Stewart    Strange Alley 50, 1250 Outer Space
Product
product_id  name             description           price_per_unit  unit_of_measure  supply
------------------------------------------------------------------------------------------
1           Tooth Brush 100  NULL                  5.99            piece            200
2           Super Paste 200  For sparkling smiles  7.99            piece            50
Order
order_id  username     product_id  amount
-----------------------------------------
50        hacker123    1           2
50        hacker123    2           1
51        musicman100  1           5
Receipt
receipt_id  order_id
--------------------
12          50
13          51
EDIT #2
I just realised that I should probably break up Order into two tables! One to track which products are on a particular order, and another to track who placed the order.
Perhaps I could even split the Order table into 3 parts
Order(order_id, order_time)
ProductsPerOrder(order_id, product_id, amount)
OrdersPlaced(order_id, username)
You have a contradiction... One part says that Order is identified by User+Product+Order; the other says that Order has its own identifier order_id.
I guess the second one is correct, with the usual design that Order has an id.
And you need to change the relationships in the CDM, between Order, and User/Product, to uncheck the Dependent property. These links are just mandatory, not dependent (which would mean that Order is defined relatively to User+Product).
p.s. the same holds for Receipt, which has its own identifier.
You can edit the relationship in the logical model!
If you click on a relationship, a Relationship properties dialog appears. There's a tab called Joins. This is where you can specify which columns to refer to with the relationship.

Database design for voting

I am implementing a voting feature to allow users to vote for their favourite images. They are able to vote for only 3 images. Nothing more or less. Therefore, I am using checkboxes to do validation for it. I need to store these votes in my database.
Here is what I have so far:
| voteID | name | emailAddress | ICNo | imageID |
(where imageID is a foreign key to the Images table)
I'm still learning about database systems and I feel like this isn't a good database design considering some of the fields like email address and IC Number have to be repeated.
For example,
| voteID | name | emailAddress       | ICNo     | imageID |
| 1      | BG   | email@example.com  | G822A28A | 10      |
| 2      | BG   | email@example.com  | G822A28A | 11      |
| 3      | BG   | email@example.com  | G822A28A | 12      |
| 4      | MO   | email2@example.com | G111283Z | 10      |
You have three "things" in your system - images, people, and votes.
An image can have multiple votes (from different people), and a person can have multiple votes (for different images).
One way to represent this in a diagram is as follows:
So you store information about a person in one place (the Person table), about Images in one place (the Images table), and Votes in one place. The "chicken feet" relationships between them show that one person can have many votes, and one image can have many votes. ("Many" meaning "more than one").
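As a rough sketch of the three-table layout described above, here is one way it could look, run against an in-memory SQLite database from Python. The column names reuse those from the question where possible; `personID`, the `UNIQUE` constraints and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Person (
    personID     INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    emailAddress TEXT NOT NULL,
    ICNo         TEXT NOT NULL UNIQUE      -- each person is stored exactly once
);
CREATE TABLE Images (
    imageID INTEGER PRIMARY KEY
);
CREATE TABLE Votes (
    voteID   INTEGER PRIMARY KEY,
    personID INTEGER NOT NULL REFERENCES Person(personID),
    imageID  INTEGER NOT NULL REFERENCES Images(imageID),
    UNIQUE (personID, imageID)             -- assumption: one vote per person per image
);
INSERT INTO Person VALUES
    (1, 'BG', 'email@example.com',  'G822A28A'),
    (2, 'MO', 'email2@example.com', 'G111283Z');
INSERT INTO Images VALUES (10), (11), (12);
INSERT INTO Votes (personID, imageID) VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Votes per image, without repeating any personal details in the Votes table.
votes_per_image = conn.execute(
    "SELECT imageID, COUNT(*) FROM Votes GROUP BY imageID ORDER BY imageID"
).fetchall()
print(votes_per_image)
```

The "exactly 3 images per person" rule is not expressed in this schema; in this sketch it would still have to be checked by the application (or by a trigger) before the votes are inserted.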

Should I create two tables to store specific information of an object?

I have some tables:
Course: contains info about a course; one course has many topics.
Topic: contains info about a topic; one topic belongs to one course and has many questions.
Question: contains info about a question; one question belongs to one topic.
GeneralExam: contains info about the exam of a course; one general exam belongs to one course.
GeneralQuestion: contains the set of questions of a general exam.
These are the columns of the two tables:
GeneralExam: name, description, semester, duration, user_id, course_id, used (boolean), number_question
GeneralQuestion: general_exam_id, question_id
The questions for a GeneralExam are picked at random. That is, I pick random questions according to a specified number of questions for each topic.
Now I want to know specific information about a general exam, such as the number of questions from each topic of the course that the exam was made for. Currently, I think I will create a new table to store that info, something like:
New table: general_exam_id, topic_id, number_question
But I don't know if this is the best way to do it, or whether there are other approaches or patterns for this case. If I create that new table, then whenever I change a GeneralExam (e.g. change its set of questions), I will need to update three tables: GeneralExam, GeneralQuestion, and the new table. I'm not sure that is a good approach.
So I want to ask: should I create a new table to store that information (the number of questions from each topic of the course in a general exam),
or should I change the GeneralQuestion table so that it stores a general exam's info better, and if so, what changes should I make? Thanks for any suggestions and advice.
What we are trying to say is that you are not required to create an extra table. You want to manage your schema efficiently, with minimal touches to the tables.
Design Rules:
One should not confuse the numbered topics in a particular course book with the Topic table's ID numbers. A Course doesn't necessarily have to belong to an Exam; it's the Exam that must belong to a Course. You have your design correct so far. I assume you are storing all questions for an Exam in the GeneralQuestion table, which acts as a sort of question bank of past Exams (including the Exam scheduled for the near future, to which only the Exam moderators have access).
It makes more sense to rename your GeneralQuestion table to ExamsQuestions. With this bank, your design has two virtual question types: Exam questions from the bank, and questions from the Question table, with the Exam questions referencing your Question table. That gives you the required reference key into the Exam question bank. In my opinion it is a history table. It seems that the final table you are unsure about should ideally just be a stored query providing real-time data.
Main question: are you planning to store each past or scheduled future Exam's questions? You say yes. Hence,
Date becomes a crucial column in your Exam table in the design I have provided. You need both Date and Course ID in the Exam table.
Here is the table schema I would suggest.
Reference on SQLFiddle
tblCourse
ID, Course
ID NAME
b105 biology 1st year
c323 chemistry 1st year
e120 english 1st year
m122 maths 1st year
m250 maths 2nd year
p302 physics 3rd year
tblTopic: although ID is the indexed key, the CID is what identifies the Topic's parent (the Course)
ID, CID, Topic
ID CID NAME
t1 m122 Algebra
t2 m122 Probability
t3 e120 Essay Writing
t4 p302 Optics
t5 b105 liver system
t6 b105 neural system
t7 p302 mechanics
tblQuestion: although ID is the indexed key, the TID is what identifies the Question's parent (the Topic)
ID, TID, Question
ID   TID  QUESTION
q1   t2   x
q10  t7   p
q11  t4   n
q12  t6   i
q13  t7   r
q14  t6   k
q2   t1   y
q3   t1   z
q4   t2   a
q5   t2   v
q6   t6   s
q7   t6   h
q8   t1   l
q9   t2   g
tblExam: although ID is the indexed key, the CID is what identifies the Exam's parent (the Course)
ID, CID, Exam, Date
ID  CID   EXAM                          DATE
e1  b105  1st Year Biology Main Stream  June, 08 2012
e2  m122  1st Year Maths Elective       December, 20 2011
e3  b105  1st Year Biology Main Stream  February, 10 2012
tblExamsQuestions: foreign keys: Exam ID, Question ID
ID, QID
Application:
Somebody wants last year's Exam questions for the 1st Year Maths course. How do you query that? If the Exam ID is an auto-increment value, it's very hard to know which ID is which exam. So here you can search the questions for a particular course exam using only the course ID and the date the exam was held. That should do the job, unless exams for the same course are held multiple times on the same day; in that case you can store the time as well. You can remove Date and Time as long as you change your Exam table design to query by Exam ID, where the ID is a proper exam ID and not just 1, 2, 3, ...
Course ID = m122
Date = Last Year/Month/Date
These are the most logical and important details. They work as a COMPOSITE SEARCH KEY: you use them to find the Exam ID in the Exam table, and then use that ID in the ExamsQuestions bank to pull the Exam questions.
select * from question
where id in (
select eq.qid from examsquestions eq
inner join exam e
on e.id = eq.id
where e.date = '2011-12-20'
and e.cid = 'm122');
ID TID QUESTION
q1 t2 x
q5 t2 v
q7 t6 h
By the way, since you are choosing questions randomly for an Exam, I would be quite worried if I had to take that Exam, because the risk of getting all the questions from one topic is pretty high. Anyway, that's a side issue; I hope you have an unbiased yet FAIR mechanism to generate an Exam from all the topics of a course ;)
Let me know if you have further doubts. Anyone, please shed some light with ideas for better solutions.
PS: Sorry for the late reply.
If the information you want can be queried from the current data, in general you should not store it in another table. The reason is: every time you add/remove rows from other tables, you'd have to update this one as well. It's easy to create data inconsistencies that way.
For your example (number of questions of a given topic in an exam), you can easily retrieve that info using aggregation:
select q.topic_id, count(gq.question_id)
from topic t join question q on t.id = q.topic_id
join general_question gq on q.id = gq.question_id
where gq.general_exam_id = 10
group by q.topic_id;
OTOH if the data you want to store is not deducible from the rest of the data, then yes, it's better to store it where it makes sense - if it's specific to the pair (exam, topic), then on a table that has those two values as its candidate key (i.e. exactly the way you suggested in your question). Whether to create a new table or add those columns in an existing one (with the correct candidate key, of course), it's your choice, I don't have any arguments for or against doing so.
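To check that the per-topic numbers really can be derived on demand, here is a small self-contained sketch that runs an aggregation of that kind against an in-memory SQLite database from Python. The table and column names follow the query in this answer; the sample rows and the exam ID 10 are made up, and the join to `topic` is left out because the count only needs `question.topic_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic (id INTEGER PRIMARY KEY, course_id INTEGER, name TEXT);
CREATE TABLE question (
    id       INTEGER PRIMARY KEY,
    topic_id INTEGER REFERENCES topic(id)
);
CREATE TABLE general_question (
    general_exam_id INTEGER,
    question_id     INTEGER REFERENCES question(id)
);
-- Hypothetical data: two topics, five questions, two exams.
INSERT INTO topic VALUES (1, 1, 'Algebra'), (2, 1, 'Probability');
INSERT INTO question VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2);
INSERT INTO general_question VALUES (10, 1), (10, 3), (10, 4), (11, 2);
""")

def questions_per_topic(conn, general_exam_id):
    """Return (topic_id, number_of_questions) pairs for one general exam."""
    sql = """
    SELECT q.topic_id, COUNT(gq.question_id)
    FROM question q
    JOIN general_question gq ON q.id = gq.question_id
    WHERE gq.general_exam_id = ?
    GROUP BY q.topic_id
    ORDER BY q.topic_id
    """
    return conn.execute(sql, (general_exam_id,)).fetchall()

print(questions_per_topic(conn, 10))
```

Because the counts are computed from `general_question` each time, changing an exam's question set needs no extra bookkeeping table to be kept in sync.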

How do I create nested categories in a Database?

I am making a videos website where categories will be nested:
e.g. Programming -> C Language -> MIT Videos -> Video 1
Programming -> C Language -> Stanford Video -> Video 1
Programming -> Python -> Video 1
These categories and sub-categories will be created by users on the fly. I will need to show them as people create them in the form of a navigable menu, so that people can browse the collection easily.
Could someone please help me with how I can go about creating such a database?
Make a categories table with the following fields:
CategoryID - Integer
CategoryName - String/Varchar/Whatever
ParentID - Integer
Your ParentID will then reference back to the CategoryID of its parent.
Example:
CategoryID CategoryName ParentID
---------------------------------
1 Dog NULL
2 Cat NULL
3 Poodle 1
4 Dachshund 1
5 Persian 2
6 Toy Poodle 3
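To show how a navigable menu could be read back out of such a table, here is a minimal sketch using a recursive common table expression, run against an in-memory SQLite database from Python. The table name `categories` and the helper `subtree` are assumptions for illustration; other databases have their own syntax for recursive queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    CategoryID   INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL,
    ParentID     INTEGER REFERENCES categories(CategoryID)  -- NULL for top level
);
INSERT INTO categories VALUES
    (1, 'Dog', NULL), (2, 'Cat', NULL), (3, 'Poodle', 1),
    (4, 'Dachshund', 1), (5, 'Persian', 2), (6, 'Toy Poodle', 3);
""")

def subtree(conn, root_id):
    """Return (name, depth) for the given category and everything below it."""
    sql = """
    WITH RECURSIVE tree(CategoryID, CategoryName, depth) AS (
        SELECT CategoryID, CategoryName, 0
        FROM categories WHERE CategoryID = ?
        UNION ALL
        -- each pass adds the children of the rows found so far
        SELECT c.CategoryID, c.CategoryName, tree.depth + 1
        FROM categories c JOIN tree ON c.ParentID = tree.CategoryID
    )
    SELECT CategoryName, depth FROM tree ORDER BY depth, CategoryID
    """
    return conn.execute(sql, (root_id,)).fetchall()

print(subtree(conn, 1))
```

The `depth` value is only there so that a menu renderer can indent each entry; it is not stored in the table.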
Quassnoi said:
You should use either nested sets or parent-child models.
I have implemented both of them. What I can say is:
Use the nested set architecture if your categories table doesn't change often: selects are fast, and a single query fetches the whole branch of the hierarchy for a given entry. Inserts and updates, however, take more time than in a parent-child model, because the left and right (or lower and upper in the example below) fields have to be updated.
Another point, quite trivial I must admit, but:
It's very difficult to change the hierarchy by hand directly in the database (It could happen during the development). So, be sure to implement first an interface to play with the nested set (changing parent node, move a branch node, deleting a node or the whole branch etc.)
Here are two articles on the subject:
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
Last thing, I didn't try it, but I read somewhere that you can have more than one tree in a nested set table, I mean several roots.
You should use either nested sets or parent-child models.
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
See this answer for more detail.
Nested sets are easier to query, but harder to update, and it is harder to build a tree structure from them.
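For anyone who wants to try the range-based query, here is a small sketch that loads the nested-sets table from this answer into an in-memory SQLite database from Python. The contents of `mytable` (the `person` column and its four rows) are made up, and the sketch assumes that `group` in `mytable` stores a leaf's position number, so that a parent's `lower`..`upper` range covers all of its leaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_types (
    typeid INTEGER PRIMARY KEY,
    lower  INTEGER NOT NULL,
    upper  INTEGER NOT NULL,
    name   TEXT NOT NULL
);
INSERT INTO group_types VALUES
    (1, 1, 2, 'Buyers'), (2, 3, 3, 'Sellers'), (3, 4, 4, 'Referee'),
    (4, 1, 1, 'Electrical'), (5, 2, 2, 'Mechanic');
-- "group" is a reserved word, so it has to be quoted as an identifier.
CREATE TABLE mytable (person TEXT, "group" INTEGER);
INSERT INTO mytable VALUES ('Ann', 1), ('Bob', 2), ('Cid', 3), ('Dee', 4);
""")

def members(conn, typeid):
    """Return the people whose group falls inside the given type's range."""
    sql = """
    SELECT m.person
    FROM group_types g
    JOIN mytable m ON m."group" BETWEEN g.lower AND g.upper
    WHERE g.typeid = ?
    ORDER BY m.person
    """
    return [person for (person,) in conn.execute(sql, (typeid,))]

print(members(conn, 1))  # everyone under Buyers, with no recursion needed
```

The query itself is a plain range join, which is why it is portable; the cost is that inserting a new group can mean renumbering the `lower` and `upper` values of other rows.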
From the example in your question it looks like you'd want it to be possible for a given category to have multiple parents (e.g., "MIT Videos -> Video 1 Programming" as well as "Video -> Video 1 Programming"), in which case simply adding a ParentID column would not be sufficient.
I would recommend creating two tables: a simple Categories table with CategoryID and CategoryName columns, and a separate CategoryRelationships table with ParentCategoryID and ChildCategoryID columns. This way you can specify as many parent-child relationships as you want for any particular category. It would even be possible using this model to have a dual relationship where two categories are each other's parent and child simultaneously. (Off the top of my head, I can't think of a great use for this scenario, but at least it illustrates how flexible the model is.)
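A minimal sketch of that two-table layout, run against an in-memory SQLite database from Python. The table and column names are the ones suggested above; the sample rows (taken loosely from the question) and the helper `parents_of` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID   INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL
);
CREATE TABLE CategoryRelationships (
    ParentCategoryID INTEGER NOT NULL REFERENCES Categories(CategoryID),
    ChildCategoryID  INTEGER NOT NULL REFERENCES Categories(CategoryID),
    PRIMARY KEY (ParentCategoryID, ChildCategoryID)
);
INSERT INTO Categories VALUES
    (1, 'Programming'), (2, 'C Language'), (3, 'MIT Videos'),
    (4, 'Stanford Video'), (5, 'Video 1');
-- 'Video 1' is listed under two parents, which a single ParentID column cannot express.
INSERT INTO CategoryRelationships VALUES (1, 2), (2, 3), (2, 4), (3, 5), (4, 5);
""")

def parents_of(conn, category_id):
    """Return the names of all direct parents of a category."""
    sql = """
    SELECT p.CategoryName
    FROM CategoryRelationships r
    JOIN Categories p ON p.CategoryID = r.ParentCategoryID
    WHERE r.ChildCategoryID = ?
    ORDER BY p.CategoryName
    """
    return [name for (name,) in conn.execute(sql, (category_id,))]

print(parents_of(conn, 5))
```

Since nothing in this schema prevents cycles, any code that walks the relationships upward or downward should guard against visiting the same category twice.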
What you need is a basic parent-child relationship:
Category (ID: int, ParentID: nullable int, Name: nvarchar(1000))
A better way to store the parent_id of the table is to have it nested within the ID
e.g
100000 Programming
110000 C Language
111000 Video 1 Programming
111100 C Language
111110 Stanford Video
etc., so all you need is a script to process the ID, such that the first digit represents the top-level category, the second digit the next level, and so on as you go deeper down the hierarchy
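The kind of script meant here could look roughly like the following Python sketch. It assumes six-digit IDs in which each level uses one digit from 1 to 9 and unused levels are padded with zeros; the function names are made up for illustration.

```python
WIDTH = 6  # assumption: IDs always have six digits, one digit per level

def prefix(cat_id: int, width: int = WIDTH) -> str:
    """Significant digits of an ID, e.g. 111000 -> '111'."""
    return str(cat_id).zfill(width).rstrip("0")

def depth(cat_id: int, width: int = WIDTH) -> int:
    """Level in the hierarchy; top-level categories have depth 1."""
    return len(prefix(cat_id, width))

def parent_id(cat_id: int, width: int = WIDTH):
    """ID of the direct parent, or None for a top-level category."""
    p = prefix(cat_id, width)
    if len(p) <= 1:
        return None
    return int(p[:-1].ljust(width, "0"))

def is_descendant(child: int, ancestor: int, width: int = WIDTH) -> bool:
    """True if `child` sits anywhere below `ancestor`."""
    c, a = prefix(child, width), prefix(ancestor, width)
    return c != a and c.startswith(a)

print(parent_id(111000), depth(111000), is_descendant(111100, 110000))
```

The trade-off of this encoding is that the maximum depth and the number of children per level (nine in this sketch) are fixed by the ID format, and moving a branch means rewriting the IDs of everything in it.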

Resources