Correct Database Architecture - database

I don't know how to design my mysql webdatabase for a shop.
The scenario is for a site selling guided tours.
Each tour can be either a Private, a Semi-Private or a Group Tour. The price per person changes per tour type. BUT ALSO for the Private tours, the price per person varies depending on the number of persons. However it varies by different amounts depending on tour. How would i create a 'Tour/Product' record?
e.g. Let's say:
Tour of Vatican (tour has various bits of data - name, description, meeting point, duration, etc). Semi-Private tour costs 50 euro per person. Group tour costs 45 euro per person. Private tour costs (140 euro for 1-2 people), or 180 euro for 3 people, or 200 euros for 4 people, or 225 euros for 5 people or 240 euro for 6 people or for 7 people or more it costs 43 euro per person.
HOWEVER for the Tour of Coliseum (tour has same bits of data - name, description, meeting point, duration, etc), Semi Private costs 40 per person. Group costs 25 per person. Private tour costs (100 euro for 1-2 people), or 135 euro for 3 people, or 160 euros for 4 people, or 175 euros for 5 people or 180 euro for 6 people or for 7 people or more it costs 25 euro per person.
How would i structure the data in the database - 2 tables? 3 tables?
Totally confused....
Thanks
Tom

From what I understand from your post, the price alters depending on three different things:
The tour: the price for the Tour of Vatican is not similar with the price for the Tour of Colloseum.
The type of the tour: Private, a Semi-Private or a Group Tour.
The number of persons on the tour.
Since there is no exact (constant) price per person on any of the given options, I would go for a three tables approach.
The digaram is detailed in the below picture and works under the following assumptions:
There are three tables: Tour (containing the description for each individual tour); TourPriceOptions (containing the individual price options records) and TourType (which at all times, will contain just three records: Private, Semi-Private and Group Tour);
There are just two assumptions that you have to do:
A tour can have multiple price options (1 to many relationship)
An price option can have just one single tour type (1 to 1 relationship)
How to code this up:
Whenever the administrator of the store creates another tour in the backend of the store, he should be able to add multiple price options. In order to do this you will need to:
Create a new tour: a function which inserts in to the database a new entry in the tour table.
Get the id of the recently created tour: if there is only one person adding information at any given time, then there is a good bet to write a function that returns the id of the latest added tour.
Add pricing options based on the id_tour: insert a new price option based on the id_tour variable. Remember to assign a tour_type from one of the already predefined categories.
Whenever you want to return these values, just write a query that allows you to retrieve information based on the tour the user is currently browsing.
Additional things to research: Dynamic Forms - They will help you when you don't know how many price options an admin might want to add for a specific tour

Related

Add columns to a set of results depending how many rows found

I don't have sample datathat fits the example below, and it's more a theoretical question rather than a data-driven one...
I have a table called CustomerOrders. A query looks to see if any customers haven't ordered anything for more than 4 days (again, it's just an example but easier than explaining the real purpose).
If there are such customers, then the query searches an Communications table that records whether or not sales staff have noted that it's been four days or more since an order was received from that customer, and what action they're taking to address this.
Depending on the number of days since the last order, and the number of times sales staff have logged their acknowledgement (ideally it should be every day until they place an order), each customer appears in the results like this:
FirstName, LastName, LastOrderDate, NumDaysSince, SalesStaffCommentDate, SalesComment
At present, each entry sales staff log a comment about this date gap appears as a separate row in this result set, each essentially repeating themselves, other than the last two columns.
What I would prefer is for this result set to be set out as:
FirstName, LastName, LastOrderDate, NumDaysSince, SalesStaffCommentDate[1], SalesComment[1], SalesStaffCommentDate[2], SalesComment[2]
etc, with the number of additional comment and date columns showing the comments made, but all on one row.
But if the sales team only logged two comments on one customer, but ten comments on another, there is obviously a disparity between the number of columns that could be filled.
Is it possible to display the data in this way?
EDIT - thanks to #Larnu and #Smor so far.
To try and give a bit more data. This is how my data looks:
NAME LASTORDERDATE NUMDAYSSINCE SALESSTAFFCOMMENTDATE SALESCOMMENT
John Smith 2022-06-12 5 2022-06-15 Tried to call
John Smith 2022-06-12 5 2022-06-16 Call back later
John Smith 2022-06-12 5 2022-06-17 Not required
I want it to look like this:
John Smith 2022-06-12 5 2022-06-15 Tried to call 2022-06-16 Call back later 2022-06-17 Not required.
There may be anything from 1 - 10 entries before the customer orders again and reset the counter back to being < 4 days since their last.
#larnu, are you saying that the link you give allows me to present the data in this way? Ordinarily I would export this data to PBI and pivot it to display as I need it to, but for this bit of data I'm unable to do that, and so it needs to be in SQL.
Hope that clarifies things in case I was being a bit too vague.

Star Schema - Unify data with varying structure from different sources

I am currently designing a star schema for a reporting database where an online product's performance is measured. The challenge is, that I receive information which is in principle measuring the same facts (visits, purchases) and has the same dimensions (user gender, user age, day) but with varying granularity depending on the source, for example, given a total of 10 visits:
Source A returns a single line per day for the performance in the format:
Visits, Purchases, Gender, Age Range, Day (total visits = 15)
Source B returns two lines for a single day as it does not allow the combination of gender and age:
Visits, Purchases, Gender, Day (total visits = 10)
Visits, Purchases, Age Range, Day (total visits = 10)
The issues is, if I store them in the same fact table, I will have incorrect values when applying aggregate functions:
Day
Visits
Age
Gender
Source
19/04/2022
5
18-24
Male
A
19/04/2022
10
18-24
Female
A
19/04/2022
2
NULL
Male
B
19/04/2022
8
NULL
Female
B
19/04/2022
10
18-24
NULL
B
(The sum of the visits column would count 20 for source B even though we only have 10 visits for this source, they just appear double due to the different data structure)
Is there a best practice for cases where dimensions and facts are generally the same, but the raw data granularity is different?
Is there a best practice for cases where dimensions and facts are generally the same, but the raw data granularity is different?
You typically can only present the combined data at a grain that's compatible with all the sources, so (Day), (Age,Day), or (Gender,Day).
Alternatively you could "allocate" the Source B data, say applying the gender split for the day to each age group. The totals would work, but the drilldown wouldn't be meaningful.

How to skip rows with the same values

I have the following problem: I have a dataset with over 1million entries (shown below), that includes the variables company (=Name of the company (string)) and reviews (=amount of reviews a company received) and company1 (assigns numeric to specific company name). Now I want to calculate the average amount of reviews a company in the dataset receives. But if I just do sum reviewsthen it will count the amount of reviews of company 3 two times, the amount of reviews of company five 23 times etc. (as often as they are listed in the data). How do I avoid this and only count them once?
Your image is not readable (by me on a laptop). The Stata tag wiki gives detailed advice on how to give data examples and the command dataex bundled with recent versions of Stata is easily used for SE.
The flavour of your request is easier to follow. Here is an analogue. With the Grunfeld data we can calculate a mean investment for each year.
webuse grunfeld, clear
egen mean = mean(invest), by(year)
Now we might want to know how many years had mean invest above 200 (in the units used)?
su mean if mean > 200
or
count if mean > 200
returns the number of observations (not years). If you try it, the result is 30. In the Grunfeld data, there are 10 companies each measured for each year, so dividing by 10 is an easy answer. For more complicated datasets, it would better to tag each year just once, and then look only at tagged observations:
egen tag = tag(year)
count if tag & mean > 200
It would be more common to tag panels, not years, but the principle is the same. See the help for egen.
collapse and contract offer other routes, with or without using frames.

efficient Db Design with (many to many plus one to manys)

(revised) I have a web app where information will be entered for a user. First and last name as well as 3 Affiliations (primary, secondary, and tertiary) associated with the person. Each affiliation has 3 components (title, department, and university). So for example one record could be for:
User: Bob, Robertson
Affiliation1: Professor, Chemistry, U. Florida
Affiliation2: Director, Amazing Chemistry Institute, U. Florida
Affiliation3: Affiliated Faculty, BioChemistry, Florida Tech.
Also, Title and Department are text input fields but Univ. refers to a specific list of about 3000 university names 'univ_name' which is why it has it's own table. also affiliationOrdinal would be something like (1st, 2nd, 3rd)
Users Affiliation Univ.
======= ============ =========
id_user id_affiliation id_univ
FirstName id_user univ_name
LastName affiliationOrdinal
title
department
id_univ
Thanks Sean for your feedback, I started thinking of this more as a user with multiple addresess type of problem and that has been solved many times over it seems. I picked this one as a reference. Mysql database design for customer multiple addresses and default address. So the above should be a bit closer to workable right?

Database design for voting

I am implementing a voting feature to allow users to vote for their favourite images. They are able to vote for only 3 images. Nothing more or less. Therefore, I am using checkboxes to do validation for it. I need to store these votes in my database.
Here is what i have so far :
|voteID | name| emailAddress| ICNo |imageID
(where imageID is a foreign key to the Images table)
I'm still learning about database systems and I feel like this isn't a good database design considering some of the fields like email address and IC Number have to be repeated.
For example,
|voteID | name| emailAddress | ICNo | imageID
1 BG email#example.com G822A28A 10
2 BG email#example.com G822A28A 11
3 BG email#example.com G822A28A 12
4 MO email2#example.com G111283Z 10
You have three "things" in your system - images, people, and votes.
An image can have multiple votes (from different people), and a person can have multiple votes (for different images).
One way to represent this in a diagram is as follows:
So you store information about a person in one place (the Person table), about Images in one place (the Images table), and Votes in one place. The "chicken feet" relationships between them show that one person can have many votes, and one image can have many votes. ("Many" meaning "more than one").

Resources