This is maybe a very basic question, but I'm trying to model something, ideally with a star schema. I've provided some dummy data.
The first table just lists out "Scores" (not too important what this score represents) for each country by year.
The second table just has the score sum (across all countries) for all years.
I'm just wondering how something like this could be modelled.
Countries Table
Year Country Score
2022 US 500
2021 China 100
2022 UK 200
2019 UK 150
2019 US 300
Totals Table
Year Country Score DataCaptured(%)
2019 All Countries 1000 95
2020 All Countries 2000 89
2021 All Countries 4000 89
2022 All Countries 6000 97
I've made a start. The Countries table acts as a "fact" table. I've created a separate dimension table [countryID, country] and replaced the countries in the Countries table with the IDs. I've also created a Year table [YearID, Year] and replaced the years in the Countries table with IDs.
However, I'm not sure how to bring the Totals table into the mix. I would need to link it with the other table based on the Year column. Does it make sense to replace the year with an ID and link it to the Years table?
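One way to picture the setup described above is a second fact table at year grain that shares the Year dimension with the country-level fact table. The sketch below is only an illustration under assumptions: the table names (`dim_year`, `dim_country`, `fact_country_score`, `fact_year_total`), the ID values, and the use of SQLite via Python are all hypothetical; the rows are the dummy data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimensions (names and IDs are hypothetical)
CREATE TABLE dim_year    (YearID INTEGER PRIMARY KEY, Year INTEGER UNIQUE);
CREATE TABLE dim_country (CountryID INTEGER PRIMARY KEY, Country TEXT UNIQUE);

-- Fact table at (year, country) grain
CREATE TABLE fact_country_score (
    YearID    INTEGER REFERENCES dim_year(YearID),
    CountryID INTEGER REFERENCES dim_country(CountryID),
    Score     INTEGER
);

-- Second fact table at year grain, linked to the same Year dimension
CREATE TABLE fact_year_total (
    YearID          INTEGER PRIMARY KEY REFERENCES dim_year(YearID),
    Score           INTEGER,
    DataCapturedPct INTEGER
);

INSERT INTO dim_year VALUES (1, 2019), (2, 2020), (3, 2021), (4, 2022);
INSERT INTO dim_country VALUES (1, 'US'), (2, 'China'), (3, 'UK');
INSERT INTO fact_country_score VALUES
    (4, 1, 500), (3, 2, 100), (4, 3, 200), (1, 3, 150), (1, 1, 300);
INSERT INTO fact_year_total VALUES
    (1, 1000, 95), (2, 2000, 89), (3, 4000, 89), (4, 6000, 97);
""")

# Compare the sum of listed country scores with the stored yearly total.
rows = conn.execute("""
    SELECT y.Year,
           COALESCE(SUM(f.Score), 0) AS ListedScore,
           t.Score                   AS TotalScore,
           t.DataCapturedPct
    FROM dim_year y
    JOIN fact_year_total t         ON t.YearID = y.YearID
    LEFT JOIN fact_country_score f ON f.YearID = y.YearID
    GROUP BY y.Year, t.Score, t.DataCapturedPct
    ORDER BY y.Year
""").fetchall()

for row in rows:
    print(row)
```

Because both fact tables point at the same Year dimension, a report can slice either one by year without the two facts being joined directly to each other.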
Related
I have two tables: 'Sales' and 'Planed sales'. They hold sales data for about 80 different products and the planned sales for those products, for every day across two months. This is how they look:
Table 'Sales'
productName | Sold | Date
product_A | 10 | 01.01.2022.
product_B | 15 | 01.01.2022.
product_A | 11 | 02.01.2022.
product_B | 20 | 02.01.2022.
And then table 'Planed sales'
productName | Planed sales | Date
product_A | 10 | 01.01.2022.
product_B | 14 | 01.01.2022.
product_A | 9 | 02.01.2022.
product_B | 15 | 02.01.2022.
Product names are the same in both tables and so are the dates; only the 'Sold' and 'Planed sales' values differ.
I somehow need to use these two tables in a SQL query for Power Pivot in Excel while I'm importing data so I can get a table looking something like this:
productName | Planed sales 01.01. | Sold 01.01. | Planned sales 02.01. | Sold 02.01.
product_A | 10 | 10 | 9 | 11
product_B | 14 | 15 | 15 | 20
I have lots of different products and a lot of dates for at least two months.
I tried importing both tables into Excel's Power Pivot with a simple 'select *' query and then creating a relationship, but I don't have unique values because productName is repeated for every date. The unique key here would be 'productName' + 'Date', but I don't know how to add a composite key in Excel's Power Pivot, if that is even the solution. I know the data is not well normalized, but I can't change it much. Am I approaching this problem correctly? Is this even possible? If it is, how could I do it?
You need a Product dimension. There's a lot of help on the web on dimensional modeling for Power BI and PowerPivot. It's worth learning.
Add a table that is just a list of all the unique productNames and then add a relationship from this new table to your 2 existing tables.
If you have a data source with the product list, use that. If you don't, combine the two tables and remove duplicates. You can do this in either DAX or Power Query, but Power Query would be the better way to do this.
In Power Query, append the 2 tables (Home > Append Queries), then right click the productName column and Remove Other Columns, and right click again to Remove Duplicates.
Then go into PowerPivot to create the relationships. In your pivot, use productName from the Product dimension, not from the other 2 tables. Best practice is to right click the productName column in those 2 tables and hide them to make sure the correct table is used.
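If the data is imported through SQL anyway, the "combine the two tables and remove duplicates" step can also be expressed as a query. The following is a rough sketch rather than the Power Query steps themselves: it assumes SQLite via Python, renames 'Planed sales' to a hypothetical `PlannedSales` table with a `Planned` column, and rewrites the dates in ISO format (reading 01.01.2022. as day.month.year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales        (productName TEXT, Sold INTEGER,    Date TEXT);
CREATE TABLE PlannedSales (productName TEXT, Planned INTEGER, Date TEXT);

INSERT INTO Sales VALUES
    ('product_A', 10, '2022-01-01'), ('product_B', 15, '2022-01-01'),
    ('product_A', 11, '2022-01-02'), ('product_B', 20, '2022-01-02');
INSERT INTO PlannedSales VALUES
    ('product_A', 10, '2022-01-01'), ('product_B', 14, '2022-01-01'),
    ('product_A',  9, '2022-01-02'), ('product_B', 15, '2022-01-02');

-- Product dimension: UNION appends both name lists and removes duplicates.
CREATE TABLE Product AS
SELECT productName FROM Sales
UNION
SELECT productName FROM PlannedSales;
""")

products = [r[0] for r in conn.execute(
    "SELECT productName FROM Product ORDER BY productName")]

# Each fact table relates to Product on productName (one-to-many),
# so totals from both can be reported against the same product list.
totals = conn.execute("""
    SELECT p.productName,
           (SELECT SUM(ps.Planned) FROM PlannedSales ps
             WHERE ps.productName = p.productName) AS PlannedTotal,
           (SELECT SUM(s.Sold) FROM Sales s
             WHERE s.productName = p.productName)  AS SoldTotal
    FROM Product p
    ORDER BY p.productName
""").fetchall()

print(products)
print(totals)
```

The `Product` table has exactly one row per name, which is what the relationship in PowerPivot needs on the "one" side.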
First off I realize that narrow fact tables are the ideal situation.
I am designing a healthcare data warehouse specifically for ingestion into Power BI. The problem I'm having is that I have over 100 different metrics that are included in just one report. Most of the data comes from the source like this:
Hospital | HospitalID | Date | Description | Number
Children's Hospital | 20192 | 1/2/2021 | Beds Needed | 8
Children's Hospital | 20192 | 1/2/2021 | Covid Patients | 2
We currently use logic to pull each metric out like this in PowerBI:
Beds Needed=IF(Description="Beds Needed", Number,0)
We do this for over 100 metrics that business leaders say are needed. I'm considering two ways of doing this:
Option 1:
We put the logic like above into the database and have every metric be its own column.
Date | Hospitalid | Beds Needed | Covid Patients
1/2/2021 | 20192 | 8 | 2
Option 2:
I set up the fact table like so:
Date | HospitalID | Descriptionid | Number
1/2/2021 | 20192 | 12 | 8
1/2/2021 | 20192 | 11 | 2
And then create a dimension table like so:
Description | DescriptionID
Beds Needed | 12
Covid Patients | 11
The tables that I have currently (in the format of the first table) each are around 200k rows and there are 4 of them. There is one table that supplies metrics that is around 20 million rows.
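To make the trade-off concrete, here is a small sketch of Option 2 that also shows how the wide shape of Option 1 can still be derived from it with conditional aggregation. It is only an illustration: the table names `fact_metric` and `dim_description` are hypothetical, SQLite via Python stands in for the real warehouse, and 1/2/2021 is assumed to mean 2 January 2021.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 2: narrow fact table plus a description dimension
CREATE TABLE dim_description (
    DescriptionID INTEGER PRIMARY KEY,
    Description   TEXT UNIQUE
);
CREATE TABLE fact_metric (
    Date          TEXT,
    HospitalID    INTEGER,
    DescriptionID INTEGER REFERENCES dim_description(DescriptionID),
    Number        INTEGER
);

INSERT INTO dim_description VALUES (12, 'Beds Needed'), (11, 'Covid Patients');
INSERT INTO fact_metric VALUES
    ('2021-01-02', 20192, 12, 8),
    ('2021-01-02', 20192, 11, 2);
""")

# Pivot the narrow rows into one column per metric (the Option 1 shape).
# Each CASE plays the role of IF(Description = "...", Number, 0).
wide = conn.execute("""
    SELECT f.Date,
           f.HospitalID,
           SUM(CASE WHEN d.Description = 'Beds Needed'
                    THEN f.Number ELSE 0 END) AS BedsNeeded,
           SUM(CASE WHEN d.Description = 'Covid Patients'
                    THEN f.Number ELSE 0 END) AS CovidPatients
    FROM fact_metric f
    JOIN dim_description d ON d.DescriptionID = f.DescriptionID
    GROUP BY f.Date, f.HospitalID
""").fetchall()

print(wide)
```

With the narrow layout, adding a metric means adding a row to the dimension rather than a column to the fact table; the wide view can be generated on demand when a report needs it.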
I have two tables. The first table is called f_CellphoneSubscribers and the columns are:
subscribers
CountryId
Urban Population
Year
SubscribersPerUrbanDweller
The second table is d_Country and columns are:
CountryId
CountryCode
Name
Continent
Region
What T-SQL code should I use to answer the following questions:
What is the first country to have more than 20,000 cell phone
subscribers? Display this country along with the year it exceeded this
threshold and the country’s population density at that time.
In which year did the number of cell phone subscriptions in Canada first exceed those in Finland? Display the year and the countries' respective subscription numbers. Parameterize your query so comparable results could be found for any 2 countries.
Looking specifically at countries in North America, show the year over year growth in cell phone subscribers from 2000 to 2005 expressed as the per capita change from the prior year.
SQL Join
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
Example:
SELECT Country.Name, Subscriber.NumOfSubscriber
FROM Country
INNER JOIN Subscriber ON Country.CountryID=Subscriber.CountryID
WHERE Subscriber.NumOfSubscriber > 20000;
This example has been simplified to show how to join 2 tables.
Extend it to join 3 tables, with ordering.
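As a rough sketch of where that leads, the first two questions can be approached with a join plus ordering, and a self-join on year for the two-country comparison. Several things here are assumptions: the queries run on SQLite through Python rather than T-SQL (in T-SQL, `TOP (1)` would replace `LIMIT 1`), only the columns needed are created, and the subscriber figures are invented purely for illustration, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_Country (CountryId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE f_CellphoneSubscribers (
    CountryId   INTEGER REFERENCES d_Country(CountryId),
    Year        INTEGER,
    subscribers INTEGER
);
-- Invented sample rows, for illustration only
INSERT INTO d_Country VALUES (1, 'Canada'), (2, 'Finland');
INSERT INTO f_CellphoneSubscribers VALUES
    (2, 1984, 15000), (2, 1985, 25000), (2, 1986, 40000),
    (1, 1985, 12000), (1, 1986, 60000);
""")

# Q1: earliest (year, country) row with more than 20,000 subscribers.
first_over = conn.execute("""
    SELECT c.Name, f.Year, f.subscribers
    FROM f_CellphoneSubscribers f
    JOIN d_Country c ON c.CountryId = f.CountryId
    WHERE f.subscribers > 20000
    ORDER BY f.Year, c.Name
    LIMIT 1
""").fetchone()

# Q2: first year in which country A exceeded country B (parameterized).
def first_year_exceeding(country_a, country_b):
    return conn.execute("""
        SELECT a.Year, a.subscribers, b.subscribers
        FROM f_CellphoneSubscribers a
        JOIN d_Country ca ON ca.CountryId = a.CountryId
        JOIN f_CellphoneSubscribers b ON b.Year = a.Year
        JOIN d_Country cb ON cb.CountryId = b.CountryId
        WHERE ca.Name = ? AND cb.Name = ?
          AND a.subscribers > b.subscribers
        ORDER BY a.Year
        LIMIT 1
    """, (country_a, country_b)).fetchone()

print(first_over)
print(first_year_exceeding("Canada", "Finland"))
```

Population density is not among the listed columns, so it is left out of this sketch; it would need an area figure from somewhere else.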
I was asked an interview question and I want to confirm whether I answered it correctly. There is a table called Employee that stores employee information along with monthly salaries (assume the table currently holds only one year of records).
Employee(ID,Name,Month,Salary)
Sample Data:
ID Name Month Salary
1 A Jan 2500
1 A Feb 3000
2 B Jan 4500
2 B Feb 6500
The question was:
Is this table schema alright? If not how will you resolve it?
I normalized the table and want to know if this is the right way to normalize the above table?
Employee(ID,Name)
tblSalary(ID,Emp_ID,Month,Salary)
Please excuse me if it's a very basic question.
You've done it right according to:
First Normal Form
Eliminate repeating groups in individual tables.
Create a separate table for each set of related data.
Identify each set of related data with a primary key.
The only thing I would point out is your "Month" column, which I would change to a date, because a month name alone limits the table to a single year of data per employee (as pointed out in another comment).
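A minimal sketch of the normalized tables with the month stored as a date might look like this. The `PayMonth` column name, the uniqueness constraint, and the year 2023 are assumptions made for illustration (the sample data gives no year), and SQLite via Python is used only so the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE tblSalary (
    ID       INTEGER PRIMARY KEY,
    Emp_ID   INTEGER NOT NULL REFERENCES Employee(ID),
    PayMonth TEXT    NOT NULL,   -- first day of the month, ISO format
    Salary   INTEGER NOT NULL,
    UNIQUE (Emp_ID, PayMonth)    -- one salary row per employee per month
);

INSERT INTO Employee VALUES (1, 'A'), (2, 'B');
-- The year 2023 is assumed; the original rows only say Jan and Feb.
INSERT INTO tblSalary (Emp_ID, PayMonth, Salary) VALUES
    (1, '2023-01-01', 2500), (1, '2023-02-01', 3000),
    (2, '2023-01-01', 4500), (2, '2023-02-01', 6500);
""")

# The name is stored once per employee; salaries join back on Emp_ID.
totals = conn.execute("""
    SELECT e.Name, SUM(s.Salary) AS TotalSalary
    FROM Employee e
    JOIN tblSalary s ON s.Emp_ID = e.ID
    GROUP BY e.ID, e.Name
    ORDER BY e.ID
""").fetchall()

print(totals)
```

Storing a full date means a second January for the same employee is simply another row instead of a conflict.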
I am in the process of designing a small database application for a health center in my local community. The health center can receive both In & Out-Patients.
The one area I am not sure how best to implement is billing the patient automatically for the drugs/medication they have been given. I don't want the user to type in the name and price of each drug given to the patient. I want to automate it with a table listing all available drugs and their CURRENT prices, so that the user just selects a drug from a list and the software determines the total.
I also want to maintain the price history of drugs over time, something like: drug XXX was selling at $1234 in January, $4567 in September, $12 in 2008. So if I print a receipt for a patient who visited in 2008, the patient should be billed at the 2008 rates, not the drug's current rate.
I am just asking for some general guidance and suggestions on the best database schema of a scenario related to my problem description above.
Thanks a lot.
With a table of drugprices
DrugPriceID DrugID PriceStartDate DrugPrice
1 1 1-1-2008 1234
2 1 1-9-2008 4567
which links to a table of drugs on DrugID. The price applies until it is superseded by a new price.
A table of Patients, which links to a table of PatientOrders,
PatientOrderID PatientID OrderDate
5 3 4-5-2008
and a further table of OrderDrugs
OrderDrugID PatientOrderID DrugID
6 5 1
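With those tables, the price to bill is the one with the latest PriceStartDate on or before the order date. The lookup could be sketched as follows; this assumes SQLite via Python, table names `DrugPrices`, `PatientOrders` and `OrderDrugs` chosen to match the description, and the dates rewritten in ISO format on the reading that 1-9-2008 is 1 September 2008 and 4-5-2008 is 4 May 2008.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DrugPrices (
    DrugPriceID    INTEGER PRIMARY KEY,
    DrugID         INTEGER,
    PriceStartDate TEXT,     -- ISO dates compare correctly as text
    DrugPrice      INTEGER
);
CREATE TABLE PatientOrders (
    PatientOrderID INTEGER PRIMARY KEY,
    PatientID      INTEGER,
    OrderDate      TEXT
);
CREATE TABLE OrderDrugs (
    OrderDrugID    INTEGER PRIMARY KEY,
    PatientOrderID INTEGER REFERENCES PatientOrders(PatientOrderID),
    DrugID         INTEGER
);

INSERT INTO DrugPrices VALUES (1, 1, '2008-01-01', 1234), (2, 1, '2008-09-01', 4567);
INSERT INTO PatientOrders VALUES (5, 3, '2008-05-04');
INSERT INTO OrderDrugs VALUES (6, 5, 1);
""")

# For each ordered drug, pick the price row whose start date is the
# latest one that is not after the order date.
billed = conn.execute("""
    SELECT po.PatientOrderID, od.DrugID, dp.DrugPrice
    FROM PatientOrders po
    JOIN OrderDrugs od ON od.PatientOrderID = po.PatientOrderID
    JOIN DrugPrices dp ON dp.DrugID = od.DrugID
    WHERE dp.PriceStartDate = (
        SELECT MAX(p2.PriceStartDate)
        FROM DrugPrices p2
        WHERE p2.DrugID = od.DrugID
          AND p2.PriceStartDate <= po.OrderDate
    )
""").fetchall()

print(billed)
```

Because the order stores only the date and the drug, reprinting an old receipt always resolves to the price that was in force at the time, not the current one.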