Making an employee hours table - arrays

I was wondering how to make a table in python 3 that has 7 columns for each day of the week and a variable defined amount of rows. I need to input employee hours for each separate person and be able to add them up later. This problem is really throwing me for a loop.
For example:
employeeAmount = int(input("Enter the amount of employees here"))
Then the table goes down "employeeAmount" of times.
I need something like this
Employee 1 | 7 | 9 | 8 | 8| 0 | 9 | 7 |
Employee 2 etc
Employee 3 etc

Related

Data warehouse design - periodic snapshot with frequently changing dimension keys

Imagine a fact table with a summation of measures over a time period, say 1 hour.
Start Date | Measure 1 | Measure 2
-------------------------------------------
2018-09-08 00:00:00 | 5 | 10
2018-09-08 00:01:00 | 12 | 20
Ideally we want to maintain the grain such that each row is exactly 1 hour. However, each row references dimensions which might ‘break’ the grain. For instance:
Start Date | Measure 1 | Measure 2 | Dim 1
---------------------------------------------------
2018-09-08 00:00:00 | 5 | 10 | key 1
2018-09-08 00:01:00 | 12 | 20 | key 2
It is possible that the dimension value may change 30 minutes into the hour in which case, the above would be inaccurate and should be represented like this:
Start Date | Measure 1 | Measure 2 | Dim 1
---------------------------------------------------
2018-09-08 00:00:00 | 5 | 10 | val 1
2018-09-08 00:00:30 | 5 | 10 | val 2
2018-09-08 00:01:00 | 12 | 20 | val 2
In our scenario, the data needs to be sliced by at least 5 dimension keys with queries like:
sum(measure1) where dim1 = x and dim2 = y..
Is there a design pattern for this requirement? I have considered ‘periodic snapshots’ but I have not read anywhere about this kind of row splitting on dimension changes.
I can see only two options:
Store the dimension values that were most present on each row (e.g. if a dimension value was true for the majority of the time in the hour, use this value). This would lead to some loss of accuracy.
Split each row on every dimension change. This is complex in the ETL, creates more data and breaks the granularity rule in the fact table.
Option 2 is the current solution and serves the purpose but is harder to maintain. Is there a better way to do this, or other options?
By way of a real example, this system records production data in a manufacturing environment so the data is something like:
Line | Date | Crew | Product | Running Time (mins)
-----------------------------------------------------------------------
Line 1 | 2018-09-08 00:00:00 | Crew A | Product A | 60
As noted, the crew, product or any of the other dimension may change multiple times within the hour.
You shouldn't need to split the time portion of your fact table since you clearly want to report hourly data, but you should have two records, one for each dimension value. If this is an aggregate of a transactional fact table, your process that loads the hourly table should be grouping each record by each dimension key. So in your example above, you should have two records for hour like so:
Start Date | Measure 1 | Measure 2 | Dim 1
---------------------------------------------------
2018-09-08 00:00:00 | 5 | 10 | val 1
2018-09-08 00:01:00 | 5 | 10 | val 1
2018-09-08 00:01:00 | 12 | 10 | val 2
You will need to take into account the other measures as well and make sure they all go into the correct bucket (val 1 or val 2). I split them evenly in the example.
Now if you slice by hour 1 and by Dim 1 Value 2, you will only see 12 (measure 1), and if you slice on hour 1, dim 1 value 1, you will only see 5, and if you only slice on hour 1, you will see 17.
Remember, your grain is defined by the level of each dimension, not just the time dimension. HTH.

Tableau pivot rows into fixed number of columns

I am new to Tableau so maybe this is an easy question but I can't get it done yet. I have my data in the following format:
EntityId | ActionId
------------|------------
1 | 2
1 | 6
1 | 1
1 | 7
1 | 7
2 | 1
2 | 2
2 | 3
2 | 3
My desired table format for my visualizations looks like the following:
EntityId | 1stActId | 2ndActId | 3rdActId
-------------------------------------------------
1 | 2 | 6 | 1
1 | 6 | 1 | 7
1 | 1 | 7 | 7
2 | 1 | 2 | 3
2 | 2 | 3 | 3
So I want to extract all Action Triples where every action is in one column. The next step would be to have the number of columns variable so that I can get Tuples, Triples, Quadruples and so on.
Is there a way to do this in Tableau directly or do I have to transform it before importing it in Tableau?
Thanks in advance!
Best regards,
Tim
Interestingly Tableau works best with your current data format rather than your desired table format. There is a functionality called Pivot which transforms your desired table to your current table format but not vice versa. To achieve what you want, you will have to transform the data before importing it into Tableau. Otherwise, consider the format below, depending on your objective, it may give you opportunity to filter, group and drill down into your data. However, it will duplicate the EntityId, assuming this is not an issue for you.
EntityId Value ActionId
1 2 1st
1 6 1st
1 1 1st
2 1 1st
2 2 1st
1 6 2nd
1 1 2nd
1 7 2nd
2 2 2nd
2 3 2nd
1 1 3rd
1 7 3rd
1 7 3rd
2 3 3rd
2 3 3rd

Using Data from One Table in Another Table in Access

Hallo StackOverflow Users
I am struggling with transferring values between Access database tables which I will use in a Delphi program to tally election votes and determine the winning candidates. I have a total of six tables. One is my overall table, tblCandidates which identifies each candidate and contains the amount of votes they received from each party, namely the Grade Heads, the Teachers and the Learners. When it comes to the Learners we have four participating grades, namely the grade 8’s, 9’s, 10’s and 11’s, and each grade again has multiple participating classes, namely class A, B, C, etc.
Now, I have set up tables for each grade that contains all the classes in that grade. I named these tables tblGrX with X being the grade represented by 8 through 11. Each one of these tables has two extra fields, namely a field to identify a candidate and a field that will add up all the votes that candidate received from each of the classes in that grade. Lastly I have another table, tblGrTotals with fields Total_GrX (once again with X being the grade), that will contain all the total votes a candidate received from each grade, adding them up in another field for my tblCandidates table to use in its Total_Learners field.
So in short, I want, for example, tblGrTotals to use the value in the field Total of tblGr8 in its Total_Gr8 field, and then tblCandidates to use the value in field Total of tblGrTotals in its Total_Learners field. Is there any way to keep these values updated between tables like cells are updated in Excel the moment a change is made?
Thank you in advance!
You need to rethink your table design. I guess your background is Excel, and your tables are laid out like you would do in Excel sheets, but a relational database works differently.
Think about the objects you are modelling.
Candidates - that's easy. ID, Name, perhaps additional info that belongs to each candidate. But nothing about votes here.
"Groups that are voting" or Parties. Not so trivial, due to the different types of parties. Still I would put them in one table, with Grade and Class only set for Learners, NULL for Heads and Teachers.
e.g.
+----------+------------+-------+-------+
| Party_ID | Party_Type | Grade | Class |
+----------+------------+-------+-------+
| 1 | Head | | |
| 2 | Teacher | | |
| 3 | Learner | 8 | A |
| 4 | Learner | 8 | B |
| 5 | Learner | 8 | C |
| 6 | Learner | 9 | A |
| 7 | Learner | 9 | B |
| 8 | Learner | 10 | A |
+----------+------------+-------+-------+
Votes: they are a Junction Table between Candidates and Parties.
e.g.
+----------+--------------+-----------+
| Party_ID | Candidate_ID | Num_Votes |
+----------+--------------+-----------+
| 1 | 1 | 5 |
| 1 | 2 | 17 |
| 3 | 1 | 2 |
| 3 | 2 | 6 |
| 3 | 3 | 10 |
+----------+--------------+-----------+
Now: if you want to know the votes of Class 8A:
SELECT Candidate_ID, SUM(Num_Votes)
FROM Parties p INNER JOIN Votes v
ON p.Party_ID = v.Party_ID
WHERE p.Party_Type = 'Learner'
AND p.Grade = 8
AND p.Class = 'A'
GROUP BY Candidate_ID
Or of all Grade 8? Simply omit the p.Class criteria.
For the votes per candidate you join Candidates with Votes.
Edit:
for the votes counting differently, this is an attribute of Party_Type.
We don't have a table for them yet, so create one:
+------------+---------------+
| Party_Type | Multiplicator |
+------------+---------------+
| Head | 4 |
| Teacher | 3 |
| Learner | 1 |
+------------+---------------+
and to count all votes:
SELECT c.Candidate_ID, c.Candidate_Name, SUM(v.Num_Votes * t.Multiplicator) AS SumVotes
FROM Parties p
INNER JOIN Votes v ON p.Party_ID = v.Party_ID
INNER JOIN Party_Types t ON p.Party_Type = t.Party_Type
INNER JOIN Candidates c ON v.Candidate_ID = c.Candidate_ID
GROUP BY c.Candidate_ID, c.Candidate_Name
With a design like this, you don't need to keep updating data from one table into another - you calculate it when and how you need it, and it's always current.
The magic of databases. :)

How to design table which have variable quantity of attribute in database?

For example, I want to store the shopping list for each day.
If I write a db table like:
Date Item1 Item2 ... Item100
Obviously, this will leave many slots empty and we have a limited number. How can I store this?
Thanks.
Something like this:
Table orders:
order_id | customer_id | created_at | other_info_you_need | ...
1 1 yesterday whatever
2 1 today whatever
Table items:
maybe_an_auto_increment_column | order_id | item | how_many
1 1 thing_a 1
2 1 thing_b 10
3 2 thing_c 2
For a detailed answer on why there's not enough room here. Read up about Normalization and database design in general.

Conditional SUM using multiple tables in EXCEL

I have a table that I'm trying to populate based on the values of two reference tables.
I have various different projects 'Type 1', 'Type 2' etc. that each run for 4 months and cost different amounts depending on when in their life cycle they are. These costings are shown in Ref Table 1.
Ref Table 1
Month | a | b | c | d
---------------------------------
Type 1 | 1 | 2 | 3 | 4
Type 2 | 10 | 20 | 30 | 40
Type 3 | 100 | 200 | 300 | 400
Ref Table 2 shows my schedule of projects for the next 3 months. With 2 new ones starting in Jan, one being a Type 1 and the other being a Type 2. In Feb, I'll have 4 projects, the first two entering their second month and two new ones start, but this time a Type 1 and a Type 3.
Ref table 2
Date | Jan | Feb | Mar
--------------------------
Type 1 | a | b | c
Type 1 | | a | b
Type 2 | a | b | c
Type 2 | | | a
Type 3 | | a | b
I'd like to create a table which calculates the total costs spent per project type each month. Example results are shown below in Results table.
Results
Date | Jan | Feb | Mar
-------------------------------
Type 1 | 1 | 3 | 5
Type 2 | 10 | 20 | 40
Type 3 | 0 | 100 | 200
I tried doing it with an array formula:
Res!b2 = {sum(if((Res!A2 = Ref2!A2:A6) * (Res!A2 = Ref1!A2:A4) * (Ref2!B2:D6 = Ref1!B1:D1), Ref!B2:E4))}
However it doesn't work and I believe that it's because of the third condition trying to compare a vector with another vector rather than a single value.
Does anyone have any idea how I can do this? Happy to use arrays, index, match, vector, lookups but NOT VBA.
Thanks
Assuming that months in results table headers are in the same order as Ref table 2 (as per your example) then try this formula in Res!B2
=SUM(SUMIF(Ref1!$B$1:$E$1,IF(Ref2!$A$2:$A$6=Res!$A2,Ref2!B$2:B$6),INDEX(Ref1!$B$2:$E$4,MATCH(Res!$A2,Ref1!$A$2:$A$4,0),0)))
confirm with CTRL+SHIFT+ENTER and copy down and across
That gives me the same results as you get in your results table
If the months might be in different orders then you can add something to check that too - I assumed that the types in results table row labels might be in a different order to Ref table 1, but if they are always in the same order too (as per your example) then the INDEX/MATCH part at the end can be simplified to a single range

Resources