I ran a query that has results similar to these:
_col0 | _col1
1 | 151
2 | 4324
...
17 | 23413
When I try to create a chart with _col0 on the x-axis, Presto assumes the axis represents years and labels it 20XX instead of XX, where XX is a two-digit number.
I would just go with it, except that when _col0 is less than 10 it is translated to the year 1970 for some reason, which really messes up my chart.
How can I fix this?
The data for the x-axis is the integers 1..30.
Change the X Axis type to Category or Linear.
I have data that looks like this
+-----------+-------------+----------+------------+------------+
| Date | Time | Initials | Location 1 | Location 2 |
+-----------+-------------+----------+------------+------------+
| 8/26/2019 | 11:00:00 AM | BI | 39 | 40 |
| 8/26/2019 | 1:30:00 PM | Kk | 12 | 2 |
| 8/27/2019 | 2:30:00 PM | BH | 18 | 37 |
| 8/28/2019 | 3:30:00 AM | BH | 23 | 29 |
+-----------+-------------+----------+------------+------------+
The output should be something very similar to the Google Maps "Popular Times" graph.
I would like to be able to visualize:
A graph for each location in this style (attendance over time, by hour), showing the average attendance for each day of the week.
I would also like to be able to specify a given date, e.g. 8/26/2019, and pull up the exact data for that date.
So I figure there can either be a different graph for every location, or the various locations' data could show as different colored bars on one graph.
Ultimately, I have this data in a spreadsheet and I'm not sure what the best tool would be to report on it. I looked into Data Studio, Google Analytics, and just using charts inside the sheet.
However, the issue seems to be:
Since the data spans both various dates and various times, I'm not sure how, or with which tools, to group the data by a given day or average it for a given day of the week. I tried using pivot tables, but I'm not sure how to report based on that.
which tools to use to group the data by a given day, or average the data for a given day of the week
=QUERY(QUERY(A2:E,
"select A,count(A),sum(D),sum(E),sum(D)+sum(E),avg(D),avg(E),avg(D)+avg(E),max(D)+max(E),min(D)+min(E)
where A is not null
group by A", 0),
"offset 1", 0)
=QUERY(A2:E,
"select A,count(A),sum(D),sum(E),sum(D)+sum(E),avg(D),avg(E),avg(D)+avg(E),max(D)+max(E),min(D)+min(E)
where A is not null
group by A
pivot C", 0)
need to figure out how to take this input and arrange by Day of the week
=ARRAYFORMULA(IF(A2:A, TEXT(A2:A, "ddd"), ))
Also by hour instead of just by date
=ARRAYFORMULA(IF(A2:A, TEXT(TIME(HOUR(B2:B), 0, 0), "hh:mm:ss"), ))
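For anyone who wants to check the grouping logic outside the spreadsheet, here is a minimal Python sketch of the same idea: derive a weekday and an hour from each row, then average both locations per (weekday, hour) bucket. The rows are the four sample rows from the question; the function name and the `WEEKDAYS` list are made up for illustration and are not part of any Sheets or Data Studio API.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question:
# (Date, Time, Initials, Location 1, Location 2)
ROWS = [
    ("8/26/2019", "11:00:00 AM", "BI", 39, 40),
    ("8/26/2019", "1:30:00 PM", "Kk", 12, 2),
    ("8/27/2019", "2:30:00 PM", "BH", 18, 37),
    ("8/28/2019", "3:30:00 AM", "BH", 23, 29),
]

# Fixed names so the result does not depend on the system locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def average_by_weekday_hour(rows):
    """Return {(weekday, hour): (avg Location 1, avg Location 2)}."""
    buckets = defaultdict(list)
    for date_text, time_text, _initials, loc1, loc2 in rows:
        stamp = datetime.strptime(
            f"{date_text} {time_text}", "%m/%d/%Y %I:%M:%S %p"
        )
        # Truncate to the hour, like TIME(HOUR(B2), 0, 0) in the formula above.
        key = (WEEKDAYS[stamp.weekday()], stamp.hour)
        buckets[key].append((loc1, loc2))
    return {
        key: (
            sum(pair[0] for pair in pairs) / len(pairs),
            sum(pair[1] for pair in pairs) / len(pairs),
        )
        for key, pairs in buckets.items()
    }


averages = average_by_weekday_hour(ROWS)
for (weekday, hour), (avg1, avg2) in averages.items():
    print(f"{weekday} {hour:02d}:00  Location 1: {avg1:.1f}  Location 2: {avg2:.1f}")
```

With only four sample rows every bucket holds a single observation, so the averages equal the raw values; with more data, each bucket averages all rows that share the same weekday and hour.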
In Google Data Studio, I need to show the difference between the first value and the last value, in a selected date range (selected by a filter in the report).
Example data set:
date | total_eggs
-----------------------
2018-01-11 | 7
2018-01-12 | 7
2018-01-13 | 7
2018-01-14 | 8
2018-01-15 | 9
2018-01-16 | 10
So, I need a formula/calculated field that shows the difference between the first and the latest value of 'total_eggs', which is 3 here. This means we have gained 3 eggs over time.
This should be simple, but I can't find an answer that is specific to Google Data Studio.
Can someone please help?
I can't find any way to do that in a table as a running difference. You could do it as a scorecard with a calculated field as simple as:
max(total_eggs)-min(total_eggs)
The result updates when you change the date range.
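One caveat: max minus min only equals "latest minus first" when the value never decreases within the selected range, as in the example data. A small Python sketch (not Data Studio syntax; the function names are hypothetical) compares the two calculations on the question's data and on a made-up series that dips:

```python
from datetime import date

# Example data set from the question: (date, total_eggs)
ROWS = [
    (date(2018, 1, 11), 7),
    (date(2018, 1, 12), 7),
    (date(2018, 1, 13), 7),
    (date(2018, 1, 14), 8),
    (date(2018, 1, 15), 9),
    (date(2018, 1, 16), 10),
]


def first_last_difference(rows, start, end):
    """Latest value minus earliest value for rows with start <= date <= end."""
    selected = sorted((d, v) for d, v in rows if start <= d <= end)
    if not selected:
        return None
    return selected[-1][1] - selected[0][1]


def max_min_difference(rows, start, end):
    """Largest value minus smallest value in the same date range."""
    values = [v for d, v in rows if start <= d <= end]
    if not values:
        return None
    return max(values) - min(values)


full_range = (date(2018, 1, 11), date(2018, 1, 16))
print(first_last_difference(ROWS, *full_range))  # 3
print(max_min_difference(ROWS, *full_range))     # 3

# Hypothetical series that rises and then falls: the two results differ.
DIPPING = [(date(2018, 2, 1), 7), (date(2018, 2, 2), 10), (date(2018, 2, 3), 8)]
dip_range = (date(2018, 2, 1), date(2018, 2, 3))
print(first_last_difference(DIPPING, *dip_range))  # 1
print(max_min_difference(DIPPING, *dip_range))     # 3
```

So the scorecard formula is a reasonable shortcut for a cumulative count like total_eggs, but it would overstate the change for a value that can go down.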
I've a fact table that details individual line amounts for orders placed by my organisation. In this fact, at line level, I've included the total order amount to be used, as it's possible we might need that level of detail at some point.
Here's an example of what I've got:-
+------------+------------+---------------+------------+---------------------+
| BookingKey | Booking_ID | Category_FKey | Line_Value | Total_Booking_Value |
+------------+------------+---------------+------------+---------------------+
| 1 | 12 | 8 | 150 | 700 |
| 2 | 12 | 4 | 150 | 700 |
| 3 | 12 | 5 | 300 | 700 |
| 4 | 12 | 4 | 100 | 700 |
+------------+------------+---------------+------------+---------------------+
As you can see, the Total_Booking_Value here is the sum of the Line_Value for the booking in the example (Booking_ID = 12).
The Category_FKey looks up to a Categories dimension.
Using this structure I've created a simple cube, and this mostly works fine.
The issue I have is that I'd like to be able to view the Total Line_Value amount, and somehow include the Total_Booking_Value alongside it.
So, for example I might add the Categories dimension as a filter and want to filter by say Category_FKey = 4.
If this were the case, I'd want the aggregates to tell me that the total Line_Value was 250 (for BookingKeys 2 and 4) and that the Total_Booking_Value was 700. Using normal aggregation (i.e. SUM), I'm getting a Total_Booking_Value of 1400, obviously because it's adding 700 * 2 for the two rows the cube returns.
So, the way I see it I'd like to create an MDX calculation that somehow takes the Total_Booking_Value and gives just the value for the Booking in question.
Should this be done using some kind of average, or division by the Distinct number of items? I can't figure this out. I tried something like this:-
create member currentcube.measures.[Calculated Booking Value]
as
[Measures].[Total_Booking_Value] / count(Measures.Booking_ID);
But this isn't working.
Hopefully this makes sense and you can point me in the right direction.
I find it strange that Booking_ID is a measure. Intuitively it strikes me as something that would be an attribute, and therefore a hierarchy, in which case you'd be able to do the count like this:
[Measures].[Total_Booking_Value]
/
COUNT(EXISTING [Booking].[Booking_ID].[Booking_ID].members)
A straightforward solution would be to have two fact tables: one at the granularity of booking key and one at the granularity of booking ID. The first would contain all columns except the total booking value, and the second would contain only the booking ID and total booking value columns.
Then both measures would be summable without any special handling.
The relationship between the second fact table and the category dimension could be configured as many-to-many via the first fact table. That way you would see the full values of the involved bookings for each selected category, and double counting would be eliminated automatically.
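To make the double counting concrete, here is a plain Python sketch (not MDX, and not how SSAS evaluates anything internally) that uses the four rows from the question. It sums Line_Value over the filtered rows but counts Total_Booking_Value once per distinct Booking_ID, which is the effect both answers aim for. The function name is hypothetical.

```python
# Rows from the question:
# (BookingKey, Booking_ID, Category_FKey, Line_Value, Total_Booking_Value)
FACT_ROWS = [
    (1, 12, 8, 150, 700),
    (2, 12, 4, 150, 700),
    (3, 12, 5, 300, 700),
    (4, 12, 4, 100, 700),
]


def totals_for_category(rows, category_fkey):
    """Return (sum of Line_Value, sum of Total_Booking_Value per distinct booking)."""
    selected = [row for row in rows if row[2] == category_fkey]
    line_total = sum(row[3] for row in selected)
    # Keep one Total_Booking_Value per Booking_ID so repeated lines
    # of the same booking are not added more than once.
    value_by_booking = {row[1]: row[4] for row in selected}
    return line_total, sum(value_by_booking.values())


naive_total = sum(row[4] for row in FACT_ROWS if row[2] == 4)
print(naive_total)                        # 1400, the double-counted figure
print(totals_for_category(FACT_ROWS, 4))  # (250, 700)
```

The dictionary keyed on Booking_ID plays the role of the second, booking-level fact table: each booking contributes its total exactly once, however many of its lines match the filter.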
TM1 - linking measures to dimensions
I have two cubes in TM1, and I am trying to source data from one cube by linking a calculated 'Age' field in the target cube to an 'Age' dimension in the source cube. While I can do this fine by writing code in the rules editor, I cannot work out how to do it using the Rules Wizard. Unfortunately, my company's policy is that all TM1 models must be based on wizard-based rules, so I am hoping someone can explain how to do this via the wizard.
Cube 1 (the source) contains data on how quickly a loan balance reduces due to customer attrition, based on the loan's age in months - it looks a bit like this:
Age | Attrition %
-------|--------------
1 | 5%
2 | 6%
3 | 7%
Cube 2 (the target) contains a loan balance, and calculates how the balance reduces over several months, based on the data in Cube 1. It looks up the data in cube 1, based on the age that is calculated in the first row of Cube 2, on the basis of:
current month - start month + 1.
So if we assume the loan started in July, for August the age would be:
8 - 7 + 1 = 2 months old.
For the loan starting in July, Cube 2 would look a bit like this:
                | Jul  | Aug   | Sep   |
----------------|------|-------|-------|
Age             | 1    | 2     | 3     |
Opening Balance | $100 | $95   | $89   |
Attrition %     | 5%   | 6%    | 7%    | <-- sourced from Cube 1 on basis of Age
Attrition $     | -$5  | -$5.7 | -$6.3 |
Closing Balance | $95  | $89   | $83   |
Creating this link is trivial in Excel, but whenever I try to do it using the TM1 Rules Wizard, I run into the problem that TM1 does not seem to allow the linking of a dimension ('Age' in cube 1) to a field within a dimension ('Age' in cube 2).
Can anyone advise?
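To pin down the calculation being asked for, the lookup can be sketched in plain Python. This is not TM1 rule syntax and says nothing about what the Rules Wizard can generate; the names `ATTRITION_BY_AGE` and `project_balance` are made up for illustration. The age is computed as current month - start month + 1 and then used as the key into the Cube 1 data.

```python
# Cube 1 (source): attrition rate by loan age in months, from the question.
ATTRITION_BY_AGE = {1: 0.05, 2: 0.06, 3: 0.07}


def project_balance(opening_balance, start_month, months):
    """Roll a balance forward, looking up attrition by the calculated age."""
    rows = []
    balance = opening_balance
    for month in months:
        age = month - start_month + 1   # e.g. 8 - 7 + 1 = 2 for August
        rate = ATTRITION_BY_AGE[age]    # the cross-cube lookup on Age
        attrition = -balance * rate
        closing = balance + attrition
        rows.append(
            {
                "month": month,
                "age": age,
                "opening": balance,
                "rate": rate,
                "attrition": attrition,
                "closing": closing,
            }
        )
        balance = closing               # next month opens at this closing value
    return rows


# Loan starting in July (month 7), projected for July to September.
projection = project_balance(100.0, 7, [7, 8, 9])
for row in projection:
    print(
        f"month {row['month']}: age {row['age']}, "
        f"opening {row['opening']:.2f}, closing {row['closing']:.2f}"
    )
```

The table in the question appears to show rounded figures, so the unrounded values printed here differ slightly from it in the later months.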
I'm new to R and new to this forum. I tried searching, and I hope I don't embarrass myself by failing to find previous answers.
So I've got my data, and I intend to fit some kind of GLMMs in the end, but that's far in the future. First I'm going to fit some simple GLMs/LMs to learn what I'm doing.
First, about my data:
I have data sampled from 2 "general areas" on opposite sides of the country.
In each of these general areas there are roughly 50 trakts placed (in a grid, with a random starting point).
Trakts have been revisited each year for 4 years.
A trakt contains 16 sample plots. I intend to work at trakt level, so I use the means of the 16 sample plots for each trakt.
2 x 4 x 50 = 400 rows (the actual number is 373 rows after removing trakts where not enough plots could be sampled due to terrain etc.).
The data in my Excel file is currently laid out like this:
Rows = trakts
Columns = the measured variables
I've got 8-10 columns I want to use.
A short example of how the data looks now:
V1 - predictor, 4 different columns
V2 - response variable = proportional data, 1-4 columns depending on which hypothesis I end up testing
The GLMM in the end would look something like (V2~V1+V1+V1,(area,year))
Area  Year  Trakt  V1           V2
A     2015  1      25.165651    0
A     2015  2      11.16894652  0.1
A     2015  3      18.231       0.16
A     2014  1      3.1222       N/A
A     2014  2      6.1651       0.98
A     2014  3      8.651        1
A     2013  1      6.16416      0.16
B     2015  1      9.12312      0.44
B     2015  2      22.2131      0.17
B     2015  3      12.213       0.76
B     2014  1      1.123132     0.66
B     2014  2      0.000        0.44
B     2014  3      5.213265     0.33
B     2013  1      2.1236       0.268
How should I get started on this?
8 different files?
Nested by trakts? (Do I start nesting now, or later when I'm doing GLMMs?)
I load my data into R with the read.table function.
If I run sapply(dataframe,class),
V1 and V2 are factors and everything else is integer.
If I run sapply(dataframe,mode),
everything is numeric.
So finally to my actual problems: I have been trying to do normality tests (I've only tried Shapiro so far), but I keep getting errors that imply my data is not numeric.
Also, when I run a normality test, do I run only one column and evaluate it before moving on to the next column, or should I run several columns? The entire dataset?
Should I, in my case, run independent normality tests for each of my areas and years?
I hope it didn't end up too cluttered.
Best regards