How can I sort stacked bar chart by a specific category? - google-data-studio

How do I sort the following stacked bar chart by the rightmost category of the Status OMR field (shown in grey, value "Belum selesai") in ascending order (from lowest to highest)? It currently sorts by the leftmost category (shown in darkest orange, value "OMR diterima").
Stacked bar chart created in Google Data Studio:
This is how the bar chart is currently sorted (5 rows shown to demonstrate; 16 rows in the linked data set):
Malaysia
OMR diterima
OMR telah dipos
Tinjauan selesai
Belum selesai
KELANTAN
4
1
4
7
JOHOR
2
1
1
12
KEDAH
1
10
5
NEGERI SEMBILAN
1
1
14
PULAU PINANG
1
3
2
10
I want the chart to be sorted like:
Malaysia
OMR diterima
OMR telah dipos
Tinjauan selesai
Belum selesai
NEGERI SEMBILAN
1
1
14
JOHOR
2
1
1
12
PULAU PINANG
1
3
2
10
KELANTAN
4
1
4
7
KEDAH
1
10
5
I followed this guide to arrange the categories in the desired (horizontal) order across a single stacked bar, and created the calculated field:
CASE
WHEN REGEXP_MATCH(Status OMR, "(?i)OMR diterima") THEN 4
WHEN REGEXP_MATCH(Status OMR, "(?i)OMR telah dipos") THEN 3
WHEN REGEXP_MATCH(Status OMR, "(?i)Tinjauan selesai") THEN 2
WHEN REGEXP_MATCH(Status OMR, "(?i)Belum selesai") THEN 1
ELSE 0
END
Data Set (Google Sheets) (9 rows shown below to demonstrate; 240 rows in the linked data set):
Malaysia  Status OMR
JOHOR     OMR diterima
JOHOR     Tinjauan selesai
JOHOR     OMR diterima
JOHOR     OMR telah dipos
JOHOR     Belum selesai
JOHOR     Belum selesai
JOHOR     Belum selesai
JOHOR     Belum selesai
JOHOR     Belum selesai
Google Data Studio Report

This can be achieved by adding the calculated field below as the Sort field of the bar chart. It uses the REGEXP_EXTRACT function to capture the grey values by searching for the phrase "Belum selesai" in the Status OMR field (rows with any other status yield null), and the COUNT function then aggregates the non-null results. Set the order to Ascending so that the chart is sorted starting with the lowest COUNT of the grey values:
COUNT(REGEXP_EXTRACT(Status OMR, "Belum selesai"))
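To make the sort logic concrete outside Data Studio, here is a minimal Python sketch of the same idea: count only the rows whose status is "Belum selesai" per state, then order the states by that count, lowest first. The counts are taken from three chart rows in the question; the function name `sort_states_by_status` is made up for this illustration and is not part of Data Studio.

```python
from collections import Counter

# Per-state status counts copied from three rows of the chart in the question.
counts = {
    "KELANTAN": {"OMR diterima": 4, "OMR telah dipos": 1, "Tinjauan selesai": 4, "Belum selesai": 7},
    "JOHOR": {"OMR diterima": 2, "OMR telah dipos": 1, "Tinjauan selesai": 1, "Belum selesai": 12},
    "PULAU PINANG": {"OMR diterima": 1, "OMR telah dipos": 3, "Tinjauan selesai": 2, "Belum selesai": 10},
}

# Expand the counts into one (state, status) row per record, like the Google Sheets data set.
rows = [
    (state, status)
    for state, by_status in counts.items()
    for status, n in by_status.items()
    for _ in range(n)
]


def sort_states_by_status(rows, status):
    """Return the states ordered by how many rows carry `status`, lowest first."""
    tally = Counter({state: 0 for state, _ in rows})
    for state, row_status in rows:
        if row_status == status:  # other statuses are ignored, like a null from REGEXP_EXTRACT
            tally[state] += 1
    return sorted(tally, key=lambda state: (tally[state], state))


order = sort_states_by_status(rows, "Belum selesai")
print(order)
```

Ties are broken alphabetically here, which is a choice of this sketch rather than documented chart behaviour.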
Editable Google Data Studio Report (Embedded Google Sheets Data Source) and a GIF to elaborate:


Drop columns from a data frame but I keep getting this error below

No matter how I try to code this in R, I still cannot drop my columns so that I can build my logistic regression model. I tried two different ways:
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[-cols,]
Error in -cols : invalid argument to unary operator
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[!cols,]
Error in !cols : invalid argument type
This may solve your problem:
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[ , !colnames(DAT_690_Attrition_Proj1EmpAttrTrain) %in% cols]
Please note that if you want to drop columns, you should put your code inside [ on the right side of the comma, not on the left side.
So [, your_code] not [your_code, ].
Here is an example of dropping columns using the code above.
cols <- c("cyl", "hp", "wt")
mtcars[, !colnames(mtcars) %in% cols]
#                    mpg  disp drat  qsec vs am gear carb
# Mazda RX4         21.0 160.0 3.90 16.46  0  1    4    4
# Mazda RX4 Wag     21.0 160.0 3.90 17.02  0  1    4    4
# Datsun 710        22.8 108.0 3.85 18.61  1  1    4    1
# Hornet 4 Drive    21.4 258.0 3.08 19.44  1  0    3    1
# Hornet Sportabout 18.7 360.0 3.15 17.02  0  0    3    2
# Valiant           18.1 225.0 2.76 20.22  1  0    3    1
#...
Edit to Reproduce the Error
The error message you got indicates that at least one column holds a single, identical value in every row.
To show this, let's fit a logistic regression on a subset of the mtcars data whose cyl column holds a single value, and use that column as a predictor.
mtcars_cyl4 <- mtcars |> subset(cyl == 4)
mtcars_cyl4
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
# Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
# Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
# Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
# Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
# Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars_cyl4, family = "binomial")
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Now compare it with the same logistic regression on the full mtcars data, whose cyl column has several distinct values.
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars, family = "binomial")
# Call: glm(formula = am ~ as.factor(cyl) + mpg + disp, family = "binomial",
# data = mtcars)
#
# Coefficients:
#     (Intercept)  as.factor(cyl)6  as.factor(cyl)8              mpg             disp
#        -5.08552          2.40868          6.41638          0.37957         -0.02864
#
# Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
# Null Deviance: 43.23
# Residual Deviance: 25.28 AIC: 35.28
It is likely that, even though you have dropped three columns that hold a single, identical value in every row, there is another column in Trainingmodel1 that also holds only one value. Such a column was probably produced when the data frame was filtered and split into training and test groups. It is best to check with summary(Trainingmodel1).
Further edit
I have checked the summary(Trainingmodel1) result, and it becomes clear that EmployeeNumber has one identical value (called "level" for a factor) in all rows. To run your regression properly, either you drop it from your model, or if EmployeeNumber has another level and you want to include it in your model, you should make sure that it contains at least two levels in the training data. It is possible to achieve that during splitting by repeating the random sampling until the randomly selected EmployeeNumber samples contain at least two levels. This can be done by looping using for, while, or repeat. It is possible, but I don't know how proper the repeated sampling is for your study.
As for your question about subsetting more than one variable, you can use subset and conditionals. For example, you want to get a subset of mtcars that has cyl == 4 and mpg > 20 :
mtcars |> subset(cyl == 4 & mpg > 20 )
If you want a subset that has cyl == 4 or mpg > 20:
mtcars |> subset(cyl == 4 | mpg > 20 )
You can also subset by using more columns as subset criteria:
mtcars |> subset((cyl > 4 & cyl <8) | (mpg > 20 & gear > 4 ))

How do you count cells using regex that do not match the expression?

I want to count the number of cells that do not contain the following words.
denv
univ
du
The list of words above changes frequently, and cell B22 automatically builds a regex from it for another formula that sums the column next to it.
Cell B22 = .*denv.*|.*univ.*|.*du.*
Can I use the same Cell B22 reference for counting everything that DOES NOT contain those words?
Name      Metric
denver    5
ohio      5
dual      9
dual      1
maryland  4
universe  6
maryland  1
dual      2
denver    7
try:
=INDEX(SUMPRODUCT(REGEXMATCH(FILTER(A:A, A:A<>""), B22)=FALSE))
or:
=SUM(INDEX(N(REGEXMATCH(FILTER(A:A, A:A<>""), B22)=FALSE)))
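As a cross-check of what these formulas compute, here is a small Python sketch that applies the same regex from B22 to the sample names and counts the non-empty cells that do not match. The function name `count_non_matching` is made up for this illustration; `re.search` is used because REGEXMATCH also succeeds on a partial match.

```python
import re

# The regex held in cell B22 in the question.
pattern = re.compile(r".*denv.*|.*univ.*|.*du.*")

# The Name column from the sample table.
names = ["denver", "ohio", "dual", "dual", "maryland", "universe", "maryland", "dual", "denver"]


def count_non_matching(values, regex):
    """Count non-empty values that the regex does not match anywhere."""
    return sum(1 for value in values if value != "" and not regex.search(value))


non_matching = count_non_matching(names, pattern)
print(non_matching)  # 3 (ohio, maryland, maryland)
```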

Return the first non-zero in a column/row in tableau

I am trying to return the appearance of first non-zeros in a row. The variable I want to return is Fiscal Year that when each customer first started to buy the product.
In my case, I would like to return the Year they first started. The first appearance of "1" in each row represents when they started for the first time, so I want to return the Year for that customer when that first number appears.
ID 1950 1951 1953 1955 1959 1965 1968 1972 1974 1975 1976
1 1 1 1 1 1 1
2 1
3 1 1 1
4 1 1 1 1
5 1 1
6 1
7 1
8 1 1
9
10 1 1 1 1 1
11 1 1 1 1
12 1
Use a level-of-detail (LOD) calculation. An LOD expression lets you apply a calculation, in this case MIN(), to a dataset for a given set of dimensions. You will need to decide whether to use FIXED or INCLUDE for your particular situation (they behave differently in the presence of filters). I'm assuming that your ID column is a customer ID.
{ INCLUDE [ID] : Min([Fiscal Year])}
Much more info available in the online help documents at https://onlinehelp.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod_overview.html.
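For intuition, the LOD expression above amounts to "take the minimum Fiscal Year within each ID". A minimal Python sketch of that grouping follows. It assumes the data is in long format, with one record per customer and year in which a purchase happened; the records and the function name `first_year_by_id` are hypothetical and are not taken from the question's table.

```python
# Hypothetical long-format records: one (customer ID, fiscal year) pair per purchase year.
purchases = [(1, 1953), (1, 1951), (1, 1959), (2, 1972), (3, 1965), (3, 1968)]


def first_year_by_id(records):
    """Return the earliest fiscal year seen for each customer ID."""
    first = {}
    for customer_id, year in records:
        if customer_id not in first or year < first[customer_id]:
            first[customer_id] = year
    return first


first_year = first_year_by_id(purchases)
print(first_year)  # {1: 1951, 2: 1972, 3: 1965}
```

A customer with no purchases has no records and therefore no entry in the result.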

Working off of the results of 2 Conditional Formats

Another question for you. How can I work off the results of 2 Conditional Formats and have just the results of those conditions highlighted? The results of the 2 Conditional Formats are in columns C and G, and I need the results highlighted in column A. Column A's 3 conditions are as follows:
condition1 cell value equal to 0, No Format
condition2 formula is =$G27>=LARGE($G$27:$G$150,10), Bold Format
condition3 formula is =$C27>=LARGE($C$27:$C$150,10), Colored Red
Another quandary...
This is just like the fizzbuzz programmer test.
So, without writing the code for you: I'll recommend a loop through a 'listobject(table)' followed by setting the range values using '.interior.colorindex' or 'font.colorindex' properties.
OK, without using code...
- Format your table as a table using "Format as Table" function
- In the "Table" menu, select the "Total Row" checkbox
- Set your Total formula for Cols C and G as =LARGE([ColC],10) and =LARGE([ColG],10), respectively.
- In Conditional Formatting, set up two rules as follows:
  - =$B28>=$D$[TotalRowNumber]
  - =$B28>=$C$[TotalRowNumber]
- You shouldn't need a condition for =0 since you are not changing any formats.
How does that work?
Here's a sample table:
ID     Col A     Col C   Col G
______________________________
1      50.66     51.33   97.17
2      16.09     83.39   97.37
3      b71.94    69.77   28.06
4      21.60     20.59   21.14
5      33.62     65.58   39.21
6      21.96     34.59   17.99
7      br80.94   93.02   96.84
8      b70.53    37.53   29.60
9      32.06     37.38    0.15
10     br89.81   67.02    6.85
11     br89.76   64.65   74.00
12     47.94     46.06    1.71
13     b61.19    34.19   90.13
14     br79.11   35.77   86.97
15     39.89     79.15   77.88
16     br93.20    8.01   13.99
17     31.84     18.12   95.61
18     br99.78   19.99    3.89
19     38.94     32.12   18.56
20     13.17     22.23   61.82
21     br75.75   51.42   28.32
22     b55.89    49.93   76.30
23     br72.78   82.46   27.07
24     b57.20    31.26   76.90
25     6.46       6.85    2.78
Total            51.33   74.00
Use the LARGE() formula to get the 10th largest value in the TOTAL row.
Then reference this cell in the condition.
So, any number larger than 51.33 will be bold and any number larger than 74.00 will be bold red. (Note, I used random number generator, so the numbers may be good or bad.)
Also, I added a 'b' tag and an 'r' note where cells will be formatted bold and red, respectively.
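The thresholds in the Total row can be checked with a short Python sketch. It recomputes the 10th largest value of Col C and Col G from the sample table and classifies a few Col A values the way the two rules would. The helper names `nth_largest` and `format_for` are made up for this illustration, and the comparison uses >= as in the rules above.

```python
# Col C and Col G values copied from the sample table.
col_c = [51.33, 83.39, 69.77, 20.59, 65.58, 34.59, 93.02, 37.53, 37.38, 67.02,
         64.65, 46.06, 34.19, 35.77, 79.15, 8.01, 18.12, 19.99, 32.12, 22.23,
         51.42, 49.93, 82.46, 31.26, 6.85]
col_g = [97.17, 97.37, 28.06, 21.14, 39.21, 17.99, 96.84, 29.60, 0.15, 6.85,
         74.00, 1.71, 90.13, 86.97, 77.88, 13.99, 95.61, 3.89, 18.56, 61.82,
         28.32, 76.30, 27.07, 76.90, 2.78]


def nth_largest(values, n):
    """Return the n-th largest value, like LARGE(range, n) in a spreadsheet."""
    return sorted(values, reverse=True)[n - 1]


def format_for(value, bold_threshold, red_threshold):
    """Name the format a Col A value would receive under the two rules."""
    if value >= red_threshold:
        return "bold red"
    if value >= bold_threshold:
        return "bold"
    return "none"


c_cut = nth_largest(col_c, 10)  # Total row value under Col C
g_cut = nth_largest(col_g, 10)  # Total row value under Col G
labels = [format_for(v, c_cut, g_cut) for v in (50.66, 71.94, 80.94)]
print(c_cut, g_cut, labels)
```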

Updating syntax

I have the following scenario:
Table is _etblpricelistprices
Columns are as follows:
iPriceListNameID iPricelistNameID iStockID fExclPrice
1 1 1 10
2 2 1 20
3 3 1 30
4 4 1 40
5 5 1 100
6 6 1 200
7 7 1 300
8 8 1 400
9 1 2 1000
10 2 2 2000
11 3 2 3000
12 4 2 4000
13 5 2 50
14 6 2 40
15 7 2 30
16 8 2 20
There are only two stock items here, but a lot more in the DB. The first column is the PK, which auto-increments. The second column is the pricelist, split as follows: 1-4 is current pricing and 5-8 is future pricing. The third column is the stock item's ID, and the fourth column is the price of the item.
I need a script to update this table to swap the future and current pricing per item. Please help
Observe, if you will, that swapping the iPricelistNameID values will achieve the same overall effect as swapping the fExclPrice values, and can be performed using a formula:
UPDATE _etblpricelistprices
SET
    iPricelistNameID = CASE
        WHEN iPricelistNameID > 4 THEN iPricelistNameID - 4
        ELSE iPricelistNameID + 4
    END;
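To try the statement safely before running it on the real database, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table. The primary key column name `id` and the column types are assumptions made for this illustration, and only the rows for stock item 1 are loaded. SQLite stands in for the actual database engine, so dialect differences may apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: the question does not give column types or the real PK name.
conn.execute(
    "CREATE TABLE _etblpricelistprices ("
    "id INTEGER PRIMARY KEY, iPricelistNameID INTEGER, iStockID INTEGER, fExclPrice REAL)"
)

# Rows for stock item 1 from the question: price lists 1-4 are current, 5-8 are future.
sample = [(1, 1, 1, 10), (2, 2, 1, 20), (3, 3, 1, 30), (4, 4, 1, 40),
          (5, 5, 1, 100), (6, 6, 1, 200), (7, 7, 1, 300), (8, 8, 1, 400)]
conn.executemany("INSERT INTO _etblpricelistprices VALUES (?, ?, ?, ?)", sample)

# Swap current and future pricing by relabelling the price list IDs.
conn.execute(
    "UPDATE _etblpricelistprices "
    "SET iPricelistNameID = CASE "
    "WHEN iPricelistNameID > 4 THEN iPricelistNameID - 4 "
    "ELSE iPricelistNameID + 4 END"
)

prices = conn.execute(
    "SELECT iPricelistNameID, fExclPrice FROM _etblpricelistprices "
    "WHERE iStockID = 1 ORDER BY iPricelistNameID"
).fetchall()
print(prices)
```

After the update, price lists 1-4 carry the former future prices (100 to 400) and 5-8 carry the former current prices (10 to 40).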
