I have a dataset with some explanations in it, but I can't understand this one:
Exports: Exports of goods and services. Given as %age of the Total GDP
What does it mean? What is the percentage of age?
%age is simply shorthand for "percentage": the symbol % ("percent") followed by "age". So the exports are given as a percentage of total GDP.
This is a follow-up to my previous question: How to access snowflake query profile overview statistics via SQL?
Here is the profile overview of a query I am analyzing
Total Execution Time is: 2 seconds (2019 milliseconds to be precise)
In the same section we see a breakdown by CPU, disk usage etc. The percentages add up to 100%. Great.
Anyone looking at these values in the UI would be inclined to assume that the 72.9% which was spent on Processing is 72.9% of the Total Execution Time (2019 milliseconds); in other words 1472 milliseconds is spent on processing.
Now, thanks to the SF Snowsight extension tool recommended in this answer I am able to see the raw numbers that are used to calculate those percentages. The problem is, the numbers don't add up; they are totally different. Here are the numbers that I got back from the unofficial tool:
"waits" : [ {
  "name" : "Processing",
  "value" : 186.0,
  "percentage" : 0.7294117647058823
}, {
  "name" : "Local Disk IO",
  "value" : 46.0,
  "percentage" : 0.1803921568627451
}, {
  "name" : "Synchronization",
  "value" : 19.0,
  "percentage" : 0.07450980392156863
}, {
  "name" : "Initialization",
  "value" : 4.0,
  "percentage" : 0.01568627450980392
} ],
"totalStats" : {
  "name" : "Time spent",
  "value" : 255.0,
  "percentage" : 1.0
}
The percentage values are clearly calculated by dividing the "value" of each entry under the waits key by the "value" of the "Time spent" entry in totalStats. E.g. Local Disk IO: 46/255 = 18%.
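The division described above can be checked with a short Python sketch. The numbers are copied from the tool output quoted earlier; the variable names are illustrative and not part of any Snowflake API. The last lines show what the Processing share would amount to if it were applied to the 2019 ms total instead.

```python
# Raw "value" fields from the unofficial tool output (unit unknown).
waits = {
    "Processing": 186.0,
    "Local Disk IO": 46.0,
    "Synchronization": 19.0,
    "Initialization": 4.0,
}
time_spent = 255.0        # totalStats "Time spent"
total_execution_ms = 2019  # "timeInMs" of the Execution step

# The waits sum to "Time spent", not to the execution time in ms.
waits_sum = sum(waits.values())

# Each reported percentage is value / "Time spent".
percentages = {name: value / time_spent for name, value in waits.items()}

for name, share in percentages.items():
    print(f"{name}: {share:.1%}")

# What the UI reading would imply if the share applied to 2019 ms.
implied_processing_ms = percentages["Processing"] * total_execution_ms
print(f"Implied processing time: {implied_processing_ms:.0f} ms vs raw value {waits['Processing']:.0f}")
```

This only reproduces the arithmetic; it says nothing about what unit the raw values are in.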
The name "Time spent" suggests that this value, 255, is in some sort of time unit. It can't be milliseconds, because the total time is 2019 milliseconds, which is reported in a few places and can also be calculated as end_time - start_time.
...
"description" : "Execution",
"timeInMs" : 2019,
"state" : "success",
...
I manually checked multiple queries and the situation is the same every time. The queries are all single step so it is not like I am missing values in other steps.
Questions are:
How is this "Time spent" value in "totalStats" section of the "global" query statistics calculated?
Is it really a time unit as the name suggests? If so what is the time unit? If not, what is this value?
Why are the percentages under the Total Execution Time calculated over this value but not the timeInMs value?
Any comments are appreciated as the results of my analysis will be used for Snowflake performance evaluation and I would like to refrain from making any guesses regarding what these percentages mean.
I want to know the percentage of males in the ER (emergency room) on days that I defined as overcrowded.
I have a DataFrame named eda with rows representing each entry to the ER. One column states whether the entry occurred on an overcrowded day (1 means overcrowded), and another column states the gender of the person who entered.
So far I have managed to get a series with the overcrowded days as the index and a sub-index for gender, holding the number of entries for that gender.
I used this code:
eda[eda.over_crowd==1].groupby(eda[eda.over_crowd==1].index.date).gender.value_counts()
and got the following result:
My question is: what is the most 'pandas-ian' way to get the percentage of males/females in general? Or, how do I continue from the point where I stopped?
As can be seen at the bottom of the screenshot, when I iterate over the elements, the values alternate between male and female. I want to iterate over dates so that I can write a cleaner loop that produces another column with the male percentage.
I found a pretty elegant solution. I'm sure there are more, but maybe it can help someone else.
I defined a multi-index series with all dates and the counts of females and males, then used .loc to operate on the counts for all dates and get the percentage of males on each day. Finally, I extract only the days where over_crowd==1.
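import numpy as np
# Daily entry counts per gender; the division below treats gender 1 as male and 2 as female
temp = eda.groupby(eda.index.date).gender.value_counts()
crowding['male_percent'] = np.divide(100 * temp.loc[:, 1], temp.loc[:, 2] + temp.loc[:, 1])
crowding.male_percent[crowding.over_crowd == 1]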
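As an alternative sketch of the same idea, the male share per day can be computed without the intermediate counts, because the mean of a boolean column is the fraction of True values. The toy data below is hypothetical, and the gender coding (1 = male, 2 = female) is an assumption carried over from the snippet above.

```python
import pandas as pd

# Hypothetical toy data: one row per ER entry, indexed by entry timestamp.
# Assumed coding: gender 1 = male, 2 = female; over_crowd 1 = overcrowded day.
eda = pd.DataFrame(
    {
        "gender": [1, 2, 1, 1, 2, 2],
        "over_crowd": [1, 1, 1, 0, 0, 0],
    },
    index=pd.to_datetime(
        [
            "2020-01-01 08:00",
            "2020-01-01 09:30",
            "2020-01-01 17:45",
            "2020-01-02 10:00",
            "2020-01-02 11:15",
            "2020-01-02 23:05",
        ]
    ),
)

day = eda.index.date  # calendar date of each entry

# Percentage of male entries per day.
male_percent = eda["gender"].eq(1).groupby(day).mean().mul(100)

# A day is treated as overcrowded if its entries carry over_crowd == 1.
crowded_day = eda["over_crowd"].groupby(day).max().eq(1)

# Keep only the overcrowded days.
male_percent_crowded = male_percent[crowded_day]
print(male_percent_crowded)
```

Days with no entries of one gender are handled naturally here, since no per-gender count has to exist for the division.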
I am trying to fix an "Out of present range" error in a Power BI DAX formula.
General background
I have a dataset as below. The dashboard lets the user select the preferred currency, and the table then updates the values. For example, if the user clicks on US, the table sums all Sales in Jan from the US
as well as showing $ or € accordingly.
The DAX function I have tried is
CONCATENATE( Table[Country]="US", "$",FORMAT(sum(Table[Sales]),"0")
but it comes up with the error below:
Couldn't load the data for this visual
The following system error occurred: Out of present range
Any help would be appreciated. Thanks.
You don't need to use any concatenation. You can simply put the currency symbol in the FORMAT function. For example:
Measure = SWITCH(SELECTEDVALUE(Table[Country]),
"EUR", FORMAT(SUM(Table[Sales]), "€0.00"),
"US", FORMAT(SUM(Table[Sales]), "$0.00"),
"UK", FORMAT(SUM(Table[Sales]), "£0.00"))
I've had a look at several questions that are answered on here, and I seem to be doing it right, but the percentage is coming out a little different than expected.
Here's the expression:
=SUM(Fields!Dig_Team.Value,"PivotTable_1") / COUNT(Fields!Dig.Value,"Form_Count")
I am just looking for the percentage relationship between the two values. PivotTable_1 has the "total", and Form_Count has the number I am trying to express as a percentage of it.
Pivot is a total from a pivot table, and the Count dataset is just a basic COUNT([Table]) field.
As it stands, the two values to work from are:
Pivot = 175, Count = 16
My percentage shows as 8.21% (I formatted the text box field as a percentage).
Calculators and websites show this should be around 9.1%.
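As a quick check of the expected figure, the arithmetic with the two values quoted above can be sketched in Python. The variable names are illustrative and are not fields from the report.

```python
pivot_total = 175  # "Pivot" value quoted above
form_count = 16    # "Count" value quoted above

# Count expressed as a share of the pivot total.
share = form_count / pivot_total
print(f"{share:.2%}")
```

This only confirms the expected value of roughly 9.1%; it does not explain where the 8.21% in the report comes from.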
Any help or ideas on where I am going wrong?
I figured I could just use the totals that the report already outputs:
=ReportItems!Textbox127.Value / ReportItems!Barrierman1.Value
In my query, I want to search for a price based on a range of values (e.g. $500 - $1000) and return a fuzzy result set.
I can boost these values by doing price:[500 TO 1000]^10, but then it doesn't score $499 as any more relevant than $200.
I can create a boost function like: recip(abs(sub(price,750)),1,1000,1000)^10, but this scores 501 as more relevant than 500.
Is there any way to have one boost function for $500-$1000 and another boost function for values outside that range?
Thanks,
Drew
Edited for typo in the function
recip(abs(sub(price,750)),1,1000,1000)^10
Use the mid-point of your range instead of the lower bound.
Edit: To answer the updated question:
Take a look at the map function here - you can map all prices between 500 and 1000 to 750 and then use that for boosting. Something like:
recip(abs(sub(map(price,500,1000,750),750)),1,1000,1000)^10
This should score 600 and 700 the same but it will score 400 higher than 300.
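The behaviour of the two boost functions can be simulated outside Solr. This is a minimal Python sketch, assuming Solr's documented definitions recip(x,m,a,b) = a/(m*x+b) and map(x,min,max,target), which replaces any x in [min,max] with target and leaves other values unchanged. The helper names are my own, and the ^10 boost factor is left out because it does not change the ordering.

```python
def recip(x, m, a, b):
    """Mirror of Solr's recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)


def solr_map(x, lo, hi, target):
    """Mirror of Solr's map(x,min,max,target): values in [lo, hi] become target."""
    return target if lo <= x <= hi else x


def plain_boost(price):
    # recip(abs(sub(price,750)),1,1000,1000)
    return recip(abs(price - 750), 1, 1000, 1000)


def mapped_boost(price):
    # recip(abs(sub(map(price,500,1000,750),750)),1,1000,1000)
    return recip(abs(solr_map(price, 500, 1000, 750) - 750), 1, 1000, 1000)


for price in (200, 300, 400, 499, 500, 501, 600, 700, 1000):
    print(price, round(plain_boost(price), 4), round(mapped_boost(price), 4))
```

With the mapped version every price inside the range gets the same score, while prices outside it still decay with their distance from 750. Note that the score jumps at the range edges, since 499 is scored by its distance from 750 rather than from 500.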